Repository: PaddlePaddle/PALM
Branch: master
Commit: 2555c0e2a5fa
Files: 98
Total size: 443.8 KB

Directory structure:
gitextract_o6rx2q6_/

├── .gitignore
├── README.md
├── README_zh.md
├── customization_cn.md
├── examples/
│   ├── classification/
│   │   ├── README.md
│   │   ├── download.py
│   │   ├── evaluate.py
│   │   └── run.py
│   ├── matching/
│   │   ├── README.md
│   │   ├── download.py
│   │   ├── evaluate.py
│   │   ├── process.py
│   │   └── run.py
│   ├── mrc/
│   │   ├── README.md
│   │   ├── download.py
│   │   ├── evaluate.py
│   │   └── run.py
│   ├── multi-task/
│   │   ├── README.md
│   │   ├── download.py
│   │   ├── evaluate_intent.py
│   │   ├── evaluate_slot.py
│   │   ├── joint_predict.py
│   │   ├── predict_intent.py
│   │   ├── predict_slot.py
│   │   ├── process.py
│   │   └── run.py
│   ├── predict/
│   │   ├── README.md
│   │   ├── download.py
│   │   ├── evaluate.py
│   │   └── run.py
│   ├── tagging/
│   │   ├── README.md
│   │   ├── download.py
│   │   ├── evaluate.py
│   │   └── run.py
│   └── train_with_eval/
│       ├── README.md
│       ├── download.py
│       ├── evaluate.py
│       └── run.py
├── paddlepalm/
│   ├── __init__.py
│   ├── _downloader.py
│   ├── backbone/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── base_backbone.py
│   │   ├── bert.py
│   │   ├── ernie.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       └── transformer.py
│   ├── distribute/
│   │   ├── __init__.py
│   │   └── reader.py
│   ├── downloader.py
│   ├── head/
│   │   ├── __init__.py
│   │   ├── base_head.py
│   │   ├── cls.py
│   │   ├── match.py
│   │   ├── mlm.py
│   │   ├── mrc.py
│   │   └── ner.py
│   ├── lr_sched/
│   │   ├── __init__.py
│   │   ├── base_schedualer.py
│   │   ├── slanted_triangular_schedualer.py
│   │   └── warmup_schedualer.py
│   ├── multihead_trainer.py
│   ├── optimizer/
│   │   ├── __init__.py
│   │   ├── adam.py
│   │   └── base_optimizer.py
│   ├── reader/
│   │   ├── __init__.py
│   │   ├── base_reader.py
│   │   ├── cls.py
│   │   ├── match.py
│   │   ├── mlm.py
│   │   ├── mrc.py
│   │   ├── seq_label.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── batching4bert.py
│   │       ├── batching4ernie.py
│   │       ├── mlm_batching.py
│   │       ├── mrqa_helper.py
│   │       └── reader4ernie.py
│   ├── tokenizer/
│   │   ├── __init__.py
│   │   ├── bert_tokenizer.py
│   │   └── ernie_tokenizer.py
│   ├── trainer.py
│   └── utils/
│       ├── __init__.py
│       ├── basic_helper.py
│       ├── config_helper.py
│       ├── plot_helper.py
│       ├── print_helper.py
│       ├── reader_helper.py
│       ├── saver.py
│       └── textprocess_helper.py
├── setup.cfg
├── setup.py
└── test/
    ├── test2/
    │   ├── config.yaml
    │   ├── run.py
    │   └── run.sh
    └── test3/
        ├── config.yaml
        ├── run.py
        └── run.sh

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*.pyc
paddlepalm.egg-info
data
__pycache__
*egg-info
pretrain_model
pretrain
output*
output_model
build
dist
paddle_palm.egg-info
mrqa_output
*.log


================================================
FILE: README.md
================================================
# PaddlePALM

English | [简体中文](./README_zh.md)

PaddlePALM (PArallel Learning from Multi-tasks) is a fast, flexible, extensible and easy-to-use NLP large-scale pretraining and multi-task learning framework. PaddlePALM is a high level framework **aiming at fastly developing high-performance NLP models**. 

With PaddlePALM, it is easy to achieve effecient exploration of robust learning of NLP models with multiple auxilary tasks. For example, based on PaddlePALM, the produced robust MRC model, [D-Net](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/MRQA2019-D-NET), has achieved **the 1st place** in [EMNLP2019 MRQA](https://mrqa.github.io) track.

<p align="center">
	<img src="https://tva1.sinaimg.cn/large/006tNbRwly1gbjkuuwrmlj30hs0hzdh2.jpg" alt="Sample"  width="300" height="333">
	<p align="center">
		<em>MRQA2019 Leaderboard</em>
	</p>
</p>

Beyond the research scope, PaddlePALM has been applied on **Baidu Search Engine** to seek for more accurate user query understanding and answer mining, which implies the high reliability and performance of PaddlePALM.

#### Features:

- **Easy-to-use:** with PALM, *8 steps* to achieve a typical NLP task. Moreover, all basic components (e.g., the model backbone, dataset reader, task output head, optimizer...) have been decoupled, which allows the replacement of any component to other candidates with quite minor changes of your code. 
- **Built-in Popular NLP Backbones and Pre-trained models:** multiple state-of-the-art general purpose model architectures and pretrained models (e.g., BERT,ERNIE,RoBERTa,...) are built-in. 
- **Easy to play Multi-task Learning:** only one API is needed for jointly training of several tasks with parameters reusement.
- **Support train/eval with Multi-GPUs:** automatically recognize and adapt to multiple gpus mode to accelerate training and inference.
- **Pre-training friendly:** self-supervised tasks (e.g., mask language model) are built-in to facilitate pre-training. Easy to train from scratch.
- **Easy to Customize:** support customized development of any component (e.g, backbone, task head, reader and optimizer) with reusement of pre-defined ones, which gives developers high flexibility and effeciency to adapt for diverse NLP scenes. 

You can easily re-produce following competitive results with minor codes, which covers most of NLP tasks such as classification, matching, sequence labeling, reading comprehension, dialogue understanding and so on. More details can be found in `examples`.

<table>
  <tbody>
    <tr>
      <th><strong>Dataset</strong>
        <br></th>
      <th colspan="2"><center><strong>chnsenticorp</strong></center></th>
      <th colspan="2"><center><strong>Quora Question Pairs matching</strong><center></th>
      <th colspan="1"><strong>MSRA-NER<br>(SIGHAN2006)</strong></th>
      <th colspan="2"><strong>CMRC2018</strong></th>
    </tr>
    <tr>
      <td rowspan="2">
        <p>
          <strong>Metric</strong>
          <br></p>
      </td>
      <td colspan="1">
        <center><strong>accuracy</strong></center>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <strong></strong>
        <br></td>
      <td colspan="1">
        <center><strong>accuracy</strong></center>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <strong></strong>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <strong></strong>
        <br></td>
      <td colspan="1">
        <strong>em</strong>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <br></td>
    </tr>
    <tr>
      <td colspan="2" width="">
        <strong>test</strong>
        <br></td>
      <td colspan="2" width="">
        <strong>test</strong>
        <br></td>
      <td colspan="1" width="">
        <strong>test</strong>
        <br></td>
      <td colspan="2" width="">
        <strong>dev</strong>
        <br></td>
    </tr>
    <tr>
      <td><strong>ERNIE Base</strong></td>
      <td>95.8</td>
      <td>95.8</td>
      <td>86.2</td>
      <td>82.2</td>
      <td>99.2</td>
      <td>64.3</td>
      <td>85.2</td>
    </tr>

  </tbody>
</table>


## Overview

<p align="center">
	<img src="https://tva1.sinaimg.cn/large/0082zybply1gbyo8d4ltoj31ag0n3tby.jpg" alt="Sample"  width="600px" height="auto">
	<p align="center">
		<em>Architecture Diagram</em>
	</p>
</p>

PaddlePALM is a well-designed high-level NLP framework. You can efficiently achieve **supervised learning, unsupervised/self-supervised learning, multi-task learning and transfer learning** with minor codes based on PaddlePALM. There are three layers in PaddlePALM architecture, i.e., component layer, trainer layer and high-level trainer layer from bottom to top. 

In component layer, PaddlePALM supplies 6 **decoupled** components to achieve a NLP task. Each component contains rich `pre-defined` classes and a `Base` class. Pre-defined classes are aiming at typical NLP tasks, and the base class is to help users develop a new Class (based on pre-defined ones or from the base). 

The trainer layer is to establish a computation graph with selected components and do training and predicting. The training strategy, model saving and loading, evaluation and predicting procedures are described in this layer. Noted a trainer can only process one task. 

The high-level trainer layer is for complicated learning and inference strategy, e.g., multi-task learning. You can add auxilary tasks to train robust NLP models (improve test set and out-of-domain performance of a model), or jointly training multiple related tasks to gain more performance for each task.

| module | illustration | 
| - | - |
| **paddlepalm** | an open source NLP pretraining and multitask learning framework, built on paddlepaddle. |
| **paddlepalm.reader** | a collection of elastic task-specific dataset readers. |
| **paddlepalm.backbone** | a collection of classic NLP representation models, e.g., BERT, ERNIE, RoBERTa. |
| **paddlepalm.head** | a collection of task-specific output layers. |
| **paddlepalm.lr_sched** | a collection of learning rate schedualers. |
| **paddlepalm.optimizer** | a collection of optimizers. |
| **paddlepalm.downloader** | a download module for pretrained models with configure and vocab files. |
| **paddlepalm.Trainer** | the core unit to start a single task training/predicting session. A trainer is to build computation graph, manage training and evaluation process, achieve model/checkpoint saving and pretrain_model/checkpoint loading.|
| **paddlepalm.MultiHeadTrainer** | the core unit to start a multi-task training/predicting session. A MultiHeadTrainer is built based on several Trainers. Beyond the inheritance of Trainer, it additionally achieves model backbone reuse across tasks, trainer sampling for multi-task learning, and multi-head inference for effective evaluation and prediction. |


## Installation

PaddlePALM support both python2 and python3, linux and windows, CPU and GPU. The preferred way to install PaddlePALM is via `pip`. Just run following commands in your shell.

```bash
pip install paddlepalm
```

### Installing via source

```shell
git clone https://github.com/PaddlePaddle/PALM.git
cd PALM && python setup.py install
```

### Library Dependencies
- Python >= 2.7
- cuda >= 9.0
- cudnn >= 7.0
- PaddlePaddle >= 1.7.0 (Please refer to [this](http://www.paddlepaddle.org/#quick-start) to install)


### Downloading pretrain models
We incorporate many pretrained models to initialize model backbone parameters. Training big NLP model, e.g., 12-layer transformers, with pretrained models is practically much more effective than that with randomly initialized parameters. To see all the available pretrained models and download, run following code in python interpreter (input command `python` in shell):

```python
>>> from paddlepalm import downloader
>>> downloader.ls('pretrain')
Available pretrain items:
  => RoBERTa-zh-base
  => RoBERTa-zh-large
  => ERNIE-v2-en-base
  => ERNIE-v2-en-large
  => XLNet-cased-base
  => XLNet-cased-large
  => ERNIE-v1-zh-base
  => ERNIE-v1-zh-base-max-len-512
  => BERT-en-uncased-large-whole-word-masking
  => BERT-en-cased-large-whole-word-masking
  => BERT-en-uncased-base
  => BERT-en-uncased-large
  => BERT-en-cased-base
  => BERT-en-cased-large
  => BERT-multilingual-uncased-base
  => BERT-multilingual-cased-base
  => BERT-zh-base

>>> downloader.download('pretrain', 'BERT-en-uncased-base', './pretrain_models')
...
```


## Usage

#### Quick Start

8 steps to start a typical NLP training task.

1. use `paddlepalm.reader` to create a *reader* for dataset loading and input features generation, then call `reader.load_data` method to load your training data.
2. use `paddlepalm.backbone` to create a model *backbone* to extract text features (e.g., contextual word embedding, sentence embedding).
3. register your *reader* with your *backbone* through `reader.register_with` method. After this step, your reader is able to yield input features used by backbone.
4. use `paddlepalm.head` to create a task output *head*. This head can provide task loss for training and predicting results for model inference.
5. create a task *trainer* with `paddlepalm.Trainer`, then build forward graph with backbone and task head (created in step 2 and 4) through `trainer.build_forward`.
6. use `paddlepalm.optimizer` (and `paddlepalm.lr_sched` if is necessary) to create a *optimizer*, then build backward through `trainer.build_backward`.
7. fit prepared reader and data (achieved in step 1) to trainer with `trainer.fit_reader` method.
8. load pretrain model with `trainer.load_pretrain`, or load checkpoint with `trainer.load_ckpt` or nothing to do for training from scratch, then do training with `trainer.train`.

For more implementation details, see following demos: 

- [Sentiment Classification](https://github.com/PaddlePaddle/PALM/tree/master/examples/classification)
- [Question Pairs matching](https://github.com/PaddlePaddle/PALM/tree/master/examples/matching)
- [Named Entity Recognition](https://github.com/PaddlePaddle/PALM/tree/master/examples/tagging)
- [SQuAD-like Machine Reading Comprehension](https://github.com/PaddlePaddle/PALM/tree/master/examples/mrc).


#### Multi-task Learning
To run with multi-task learning mode:

1. repeatedly create components (i.e., reader, backbone and head) for each task followed with step 1~5 above. 
2. create empty trainers (each trainer is corresponded to one task) and pass them to create a `MultiHeadTrainer`. 
3. build multi-task forward graph with `multi_head_trainer.build_forward` method.
4. use `paddlepalm.optimizer` (and `paddlepalm.lr_sched` if is necessary) to create a *optimizer*, then build backward through `multi_head_trainer.build_backward`.
5. fit all prepared readers and data to multi_head_trainer with `multi_head_trainer.fit_readers` method.
6. load pretrain model with `multi_head_trainer.load_pretrain`, or load checkpoint with `multi_head_trainer.load_ckpt` or nothing to do for training from scratch, then do training with `multi_head_trainer.train`.

The save/load and predict operations of a multi_head_trainer is the same as a trainer.

For more implementation details with `multi_head_trainer`, see

- [ATIS: joint training of dialogue intent recognition and slot filling](https://github.com/PaddlePaddle/PALM/tree/master/examples/multi-task)

#### Save models

To save models/checkpoints and logs during training, just call `trainer.set_saver` method. More implementation details see [this](https://github.com/PaddlePaddle/PALM/tree/master/examples).

#### Evaluation/Inference
To do predict/evaluation after a training stage, just create another three reader, backbone and head instance with `phase='predict'` (repeat step 1~4 above). Then do predicting with `predict` method in trainer (no need to create another trainer). More implementation details see [this](https://github.com/PaddlePaddle/PALM/tree/master/examples/predict).

If you want to do evaluation during training process, use `trainer.train_one_step()` instead of `trainer.train()`. The `trainer.train_one_step(batch)` achieves to train only one step, thus you can insert evaluation code into any point of training process. The argument `batch` can be fetched from `trainer.get_one_batch`.

PaddlePALM also supports multi-head inference, please reference `examples/multi-task/joint_predict.py`.

#### Play with Multiple GPUs
If there exists multiple GPUs in your environment, you can control the number and index of these GPUs through the environment variable [CUDA_VISIBLE_DEVICES](https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/). For example, if 4 GPUs in your enviroment, indexed with 0,1,2,3, you can run with GPU2 only with following commands

```shell
CUDA_VISIBLE_DEVICES=2 python run.py
```

Multiple GPUs should be seperated with `,`. For example, running with GPU2 and GPU3, following commands is refered:

```shell
CUDA_VISIBLE_DEVICES=2,3 python run.py
```

On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. Therefore, when running with multiple cards, **you need to ensure that the set batch_size can be divided by the number of cards.**

## License

This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE).


================================================
FILE: README_zh.md
================================================
# PaddlePALM

[English](./README.md) | 简体中文

PaddlePALM (PArallel Learning from Multi-tasks) 是一个灵活，通用且易于使用的NLP大规模预训练和多任务学习框架。 PALM是一个旨在**快速开发高性能NLP模型**的上层框架。

使用PaddlePALM，可以非常轻松灵活的探索具有多种任务辅助训练的“高鲁棒性”阅读理解模型，基于PALM训练的模型[D-Net](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/Research/MRQA2019-D-NET)在[EMNLP2019国际阅读理解评测](https://mrqa.github.io/)中夺得冠军。

<p align="center">
	<img src="https://tva1.sinaimg.cn/large/006tNbRwly1gbjkuuwrmlj30hs0hzdh2.jpg" alt="Sample"  width="300" height="333">
	<p align="center">
		<em>MRQA2019 排行榜</em>
	</p>
</p>

除了降低NLP研究成本以外，PaddlePALM已被应用于“百度搜索引擎”，有效地提高了用户查询的理解准确度和挖掘出的答案质量，具备高可靠性和高训练/推理性能。

#### 特点:

- **易于使用**：使用PALM， *8个步骤*即可实现一个典型的NLP任务。此外，模型主干网络、数据集读取工具和任务输出层已经解耦，只需对代码进行相当小的更改，就可以将任何组件替换为其他候选组件。
- **支持多任务学习**：*6个步骤*即可实现多任务学习任务。
- **支持大规模任务和预训练**：可自动利用多gpu加速训练和推理。集群上的分布式训练需要较少代码。
- **流行的NLP骨架和预训练模型**：内置多种最先进的通用模型架构和预训练模型(如BERT、ERNIE、RoBERTa等)。
- **易于定制**：支持任何组件的定制开发(例如：主干网络，任务头，读取工具和优化器)与预定义组件的复用，这给了开发人员高度的灵活性和效率，以适应不同的NLP场景。

你可以很容易地用较少的代码复现出很好的性能，涵盖了大多数NLP任务，如分类、匹配、序列标记、阅读理解、对话理解等等。更多细节可以在`examples`中找到。

<table>
  <tbody>
    <tr>
      <th><strong>数据集</strong>
        <br></th>
      <th colspan="2"><center><strong>chnsenticorp</strong></center></th>
      <th colspan="2"><center><strong>Quora Question Pairs matching</strong><center></th>
      <th colspan="1"><strong>MSRA-NER<br>(SIGHAN2006)</strong></th>
      <th colspan="2"><strong>CMRC2018</strong></th>
    </tr>
    <tr>
      <td rowspan="2">
        <p>
          <strong>评价标准</strong>
          <br></p>
      </td>
      <td colspan="1">
        <center><strong>accuracy</strong></center>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <strong></strong>
        <br></td>
      <td colspan="1">
        <center><strong>accuracy</strong></center>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <strong></strong>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <strong></strong>
        <br></td>
      <td colspan="1">
        <strong>em</strong>
        <br></td>
      <td colspan="1">
        <strong>f1-score</strong>
        <br></td>
    </tr>
    <tr>
      <td colspan="2" width="">
        <strong>test</strong>
        <br></td>
      <td colspan="2" width="">
        <strong>test</strong>
        <br></td>
      <td colspan="1" width="">
        <strong>test</strong>
        <br></td>
      <td colspan="2" width="">
        <strong>dev</strong>
        <br></td>
    </tr>
    <tr>
      <td><strong>ERNIE Base</strong></td>
      <td>95.8</td>
      <td>95.8</td>
      <td>86.2</td>
      <td>82.2</td>
      <td>99.2</td>
      <td>64.3</td>
      <td>85.2</td>
    </tr>

  </tbody>
</table>


## Package概览

<p align="center">
	<img src="https://tva1.sinaimg.cn/large/0082zybply1gbyo8d4ltoj31ag0n3tby.jpg" alt="Sample"  width="600px" height="auto">
	<p align="center">
		<em>PALM架构图</em>
	</p>
</p>


PaddlePALM是一个设计良好的高级NLP框架。基于PaddlePALM的轻量级代码可以高效实现**监督学习、非监督/自监督学习、多任务学习和迁移学习**。在PaddlePALM架构中有三层，从下到上依次是component层、trainer层、high-level trainer层。

在组件层，PaddlePALM提供了6个 **解耦的**组件来实现NLP任务。每个组件包含丰富的预定义类和一个基类。预定义类是针对典型的NLP任务的，而基类是帮助用户开发一个新类（基于预定义类或基类）。

训练器层是用选定的构件建立计算图，进行训练和预测。该层描述了训练策略、模型保存和加载、评估和预测过程。一个训练器只能处理一个任务。

高级训练器层用于复杂的学习和推理策略，如多任务学习。您可以添加辅助任务来训练健壮的NLP模型（提高模型的测试集和领域外的性能），或者联合训练多个相关任务来获得每个任务的更高性能。


| 模块 | 描述 | 
| - | - |
| **paddlepalm** | 基于PaddlePaddle框架的high-level NLP预训练和多任务学习框架。 |
| **paddlepalm.reader** | 预置的任务数据集读取与预处理工具。|
| **paddlepalm.backbone** | 预置的主干网络，如BERT, ERNIE, RoBERTa。|
| **paddlepalm.head** | 预置的任务输出层。|
| **paddlepalm.lr_sched** | 预置的学习率规划策略。|
| **paddlepalm.optimizer** | 预置的优化器。|
| **paddlepalm.downloader** | 预训练模型管理与下载模块。|
| **paddlepalm.Trainer** | 任务训练/预测单元。训练器用于建立计算图，管理训练和评估过程，实现模型/检查点保存和pretrain_model/检查点加载等。|
| **paddlepalm.MultiHeadTrainer** | 完成多任务训练/预测的模块。一个MultiHeadTrainer建立在几个Trainer的基础上。实现了模型主干网络跨任务复用、多任务学习、多任务推理等。|

## 安装

PaddlePALM 支持 python2 和 python3, linux 和 windows, CPU 和 GPU。安装PaddlePALM的首选方法是通过`pip`。只需运行以下命令：

```bash
pip install paddlepalm
```

### 通过源码安装

```shell
git clone https://github.com/PaddlePaddle/PALM.git
cd PALM && python setup.py install
```

### 库依赖
- Python >= 2.7
- cuda >= 9.0
- cudnn >= 7.0
- PaddlePaddle >= 1.7.0 (请参考[安装指南](http://www.paddlepaddle.org/#quick-start)进行安装)


### 下载预训练模型
我们提供了许多预训练的模型来初始化模型主干网络参数。用预先训练好的模型训练大的NLP模型，如12层Transformer，实际上比用随机初始化的参数更有效。要查看所有可用的预训练模型并下载，请在python解释器中运行以下代码(在shell中输入命令`python`):

```python
>>> from paddlepalm import downloader
>>> downloader.ls('pretrain')
Available pretrain items:
  => RoBERTa-zh-base
  => RoBERTa-zh-large
  => ERNIE-v2-en-base
  => ERNIE-v2-en-large
  => XLNet-cased-base
  => XLNet-cased-large
  => ERNIE-v1-zh-base
  => ERNIE-v1-zh-base-max-len-512
  => BERT-en-uncased-large-whole-word-masking
  => BERT-en-cased-large-whole-word-masking
  => BERT-en-uncased-base
  => BERT-en-uncased-large
  => BERT-en-cased-base
  => BERT-en-cased-large
  => BERT-multilingual-uncased-base
  => BERT-multilingual-cased-base
  => BERT-zh-base

>>> downloader.download('pretrain', 'BERT-en-uncased-base', './pretrain_models')
...
```


## 使用

#### 快速开始

8个步骤开始一个典型的NLP训练任务。

1. 使用`paddlepalm.reader` 为数据集加载和输入特征生成创建一个`reader`，然后调用`reader.load_data`方法加载训练数据。
2. 使用`paddlepalm.load_data`创建一个模型*主干网络*来提取文本特征(例如，上下文单词嵌入，句子嵌入)。
3. 通过`reader.register_with`将`reader`注册到主干网络上。在这一步之后，reader能够使用主干网络产生的输入特征。
4. 使用`paddlepalm.head`。创建一个任务*head*，可以为训练提供任务损失，为模型推理提供预测结果。
5. 使用`paddlepalm.Trainer`创建一个任务`Trainer`，然后通过`Trainer.build_forward`构建包含主干网络和任务头的前向图(在步骤2和步骤4中创建)。
6. 使用`paddlepalm.optimizer`（如果需要，创建`paddlepalm.lr_sched`）来创建一个*优化器*，然后通过`train.build_back`向后构建。
7. 使用`trainer.fit_reader`将准备好的reader和数据（在步骤1中实现）给到trainer。
8. 使用`trainer.load_pretrain`加载预训练模型或使用 `trainer.load_pretrain`加载checkpoint，或不加载任何已训练好的参数，然后使用`trainer.train`进行训练。

更多实现细节请见示例: 

- [情感分析](https://github.com/PaddlePaddle/PALM/tree/master/examples/classification)
- [Quora问题相似度匹配](https://github.com/PaddlePaddle/PALM/tree/master/examples/matching)
- [命名实体识别](https://github.com/PaddlePaddle/PALM/tree/master/examples/tagging)
- [类SQuAD机器阅读理解](https://github.com/PaddlePaddle/PALM/tree/master/examples/mrc)


#### 多任务学习

多任务学习模式下运行:

1. 重复创建组件（每个任务按照上述第1~5步执行）。
2. 创建空的`Trainer`(每个`Trainer`对应一个任务)，并通过它们创建一个`MultiHeadTrainer`。
3. 使用`multi_head_trainer.build_forward`构建多任务前向图。
4. 使用`paddlepalm.optimizer`（如果需要，创建`paddlepalm.lr_sched`）来创建一个*optimizer*，然后通过` multi_head_trainer.build_backward`创建反向。
5. 使用`multi_head_trainer.fit_readers`将所有准备好的读取器和数据放入`multi_head_trainer`中。
6. 使用`multi_head_trainer.load_pretrain`加载预训练模型或使用 `multi_head_trainer.load_pretrain`加载checkpoint，或不加载任何已经训练好的参数，然后使用`multi_head_trainer.train`进行训练。

multi_head_trainer的保存/加载和预测操作与trainer相同。


更多实现`multi_head_trainer`的细节，请见

- [ATIS: 对话意图识别和插槽填充的联合训练](https://github.com/PaddlePaddle/PALM/tree/master/examples/multi-task)

#### 设置saver

在训练时保存 models/checkpoints 和 logs，调用 `trainer.set_saver` 方法。更多实现细节见[这里](https://github.com/PaddlePaddle/PALM/tree/master/examples)。

#### 评估/预测
训练结束后进行预测和评价, 只需创建额外的reader, backbone和head（重复上面1~4步骤），注意创建时需设`phase='predict'`。 然后使用trainer的`predict`方法进行预测（不需创建额外的trainer）。更多实现细节请见[这里](https://github.com/PaddlePaddle/PALM/tree/master/examples/predict)。

#### 使用多GPU
如果您的环境中存在多个GPU，您可以通过环境变量控制这些GPU的数量和索引[CUDA_VISIBLE_DEVICES](https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/)。例如，如果您的环境中有4个gpu，索引为0、1、2、3，那么您可以运行以下命令来只使用GPU2：

```shell
CUDA_VISIBLE_DEVICES=2 python run.py
```

多GPU的使用需要 `,`作为分隔。例如，使用GPU2和GPU3，运行以下命令：

```shell
CUDA_VISIBLE_DEVICES=2,3 python run.py
```

在多GPU模式下，PaddlePALM会自动将每个batch数据分配到可用的GPU上。例如，如果`batch_size`设置为64，并且有4个GPU可以用于PaddlePALM，那么每个GPU中的batch_size实际上是64/4=16。因此，**当使用多个GPU时，您需要确保batch_size可以被暴露给PALM的GPU数量整除**。


## 许可证书

此向导由[PaddlePaddle](https://github.com/PaddlePaddle/Paddle)贡献，受[Apache-2.0 license](https://github.com/PaddlePaddle/models/blob/develop/LICENSE)许可认证。


================================================
FILE: customization_cn.md
================================================
# PALM组件定制化教程

PALM支持对如下组件自定义：

- **head**
  定义一个新的任务输出头，接收来自backbone和reader的输入，输出训练阶段的loss和预测阶段的预测结果。例如：分类任务头，序列标注任务头，机器阅读理解任务头等。
- **backbone**
  定义一个新的主干网络，接收来自reader的文本相关的序列特征输入（如token ids），输出文本的特征向量表示（如词向量、上下文相关的词向量表示、句子向量等）。例如：BERT encoder，CNN encoder等。
- **reader**
  定义一个新的数据集载入与预处理模块，接收来自原始数据集文件的输入（纯文本，原始标签等），输出文本相关的序列特征（如token ids，position ids等）。例如：文本分类数据集处理模块；文本匹配数据集处理模块等。
- **optimizer**
  定义一个新的优化器
- **lr_sched**
  定义一种新的学习率规划策略

PALM中的每个组件均使用类来描述，因此可以允许存在内部记忆（成员变量）。

新增某种类型的组件时，只需要实现该组件类型所在目录下的接口类中所描述的方法。若希望新增的组件跟框架的某个内置组件功能相似，那么实现新增组件时，可以继承自已有的内置组件，且仅对需要变动的方法进行修改即可。

### head自定义

head的接口类（Interface）位于`paddlepalm/head/base_head.py`。

该接口类定义如下：

```python
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import json
import copy

class Head(object):

    def __init__(self, phase='train'):
        """该函数完成一个任务头的构造，至少需要包含一个phase参数。
        注意：实现该构造函数时，必须保证对基类构造函数的调用，以创建必要的框架内建的成员变量。
        Args:
            phase: str类型。用于区分任务头被调用时所处的任务运行阶段，目前支持训练阶段train和预测阶段predict
            """
        self._stop_gradient = {}
        self._phase = phase
        self._prog = None
        self._results_buffer = []

    @property
    def inputs_attrs(self):
        """step级别的任务输入对象声明。

        描述该任务头所依赖的reader、backbone和来自其他任务头的输出对象（每个step获取一次）。使用字典进行描述，
        字典的key为输出对象所在的组件（如’reader‘，’backbone‘等），value为该组件下任务头所需要的输出对象集。
        输出对象集使用字典描述，key为输出对象的名字（该名字需保证在相关组件的输出对象集中），value为该输出对象
        的shape和dtype。当某个输出对象的某个维度长度可变时，shape中的相应维度设置为-1。
        Return:
            dict类型。描述该任务头所依赖的step级输入，即来自各个组件的输出对象。"""
        raise NotImplementedError()

    @property
    def outputs_attr(self):
        """step级别的任务输出对象声明。
        描述该任务头的输出对象（每个step输出一次），包括每个输出对象的名字，shape和dtype。输出对象会被加入到
        fetch_list中，从而在每个训练/推理step时得到实时的计算结果，该计算结果可以传入batch_postprocess方
        法中进行当前step的后处理。当某个对象为标量数据类型（如str, int, float等）时，shape设置为空列表[]，
        当某个对象的某个维度长度可变时，shape中的相应维度设置为-1。

        Return:
            dict类型。描述该任务头所产生的输出对象。注意，在训练阶段时必须包含名为loss的输出对象。
            """

        raise NotImplementedError()

    @property
    def epoch_inputs_attrs(self):
        """epoch级别的任务输入对象声明。
        描述该任务所依赖的来自reader、backbone和来自其他任务头的输出对象（每个epoch结束后产生一次），如完整的
        样本集，有效的样本数等。使用字典进行描述，字典的key为输出对象所在的组件（如’reader‘，’backbone‘等），
        value为该组件下任务头所需要的输出对象集。输出对象集使用字典描述，key为输出对象的名字（该名字需保证在相关
        组件的输出对象集中），value为该输出对象的shape和dtype。当某个输出对象的某个维度长度可变时，shape中的相
        应维度设置为-1。
        
        Return:
            dict类型。描述该任务头所产生的输出对象。注意，在训练阶段时必须包含名为loss的输出对象。
        """
        return {}

    def build(self, inputs, scope_name=""):
        """建立任务头的计算图。

        将符合inputs_attrs描述的来自各个对象集的静态图Variables映射成符合outputs_attr描述的静态图Variable输出。
        Args:
            inputs: dict类型。字典中包含inputs_attrs中的对象名到计算图Variable的映射，inputs中至少会包含inputs_attr中定义的对象
        Return:
           需要输出的计算图变量，输出对象会被加入到fetch_list中，从而在每个训练/推理step时得到runtime的计算结果，该计算结果会被传入postprocess方法中供用户处理。
        """
        raise NotImplementedError()

    def batch_postprocess(self, rt_outputs):
        """batch/step级别的后处理。

        每个训练或推理step后针对当前batch的任务头输出对象的实时计算结果来进行相关后处理。
        默认将输出结果存储到缓冲区self._results_buffer中。"""
        if isinstance(rt_outputs, dict):
            keys = rt_outputs.keys()
            vals = [rt_outputs[k] for k in keys]
            lens = [len(v) for v in vals]
            if len(set(lens)) == 1:
                results = [dict(zip(*[keys, i])) for i in zip(*vals)]
                self._results_buffer.extend(results)
                return results
            else:
                print('WARNING: irregular output results. visualize failed.')
                self._results_buffer.append(rt_outputs)
        return None

    def reset(self):
        """清空该任务头的缓冲区（在训练或推理过程中积累的处理结果）"""
        self._results_buffer = []

    def get_results(self):
        """返回当前任务头积累的处理结果。"""
        return copy.deepcopy(self._results_buffer)
        
    def epoch_postprocess(self, post_inputs=None, output_dir=None):
        """epoch级别的后处理。

        每个训练或推理epoch结束后，对积累的各样本的后处理结果results进行后处理。默认情况下，当output_dir为None时，直接将results打印到
        屏幕上。当指定output_dir时，将results存储在指定的文件夹内，并以任务头所处阶段来作为存储文件的文件名。

        Args:
            post_inputs: 当声明的epoch_inputs_attr不为空时，该参数会携带对应的输入变量的内容。
            output_dir: 积累结果的保存路径。
        """
        if output_dir is not None:
            for i in self._results_buffer:
                print(i)
        else:
            if not os.path.exists(output_dir):
                os.makedirs(output_dir)
            with open(os.path.join(output_dir, self._phase), 'w') as writer:
                for i in self._results_buffer:
                    writer.write(json.dumps(i)+'\n')
```


在基类的基础上，定义一个全新的Head时需要至少实现的方法有：

- \_\_init\_\_
- inputs_attrs
- outputs_attr
- build

可以重写的方法有：

- epoch_inputs_attrs
- batch_postprocess
- epoch_postprocess

### backbone自定义

backbone的接口类（Interface）位于`paddlepalm/backbone/base_backbone.py`。

该接口类定义如下：

```python
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


class Backbone(object):
    """interface of backbone model."""

    def __init__(self, phase):
        """该函数完成一个主干网络的构造，至少需要包含一个phase参数。
        注意：实现该构造函数时，必须保证对基类构造函数的调用，以创建必要的框架内建的成员变量。
        Args:
            phase: str类型。用于区分主干网络被调用时所处的运行阶段，目前支持训练阶段train和预测阶段predict
            """

        assert isinstance(config, dict)

    @property
    def inputs_attr(self):
        """描述backbone从reader处需要得到的输入对象的属性，包含各个对象的名字、shape以及数据类型。当某个对象
        为标量数据类型（如str, int, float等）时，shape设置为空列表[]，当某个对象的某个维度长度可变时，shape
        中的相应维度设置为-1。

        Return:
            dict类型。对各个输入对象的属性描述。例如，
            对于文本分类和匹配任务，bert backbone依赖的reader对象主要包含如下的对象
                {"token_ids": ([-1, max_len], 'int64'),
                 "input_ids": ([-1, max_len], 'int64'),
                 "segment_ids": ([-1, max_len], 'int64'),
                 "input_mask": ([-1, max_len], 'float32')}"""
        raise NotImplementedError()

    @property
    def outputs_attr(self):
        """描述backbone输出对象的属性，包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型（如
        str, int, float等）时，shape设置为空列表[]，当某个对象的某个维度长度可变时，shape中的相应维度设置为-1。
        
        Return:
            dict类型。对各个输出对象的属性描述。例如，
            对于文本分类和匹配任务，bert backbone的输出内容可能包含如下的对象
                {"word_emb": ([-1, max_seqlen, word_emb_size], 'float32'),
                 "sentence_emb": ([-1, hidden_size], 'float32'),
                 "sim_vec": ([-1, hidden_size], 'float32')}""" 
        raise NotImplementedError()

    def build(self, inputs):
        """建立backbone的计算图。将符合inputs_attr描述的静态图Variable输入映射成符合outputs_attr描述的静态图Variable输出。
        Args:
            inputs: dict类型。字典中包含inputs_attr中的对象名到计算图Variable的映射，inputs中至少会包含inputs_attr中定义的对象
        Return:
           需要输出的计算图变量，输出对象会被加入到fetch_list中，从而在每个训练/推理step时得到runtime的计算结果，该计算结果会被传入postprocess方法中供用户处理。
            """
        raise NotImplementedError()
```


在基类的基础上，定义一个全新的Backbone时需要至少实现的方法有：

- \_\_init\_\_
- input_attrs
- output_attr
- build

### reader自定义

reader的接口类（Interface）位于`paddlepalm/reader/base_reader.py`。

该接口类定义如下：

```python
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from copy import copy

class Reader(object):
    """interface of data reader."""

    def __init__(self, phase='train'):
        """该函数完成一个Reader的构造，至少需要包含一个phase参数。
        注意：实现该构造函数时，必须保证对基类构造函数的调用，以创建必要的框架内建的成员变量。
        Args:
            phase: str类型。用于区分主干网络被调用时所处的运行阶段，目前支持训练阶段train和预测阶段predict
            """
        
        self._phase = phase
        self._batch_size = None
        self._num_epochs = 1
        self._register = set()
        self._registered_backbone = None

    @classmethod
    def create_register(self):
        return set()
        
    def clone(self, phase='train'):
        """拷贝一个新的reader对象。"""
        if phase == self._phase:
            return copy(self)
        else:
            ret = copy(self)
            ret._phase = phase
            return ret

    def require_attr(self, attr_name):
        """在注册器中新增一个需要产生的对象。

        Args:
            attr_name: 需要产出的对象的对象名，例如’segment_ids‘。
            """
        self._register.add(attr_name)
            
    def register_with(self, backbone):
        """根据backbone对输入对象的依赖，在注册器中对每个依赖的输入对象进行注册。

        Args:
            backbone: 需要对接的主干网络。
        """
        for attr in backbone.inputs_attr:
            self.require_attr(attr)
        self._registered_backbone = backbone

    def get_registered_backbone(self):
        """返回该reader所注册的backbone。"""
        return self._registered_backbone

    def _get_registed_attrs(self, attrs):
        ret = {}
        for i in self._register:
            if i not in attrs:
                raise NotImplementedError('output attr {} is not found in this reader.'.format(i))
            ret[i] = attrs[i]
        return ret

    def load_data(self, input_file, batch_size, num_epochs=None, \
                  file_format='tsv', shuffle_train=True):
        """将磁盘上的数据载入到reader中。

        注意：实现该方法时需要同步创建self._batch_size和self._num_epochs。

        Args:
            input_file: 数据集文件路径。文件格式需要满足`file_format`参数的要求。
            batch_size: 迭代器每次yield出的样本数量。注意：当环境中存在多个GPU时，batch_size需要保证被GPU卡数整除。
            num_epochs: 数据集遍历次数。默认为None, 在单任务模式下代表遍历一次，在多任务模式下该参数会被上层的Trainer进行自动赋值。该参数仅对训练阶段有效。
            file_format: 输入文件的文件格式。目前支持的格式: tsv. 默认为tsv.
            shuffle_train: 是否打乱训练集中的样本。默认为True。该参数仅对训练阶段有效。
        """
        raise NotImplementedError()

    @property
    def outputs_attr(self):
        """描述reader输出对象（被yield出的对象）的属性，包含各个对象的名字、shape以及数据类型。当某个对象为标量数据
        类型（如str, int, float等）时，shape设置为空列表[]，当某个对象的某个维度长度可变时，shape中的相应维度设置为-1。
        注意：当使用mini-batch梯度下降学习策略时，，应为常规的输入对象设置batch_size维度（一般为-1）
        Return:
            dict类型。对各个输入对象的属性描述。例如，
            对于文本分类和匹配任务，yield的输出内容可能包含如下的对象（下游backbone和task可按需访问其中的对象）
                {"token_ids": ([-1, max_len], 'int64'),
                 "input_ids": ([-1, max_len], 'int64'),
                 "segment_ids": ([-1, max_len], 'int64'),
                 "input_mask": ([-1, max_len], 'float32'),
                 "label": ([-1], 'int')}
        """
        raise NotImplementedError()
    
    def _iterator(self):
        """数据集遍历接口，注意，当数据集遍历到尾部时该接口应自动完成指针重置，即重新从数据集头部开始新的遍历。
        Yield:
            dict类型。符合outputs_attr描述的当前step的输出对象。
        """
        raise NotImplementedError()

    def get_epoch_outputs(self):
        """返回数据集每个epoch遍历后的输出对象。"""
        raise NotImplementedError()

    @property
    def num_examples(self):
        """数据集中的样本数量，即每个epoch中iterator所生成的样本数。注意，使用滑动窗口等可能导致数据集样本数发生变化的策略时
        该接口应返回runtime阶段的实际样本数。"""
        raise NotImplementedError()

    @property
    def num_epochs(self):
        """数据集遍历次数"""
        return self._num_epochs
```


在基类的基础上，定义一个全新的Reader时需要至少实现的方法有：

- \_\_init\_\_
- outputs_attr
- load_data
- _iterator
- num_examples

可以重写的方法有：

- get_epoch_outputs


================================================
FILE: examples/classification/README.md
================================================
## Example 1: Classification
This task is a sentiment analysis task. The following sections detail model preparation, dataset preparation, and how to run the task.

### Step 1: Prepare Pre-trained Model & Dataset

#### Pre-trained Model

The pre-training model of this mission is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-training model in the current folder.


#### Dataset

This example demonstrates with [ChnSentiCorp](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/ChnSentiCorp_htl_all), a Chinese sentiment analysis dataset.

Download dataset:
```shell
python download.py
```

If everything goes well, there will be a folder named `data/`  created with all the data files in it.

The dataset file (for training) should have 2 fields,  `text_a` and `label`, stored with [tsv](https://en.wikipedia.org/wiki/Tab-separated_values) format. Here shows an example:

```
label  text_a
0   当当网名不符实，订货多日不见送货，询问客服只会推托，只会要求用户再下订单。如此服务留不住顾客的。去别的网站买书服务更好。
0   XP的驱动不好找！我的17号提的货，现在就降价了100元，而且还送杀毒软件！
1   <荐书> 推荐所有喜欢<红楼>的红迷们一定要收藏这本书,要知道当年我听说这本书的时候花很长时间去图书馆找和借都没能如愿,所以这次一看到当当有,马上买了,红迷们也要记得备货哦!
```

### Step 2: Train & Predict

The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for training, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**


Some logs will be shown below:

```
step 1/154 (epoch 0), loss: 5.512, speed: 0.51 steps/s
step 2/154 (epoch 0), loss: 2.595, speed: 3.36 steps/s
step 3/154 (epoch 0), loss: 1.798, speed: 3.48 steps/s
```


After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], "probs": [0.29290086030960083, 0.7070990800857544], "label": 1}
{"index": 1, "logits": [0.8593899011611938, -0.29743513464927673], "probs": [0.7607553601264954, 0.23924466967582703], "label": 0}
{"index": 2, "logits": [0.7462944388389587, -0.7083730101585388], "probs": [0.8107157349586487, 0.18928426504135132], "label": 0}
```

### Step 3: Evaluate

Once you have the prediction, you can run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data num: 1200
accuracy: 0.9575, precision: 0.9634, recall: 0.9523, f1: 0.9578
```


================================================
FILE: examples/classification/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import tarfile
import shutil
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)

abs_path = os.path.abspath(__file__)
download_url = "https://ernie.bj.bcebos.com/task_data_zh.tgz"
downlaod_path = os.path.join(os.path.dirname(abs_path), "task_data_zh.tgz")
target_dir = os.path.dirname(abs_path)
download(downlaod_path, download_url)

tar = tarfile.open(downlaod_path)
tar.extractall(target_dir)
os.remove(downlaod_path)

abs_path = os.path.abspath(__file__)
dst_dir = os.path.join(os.path.dirname(abs_path), "data")
if not os.path.exists(dst_dir) or not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

for file in os.listdir(os.path.join(target_dir, 'task_data', 'chnsenticorp')):
    shutil.move(os.path.join(target_dir, 'task_data', 'chnsenticorp', file), dst_dir)

shutil.rmtree(os.path.join(target_dir, 'task_data'))
print(" done!")


================================================
FILE: examples/classification/evaluate.py
================================================
#  -*- coding: utf-8 -*-

import json
import numpy as np

def accuracy(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels) 
    return (preds == labels).mean()

def pre_recall_f1(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # recall=TP/(TP+FN)
    tp = np.sum((labels == '1') & (preds == '1'))
    fp = np.sum((labels == '0') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    r = tp * 1.0 / (tp + fn)
    # Precision=TP/(TP+FP)
    p = tp * 1.0 / (tp + fp)
    epsilon = 1e-31
    f1 = 2 * p * r / (p+r+epsilon)
    return p, r, f1


def res_evaluate(res_dir="./outputs/predict/predictions.json", eval_phase='test'):
    if eval_phase == 'test':
        data_dir="./data/test.tsv"
    elif eval_phase == 'dev':
        data_dir="./data/dev.tsv"
    else:
        assert eval_phase in ['dev', 'test'], 'eval_phase should be dev or test'
    
    labels = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            line = line.split("\t")
            label = line[0]
            if label=='label':
                continue
            labels.append(str(label))
    file.close()

    preds = []
    with open(res_dir, "r") as file:
        for line in file.readlines():
            line = json.loads(line)
            pred = line['label']
            preds.append(str(pred))
    file.close()
    assert len(labels) == len(preds), "prediction result doesn't match to labels"
    print('data num: {}'.format(len(labels)))
    p, r, f1 = pre_recall_f1(preds, labels)
    print("accuracy: {:.4f}, precision: {:.4f}, recall: {:.4f}, f1: {:.4f}".format(accuracy(preds, labels), p, r, f1))

res_evaluate()


================================================
FILE: examples/classification/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 8
    num_epochs = 10
    lr = 5e-5
    weight_decay = 0.01
    vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'

    train_file = './data/train.tsv'
    predict_file = './data/test.tsv'
    config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
    input_dim = config['hidden_size']
    num_classes = 2
    dropout_prob = 0.1
    random_seed = 1
    task_name = 'chnsenticorp'
    save_path = './outputs/'
    pred_output = './outputs/predict/'
    save_type = 'ckpt'
    print_steps = 20
    pre_params = './pretrain/ERNIE-v1-zh-base/params'

    # -----------------------  for training ----------------------- 

    # step 1-1: create readers for training
    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed)
    # step 1-2: load the training data
    cls_reader.load_data(train_file, batch_size, num_epochs=num_epochs)

    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register the backbone in reader
    cls_reader.register_with(ernie)

    # step 4: create the task output head
    cls_head = palm.head.Classify(num_classes, input_dim, dropout_prob)

    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
    loss_var = trainer.build_forward(ernie, cls_head)

    # step 6-1*: use warmup
    n_steps = cls_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    # step 6-2: create a optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
  
    # step 7: fit prepared reader and data
    trainer.fit_reader(cls_reader)
    
    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
    # save_steps = n_steps 
    save_steps = 2396
    trainer.set_saver(save_steps=save_steps, save_path=save_path, save_type=save_type)
    # step 8-3: start training
    trainer.train(print_steps=print_steps)
   
    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')
    # step 1-2: load the training data
    predict_cls_reader.load_data(predict_file, batch_size)
    
    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_cls_reader.register_with(pred_ernie)
    
    # step 4: create the task output head
    cls_pred_head = palm.head.Classify(num_classes, input_dim, phase='predict')
    
    # step 5: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, cls_pred_head)
 
    # step 6: load checkpoint
    # model_path = './outputs/ckpt.step'+str(save_steps)
    model_path = './outputs/ckpt.step'+str(11980)
    trainer.load_ckpt(model_path)

    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_cls_reader, phase='predict')

    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=print_steps, output_dir=pred_output)


================================================
FILE: examples/matching/README.md
================================================
## Example 2: Matching
This task is a sentence pair matching task. The following sections detail model preparation, dataset preparation, and how to run the task with PaddlePALM.

### Step 1: Prepare Pre-trained Models & Datasets

#### Download Pre-trained Model

The pre-training model of this mission is: [ERNIE-v2-en-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-training model in the current folder.


#### Dataset

Here takes the [Quora Question Pairs](https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs) dataset as the testbed for matching.

Download dataset:
```shell
python download.py
```

After the dataset is downloaded, you should convert the data format for training:
```shell
python process.py data/quora_duplicate_questions.tsv data/train.tsv data/test.tsv
```

If everything goes well, there will be a folder named `data/`  created with all the converted datas in it.

The dataset file (for training) should have 3 fields,  `text_a`, `text_b` and `label`, stored with [tsv](https://en.wikipedia.org/wiki/Tab-separated_values) format. Here shows an example:

```
text_a  text_b  label
How can the arrangement of corynebacterium xerosis be described?  How would you describe waves? 0
How do you fix a Google Play Store account that isn't working?  What can cause the Google Play store to not open? How are such probelms fixed?  1
Which is the best earphone under 1000?  What are the best earphones under 1k? 1
What are the differences between the Dell Inspiron 3000, 5000, and 7000 series laptops? "Should I buy an Apple MacBook Pro 15"" or a Dell Inspiron 17 5000 series?" 0
```


### Step 2: Train & Predict

The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for training, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**

Some logs will be shown below:

```
step 20/49087 (epoch 0), loss: 1.079, speed: 3.48 steps/s
step 40/49087 (epoch 0), loss: 1.251, speed: 5.18 steps/s
step 60/49087 (epoch 0), loss: 1.193, speed: 5.04 steps/s
```


After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
{"index": 0, "logits": [-0.32688724994659424, -0.8568955063819885], "probs": [0.629485011100769, 0.3705149292945862], "label": 0}
{"index": 1, "logits": [-0.2735646963119507, -0.7983021140098572], "probs": [0.6282548904418945, 0.37174513936042786], "label": 0}
{"index": 2, "logits": [-0.3381381630897522, -0.8614270091056824], "probs": [0.6279165148735046, 0.37208351492881775], "label": 0}
```

### Step 3: Evaluate

Once you have the prediction, you can run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data num: 4300
accuracy: 0.8619, precision: 0.8061, recall: 0.8377, f1: 0.8216
```


================================================
FILE: examples/matching/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)


abs_path = os.path.abspath(__file__)
data_dir = os.path.join(os.path.dirname(abs_path), "data")
if not os.path.exists(data_dir) or not os.path.isdir(data_dir):
    os.makedirs(data_dir)

download_url = "http://qim.fs.quoracdn.net/quora_duplicate_questions.tsv"
downlaod_path = os.path.join(data_dir, "quora_duplicate_questions.tsv")
download(downlaod_path, download_url)
print(" done!")


================================================
FILE: examples/matching/evaluate.py
================================================
#  -*- coding: utf-8 -*-

import json
import numpy as np

def accuracy(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels) 
    return (preds == labels).mean()

def pre_recall_f1(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # recall=TP/(TP+FN)
    tp = np.sum((labels == '1') & (preds == '1'))
    fp = np.sum((labels == '0') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    r = tp * 1.0 / (tp + fn)
    # Precision=TP/(TP+FP)
    p = tp * 1.0 / (tp + fp)
    epsilon = 1e-31
    f1 = 2 * p * r / (p+r+epsilon)
    return p, r, f1


def res_evaluate(res_dir="./outputs/predict/predictions.json", eval_phase='test'):
    if eval_phase == 'test':
        data_dir="./data/test.tsv"
    elif eval_phase == 'dev':
        data_dir="./data/dev.tsv"
    else:
        assert eval_phase in ['dev', 'test'], 'eval_phase should be dev or test'
    
    labels = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            line = line.split("\t")
            label = line[2][:-1]
            if label=='label':
                continue
            labels.append(str(label))
    file.close()

    preds = []
    with open(res_dir, "r") as file:
        for line in file.readlines():
            line = json.loads(line)
            pred = line['label']
            preds.append(str(pred))
    file.close()
    assert len(labels) == len(preds), "prediction result({}) doesn't match to labels({})".format(len(preds),len(labels))
    print('data num: {}'.format(len(labels)))
    p, r, f1 = pre_recall_f1(preds, labels)
    print("accuracy: {:.4f}, precision: {:.4f}, recall: {:.4f}, f1: {:.4f}".format(accuracy(preds, labels), p, r, f1))

res_evaluate()


================================================
FILE: examples/matching/process.py
================================================
#  -*- coding: utf-8 -*-

import sys
import os

if len(sys.argv) != 4:
    exit(0)

data_dir = sys.argv[1]
if not os.path.exists(data_dir):
    print("%s not exists" % data_dir)
    exit(0)

train_dir = sys.argv[2]
train_file = open(train_dir, "w")
train_file.write("text_a\ttext_b\tlabel\n")

test_dir = sys.argv[3]
test_file = open(test_dir, "w")
test_file.write("text_a\ttext_b\tlabel\n")
with open(data_dir, "r") as file:
    before = ""
    cnt = 0
    for line in file:
        line = line.strip("\n")
        line_t = line.split("\t")
        flag = 0
        if len(line_t) < 6:
            if flag: 
                flag = 0
                out_line = "{}{}\n".format(out_line, line)
            else:
                flag = 1
                outline = "{}".format(line)
            continue
        else:
            out_line = "{}\t{}\t{}\n".format(line_t[3], line_t[4], line_t[5])
        cnt += 1

        if 2 <= cnt <= 4301:
            test_file.write(out_line)
        if 4301 <= cnt <= 104301:
            train_file.write(out_line)

train_file.close()
test_file.close()


================================================
FILE: examples/matching/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json

if __name__ == '__main__':

    # configs 
    max_seqlen = 128
    batch_size = 16 
    num_epochs = 3
    lr = 3e-5
    weight_decay = 0.0
    num_classes = 2
    random_seed = 1
    dropout_prob = 0.1
    save_path = './outputs/'
    save_type = 'ckpt'
    pred_model_path = './outputs/ckpt.step'+str(18732)
    print_steps = 50
    pred_output = './outputs/predict/'
    pre_params = './pretrain/ERNIE-v2-en-base/params'
    task_name = 'Quora Question Pairs matching'

    vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
    train_file = './data/train.tsv'
    predict_file = './data/test.tsv'
    config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # -----------------------  for training ----------------------- 

    # step 1-1: create readers for training
    match_reader = palm.reader.MatchReader(vocab_path, max_seqlen, seed=random_seed)
    # step 1-2: load the training data
    match_reader.load_data(train_file, file_format='tsv', num_epochs=num_epochs, batch_size=batch_size)
    
    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register the backbone in reader
    match_reader.register_with(ernie)
    
    # step 4: create the task output head
    match_head = palm.head.Match(num_classes, input_dim, dropout_prob)
 
    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
    loss_var = trainer.build_forward(ernie, match_head)
    
    # step 6-1*: use warmup
    n_steps = match_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    print('total_steps: {}'.format(n_steps))
    print('warmup_steps: {}'.format(warmup_steps))
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)

    # step 6-2: create a optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
    
    # step 7: fit prepared reader and data
    trainer.fit_reader(match_reader)

    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params, False)
    # step 8-2*: set saver to save model
    # save_steps = n_steps-16
    save_steps = 6244
    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
    # step 8-3: start training
    trainer.train(print_steps=print_steps)
     
    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_match_reader = palm.reader.MatchReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')
    # step 1-2: load the training data
    predict_match_reader.load_data(predict_file, batch_size)

    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_match_reader.register_with(pred_ernie)
    
    # step 4: create the task output head
    match_pred_head = palm.head.Match(num_classes, input_dim, phase='predict')

    # step 5: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, match_pred_head)

    # step 6: load checkpoint
    trainer.load_ckpt(pred_model_path)

    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_match_reader, phase='predict')
    
    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=print_steps, output_dir=pred_output)


================================================
FILE: examples/mrc/README.md
================================================
## Example 4: Machine Reading Comprehension
This task is a machine reading comprehension task. The following sections detail model preparation, dataset preparation, and how to run the task.

### Step 1: Prepare Pre-trained Models & Datasets

#### Pre-trianed Model

The pre-training model of this mission is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-training model in the current folder.


#### Dataset

This task uses the `CMRC2018` dataset. `CMRC2018` is an evaluation conducted by Chinese information society. The task of evaluation is to extract reading comprehension.

Download dataset:
```shell
python download.py
```

If everything goes well, there will be a folder named `data/`  created with all the datas in it.

Here is some example datas:

 ```json
"paragraphs": [
         {
           "id": "TRAIN_36",
           "context": "NGC 6231是一个位于天蝎座的疏散星团，天球座标为赤经16时54分，赤纬-41度48分，视觉观测大小约45角分，亮度约2.6视星等，距地球5900光年。NGC 6231年龄约为三百二十万年，是一个非常年轻的星团，星团内的最亮星是5等的天蝎座 ζ1星。用双筒望远镜或小型望远镜就能看到个别的行星。NGC 6231在1654年被意大利天文学家乔瓦尼·巴蒂斯特·霍迪尔纳（Giovanni Battista Hodierna）以Luminosae的名字首次纪录在星表中，但是未见记载于夏尔·梅西耶的天体列表和威廉·赫歇尔的深空天体目录。这个天体在1678年被爱德蒙·哈雷（I.7）、1745年被夏西亚科斯（Jean-Phillippe Loys de Cheseaux）（9）、1751年被尼可拉·路易·拉卡伊（II.13）分别再次独立发现。",
           "qas": [
             {
               "question": "NGC 6231的经纬度是多少？",
               "id": "TRAIN_36_QUERY_0",
               "answers": [
                 {
                   "text": "赤经16时54分，赤纬-41度48分",
                   "answer_start": 27
                 }
               ]
             }
         }
 ```


### Step 2: Train & Predict

The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for training, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**

Some logs will be shown below:

```
step 1/1515 (epoch 0), loss: 6.251, speed: 0.31 steps/s
step 2/1515 (epoch 0), loss: 6.206, speed: 0.80 steps/s
step 3/1515 (epoch 0), loss: 6.172, speed: 0.86 steps/s
```


After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```json
{
    "DEV_0_QUERY_0": "光 荣 和 ω-force 开 发", 
    "DEV_0_QUERY_1": "任 天 堂 游 戏 谜 之 村 雨 城", 
    "DEV_0_QUERY_2": "战 史 演 武 」&「 争 霸 演 武 」。", 
    "DEV_1_QUERY_0": "大 陆 传 统 器 乐 及 戏 曲 里 面 常 用 的 打 击 乐 记 谱 方 法 ， 以 中 文 字 的 声 音 模 拟 敲 击 乐 的 声 音 ， 纪 录 打 击 乐 的 各 种 不 同 的 演 奏 方 法 。", 
    "DEV_1_QUERY_1": "「 锣 鼓 点", 
    "DEV_1_QUERY_2": "锣 鼓 的 运 用 有 约 定 俗 成 的 程 式 ， 依 照 角 色 行 当 的 身 份 、 性 格 、 情 绪 以 及 环 境 ， 配 合 相 应 的 锣 鼓 点", 
    "DEV_1_QUERY_3": "鼓 、 锣 、 钹 和 板 四 类 型", 
    "DEV_2_QUERY_0": "364.6 公 里", 
}
```

### Step 3: Evaluate

#### Library Dependencies
Before the evaluation, you need to install `nltk` and download the `punkt` tokenizer for nltk:

```shell
pip insall nltk
python -m nltk.downloader punkt
```

#### Evaluate
You can run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data_num: 3219
em_sroce: 0.6434, f1: 0.8518
```


================================================
FILE: examples/mrc/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import tarfile
import shutil
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)

abs_path = os.path.abspath(__file__)
download_url = "https://ernie.bj.bcebos.com/task_data_zh.tgz"
downlaod_path = os.path.join(os.path.dirname(abs_path), "task_data_zh.tgz")
target_dir = os.path.dirname(abs_path)
download(downlaod_path, download_url)

tar = tarfile.open(downlaod_path)
tar.extractall(target_dir)
os.remove(downlaod_path)

abs_path = os.path.abspath(__file__)
dst_dir = os.path.join(os.path.dirname(abs_path), "data")
if not os.path.exists(dst_dir) or not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

for file in os.listdir(os.path.join(target_dir, 'task_data', 'cmrc2018')):
    shutil.move(os.path.join(target_dir, 'task_data', 'cmrc2018', file), dst_dir)

shutil.rmtree(os.path.join(target_dir, 'task_data'))
print(" done!")


================================================
FILE: examples/mrc/evaluate.py
================================================
# -*- coding: utf-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
'''
Evaluation script for CMRC 2018
version: v5
Note:
v5 formatted output, add usage description
v4 fixed segmentation issues
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import

from collections import Counter, OrderedDict
import string
import re
import argparse
import json
import sys
import nltk
import pdb


# split Chinese with English
def mixed_segmentation(in_str, rm_punc=False):
    in_str = in_str.lower().strip()
    segs_out = []
    temp_str = ""
    sp_char = [
        '-', ':', '_', '*', '^', '/', '\\', '~', '`', '+', '=', '，', '。', '：',
        '？', '！', '“', '”', '；', '’', '《', '》', '……', '·', '、', '「', '」', '（',
        '）', '－', '～', '『', '』',' '
    ]
    for char in in_str:
        if rm_punc and char in sp_char:
            continue
        if re.search(r'[\u4e00-\u9fa5]', char) or char in sp_char:
            if temp_str != "":
                ss = nltk.word_tokenize(temp_str)
                segs_out.extend(ss)
                temp_str = ""
            segs_out.append(char)
        else:
            temp_str += char

    #handling last part
    if temp_str != "":
        ss = nltk.word_tokenize(temp_str)
        segs_out.extend(ss)

    return segs_out


# remove punctuation
def remove_punctuation(in_str):
    in_str = in_str.lower().strip()
    sp_char = [
        '-', ':', '_', '*', '^', '/', '\\', '~', '`', '+', '=', '，', '。', '：',
        '？', '！', '“', '”', '；', '’', '《', '》', '……', '·', '、', '「', '」', '（',
        '）', '－', '～', '『', '』', ' '
    ]
    out_segs = []
    for char in in_str:
        if char in sp_char:
            continue
        else:
            out_segs.append(char)
    return ''.join(out_segs)


# find longest common string
def find_lcs(s1, s2):
    m = [[0 for i in range(len(s2) + 1)] for j in range(len(s1) + 1)]
    mmax = 0
    p = 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            if s1[i] == s2[j]:
                m[i + 1][j + 1] = m[i][j] + 1
                if m[i + 1][j + 1] > mmax:
                    mmax = m[i + 1][j + 1]
                    p = i + 1
    return s1[p - mmax:p], mmax


def evaluate(ground_truth_file, prediction_file):
    f1 = 0
    em = 0
    total_count = 0
    skip_count = 0
    for instances in ground_truth_file["data"]:
        for instance in instances["paragraphs"]:
            context_text = instance['context'].strip()
            for qas in instance['qas']:
                total_count += 1
                query_id = qas['id'].strip()
                query_text = qas['question'].strip()
                answers = [ans["text"] for ans in qas["answers"]]

                if query_id not in prediction_file:
                    print('Unanswered question: {}\n'.format(
                        query_id))
                    skip_count += 1
                    continue

                prediction = prediction_file[query_id]
                f1 += calc_f1_score(answers, prediction)
                em += calc_em_score(answers, prediction)

    f1_score = f1 / total_count
    em_score = em / total_count
    return f1_score, em_score, total_count, skip_count


def calc_f1_score(answers, prediction):
    f1_scores = []
    for ans in answers:
        ans_segs = mixed_segmentation(ans, rm_punc=True)
        prediction_segs = mixed_segmentation(prediction, rm_punc=True)
        lcs, lcs_len = find_lcs(ans_segs, prediction_segs)
        if lcs_len == 0:
            f1_scores.append(0)
            continue
        precision = 1.0 * lcs_len / len(prediction_segs)
        recall = 1.0 * lcs_len / len(ans_segs)
        f1 = (2 * precision * recall) / (precision + recall)
        f1_scores.append(f1)
    return max(f1_scores)


def calc_em_score(answers, prediction):
    em = 0
    for ans in answers:
        ans_ = remove_punctuation(ans)
        prediction_ = remove_punctuation(prediction)
        if ans_ == prediction_:
            em = 1
            break
    return em


def eval_file(dataset_file, prediction_file):
    ground_truth_file = json.load(open(dataset_file, 'r'))
    prediction_file = json.load(open(prediction_file, 'r'))
    F1, EM, TOTAL, SKIP = evaluate(ground_truth_file, prediction_file)
    AVG = (EM + F1) * 0.5
    return EM, F1, AVG, TOTAL


if __name__ == '__main__':
    EM, F1, AVG, TOTAL = eval_file("data/dev.json", "outputs/predict/predictions.json")
    print('data_num: {}'.format(TOTAL))
    print('em_sroce: {:.4f}, f1: {:.4f}'.format(EM,F1))


================================================
FILE: examples/mrc/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json


if __name__ == '__main__':

    # configs
    max_seqlen = 512
    batch_size = 8   
    num_epochs = 2
    lr = 3e-5
    doc_stride = 128
    max_query_len = 64
    max_ans_len = 128
    weight_decay = 0.01
    print_steps = 20
    vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
    do_lower_case = True

    train_file = './data/train.json'
    predict_file = './data/dev.json'
    save_path = './outputs/'
    pred_output = './outputs/predict/'
    save_type = 'ckpt'
    task_name = 'cmrc2018'
    pre_params = './pretrain/ERNIE-v1-zh-base/params'
    config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))

    # -----------------------  for training ----------------------- 

    # step 1-1: create readers for training
    mrc_reader = palm.reader.MRCReader(vocab_path, max_seqlen, max_query_len, doc_stride, do_lower_case=do_lower_case)
    # step 1-2: load the training data
    mrc_reader.load_data(train_file, file_format='json', num_epochs=num_epochs, batch_size=batch_size)

    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register the backbone in reader
    mrc_reader.register_with(ernie)

    # step 4: create the task output head
    mrc_head = palm.head.MRC(max_query_len, config['hidden_size'], do_lower_case=do_lower_case, max_ans_len=max_ans_len)
 
    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
    loss_var = trainer.build_forward(ernie, mrc_head)
    
    # step 6-1*: use warmup
    n_steps = mrc_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    # step 6-2: create a optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)

    # step 7: fit prepared reader and data
    trainer.fit_reader(mrc_reader)
 
    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
    save_steps = 3040
    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
    # step 8-3: start training
    trainer.train(print_steps=print_steps)
   
    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    predict_mrc_reader = palm.reader.MRCReader(vocab_path, max_seqlen, max_query_len, doc_stride, do_lower_case=do_lower_case, phase='predict')
    # step 1-2: load the training data
    predict_mrc_reader.load_data(predict_file, batch_size)

    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_mrc_reader.register_with(pred_ernie)

    # step 4: create the task output head
    mrc_pred_head = palm.head.MRC(max_query_len, config['hidden_size'], do_lower_case=do_lower_case, max_ans_len=max_ans_len, phase='predict')
    
    # step 5: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, mrc_pred_head)

    # step 6: load checkpoint
    pred_model_path =  './outputs/ckpt.step'+str(3040)
    trainer.load_ckpt(pred_model_path)
    
    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_mrc_reader, phase='predict')

    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=print_steps, output_dir="outputs/predict")


================================================
FILE: examples/multi-task/README.md
================================================
## Example 6: Joint Training of Dialogue Intent Recognition and Slot Filling
This example achieves the joint training ofg Dialogue Intent Recognition and Slot Filling. The intent recognition can be regared as a text classification task, and slot filling as sequence labeling task. Both classification and sequence labeling have been built-in in PaddlePALM.

### Step 1: Prepare Pre-trained Models & Datasets

#### Pre-trained Model

We prepare [ERNIE-v2-en-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api) as our pre-trained model for this example.

Make sure you have downloaded `ERNIE` to current folder.

#### Dataset

Here we use `Airline Travel Information System` dataset as our testbed. 

Download dataset:
```shell
python download.py
```

After the dataset is downloaded, you should convert the data format for training:
```shell
python process.py
```

If everything goes well, there will be a folder named `data/atis/`  created with all the datas in it.

Here is some example datas:

`data/atis/atis_slot/train.tsv` :
```
text_a	label
i want to fly from boston at 838 am and arrive in denver at 1110 in the morning 	O O O O O B-fromloc.city_name O B-depart_time.time I-depart_time.time O O O B-toloc.city_name O B-arrive_time.time O O B-arrive_time.period_of_day 
what flights are available from pittsburgh to baltimore on thursday morning 	O O O O O B-fromloc.city_name O B-toloc.city_name O B-depart_date.day_name B-depart_time.period_of_day 
what is the arrival time in san francisco for the 755 am flight leaving washington 	O O O B-flight_time I-flight_time O B-fromloc.city_name I-fromloc.city_name O O B-depart_time.time I-depart_time.time O O B-fromloc.city_name 
cheapest airfare from tacoma to orlando 	B-cost_relative O O B-fromloc.city_name O B-toloc.city_name 
```

`data/atis/atis_intent/train.tsv` :
```
label	text_a
0	i want to fly from boston at 838 am and arrive in denver at 1110 in the morning
0	what flights are available from pittsburgh to baltimore on thursday morning
1	what is the arrival time in san francisco for the 755 am flight leaving washington
2	cheapest airfare from tacoma to orlando
```

### Step 2: Train & Predict

The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for training, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**

Some logs will be shown below:

```
global step: 5,   slot: step 3/309 (epoch 0), loss: 68.965, speed: 0.58 steps/s
global step: 10, intent: step 3/311 (epoch 0), loss: 3.407, speed: 8.76 steps/s
global step: 15,   slot: step 12/309 (epoch 0), loss: 54.611, speed: 1.21 steps/s
global step: 20, intent: step 7/311 (epoch 0), loss: 3.487, speed: 10.28 steps/s
```


After the run, you can view the saved models in the `outputs/` folder.


If you want to use the trained model to predict the `atis_slot & atis_intent` data, run:

```shell
python predict-slot.py
python predict-intent.py
```

If you want to specify a specific gpu or use multiple gpus for predict, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python predict-slot.py
CUDA_VISIBLE_DEVICES=0,1 python predict-intent.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**

After the run, you can view the predictions in the `outputs/predict-slot` folder and `outputs/predict-intent` folder. Here are some examples of predictions:

`atis_slot`:
```
[129, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 19, 1, 1, 1, 1, 1, 21, 21, 68, 129]
[129, 1, 39, 37, 1, 1, 1, 1, 1, 2, 1, 5, 19, 1, 23, 3, 4, 129, 129, 129, 129, 129]
[129, 1, 39, 37, 1, 1, 1, 1, 1, 1, 2, 1, 5, 19, 129, 129, 129, 129, 129, 129, 129, 129]
[129, 1, 1, 1, 1, 1, 1, 14, 15, 1, 2, 1, 5, 19, 1, 39, 37, 129, 24, 129, 129, 129]
```

`atis_intent`:
```
{"index": 0, "logits": [9.938603401184082, -0.3914794623851776, -0.050973162055015564, -1.0229418277740479, 0.04799401015043259, -0.9632213115692139, -0.6427211761474609, -1.337939739227295, -0.7969412803649902, -1.4441455602645874, -0.6339573264122009, -1.0393054485321045, -0.9242327213287354, -1.9637483358383179, 0.16733427345752716, -0.5280354619026184, -1.7195699214935303, -2.199411630630493, -1.2833174467086792, -1.3081035614013672, -1.6036226749420166, -1.8527079820632935, -2.289180040359497, -2.267214775085449, -2.2578916549682617, -2.2010505199432373], "probs": [0.999531626701355, 3.26210938510485e-05, 4.585415081237443e-05, 1.7348344044876285e-05, 5.06243304698728e-05, 1.8415948943584226e-05, 2.5373808966833167e-05, 1.266065828531282e-05, 2.174747896788176e-05, 1.1384962817828637e-05, 2.5597169951652177e-05, 1.7066764485207386e-05, 1.914815220516175e-05, 6.771284006390488e-06, 5.70411684748251e-05, 2.8457265216275118e-05, 8.644025911053177e-06, 5.349628736439627e-06, 1.3371440218179487e-05, 1.3044088518654462e-05, 9.706698619993404e-06, 7.5665011536329985e-06, 4.890325726591982e-06, 4.99892985317274e-06, 5.045753368904116e-06, 5.340866664482746e-06], "label": 0}
{"index": 1, "logits": [0.8863624930381775, -2.232290506362915, 8.191509246826172, -0.03161466494202614, -0.9149583578109741, -2.172696352005005, -0.3937145471572876, -0.3954394459724426, 1.5333592891693115, 0.8630291223526001, -0.9684226512908936, -2.722721815109253, -0.0060247331857681274, -0.9865402579307556, 1.6328885555267334, 0.3972966969013214, 0.27919167280197144, -1.4911551475524902, -0.9552251696586609, -0.9169244170188904, -0.810670793056488, -1.5118697881698608, -2.0140435695648193, -1.6299077272415161, -1.8589974641799927, -2.07601261138916], "probs": [0.0006675600307062268, 2.9517297662096098e-05, 0.9932880997657776, 0.0002665741485543549, 0.0001102013120544143, 3.132982965325937e-05, 0.00018559220188762993, 0.00018527248175814748, 0.0012749042361974716, 0.0006521637551486492, 0.00010446414671605453, 1.8075270418194123e-05, 0.0002734838053584099, 0.00010258861584588885, 0.0014083238784223795, 0.00040934717981144786, 0.00036374686169438064, 6.193659646669403e-05, 0.00010585198469925672, 0.00010998480865964666, 0.0001223145518451929, 6.0666847275570035e-05, 3.671637750812806e-05, 5.391232480178587e-05, 4.287416595616378e-05, 3.4510172554291785e-05], "label": 0}
{"index": 2, "logits": [9.789957046508789, -0.1730862706899643, -0.7198237776756287, -1.0460278987884521, 0.23521068692207336, -0.5075851678848267, -0.44724929332733154, -1.2945927381515503, -0.6984466314315796, -1.8749892711639404, -0.4631594121456146, -0.6256799697875977, -1.0252169370651245, -1.951456069946289, -0.17572557926177979, -0.6771697402000427, -1.7992591857910156, -2.1457295417785645, -1.4203097820281982, -1.4963451623916626, -1.692310094833374, -1.9219486713409424, -2.2533645629882812, -2.430952310562134, -2.3094685077667236, -2.2399914264678955], "probs": [0.9994625449180603, 4.708383130491711e-05, 2.725377635215409e-05, 1.9667899323394522e-05, 7.082601223373786e-05, 3.3697724575176835e-05, 3.579350595828146e-05, 1.5339375750045292e-05, 2.784266871458385e-05, 8.58508519741008e-06, 3.522853512549773e-05, 2.9944207199150696e-05, 2.0081495677004568e-05, 7.953084605105687e-06, 4.695970710599795e-05, 2.8441407266655006e-05, 9.26048778637778e-06, 6.548832516273251e-06, 1.3527245755540207e-05, 1.2536826943687629e-05, 1.030578732752474e-05, 8.19125762063777e-06, 5.880556273041293e-06, 4.923717369820224e-06, 5.559719284065068e-06, 5.9597273320832755e-06], "label": 0}
{"index": 3, "logits": [9.787659645080566, -0.6223222017288208, -0.03971472755074501, -1.038114070892334, 0.24018540978431702, -0.8904737830162048, -0.7114139795303345, -1.2315020561218262, -0.5120854377746582, -1.4273980855941772, -0.44618460536003113, -1.0241562128067017, -0.9727545380592346, -1.8587366342544556, 0.020689941942691803, -0.6228570342063904, -1.6020199060440063, -2.130260467529297, -1.370570421218872, -1.40530526638031, -1.6782578229904175, -1.94076669216156, -2.2038567066192627, -2.336832284927368, -2.268157720565796, -2.140028953552246], "probs": [0.9994485974311829, 3.0113611501292326e-05, 5.392447565100156e-05, 1.986949791898951e-05, 7.134198676794767e-05, 2.303065048181452e-05, 2.7546762794372626e-05, 1.6375688574044034e-05, 3.362310235388577e-05, 1.3462414244713727e-05, 3.591357381083071e-05, 2.0148761905147694e-05, 2.12115264730528e-05, 8.74570196174318e-06, 5.728216274292208e-05, 3.0097504350123927e-05, 1.1305383850412909e-05, 6.666126409982098e-06, 1.4249604646465741e-05, 1.3763145034317859e-05, 1.0475521776243113e-05, 8.056933438638225e-06, 6.193143690325087e-06, 5.422014055511681e-06, 5.807448815176031e-06, 6.601325367228128e-06], "label": 0}
```

### Step 3: Evaluate

Once you have the prediction, you can run the evaluation script to evaluate the model:

```shell
python evaluate-slot.py
python evaluate-intent.py
```

The evaluation results are as follows:

`atis_slot`:
```
data num: 891
f1: 0.8934
```

`atis_intent`:
```
data num: 893
accuracy: 0.7088, precision: 1.0000, recall: 1.0000, f1: 1.0000
```


================================================
FILE: examples/multi-task/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import tarfile
import shutil
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)

abs_path = os.path.abspath(__file__)
download_url = "https://baidu-nlp.bj.bcebos.com/dmtk_data_1.0.0.tar.gz"
downlaod_path = os.path.join(os.path.dirname(abs_path), "dmtk_data_1.0.0.tar.gz")
target_dir = os.path.dirname(abs_path)
download(downlaod_path, download_url)

tar = tarfile.open(downlaod_path)
tar.extractall(target_dir)
os.remove(downlaod_path)

shutil.rmtree(os.path.join(target_dir, 'data/dstc2/'))
shutil.rmtree(os.path.join(target_dir, 'data/mrda/'))
shutil.rmtree(os.path.join(target_dir, 'data/multi-woz/'))
shutil.rmtree(os.path.join(target_dir, 'data/swda/'))
shutil.rmtree(os.path.join(target_dir, 'data/udc/'))
print(" done!")


================================================
FILE: examples/multi-task/evaluate_intent.py
================================================
#  -*- coding: utf-8 -*-

import json
import numpy as np

def accuracy(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels) 
    return (preds == labels).mean()
  
def pre_recall_f1(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # recall=TP/(TP+FN)
    tp = np.sum((labels == '1') & (preds == '1'))
    fp = np.sum((labels == '0') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    r = tp * 1.0 / (tp + fn)
    # Precision=TP/(TP+FP)
    p = tp * 1.0 / (tp + fp)
    epsilon = 1e-31
    f1 = 2 * p * r / (p+r+epsilon)
    return p, r, f1


def res_evaluate(res_dir="./outputs/predict-intent/predictions.json", eval_phase='test'):
    if eval_phase == 'test':
        data_dir="./data/atis/atis_intent/test.tsv"
    elif eval_phase == 'dev':
        data_dir="./data/dev.tsv"

    else:
        assert eval_phase in ['dev', 'test'], 'eval_phase should be dev or test'
    
    labels = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            line = line.split("\t")
            label = line[0]
            if label=='label':
                continue
            labels.append(str(label))
    file.close()

    preds = []
    with open(res_dir, "r") as file:
        for line in file.readlines():
            line = json.loads(line)
            pred = line['label']
            preds.append(str(pred))
    file.close()
    assert len(labels) == len(preds), "prediction result doesn't match to labels"
    print('data num: {}'.format(len(labels)))
    p, r, f1 = pre_recall_f1(preds, labels)
    print("accuracy: {:.4f}, precision: {:.4f}, recall: {:.4f}, f1: {:.4f}".format(accuracy(preds, labels), p, r, f1))

res_evaluate()


================================================
FILE: examples/multi-task/evaluate_slot.py
================================================
#  -*- coding: utf-8 -*-

import json


def load_label_map(map_dir="./data/atis/atis_slot/label_map.json"):
    """
    :param map_dir: dict indictuing chunk type
    :return:
    """
    return json.load(open(map_dir, "r"))


def cal_chunk(pred_label, refer_label):
    tp = dict()
    fn = dict()
    fp = dict()
    for i in range(len(refer_label)):
        if refer_label[i] == pred_label[i]:
            if refer_label[i] not in tp:
                tp[refer_label[i]] = 0
            tp[refer_label[i]] += 1
        else:
            if pred_label[i] not in fp:
                fp[pred_label[i]] = 0
            fp[pred_label[i]] += 1
            if refer_label[i] not in fn:
                fn[refer_label[i]] = 0
            fn[refer_label[i]] += 1

    tp_total = sum(tp.values())
    fn_total = sum(fn.values())
    fp_total = sum(fp.values())
    p_total = float(tp_total) / (tp_total + fp_total)
    r_total = float(tp_total) / (tp_total + fn_total)
    f_micro = 2 * p_total * r_total / (p_total + r_total)

    return f_micro


def res_evaluate(res_dir="./outputs/predict-slot/predictions.json", data_dir="./data/atis/atis_slot/test.tsv"):
    label_map = load_label_map()

    total_label = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            if first_flag:
                first_flag = False
                continue
            line = line.strip("\n")
            if len(line) == 0:
                continue
            line = line.split("\t")
            if len(line) < 2:
                continue
            labels = line[1][:-1].split("\x02")
            total_label.append(labels)
    total_label = [[label_map[j] for j in i] for i in total_label]

    total_res = []
    with open(res_dir, "r") as file:
        cnt = 0
        for line in file:
            line = line.strip("\n")
            if len(line) == 0:
                continue
            try:
                res_arr = json.loads(line)

                if len(total_label[cnt]) < len(res_arr):
                    total_res.append(res_arr[1: 1 + len(total_label[cnt])])
                elif len(total_label[cnt]) == len(res_arr):
                    total_res.append(res_arr)
                else:
                    total_res.append(res_arr)
                    total_label[cnt] = total_label[cnt][: len(res_arr)]
            except:
                print("json format error: {}".format(cnt))
                print(line)

            cnt += 1

    total_res_equal = []
    total_label_equal = []
    assert len(total_label) == len(total_res), "prediction result doesn't match to labels"
    for i in range(len(total_label)):
        num = len(total_label[i])
        total_label_equal.extend(total_label[i])
        total_res[i] = total_res[i][:num]
        total_res_equal.extend(total_res[i])

    f1 = cal_chunk(total_res_equal, total_label_equal)
    print('data num: {}'.format(len(total_label)))
    print("f1: {:.4f}".format(f1))


res_evaluate()


================================================
FILE: examples/multi-task/joint_predict.py
================================================
# coding=utf-8
import paddlepalm as palm
import json
import numpy as np


if __name__ == '__main__':

    # configs
    max_seqlen = 128
    batch_size = 128
    num_epochs = 20
    print_steps = 5
    lr = 2e-5
    num_classes = 130
    weight_decay = 0.01
    num_classes_intent = 26
    dropout_prob = 0.1
    random_seed = 0
    label_map = './data/atis/atis_slot/label_map.json'
    vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'

    train_slot = './data/atis/atis_slot/train.tsv'
    train_intent = './data/atis/atis_intent/train.tsv'

    config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # -----------------------  for training ----------------------- 

    # step 1-1: create readers 
    slot_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
    intent_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')

    # step 1-2: load train data
    slot_reader.load_data(train_slot, file_format='tsv', num_epochs=None, batch_size=batch_size)
    intent_reader.load_data(train_intent, batch_size=batch_size, num_epochs=None)

    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register readers with ernie backbone
    slot_reader.register_with(ernie)
    intent_reader.register_with(ernie)

    # step 4: create task output heads
    slot_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob, phase='predict')
    intent_head = palm.head.Classify(num_classes_intent, input_dim, dropout_prob, phase='predict')
   
    # step 5-1: create task trainers and multiHeadTrainer
    trainer_slot = palm.Trainer("slot", mix_ratio=1.0)
    trainer_intent = palm.Trainer("intent", mix_ratio=1.0)
    trainer = palm.MultiHeadTrainer([trainer_slot, trainer_intent])
    # # step 5-2: build forward graph with backbone and task head
    vars = trainer_intent.build_predict_forward(ernie, intent_head)
    vars = trainer_slot.build_predict_forward(ernie, slot_head)
    loss_var = trainer.build_predict_forward()

    # load checkpoint
    trainer.load_ckpt('outputs/ckpt.step300')

    # merge inference readers
    joint_iterator = trainer.merge_inference_readers([slot_reader, intent_reader])

    # for test
    # batch = next(joint_iterator('slot'))
    # results = trainer.predict_one_batch('slot', batch)
    # batch = next(joint_iterator('intent'))
    # results = trainer.predict_one_batch('intent', batch)

    # predict slot filling
    print('processing slot filling examples...')
    print('num examples: '+str(slot_reader.num_examples))
    cnt = 0
    for batch in joint_iterator('slot'):
        cnt += len(trainer.predict_one_batch('slot', batch)['logits'])
        if cnt % 1000 <= 128:
            print(str(cnt)+'th example processed.')
    print(str(cnt)+'th example processed.')

    # predict intent recognition
    print('processing intent recognition examples...')
    print('num examples: '+str(intent_reader.num_examples))
    cnt = 0
    for batch in joint_iterator('intent'):
        cnt += len(trainer.predict_one_batch('intent', batch)['logits'])
        if cnt % 1000 <= 128:
            print(str(cnt)+'th example processed.')
    print(str(cnt)+'th example processed.')


================================================
FILE: examples/multi-task/predict_intent.py
================================================
# coding=utf-8
import paddlepalm as palm
import json
from paddlepalm.distribute import gpu_dev_count


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 16
    num_epochs = 6 
    print_steps = 5
    num_classes = 26
    vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
    predict_file = './data/atis/atis_intent/test.tsv'
    save_path = './outputs/'
    pred_output = './outputs/predict-intent/'
    save_type = 'ckpt'
    random_seed = 0
    config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')
    # step 1-2: load the training data
    predict_cls_reader.load_data(predict_file, batch_size)
    
    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_cls_reader.register_with(pred_ernie)
    
    # step 4: create the task output head
    cls_pred_head = palm.head.Classify(num_classes, input_dim, phase='predict')
    
    # step 5-1: create a task trainer
    trainer = palm.Trainer("intent")
    # step 5-2: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, cls_pred_head)
 
    # step 6: load checkpoint
    pred_model_path = './outputs/ckpt.step4641'
    trainer.load_ckpt(pred_model_path)

    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_cls_reader, phase='predict')

    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=print_steps, output_dir=pred_output)


================================================
FILE: examples/multi-task/predict_slot.py
================================================
# coding=utf-8
import paddlepalm as palm
import json
from paddlepalm.distribute import gpu_dev_count


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 16
    num_epochs = 6 
    print_steps = 5
    num_classes = 130
    label_map = './data/atis/atis_slot/label_map.json'
    vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'
    predict_file = './data/atis/atis_slot/test.tsv'
    save_path = './outputs/'
    pred_output = './outputs/predict-slot/'
    save_type = 'ckpt'
    random_seed = 0
    config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
    # step 1-2: load the training data
    predict_seq_label_reader.load_data(predict_file, batch_size)
   
    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')
    
    # step 3: register the backbone in reader
    predict_seq_label_reader.register_with(pred_ernie)

    # step 4: create the task output head
    seq_label_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')
    
    # step 5-1: create a task trainer
    trainer_seq_label = palm.Trainer("slot")
    # step 5-2: build forward graph with backbone and task head
    trainer_seq_label.build_predict_forward(pred_ernie, seq_label_pred_head)
    
    # step 6: load checkpoint
    pred_model_path = './outputs/ckpt.step4641'
    trainer_seq_label.load_ckpt(pred_model_path)
    
    # step 7: fit prepared reader and data
    trainer_seq_label.fit_reader(predict_seq_label_reader, phase='predict')
   
    # step 8: predict
    print('predicting..')
    trainer_seq_label.predict(print_steps=print_steps, output_dir=pred_output)


================================================
FILE: examples/multi-task/process.py
================================================
import os
import json

label_new = "data/atis/atis_slot/label_map.json"
label_old = "data/atis/atis_slot/map_tag_slot_id.txt"
train_old = "data/atis/atis_slot/train.txt"
train_new = "data/atis/atis_slot/train.tsv"
dev_old = "data/atis/atis_slot/dev.txt"
dev_new = "data/atis/atis_slot/dev.tsv"
test_old = "data/atis/atis_slot/test.txt"
test_new = "data/atis/atis_slot/test.tsv"


intent_test =  "data/atis/atis_intent/test.tsv"
os.rename("data/atis/atis_intent/test.txt", intent_test)
intent_train =  "data/atis/atis_intent/train.tsv"
os.rename("data/atis/atis_intent/train.txt", intent_train)
intent_dev = "data/atis/atis_intent/dev.tsv"
os.rename("data/atis/atis_intent/dev.txt", intent_dev)

with open(intent_dev, 'r+') as f: 
    content = f.read()  
    f.seek(0, 0)
    f.write("label\ttext_a\n"+content)
f.close()

with open(intent_test, 'r+') as f: 
    content = f.read()  
    f.seek(0, 0)
    f.write("label\ttext_a\n"+content)
f.close()

with open(intent_train, 'r+') as f: 
    content = f.read()  
    f.seek(0, 0)
    f.write("label\ttext_a\n"+content)
f.close()

os.mknod(label_new)
os.mknod(train_new)
os.mknod(dev_new)
os.mknod(test_new)


tag = []
id = []
map = {}
with open(label_old, "r") as f:
    with open(label_new, "w") as f2:
        for line in f.readlines():
            line = line.split('\t')
            tag.append(line[0])
            id.append(int(line[1][:-1]))
            map[line[1][:-1]] = line[0]

        re = {tag[i]:id[i] for i in range(len(tag))}
        re = json.dumps(re)
        f2.write(re)
    f2.close()
f.close()


with open(train_old, "r") as f:
    with open(train_new, "w") as f2:
        f2.write("text_a\tlabel\n")
        for line in f.readlines():
            line = line.split('\t')
            text = line[0].split(' ')
            label = line[1].split(' ')
            for t in text:
                f2.write(t)
                f2.write('\2')
            f2.write('\t')
            for t in label:
                if t.endswith('\n'):
                    t = t[:-1] 
                f2.write(map[t])
                f2.write('\2')
            f2.write('\n')
    f2.close()
f.close()

with open(test_old, "r") as f:
    with open(test_new, "w") as f2:
        f2.write("text_a\tlabel\n")
        for line in f.readlines():
            line = line.split('\t')
            text = line[0].split(' ')
            label = line[1].split(' ')
            for t in text:
                f2.write(t)
                f2.write('\2')
            f2.write('\t')
            for t in label:
                if t.endswith('\n'):
                    t = t[:-1] 
                f2.write(map[t])
                f2.write('\2')
            f2.write('\n')
    f2.close()
f.close()

with open(dev_old, "r") as f:
    with open(dev_new, "w") as f2:
        f2.write("text_a\tlabel\n")
        for line in f.readlines():
            line = line.split('\t')
            text = line[0].split(' ')
            label = line[1].split(' ')
            for t in text:
                f2.write(t)
                f2.write('\2')
            f2.write('\t')
            for t in label:
                if t.endswith('\n'):
                    t = t[:-1] 
                f2.write(map[t])
                f2.write('\2')
            f2.write('\n')
    f2.close()
f.close()

os.remove(label_old)
os.remove(train_old)
os.remove(test_old)
os.remove(dev_old)

================================================
FILE: examples/multi-task/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json


if __name__ == '__main__':

    # configs
    max_seqlen = 128
    batch_size = 16
    num_epochs = 20
    print_steps = 5
    lr = 2e-5
    num_classes = 130
    weight_decay = 0.01
    num_classes_intent = 26
    dropout_prob = 0.1
    random_seed = 0
    label_map = './data/atis/atis_slot/label_map.json'
    vocab_path = './pretrain/ERNIE-v2-en-base/vocab.txt'

    train_slot = './data/atis/atis_slot/train.tsv'
    train_intent = './data/atis/atis_intent/train.tsv'

    config = json.load(open('./pretrain/ERNIE-v2-en-base/ernie_config.json'))
    input_dim = config['hidden_size']

    # -----------------------  for training ----------------------- 

    # step 1-1: create readers 
    seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed)

    # step 1-2: load train data
    seq_label_reader.load_data(train_slot, file_format='tsv', num_epochs=None, batch_size=batch_size)
    cls_reader.load_data(train_intent, batch_size=batch_size, num_epochs=None)

    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register readers with ernie backbone
    seq_label_reader.register_with(ernie)
    cls_reader.register_with(ernie)

    # step 4: create task output heads
    seq_label_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)
    cls_head = palm.head.Classify(num_classes_intent, input_dim, dropout_prob)
   
    # step 5-1: create task trainers and multiHeadTrainer
    trainer_seq_label = palm.Trainer("slot", mix_ratio=1.0)
    trainer_cls = palm.Trainer("intent", mix_ratio=1.0)
    trainer = palm.MultiHeadTrainer([trainer_seq_label, trainer_cls])
    # # step 5-2: build forward graph with backbone and task head
    loss1 = trainer_cls.build_forward(ernie, cls_head)
    loss2 = trainer_seq_label.build_forward(ernie, seq_label_head)
    loss_var = trainer.build_forward()

    # step 6-1*: enable warmup for better fine-tuning
    n_steps = seq_label_reader.num_examples * 1.5 * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    # step 6-2: build a optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward graph
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)

    # step 7: fit readers to trainer
    trainer.fit_readers_with_mixratio([seq_label_reader, cls_reader], "slot", num_epochs)

    # step 8-1*: load pretrained model
    trainer.load_pretrain('./pretrain/ERNIE-v2-en-base')
    # step 8-2*: set saver to save models during training
    trainer.set_saver(save_path='./outputs/', save_steps=300)
    # step 8-3: start training
    trainer.train(print_steps=10)


================================================
FILE: examples/predict/README.md
================================================
## Example 5: Prediction
This example demonstrates how to directly do prediction with PaddlePALM. You can either initialize the model from a checkpoint, a pretrained model or just randomly initialization. Here we reuse the task and data in example 1. Hence repeat the step 1 in example 1 to pretrain data. 

After you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for predict, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**


Some logs will be shown below:

```
step 1/154, speed: 0.51 steps/s
step 2/154, speed: 3.36 steps/s
step 3/154, speed: 3.48 steps/s
```


After the run, you can view the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], "probs": [0.29290086030960083, 0.7070990800857544], "label": 1}
{"index": 1, "logits": [0.8593899011611938, -0.29743513464927673], "probs": [0.7607553601264954, 0.23924466967582703], "label": 0}
{"index": 2, "logits": [0.7462944388389587, -0.7083730101585388], "probs": [0.8107157349586487, 0.18928426504135132], "label": 0}
```

### Step 3: Evaluate

Once you have the prediction, you can run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data num: 1200
accuracy: 0.4758, precision: 0.4730, recall: 0.3026, f1: 0.3691
```


================================================
FILE: examples/predict/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import tarfile
import shutil
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)

abs_path = os.path.abspath(__file__)
download_url = "https://ernie.bj.bcebos.com/task_data_zh.tgz"
downlaod_path = os.path.join(os.path.dirname(abs_path), "task_data_zh.tgz")
target_dir = os.path.dirname(abs_path)
download(downlaod_path, download_url)

tar = tarfile.open(downlaod_path)
tar.extractall(target_dir)
os.remove(downlaod_path)

abs_path = os.path.abspath(__file__)
dst_dir = os.path.join(os.path.dirname(abs_path), "data")
if not os.path.exists(dst_dir) or not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

for file in os.listdir(os.path.join(target_dir, 'task_data', 'chnsenticorp')):
    shutil.move(os.path.join(target_dir, 'task_data', 'chnsenticorp', file), dst_dir)

shutil.rmtree(os.path.join(target_dir, 'task_data'))
print(" done!")


================================================
FILE: examples/predict/evaluate.py
================================================
#  -*- coding: utf-8 -*-

import json
import numpy as np

def accuracy(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels) 
    return (preds == labels).mean()

def pre_recall_f1(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # recall=TP/(TP+FN)
    tp = np.sum((labels == '1') & (preds == '1'))
    fp = np.sum((labels == '0') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    r = tp * 1.0 / (tp + fn)
    # Precision=TP/(TP+FP)
    p = tp * 1.0 / (tp + fp)
    epsilon = 1e-31
    f1 = 2 * p * r / (p+r+epsilon)
    return p, r, f1


def res_evaluate(res_dir="./outputs/predict/predictions.json", eval_phase='test'):
    if eval_phase == 'test':
        data_dir="./data/test.tsv"
    elif eval_phase == 'dev':
        data_dir="./data/dev.tsv"
    else:
        assert eval_phase in ['dev', 'test'], 'eval_phase should be dev or test'
    
    labels = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            line = line.split("\t")
            label = line[0]
            if label=='label':
                continue
            labels.append(str(label))
    file.close()

    preds = []
    with open(res_dir, "r") as file:
        for line in file.readlines():
            line = json.loads(line)
            pred = line['label']
            preds.append(str(pred))
    file.close()
    assert len(labels) == len(preds), "prediction result doesn't match to labels"
    print('data num: {}'.format(len(labels)))
    p, r, f1 = pre_recall_f1(preds, labels)
    print("accuracy: {:.4f}, precision: {:.4f}, recall: {:.4f}, f1: {:.4f}".format(accuracy(preds, labels), p, r, f1))

res_evaluate()


================================================
FILE: examples/predict/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 8
    vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
    predict_file = './data/test.tsv'
    random_seed = 1
    config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
    input_dim = config['hidden_size']
    num_classes = 2
    task_name = 'chnsenticorp'
    pred_output = './outputs/predict/'
    print_steps = 20
    pre_params = './pretrain/ERNIE-v1-zh-base/params'

    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed, phase='predict')
    # step 1-2: load the training data
    predict_cls_reader.load_data(predict_file, batch_size)
    
    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')

    # step 3: register the backbone in reader
    predict_cls_reader.register_with(pred_ernie)
    
    # step 4: create the task output head
    cls_pred_head = palm.head.Classify(num_classes, input_dim, phase='predict')
    
    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, cls_pred_head)
 
    # step 6: load checkpoint
    trainer.load_predict_model(pre_params)

    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_cls_reader, phase='predict')

    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=print_steps, output_dir=pred_output)


================================================
FILE: examples/tagging/README.md
================================================
## Example 3: Tagging
This task is a named entity recognition task. The following sections detail model preparation, dataset preparation, and how to run the task.

### Step 1: Prepare Pre-trained Models & Datasets

#### Pre-trianed Model

The pre-training model of this mission is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-training model in the current folder.


#### Dataset

This task uses the `MSRA-NER(SIGHAN2006)` dataset. 

Download dataset:
```shell
python download.py
```

If everything goes well, there will be a folder named `data/`  created with all the datas in it.

The data should have 2 fields,  `text_a  label`, with tsv format. Here is some example datas:

 ```
text_a  label
在 这 里 恕 弟 不 恭 之 罪 ， 敢 在 尊 前 一 诤 ： 前 人 论 书 ， 每 曰 “ 字 字 有 来 历 ， 笔 笔 有 出 处 ” ， 细 读 公 字 ， 何 尝 跳 出 前 人 藩 篱 ， 自 隶 变 而 后 ， 直 至 明 季 ， 兄 有 何 新 出 ？    O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
相 比 之 下 ， 青 岛 海 牛 队 和 广 州 松 日 队 的 雨 中 之 战 虽 然 也 是 0 ∶ 0 ， 但 乏 善 可 陈 。   O O O O O B-ORG I-ORG I-ORG I-ORG I-ORG O B-ORG I-ORG I-ORG I-ORG I-ORG O O O O O O O O O O O O O O O O O O O
理 由 多 多 ， 最 无 奈 的 却 是 ： 5 月 恰 逢 双 重 考 试 ， 她 攻 读 的 博 士 学 位 论 文 要 通 考 ； 她 任 教 的 两 所 学 校 ， 也 要 在 这 段 时 日 大 考 。    O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O O
 ```


### Step 2: Train & Predict

The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for training, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**

Some logs will be shown below:

```
step 1/652 (epoch 0), loss: 216.002, speed: 0.32 steps/s
step 2/652 (epoch 0), loss: 202.567, speed: 1.28 steps/s
step 3/652 (epoch 0), loss: 170.677, speed: 1.05 steps/s
```

After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 4, 4, 6, 4, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
```

### Step 3: Evaluate

Once you have the prediction, you can run the evaluation script to evaluate the model:

```python
python evaluate.py
```

The evaluation results are as follows:

```
data num: 4636
f1: 0.9918
```


================================================
FILE: examples/tagging/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import tarfile
import shutil
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)

abs_path = os.path.abspath(__file__)
download_url = "https://ernie.bj.bcebos.com/task_data_zh.tgz"
downlaod_path = os.path.join(os.path.dirname(abs_path), "task_data_zh.tgz")
target_dir = os.path.dirname(abs_path)
download(downlaod_path, download_url)

tar = tarfile.open(downlaod_path)
tar.extractall(target_dir)
os.remove(downlaod_path)

abs_path = os.path.abspath(__file__)
dst_dir = os.path.join(os.path.dirname(abs_path), "data")
if not os.path.exists(dst_dir) or not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

for file in os.listdir(os.path.join(target_dir, 'task_data', 'msra_ner')):
    shutil.move(os.path.join(target_dir, 'task_data', 'msra_ner', file), dst_dir)

shutil.rmtree(os.path.join(target_dir, 'task_data'))
print(" done!")


================================================
FILE: examples/tagging/evaluate.py
================================================
#  -*- coding: utf-8 -*-

import json


def load_label_map(map_dir="./data/label_map.json"):
    """
    :param map_dir: dict indictuing chunk type
    :return:
    """
    return json.load(open(map_dir, "r"))


def cal_chunk(pred_label, refer_label):
    tp = dict()
    fn = dict()
    fp = dict()
    for i in range(len(refer_label)):
        if refer_label[i] == pred_label[i]:
            if refer_label[i] not in tp:
                tp[refer_label[i]] = 0
            tp[refer_label[i]] += 1
        else:
            if pred_label[i] not in fp:
                fp[pred_label[i]] = 0
            fp[pred_label[i]] += 1
            if refer_label[i] not in fn:
                fn[refer_label[i]] = 0
            fn[refer_label[i]] += 1

    tp_total = sum(tp.values())
    fn_total = sum(fn.values())
    fp_total = sum(fp.values())
    p_total = float(tp_total) / (tp_total + fp_total)
    r_total = float(tp_total) / (tp_total + fn_total)
    f_micro = 2 * p_total * r_total / (p_total + r_total)

    return f_micro


def res_evaluate(res_dir="./outputs/predict/predictions.json", data_dir="./data/test.tsv"):
    label_map = load_label_map()

    total_label = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            if first_flag:
                first_flag = False
                continue
            line = line.strip("\n")
            if len(line) == 0:
                continue
            line = line.split("\t")
            if len(line) < 2:
                continue
            labels = line[1].split("\x02")
            total_label.append(labels)
    total_label = [[label_map[j] for j in i] for i in total_label]
  
    total_res = []
    with open(res_dir, "r") as file:
        cnt = 0
        for line in file:
            line = line.strip("\n")
            if len(line) == 0:
                continue
            try:
                res_arr = json.loads(line)

                if len(total_label[cnt]) < len(res_arr):
                    total_res.append(res_arr[1: 1 + len(total_label[cnt])])
                elif len(total_label[cnt]) == len(res_arr):
                    total_res.append(res_arr)
                else:
                    total_res.append(res_arr)
                    total_label[cnt] = total_label[cnt][: len(res_arr)]
            except:
                print("json format error: {}".format(cnt))
                print(line)

            cnt += 1

    total_res_equal = []
    total_label_equal = []
    assert len(total_label) == len(total_res), "prediction result doesn't match to labels"
    for i in range(len(total_label)):
        num = len(total_label[i])
        total_label_equal.extend(total_label[i])
        total_res[i] = total_res[i][:num]
        total_res_equal.extend(total_res[i])

    f1 = cal_chunk(total_res_equal, total_label_equal)
    print('data num: {}'.format(len(total_label)))
    print("f1: {:.4f}".format(f1))

res_evaluate()


================================================
FILE: examples/tagging/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json

if __name__ == '__main__':
 
    # configs
    max_seqlen = 256
    batch_size = 16
    num_epochs = 6
    lr = 5e-5
    num_classes = 7
    weight_decay = 0.01
    dropout_prob = 0.1
    vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'
    label_map = './data/label_map.json'
    random_seed = 1
    train_file = './data/train.tsv'
    predict_file = './data/test.tsv'
    
    save_path='./outputs/'
    save_type='ckpt' 
    pre_params = './pretrain/ERNIE-v1-zh-base/params'
    config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
    input_dim = config['hidden_size']  
    task_name = 'msra_ner'
    pred_output = './outputs/predict/'
    train_print_steps = 10
    pred_print_steps = 20
    
    # -----------------------  for training ----------------------- 

    # step 1-1: create readers for training
    seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed)
    # step 1-2: load the training data
    seq_label_reader.load_data(train_file, file_format='tsv', num_epochs=num_epochs, batch_size=batch_size)
    
    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register the backbone in reader
    seq_label_reader.register_with(ernie)

    # step 4: create the task output head
    seq_label_head = palm.head.SequenceLabel(num_classes, input_dim, dropout_prob)

    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
    loss_var = trainer.build_forward(ernie, seq_label_head)

    # step 6-1*: use warmup
    n_steps = seq_label_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    print('total_steps: {}'.format(n_steps))
    print('warmup_steps: {}'.format(warmup_steps))
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    # step 6-2: create a optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
  
    # step 7: fit prepared reader and data
    trainer.fit_reader(seq_label_reader)

    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
    save_steps = 1951
    # print('save_steps: {}'.format(save_steps))
    trainer.set_saver(save_path=save_path, save_steps=save_steps, save_type=save_type)
    # # step 8-3: start training
    trainer.train(print_steps=train_print_steps)
   
    # -----------------------  for prediction ----------------------- 

    # step 1-1: create readers for prediction
    print('prepare to predict...')
    predict_seq_label_reader = palm.reader.SequenceLabelReader(vocab_path, max_seqlen, label_map, seed=random_seed, phase='predict')
    # step 1-2: load the training data
    predict_seq_label_reader.load_data(predict_file, batch_size)
   
    # step 2: create a backbone of the model to extract text features
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='predict')
    
    # step 3: register the backbone in reader
    predict_seq_label_reader.register_with(pred_ernie)

    # step 4: create the task output head
    seq_label_pred_head = palm.head.SequenceLabel(num_classes, input_dim, phase='predict')
    
    # step 5: build forward graph with backbone and task head
    trainer.build_predict_forward(pred_ernie, seq_label_pred_head)
    
    # step 6: load checkpoint
    pred_model_path = './outputs/ckpt.step' + str(save_steps)
    trainer.load_ckpt(pred_model_path)
    
    # step 7: fit prepared reader and data
    trainer.fit_reader(predict_seq_label_reader, phase='predict')
   
    # step 8: predict
    print('predicting..')
    trainer.predict(print_steps=pred_print_steps, output_dir=pred_output)


================================================
FILE: examples/train_with_eval/README.md
================================================
## Train with Evaluation version of Example 1: Classification
This task is a sentiment analysis task. The following sections detail model preparation, dataset preparation, and how to run the task. Here to demonstrate how to do evaluation during training in PaddlePALM. 

### Step 1: Prepare Pre-trained Model & Dataset

#### Pre-trained Model

The pre-training model of this mission is: [ERNIE-v1-zh-base](https://github.com/PaddlePaddle/PALM/tree/r0.3-api).

Make sure you have downloaded the required pre-training model in the current folder.


#### Dataset

This example demonstrates with [ChnSentiCorp](https://github.com/SophonPlus/ChineseNlpCorpus/tree/master/datasets/ChnSentiCorp_htl_all), a Chinese sentiment analysis dataset.

Download dataset:
```shell
python download.py
```

If everything goes well, there will be a folder named `data/`  created with all the data files in it.

The dataset file (for training) should have 2 fields,  `text_a` and `label`, stored with [tsv](https://en.wikipedia.org/wiki/Tab-separated_values) format. Here shows an example:

```
label  text_a
0   当当网名不符实，订货多日不见送货，询问客服只会推托，只会要求用户再下订单。如此服务留不住顾客的。去别的网站买书服务更好。
0   XP的驱动不好找！我的17号提的货，现在就降价了100元，而且还送杀毒软件！
1   <荐书> 推荐所有喜欢<红楼>的红迷们一定要收藏这本书,要知道当年我听说这本书的时候花很长时间去图书馆找和借都没能如愿,所以这次一看到当当有,马上买了,红迷们也要记得备货哦!
```

### Step 2: Train & Predict

The code used to perform this task is in `run.py`. If you have prepared the pre-training model and the data set required for the task, run:

```shell
python run.py
```

If you want to specify a specific gpu or use multiple gpus for training, please use **`CUDA_VISIBLE_DEVICES`**, for example:

```shell
CUDA_VISIBLE_DEVICES=0,1 python run.py
```

Note: On multi-gpu mode, PaddlePALM will automatically split each batch onto the available cards. For example, if the `batch_size` is set 64, and there are 4 cards visible for PaddlePALM, then the batch_size in each card is actually 64/4=16. If you want to change the `batch_size` or the number of gpus used in the example, **you need to ensure that the set batch_size can be divided by the number of cards.**


Some logs will be shown below:

```
step 1/154 (epoch 0), loss: 5.512, speed: 0.51 steps/s
step 2/154 (epoch 0), loss: 2.595, speed: 3.36 steps/s
step 3/154 (epoch 0), loss: 1.798, speed: 3.48 steps/s
```


After the run, you can view the saved models in the `outputs/` folder and the predictions in the `outputs/predict` folder. Here are some examples of predictions:


```
{"index": 0, "logits": [-0.2014336884021759, 0.6799028515815735], "probs": [0.29290086030960083, 0.7070990800857544], "label": 1}
{"index": 1, "logits": [0.8593899011611938, -0.29743513464927673], "probs": [0.7607553601264954, 0.23924466967582703], "label": 0}
{"index": 2, "logits": [0.7462944388389587, -0.7083730101585388], "probs": [0.8107157349586487, 0.18928426504135132], "label": 0}
```

### Step 3: Evaluate

Once you have the prediction, you can run the evaluation script to evaluate the model:

```shell
python evaluate.py
```

The evaluation results are as follows:

```
data num: 1200
accuracy: 0.9575, precision: 0.9634, recall: 0.9523, f1: 0.9578
```


================================================
FILE: examples/train_with_eval/download.py
================================================
#  -*- coding: utf-8 -*-
from __future__ import print_function
import os
import tarfile
import shutil
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

def download(src, url):
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        print('\r>> Downloading... {:.1%}'.format(percent), end="")

    URLLIB.urlretrieve(url, src, reporthook=_reporthook)

abs_path = os.path.abspath(__file__)
download_url = "https://ernie.bj.bcebos.com/task_data_zh.tgz"
downlaod_path = os.path.join(os.path.dirname(abs_path), "task_data_zh.tgz")
target_dir = os.path.dirname(abs_path)
download(downlaod_path, download_url)

tar = tarfile.open(downlaod_path)
tar.extractall(target_dir)
os.remove(downlaod_path)

abs_path = os.path.abspath(__file__)
dst_dir = os.path.join(os.path.dirname(abs_path), "data")
if not os.path.exists(dst_dir) or not os.path.isdir(dst_dir):
    os.makedirs(dst_dir)

for file in os.listdir(os.path.join(target_dir, 'task_data', 'chnsenticorp')):
    shutil.move(os.path.join(target_dir, 'task_data', 'chnsenticorp', file), dst_dir)

shutil.rmtree(os.path.join(target_dir, 'task_data'))
print(" done!")


================================================
FILE: examples/train_with_eval/evaluate.py
================================================
#  -*- coding: utf-8 -*-

import json
import numpy as np

def accuracy(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels) 
    return (preds == labels).mean()

def pre_recall_f1(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # recall=TP/(TP+FN)
    tp = np.sum((labels == '1') & (preds == '1'))
    fp = np.sum((labels == '0') & (preds == '1'))
    fn = np.sum((labels == '1') & (preds == '0'))
    r = tp * 1.0 / (tp + fn)
    # Precision=TP/(TP+FP)
    p = tp * 1.0 / (tp + fp)
    epsilon = 1e-31
    f1 = 2 * p * r / (p+r+epsilon)
    return p, r, f1


def res_evaluate(res_dir="./outputs/predict/predictions.json", eval_phase='test'):
    if eval_phase == 'test':
        data_dir="./data/test.tsv"
    elif eval_phase == 'dev':
        data_dir="./data/dev.tsv"
    else:
        assert eval_phase in ['dev', 'test'], 'eval_phase should be dev or test'
    
    labels = []
    with open(data_dir, "r") as file:
        first_flag = True
        for line in file:
            line = line.split("\t")
            label = line[0]
            if label=='label':
                continue
            labels.append(str(label))
    file.close()

    preds = []
    with open(res_dir, "r") as file:
        for line in file.readlines():
            line = json.loads(line)
            pred = line['label']
            preds.append(str(pred))
    file.close()
    assert len(labels) == len(preds), "prediction result doesn't match to labels"
    print('data num: {}'.format(len(labels)))
    p, r, f1 = pre_recall_f1(preds, labels)
    print("accuracy: {:.4f}, precision: {:.4f}, recall: {:.4f}, f1: {:.4f}".format(accuracy(preds, labels), p, r, f1))

res_evaluate()


================================================
FILE: examples/train_with_eval/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json


if __name__ == '__main__':

    # configs
    max_seqlen = 256
    batch_size = 8
    num_epochs = 10
    lr = 5e-5
    weight_decay = 0.01
    vocab_path = './pretrain/ERNIE-v1-zh-base/vocab.txt'

    train_file = './data/train.tsv'
    predict_file = './data/test.tsv'
    config = json.load(open('./pretrain/ERNIE-v1-zh-base/ernie_config.json'))
    input_dim = config['hidden_size']
    num_classes = 2
    dropout_prob = 0.1
    random_seed = 1
    task_name = 'chnsenticorp'
    save_path = './outputs/'
    pred_output = './outputs/predict/'
    save_type = 'ckpt'
    print_steps = 20
    pre_params = './pretrain/ERNIE-v1-zh-base/params'

    # -----------------------  for training ----------------------- 

    # step 1-1: create readers for training
    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, seed=random_seed)
    # step 1-2: load the training data
    cls_reader.load_data(train_file, batch_size, num_epochs=num_epochs)

    # step 2: create a backbone of the model to extract text features
    ernie = palm.backbone.ERNIE.from_config(config)

    # step 3: register the backbone in reader
    cls_reader.register_with(ernie)

    # step 4: create the task output head
    cls_head = palm.head.Classify(num_classes, input_dim, dropout_prob)

    # step 5-1: create a task trainer
    trainer = palm.Trainer(task_name)
    # step 5-2: build forward graph with backbone and task head
    loss_var = trainer.build_forward(ernie, cls_head)

    # step 6-1*: use warmup
    n_steps = cls_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    # step 6-2: create a optimizer
    adam = palm.optimizer.Adam(loss_var, lr, sched)
    # step 6-3: build backward
    trainer.build_backward(optimizer=adam, weight_decay=weight_decay)
  
    # step 7: fit prepared reader and data
    iterator = trainer.fit_reader(cls_reader)
    
    # step 8-1*: load pretrained parameters
    trainer.load_pretrain(pre_params)
    # step 8-2*: set saver to save model
    # save_steps = n_steps 
    save_steps = 2396
    trainer.set_saver(save_steps=save_steps, save_path=save_path, save_type=save_type)

    # step 8-3: start training
    # you can repeatly get one train batch with trainer.get_one_batch()
    # batch = trainer.get_one_batch()
    for step, batch in enumerate(iterator, start=1):
        trainer.train_one_step(batch)
        if step % 100 == 0:
            print('do evaluation.')
            # insert evaluation code here
   

================================================
FILE: paddlepalm/__init__.py
================================================
from . import downloader
# from mtl_controller import Controller 
#import controller
from . import optimizer
from . import lr_sched
from . import backbone
from . import reader
from . import head


from .trainer import Trainer
from .multihead_trainer import MultiHeadTrainer

#del interface
#del task_instance
#del default_settings
#del utils


================================================
FILE: paddlepalm/_downloader.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function
import os
import tarfile
import shutil
from collections import OrderedDict
import sys
import urllib
URLLIB=urllib
if sys.version_info >= (3, 0):
    import urllib.request
    URLLIB=urllib.request

__all__ = ["download", "ls"]

_pretrain = (('RoBERTa-zh-base', 'https://bert-models.bj.bcebos.com/chinese_roberta_wwm_ext_L-12_H-768_A-12.tar.gz'),
            ('RoBERTa-zh-large', 'https://bert-models.bj.bcebos.com/chinese_roberta_wwm_large_ext_L-24_H-1024_A-16.tar.gz'),
            ('ERNIE-v2-en-base', 'https://ernie.bj.bcebos.com/ERNIE_Base_en_stable-2.0.0.tar.gz'),
            ('ERNIE-v2-en-large', 'https://ernie.bj.bcebos.com/ERNIE_Large_en_stable-2.0.0.tar.gz'),
            ('XLNet-cased-base','https://xlnet.bj.bcebos.com/xlnet_cased_L-12_H-768_A-12.tgz'),
            ('XLNet-cased-large','https://xlnet.bj.bcebos.com/xlnet_cased_L-24_H-1024_A-16.tgz'),
            ('ERNIE-v1-zh-base','https://baidu-nlp.bj.bcebos.com/ERNIE_stable-1.0.1.tar.gz'),
            ('ERNIE-v1-zh-base-max-len-512','https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz'),
            ('BERT-en-uncased-large-whole-word-masking','https://bert-models.bj.bcebos.com/wwm_uncased_L-24_H-1024_A-16.tar.gz'),
            ('BERT-en-cased-large-whole-word-masking','https://bert-models.bj.bcebos.com/wwm_cased_L-24_H-1024_A-16.tar.gz'),
            ('BERT-en-uncased-base', 'https://bert-models.bj.bcebos.com/uncased_L-12_H-768_A-12.tar.gz'),
            ('BERT-en-uncased-large', 'https://bert-models.bj.bcebos.com/uncased_L-24_H-1024_A-16.tar.gz'),
            ('BERT-en-cased-base','https://bert-models.bj.bcebos.com/cased_L-12_H-768_A-12.tar.gz'),
            ('BERT-en-cased-large','https://bert-models.bj.bcebos.com/cased_L-24_H-1024_A-16.tar.gz'),
            ('BERT-multilingual-uncased-base','https://bert-models.bj.bcebos.com/multilingual_L-12_H-768_A-12.tar.gz'),
            ('BERT-multilingual-cased-base','https://bert-models.bj.bcebos.com/multi_cased_L-12_H-768_A-12.tar.gz'),
            ('BERT-zh-base','https://bert-models.bj.bcebos.com/chinese_L-12_H-768_A-12.tar.gz'),
            ('utils', None))
_vocab = (('utils', None),('utils', None))
_backbone =(('utils', None),('utils', None))
_head = (('utils', None),('utils', None))
_reader = (('utils', None),('utils', None))

_items = (('pretrain', OrderedDict(_pretrain)),
        ('vocab', OrderedDict(_vocab)), 
        ('backbone', OrderedDict(_backbone)),
        ('head', OrderedDict(_head)),
        ('reader', OrderedDict(_reader))
)
_items = OrderedDict(_items)

def _download(item, scope, path, silent=False, convert=False):
    data_url = _items[item][scope]
    if data_url == None:
        return
    if not silent:
        print('Downloading {}: {} from {}...'.format(item, scope, data_url))
    data_dir = path + '/' + item + '/' + scope
    if not os.path.exists(data_dir):
        os.makedirs(os.path.join(data_dir))
    data_name = data_url.split('/')[-1]
    filename = data_dir + '/' + data_name

    # print process
    def _reporthook(count, chunk_size, total_size):
        bytes_so_far = count * chunk_size
        percent = float(bytes_so_far) / float(total_size)
        if percent > 1:
            percent = 1
        if not silent:
            print('\r>> Downloading... {:.1%}'.format(percent), end = "")
    
    URLLIB.urlretrieve(data_url, filename, reporthook=_reporthook)
    if not silent:
        print(' done!')
    
    if item == 'pretrain':
        if not silent:
            print ('Extracting {}...'.format(data_name), end=" ")
        if os.path.exists(filename):
            tar = tarfile.open(filename, 'r')
            tar.extractall(path = data_dir)
            tar.close()
            os.remove(filename)
        if len(os.listdir(data_dir))==1:
            source_path = data_dir + '/' + data_name.split('.')[0]
            fileList = os.listdir(source_path)
            for file in fileList:
                filePath = os.path.join(source_path, file)
                shutil.move(filePath, data_dir)
            os.removedirs(source_path)
        if not silent:
            print ('done!')
        if convert:
            if not silent:
                print ('Converting params...', end=" ")
            _convert(data_dir, silent)
        if not silent:
            print ('done!')


def _convert(path, silent=False):
    if os.path.isfile(path + '/params/__palminfo__'):
        if not silent:
            print ('already converted.')
    else:
        if os.path.exists(path + '/params/'):
            os.rename(path + '/params/', path + '/params1/')
            os.mkdir(path + '/params/')
            tar_model = tarfile.open(path + '/params/' + '__palmmodel__', 'w')
            tar_info = open(path + '/params/'+ '__palminfo__', 'w')
            for root, dirs, files in os.walk(path + '/params1/'):
                for file in files:
                    src_file = os.path.join(root, file)
                    tar_model.add(src_file, '__paddlepalm_' + file)
                    tar_info.write('__paddlepalm_' + file)
                    os.remove(src_file)
            tar_model.close()
            tar_info.close()
            os.removedirs(path + '/params1/') 

def download(item, scope='all', path='.'):
    """download an item. The available scopes and contained items can be showed with `paddlepalm.downloader.ls`.

    Args:
        item: the item to download.
        scope: the scope of the item to download.
        path: the target dir to download to. Default is `.`, means current dir.
    """
    # item = item.lower()
    # scope = scope.lower()
    assert item in _items, '{} is not found. Support list: {}'.format(item, list(_items.keys()))
   
    if _items[item]['utils'] is not None:
        _download(item, 'utils', path, silent=True)

    if scope != 'all':
        assert scope in _items[item], '{} is not found. Support scopes: {}'.format(scope, list(_items[item].keys()))
        _download(item, scope, path)
    else:
        for s in _items[item].keys():
            _download(item, s, path)


def _ls(item, scope, l = 10):
    if scope != 'all':
        assert scope in _items[item], '{} is not found. Support scopes: {}'.format(scope, list(_items[item].keys()))
        print ('{}'.format(scope))
    else:
        for s in _items[item].keys():
            if s == 'utils':
                continue
            print ('  => '+s)

def ls(item='all', scope='all'):
    
    if scope == 'utils':
        return
    if item != 'all':
        assert item in _items, '{} is not found. Support scopes: {}'.format(item, list(_items.keys()))
        print ('Available {} items:'.format(item))
        _ls(item, scope)
    else:
        l = max(map(len, _items.keys()))
        for i in _items.keys():
            print ('Available {} items: '.format(i))
            _ls(i, scope, l)


================================================
FILE: paddlepalm/backbone/README.md
================================================


================================================
FILE: paddlepalm/backbone/__init__.py
================================================

from .ernie import ERNIE
from .bert import BERT


================================================
FILE: paddlepalm/backbone/base_backbone.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


class Backbone(object):
    """interface of backbone model."""

    def __init__(self, phase):
        """该函数完成一个主干网络的构造，至少需要包含一个phase参数。
        注意：实现该构造函数时，必须保证对基类构造函数的调用，以创建必要的框架内建的成员变量。
        Args:
            phase: str类型。用于区分主干网络被调用时所处的运行阶段，目前支持训练阶段train和预测阶段predict
            """

        assert isinstance(config, dict)

    @property
    def inputs_attr(self):
        """描述backbone从reader处需要得到的输入对象的属性，包含各个对象的名字、shape以及数据类型。当某个对象
        为标量数据类型（如str, int, float等）时，shape设置为空列表[]，当某个对象的某个维度长度可变时，shape
        中的相应维度设置为-1。

        Return:
            dict类型。对各个输入对象的属性描述。例如，
            对于文本分类和匹配任务，bert backbone依赖的reader对象主要包含如下的对象
                {"token_ids": ([-1, max_len], 'int64'),
                 "input_ids": ([-1, max_len], 'int64'),
                 "segment_ids": ([-1, max_len], 'int64'),
                 "input_mask": ([-1, max_len], 'float32')}"""
        raise NotImplementedError()

    @property
    def outputs_attr(self):
        """描述backbone输出对象的属性，包含各个对象的名字、shape以及数据类型。当某个对象为标量数据类型（如
        str, int, float等）时，shape设置为空列表[]，当某个对象的某个维度长度可变时，shape中的相应维度设置为-1。
        
        Return:
            dict类型。对各个输出对象的属性描述。例如，
            对于文本分类和匹配任务，bert backbone的输出内容可能包含如下的对象
                {"word_emb": ([-1, max_seqlen, word_emb_size], 'float32'),
                 "sentence_emb": ([-1, hidden_size], 'float32'),
                 "sim_vec": ([-1, hidden_size], 'float32')}""" 
        raise NotImplementedError()

    def build(self, inputs):
        """建立backbone的计算图。将符合inputs_attr描述的静态图Variable输入映射成符合outputs_attr描述的静态图Variable输出。
        Args:
            inputs: dict类型。字典中包含inputs_attr中的对象名到计算图Variable的映射，inputs中至少会包含inputs_attr中定义的对象
        Return:
           需要输出的计算图变量，输出对象会被加入到fetch_list中，从而在每个训练/推理step时得到runtime的计算结果，该计算结果会被传入postprocess方法中供用户处理。
            """
        raise NotImplementedError()


================================================
FILE: paddlepalm/backbone/bert.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""v1.1 
BERT model."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from paddle import fluid
from paddle.fluid import layers

from paddlepalm.backbone.utils.transformer import pre_process_layer, encoder
from paddlepalm.backbone.base_backbone import Backbone


class BERT(Backbone):


    def __init__(self, hidden_size, num_hidden_layers, num_attention_heads, vocab_size, \
          max_position_embeddings, type_vocab_size, hidden_act, hidden_dropout_prob, \
          attention_probs_dropout_prob, initializer_range, is_pairwise=False, phase='train'):
     
        self._emb_size = hidden_size
        self._n_layer = num_hidden_layers
        self._n_head = num_attention_heads
        self._voc_size = vocab_size
        self._max_position_seq_len = max_position_embeddings
        self._sent_types = type_vocab_size

       
        self._hidden_act = hidden_act
        self._prepostprocess_dropout = 0. if phase == 'predict' else hidden_dropout_prob
        self._attention_dropout = 0. if phase == 'predict' else attention_probs_dropout_prob

        self._word_emb_name = "word_embedding"
        self._pos_emb_name = "pos_embedding"
        self._sent_emb_name = "sent_embedding"
        self._task_emb_name = "task_embedding"
        self._emb_dtype = "float32"
        self._phase = phase
        self._is_pairwise = is_pairwise
        self._param_initializer = fluid.initializer.TruncatedNormal(
            scale=initializer_range)

    @classmethod
    def from_config(self, config, phase='train'):
        
        assert 'hidden_size' in config, "{} is required to initialize ERNIE".format('')
        assert 'num_hidden_layers' in config, "{} is required to initialize ERNIE".format('num_hidden_layers')
        assert 'num_attention_heads' in config, "{} is required to initialize ERNIE".format('num_attention_heads')
        assert 'vocab_size' in config, "{} is required to initialize ERNIE".format('vocab_size')
        assert 'max_position_embeddings' in config, "{} is required to initialize ERNIE".format('max_position_embeddings')
        assert 'sent_type_vocab_size' in config or 'type_vocab_size' in config, \
            "{} is required to initialize ERNIE".format('type_vocab_size')
        assert 'hidden_act' in config, "{} is required to initialize ERNIE".format('hidden_act')
        assert 'hidden_dropout_prob' in config, "{} is required to initialize ERNIE".format('hidden_dropout_prob')
        assert 'attention_probs_dropout_prob' in config, \
            "{} is required to initialize ERNIE".format('attention_probs_dropout_prob')
        assert 'initializer_range' in config, "{} is required to initialize ERNIE".format('initializer_range')

        hidden_size = config['hidden_size']
        num_hidden_layers = config['num_hidden_layers']
        num_attention_heads = config['num_attention_heads']
        vocab_size = config['vocab_size']
        max_position_embeddings = config['max_position_embeddings']
        if 'sent_type_vocab_size' in config:
            sent_type_vocab_size = config['sent_type_vocab_size']
        else:
            sent_type_vocab_size = config['type_vocab_size']

        hidden_act = config['hidden_act']
        hidden_dropout_prob = config['hidden_dropout_prob']
        attention_probs_dropout_prob = config['attention_probs_dropout_prob']
        initializer_range = config['initializer_range']
        if 'is_pairwise' in config:
            is_pairwise = config['is_pairwise']
        else:
            is_pairwise = False

        return self(hidden_size, num_hidden_layers, num_attention_heads, vocab_size, \
          max_position_embeddings, sent_type_vocab_size, \
          hidden_act, hidden_dropout_prob, attention_probs_dropout_prob, initializer_range, is_pairwise, phase)

    @property
    def inputs_attr(self):
        ret = {"token_ids": [[-1, -1], 'int64'],
               "position_ids": [[-1, -1], 'int64'],
               "segment_ids": [[-1, -1], 'int64'],
               "input_mask": [[-1, -1, 1], 'float32'],
               }
        if self._is_pairwise and self._phase=='train':
            ret.update({"token_ids_neg": [[-1, -1], 'int64'],
                        "position_ids_neg": [[-1, -1], 'int64'],
                        "segment_ids_neg": [[-1, -1], 'int64'],
                        "input_mask_neg": [[-1, -1, 1], 'float32'],
                        })
        return ret

    @property
    def outputs_attr(self):
        ret = {"word_embedding": [[-1, -1, self._emb_size], 'float32'],
               "embedding_table": [[-1, self._voc_size, self._emb_size], 'float32'],
               "encoder_outputs": [[-1, -1, self._emb_size], 'float32'],
               "sentence_embedding": [[-1, self._emb_size], 'float32'],
               "sentence_pair_embedding": [[-1, self._emb_size], 'float32']}
        if self._is_pairwise and self._phase == 'train':
            ret.update({"word_embedding_neg": [[-1, -1, self._emb_size], 'float32'],
                        "encoder_outputs_neg": [[-1, -1, self._emb_size], 'float32'],
                        "sentence_embedding_neg": [[-1, self._emb_size], 'float32'],
                        "sentence_pair_embedding_neg": [[-1, self._emb_size], 'float32']})
        return ret 

    def build(self, inputs, scope_name=""):
        src_ids = inputs['token_ids']
        pos_ids = inputs['position_ids']
        sent_ids = inputs['segment_ids']
        input_mask = inputs['input_mask']

        self._emb_dtype = 'float32'

        input_buffer = {}
        output_buffer = {}
        input_buffer['base'] = [src_ids, pos_ids, sent_ids, input_mask]
        output_buffer['base'] = {}

        if self._is_pairwise and self._phase =='train':
            src_ids = inputs['token_ids_neg']
            pos_ids = inputs['position_ids_neg']
            sent_ids = inputs['segment_ids_neg']
            input_mask = inputs['input_mask_neg']
            input_buffer['neg'] = [src_ids, pos_ids, sent_ids, input_mask]
            output_buffer['neg'] = {}
        
        for key, (src_ids, pos_ids, sent_ids, input_mask) in input_buffer.items():
            # padding id in vocabulary must be set to 0
            emb_out = fluid.embedding(
                input=src_ids,
                size=[self._voc_size, self._emb_size],
                dtype=self._emb_dtype,
                param_attr=fluid.ParamAttr(
                    name=scope_name+self._word_emb_name, initializer=self._param_initializer),
                is_sparse=False)

            # fluid.global_scope().find_var('backbone-word_embedding').get_tensor()
            embedding_table = fluid.default_main_program().global_block().var(scope_name+self._word_emb_name)
            
            position_emb_out = fluid.embedding(
                input=pos_ids,
                size=[self._max_position_seq_len, self._emb_size],
                dtype=self._emb_dtype,
                param_attr=fluid.ParamAttr(
                    name=scope_name+self._pos_emb_name, initializer=self._param_initializer))

            sent_emb_out = fluid.embedding(
                sent_ids,
                size=[self._sent_types, self._emb_size],
                dtype=self._emb_dtype,
                param_attr=fluid.ParamAttr(
                    name=scope_name+self._sent_emb_name, initializer=self._param_initializer))

            emb_out = emb_out + position_emb_out
            emb_out = emb_out + sent_emb_out

            emb_out = pre_process_layer(
                emb_out, 'nd', self._prepostprocess_dropout, name=scope_name+'pre_encoder')

            self_attn_mask = fluid.layers.matmul(
                x=input_mask, y=input_mask, transpose_y=True)

            self_attn_mask = fluid.layers.scale(
                x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
            n_head_self_attn_mask = fluid.layers.stack(
                x=[self_attn_mask] * self._n_head, axis=1)
            n_head_self_attn_mask.stop_gradient = True

            enc_out = encoder(
                enc_input=emb_out,
                attn_bias=n_head_self_attn_mask,
                n_layer=self._n_layer,
                n_head=self._n_head,
                d_key=self._emb_size // self._n_head,
                d_value=self._emb_size // self._n_head,
                d_model=self._emb_size,
                d_inner_hid=self._emb_size * 4,
                prepostprocess_dropout=self._prepostprocess_dropout,
                attention_dropout=self._attention_dropout,
                relu_dropout=0,
                hidden_act=self._hidden_act,
                preprocess_cmd="",
                postprocess_cmd="dan",
                param_initializer=self._param_initializer,
                name=scope_name+'encoder')

            
            next_sent_feat = fluid.layers.slice(
                input=enc_out, axes=[1], starts=[0], ends=[1])
            next_sent_feat = fluid.layers.reshape(next_sent_feat, [-1, next_sent_feat.shape[-1]])
            next_sent_feat = fluid.layers.fc(
                input=next_sent_feat,
                size=self._emb_size,
                act="tanh",
                param_attr=fluid.ParamAttr(
                    name=scope_name+"pooled_fc.w_0", initializer=self._param_initializer),
                bias_attr=scope_name+"pooled_fc.b_0")
            output_buffer[key]['word_embedding'] = emb_out
            output_buffer[key]['encoder_outputs'] = enc_out
            output_buffer[key]['sentence_embedding'] = next_sent_feat
            output_buffer[key]['sentence_pair_embedding'] = next_sent_feat
        
        ret = {}
        ret['embedding_table'] = embedding_table
        ret['word_embedding'] = output_buffer['base']['word_embedding']
        ret['encoder_outputs'] = output_buffer['base']['encoder_outputs']
        ret['sentence_embedding'] = output_buffer['base']['sentence_embedding']
        ret['sentence_pair_embedding'] = output_buffer['base']['sentence_pair_embedding']

        if self._is_pairwise and self._phase == 'train':
            ret['word_embedding_neg'] = output_buffer['neg']['word_embedding']
            ret['encoder_outputs_neg'] = output_buffer['neg']['encoder_outputs']
            ret['sentence_embedding_neg'] = output_buffer['neg']['sentence_embedding']
            ret['sentence_pair_embedding_neg'] = output_buffer['neg']['sentence_pair_embedding']
        
        return ret
                    
    def postprocess(self, rt_outputs):
        pass


class Model(BERT):
    """BERT wrapper for ConfigController"""
    def __init__(self, config, phase):
        BERT.from_config(config, phase=phase)


================================================
FILE: paddlepalm/backbone/ernie.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Ernie model."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import

from paddle import fluid
from paddle.fluid import layers

from paddlepalm.backbone.utils.transformer import pre_process_layer, encoder
from paddlepalm.backbone.base_backbone import Backbone


class ERNIE(Backbone):
    
    def __init__(self, hidden_size, num_hidden_layers, num_attention_heads, vocab_size, \
          max_position_embeddings, sent_type_vocab_size, task_type_vocab_size, \
          hidden_act, hidden_dropout_prob, attention_probs_dropout_prob, initializer_range, is_pairwise=False, use_task_emb=True, phase='train'):

        # self._is_training = phase == 'train' # backbone一般不用关心运行阶段，因为outputs在任何阶段基本不会变
 
        self._emb_size = hidden_size
        self._n_layer = num_hidden_layers
        self._n_head = num_attention_heads
        self._voc_size = vocab_size
        self._max_position_seq_len = max_position_embeddings
        self._sent_types = sent_type_vocab_size

        self._task_types = task_type_vocab_size

        self._hidden_act = hidden_act
        self._prepostprocess_dropout = 0. if phase == 'predict' else hidden_dropout_prob
        self._attention_dropout = 0. if phase == 'predict' else attention_probs_dropout_prob

        self._word_emb_name = "word_embedding"
        self._pos_emb_name = "pos_embedding"
        self._sent_emb_name = "sent_embedding"
        self._task_emb_name = "task_embedding"
        self._emb_dtype = "float32"
        self._is_pairwise = is_pairwise
        self._use_task_emb = use_task_emb
        self._phase=phase
        self._param_initializer = fluid.initializer.TruncatedNormal(
            scale=initializer_range)

    @classmethod
    def from_config(cls, config, phase='train'):
        assert 'hidden_size' in config, "{} is required to initialize ERNIE".format('hidden_size')
        assert 'num_hidden_layers' in config, "{} is required to initialize ERNIE".format('num_hidden_layers')
        assert 'num_attention_heads' in config, "{} is required to initialize ERNIE".format('num_attention_heads')
        assert 'vocab_size' in config, "{} is required to initialize ERNIE".format('vocab_size')
        assert 'max_position_embeddings' in config, "{} is required to initialize ERNIE".format('max_position_embeddings')
        assert 'sent_type_vocab_size' in config or 'type_vocab_size' in config, "{} is required to initialize ERNIE".format('sent_type_vocab_size')
        # assert 'task_type_vocab_size' in config, "{} is required to initialize ERNIE".format('task_type_vocab_size')
        assert 'hidden_act' in config, "{} is required to initialize ERNIE".format('hidden_act')
        assert 'hidden_dropout_prob' in config, "{} is required to initialize ERNIE".format('hidden_dropout_prob')
        assert 'attention_probs_dropout_prob' in config, "{} is required to initialize ERNIE".format('attention_probs_dropout_prob')
        assert 'initializer_range' in config, "{} is required to initialize ERNIE".format('initializer_range')

        hidden_size = config['hidden_size']
        num_hidden_layers = config['num_hidden_layers']
        num_attention_heads = config['num_attention_heads']
        vocab_size = config['vocab_size']
        max_position_embeddings = config['max_position_embeddings']
        if 'sent_type_vocab_size' in config:
            sent_type_vocab_size = config['sent_type_vocab_size']
        else:
            sent_type_vocab_size = config['type_vocab_size']
        if 'task_type_vocab_size' in config:
            task_type_vocab_size = config['task_type_vocab_size']
        else:
            task_type_vocab_size = config['type_vocab_size']
        if 'use_task_emb' in config:
            use_task_emb = config['use_task_emb']
        else:
            use_task_emb = True
        hidden_act = config['hidden_act']
        hidden_dropout_prob = config['hidden_dropout_prob']
        attention_probs_dropout_prob = config['attention_probs_dropout_prob']
        initializer_range = config['initializer_range']
        if 'is_pairwise' in config:
            is_pairwise = config['is_pairwise']
        else:
            is_pairwise = False
        
        return cls(hidden_size, num_hidden_layers, num_attention_heads, vocab_size, \
          max_position_embeddings, sent_type_vocab_size, task_type_vocab_size, \
          hidden_act, hidden_dropout_prob, attention_probs_dropout_prob, initializer_range, is_pairwise, use_task_emb=use_task_emb, phase=phase)

    @property
    def inputs_attr(self):
        ret = {"token_ids": [[-1, -1], 'int64'],
               "position_ids": [[-1, -1], 'int64'],
               "segment_ids": [[-1, -1], 'int64'],
               "input_mask": [[-1, -1, 1], 'float32'],
               "task_ids": [[-1,-1], 'int64']}
        if self._is_pairwise and self._phase=='train':
            ret.update({"token_ids_neg": [[-1, -1], 'int64'],
                        "position_ids_neg": [[-1, -1], 'int64'],
                        "segment_ids_neg": [[-1, -1], 'int64'],
                        "input_mask_neg": [[-1, -1, 1], 'float32'],
                        "task_ids_neg": [[-1,-1], 'int64']
                        })
        return ret
                

    @property
    def outputs_attr(self):
        ret = {"word_embedding": [[-1, -1, self._emb_size], 'float32'],
               "embedding_table": [[-1, self._voc_size, self._emb_size], 'float32'],
               "encoder_outputs": [[-1, -1, self._emb_size], 'float32'],
               "sentence_embedding": [[-1, self._emb_size], 'float32'],
               "sentence_pair_embedding": [[-1, self._emb_size], 'float32']}
        if self._is_pairwise and self._phase == 'train':
            ret.update({"word_embedding_neg": [[-1, -1, self._emb_size], 'float32'],
                        "encoder_outputs_neg": [[-1, -1, self._emb_size], 'float32'],
                        "sentence_embedding_neg": [[-1, self._emb_size], 'float32'],
                        "sentence_pair_embedding_neg": [[-1, self._emb_size], 'float32']})
        return ret 

    def build(self, inputs, scope_name=""):
        src_ids = inputs['token_ids']
        pos_ids = inputs['position_ids']
        sent_ids = inputs['segment_ids']
        input_mask = inputs['input_mask']
        task_ids = inputs['task_ids']

        input_buffer = {}
        output_buffer = {}
        input_buffer['base'] = [src_ids, pos_ids, sent_ids, input_mask, task_ids]
        output_buffer['base'] = {}

        if self._is_pairwise and self._phase =='train':
            src_ids = inputs['token_ids_neg']
            pos_ids = inputs['position_ids_neg']
            sent_ids = inputs['segment_ids_neg']
            input_mask = inputs['input_mask_neg']
            task_ids = inputs['task_ids_neg']
            input_buffer['neg'] = [src_ids, pos_ids, sent_ids, input_mask, task_ids]
            output_buffer['neg'] = {}

        for key, (src_ids, pos_ids, sent_ids, input_mask, task_ids) in input_buffer.items():
            # padding id in vocabulary must be set to 0
            emb_out = fluid.embedding(
                input=src_ids,
                size=[self._voc_size, self._emb_size],
                dtype=self._emb_dtype,
                param_attr=fluid.ParamAttr(
                    name=scope_name+self._word_emb_name, initializer=self._param_initializer),
                is_sparse=False)
        
            # fluid.global_scope().find_var('backbone-word_embedding').get_tensor()
            embedding_table = fluid.default_main_program().global_block().var(scope_name+self._word_emb_name)
            
            position_emb_out = fluid.embedding(
                input=pos_ids,
                size=[self._max_position_seq_len, self._emb_size],
                dtype=self._emb_dtype,
                param_attr=fluid.ParamAttr(
                    name=scope_name+self._pos_emb_name, initializer=self._param_initializer))

            sent_emb_out = fluid.embedding(
                sent_ids,
                size=[self._sent_types, self._emb_size],
                dtype=self._emb_dtype,
                param_attr=fluid.ParamAttr(
                    name=scope_name+self._sent_emb_name, initializer=self._param_initializer))

            emb_out = emb_out + position_emb_out
            emb_out = emb_out + sent_emb_out

            if self._use_task_emb:
                task_emb_out = fluid.embedding(
                    task_ids,
                    size=[self._task_types, self._emb_size],
                    dtype=self._emb_dtype,
                    param_attr=fluid.ParamAttr(
                        name=scope_name+self._task_emb_name,
                        initializer=self._param_initializer))

                emb_out = emb_out + task_emb_out

            emb_out = pre_process_layer(
                emb_out, 'nd', self._prepostprocess_dropout, name=scope_name+'pre_encoder')

            self_attn_mask = fluid.layers.matmul(
                x=input_mask, y=input_mask, transpose_y=True)

            self_attn_mask = fluid.layers.scale(
                x=self_attn_mask, scale=10000.0, bias=-1.0, bias_after_scale=False)
            n_head_self_attn_mask = fluid.layers.stack(
                x=[self_attn_mask] * self._n_head, axis=1)
            n_head_self_attn_mask.stop_gradient = True

            enc_out = encoder(
                enc_input=emb_out,
                attn_bias=n_head_self_attn_mask,
                n_layer=self._n_layer,
                n_head=self._n_head,
                d_key=self._emb_size // self._n_head,
                d_value=self._emb_size // self._n_head,
                d_model=self._emb_size,
                d_inner_hid=self._emb_size * 4,
                prepostprocess_dropout=self._prepostprocess_dropout,
                attention_dropout=self._attention_dropout,
                relu_dropout=0,
                hidden_act=self._hidden_act,
                preprocess_cmd="",
                postprocess_cmd="dan",
                param_initializer=self._param_initializer,
                name=scope_name+'encoder')

            next_sent_feat = fluid.layers.slice(
                input=enc_out, axes=[1], starts=[0], ends=[1])
            next_sent_feat = fluid.layers.reshape(next_sent_feat, [-1, next_sent_feat.shape[-1]])
            next_sent_feat = fluid.layers.fc(
                input=next_sent_feat,
                size=self._emb_size,
                act="tanh",
                param_attr=fluid.ParamAttr(
                    name=scope_name+"pooled_fc.w_0", initializer=self._param_initializer),
                bias_attr=scope_name+"pooled_fc.b_0")
            
            output_buffer[key]['word_embedding'] = emb_out
            output_buffer[key]['encoder_outputs'] = enc_out
            output_buffer[key]['sentence_embedding'] = next_sent_feat
            output_buffer[key]['sentence_pair_embedding'] = next_sent_feat
        
        ret = {}
        ret['embedding_table'] = embedding_table
        ret['word_embedding'] = output_buffer['base']['word_embedding']
        ret['encoder_outputs'] = output_buffer['base']['encoder_outputs']
        ret['sentence_embedding'] = output_buffer['base']['sentence_embedding']
        ret['sentence_pair_embedding'] = output_buffer['base']['sentence_pair_embedding']

        if self._is_pairwise and self._phase == 'train':
            ret['word_embedding_neg'] = output_buffer['neg']['word_embedding']
            ret['encoder_outputs_neg'] = output_buffer['neg']['encoder_outputs']
            ret['sentence_embedding_neg'] = output_buffer['neg']['sentence_embedding']
            ret['sentence_pair_embedding_neg'] = output_buffer['neg']['sentence_pair_embedding']
        
        return ret

    def postprocess(self, rt_outputs):
        pass


class Model(ERNIE):

    def __init__(self, config, phase):
        ERNIE.from_config(config, phase=phase)


================================================
FILE: paddlepalm/backbone/utils/__init__.py
================================================


================================================
FILE: paddlepalm/backbone/utils/transformer.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Transformer encoder."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from functools import partial

import paddle.fluid as fluid
import paddle.fluid.layers as layers

from paddle.fluid.layer_helper import LayerHelper as LayerHelper
from functools import reduce # py3
def layer_norm(x, begin_norm_axis=1, epsilon=1e-6, param_attr=None, bias_attr=None):
    helper = LayerHelper('layer_norm', **locals())
    mean = layers.reduce_mean(x, dim=begin_norm_axis, keep_dim=True)
    shift_x = layers.elementwise_sub(x=x, y=mean, axis=0)
    variance = layers.reduce_mean(layers.square(shift_x), dim=begin_norm_axis, keep_dim=True)
    r_stdev = layers.rsqrt(variance + epsilon)
    norm_x = layers.elementwise_mul(x=shift_x, y=r_stdev, axis=0)

    param_shape = [reduce(lambda x, y: x * y, norm_x.shape[begin_norm_axis:])]
    param_dtype = norm_x.dtype
    scale = helper.create_parameter(
        attr=param_attr,
        shape=param_shape,
        dtype=param_dtype,
        default_initializer=fluid.initializer.Constant(1.))
    bias = helper.create_parameter(
        attr=bias_attr,
        shape=param_shape,
        dtype=param_dtype,
        is_bias=True,
        default_initializer=fluid.initializer.Constant(0.))

    out = layers.elementwise_mul(x=norm_x, y=scale, axis=-1)
    out = layers.elementwise_add(x=out, y=bias, axis=-1)

    return out


def multi_head_attention(queries,
                         keys,
                         values,
                         attn_bias,
                         d_key,
                         d_value,
                         d_model,
                         n_head=1,
                         dropout_rate=0.,
                         cache=None,
                         param_initializer=None,
                         name='multi_head_att'):
    """
    Multi-Head Attention. Note that attn_bias is added to the logit before
    computing softmax activiation to mask certain selected positions so that
    they will not considered in attention weights.
    """
    keys = queries if keys is None else keys
    values = keys if values is None else values

    if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
        raise ValueError(
            "Inputs: quries, keys and values should all be 3-D tensors.")

    def __compute_qkv(queries, keys, values, n_head, d_key, d_value):
        """
        Add linear projection to queries, keys, and values.
        """
        q = layers.fc(input=queries,
                      size=d_key * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(
                          name=name + '_query_fc.w_0',
                          initializer=param_initializer),
                      bias_attr=name + '_query_fc.b_0')
        k = layers.fc(input=keys,
                      size=d_key * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(
                          name=name + '_key_fc.w_0',
                          initializer=param_initializer),
                      bias_attr=name + '_key_fc.b_0')
        v = layers.fc(input=values,
                      size=d_value * n_head,
                      num_flatten_dims=2,
                      param_attr=fluid.ParamAttr(
                          name=name + '_value_fc.w_0',
                          initializer=param_initializer),
                      bias_attr=name + '_value_fc.b_0')
        return q, k, v

    def __split_heads(x, n_head):
        """
        Reshape the last dimension of inpunt tensor x so that it becomes two
        dimensions and then transpose. Specifically, input a tensor with shape
        [bs, max_sequence_length, n_head * hidden_dim] then output a tensor
        with shape [bs, n_head, max_sequence_length, hidden_dim].
        """
        hidden_size = x.shape[-1]
        # The value 0 in shape attr means copying the corresponding dimension
        # size of the input as the output dimension size.
        reshaped = layers.reshape(
            x=x, shape=[0, 0, n_head, hidden_size // n_head], inplace=True)

        # permuate the dimensions into:
        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
        return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])

    def __combine_heads(x):
        """
        Transpose and then reshape the last two dimensions of inpunt tensor x
        so that it becomes one dimension, which is reverse to __split_heads.
        """
        if len(x.shape) == 3: return x
        if len(x.shape) != 4:
            raise ValueError("Input(x) should be a 4-D Tensor.")

        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
        # The value 0 in shape attr means copying the corresponding dimension
        # size of the input as the output dimension size.
        return layers.reshape(
            x=trans_x,
            shape=[0, 0, trans_x.shape[2] * trans_x.shape[3]],
            inplace=True)

    def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
        """
        Scaled Dot-Product Attention
        """
        scaled_q = layers.scale(x=q, scale=d_key**-0.5)
        product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
        if attn_bias:
            product += attn_bias
        weights = layers.softmax(product)
        if dropout_rate:
            weights = layers.dropout(
                weights,
                dropout_prob=dropout_rate,
                dropout_implementation="upscale_in_train",
                is_test=False)
        out = layers.matmul(weights, v)
        return out

    q, k, v = __compute_qkv(queries, keys, values, n_head, d_key, d_value)

    if cache is not None:  # use cache and concat time steps
        # Since the inplace reshape in __split_heads changes the shape of k and
        # v, which is the cache input for next time step, reshape the cache
        # input from the previous time step first.
        k = cache["k"] = layers.concat(
            [layers.reshape(
                cache["k"], shape=[0, 0, d_model]), k], axis=1)
        v = cache["v"] = layers.concat(
            [layers.reshape(
                cache["v"], shape=[0, 0, d_model]), v], axis=1)

    q = __split_heads(q, n_head)
    k = __split_heads(k, n_head)
    v = __split_heads(v, n_head)

    ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key,
                                                  dropout_rate)

    out = __combine_heads(ctx_multiheads)

    # Project back to the model size.
    proj_out = layers.fc(input=out,
                         size=d_model,
                         num_flatten_dims=2,
                         param_attr=fluid.ParamAttr(
                             name=name + '_output_fc.w_0',
                             initializer=param_initializer),
                         bias_attr=name + '_output_fc.b_0')
    return proj_out


def positionwise_feed_forward(x,
                              d_inner_hid,
                              d_hid,
                              dropout_rate,
                              hidden_act,
                              param_initializer=None,
                              name='ffn'):
    """
    Position-wise Feed-Forward Networks.
    This module consists of two linear transformations with a ReLU activation
    in between, which is applied to each position separately and identically.
    """
    hidden = layers.fc(input=x,
                       size=d_inner_hid,
                       num_flatten_dims=2,
                       act=hidden_act,
                       param_attr=fluid.ParamAttr(
                           name=name + '_fc_0.w_0',
                           initializer=param_initializer),
                       bias_attr=name + '_fc_0.b_0')
    if dropout_rate:
        hidden = layers.dropout(
            hidden,
            dropout_prob=dropout_rate,
            dropout_implementation="upscale_in_train",
            is_test=False)
    out = layers.fc(input=hidden,
                    size=d_hid,
                    num_flatten_dims=2,
                    param_attr=fluid.ParamAttr(
                        name=name + '_fc_1.w_0', initializer=param_initializer),
                    bias_attr=name + '_fc_1.b_0')
    return out


def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0.,
                           name=''):
    """
    Add residual connection, layer normalization and droput to the out tensor
    optionally according to the value of process_cmd.
    This will be used before or after multi-head attention and position-wise
    feed-forward networks.
    """
    for cmd in process_cmd:
        if cmd == "a":  # add residual connection
            out = out + prev_out if prev_out else out
        elif cmd == "n":  # add layer normalization
            out_dtype = out.dtype
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float32")
            out = layer_norm(
                out,
                begin_norm_axis=len(out.shape) - 1,
                param_attr=fluid.ParamAttr(
                    name=name + '_layer_norm_scale',
                    initializer=fluid.initializer.Constant(1.)),
                bias_attr=fluid.ParamAttr(
                    name=name + '_layer_norm_bias',
                    initializer=fluid.initializer.Constant(0.)))
            if out_dtype == fluid.core.VarDesc.VarType.FP16:
                out = layers.cast(x=out, dtype="float16")
        elif cmd == "d":  # add dropout
            if dropout_rate:
                out = layers.dropout(
                    out,
                    dropout_prob=dropout_rate,
                    dropout_implementation="upscale_in_train",
                    is_test=False)
    return out


pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer


def encoder_layer(enc_input,
                  attn_bias,
                  n_head,
                  d_key,
                  d_value,
                  d_model,
                  d_inner_hid,
                  prepostprocess_dropout,
                  attention_dropout,
                  relu_dropout,
                  hidden_act,
                  preprocess_cmd="n",
                  postprocess_cmd="da",
                  param_initializer=None,
                  name=''):
    """The encoder layers that can be stacked to form a deep encoder.
    This module consits of a multi-head (self) attention followed by
    position-wise feed-forward networks and both the two components companied
    with the post_process_layer to add residual connection, layer normalization
    and droput.
    """
    attn_output = multi_head_attention(
        pre_process_layer(
            enc_input,
            preprocess_cmd,
            prepostprocess_dropout,
            name=name + '_pre_att'),
        None,
        None,
        attn_bias,
        d_key,
        d_value,
        d_model,
        n_head,
        attention_dropout,
        param_initializer=param_initializer,
        name=name + '_multi_head_att')
    attn_output = post_process_layer(
        enc_input,
        attn_output,
        postprocess_cmd,
        prepostprocess_dropout,
        name=name + '_post_att')
    ffd_output = positionwise_feed_forward(
        pre_process_layer(
            attn_output,
            preprocess_cmd,
            prepostprocess_dropout,
            name=name + '_pre_ffn'),
        d_inner_hid,
        d_model,
        relu_dropout,
        hidden_act,
        param_initializer=param_initializer,
        name=name + '_ffn')
    return post_process_layer(
        attn_output,
        ffd_output,
        postprocess_cmd,
        prepostprocess_dropout,
        name=name + '_post_ffn')


def encoder(enc_input,
            attn_bias,
            n_layer,
            n_head,
            d_key,
            d_value,
            d_model,
            d_inner_hid,
            prepostprocess_dropout,
            attention_dropout,
            relu_dropout,
            hidden_act,
            preprocess_cmd="n",
            postprocess_cmd="da",
            param_initializer=None,
            name=''):
    """
    The encoder is composed of a stack of identical layers returned by calling
    encoder_layer.
    """
    for i in range(n_layer):
        enc_output = encoder_layer(
            enc_input,
            attn_bias,
            n_head,
            d_key,
            d_value,
            d_model,
            d_inner_hid,
            prepostprocess_dropout,
            attention_dropout,
            relu_dropout,
            hidden_act,
            preprocess_cmd,
            postprocess_cmd,
            param_initializer=param_initializer,
            name=name + '_layer_' + str(i))
        enc_input = enc_output
    enc_output = pre_process_layer(
        enc_output, preprocess_cmd, prepostprocess_dropout, name="post_encoder")

    return enc_output


================================================
FILE: paddlepalm/distribute/__init__.py
================================================
from paddle import fluid
import os
import multiprocessing

gpu_dev_count = int(fluid.core.get_cuda_device_count())
cpu_dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))

from .reader import yield_pieces, data_feeder, decode_fake


================================================
FILE: paddlepalm/distribute/reader.py
================================================

from . import gpu_dev_count, cpu_dev_count
try:
    import queue as Queue
except ImportError:
    import Queue
from threading import Thread

dev_count = gpu_dev_count if gpu_dev_count > 0 else cpu_dev_count

def yield_pieces(data, distribute_strategy, batch_size):
    """
    Args:
        distribute_strategy: support s=split, c=copy, u=unstack,
        """
    assert batch_size % dev_count == 0, "batch_size need to be integer times larger than dev_count."
    # print('data in yield pieces')
    # print(len(data))

    assert type(data) == type(distribute_strategy), [type(data), type(distribute_strategy)]
    assert len(data) == len(distribute_strategy), [len(data), len(distribute_strategy)]
    if isinstance(data, dict):
        keys = list(data.keys())
        data_list = [data[i] for i in keys]
        ds_list = [distribute_strategy[i] for i in keys]
    else:
        assert isinstance(data, list), "the input data must be a list or dict, and contained with multiple tensors."
        data_list = data
        ds_list = distribute_strategy
    stride = batch_size // dev_count
    p = stride
    # while p < len(data_list) + stride:
    while p <= batch_size:
        temp = []
        for d, s in zip(data_list, ds_list):
            s = s.strip().lower()
            if s == 's' or s == 'split':
                if p - stride >= len(d):
                    # print('WARNING: no more examples to feed empty devices')
                    temp = []
                    return
                temp.append(d[p-stride:p])
            elif s == 'u' or s == 'unstack':
                assert len(d) <= dev_count, 'Tensor size on dim 0 must be less equal to dev_count when unstack is applied.'
                if p//stride > len(d):
                    # print('WARNING: no more examples to feed empty devices')
                    return
                temp.append(d[p//stride-1])
            elif s == 'c' or s == 'copy':
                temp.append(d)
            else:
                raise NotImplementedError()
            
        p += stride
        if type(data) == dict:
            yield dict(zip(*[keys, temp]))
        else:
            # print('yielded pieces')
            # print(len(temp))
            yield temp


def data_feeder(reader, postprocess_fn=None, prefetch_steps=2, phase='train', is_multi=False):
    if postprocess_fn is None:
        def postprocess_fn(batch, id=-1, phase='train', is_multi=False):
            return batch

    def worker(reader, dev_count, queue):
        dev_batches = []
        for index, data in enumerate(reader()):
            if len(dev_batches) < dev_count:
                dev_batches.append(data)
            if len(dev_batches) == dev_count:
                queue.put((dev_batches, 0))
                dev_batches = []
        # For the prediction of the remained batches, pad more batches to 
        # the number of devices and the padded samples would be removed in
        # prediction outputs. 
        if len(dev_batches) > 0:
            num_pad = dev_count - len(dev_batches)
            for i in range(len(dev_batches), dev_count):
                dev_batches.append(dev_batches[-1])
            queue.put((dev_batches, num_pad))
        queue.put(None)

    queue = Queue.Queue(dev_count*prefetch_steps)
    p = Thread(
        target=worker, args=(reader, dev_count, queue))
    p.daemon = True
    p.start()
    while True:
        ret = queue.get()
        queue.task_done()
        if ret is not None:
            batches, num_pad = ret
            if dev_count > 1 and phase == 'train' and is_multi: 
                id = batches[0]['__task_id'][0]
            else:
                id = -1
            batch_buf = []
            flag_buf = []
            for idx, batch in enumerate(batches):
                # flag = num_pad == 0
                flag = idx-len(batches) < -num_pad
                # if num_pad > 0:
                #     num_pad -= 1
                batch = postprocess_fn(batch, id, phase, is_multi=is_multi)
                # batch = postprocess_fn(batch)
                batch_buf.append(batch)
                flag_buf.append(flag)
            yield batch_buf, flag_buf
        else:
            break
    queue.join()


def decode_fake(nums, mask, bs):
    bs //= dev_count
    n_t = 0
    for flag in mask:
        if not flag:
            break
        n_t = n_t + 1

    n_f = len(mask) - n_t
    p1 = nums - (n_t-1) * bs
    assert p1 % (n_f+1) == 0
    each_f = p1 // (n_f+1)
    return each_f * n_f


================================================
FILE: paddlepalm/downloader.py
================================================
from ._downloader import *


================================================
FILE: paddlepalm/head/__init__.py
================================================

from .cls import Classify
from .match import Match
from .ner import SequenceLabel
from .mrc import MRC
from .mlm import MaskLM


================================================
FILE: paddlepalm/head/base_head.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import json
import copy

class Head(object):

    def __init__(self, phase='train'):
        """该函数完成一个任务头的构造，至少需要包含一个phase参数。
        注意：实现该构造函数时，必须保证对基类构造函数的调用，以创建必要的框架内建的成员变量。
        Args:
            phase: str类型。用于区分任务头被调用时所处的任务运行阶段，目前支持训练阶段train和预测阶段predict
            """
        self._stop_gradient = {}
        self._phase = phase
        self._prog = None
        self._results_buffer = []

    @property
    def inputs_attrs(self):
        """step级别的任务输入对象声明。

        描述该任务头所依赖的reader、backbone和来自其他任务头的输出对象（每个step获取一次）。使用字典进行描述，
        字典的key为输出对象所在的组件（如’reader‘，’backbone‘等），value为该组件下任务头所需要的输出对象集。
        输出对象集使用字典描述，key为输出对象的名字（该名字需保证在相关组件的输出对象集中），value为该输出对象
        的shape和dtype。当某个输出对象的某个维度长度可变时，shape中的相应维度设置为-1。

        Return:
            dict类型。描述该任务头所依赖的step级输入，即来自各个组件的输出对象。"""
        raise NotImplementedError()

    @property
    def outputs_attr(self):
        """step级别的任务输出对象声明。

        描述该任务头的输出对象（每个step输出一次），包括每个输出对象的名字，shape和dtype。输出对象会被加入到
        fetch_list中，从而在每个训练/推理step时得到实时的计算结果，该计算结果可以传入batch_postprocess方
        法中进行当前step的后处理。当某个对象为标量数据类型（如str, int, float等）时，shape设置为空列表[]，
        当某个对象的某个维度长度可变时，shape中的相应维度设置为-1。

        Return:
            dict类型。描述该任务头所产生的输出对象。注意，在训练阶段时必须包含名为loss的输出对象。
            """

        raise NotImplementedError()

    @property
    def epoch_inputs_attrs(self):
        """epoch级别的任务输入对象声明。

        描述该任务所依赖的来自reader、backbone和来自其他任务头的输出对象（每个epoch结束后产生一次），如完整的
        样本集，有效的样本数等。使用字典进行描述，字典的key为输出对象所在的组件（如’reader‘，’backbone‘等），
        value为该组件下任务头所需要的输出对象集。输出对象集使用字典描述，key为输出对象的名字（该名字需保证在相关
        组件的输出对象集中），value为该输出对象的shape和dtype。当某个输出对象的某个维度长度可变时，shape中的相
        应维度设置为-1。
        
        Return:
            dict类型。描述该任务头所产生的输出对象。注意，在训练阶段时必须包含名为loss的输出对象。
        """
        return {}

    def build(self, inputs, scope_name=""):
        """建立任务头的计算图。

        将符合inputs_attrs描述的来自各个对象集的静态图Variables映射成符合outputs_attr描述的静态图Variable输出。

        Args:
            inputs: dict类型。字典中包含inputs_attrs中的对象名到计算图Variable的映射，inputs中至少会包含inputs_attr中定义的对象
        Return:
           需要输出的计算图变量，输出对象会被加入到fetch_list中，从而在每个训练/推理step时得到runtime的计算结果，该计算结果会被传入postprocess方法中供用户处理。
        """
        raise NotImplementedError()

    def batch_postprocess(self, rt_outputs):
        """batch/step级别的后处理。

        每个训练或推理step后针对当前batch的任务头输出对象的实时计算结果来进行相关后处理。
        默认将输出结果存储到缓冲区self._results_buffer中。"""
        if isinstance(rt_outputs, dict):
            keys = rt_outputs.keys()
            vals = [rt_outputs[k] for k in keys]
            lens = [len(v) for v in vals]
            if len(set(lens)) == 1:
                results = [dict(zip(*[keys, i])) for i in zip(*vals)]
                self._results_buffer.extend(results)
                return results
            else:
                print('WARNING: irregular output results. visualize failed.')
                self._results_buffer.append(rt_outputs)
        return None

    def reset(self):
        """清空该任务头的缓冲区（在训练或推理过程中积累的处理结果）"""
        self._results_buffer = []

    def get_results(self):
        """返回当前任务头积累的处理结果。"""
        return copy.deepcopy(self._results_buffer)
        
    def epoch_postprocess(self, post_inputs=None, output_dir=None):
        """epoch级别的后处理。

        每个训练或推理epoch结束后，对积累的各样本的后处理结果results进行后处理。默认情况下，当output_dir为None时，直接将results打印到
        屏幕上。当指定output_dir时，将results存储在指定的文件夹内，并以任务头所处阶段来作为存储文件的文件名。

        Args:
            post_inputs: 当声明的epoch_inputs_attr不为空时，该参数会携带对应的输入变量的内容。
            output_dir: 积累结果的保存路径。
        """
        if output_dir is not None:
            if not os.path.exists(output_dir):
                os.makedirs(output_dir)
            with open(os.path.join(output_dir, self._phase), 'w') as writer:
                for i in self._results_buffer:
                    writer.write(json.dumps(i)+'\n')
        else:
            return self._results_buffer


================================================
FILE: paddlepalm/head/cls.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle.fluid as fluid
from paddle.fluid import layers
from paddlepalm.head.base_head import Head
import numpy as np
import os
import json


class Classify(Head):
    """
    classification
    """
    def __init__(self, num_classes, input_dim, dropout_prob=0.0, \
                 param_initializer_range=0.02, phase='train'):

        self._is_training = phase == 'train'
        self._hidden_size = input_dim

        self.num_classes = num_classes
    
        self._dropout_prob = dropout_prob if phase == 'train' else 0.0
        self._param_initializer = fluid.initializer.TruncatedNormal(
            scale=param_initializer_range)
        self._preds = []
        self._probs = []

    @property
    def inputs_attrs(self):
        reader = {}
        bb = {"sentence_embedding": [[-1, self._hidden_size], 'float32']}
        if self._is_training:
            reader["label_ids"] = [[-1], 'int64']
        return {'reader': reader, 'backbone': bb}

    @property
    def outputs_attrs(self):
        if self._is_training:
            return {'loss': [[1], 'float32']}
        else:
            return {'logits': [[-1, self.num_classes], 'float32'],
                    'probs': [[-1, self.num_classes], 'float32']}
            

    def build(self, inputs, scope_name=''):
        sent_emb = inputs['backbone']['sentence_embedding']
        if self._is_training:
            label_ids = inputs['reader']['label_ids']
            cls_feats = fluid.layers.dropout(
                x=sent_emb,
                dropout_prob=self._dropout_prob,
                dropout_implementation="upscale_in_train")

        logits = fluid.layers.fc(
            input=sent_emb,
            size=self.num_classes,
            param_attr=fluid.ParamAttr(
                name=scope_name+"cls_out_w",
                initializer=self._param_initializer),
            bias_attr=fluid.ParamAttr(
                name=scope_name+"cls_out_b", initializer=fluid.initializer.Constant(0.)))
        probs = fluid.layers.softmax(logits)
        if self._is_training:
            loss = fluid.layers.cross_entropy(
                input=probs, label=label_ids)
            loss = layers.mean(loss)
            return {"loss": loss}
        else:
            return {"logits":logits,
                    "probs":probs}

    def batch_postprocess(self, rt_outputs):
        if not self._is_training:
            logits = rt_outputs['logits']
            probs = rt_outputs['probs']
            self._preds.extend(logits.tolist())
            self._probs.extend(probs.tolist())


    def epoch_postprocess(self, post_inputs, output_dir=None):
        # there is no post_inputs needed and not declared in epoch_inputs_attrs, hence no elements exist in post_inputs
        if not self._is_training:
            results = []
            for i in range(len(self._preds)):
                label = int(np.argmax(np.array(self._preds[i])))
                result = {'index': i, 'label': label, 'logits': self._preds[i], 'probs': self._probs[i]}
                results.append(result)
            if output_dir is not None:
                with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer:
                    for result in results:
                        result = json.dumps(result)
                        writer.write(result+'\n')
                print('Predictions saved at '+os.path.join(output_dir, 'predictions.json'))
            return results

                
================================================
FILE: paddlepalm/head/match.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import paddle.fluid as fluid
from paddle.fluid import layers
from paddlepalm.head.base_head import Head
import numpy as np
import os
import json


def computeHingeLoss(pos, neg, margin):
    loss_part1 = fluid.layers.elementwise_sub(
        fluid.layers.fill_constant_batch_size_like(
            input=pos, shape=[-1, 1], value=margin, dtype='float32'), pos)
    loss_part2 = fluid.layers.elementwise_add(loss_part1, neg)
    loss_part3 = fluid.layers.elementwise_max(
        fluid.layers.fill_constant_batch_size_like(
            input=loss_part2, shape=[-1, 1], value=0.0, dtype='float32'), loss_part2)
    return loss_part3


class Match(Head):
    '''
    matching
    '''
   
    def __init__(self, num_classes, input_dim, dropout_prob=0.0, param_initializer_range=0.02, \
        learning_strategy='pointwise', margin=0.5, phase='train'):

        """  
        Args:
            phase: train, eval, pred
            lang: en, ch, ...
            learning_strategy: pointwise, pairwise
        """
        
        self._is_training = phase == 'train'
        self._hidden_size = input_dim
    
        self._num_classes = num_classes

        self._dropout_prob = dropout_prob if phase == 'train' else 0.0
        self._param_initializer = fluid.initializer.TruncatedNormal(
            scale=param_initializer_range)
        self._learning_strategy = learning_strategy 
        self._margin = margin

    
        self._preds = []
        self._preds_logits = []
    
    @property
    def inputs_attrs(self):
        reader = {}
        bb = {"sentence_pair_embedding": [[-1, self._hidden_size], 'float32']}
        if self._is_training:
            if self._learning_strategy == 'pointwise':
                reader["label_ids"] = [[-1], 'int64']
            elif self._learning_strategy == 'pairwise':
                bb["sentence_pair_embedding_neg"] = [[-1, self._hidden_size], 'float32']

        return {'reader': reader, 'backbone': bb}

    @property
    def outputs_attrs(self):
        if self._is_training:
            return {"loss": [[1], 'float32']}
        else:
            if self._learning_strategy=='paiwise':
                return {"probs": [[-1, 1], 'float32']}
            else:
                return {"logits": [[-1, self._num_classes], 'float32'],
                        "probs": [[-1, self._num_classes], 'float32']}

    def build(self, inputs, scope_name=""):

        # inputs          
        cls_feats = inputs["backbone"]["sentence_pair_embedding"] 
        if self._is_training:
            cls_feats = fluid.layers.dropout(
                x=cls_feats,
                dropout_prob=self._dropout_prob,
                dropout_implementation="upscale_in_train")
            if self._learning_strategy == 'pairwise':
                cls_feats_neg = inputs["backbone"]["sentence_pair_embedding_neg"]
                cls_feats_neg = fluid.layers.dropout(
                x=cls_feats_neg,
                dropout_prob=self._dropout_prob,
                dropout_implementation="upscale_in_train")
            elif self._learning_strategy == 'pointwise':
                labels = inputs["reader"]["label_ids"] 
        
        # loss
        # for pointwise
        if self._learning_strategy == 'pointwise':
            logits = fluid.layers.fc(
                input=cls_feats,
                size=self._num_classes,
                param_attr=fluid.ParamAttr(
                    name=scope_name+"cls_out_w",
                    initializer=self._param_initializer),
                bias_attr=fluid.ParamAttr(
                    name=scope_name+"cls_out_b",
                    initializer=fluid.initializer.Constant(0.)))
            probs = fluid.layers.softmax(logits)
            if self._is_training:
                ce_loss = fluid.layers.cross_entropy(
                    input=probs, label=labels)
                loss = fluid.layers.mean(x=ce_loss)
                return {'loss': loss}
            # for pred
            else:
                return {'logits': logits,
                        'probs': probs}
        # for pairwise
        elif self._learning_strategy == 'pairwise':
            pos_score = fluid.layers.fc(
                input=cls_feats,
                size=1,
                act = "sigmoid",
                param_attr=fluid.ParamAttr(
                    name=scope_name+"cls_out_w_pr",
                    initializer=self._param_initializer),
                bias_attr=fluid.ParamAttr(
                    name=scope_name+"cls_out_b_pr",
                    initializer=fluid.initializer.Constant(0.)))
            pos_score = fluid.layers.reshape(x=pos_score, shape=[-1, 1], inplace=True)

            if self._is_training:
                neg_score = fluid.layers.fc(
                    input=cls_feats_neg,
                    size=1,
                    act = "sigmoid",
                    param_attr=fluid.ParamAttr(
                        name=scope_name+"cls_out_w_pr",
                        initializer=self._param_initializer),
                    bias_attr=fluid.ParamAttr(
                        name=scope_name+"cls_out_b_pr",
                        initializer=fluid.initializer.Constant(0.)))        
                neg_score = fluid.layers.reshape(x=neg_score, shape=[-1, 1], inplace=True)
        
                loss = fluid.layers.mean(computeHingeLoss(pos_score, neg_score, self._margin))
                return {'loss': loss}
            # for pred
            else:
                return {'probs': pos_score}
        
    def batch_postprocess(self, rt_outputs):
        if not self._is_training:
            probs = []
            logits = []
            probs = rt_outputs['probs']
            self._preds.extend(probs.tolist())
            if self._learning_strategy == 'pointwise':
                logits = rt_outputs['logits']
                self._preds_logits.extend(logits.tolist())

    def reset(self):
        self._preds_logits = []
        self._preds = []
        
    def epoch_postprocess(self, post_inputs, output_dir=None):
        # there is no post_inputs needed and not declared in epoch_inputs_attrs, hence no elements exist in post_inputs
        if not self._is_training:
            results = []
            for i in range(len(self._preds)):
                if self._learning_strategy == 'pointwise':
                    label = int(np.argmax(np.array(self._preds[i])))
                    result = {'index': i, 'label': label, 'logits': self._preds_logits[i], 'probs': self._preds[i]}
                elif self._learning_strategy == 'pairwise':
                    result = {'index': i, 'probs': self._preds[i][0]}
                results.append(result)
            if output_dir is not None:
                with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer:
                    for result in results:
                        result = json.dumps(result, ensure_ascii=False)
                        writer.write(result+'\n')
                print('Predictions saved at '+os.path.join(output_dir, 'predictions.json'))
            return results


================================================
FILE: paddlepalm/head/mlm.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle.fluid as fluid
from paddlepalm.head.base_head import Head
from paddle.fluid import layers
import numpy as np
import os
from paddlepalm.backbone.utils.transformer import pre_process_layer

class MaskLM(Head):
    '''
    mlm
    '''
    def __init__(self, input_dim, vocab_size, hidden_act, dropout_prob=0.0, \
                 param_initializer_range=0.02, phase='train'):
        self._is_training = phase == 'train'
        self._emb_size = input_dim
        self._hidden_size = input_dim
        self._dropout_prob = dropout_prob if phase == 'train' else 0.0
        self._preds = []

        self._vocab_size = vocab_size
        self._hidden_act = hidden_act
        self._initializer_range = param_initializer_range
    
    @property
    def inputs_attrs(self):
        reader = {
            "mask_label": [[-1], 'int64'],
            "mask_pos": [[-1], 'int64'],
            }
        if not self._is_training:
            del reader['mask_label']
        bb = {
            "encoder_outputs": [[-1, -1, self._hidden_size], 'float32'],
            "embedding_table": [[-1, self._vocab_size, self._emb_size], 'float32']}
        return {'reader': reader, 'backbone': bb}

    @property
    def outputs_attrs(self):
        if self._is_training:
            return {"loss": [[1], 'float32']}
        else:
            return {"logits": [[-1], 'float32']}

    def build(self, inputs, scope_name=""):
        mask_pos = inputs["reader"]["mask_pos"]
        
        word_emb = inputs["backbone"]["embedding_table"]
        enc_out = inputs["backbone"]["encoder_outputs"]

        if self._is_training:
            mask_label = inputs["reader"]["mask_label"]
            l1 = enc_out.shape[0] 
            l2 = enc_out.shape[1]
            bxs = fluid.layers.fill_constant(shape=[1], value=l1*l2, dtype='int64')
            max_position = bxs - 1
            mask_pos = fluid.layers.elementwise_min(mask_pos, max_position)
            mask_pos.stop_gradient = True

        emb_size = word_emb.shape[-1]

        _param_initializer = fluid.initializer.TruncatedNormal(
            scale=self._initializer_range)

        reshaped_emb_out = fluid.layers.reshape(
            x=enc_out, shape=[-1, emb_size])

        # extract masked tokens' feature
        mask_feat = fluid.layers.gather(input=reshaped_emb_out, index=mask_pos)

        # transform: fc
        mask_trans_feat = fluid.layers.fc(
            input=mask_feat,
            size=emb_size,
            act=self._hidden_act,
            param_attr=fluid.ParamAttr(
                name=scope_name+'mask_lm_trans_fc.w_0',
                initializer=_param_initializer),
                bias_attr=fluid.ParamAttr(name=scope_name+'mask_lm_trans_fc.b_0'))
        # transform: layer norm
        mask_trans_feat = pre_process_layer(
            mask_trans_feat, 'n', name=scope_name+'mask_lm_trans')

        mask_lm_out_bias_attr = fluid.ParamAttr(
            name=scope_name+"mask_lm_out_fc.b_0",
            initializer=fluid.initializer.Constant(value=0.0))

        fc_out = fluid.layers.matmul(
            x=mask_trans_feat,
            y=word_emb,
            transpose_y=True)
        fc_out += fluid.layers.create_parameter(
            shape=[self._vocab_size],
            dtype='float32',
            attr=mask_lm_out_bias_attr,
            is_bias=True)

        if self._is_training:
            inputs = fluid.layers.softmax(fc_out)
            mask_lm_loss = fluid.layers.cross_entropy(
                input=inputs, label=mask_label)
            loss = fluid.layers.mean(mask_lm_loss)
            return {'loss': loss}
        else:
            return {'logits': fc_out}

    def batch_postprocess(self, rt_outputs):
        if not self._is_training:
            logits = rt_outputs['logits']
            preds = np.argmax(logits, -1)
            self._preds.extend(preds.tolist())
            return preds

    def epoch_postprocess(self, post_inputs, output_dir=None):
        # there is no post_inputs needed and not declared in epoch_inputs_attrs, hence no elements exist in post_inputs
        if not self._is_training:
            results = []
            for i in range(len(self._preds)):
                result = {'index': i, 'word_id': self._preds[i]}
                results.append(result)
            if output_dir is not None:
                with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer:
                    for result in results:
                        result = json.dumps(result)
                        writer.write(result+'\n')
                print('Predictions saved at '+os.path.join(output_dir, 'predictions.json'))
            return results


================================================
FILE: paddlepalm/head/mrc.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle.fluid as fluid
from paddlepalm.head.base_head import Head
import collections
import numpy as np
import os
import math
import six
import paddlepalm.tokenizer.ernie_tokenizer as tokenization
import json
import io

RawResult = collections.namedtuple("RawResult",
                                   ["unique_id", "start_logits", "end_logits"])

class MRC(Head):
    """
    Machine Reading Comprehension
    """

    def __init__(self, max_query_len, input_dim, pred_output_path=None, verbose=False, with_negative=False, do_lower_case=False, max_ans_len=None, null_score_diff_threshold=0.0, n_best_size=20, phase='train'):

        self._is_training = phase == 'train'
        self._hidden_size = input_dim
        self._max_sequence_length = max_query_len
 
        self._pred_results = []
        
        output_dir = pred_output_path
        self._max_answer_length = max_ans_len
        self._null_score_diff_threshold = null_score_diff_threshold
        self._n_best_size = n_best_size
        output_dir = pred_output_path
        self._verbose = verbose
        self._with_negative = with_negative
        self._do_lower_case = do_lower_case


    @property
    def inputs_attrs(self):
        if self._is_training:
            reader = {"start_positions": [[-1], 'int64'],
                      "end_positions": [[-1], 'int64'],
                      }
        else:
            reader = {'unique_ids': [[-1], 'int64']}
        bb = {"encoder_outputs": [[-1, -1, self._hidden_size], 'float32']}
        return {'reader': reader, 'backbone': bb}
        
    @property
    def epoch_inputs_attrs(self):
        if not self._is_training:
            from_reader = {'examples': None, 'features': None}
            return {'reader': from_reader}

    @property
    def outputs_attr(self):
        if self._is_training:
            return {'loss': [[1], 'float32']}
        else:
            return {'start_logits': [[-1, -1, 1], 'float32'],
                    'end_logits': [[-1, -1, 1], 'float32'],
                    'unique_ids': [[-1], 'int64']}


    def build(self, inputs, scope_name=""):
        if self._is_training:
            start_positions = inputs['reader']['start_positions']
            end_positions = inputs['reader']['end_positions']
            # max_position = inputs["reader"]["seqlen"] - 1
            # start_positions = fluid.layers.elementwise_min(start_positions, max_position)
            # end_positions = fluid.layers.elementwise_min(end_positions, max_position)
            start_positions.stop_gradient = True
            end_positions.stop_gradient = True
        else:
            unique_id = inputs['reader']['unique_ids']

            # It's used to help fetch variable 'unique_ids' that will be removed in the future
            helper_constant = fluid.layers.fill_constant(shape=[1], value=1, dtype='int64')
            fluid.layers.elementwise_mul(unique_id, helper_constant)  
            

        enc_out = inputs['backbone']['encoder_outputs']
        logits = fluid.layers.fc(
            input=enc_out,
            size=2,
            num_flatten_dims=2,
            param_attr=fluid.ParamAttr(
                name=scope_name+"cls_squad_out_w",
                initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
            bias_attr=fluid.ParamAttr(
                name=scope_name+"cls_squad_out_b", initializer=fluid.initializer.Constant(0.)))

        logits = fluid.layers.transpose(x=logits, perm=[2, 0, 1])
        start_logits, end_logits = fluid.layers.unstack(x=logits, axis=0)

        def _compute_single_loss(logits, positions):
            """Compute start/en
            d loss for mrc model"""
            inputs = fluid.layers.softmax(logits)
            loss = fluid.layers.cross_entropy(
                input=inputs, label=positions)
            loss = fluid.layers.mean(x=loss)
            return loss

        if self._is_training:
            start_loss = _compute_single_loss(start_logits, start_positions)
            end_loss = _compute_single_loss(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2.0
            return {'loss': total_loss}
        else:
            return {'start_logits': start_logits,
                    'end_logits': end_logits,
                    'unique_ids': unique_id}


    def batch_postprocess(self, rt_outputs):
        """this func will be called after each step(batch) of training/evaluating/predicting process."""
        if not self._is_training:
            unique_ids = rt_outputs['unique_ids']
            start_logits = rt_outputs['start_logits']
            end_logits = rt_outputs['end_logits']
            for idx in range(len(unique_ids)):
                
                if unique_ids[idx] < 0:
                    continue
                if len(self._pred_results) % 1000 == 0:
                    print("Predicting example: {}".format(len(self._pred_results)))
                uid = int(unique_ids[idx])

                s = [float(x) for x in start_logits[idx].flat]
                e = [float(x) for x in end_logits[idx].flat]
                self._pred_results.append(
                    RawResult(
                        unique_id=uid,
                        start_logits=s,
                        end_logits=e))

    def epoch_postprocess(self, post_inputs, output_dir=None):
        """(optional interface) this func will be called after evaluation/predicting process and each epoch during training process."""

        if not self._is_training:
            if output_dir is not None:
                examples = post_inputs['reader']['examples']
                features = post_inputs['reader']['features']
                if not os.path.exists(output_dir):
                    os.makedirs(output_dir)
                output_prediction_file = os.path.join(output_dir, "predictions.json")
                output_nbest_file = os.path.join(output_dir, "nbest_predictions.json")
                output_null_log_odds_file = os.path.join(output_dir, "null_odds.json")
                _write_predictions(examples, features, self._pred_results,
                                  self._n_best_size, self._max_answer_length,
                                  self._do_lower_case, output_prediction_file,
                                  output_nbest_file, output_null_log_odds_file,
                                  self._with_negative,
                                  self._null_score_diff_threshold, self._verbose)
            return self._pred_results


def _write_predictions(all_examples, all_features, all_results, n_best_size,
                      max_answer_length, do_lower_case, output_prediction_file,
                      output_nbest_file, output_null_log_odds_file,
                      with_negative, null_score_diff_threshold,
                      verbose):
    """Write final predictions to the json file and log-odds of null if needed."""
    print("Writing predictions to: %s" % (output_prediction_file))
    print("Writing nbest to: %s" % (output_nbest_file))

    example_index_to_features = collections.defaultdict(list)
    for feature in all_features:
        example_index_to_features[feature.example_index].append(feature)

    unique_id_to_result = {}
    for result in all_results:
        unique_id_to_result[result.unique_id] = result

    _PrelimPrediction = collections.namedtuple(  # pylint: disable=invalid-name
        "PrelimPrediction", [
            "feature_index", "start_index", "end_index", "start_logit",
            "end_logit"
        ])

    all_predictions = collections.OrderedDict()
    all_nbest_json = collections.OrderedDict()
    scores_diff_json = collections.OrderedDict()

    for (example_index, example) in enumerate(all_examples):
        features = example_index_to_features[example_index]

        prelim_predictions = []
        # keep track of the minimum score of null start+end of position 0
        score_null = 1000000  # large and positive
        min_null_feature_index = 0  # the paragraph slice with min mull score
        ull_start_logit = 0  # the start logit at the slice with min null score
        null_end_logit = 0  # the end logit at the slice with min null score
    
        for (feature_index, feature) in enumerate(features):
            result = unique_id_to_result[feature.unique_id]
            start_indexes = _get_best_indexes(result.start_logits, n_best_size)
            end_indexes = _get_best_indexes(result.end_logits, n_best_size)
            # if we could have irrelevant answers, get the min score of irrelevant
            if with_negative:
                feature_null_score = result.start_logits[0] + result.end_logits[
                    0]
                if feature_null_score < score_null:
                    score_null = feature_null_score
                    min_null_feature_index = feature_index
                    null_start_logit = result.start_logits[0]
                    null_end_logit = result.end_logits[0]
            for start_index in start_indexes:
                for end_index in end_indexes:
                    # We could hypothetically create invalid predictions, e.g., predict
                    # that the start of the span is in the question. We throw out all
                    # invalid predictions.
                    if start_index >= len(feature.tokens):
                        continue
                    if end_index >= len(feature.tokens):
                        continue
                    if start_index not in feature.token_to_orig_map:
                        continue
                    if end_index not in feature.token_to_orig_map:
                        continue
                    if not feature.token_is_max_context.get(start_index, False):
                        continue
                    if end_index < start_index:
                        continue
                    length = end_index - start_index + 1
                    if length > max_answer_length:
                        continue
                    prelim_predictions.append(
                        _PrelimPrediction(
                            feature_index=feature_index,
                            start_index=start_index,
                            end_index=end_index,
                            start_logit=result.start_logits[start_index],
                            end_logit=result.end_logits[end_index]))

        if with_negative:
            prelim_predictions.append(
                _PrelimPrediction(
                    feature_index=min_null_feature_index,
                    start_index=0,
                    end_index=0,
                    start_logit=null_start_logit,
                    end_logit=null_end_logit))
        prelim_predictions = sorted(
            prelim_predictions,
            key=lambda x: (x.start_logit + x.end_logit),
            reverse=True)

        _NbestPrediction = collections.namedtuple(  # pylint: disable=invalid-name
            "NbestPrediction", ["text", "start_logit", "end_logit"])

        seen_predictions = {}
        nbest = []
        for pred in prelim_predictions:
            if len(nbest) >= n_best_size:
                break
            feature = features[pred.feature_index]
            if pred.start_index > 0:  # this is a non-null prediction
                tok_tokens = feature.tokens[pred.start_index:(pred.end_index + 1
                                                              )]
                orig_doc_start = feature.token_to_orig_map[pred.start_index]
                orig_doc_end = feature.token_to_orig_map[pred.end_index]
                orig_tokens = example.doc_tokens[orig_doc_start:(orig_doc_end +
                                                                 1)]
                tok_text = " ".join(tok_tokens)

                # De-tokenize WordPieces that have been split off.
                tok_text = tok_text.replace(" ##", "")
                tok_text = tok_text.replace("##", "")

                # Clean whitespace
                tok_text = tok_text.strip()
                tok_text = " ".join(tok_text.split())
                orig_text = " ".join(orig_tokens)

                final_text = _get_final_text(tok_text, orig_text, do_lower_case,
                                            verbose)
                if final_text in seen_predictions:
                    continue

                seen_predictions[final_text] = True
            else:
                final_text = ""
                seen_predictions[final_text] = True

            nbest.append(
                _NbestPrediction(
                    text=final_text,
                    start_logit=pred.start_logit,
                    end_logit=pred.end_logit))

        # if we didn't inlude the empty option in the n-best, inlcude it
        if with_negative:
            if "" not in seen_predictions:
                nbest.append(
                    _NbestPrediction(
                        text="",
                        start_logit=null_start_logit,
                        end_logit=null_end_logit))
        # In very rare edge cases we could have no valid predictions. So we
        # just create a nonce prediction in this case to avoid failure.
        if not nbest:
            nbest.append(
                _NbestPrediction(
                    text="empty", start_logit=0.0, end_logit=0.0))

        assert len(nbest) >= 1

        total_scores = []
        best_non_null_entry = None
        for entry in nbest:
            total_scores.append(entry.start_logit + entry.end_logit)
            if not best_non_null_entry:
                if entry.text:
                    best_non_null_entry = entry
        # debug
        if best_non_null_entry is None:
            print("Emmm..., sth wrong")

        probs = _compute_softmax(total_scores)

        nbest_json = []
        for (i, entry) in enumerate(nbest):
            output = collections.OrderedDict()
            output["text"] = entry.text.encode('utf-8').decode('utf-8')
            output["probability"] = probs[i]
            output["start_logit"] = entry.start_logit
            output["end_logit"] = entry.end_logit
            nbest_json.append(output)

        assert len(nbest_json) >= 1

        if not with_negative:
            all_predictions[example.qas_id] = nbest_json[0]["text"]
        else:
            # predict "" iff the null score - the score of best non-null > threshold
            score_diff = score_null - best_non_null_entry.start_logit - (
                best_non_null_entry.end_logit)
            scores_diff_json[example.qas_id] = score_diff
            if score_diff > null_score_diff_threshold:
                all_predictions[example.qas_id] = ""
            else:
                all_predictions[example.qas_id] = best_non_null_entry.text

        all_nbest_json[example.qas_id] = nbest_json
    

    with io.open(output_prediction_file, "w", encoding='utf-8') as writer:
        
        writer.write(json.dumps(all_predictions, indent=4, ensure_ascii=False) + "\n")

    with io.open(output_nbest_file, "w", encoding='utf-8') as writer:
        writer.write(json.dumps(all_nbest_json, indent=4, ensure_ascii=False) + "\n")

    if with_negative:
        with io.open(output_null_log_odds_file, "w", encoding='utf-8') as writer:
            writer.write(json.dumps(scores_diff_json, indent=4, ensure_ascii=False) + "\n")


def _get_final_text(pred_text, orig_text, do_lower_case, verbose):
    """Project the tokenized prediction back to the original text."""

    # When we created the data, we kept track of the alignment between original
    # (whitespace tokenized) tokens and our WordPiece tokenized tokens. So
    # now `orig_text` contains the span of our original text corresponding to the
    # span that we predicted.
    #
    # However, `orig_text` may contain extra characters that we don't want in
    # our prediction.
    #
    # For example, let's say:
    #   pred_text = steve smith
    #   orig_text = Steve Smith's
    #
    # We don't want to return `orig_text` because it contains the extra "'s".
    #
    # We don't want to return `pred_text` because it's already been normalized
    # (the MRQA eval script also does punctuation stripping/lower casing but
    # our tokenizer does additional normalization like stripping accent
    # characters).
    #
    # What we really want to return is "Steve Smith".
    #
    # Therefore, we have to apply a semi-complicated alignment heruistic between
    # `pred_text` and `orig_text` to get a character-to-charcter alignment. This
    # can fail in certain cases in which case we just return `orig_text`.

    def _strip_spaces(text):
        ns_chars = []
        ns_to_s_map = collections.OrderedDict()
        for (i, c) in enumerate(text):
            if c == " ":
                continue
            ns_to_s_map[len(ns_chars)] = i
            ns_chars.append(c)
        ns_text = "".join(ns_chars)
        return (ns_text, ns_to_s_map)

    # We first tokenize `orig_text`, strip whitespace from the result
    # and `pred_text`, and check if they are the same length. If they are
    # NOT the same length, the heuristic has failed. If they are the same
    # length, we assume the characters are one-to-one aligned.
    tokenizer = tokenization.BasicTokenizer(do_lower_case=do_lower_case)

    tok_text = " ".join(tokenizer.tokenize(orig_text))

    start_position = tok_text.find(pred_text)
    if start_position == -1:
        if verbose:
            print("Unable to find text: '%s' in '%s'" % (pred_text, orig_text))
        return orig_text
    end_position = start_position + len(pred_text) - 1

    (orig_ns_text, orig_ns_to_s_map) = _strip_spaces(orig_text)
    (tok_ns_text, tok_ns_to_s_map) = _strip_spaces(tok_text)

    if len(orig_ns_text) != len(tok_ns_text):
        if verbose:
            print("Length not equal after stripping spaces: '%s' vs '%s'",
                  orig_ns_text, tok_ns_text)
        return orig_text

    # We then project the characters in `pred_text` back to `orig_text` using
    # the character-to-character alignment.
    tok_s_to_ns_map = {}
    for (i, tok_index) in six.iteritems(tok_ns_to_s_map):
        tok_s_to_ns_map[tok_index] = i

    orig_start_position = None
    if start_position in tok_s_to_ns_map:
        ns_start_position = tok_s_to_ns_map[start_position]
        if ns_start_position in orig_ns_to_s_map:
            orig_start_position = orig_ns_to_s_map[ns_start_position]

    if orig_start_position is None:
        if verbose:
            print("Couldn't map start position")
        return orig_text

    orig_end_position = None
    if end_position in tok_s_to_ns_map:
        ns_end_position = tok_s_to_ns_map[end_position]
        if ns_end_position in orig_ns_to_s_map:
            orig_end_position = orig_ns_to_s_map[ns_end_position]

    if orig_end_position is None:
        if verbose:
            print("Couldn't map end position")
        return orig_text

    output_text = orig_text[orig_start_position:(orig_end_position + 1)]
    return output_text


def _get_best_indexes(logits, n_best_size):
    """Get the n-best logits from a list."""
    index_and_score = sorted(
        enumerate(logits), key=lambda x: x[1], reverse=True)

    best_indexes = []
    for i in range(len(index_and_score)):
        if i >= n_best_size:
            break
        best_indexes.append(index_and_score[i][0])
    return best_indexes


def _compute_softmax(scores):
    """Compute softmax probability over raw logits."""
    if not scores:
        return []

    max_score = None
    for score in scores:
        if max_score is None or score > max_score:
            max_score = score

    exp_scores = []
    total_sum = 0.0
    for score in scores:
        x = math.exp(score - max_score)
        exp_scores.append(x)
        total_sum += x

    probs = []
    for score in exp_scores:
        probs.append(score / total_sum)
    return probs


================================================
FILE: paddlepalm/head/ner.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import paddle.fluid as fluid
from paddle.fluid import layers
from paddlepalm.head.base_head import Head
import numpy as np
import os
import math

class SequenceLabel(Head):
    '''
    Sequence label
    '''
    def __init__(self, num_classes, input_dim, dropout_prob=0.0, learning_rate=1e-3,  \
                 param_initializer_range=0.02, phase='train'):
        
        """  
        Args:
            phase: train, eval, pred
            lang: en, ch, ...
        """

        self._is_training = phase == 'train'
        self._hidden_size = input_dim

        self.num_classes = num_classes
    
        self._dropout_prob = dropout_prob if phase == 'train' else 0.0
        self._param_initializer = fluid.initializer.TruncatedNormal(
            scale=param_initializer_range)

        self.learning_rate = learning_rate
        self._preds = []


    @property
    def inputs_attrs(self):
        reader = {}
        bb = {"encoder_outputs": [[-1, -1, -1], 'float32']}
        if self._is_training:
            reader["label_ids"] = [[-1, -1], 'int64']
            reader["seq_lens"] = [[-1], 'int64']
        return {'reader': reader, 'backbone': bb}

    @property
    def outputs_attrs(self):
        if self._is_training:
            return {'loss': [[1], 'float32']}
        else:
            return {'logits': [[-1, -1, self.num_classes], 'float32']}

    def build(self, inputs, scope_name=''):
        token_emb = inputs['backbone']['encoder_outputs']
        if self._is_training:
            label_ids = inputs['reader']['label_ids']
            seq_lens = inputs['reader']['seq_lens']

        emission = fluid.layers.fc(
            size=self.num_classes,
            input=token_emb,
            param_attr=fluid.ParamAttr(
                initializer=self._param_initializer,
                regularizer=fluid.regularizer.L2DecayRegularizer(
                    regularization_coeff=1e-4)),
            bias_attr=fluid.ParamAttr(
                name=scope_name+"cls_out_b", initializer=fluid.initializer.Constant(0.)),
            num_flatten_dims=2)

        if self._is_training:

            # compute loss
            crf_cost = fluid.layers.linear_chain_crf(  
                input=emission,
                label=label_ids,
                param_attr=fluid.ParamAttr(
                    name=scope_name+'crfw', learning_rate=self.learning_rate),
                length=seq_lens)

            avg_cost = fluid.layers.mean(x=crf_cost)
            crf_decode = fluid.layers.crf_decoding(
                input=emission,
                param_attr=fluid.ParamAttr(name=scope_name+'crfw'),
                length=seq_lens)

            (precision, recall, f1_score, num_infer_chunks, num_label_chunks,
            num_correct_chunks) = fluid.layers.chunk_eval(
                input=crf_decode,
                label=label_ids,
                chunk_scheme="IOB",
                num_chunk_types=int(math.ceil((self.num_classes - 1) / 2.0)),
                seq_length=seq_lens)
            chunk_evaluator = fluid.metrics.ChunkEvaluator()
            chunk_evaluator.reset()

            return {"loss": avg_cost}
        else:
            return {"logits": emission} 

    def batch_postprocess(self, rt_outputs):
        if not self._is_training:
            emission = rt_outputs['emission']
            preds = np.argmax(emission, -1)
            self._preds.extend(preds.tolist())

    def epoch_postprocess(self, post_inputs, output_dir=None):
        # there is no post_inputs needed and not declared in epoch_inputs_attrs, hence no elements exist in post_inputs
        if not self._is_training:
            if output_dir is not None:
                with open(os.path.join(output_dir, 'predictions.json'), 'w') as writer:
                    for p in self._preds:
                        writer.write(str(p)+'\n')
                print('Predictions saved at '+os.path.join(output_dir, 'predictions.json'))
            return self._preds


================================================
FILE: paddlepalm/lr_sched/__init__.py
================================================

from .slanted_triangular_schedualer import TriangularSchedualer
from .warmup_schedualer import WarmupSchedualer


================================================
FILE: paddlepalm/lr_sched/base_schedualer.py
================================================

class Schedualer():

    def __init__(self):
        self._prog = None
    
    def _set_prog(self, prog):
        self._prog = prog

    def _build(self, learning_rate):
        raise NotImplementedError()


================================================
FILE: paddlepalm/lr_sched/slanted_triangular_schedualer.py
================================================
from paddlepalm.lr_sched.base_schedualer import Schedualer
from paddle import fluid

class TriangularSchedualer(Schedualer):

    """ Implementation of Slanted Triangular learning rate schedual method, more details refer to https://arxiv.org/pdf/1801.06146.pdf . Apply linear warmup of learning rate from 0 to learning_rate until warmup_steps, and then decay to 0 linearly until num_train_steps."""

    def __init__(self, warmup_steps, num_train_steps):
        """Create a new TriangularSchedualer object.

        Args:
            warmup_steps: the learning rate will grow from 0 to max_learning_rate over `warmup_steps` steps.
            num_train_steps: the number of train steps.

        """
        Schedualer.__init__(self)
        assert num_train_steps > warmup_steps > 0
        self.warmup_steps = warmup_steps
        self.num_train_steps = num_train_steps
        

    def _build(self, learning_rate):
        with self._prog._lr_schedule_guard():
            lr = fluid.layers.tensor.create_global_var(
                shape=[1],
                value=0.0,
                dtype='float32',
                persistable=True,
                name="scheduled_learning_rate")

            global_step = fluid.layers.learning_rate_scheduler._decay_step_counter()

            with fluid.layers.control_flow.Switch() as switch:
                with switch.case(global_step < self.warmup_steps):
                    warmup_lr = learning_rate * (global_step / self.warmup_steps)
                    fluid.layers.tensor.assign(warmup_lr, lr)
                with switch.default():
                    decayed_lr = fluid.layers.learning_rate_scheduler.polynomial_decay(
                        learning_rate=learning_rate,
                        decay_steps=self.num_train_steps,
                        end_learning_rate=0.0,
                        power=1.0,
                        cycle=False)
                    fluid.layers.tensor.assign(decayed_lr, lr)

            return lr


================================================
FILE: paddlepalm/lr_sched/warmup_schedualer.py
================================================

from paddlepalm.lr_sched.base_schedualer import Schedualer
import paddle.fluid as fluid

def WarmupSchedualer(Schedualer):
    """ Applies linear warmup of learning rate from 0 to learning_rate until warmup_steps, and then decay to 0 linearly until num_train_steps."""

    def __init__(self, warmup_steps):
        schedualer.__init__(self)
        self.warmup_steps = warmup_steps

    def _build(self, learning_rate):

        with self._prog._lr_schedule_guard():
            lr = fluid.layers.tensor.create_global_var(
                shape=[1],
                value=0.0,
                dtype='float32',
                persistable=True,
                name="scheduled_learning_rate")

            global_step = fluid.layers.learning_rate_scheduler._decay_step_counter()

            with fluid.layers.control_flow.Switch() as switch:
                with switch.case(global_step < self.warmup_steps):
                    warmup_lr = learning_rate * (global_step / self.warmup_steps)
                    fluid.layers.tensor.assign(warmup_lr, lr)
                with switch.default():
                    fluid.layers.tensor.assign(learning_rate, lr)

            return lr


================================================
FILE: paddlepalm/multihead_trainer.py
================================================

from paddle import fluid
from paddle.fluid import layers
from paddlepalm.distribute import gpu_dev_count, cpu_dev_count, data_feeder, decode_fake
from paddlepalm import Trainer
from paddlepalm.utils import reader_helper
import numpy as np
import time
import sys

dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count
VERBOSE=False


class MultiHeadTrainer(Trainer):
    """
    The core unit to start a multi-task training/predicting session. A MultiHeadTrainer is built based on several Trainers. Beyond the inheritance of Trainer, it additionally achieves model backbone reuse across tasks, trainer sampling for multi-task learning, and multi-head inference for effective evaluation and prediction. 
    """
    
    def __init__(self, trainers):
        """Create a new multi_head_trainer.

        Args:
            trainers: a list of Trainer objects.

        """
        Trainer.__init__(self, '')

        self._trainers = trainers

        name_maxlen = max([len(i.name) for i in self._trainers])
        self._name_pads = {i.name: name_maxlen-len(i.name) for i in self._trainers}

        self._train_init = False
        self._dist_train_init = False
        self._predict_init = False
        self._feeded_var_names = None
        self._cur_train_step = 0
        self._target_vars = None

        self._inputname_to_varname = {}
        self._pred_input_name_list = []
        self._pred_input_varname_list = []
        self._pred_fetch_name_list = []
        self._pred_fetch_var_list = []

        self._exe = None

        self._save_protocol = {
            'input_names': 'self._pred_input_name_list',
            'input_varnames': 'self._pred_input_varname_list',
            'fetch_list': 'self._pred_fetch_name_list'}

        self._check_save = lambda: False
        for t in self._trainers:
            t._set_multitask()

    def build_forward(self):
        """
        Build forward computation graph for training, which usually built from input layer to loss node.

        Return:
            - loss_var: a Variable object. The computational graph variable(node) of loss.
        """
        head_dict = {}
        backbone = self._trainers[0]._backbone
        for i in self._trainers:
            assert i._task_head is not None and i._backbone is not None, "You should build forward for the {} task".format(i._name)
            assert i._backbone == backbone, "The backbone for each task must be the same"
            head_dict[i._name] = i._task_head
            
        train_prog = fluid.Program()
        train_init_prog = fluid.Program()
        self._train_prog = train_prog
        self._train_init_prog = train_init_prog

        def get_loss(i):
            head = head_dict[self._trainers[i].name]
            self._trainers[i]._lock_prog = True
            loss_var = self._trainers[i].build_forward(backbone, head)
            self._trainers[i]._lock_prog = False
            return loss_var
      
        task_fns = {i: lambda i=i: get_loss(i) for i in range(len(self._trainers))}

        with fluid.program_guard(train_prog, train_init_prog):
            task_id_var = fluid.data(name="__task_id",shape=[1],dtype='int64')

            loss_var = layers.switch_case(
                branch_index=task_id_var,
                branch_fns=task_fns
            )
        self._task_id_var = task_id_var
        self._loss_var = loss_var
        self._fetch_list = [loss_var.name]
        if not self._multi_task:
            self._init_exe_prog(for_train=True)
        return loss_var
        
    def build_predict_forward(self):
        head_dict = {}
        backbone = self._trainers[0]._pred_backbone
        for i in self._trainers:
            assert i._pred_head is not None and i._pred_backbone is not None, "You should build_predict_forward for the {} task".format(i._name)
            assert i._pred_backbone == backbone, "The backbone for each task must be the same"
            head_dict[i._name] = i._pred_head
            
        pred_prog = fluid.Program()
        pred_init_prog = fluid.Program()
        self._pred_prog = pred_prog
        self._pred_init_prog = pred_init_prog

        def get_loss(i):
            head = head_dict[self._trainers[i].name]
            self._trainers[i]._lock_prog = True
            pred_vars = self._trainers[i].build_predict_forward(backbone, head)
            self._trainers[i]._lock_prog = False
            # return loss_var
      
        task_fns = {i: lambda i=i: get_loss(i) for i in range(len(self._trainers))}

        with fluid.program_guard(pred_prog, pred_init_prog):
            task_id_var = fluid.data(name="__task_id",shape=[1],dtype='int64')

            loss_var = layers.switch_case(
                branch_index=task_id_var,
                branch_fns=task_fns
            )
        if not self._multi_task:
            self._init_exe_prog(for_train=False)

    def merge_inference_readers(self, readers):

        for r in readers:
            assert r._phase == 'predict'

        if isinstance(readers, list):
            reader_dict = {k.name: v for k,v in zip(self._trainers, readers)}
        elif isinstance(readers, dict):
            reader_dict = readers
        else:
            raise ValueError()
        
        num_heads = len(self._trainers)
        assert len(reader_dict) == num_heads, "received number of readers is not consistent with trainers."

        trainer_dict = {t.name: t for t in self._trainers}
        task_name2id = {t.name: idx for idx, t in enumerate(self._trainers)}
        self._task_name2id = task_name2id

        self._finish_steps = {}
        self._finish = {}
        input_names = []
        name_to_pos = []
        joint_shape_and_dtypes = []
        iterators = []
        prefixes = []
        mrs = []
        net_inputs = []
        global_steps = 0
        for t in self._trainers:
            assert t.name in reader_dict
            assert reader_dict[t.name].num_epochs is None, "{}: num_epochs is not None. \
                To run with multi-head mode, num_epochs of each Trainer should be set as None.".format(t.name)
            # print(num_epochs, t.mix_ratio, base_steps_pur_epoch)
            self._finish_steps[t.name] = 9999999999
            self._finish[t.name] = True

            # t._set_task_id(self._task_id_var)
            t.fit_reader(reader_dict[t.name], phase='predict')
            net_inputs.append(t._pred_net_inputs)
            prefixes.append(t.name)
            iterators.append(t._raw_iterator_fn())
            input_names.append(t._pred_input_names)
            name_to_pos.append(t._pred_name_to_position)
            joint_shape_and_dtypes.append(t._pred_shape_and_dtypes)

        iterator_fn = reader_helper.create_multihead_inference_fn(iterators, prefixes, joint_shape_and_dtypes, \
            input_names, name_to_pos, task_name2id, dev_count=dev_count)
        feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)

        if gpu_dev_count > 1:
            raise NotImplementedError('currently only single-gpu mode has been supported running with multi-task mode.')
            # distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase, is_multi=True, with_arg=True)
        else:
            distribute_feeder_fn = iterator_fn

        self._predict_iterator_fn = distribute_feeder_fn
        self._pred_feed_batch_process_fn = feed_batch_process_fn
        return distribute_feeder_fn

    def fit_readers_with_mixratio(self, readers, sampling_reference, num_epochs, phase='train'):
        """
        Bind readers and loaded train/predict data to trainers. The `num_epochs` argument only 
            works on `sampling_reference` task(trainer), and num_epochs of other tasks are infered from 
            their `mix_ratio`.

        Args:
            readers: a dict or list of Reader objects. For dict case, each key is a trainer's name, and the mapped value is the reader object to bind to the trainer. For list case, each 
            sampling_reference: a trainer name. The task(trainer) selected as baseline for task sampling. 
            num_epochs: training epochs of the sampling_reference task (trainer). 
        """
        self._check_phase(phase)

        if isinstance(readers, list):
            reader_dict = {k.name: v for k,v in zip(self._trainers, readers)}
        elif isinstance(readers, dict):
            reader_dict = readers
        else:
            raise ValueError()
        
        num_heads = len(self._trainers)
        assert len(reader_dict) == num_heads, "received number of readers is not consistent with trainers."

        trainer_dict = {t.name: t for t in self._trainers}
        assert sampling_reference in trainer_dict

        trainer_dict[sampling_reference]._set_task_id(self._task_id_var)
        trainer_dict[sampling_reference].fit_reader(reader_dict[sampling_reference])
        base_steps_pur_epoch = trainer_dict[sampling_reference]._steps_pur_epoch

        self._finish_steps = {}
        self._finish = {}
        input_names = []
        name_to_pos = []
        joint_shape_and_dtypes = []
        iterators = []
        prefixes = []
        mrs = []
        net_inputs = []
        global_steps = 0
        for t in self._trainers:
            assert t.name in reader_dict
            assert reader_dict[t.name].num_epochs is None, "{}: num_epochs is not None. \
                To run with multi-head mode, num_epochs of each Trainer should be set as None.".format(t.name)
            # print(num_epochs, t.mix_ratio, base_steps_pur_epoch)
            max_train_steps = int(num_epochs * t.mix_ratio * base_steps_pur_epoch)
            if not t._as_auxilary:
                print('{}: expected train steps {}.'.format(t.name, max_train_steps))
                sys.stdout.flush()
                self._finish_steps[t.name] = max_train_steps
                self._finish[t.name] = False
            else:
                self._finish_steps[t.name] = 9999999999
                self._finish[t.name] = True

            global_steps += max_train_steps
            if t.name != sampling_reference:
                t._set_task_id(self._task_id_var)
                t.fit_reader(reader_dict[t.name])
            net_inputs.append(t._net_inputs)
            prefixes.append(t.name)
            mrs.append(t.mix_ratio)
            iterators.append(t._raw_iterator_fn())
            input_names.append(t._input_names)
            name_to_pos.append(t._name_to_position)
            joint_shape_and_dtypes.append(t._shape_and_dtypes)

        print('Estimated overall train steps {}.'.format(global_steps))
        sys.stdout.flush()
        self._overall_train_steps = global_steps

        iterator_fn = reader_helper.create_multihead_iterator_fn(iterators, prefixes, joint_shape_and_dtypes, \
            mrs, input_names, name_to_pos, dev_count=dev_count)
        feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)

        if gpu_dev_count > 1:
            distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase, is_multi=True)
        else:
            distribute_feeder_fn = iterator_fn()

        if phase == 'train':
            self._train_reader = distribute_feeder_fn
            self._feed_batch_process_fn = feed_batch_process_fn
        elif phase == 'predict':
            self._predict_reader = distribute_feeder_fn
            self._pred_feed_batch_process_fn = feed_batch_process_fn
        return distribute_feeder_fn

    def _check_finish(self, task_name, silent=False):
        trainers = {t.name:t for t in self._trainers}
        if trainers[task_name]._cur_train_step == self._finish_steps[task_name]:
            if not silent:
                print(task_name+' train finish!')
                sys.stdout.flush()
            self._finish[task_name]=True
        flags = list(set(self._finish.values()))
        return len(flags) == 1 and flags[0] == True
        
    def train(self, print_steps=5):
        """
        start training.

        Args:
            print_steps: int. Logging frequency of training message, e.g., current step, loss and speed.
        """
        iterator = self._train_reader
        self._distribute_train_prog = fluid.CompiledProgram(self._train_prog).with_data_parallel(loss_name=self._loss_var.name)
        for t in self._trainers:
            t._dist_train_init = True
            t._set_exe(self._exe)
            t._set_dist_train(self._distribute_train_prog)
            t._set_fetch_list(self._fetch_list)

        time_begin = time.time()
        for feed in iterator:
            # batch, task_id = feed
            rt_outputs, task_id = self.train_one_step(feed)

            task_rt_outputs = {k[len(self._trainers[task_id].name+'.'):]: v for k,v in rt_outputs.items() if k.startswith(self._trainers[task_id].name+'.')}
            self._trainers[task_id]._task_head.batch_postprocess(task_rt_outputs)
            if print_steps > 0 and self._cur_train_step % print_steps == 0:
                loss = rt_outputs[self._trainers[task_id].name+'.loss']
                loss = np.mean(np.squeeze(loss)).tolist()

                time_end = time.time()
                time_cost = time_end - time_begin

                print("global step: {}, {}: step {}/{} (epoch {}), loss: {:.3f}, speed: {:.2f} steps/s".format(
                       self._cur_train_step, ' '*self._name_pads[self._trainers[task_id].name]+self._trainers[task_id].name, \
                       (self._trainers[task_id]._cur_train_step-1) % self._trainers[task_id]._steps_pur_epoch + 1, \
                       self._trainers[task_id]._steps_pur_epoch, self._trainers[task_id]._cur_train_epoch, \
                       loss, print_steps / time_cost))
                sys.stdout.flush()
                time_begin = time.time()

            self._check_save()
            finish = self._check_finish(self._trainers[task_id].name)
            if finish:
                break

    def train_one_step(self, batch):
        if not self._dist_train_init:
            self._distribute_train_prog = fluid.CompiledProgram(self._train_prog).with_data_parallel(loss_name=self._loss_var.name)
            for t in self._trainers:
                t._dist_train_init = True
                t._set_exe(self._exe)
                t._set_dist_train(self._distribute_train_prog)
                t._set_fetch_list(self._fetch_list)
            self._dist_train_init = True

        if dev_count > 1:
            assert isinstance(batch, tuple)
            task_id = batch[0][0]['__task_id'][0]
        else:
            assert isinstance(batch, dict)
            task_id = batch['__task_id'][0]
            
        rt_outputs = self._trainers[task_id].train_one_step(batch)

        self._cur_train_step += 1
        self._check_save()
        return rt_outputs, task_id
        
    def predict_one_batch(self, task_name, batch):
        if dev_count > 1:
            raise NotImplementedError()

        # batch = next(self._predict_iterator_fn(task_name))
        t = self._trainers[self._task_name2id[task_name]]
        # t._set_exe(self._exe)
        t._set_dist_pred(self._trainers[self._task_name2id[task_name]]._pred_prog)
        rt_outputs = t.predict_one_batch(batch)
        return rt_outputs

    def predict(self, output_dir=None, print_steps=1000):
        raise NotImplementedError()
        # iterator = self._predict_iterator
        # self._distribute_pred_prog = fluid.CompiledProgram(self._pred_prog).with_data_parallel()

    @property
    def overall_train_steps(self):
        return self._overall_train_steps


================================================
FILE: paddlepalm/optimizer/__init__.py
================================================

from .adam import Adam


================================================
FILE: paddlepalm/optimizer/adam.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Optimization and learning rate scheduling."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import paddle.fluid as fluid
from paddlepalm.optimizer.base_optimizer import Optimizer

class Adam(Optimizer):

    def __init__(self, loss_var, lr, lr_schedualer=None):

        Optimizer.__init__(self, loss_var, lr, lr_schedualer=None)

        self._loss = loss_var
        self._lr = lr
        self._lr_schedualer = lr_schedualer
    
    def _build(self, grad_clip=None):

        if self._lr_schedualer is not None:
            self._lr = self._lr_schedualer._build(self._lr)

        optimizer = fluid.optimizer.Adam(learning_rate=self._lr)

        if grad_clip is not None:
            clip_norm_thres = grad_clip
            # When using mixed precision training, scale the gradient clip threshold
            # by loss_scaling
            fluid.clip.set_gradient_clip(
                clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=clip_norm_thres))

        _, param_grads = optimizer.minimize(self._loss)
        return param_grads

    def get_cur_learning_rate(self):
        return self._lr


================================================
FILE: paddlepalm/optimizer/base_optimizer.py
================================================

class Optimizer(object):

    def __init__(self, loss_var, lr, lr_schedualer=None):
        self._prog = None
        self._lr_schedualer = lr_schedualer

    def _build(self, grad_clip=None):
        raise NotImplementedError()

    def _set_prog(self, prog, init_prog):
        self._prog = prog
        self._init_prog = prog
        if self._lr_schedualer is not None:
            self._lr_schedualer._set_prog(prog)

    def get_cur_learning_rate(self):
        pass


================================================
FILE: paddlepalm/reader/__init__.py
================================================

from .cls import ClassifyReader
from .match import MatchReader
from .seq_label import SequenceLabelReader
from .mrc import MRCReader
from .mlm import MaskLMReader


================================================
FILE: paddlepalm/reader/base_reader.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from copy import copy
class Reader(object):
    """interface of data reader."""

    def __init__(self, phase='train'):
        """该函数完成一个Reader的构造，至少需要包含一个phase参数。
        注意：实现该构造函数时，必须保证对基类构造函数的调用，以创建必要的框架内建的成员变量。
        Args:
            phase: str类型。用于区分主干网络被调用时所处的运行阶段，目前支持训练阶段train和预测阶段predict
            """
        
        self._phase = phase
        self._batch_size = None
        self._num_epochs = 1
        self._register = set()
        self._registered_backbone = None

    @classmethod
    def create_register(self):
        return set()
        
    def clone(self, phase='train'):
        """拷贝一个新的reader对象。"""
        if phase == self._phase:
            return copy(self)
        else:
            ret = copy(self)
            ret._phase = phase
            return ret

    def require_attr(self, attr_name):
        """在注册器中新增一个需要产生的对象。

        Args:
            attr_name: 需要产出的对象的对象名，例如’segment_ids‘。
            """
        self._register.add(attr_name)
            
    def register_with(self, backbone):
        """根据backbone对输入对象的依赖，在注册器中对每个依赖的输入对象进行注册。

        Args:
            backbone: 需要对接的主干网络。
        """
        for attr in backbone.inputs_attr:
            self.require_attr(attr)
        self._registered_backbone = backbone

    def get_registered_backbone(self):
        """返回该reader所注册的backbone。"""
        return self._registered_backbone

    def _get_registed_attrs(self, attrs):
        ret = {}
        for i in self._register:
            if i not in attrs:
                raise NotImplementedError('output attr {} is not found in this reader.'.format(i))
            ret[i] = attrs[i]
        return ret

    def load_data(self, input_file, batch_size, num_epochs=None, \
                  file_format='tsv', shuffle_train=True):
        """将磁盘上的数据载入到reader中。

        注意：实现该方法时需要同步创建self._batch_size和self._num_epochs。

        Args:
            input_file: 数据集文件路径。文件格式需要满足`file_format`参数的要求。
            batch_size: 迭代器每次yield出的样本数量。注意：当环境中存在多个GPU时，batch_size需要保证被GPU卡数整除。
            num_epochs: 数据集遍历次数。默认为None, 在单任务模式下代表遍历一次，在多任务模式下该参数会被上层的Trainer进行自动赋值。该参数仅对训练阶段有效。
            file_format: 输入文件的文件格式。目前支持的格式: tsv. 默认为tsv.
            shuffle_train: 是否打乱训练集中的样本。默认为True。该参数仅对训练阶段有效。
        """
        raise NotImplementedError()

    @property
    def outputs_attr(self):
        """描述reader输出对象（被yield出的对象）的属性，包含各个对象的名字、shape以及数据类型。当某个对象为标量数据
        类型（如str, int, float等）时，shape设置为空列表[]，当某个对象的某个维度长度可变时，shape中的相应维度设置为-1。
        注意：当使用mini-batch梯度下降学习策略时，，应为常规的输入对象设置batch_size维度（一般为-1）
        Return:
            dict类型。对各个输入对象的属性描述。例如，
            对于文本分类和匹配任务，yield的输出内容可能包含如下的对象（下游backbone和task可按需访问其中的对象）
                {"token_ids": ([-1, max_len], 'int64'),
                 "input_ids": ([-1, max_len], 'int64'),
                 "segment_ids": ([-1, max_len], 'int64'),
                 "input_mask": ([-1, max_len], 'float32'),
                 "label": ([-1], 'int')}
        """
        raise NotImplementedError()
    
    def _iterator(self):
        """数据集遍历接口，注意，当数据集遍历到尾部时该接口应自动完成指针重置，即重新从数据集头部开始新的遍历。
        Yield:
            dict类型。符合outputs_attr描述的当前step的输出对象。
        """
        raise NotImplementedError()

    def get_epoch_outputs(self):
        """返回数据集每个epoch遍历后的输出对象。"""
        raise NotImplementedError()

    @property
    def num_examples(self):
        """数据集中的样本数量，即每个epoch中iterator所生成的样本数。注意，使用滑动窗口等可能导致数据集样本数发生变化的策略时
        该接口应返回runtime阶段的实际样本数。"""
        raise NotImplementedError()

    @property
    def num_epochs(self):
        """数据集遍历次数"""
        return self._num_epochs


================================================
FILE: paddlepalm/reader/cls.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddlepalm.reader.base_reader import Reader
from paddlepalm.reader.utils.reader4ernie import ClassifyReader as CLSReader


class ClassifyReader(Reader):
    """
    The reader completes the loading and processing of text classification dataset. Supported file format: tsv. 
    
    For tsv format, training dataset file should have two header areas, i.e., `label` and `text`, and test set only requires `text` area. For example,

    ```
    label [TAB] text
    1 [TAB] Today is a good day.
    0 [TAB] Such a terriable day!
    1 [TAB] I feel lucky to meet you, dear.
    1 [TAB] He likes sunshine and I like him :).
    0 [TAB] JUST! GO! OUT!
    ```

    CAUTIOUS: The first line of the file must be header! And areas are splited by tab (\\t).

    """
    
    def __init__(self, vocab_path, max_len, tokenizer='wordpiece', \
             lang='en', seed=None, do_lower_case=False, phase='train'):
        """Create a new Reader for loading and processing classification task data.

        Args:
          vocab_path: the vocab file path to do tokenization and token_ids generation.
          max_len: The maximum length of the sequence (after word segmentation). The part exceeding max_len will be removed from right.
          tokenizer: string type. The name of the used tokenizer. A tokenizer is to convert raw text into tokens. Avaliable tokenizers: wordpiece.
          lang: the language of dataset. Supported language: en (English), cn (Chinese). Default is en (English). 
          seed: int type. The random seed to shuffle dataset. Default is None, means no use of random seed.
          do_lower_case: bool type. Whether to do lowercase on English text. Default is False. This argument only works on English text.
          phase: the running phase of this reader. Supported phase: train, predict. Default is train.

        Return:
            a Reader object for classification task.
        """

        Reader.__init__(self, phase)

        assert lang.lower() in ['en', 'cn', 'english', 'chinese'], "supported language: en (English), cn (Chinese)."
        assert phase in ['train', 'predict'], "supported phase: train, predict."

        for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'

        self._register.add('token_ids')
        if phase == 'train':
            self._register.add('label_ids')

        self._is_training = phase == 'train'

        cls_reader = CLSReader(vocab_path,
                                max_seq_len=max_len,
                                do_lower_case=do_lower_case,
                                for_cn=for_cn,
                                random_seed=seed)
        self._reader = cls_reader

        self._phase = phase
        # self._batch_size = 
        # self._print_first_n = config.get('print_first_n', 0)


    @property
    def outputs_attr(self):
        """The contained output items (input features) of this reader."""
        attrs = {"token_ids": [[-1, -1], 'int64'],
                "position_ids": [[-1, -1], 'int64'],
                "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
                "label_ids": [[-1], 'int64'],
                "task_ids": [[-1, -1], 'int64']
                }
        return self._get_registed_attrs(attrs)


    def load_data(self, input_file, batch_size, num_epochs=None, \
                  file_format='tsv', shuffle_train=True):
        """Load classification data into reader. 

        Args:
            input_file: the dataset file path. File format should keep consistent with `file_format` argument.
            batch_size: number of examples for once yield. CAUSIOUS! If your environment exists multiple GPU devices (marked as dev_count), the batch_size should be divided by dev_count with no remainder!
            num_epochs: the travelsal times of input examples. Default is None, means once for single-task learning and automatically calculated for multi-task learning. This argument only works on train phase.
            file_format: the file format of input file. Supported format: tsv. Default is tsv.
            shuffle_train: whether to shuffle training dataset. Default is True. This argument only works on training phase.

        """
        self._batch_size = batch_size
        self._num_epochs = num_epochs
        self._data_generator = self._reader.data_generator( \
            input_file, batch_size, num_epochs if self._phase == 'train' else 1, \
            shuffle=shuffle_train if self._phase == 'train' else False, \
            phase=self._phase)

    def _iterator(self): 

        names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask', 
            'label_ids', 'unique_ids']
        for batch in self._data_generator():
            outputs = {n: i for n,i in zip(names, batch)}
            ret = {}
            # TODO: move runtime shape check here
            for attr in self.outputs_attr.keys():
                ret[attr] = outputs[attr]
            yield ret

    def get_epoch_outputs(self):
        return {'examples': self._reader.get_examples(self._phase),
                'features': self._reader.get_features(self._phase)}

    @property
    def num_examples(self):
        return self._reader.get_num_examples(phase=self._phase)

    @property
    def num_epochs(self):
        return self._num_epochs


================================================
FILE: paddlepalm/reader/match.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddlepalm.reader.base_reader import Reader
from paddlepalm.reader.utils.reader4ernie import ClassifyReader as CLSReader


class MatchReader(Reader):
    """
    The reader completes the loading and processing of matching-like task (e.g, query-query, question-answer, text similarity, natural language inference) dataset. Supported file format: tsv. 
    
    For pointwise learning strategy, there should be two fields in training dataset file, i.e., `text_a`, `text_b` and `label`. For pairwise learning, there should exist three fields, i.e., `text_a`, `text_b` and `text_b_neg`. For predicting, only `text_a` and `text_b` are required.
    
    A pointwise learning case shows as follows:
    ```
    label [TAB] text_a [TAB] text_b
    1 [TAB] Today is a good day. [TAB] what a nice day!
    0 [TAB] Such a terriable day! [TAB] There is a dog.
    1 [TAB] I feel lucky to meet you, dear. [TAB] You are my lucky, darling.
    1 [TAB] He likes sunshine and I like him :). [TAB] I like him. He like sunshine.
    0 [TAB] JUST! GO! OUT! [TAB] Come in please.
    ```
    A pairwise learning case shows as follows:
    text_a [TAB] text_b [TAB] text_b_neg
    Today is a good day. [TAB] what a nice day! [TAB] terriable day!
    Such a terriable day! [TAB] So terriable today! [TAB] There is a dog.
    I feel lucky to meet you, dear. [TAB] You are my lucky, darling. [TAB] Buy some bananas, okey?
    He likes sunshine and I like him :). [TAB] I like him. He like sunshine. [TAB] He has a dog.
    JUST! GO! OUT! [TAB] go out now! [TAB] Come in please.

    CAUTIOUS: the HEADER is required for each dataset file! And fields (columns) should be splited by Tab (\\t).

    """
    
    def __init__(self, vocab_path, max_len, tokenizer='wordpiece', lang='en', seed=None, \
        do_lower_case=False, learning_strategy='pointwise', phase='train', dev_count=1, print_prefix=''): 
        """Create a new Reader for classification task data.

        Args:
          vocab_path: the vocab file path to do tokenization and token_ids generation.
          max_len: The maximum length of the sequence (after word segmentation). The part exceeding max_len will be removed from right.
          tokenizer: string type. The name of the used tokenizer. A tokenizer is to convert raw text into tokens. Avaliable tokenizers: wordpiece.
          lang: the language of dataset. Supported language: en (English), cn (Chinese). Default is en (English). 
          seed: int type. The random seed to shuffle dataset. Default is None, means no use of random seed.
          do_lower_case: bool type. Whether to do lowercase on English text. Default is False. This argument only works on English text.
          learning_strategy: string type. This only works for training phase. Available strategies: pointwise, pairwise.
          phase: the running phase of this reader. Supported phase: train, predict. Default is train.

        Return:
            a Reader object for matching-like task.
        """

        Reader.__init__(self, phase)

        assert lang.lower() in ['en', 'cn', 'english', 'chinese'], "supported language: en (English), cn (Chinese)."
        assert phase in ['train', 'predict'], "supported phase: train, predict."

        for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'

        self._register.add('token_ids')
        if phase == 'train':
            if learning_strategy == 'pointwise':
                self._register.add('label_ids')
            if learning_strategy == 'pairwise':
                self._register.add('token_ids_neg')
                self._register.add('position_ids_neg')
                self._register.add('segment_ids_neg')
                self._register.add('input_mask_neg')
                self._register.add('task_ids_neg')

        self._is_training = phase == 'train'
        self._learning_strategy = learning_strategy


        match_reader = CLSReader(vocab_path,
                                max_seq_len=max_len,
                                do_lower_case=do_lower_case,
                                for_cn=for_cn,
                                random_seed=seed,
                                learning_strategy = learning_strategy)
            
        self._reader = match_reader
        self._dev_count = dev_count
        self._phase = phase


    @property
    def outputs_attr(self):
        attrs = {"token_ids": [[-1, -1], 'int64'],
                "position_ids": [[-1, -1], 'int64'],
                "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
                "task_ids": [[-1, -1], 'int64'],
                "label_ids": [[-1], 'int64'],
                "token_ids_neg": [[-1, -1], 'int64'],
                "position_ids_neg": [[-1, -1], 'int64'],
                "segment_ids_neg": [[-1, -1], 'int64'],
                "input_mask_neg": [[-1, -1, 1], 'float32'],
                "task_ids_neg": [[-1, -1], 'int64']
                }
        return self._get_registed_attrs(attrs)


    def load_data(self, input_file, batch_size, num_epochs=None, \
                  file_format='tsv', shuffle_train=True):
        """Load matching data into reader. 

        Args:
            input_file: the dataset file path. File format should keep consistent with `file_format` argument.
            batch_size: number of examples for once yield. CAUSIOUS! If your environment exists multiple GPU devices (marked as dev_count), the batch_size should be divided by dev_count with no remainder!
            num_epochs: the travelsal times of input examples. Default is None, means once for single-task learning and automatically calculated for multi-task learning. This argument only works on train phase.
            file_format: the file format of input file. Supported format: tsv. Default is tsv.
            shuffle_train: whether to shuffle training dataset. Default is True. This argument only works on training phase.

        """
        self._batch_size = batch_size
        self._num_epochs = num_epochs
        self._data_generator = self._reader.data_generator( \
            input_file, batch_size, num_epochs if self._phase == 'train' else 1, \
            shuffle=shuffle_train if self._phase == 'train' else False, \
            phase=self._phase)

    def _iterator(self): 

        
        names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask', 'label_ids', \
            'token_ids_neg', 'segment_ids_neg', 'position_ids_neg', 'task_ids_neg', 'input_mask_neg']
        
        if self._learning_strategy == 'pairwise':
            names.remove('label_ids')


        for batch in self._data_generator():
            outputs = {n: i for n,i in zip(names, batch)}
            ret = {}
            # TODO: move runtime shape check here
            for attr in self.outputs_attr.keys():
                ret[attr] = outputs[attr]
            yield ret

    @property
    def num_examples(self):
        return self._reader.get_num_examples(phase=self._phase)

    @property
    def num_epochs(self):
        return self._num_epochs


================================================
FILE: paddlepalm/reader/mlm.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddlepalm.reader.base_reader import Reader
from paddlepalm.reader.utils.reader4ernie import MaskLMReader as MLMReader
import numpy as np

class MaskLMReader(Reader):
    
    def __init__(self, vocab_path, max_len, tokenizer='wordpiece', \
             lang='en', seed=None, do_lower_case=False, phase='train', dev_count=1, print_prefix=''):
        """
        Args:
            phase: train, eval, pred
        """


        Reader.__init__(self, phase)

        assert lang.lower() in ['en', 'cn', 'english', 'chinese'], "supported language: en (English), cn (Chinese)."
        assert phase in ['train', 'predict'], "supported phase: train, predict."

        for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'

        self._register.add('mask_pos')
        if phase == 'train':
            self._register.add('mask_label')
        self._is_training = phase == 'train'

        mlm_reader = MLMReader(vocab_path,
                                max_seq_len=max_len,
                                do_lower_case=do_lower_case,
                                for_cn=for_cn,
                                random_seed=seed)
        self._reader = mlm_reader

        self._phase = phase
        self._dev_count = dev_count


    @property
    def outputs_attr(self):
        attrs = {"token_ids": [[-1, -1], 'int64'],
                "position_ids": [[-1, -1], 'int64'],
                "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
                "task_ids": [[-1, -1], 'int64'],
                "mask_label": [[-1], 'int64'],
                "mask_pos": [[-1], 'int64']
                }

        return self._get_registed_attrs(attrs)


    def load_data(self, input_file, batch_size, num_epochs=None, \
                  file_format='csv', shuffle_train=True):
        self._batch_size = batch_size
        self._num_epochs = num_epochs
        self._data_generator = self._reader.data_generator( \
            input_file, batch_size, num_epochs if self._phase == 'train' else 1, \
            shuffle=shuffle_train if self._phase == 'train' else False, \
            phase=self._phase)

    def _iterator(self): 

        names = ['token_ids', 'position_ids', 'segment_ids', 'input_mask', 
            'task_ids', 'mask_label', 'mask_pos']
        for batch in self._data_generator():
            outputs = {n: i for n,i in zip(names, batch)}
            ret = {}
            # TODO: move runtime shape check here
            for attr in self.outputs_attr.keys():
                ret[attr] = outputs[attr]

            yield ret

    def get_epoch_outputs(self):
        return {'examples': self._reader.get_examples(self._phase),
                'features': self._reader.get_features(self._phase)}

    @property
    def num_examples(self):
        return self._reader.get_num_examples(phase=self._phase)

    @property
    def num_epochs(self):
        return self._num_epochs


================================================
FILE: paddlepalm/reader/mrc.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddlepalm.reader.base_reader import Reader
from paddlepalm.reader.utils.reader4ernie import MRCReader as MRCReader_t
import numpy as np

class MRCReader(Reader):
    """
    The reader completes the loading and processing of SQuAD like machine reading comprehension dataset. Supported file format: json. 
    
    The outermost data structure of a dataset is a dictionary, which contains the dataset version number field and data field. In the data field, each example contains the title of the article and several paragraphs. Each paragraph contains a paragraph context corresponed question-answer pairs. For each q-a pair, it contains a question with globally unique ID, as well as (several) answers. Each answer item contains the text of the answer itself and its starting position of the context. Note that the starting position is at the character level. In addition, for the test set, answers field is not necessary.

    A typical case is shown as follows.
    {"version": "1.0",
     "data": [
         {"title": "...",
          "paragraphs": [
             {"context": "...",
              "qas": [
                 {"question": "..."
                  "id": "..."
                  "answers": [
                     {"text": "...",
                      "answer_start": ...}
                     {...}
                     ...
                     ]
                  }
                  {...}
                  ...
                  ]
              }
              {...},
              ...
              ]
          }
          {...}
          ...
      ]
     }
    
    """

    def __init__(self, vocab_path, max_len, max_query_len, doc_stride, \
                 tokenizer='wordpiece', lang='en', seed=None, do_lower_case=False, \
                 remove_noanswer=True, phase='train'):
        """Create a new Reader for loading and processing machine reading comprehension task data.

        Args:
          vocab_path: the vocab file path to do tokenization and token_ids generation.
          max_len: the maximum length of the sequence (after word segmentation). The part exceeding max_len will be removed from right.
          max_query_len: the maximum length of query/question (after word segmentation).
          doc_stride: the slice stride of context window.
          tokenizer: string type. The name of the used tokenizer. A tokenizer is to convert raw text into tokens. Avaliable tokenizers: wordpiece.
          lang: the language of dataset. Supported language: en (English), cn (Chinese). Default is en (English). 
          seed: int type. The random seed to shuffle dataset. Default is None, means no use of random seed.
          do_lower_case: bool type. Whether to do lowercase on English text. Default is False. This argument only works on English text.
          remove_noanswer: bool type. Whether to remove no answer question and invalid answer.
          phase: the running phase of this reader. Supported phase: train, predict. Default is train.

        Return:
            a Reader object for classification task.
        """

        Reader.__init__(self, phase)


        assert lang.lower() in ['en', 'cn', 'english', 'chinese'], "supported language: en (English), cn (Chinese)."
        assert phase in ['train', 'predict'], "supported phase: train, predict."

        for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'


        self._register.add('token_ids')
        if phase == 'train':
            self._register.add("start_positions")
            self._register.add("end_positions")
        else:
            self._register.add("unique_ids")
            

        self._is_training = phase == 'train'

        mrc_reader = MRCReader_t(vocab_path,
                                 max_seq_len=max_len,
                                 do_lower_case=do_lower_case,
                                 tokenizer=tokenizer,
                                 doc_stride=doc_stride,
                                 remove_noanswer=remove_noanswer,
                                 max_query_length=max_query_len,
                                 for_cn=for_cn,
                                 random_seed=seed)
        self._reader = mrc_reader

        self._phase = phase
 

    @property
    def outputs_attr(self):
        attrs = {"token_ids": [[-1, -1], 'int64'],
                "position_ids": [[-1, -1], 'int64'],
                "segment_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
                "start_positions": [[-1], 'int64'],
                "end_positions": [[-1], 'int64'],
                "task_ids": [[-1, -1], 'int64'],
                "unique_ids": [[-1], 'int64']
                }
        return self._get_registed_attrs(attrs)

    @property
    def epoch_outputs_attr(self):
        if not self._is_training:
            return {"examples": None,
                    "features": None}

    def load_data(self, input_file, batch_size, num_epochs=None, file_format='csv', shuffle_train=True):
        """Load mrc data into reader. 

        Args:
            input_file: the dataset file path. File format should keep consistent with `file_format` argument.
            batch_size: number of examples for once yield. CAUSIOUS! If your environment exists multiple GPU devices (marked as dev_count), the batch_size should be divided by dev_count with no remainder!
            num_epochs: the travelsal times of input examples. Default is None, means once for single-task learning and automatically calculated for multi-task learning. This argument only works on train phase.
            file_format: the file format of input file. Supported format: tsv. Default is tsv.
            shuffle_train: whether to shuffle training dataset. Default is True. This argument only works on training phase.

        """
        self._batch_size = batch_size
        self._num_epochs = num_epochs
        self._data_generator = self._reader.data_generator( \
            input_file, batch_size, num_epochs if self._phase == 'train' else 1, \
            shuffle=shuffle_train if self._phase == 'train' else False, \
            phase=self._phase)
    def _iterator(self): 

        names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask', 
            'start_positions', 'end_positions', 'unique_ids']
        
        if self._is_training:
            names.remove('unique_ids')
        
        for batch in self._data_generator():
            outputs = {n: i for n,i in zip(names, batch)}
            ret = {}
            # TODO: move runtime shape check here
            for attr in self.outputs_attr.keys():
                ret[attr] = outputs[attr]
            if not self._is_training:
                assert 'unique_ids' in ret, ret
            yield ret
    

    def get_epoch_outputs(self):

        return {'examples': self._reader.get_examples(self._phase),
                'features': self._reader.get_features(self._phase)}

    @property
    def num_examples(self):
        return self._reader.get_num_examples(phase=self._phase)

    @property
    def num_epochs(self):
        return self._num_epochs


================================================
FILE: paddlepalm/reader/seq_label.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from paddlepalm.reader.base_reader import Reader
from paddlepalm.reader.utils.reader4ernie import SequenceLabelReader as SLReader

class SequenceLabelReader(Reader):
    """
    The reader completes the loading and processing of sequence labeling type task (e.g, pos tagging, named entity recognition) dataset. Supported file format: tsv. 
    """
    
    def __init__(self, vocab_path, max_len, label_map_config, tokenizer='wordpiece', \
             lang='en', seed=None, do_lower_case=False, phase='train', dev_count=1, print_prefix=''):
        """  
        Args:
            phase: train, eval, pred
            lang: en, ch, ...
        """
        
        Reader.__init__(self, phase)

        assert lang.lower() in ['en', 'cn', 'english', 'chinese'], "supported language: en (English), cn (Chinese)."
        assert phase in ['train', 'predict'], "supported phase: train, predict."

        for_cn = lang.lower() == 'cn' or lang.lower() == 'chinese'

        self._register.add('token_ids')
        self._register.add('seq_lens')
        if phase == 'train':
            self._register.add('label_ids')

        self._is_training = phase == 'train'

        ner_reader = SLReader(vocab_path,
                                max_seq_len=max_len,
                                do_lower_case=do_lower_case,
                                for_cn=for_cn,
                                random_seed=seed,
                                label_map_config=label_map_config)
        self._reader = ner_reader
        self._phase = phase
        self._dev_count = dev_count

 
    @property
    def outputs_attr(self):
        attrs = {"token_ids": [[-1, -1], 'int64'],
                "position_ids": [[-1, -1], 'int64'],
                "segment_ids": [[-1, -1], 'int64'],
                "task_ids": [[-1, -1], 'int64'],
                "input_mask": [[-1, -1, 1], 'float32'],
                "seq_lens": [[-1], 'int64'],
                "label_ids": [[-1, -1], 'int64']}
        return self._get_registed_attrs(attrs)


    def load_data(self, input_file, batch_size, num_epochs=None, \
                  file_format='tsv', shuffle_train=True):
        """Load sequence labeling data into reader. 

        Args:
            input_file: the dataset file path. File format should keep consistent with `file_format` argument.
            batch_size: number of examples for once yield. CAUSIOUS! If your environment exists multiple GPU devices (marked as dev_count), the batch_size should be divided by dev_count with no remainder!
            num_epochs: the travelsal times of input examples. Default is None, means once for single-task learning and automatically calculated for multi-task learning. This argument only works on train phase.
            file_format: the file format of input file. Supported format: tsv. Default is tsv.
            shuffle_train: whether to shuffle training dataset. Default is True. This argument only works on training phase.

        """
        self._batch_size = batch_size
        self._num_epochs = num_epochs
        self._data_generator = self._reader.data_generator( \
            input_file, batch_size, num_epochs if self._phase == 'train' else 1, \
            shuffle=shuffle_train if self._phase == 'train' else False, \
            phase=self._phase)

    def _iterator(self): 

        names = ['token_ids', 'segment_ids', 'position_ids', 'task_ids', 'input_mask', 
            'label_ids', 'seq_lens', 'label_ids']
        for batch in self._data_generator():
            outputs = {n: i for n,i in zip(names, batch)}
            ret = {}
            # TODO: move runtime shape check here
            for attr in self.outputs_attr.keys():
                ret[attr] = outputs[attr]
            yield ret

    def get_epoch_outputs(self):
        return {'examples': self._reader.get_examples(self._phase),
                'features': self._reader.get_features(self._phase)}

    @property
    def num_examples(self):
        return self._reader.get_num_examples(phase=self._phase)

    @property
    def num_epochs(self):
        return self._num_epochs


================================================
FILE: paddlepalm/reader/utils/__init__.py
================================================


================================================
FILE: paddlepalm/reader/utils/batching4bert.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np


def mask(batch_tokens, total_token_num, vocab_size, CLS=1, SEP=2, MASK=3):
    """
    Add mask for batch_tokens, return out, mask_label, mask_pos;
    Note: mask_pos responding the batch_tokens after padded;
    """
    max_len = max([len(sent) for sent in batch_tokens])
    mask_label = []
    mask_pos = []
    prob_mask = np.random.rand(total_token_num)
    # Note: the first token is [CLS], so [low=1]
    replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
    pre_sent_len = 0
    prob_index = 0
    for sent_index, sent in enumerate(batch_tokens):
        mask_flag = False
        prob_index += pre_sent_len
        for token_index, token in enumerate(sent):
            prob = prob_mask[prob_index + token_index]
            if prob > 0.15:
                continue
            elif 0.03 < prob <= 0.15:
                # mask
                if token != SEP and token != CLS:
                    mask_label.append(sent[token_index])
                    sent[token_index] = MASK
                    mask_flag = True
                    mask_pos.append(sent_index * max_len + token_index)
            elif 0.015 < prob <= 0.03:
                # random replace
                if token != SEP and token != CLS:
                    mask_label.append(sent[token_index])
                    sent[token_index] = replace_ids[prob_index + token_index]
                    mask_flag = True
                    mask_pos.append(sent_index * max_len + token_index)
            else:
                # keep the original token
                if token != SEP and token != CLS:
                    mask_label.append(sent[token_index])
                    mask_pos.append(sent_index * max_len + token_index)
        pre_sent_len = len(sent)
        # ensure at least mask one word in a sentence
        while not mask_flag:
            token_index = int(np.random.randint(1, high=len(sent) - 1, size=1))
            if sent[token_index] != SEP and sent[token_index] != CLS:
                mask_label.append(sent[token_index])
                sent[token_index] = MASK
                mask_flag = True
                mask_pos.append(sent_index * max_len + token_index)
    mask_label = np.array(mask_label).astype("int64").reshape([-1])
    mask_pos = np.array(mask_pos).astype("int64").reshape([-1])
    return batch_tokens, mask_label, mask_pos


def prepare_batch_data(insts,
                       total_token_num,
                       max_len=None,
                       voc_size=0,
                       pad_id=None,
                       cls_id=None,
                       sep_id=None,
                       mask_id=None,
                       return_input_mask=True,
                       return_max_len=True,
                       return_num_token=False):
    """
    1. generate Tensor of data
    2. generate Tensor of position
    3. generate self attention mask, [shape: batch_size *  max_len * max_len]
    """
    batch_src_ids = [inst[0] for inst in insts]
    batch_sent_ids = [inst[1] for inst in insts]
    batch_pos_ids = [inst[2] for inst in insts]
    labels_list = []
    # compatible with mrqa, whose example includes start/end positions, 
    # or unique id
    for i in range(3, len(insts[0]), 1):
        labels = [inst[i] for inst in insts]
        labels = np.array(labels).astype("int64").reshape([-1])
        labels_list.append(labels)
    # First step: do mask without padding
    if mask_id >= 0:
        out, mask_label, mask_pos = mask(
            batch_src_ids,
            total_token_num,
            vocab_size=voc_size,
            CLS=cls_id,
            SEP=sep_id,
            MASK=mask_id)
    else:
        out = batch_src_ids
    # Second step: padding
    src_id, self_input_mask = pad_batch_data(
        out, 
        max_len=max_len,
        pad_idx=pad_id, return_input_mask=True)
    pos_id = pad_batch_data(
        batch_pos_ids,
        max_len=max_len,
        pad_idx=pad_id,
        return_pos=False,
        return_input_mask=False)
    sent_id = pad_batch_data(
        batch_sent_ids,
        max_len=max_len,
        pad_idx=pad_id,
        return_pos=False,
        return_input_mask=False)
    if mask_id >= 0:
        return_list = [
            src_id, pos_id, sent_id, self_input_mask, mask_label, mask_pos
        ] + labels_list
    else:
        return_list = [src_id, pos_id, sent_id, self_input_mask] + labels_list
    return return_list if len(return_list) > 1 else return_list[0]


def pad_batch_data(insts,
                   max_len=None,
                   pad_idx=0,
                   return_pos=False,
                   return_input_mask=False,
                   return_max_len=False,
                   return_num_token=False):
    """
    Pad the instances to the max sequence length in batch, and generate the
    corresponding position data and input mask.
    """
    return_list = []
    if max_len is None:
        max_len = max(len(inst) for inst in insts)
    # Any token included in dict can be used to pad, since the paddings' loss
    # will be masked out by weights and make no effect on parameter gradients.
    inst_data = np.array([
        list(inst) + list([pad_idx] * (max_len - len(inst))) for inst in insts
    ])
    return_list += [inst_data.astype("int64").reshape([-1, max_len])]
    # position data
    if return_pos:
        inst_pos = np.array([
            list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
            for inst in insts
        ])
        return_list += [inst_pos.astype("int64").reshape([-1, max_len])]
    if return_input_mask:
        # This is used to avoid attention on paddings.
        input_mask_data = np.array([[1] * len(inst) + [0] *
                                    (max_len - len(inst)) for inst in insts])
        input_mask_data = np.expand_dims(input_mask_data, axis=-1)
        return_list += [input_mask_data.astype("float32")]
    if return_max_len:
        return_list += [max_len]
    if return_num_token:
        num_token = 0
        for inst in insts:
            num_token += len(inst)
        return_list += [num_token]
    return return_list if len(return_list) > 1 else return_list[0]


if __name__ == "__main__":
    pass


================================================
FILE: paddlepalm/reader/utils/batching4ernie.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from six.moves import xrange


def mask(batch_tokens,
         seg_labels,
         mask_word_tags,
         total_token_num,
         vocab_size,
         CLS=1,
         SEP=2,
         MASK=3):
    """
    Add mask for batch_tokens, return out, mask_label, mask_pos;
    Note: mask_pos responding the batch_tokens after padded;
    """
    max_len = max([len(sent) for sent in batch_tokens])
    mask_label = []
    mask_pos = []
    prob_mask = np.random.rand(total_token_num)
    # Note: the first token is [CLS], so [low=1]
    replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
    pre_sent_len = 0
    prob_index = 0
    for sent_index, sent in enumerate(batch_tokens):
        mask_flag = False
        mask_word = mask_word_tags[sent_index]
        prob_index += pre_sent_len
        if mask_word:
            beg = 0
            for token_index, token in enumerate(sent):
                seg_label = seg_labels[sent_index][token_index]
                if seg_label == 1:
                    continue
                if beg == 0:
                    if seg_label != -1:
                        beg = token_index
                    continue

                prob = prob_mask[prob_index + beg]
                if prob > 0.15:
                    pass
                else:
                    for index in xrange(beg, token_index):
                        prob = prob_mask[prob_index + index]
                        base_prob = 1.0
                        if index == beg:
                            base_prob = 0.15
                        if base_prob * 0.2 < prob <= base_prob:
                            mask_label.append(sent[index])
                            sent[index] = MASK
                            mask_flag = True
                            mask_pos.append(sent_index * max_len + index)
                        elif base_prob * 0.1 < prob <= base_prob * 0.2:
                            mask_label.append(sent[index])
                            sent[index] = replace_ids[prob_index + index]
                            mask_flag = True
                            mask_pos.append(sent_index * max_len + index)
                        else:
                            mask_label.append(sent[index])
                            mask_pos.append(sent_index * max_len + index)

                if seg_label == -1:
                    beg = 0
                else:
                    beg = token_index
        else:
            for token_index, token in enumerate(sent):
                prob = prob_mask[prob_index + token_index]
                if prob > 0.15:
                    continue
                elif 0.03 < prob <= 0.15:
                    # mask
                    if token != SEP and token != CLS:
                        mask_label.append(sent[token_index])
                        sent[token_index] = MASK
                        mask_flag = True
                        mask_pos.append(sent_index * max_len + token_index)
                elif 0.015 < prob <= 0.03:
                    # random replace
                    if token != SEP and token != CLS:
                        mask_label.append(sent[token_index])
                        sent[token_index] = replace_ids[prob_index +
                                                        token_index]
                        mask_flag = True
                        mask_pos.append(sent_index * max_len + token_index)
                else:
                    # keep the original token
                    if token != SEP and token != CLS:
                        mask_label.append(sent[token_index])
                        mask_pos.append(sent_index * max_len + token_index)

        pre_sent_len = len(sent)

    mask_label = np.array(mask_label).astype("int64").reshape([-1])
    mask_pos = np.array(mask_pos).astype("int64").reshape([-1])
    return batch_tokens, mask_label, mask_pos


def pad_batch_data(insts,
                   pad_idx=0,
                   return_pos=False,
                   return_input_mask=False,
                   return_max_len=False,
                   return_num_token=False,
                   return_seq_lens=False):
    """
    Pad the instances to the max sequence length in batch, and generate the
    corresponding position data and attention bias.
    """
    return_list = []
    max_len = max(len(inst) for inst in insts)
    # Any token included in dict can be used to pad, since the paddings' loss
    # will be masked out by weights and make no effect on parameter gradients.

    inst_data = np.array(
        [inst + list([pad_idx] * (max_len - len(inst))) for inst in insts])
    return_list += [inst_data.astype("int64").reshape([-1, max_len])]

    # position data
    if return_pos:
        inst_pos = np.array([
            list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
            for inst in insts
        ])

        return_list += [inst_pos.astype("int64").reshape([-1, max_len])]

    if return_input_mask:
        # This is used to avoid attention on paddings.
        input_mask_data = np.array([[1] * len(inst) + [0] *
                                    (max_len - len(inst)) for inst in insts])
        input_mask_data = np.expand_dims(input_mask_data, axis=-1)
        return_list += [input_mask_data.astype("float32")]

    if return_max_len:
        return_list += [max_len]

    if return_num_token:
        num_token = 0
        for inst in insts:
            num_token += len(inst)
        return_list += [num_token]

    if return_seq_lens:
        seq_lens = np.array([len(inst) for inst in insts])
        return_list += [seq_lens.astype("int64").reshape([-1])]

    return return_list if len(return_list) > 1 else return_list[0]


if __name__ == "__main__":

    pass


================================================
FILE: paddlepalm/reader/utils/mlm_batching.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Mask, padding and batching."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np


def mask(batch_tokens, total_token_num, vocab_size, CLS=1, SEP=2, MASK=3, dev_count=1):
    """
    Add mask for batch_tokens, return out, mask_label, mask_pos;
    Note: mask_pos responding the batch_tokens after padded;
    """
    max_len = max([len(sent) for sent in batch_tokens])

    multidev_batch_tokens = []
    multidev_mask_label = []
    multidev_mask_pos = []

    big_batch_tokens = batch_tokens
    stride = len(batch_tokens) // dev_count
    if stride == 0:
        return None, None, None
    p = stride

    for i in range(dev_count):
        batch_tokens = big_batch_tokens[p-stride:p]
        p += stride
        mask_label = []
        mask_pos = []
        prob_mask = np.random.rand(total_token_num)
        # Note: the first token is [CLS], so [low=1]
        replace_ids = np.random.randint(1, high=vocab_size, size=total_token_num)
        pre_sent_len = 0
        prob_index = 0
        for sent_index, sent in enumerate(batch_tokens):
            mask_flag = False
            prob_index += pre_sent_len
            for token_index, token in enumerate(sent):
                prob = prob_mask[prob_index + token_index]
                if prob > 0.15:
                    continue
                elif 0.03 < prob <= 0.15:
                    # mask
                    if token != SEP and token != CLS:
                        mask_label.append(sent[token_index])
                        sent[token_index] = MASK
                        mask_flag = True
                        mask_pos.append(sent_index * max_len + token_index)
                elif 0.015 < prob <= 0.03:
                    # random replace
                    if token != SEP and token != CLS:
                        mask_label.append(sent[token_index])
                        sent[token_index] = replace_ids[prob_index + token_index]
                        mask_flag = True
                        mask_pos.append(sent_index * max_len + token_index)
                else:
                    # keep the original token
                    if token != SEP and token != CLS:
                        mask_label.append(sent[token_index])
                        mask_pos.append(sent_index * max_len + token_index)
            pre_sent_len = len(sent)
            # ensure at least mask one word in a sentence
            while not mask_flag:
                token_index = int(np.random.randint(1, high=len(sent) - 1, size=1))
                if sent[token_index] != SEP and sent[token_index] != CLS:
                    mask_label.append(sent[token_index])
                    sent[token_index] = MASK
                    mask_flag = True
                    mask_pos.append(sent_index * max_len + token_index)
        mask_label = np.array(mask_label).astype("int64").reshape([-1])
        mask_pos = np.array(mask_pos).astype("int64").reshape([-1])

        multidev_batch_tokens.extend(batch_tokens)
        multidev_mask_label.append(mask_label)
        multidev_mask_pos.append(mask_pos)
    
    return multidev_batch_tokens, multidev_mask_label, multidev_mask_pos


def prepare_batch_data(insts,
                       total_token_num,
                       max_len=None,
                       voc_size=0,
                       pad_id=None,
                       cls_id=None,
                       sep_id=None,
                       mask_id=None,
                       task_id=0,
                       return_input_mask=True,
                       return_max_len=True,
                       return_num_token=False, 
                       dev_count=1):
    """
    1. generate Tensor of data
    2. generate Tensor of position
    3. generate self attention mask, [shape: batch_size *  max_len * max_len]
    """
    batch_src_ids = [inst[0] for inst in insts]
    batch_sent_ids = [inst[1] for inst in insts]
    batch_pos_ids = [inst[2] for inst in insts]

    # 这里是否应该反过来？？？否则在task layer里展开后的word embedding是padding后的，这时候word的index是跟没有padding时的index对不上的？
    # First step: do mask without padding
    out, mask_label, mask_pos = mask(
        batch_src_ids,
        total_token_num,
        vocab_size=voc_size,
        CLS=cls_id,
        SEP=sep_id,
        MASK=mask_id,
        dev_count=dev_count)
    # Second step: padding
    src_id, self_input_mask = pad_batch_data(
        out, 
        max_len=max_len,
        pad_idx=pad_id, return_input_mask=True)

    pos_id = pad_batch_data(
        batch_pos_ids,
        max_len=max_len,
        pad_idx=pad_id,
        return_pos=False,
        return_input_mask=False)
    sent_id = pad_batch_data(
        batch_sent_ids,
        max_len=max_len,
        pad_idx=pad_id,
        return_pos=False,
        return_input_mask=False)
    task_ids = np.ones_like(
        src_id, dtype="int64") * task_id
    return_list = [
        src_id, pos_id, sent_id, self_input_mask, task_ids, mask_label, mask_pos
    ]
    return return_list


def pad_batch_data(insts,
                   max_len=None,
                   pad_idx=0,
                   return_pos=False,
                   return_input_mask=False,
                   return_max_len=False,
                   return_num_token=False):
    """
    Pad the instances to the max sequence length in batch, and generate the
    corresponding position data and input mask.
    """
    return_list = []
    if max_len is None:
        max_len = max(len(inst) for inst in insts)
    # Any token included in dict can be used to pad, since the paddings' loss
    # will be masked out by weights and make no effect on parameter gradients.
    inst_data = np.array([
        list(inst) + list([pad_idx] * (max_len - len(inst))) for inst in insts
    ])
    return_list += [inst_data.astype("int64").reshape([-1, max_len])]
    # position data
    if return_pos:
        inst_pos = np.array([
            list(range(0, len(inst))) + [pad_idx] * (max_len - len(inst))
            for inst in insts
        ])
        return_list += [inst_pos.astype("int64").reshape([-1, max_len])]
    if return_input_mask:
        # This is used to avoid attention on paddings.
        input_mask_data = np.array([[1] * len(inst) + [0] *
                                    (max_len - len(inst)) for inst in insts])
        input_mask_data = np.expand_dims(input_mask_data, axis=-1)
        return_list += [input_mask_data.astype("float32")]
    if return_max_len:
        return_list += [max_len]
    if return_num_token:
        num_token = 0
        for inst in insts:
            num_token += len(inst)
        return_list += [num_token]
    return return_list if len(return_list) > 1 else return_list[0]


if __name__ == "__main__":
    pass


================================================
FILE: paddlepalm/reader/utils/mrqa_helper.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

class MRQAExample(object):
    """A single training/test example for simple sequence classification.

     For examples without an answer, the start and end position are -1.
  """

    def __init__(self,
                 qas_id,
                 question_text,
                 doc_tokens,
                 orig_answer_text=None,
                 start_position=None,
                 end_position=None,
                 is_impossible=False):
        self.qas_id = qas_id
        self.question_text = question_text
        self.doc_tokens = doc_tokens
        self.orig_answer_text = orig_answer_text
        self.start_position = start_position
        self.end_position = end_position
        self.is_impossible = is_impossible

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        s = ""
        s += "qas_id: %s" % (tokenization.printable_text(self.qas_id))
        s += ", question_text: %s" % (
            tokenization.printable_text(self.question_text))
        s += ", doc_tokens: [%s]" % (" ".join(self.doc_tokens))
        if self.start_position:
            s += ", start_position: %d" % (self.start_position)
        if self.start_position:
            s += ", end_position: %d" % (self.end_position)
        if self.start_position:
            s += ", is_impossible: %r" % (self.is_impossible)
        return s


class MRQAFeature(object):
    """A single set of features of data."""

    def __init__(self,
                 unique_id,
                 example_index,
                 doc_span_index,
                 tokens,
                 token_to_orig_map,
                 token_is_max_context,
                 input_ids,
                 input_mask,
                 segment_ids,
                 start_position=None,
                 end_position=None,
                 is_impossible=None):
        self.unique_id = unique_id
        self.example_index = example_index
        self.doc_span_index = doc_span_index
        self.tokens = tokens
        self.token_to_orig_map = token_to_orig_map
        self.token_is_max_context = token_is_max_context
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.start_position = start_position
        self.end_position = end_position
        self.is_impossible = is_impossible


================================================
FILE: paddlepalm/reader/utils/reader4ernie.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import

import sys
import os
import json
import random
import logging
import numpy as np
import six
from io import open
from collections import namedtuple

import paddlepalm as palm
import paddlepalm.tokenizer.ernie_tokenizer as tokenization
from paddlepalm.reader.utils.batching4ernie import pad_batch_data
from paddlepalm.reader.utils.mlm_batching import prepare_batch_data


log = logging.getLogger(__name__)

if six.PY3 and hasattr(sys.stdout, 'buffer'):
    import io
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
    sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')

if sys.version[0] == '2':
    reload(sys)
    sys.setdefaultencoding('utf-8')
else:
    import importlib
    importlib.reload(sys)

def csv_reader(fd, delimiter='\t'):
    def gen():
        for i in fd:
            yield i.rstrip('\n').split(delimiter)
    return gen()


class Reader(object):
    def __init__(self,
                 vocab_path,
                 label_map_config=None,
                 max_seq_len=512,
                 do_lower_case=True,
                 in_tokens=False,
                 is_inference=False,
                 learning_strategy='pointwise',
                 random_seed=None,
                 tokenizer="FullTokenizer",
                 phase='train',
                 is_classify=True,
                 is_regression=False,
                 for_cn=True,
                 task_id=0):
        assert phase in ['train', 'predict'], "supported phase: train, predict."
        self.max_seq_len = max_seq_len
        self.tokenizer = tokenization.FullTokenizer(
            vocab_file=vocab_path, do_lower_case=do_lower_case)
        self.vocab = self.tokenizer.vocab
        self.pad_id = self.vocab["[PAD]"]
        self.cls_id = self.vocab["[CLS]"]
        self.sep_id = self.vocab["[SEP]"]
        self.mask_id = self.vocab["[MASK]"]
        self.in_tokens = in_tokens
        self.phase = phase
        self.is_inference = is_inference
        self.learning_strategy = learning_strategy
        self.for_cn = for_cn
        self.task_id = task_id

        np.random.seed(random_seed)

        self.is_classify = is_classify
        self.is_regression = is_regression
        self.current_example = 0
        self.current_epoch = 0
        self.num_examples = 0
        self.examples = {}

        if label_map_config:
            with open(label_map_config, encoding='utf8') as f: 
                self.label_map = json.load(f)
        else:
            self.label_map = None

    def get_train_progress(self):
        """Gets progress for training phase."""
        return self.current_example, self.current_epoch

    def _read_tsv(self, input_file, quotechar=None):
        """Reads a tab separated value file."""
        with open(input_file, 'r', encoding='utf8') as f:
            reader = csv_reader(f)
            headers = next(reader)
            Example = namedtuple('Example', headers)

            examples = []
            for line in reader:
                example = Example(*line)
                examples.append(example)
            return examples

    def _truncate_seq_pair(self, tokens_a, tokens_b, max_length):
        """Truncates a sequence pair in place to the maximum length."""

        # This is a simple heuristic which will always truncate the longer sequence
        # one token at a time. This makes more sense than truncating an equal percent
        # of tokens from each, since if one sequence is very short then each token
        # that's truncated likely contains more information than a longer sequence.
        while True:
            total_length = len(tokens_a) + len(tokens_b)
            if total_length <= max_length:
                break
            if len(tokens_a) > len(tokens_b):
                tokens_a.pop()
            else:
                tokens_b.pop()
    

    def _convert_example_to_record(self, example, max_seq_length, tokenizer):
        """Converts a single `Example` into a single `Record`."""

        text_a = tokenization.convert_to_unicode(example.text_a)
        tokens_a = tokenizer.tokenize(text_a)
        tokens_b = None
        has_text_b = False
        has_text_b_neg = False
        if isinstance(example, dict):
            has_text_b = "text_b" in example.keys()
            has_text_b_neg = "text_b_neg" in example.keys()
        else:
            has_text_b = "text_b" in example._fields
            has_text_b_neg = "text_b_neg" in example._fields

        if has_text_b:
            text_b = tokenization.convert_to_unicode(example.text_b)
            tokens_b = tokenizer.tokenize(text_b)
            # Modifies `tokens_a` and `tokens_b` in place so that the total
            # length is less than the specified length.
            # Account for [CLS], [SEP], [SEP] with "- 3"
            self._truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
           
            if has_text_b_neg and self.phase == 'train':
                tokens_a_neg = tokenizer.tokenize(text_a)
                text_b_neg = tokenization.convert_to_unicode(example.text_b_neg)
                tokens_b_neg = tokenizer.tokenize(text_b_neg)
                self._truncate_seq_pair(tokens_a_neg, tokens_b_neg, max_seq_length - 3)
        else:
            # Account for [CLS] and [SEP] with "- 2"
            if len(tokens_a) > max_seq_length - 2:
                tokens_a = tokens_a[0:(max_seq_length - 2)]
        

        # The convention in BERT/ERNIE is:
        # (a) For sequence pairs:
        #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
        #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
        # (b) For single sequences:
        #  tokens:   [CLS] the dog is hairy . [SEP]
        #  type_ids: 0     0   0   0  0     0 0
        #
        # Where "type_ids" are used to indicate whether this is the first
        # sequence or the second sequence. The embedding vectors for `type=0` and
        # `type=1` were learned during pre-training and are added to the wordpiece
        # embedding vector (and position vector). This is not *strictly* necessary
        # since the [SEP] token unambiguously separates the sequences, but it makes
        # it easier for the model to learn the concept of sequences.
        #
        # For classification tasks, the first vector (corresponding to [CLS]) is
        # used as as the "sentence vector". Note that this only makes sense because
        # the entire model is fine-tuned.
        tokens = []
        text_type_ids = []
        tokens.append("[CLS]")
        
        text_type_ids.append(0)
        for token in tokens_a:
            tokens.append(token)
            text_type_ids.append(0)
        tokens.append("[SEP]")
        text_type_ids.append(0)

        if tokens_b:
            for token in tokens_b:
                tokens.append(token)
                text_type_ids.append(1)
            tokens.append("[SEP]")
            text_type_ids.append(1)

        token_ids = tokenizer.convert_tokens_to_ids(tokens)
        position_ids = list(range(len(token_ids)))


        if has_text_b_neg and self.phase == 'train':
            tokens_neg = []
            text_type_ids_neg = []
            tokens_neg.append("[CLS]")
            text_type_ids_neg.append(0)
            for token in tokens_a_neg:
                tokens_neg.append(token)
                text_type_ids_neg.append(0)
            tokens_neg.append("[SEP]")
            text_type_ids_neg.append(0)

            if tokens_b_neg:
                for token in tokens_b_neg:
                    tokens_neg.append(token)
                    text_type_ids_neg.append(1)
                tokens_neg.append("[SEP]")
                text_type_ids_neg.append(1)

            token_ids_neg = tokenizer.convert_tokens_to_ids(tokens_neg)
            position_ids_neg = list(range(len(token_ids_neg)))


        if self.is_inference:
            Record = namedtuple('Record',
                                ['token_ids', 'text_type_ids', 'position_ids'])
            record = Record(
                token_ids=token_ids,
                text_type_ids=text_type_ids,
                position_ids=position_ids)
        else:
            qid = None
            if "qid" in example._fields:
                qid = example.qid
            if self.learning_strategy == 'pairwise' and self.phase == 'train':
                Record = namedtuple('Record',
                                    ['token_ids', 'text_type_ids', 'position_ids', 'token_ids_neg', 'text_type_ids_neg', 'position_ids_neg', 'qid'])
                
                record = Record(
                    token_ids=token_ids,
                    text_type_ids=text_type_ids,
                    position_ids=position_ids,
                    token_ids_neg=token_ids_neg,
                    text_type_ids_neg=text_type_ids_neg,
                    position_ids_neg=position_ids_neg,
                    qid=qid)
 
            else:
                if self.label_map:
                    label_id = self.label_map[example.label]
                else:
                    label_id = example.label

                Record = namedtuple('Record', [
                    'token_ids', 'text_type_ids', 'position_ids', 'label_id', 'qid'
                ])

                record = Record(
                    token_ids=token_ids,
                    text_type_ids=text_type_ids,
                    position_ids=position_ids,
                    label_id=label_id,
                    qid=qid)
        return record

    def _prepare_batch_data(self, examples, batch_size, phase='train'):
        """generate batch records"""
        batch_records, max_len = [], 0
        if len(examples) < batch_size:
            raise Exception('CLS dataset contains too few samples. Expect more than '+str(batch_size))
        for index, example in enumerate(examples):
            if phase == "train":
                self.current_example = index
            record = self._convert_example_to_record(example, self.max_seq_len,
                                                     self.tokenizer)                                       
            max_len = max(max_len, len(record.token_ids))
            if self.in_tokens:
                to_append = (len(batch_records) + 1) * max_len <= batch_size
            else:
                to_append = len(batch_records) < batch_size
            if to_append:
                batch_records.append(record)
            else:
                batch_pad_records = self._pad_batch_records(batch_records)
                ds = ['s'] * len(batch_pad_records)
                for piece in palm.distribute.yield_pieces(batch_pad_records, ds, batch_size):
                    yield piece
                batch_records, max_len = [record], len(record.token_ids)
      
        if phase == 'predict' and batch_records:
            for piece in palm.distribute.yield_pieces(\
                        self._pad_batch_records(batch_records),
                        ds, batch_size):
                yield piece

    def get_num_examples(self, input_file=None, phase='train'):
        if input_file is None:
            return len(self.examples.get(phase, []))
        else:
            # assert input_file is not None, "Argument input_file should be given or the data_generator should be created when this func is called."
            examples = self._read_tsv(input_file)
            return len(examples)

    def data_generator(self,
                       input_file,
                       batch_size,
                       epoch,
                       dev_count=1,
                       shuffle=True,
                       phase=None):
        examples = self._read_tsv(input_file)
        if phase is None:
            phase = 'all'
        self.examples[phase] = examples

        def wrapper():
            all_dev_batches = []
            if epoch is None:
                num_epochs = 99999999
            else:
                num_epochs = epoch
            for epoch_index in range(num_epochs):
                if phase == "train":
                    self.current_example = 0
                    self.current_epoch = epoch_index
                if shuffle:
                    np.random.shuffle(examples)

                for batch_data in self._prepare_batch_data(
                        examples, batch_size, phase=phase):
                    if len(all_dev_batches) < dev_count:
                        all_dev_batches.append(batch_data)
                    if len(all_dev_batches) == dev_count:
                        for batch in all_dev_batches:
                            yield batch
                        
                        all_dev_batches = []
        def f():
            for i in wrapper():
                yield i
        return f
        # return wrapper


class MaskLMReader(Reader):

    def _convert_example_to_record(self, example, max_seq_length, tokenizer):
        """Converts a single `Example` into a single `Record`."""

        text_a = tokenization.convert_to_unicode(example.text_a)
        tokens_a = tokenizer.tokenize(text_a)
        tokens_b = None 

        has_text_b = False
        if isinstance(example, dict):
            has_text_b = "text_b" in example.keys()
        else:
            has_text_b = "text_b" in example._fields

        if has_text_b:
            text_b = tokenization.convert_to_unicode(example.text_b)
            tokens_b = tokenizer.tokenize(text_b)

        if tokens_b:
            # Modifies `tokens_a` and `tokens_b` in place so that the total
            # length is less than the specified length.
            # Account for [CLS], [SEP], [SEP] with "- 3"
            self._truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
        else:
            # Account for [CLS] and [SEP] with "- 2"
            if len(tokens_a) > max_seq_length - 2:
                tokens_a = tokens_a[0:(max_seq_length - 2)]

        # The convention in BERT/ERNIE is:
        # (a) For sequence pairs:
        #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
        #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
        # (b) For single sequences:
        #  tokens:   [CLS] the dog is hairy . [SEP]
        #  type_ids: 0     0   0   0  0     0 0
        #
        # Where "type_ids" are used to indicate whether this is the first
        # sequence or the second sequence. The embedding vectors for `type=0` and
        # `type=1` were learned during pre-training and are added to the wordpiece
        # embedding vector (and position vector). This is not *strictly* necessary
        # since the [SEP] token unambiguously separates the sequences, but it makes
        # it easier for the model to learn the concept of sequences.
        #
        # For classification tasks, the first vector (corresponding to [CLS]) is
        # used as as the "sentence vector". Note that this only makes sense because
        # the entire model is fine-tuned.
        tokens = []
        text_type_ids = []
        tokens.append("[CLS]")
        text_type_ids.append(0)
        for token in tokens_a:
            tokens.append(token)
            text_type_ids.append(0)
        tokens.append("[SEP]")
        text_type_ids.append(0)

        if tokens_b:
            for token in tokens_b:
                tokens.append(token)
                text_type_ids.append(1)
            tokens.append("[SEP]")
            text_type_ids.append(1)

        token_ids = tokenizer.convert_tokens_to_ids(tokens)
        position_ids = list(range(len(token_ids)))

        return [token_ids, text_type_ids, position_ids]

    def batch_reader(self, examples, batch_size, in_tokens, phase):
        batch = []
        total_token_num = 0
        if len(examples) < batch_size:
            raise Exception('MaskLM dataset contains too few samples. Expect more than '+str(batch_size))
        for e in examples:
            parsed_line = self._convert_example_to_record(e, self.max_seq_len, self.tokenizer)
            to_append = len(batch) < batch_size
            if to_append:
                batch.append(parsed_line)
                total_token_num += len(parsed_line[0])
            else:
                yield batch, total_token_num
                batch = [parsed_line]
                total_token_num = len(parsed_line[0])

        if len(batch) > 0 and phase == 'predict':
            yield batch, total_token_num

    def data_generator(self,
                       input_file,
                       batch_size,
                       epoch,
                       dev_count=1,
                       shuffle=True,
                       phase=None):
        examples = self._read_tsv(input_file)
        if phase is None:
            phase = 'all'
        self.examples[phase] = examples

        def wrapper():
            all_dev_batches = []
            if epoch is None:
                num_epochs = 99999999
            else:
                num_epochs = epoch
            for epoch_index in range(num_epochs):
                if phase == "train":
                    self.current_example = 0
                    self.current_epoch = epoch_index
                if shuffle:
                    np.random.shuffle(examples)

                all_dev_batches = []
                for batch_data, num_tokens in self.batch_reader(examples, 
                                                    batch_size, self.in_tokens, phase=phase):
                    batch_data = prepare_batch_data(
                        batch_data,
                        num_tokens,
                        voc_size=len(self.vocab),
                        pad_id=self.pad_id,
                        cls_id=self.cls_id,
                        sep_id=self.sep_id,
                        mask_id=self.mask_id,
                        # max_len=self.max_seq_len, # 注意，如果padding到最大长度，会导致mask_pos与实际位置不对应。因为mask pos是基于batch内最大长度来计算的。
                        return_input_mask=True,
                        return_max_len=False,
                        return_num_token=False,
                        dev_count=dev_count)

                    # yield batch
                    for piece in palm.distribute.yield_pieces(batch_data, ['s', 's', 's', 's', 's', 'u', 'u'], batch_size):
                        yield piece
                    # # ds = ['s'] * len(batch_data)
                    # for piece in palm.distribute.yield_pieces(batch_data, ['s'] * 7, batch_size):
                    #     yield piece

        return wrapper


class ClassifyReader(Reader):
    def _read_tsv(self, input_file, quotechar=None):
        """Reads a tab separated value file."""
        with open(input_file, 'r', encoding='utf8') as f:
            reader = csv_reader(f)
            headers = next(reader)
            text_indices = [
                index for index, h in enumerate(headers) if h != "label"
            ]
            Example = namedtuple('Example', headers)
            examples = []
            for line in reader:
                for index, text in enumerate(line):
                    if index in text_indices:
                        if self.for_cn:
                            line[index] = text.replace(' ', '')
                        else:
                            line[index] = text
                example = Example(*line)
                examples.append(example)
            return examples

    def _pad_batch_records(self, batch_records):
        batch_token_ids = [record.token_ids for record in batch_records]
        batch_text_type_ids = [record.text_type_ids for record in batch_records]
        batch_position_ids = [record.position_ids for record in batch_records]
        if self.phase=='train' and self.learning_strategy == 'pairwise':
            batch_token_ids_neg = [record.token_ids_neg for record in batch_records]
            batch_text_type_ids_neg = [record.text_type_ids_neg for record in batch_records]
            batch_position_ids_neg = [record.position_ids_neg for record in batch_records]

        if not self.is_inference:
            if not self.learning_strategy == 'pairwise':
                batch_labels = [record.label_id for record in batch_records]
                if self.is_classify:
                    batch_labels = np.array(batch_labels).astype("int64").reshape(
                        [-1])
                elif self.is_regression:
                    batch_labels = np.array(batch_labels).astype("float32").reshape(
                        [-1])

            if batch_records[0].qid:
                batch_qids = [record.qid for record in batch_records]
                batch_qids = np.array(batch_qids).astype("int64").reshape(
                    [-1])
            else:
                batch_qids = np.array([]).astype("int64").reshape([-1])

        # padding
        padded_token_ids, input_mask = pad_batch_data(
            batch_token_ids, pad_idx=self.pad_id, return_input_mask=True)
        padded_text_type_ids = pad_batch_data(
            batch_text_type_ids, pad_idx=self.pad_id)
        padded_position_ids = pad_batch_data(
            batch_position_ids, pad_idx=self.pad_id)
        padded_task_ids = np.ones_like(
            padded_token_ids, dtype="int64") * self.task_id

        return_list = [
            padded_token_ids, padded_text_type_ids, padded_position_ids,
            padded_task_ids, input_mask
        ]

        if self.phase=='train':
            if self.learning_strategy == 'pairwise':
                padded_token_ids_neg, input_mask_neg = pad_batch_data(
                    batch_token_ids_neg, pad_idx=self.pad_id, return_input_mask=True)
                padded_text_type_ids_neg = pad_batch_data(
                    batch_text_type_ids_neg, pad_idx=self.pad_id)
                padded_position_ids_neg = pad_batch_data(
                    batch_position_ids_neg, pad_idx=self.pad_id)
                padded_task_ids_neg = np.ones_like(
                    padded_token_ids_neg, dtype="int64") * self.task_id

                return_list += [padded_token_ids_neg, padded_text_type_ids_neg, \
                                padded_position_ids_neg, padded_task_ids_neg, input_mask_neg]

            elif self.learning_strategy == 'pointwise':
                return_list += [batch_labels]

        return return_list


class SequenceLabelReader(Reader):
    def _pad_batch_records(self, batch_records):
        batch_token_ids = [record.token_ids for record in batch_records]
        batch_text_type_ids = [record.text_type_ids for record in batch_records]
        batch_position_ids = [record.position_ids for record in batch_records]
        batch_label_ids = [record.label_ids for record in batch_records]

        # padding
        padded_token_ids, input_mask, batch_seq_lens = pad_batch_data(
            batch_token_ids,
            pad_idx=self.pad_id,
            return_input_mask=True,
            return_seq_lens=True)
        padded_text_type_ids = pad_batch_data(
            batch_text_type_ids, pad_idx=self.pad_id)
        padded_position_ids = pad_batch_data(
            batch_position_ids, pad_idx=self.pad_id)
        padded_label_ids = pad_batch_data(
            batch_label_ids, pad_idx=len(self.label_map) - 1)
        padded_task_ids = np.ones_like(
            padded_token_ids, dtype="int64") * self.task_id

        return_list = [
            padded_token_ids, padded_text_type_ids, padded_position_ids,
            padded_task_ids, input_mask, padded_label_ids, batch_seq_lens
        ]
        return return_list

    def _reseg_token_label(self, tokens, labels, tokenizer):
        assert len(tokens) == len(labels)
        ret_tokens = []
        ret_labels = []
        for token, label in zip(tokens, labels):
            sub_token = tokenizer.tokenize(token)
            if len(sub_token) == 0:
                continue
            ret_tokens.extend(sub_token)
            if len(sub_token) == 1:
                ret_labels.append(label)
                continue

            ret_labels.extend([label] * len(sub_token))

        assert len(ret_tokens) == len(ret_labels)
        return ret_tokens, ret_labels

    def _convert_example_to_record(self, example, max_seq_length, tokenizer):
        tokens = tokenization.convert_to_unicode(example.text_a).split(u"")
        labels = tokenization.convert_to_unicode(example.label).split(u"")
        tokens, labels = self._reseg_token_label(tokens, labels, tokenizer)

        if len(tokens) > max_seq_length - 2:
            tokens = tokens[0:(max_seq_length - 2)]
            labels = labels[0:(max_seq_length - 2)]

        tokens = ["[CLS]"] + tokens + ["[SEP]"]
        token_ids = tokenizer.convert_tokens_to_ids(tokens)
        position_ids = list(range(len(token_ids)))
        text_type_ids = [0] * len(token_ids)
        no_entity_id = len(self.label_map) - 1
        labels = [
            label if label in self.label_map else u"O" for label in labels
        ]
        label_ids = [no_entity_id] + [
            self.label_map[label] for label in labels
        ] + [no_entity_id]

        Record = namedtuple(
            'Record',
            ['token_ids', 'text_type_ids', 'position_ids', 'label_ids'])
        record = Record(
            token_ids=token_ids,
            text_type_ids=text_type_ids,
            position_ids=position_ids,
            label_ids=label_ids)
        return record


class ExtractEmbeddingReader(Reader):
    def _pad_batch_records(self, batch_records):
        batch_token_ids = [record.token_ids for record in batch_records]
        batch_text_type_ids = [record.text_type_ids for record in batch_records]
        batch_position_ids = [record.position_ids for record in batch_records]

        # padding
        padded_token_ids, input_mask, seq_lens = pad_batch_data(
            batch_token_ids,
            pad_idx=self.pad_id,
            return_input_mask=True,
            return_seq_lens=True)
        padded_text_type_ids = pad_batch_data(
            batch_text_type_ids, pad_idx=self.pad_id)
        padded_position_ids = pad_batch_data(
            batch_position_ids, pad_idx=self.pad_id)
        padded_task_ids = np.ones_like(
            padded_token_ids, dtype="int64") * self.task_id

        return_list = [
            padded_token_ids, padded_text_type_ids, padded_position_ids,
            padded_task_ids, input_mask, seq_lens
        ]

        return return_list


class MRCReader(Reader):
    def __init__(self,
                 vocab_path,
                 label_map_config=None,
                 max_seq_len=512,
                 do_lower_case=True,
                 in_tokens=False,
                 random_seed=None,
                 tokenizer="FullTokenizer",
                 is_classify=True,
                 is_regression=False,
                 for_cn=True,
                 task_id=0,
                 doc_stride=128,
                 max_query_length=64,
                 remove_noanswer=True):
        self.max_seq_len = max_seq_len
        self.tokenizer = tokenization.FullTokenizer(
            vocab_file=vocab_path, do_lower_case=do_lower_case)
        self.vocab = self.tokenizer.vocab
        self.pad_id = self.vocab["[PAD]"]
        self.cls_id = self.vocab["[CLS]"]
        self.sep_id = self.vocab["[SEP]"]
        self.in_tokens = in_tokens
        self.for_cn = for_cn
        self.task_id = task_id
        self.doc_stride = doc_stride
        self.max_query_length = max_query_length
        self.examples = {}
        self.features = {}
        self.remove_noanswer = remove_noanswer

        if random_seed is not None:
            np.random.seed(random_seed)

        self.current_example = 0
        self.current_epoch = 0
        self.num_examples = 0

        self.Example = namedtuple('Example',
                ['qas_id', 'question_text', 'doc_tokens', 'orig_answer_text',
                'start_position', 'end_position'])
        self.Feature = namedtuple("Feature", ["unique_id", "example_index", "doc_span_index",
                "tokens", "token_to_orig_map", "token_is_max_context",
                "token_ids", "position_ids", "text_type_ids",
                "start_position", "end_position"])
        self.DocSpan = namedtuple("DocSpan", ["start", "length"])

    def _read_json(self, input_file, is_training):
        examples = []
        with open(input_file, "r", encoding='utf-8') as f:
           # f = f.read().decode(encoding='gbk').encode(encoding='utf-8')
            input_data = json.load(f)["data"]
            for entry in input_data:
                for paragraph in entry["paragraphs"]:
                    paragraph_text = paragraph["context"]
                    for qa in paragraph["qas"]:
                        qas_id = qa["id"]
                        question_text = qa["question"]
                        start_pos = None
                        end_pos = None
                        orig_answer_text = None

                        if is_training:
                            if len(qa["answers"]) != 1:
                                raise ValueError(
                                    "For training, each question should have exactly 1 answer."
                                )

                            answer = qa["answers"][0]
                            orig_answer_text = answer["text"]
                            answer_offset = answer["answer_start"]
                            answer_length = len(orig_answer_text)
                            doc_tokens = [
                                paragraph_text[:answer_offset],
                                paragraph_text[answer_offset:answer_offset +
                                               answer_length],
                                paragraph_text[answer_offset + answer_length:]
                            ]

                            start_pos = 1
                            end_pos = 1

                            actual_text = " ".join(doc_tokens[start_pos:(end_pos
                                                                         + 1)])
                            if actual_text.find(orig_answer_text) == -1:
                                log.info("Could not find answer: '%s' vs. '%s'",
                                      actual_text, orig_answer_text)
                                continue
                        else:
                            doc_tokens = tokenization.tokenize_chinese_chars(
                                paragraph_text)

                        example = self.Example(
                            qas_id=qas_id,
                            question_text=question_text,
                            doc_tokens=doc_tokens,
                            orig_answer_text=orig_answer_text,
                            start_position=start_pos,
                            end_position=end_pos)
                        examples.append(example)

        return examples

    def _improve_answer_span(self, doc_tokens, input_start, input_end,
                             tokenizer, orig_answer_text):
        tok_answer_text = " ".join(tokenizer.tokenize(orig_answer_text))

        for new_start in range(input_start, input_end + 1):
            for new_end in range(input_end, new_start - 1, -1):
                text_span = " ".join(doc_tokens[new_start:(new_end + 1)])
                if text_span == tok_answer_text:
                    return (new_start, new_end)

        return (input_start, input_end)

    def _check_is_max_context(self, doc_spans, cur_span_index, position):
        best_score = None
        best_span_index = None
        for (span_index, doc_span) in enumerate(doc_spans):
            end = doc_span.start + doc_span.length - 1
            if position < doc_span.start:
                continue
            if position > end:
                continue
            num_left_context = position - doc_span.start
            num_right_context = end - position
            score = min(num_left_context,
                        num_right_context) + 0.01 * doc_span.length
            if best_score is None or score > best_score:
                best_score = score
                best_span_index = span_index

        return cur_span_index == best_span_index

    def _convert_example_to_feature(self, examples, max_seq_length, tokenizer,
                                    is_training, remove_noanswer=True):
        features = []
        unique_id = 1000000000

        print('converting examples to features...')
        for (example_index, example) in enumerate(examples):
            if example_index % 1000 == 0:
                print('processing {}th example...'.format(example_index))
            query_tokens = tokenizer.tokenize(example.question_text)
            if len(query_tokens) > self.max_query_length:
                query_tokens = query_tokens[0:self.max_query_length]
            tok_to_orig_index = []
            orig_to_tok_index = []
            all_doc_tokens = []
            for (i, token) in enumerate(example.doc_tokens):
                orig_to_tok_index.append(len(all_doc_tokens))
                sub_tokens = tokenizer.tokenize(token)
                for sub_token in sub_tokens:
                    tok_to_orig_index.append(i)
                    all_doc_tokens.append(sub_token)

            tok_start_position = None
            tok_end_position = None
            if is_training:
                tok_start_position = orig_to_tok_index[example.start_position]
                if example.end_position < len(example.doc_tokens) - 1:
                    tok_end_position = orig_to_tok_index[example.end_position +
                                                         1] - 1
                else:
                    tok_end_position = len(all_doc_tokens) - 1
                (tok_start_position,
                 tok_end_position) = self._improve_answer_span(
                     all_doc_tokens, tok_start_position, tok_end_position,
                     tokenizer, example.orig_answer_text)

            max_tokens_for_doc = max_seq_length - len(query_tokens) - 3
            doc_spans = []
            start_offset = 0
            while start_offset < len(all_doc_tokens):
                length = len(all_doc_tokens) - start_offset
                if length > max_tokens_for_doc:
                    length = max_tokens_for_doc
                doc_spans.append(self.DocSpan(start=start_offset, length=length))
                if start_offset + length == len(all_doc_tokens):
                    break
                start_offset += min(length, self.doc_stride)
           
            for (doc_span_index, doc_span) in enumerate(doc_spans):
                tokens = []
                token_to_orig_map = {}
                token_is_max_context = {}
                text_type_ids = []
                tokens.append("[CLS]")
                text_type_ids.append(0)
                for token in query_tokens:
                    tokens.append(token)
                    text_type_ids.append(0)
                tokens.append("[SEP]")
                text_type_ids.append(0)

                for i in range(doc_span.length):
                    split_token_index = doc_span.start + i
                    token_to_orig_map[len(tokens)] = tok_to_orig_index[
                        split_token_index]

                    is_max_context = self._check_is_max_context(
                        doc_spans, doc_span_index, split_token_index)
                    token_is_max_context[len(tokens)] = is_max_context
                    tokens.append(all_doc_tokens[split_token_index])
                    text_type_ids.append(1)
                tokens.append("[SEP]")
                text_type_ids.append(1)

                token_ids = tokenizer.convert_tokens_to_ids(tokens)
                position_ids = list(range(len(token_ids)))
                start_position = None
                end_position = None
                if is_training:
                    doc_start = doc_span.start
                    doc_end = doc_span.start + doc_span.length - 1
                    out_of_span = False
                    if not (tok_start_position >= doc_start and
                            tok_end_position <= doc_end):
                        out_of_span = True
                    if out_of_span:
                        start_position = 0
                        end_position = 0
                        if remove_noanswer:
                            continue
                    else:
                        doc_offset = len(query_tokens) + 2
                        start_position = tok_start_position - doc_start + doc_offset
                        end_position = tok_end_position - doc_start + doc_offset

                feature = self.Feature(
                    unique_id=unique_id,
                    example_index=example_index,
                    doc_span_index=doc_span_index,
                    tokens=tokens,
                    token_to_orig_map=token_to_orig_map,
                    token_is_max_context=token_is_max_context,
                    token_ids=token_ids,
                    position_ids=position_ids,
                    text_type_ids=text_type_ids,
                    start_position=start_position,
                    end_position=end_position)
                features.append(feature)

                unique_id += 1

        return features

    def _prepare_batch_data(self, records, batch_size, phase=None):
        """generate batch records"""
        batch_records, max_len = [], 0

        if len(records) < batch_size:
            raise Exception('mrc dataset contains too few samples. Expect more than '+str(batch_size))

        for index, record in enumerate(records):
            if phase == "train":
                self.current_example = index
            max_len = max(max_len, len(record.token_ids))
            if self.in_tokens:
                to_append = (len(batch_records) + 1) * max_len <= batch_size
            else:
                to_append = len(batch_records) < batch_size
            if to_append:
                batch_records.append(record)
            else:
                # yield self._pad_batch_records(batch_records, phase == "train")
                ds = ['s'] * 8
                for piece in palm.distribute.yield_pieces(\
                        self._pad_batch_records(batch_records, phase == 'train'),
                        ds, batch_size):
                    yield piece
                batch_records, max_len = [record], len(record.token_ids)
      
        if phase == 'predict' and batch_records:
            for piece in palm.distribute.yield_pieces(\
                        self._pad_batch_records(batch_records, phase == 'train'),
                        ds, batch_size):
                yield piece


    def _pad_batch_records(self, batch_records, is_training):
        batch_token_ids = [record.token_ids for record in batch_records]
        batch_text_type_ids = [record.text_type_ids for record in batch_records]
        batch_position_ids = [record.position_ids for record in batch_records]
        if is_training:
            batch_start_position = [
                record.start_position for record in batch_records
            ]
            batch_end_position = [
                record.end_position for record in batch_records
            ]
            batch_start_position = np.array(batch_start_position).astype(
                "int64").reshape([-1])
            batch_end_position = np.array(batch_end_position).astype(
                "int64").reshape([-1])

        else:
            batch_size = len(batch_token_ids)
            batch_start_position = np.zeros(
                shape=[batch_size], dtype="int64")
            batch_end_position = np.zeros(shape=[batch_size], dtype="int64")

        batch_unique_ids = [record.unique_id for record in batch_records]
        batch_unique_ids = np.array(batch_unique_ids).astype("int64").reshape(
            [-1])

        # padding
        padded_token_ids, input_mask = pad_batch_data(
            batch_token_ids, pad_idx=self.pad_id, return_input_mask=True)
        padded_text_type_ids = pad_batch_data(
            batch_text_type_ids, pad_idx=self.pad_id)
        padded_position_ids = pad_batch_data(
            batch_position_ids, pad_idx=self.pad_id)
        padded_task_ids = np.ones_like(
            padded_token_ids, dtype="int64") * self.task_id

        return_list = [
            padded_token_ids, padded_text_type_ids, padded_position_ids,
            padded_task_ids, input_mask, batch_start_position,
            batch_end_position, batch_unique_ids
        ]

        return return_list

    def get_num_examples(self, phase):
        return len(self.features[phase])

    def get_features(self, phase):
        return self.features[phase]

    def get_examples(self, phase):
        return self.examples[phase]

    def data_generator(self,
                       input_file,
                       batch_size,
                       epoch,
                       dev_count=1,
                       shuffle=True,
                       phase=None):

        examples = self.examples.get(phase, None)
        features = self.features.get(phase, None)
        if not examples:
            examples = self._read_json(input_file, phase == "train")
            features = self._convert_example_to_feature(
                examples, self.max_seq_len, self.tokenizer, phase == "train", remove_noanswer=self.remove_noanswer)
            self.examples[phase] = examples
            self.features[phase] = features

        def wrapper():
            all_dev_batches = []
            if epoch is None:
                num_epochs = 99999999
            else:
                num_epochs = epoch
            for epoch_index in range(num_epochs):
                if phase == "train":
                    self.current_example = 0
                    self.current_epoch = epoch_index
                if phase == "train" and shuffle:
                    np.random.shuffle(features)
  
                for batch_data in self._prepare_batch_data(
                        features, batch_size, phase=phase):

                    yield batch_data

        return wrapper


if __name__ == '__main__':
    pass


================================================
FILE: paddlepalm/tokenizer/__init__.py
================================================


================================================
FILE: paddlepalm/tokenizer/bert_tokenizer.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import unicodedata
import six


def convert_to_unicode(text):
    """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text.decode("utf-8", "ignore")
        elif isinstance(text, unicode):
            return text
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python2 or Python 3?")


def printable_text(text):
    """Returns text encoded in a way suitable for print or `tf.logging`."""

    # These functions want `str` for both Python2 and Python3, but in one case
    # it's a Unicode string and in the other it's a byte string.
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text
        elif isinstance(text, unicode):
            return text.encode("utf-8")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python2 or Python 3?")


def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    fin = open(vocab_file)
    for num, line in enumerate(fin):
        items = convert_to_unicode(line.strip()).split("\t")
        if len(items) > 2:
            break
        token = items[0]
        index = items[1] if len(items) == 2 else num
        token = token.strip()
        vocab[token] = int(index)
    return vocab


def convert_by_vocab(vocab, items):
    """Converts a sequence of [tokens|ids] using the vocab."""
    output = []
    for item in items:
        output.append(vocab[item])
    return output


def convert_tokens_to_ids(vocab, tokens):
    return convert_by_vocab(vocab, tokens)


def convert_ids_to_tokens(inv_vocab, ids):
    return convert_by_vocab(inv_vocab, ids)


def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a peice of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens


class FullTokenizer(object):
    """Runs end-to-end tokenziation."""

    def __init__(self, vocab_file, do_lower_case=True):
        self.vocab = load_vocab(vocab_file)
        self.inv_vocab = {v: k for k, v in self.vocab.items()}
        self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)

    def tokenize(self, text):
        split_tokens = []
        for token in self.basic_tokenizer.tokenize(text):
            for sub_token in self.wordpiece_tokenizer.tokenize(token):
                split_tokens.append(sub_token)

        return split_tokens

    def convert_tokens_to_ids(self, tokens):
        return convert_by_vocab(self.vocab, tokens)

    def convert_ids_to_tokens(self, ids):
        return convert_by_vocab(self.inv_vocab, ids)


class CharTokenizer(object):
    """Runs end-to-end tokenziation."""

    def __init__(self, vocab_file, do_lower_case=True):
        self.vocab = load_vocab(vocab_file)
        self.inv_vocab = {v: k for k, v in self.vocab.items()}
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)

    def tokenize(self, text):
        split_tokens = []
        for token in text.lower().split(" "):
            for sub_token in self.wordpiece_tokenizer.tokenize(token):
                split_tokens.append(sub_token)

        return split_tokens

    def convert_tokens_to_ids(self, tokens):
        return convert_by_vocab(self.vocab, tokens)

    def convert_ids_to_tokens(self, ids):
        return convert_by_vocab(self.inv_vocab, ids)


class BasicTokenizer(object):
    """Runs basic tokenization (punctuation splitting, lower casing, etc.)."""

    def __init__(self, do_lower_case=True):
        """Constructs a BasicTokenizer.

        Args:
            do_lower_case: Whether to lower case the input.
        """
        self.do_lower_case = do_lower_case
        self._never_lowercase = ['[UNK]', '[SEP]', '[PAD]', '[CLS]', '[MASK]']

    def tokenize(self, text):
        """Tokenizes a piece of text."""
        text = convert_to_unicode(text)
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        text = self._tokenize_chinese_chars(text)

        orig_tokens = whitespace_tokenize(text)
        split_tokens = []
        for token in orig_tokens:
            if self.do_lower_case and token not in self._never_lowercase:
                token = token.lower()
                token = self._run_strip_accents(token)
            if token in self._never_lowercase:
                split_tokens.extend([token])
            else:
                split_tokens.extend(self._run_split_on_punc(token))

        output_tokens = whitespace_tokenize(" ".join(split_tokens))
        return output_tokens

    def _run_strip_accents(self, text):
        """Strips accents from a piece of text."""
        text = unicodedata.normalize("NFD", text)
        output = []
        for char in text:
            cat = unicodedata.category(char)
            if cat == "Mn":
                continue
            output.append(char)
        return "".join(output)

    def _run_split_on_punc(self, text):
        """Splits punctuation on a piece of text."""
        chars = list(text)
        i = 0
        start_new_word = True
        output = []
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
                output.append([char])
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(char)
            i += 1

        return ["".join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(" ")
                output.append(char)
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #     https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like the all of the other languages.
        if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
            (cp >= 0x3400 and cp <= 0x4DBF) or  #
            (cp >= 0x20000 and cp <= 0x2A6DF) or  #
            (cp >= 0x2A700 and cp <= 0x2B73F) or  #
            (cp >= 0x2B740 and cp <= 0x2B81F) or  #
            (cp >= 0x2B820 and cp <= 0x2CEAF) or
            (cp >= 0xF900 and cp <= 0xFAFF) or  #
            (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xfffd or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)


class WordpieceTokenizer(object):
    """Runs WordPiece tokenziation."""

    def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100):
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, text):
        """Tokenizes a piece of text into its word pieces.

        This uses a greedy longest-match-first algorithm to perform tokenization
        using the given vocabulary.

        For example:
            input = "unaffable"
            output = ["un", "##aff", "##able"]

        Args:
            text: A single token or whitespace separated tokens. This should have
                already been passed through `BasicTokenizer.

        Returns:
            A list of wordpiece tokens.
        """

        text = convert_to_unicode(text)

        output_tokens = []
        for token in whitespace_tokenize(text):
            chars = list(token)
            if len(chars) > self.max_input_chars_per_word:
                output_tokens.append(self.unk_token)
                continue

            is_bad = False
            start = 0
            sub_tokens = []
            while start < len(chars):
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    if start > 0:
                        substr = "##" + substr
                    if substr in self.vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True
                    break
                sub_tokens.append(cur_substr)
                start = end

            if is_bad:
                output_tokens.append(self.unk_token)
            else:
                output_tokens.extend(sub_tokens)
        return output_tokens


def _is_whitespace(char):
    """Checks whether `chars` is a whitespace character."""
    # \t, \n, and \r are technically contorl characters but we treat them
    # as whitespace since they are generally considered as such.
    if char == " " or char == "\t" or char == "\n" or char == "\r":
        return True
    cat = unicodedata.category(char)
    if cat == "Zs":
        return True
    return False


def _is_control(char):
    """Checks whether `chars` is a control character."""
    # These are technically control characters but we count them as whitespace
    # characters.
    if char == "\t" or char == "\n" or char == "\r":
        return False
    cat = unicodedata.category(char)
    if cat.startswith("C"):
        return True
    return False


def _is_punctuation(char):
    """Checks whether `chars` is a punctuation character."""
    cp = ord(char)
    # We treat all non-letter/number ASCII as punctuation.
    # Characters such as "^", "$", and "`" are not in the Unicode
    # Punctuation class but we treat them as punctuation anyways, for
    # consistency.
    if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
        (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
        return True
    cat = unicodedata.category(char)
    if cat.startswith("P"):
        return True
    return False


================================================
FILE: paddlepalm/tokenizer/ernie_tokenizer.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from __future__ import absolute_import

from io import open

import collections
import unicodedata
import six


def convert_to_unicode(text):
    """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text.decode("utf-8", "ignore")
        elif isinstance(text, unicode):
            return text
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python2 or Python 3?")


def printable_text(text):
    """Returns text encoded in a way suitable for print or `tf.logging`."""

    # These functions want `str` for both Python2 and Python3, but in one case
    # it's a Unicode string and in the other it's a byte string.
    if six.PY3:
        if isinstance(text, str):
            return text
        elif isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    elif six.PY2:
        if isinstance(text, str):
            return text
        elif isinstance(text, unicode):
            return text.encode("utf-8")
        else:
            raise ValueError("Unsupported string type: %s" % (type(text)))
    else:
        raise ValueError("Not running on Python2 or Python 3?")


def load_vocab(vocab_file):
    """Loads a vocabulary file into a dictionary."""
    vocab = collections.OrderedDict()
    with open(vocab_file, encoding='utf8') as fin:
        for num, line in enumerate(fin):
            items = convert_to_unicode(line.strip()).split("\t")
            if len(items) > 2:
                break
            token = items[0]
            index = items[1] if len(items) == 2 else num
            token = token.strip()
            vocab[token] = int(index)
    return vocab


def convert_by_vocab(vocab, items):
    """Converts a sequence of [tokens|ids] using the vocab."""
    output = []
    for item in items:
        output.append(vocab[item])
    return output


def convert_tokens_to_ids(vocab, tokens):
    return convert_by_vocab(vocab, tokens)


def convert_ids_to_tokens(inv_vocab, ids):
    return convert_by_vocab(inv_vocab, ids)


def whitespace_tokenize(text):
    """Runs basic whitespace cleaning and splitting on a peice of text."""
    text = text.strip()
    if not text:
        return []
    tokens = text.split()
    return tokens


class FullTokenizer(object):
    """Runs end-to-end tokenziation."""

    def __init__(self, vocab_file, do_lower_case=True):
        self.vocab = load_vocab(vocab_file)
        self.inv_vocab = {v: k for k, v in self.vocab.items()}
        self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)

    def tokenize(self, text):
        split_tokens = []
        for token in self.basic_tokenizer.tokenize(text):
            for sub_token in self.wordpiece_tokenizer.tokenize(token):
                split_tokens.append(sub_token)

        return split_tokens

    def convert_tokens_to_ids(self, tokens):
        return convert_by_vocab(self.vocab, tokens)

    def convert_ids_to_tokens(self, ids):
        return convert_by_vocab(self.inv_vocab, ids)


class CharTokenizer(object):
    """Runs end-to-end tokenziation."""

    def __init__(self, vocab_file, do_lower_case=True):
        self.vocab = load_vocab(vocab_file)
        self.inv_vocab = {v: k for k, v in self.vocab.items()}
        self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)

    def tokenize(self, text):
        split_tokens = []
        for token in text.lower().split(" "):
            for sub_token in self.wordpiece_tokenizer.tokenize(token):
                split_tokens.append(sub_token)

        return split_tokens

    def convert_tokens_to_ids(self, tokens):
        return convert_by_vocab(self.vocab, tokens)

    def convert_ids_to_tokens(self, ids):
        return convert_by_vocab(self.inv_vocab, ids)


class BasicTokenizer(object):
    """Runs basic tokenization (punctuation splitting, lower casing, etc.)."""

    def __init__(self, do_lower_case=True):
        """Constructs a BasicTokenizer.

        Args:
            do_lower_case: Whether to lower case the input.
        """
        self.do_lower_case = do_lower_case
        self._never_lowercase = ['[UNK]', '[SEP]', '[PAD]', '[CLS]', '[MASK]']

    def tokenize(self, text):
        """Tokenizes a piece of text."""
        text = convert_to_unicode(text)
        text = self._clean_text(text)

        # This was added on November 1st, 2018 for the multilingual and Chinese
        # models. This is also applied to the English models now, but it doesn't
        # matter since the English models were not trained on any Chinese data
        # and generally don't have any Chinese data in them (there are Chinese
        # characters in the vocabulary because Wikipedia does have some Chinese
        # words in the English Wikipedia.).
        text = self._tokenize_chinese_chars(text)

        orig_tokens = whitespace_tokenize(text)
        split_tokens = []
        for token in orig_tokens:
            if self.do_lower_case and token not in self._never_lowercase:
                token = token.lower()
                token = self._run_strip_accents(token)
            if token in self._never_lowercase:
                split_tokens.extend([token])
            else:
                split_tokens.extend(self._run_split_on_punc(token))

        output_tokens = whitespace_tokenize(" ".join(split_tokens))
        return output_tokens

    def _run_strip_accents(self, text):
        """Strips accents from a piece of text."""
        text = unicodedata.normalize("NFD", text)
        output = []
        for char in text:
            cat = unicodedata.category(char)
            if cat == "Mn":
                continue
            output.append(char)
        return "".join(output)

    def _run_split_on_punc(self, text):
        """Splits punctuation on a piece of text."""
        chars = list(text)
        i = 0
        start_new_word = True
        output = []
        while i < len(chars):
            char = chars[i]
            if _is_punctuation(char):
                output.append([char])
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(char)
            i += 1

        return ["".join(x) for x in output]

    def _tokenize_chinese_chars(self, text):
        """Adds whitespace around any CJK character."""
        output = []
        for char in text:
            cp = ord(char)
            if self._is_chinese_char(cp):
                output.append(" ")
                output.append(char)
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)

    def _is_chinese_char(self, cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #     https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like the all of the other languages.
        if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
            (cp >= 0x3400 and cp <= 0x4DBF) or  #
            (cp >= 0x20000 and cp <= 0x2A6DF) or  #
            (cp >= 0x2A700 and cp <= 0x2B73F) or  #
            (cp >= 0x2B740 and cp <= 0x2B81F) or  #
            (cp >= 0x2B820 and cp <= 0x2CEAF) or
            (cp >= 0xF900 and cp <= 0xFAFF) or  #
            (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
            return True

        return False

    def _clean_text(self, text):
        """Performs invalid character removal and whitespace cleanup on text."""
        output = []
        for char in text:
            cp = ord(char)
            if cp == 0 or cp == 0xfffd or _is_control(char):
                continue
            if _is_whitespace(char):
                output.append(" ")
            else:
                output.append(char)
        return "".join(output)


class WordpieceTokenizer(object):
    """Runs WordPiece tokenziation."""

    def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100):
        self.vocab = vocab
        self.unk_token = unk_token
        self.max_input_chars_per_word = max_input_chars_per_word

    def tokenize(self, text):
        """Tokenizes a piece of text into its word pieces.

        This uses a greedy longest-match-first algorithm to perform tokenization
        using the given vocabulary.

        For example:
            input = "unaffable"
            output = ["un", "##aff", "##able"]

        Args:
            text: A single token or whitespace separated tokens. This should have
                already been passed through `BasicTokenizer.

        Returns:
            A list of wordpiece tokens.
        """

        text = convert_to_unicode(text)

        output_tokens = []
        for token in whitespace_tokenize(text):
            chars = list(token)
            if len(chars) > self.max_input_chars_per_word:
                output_tokens.append(self.unk_token)
                continue

            is_bad = False
            start = 0
            sub_tokens = []
            while start < len(chars):
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    if start > 0:
                        substr = "##" + substr
                    if substr in self.vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True
                    break
                sub_tokens.append(cur_substr)
                start = end

            if is_bad:
                output_tokens.append(self.unk_token)
            else:
                output_tokens.extend(sub_tokens)
        return output_tokens


def _is_whitespace(char):
    """Checks whether `chars` is a whitespace character."""
    # \t, \n, and \r are technically contorl characters but we treat them
    # as whitespace since they are generally considered as such.
    if char == " " or char == "\t" or char == "\n" or char == "\r":
        return True
    cat = unicodedata.category(char)
    if cat == "Zs":
        return True
    return False


def _is_control(char):
    """Checks whether `chars` is a control character."""
    # These are technically control characters but we count them as whitespace
    # characters.
    if char == "\t" or char == "\n" or char == "\r":
        return False
    cat = unicodedata.category(char)
    if cat.startswith("C"):
        return True
    return False


def _is_punctuation(char):
    """Checks whether `chars` is a punctuation character."""
    cp = ord(char)
    # We treat all non-letter/number ASCII as punctuation.
    # Characters such as "^", "$", and "`" are not in the Unicode
    # Punctuation class but we treat them as punctuation anyways, for
    # consistency.
    if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
        (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
        return True
    cat = unicodedata.category(char)
    if cat.startswith("P"):
        return True
    return False


def tokenize_chinese_chars(text):
    """Adds whitespace around any CJK character."""

    def _is_chinese_char(cp):
        """Checks whether CP is the codepoint of a CJK character."""
        # This defines a "chinese character" as anything in the CJK Unicode block:
        #     https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
        #
        # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
        # despite its name. The modern Korean Hangul alphabet is a different block,
        # as is Japanese Hiragana and Katakana. Those alphabets are used to write
        # space-separated words, so they are not treated specially and handled
        # like the all of the other languages.
        if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
            (cp >= 0x3400 and cp <= 0x4DBF) or  #
            (cp >= 0x20000 and cp <= 0x2A6DF) or  #
            (cp >= 0x2A700 and cp <= 0x2B73F) or  #
            (cp >= 0x2B740 and cp <= 0x2B81F) or  #
            (cp >= 0x2B820 and cp <= 0x2CEAF) or
            (cp >= 0xF900 and cp <= 0xFAFF) or  #
            (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
            return True

        return False

    def _is_whitespace(c):
        if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
            return True
        return False

    output = []
    buff = ""
    for char in text:
        cp = ord(char)
        if _is_chinese_char(cp) or _is_whitespace(char):
            if buff != "":
                output.append(buff)
                buff = ""
            output.append(char)
        else:
            buff += char

    if buff != "":
        output.append(buff)

    return output


================================================
FILE: paddlepalm/trainer.py
================================================
# -*- coding: utf-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function
import os
import json
from paddle import fluid
import time
import sys
import numpy as np
import paddlepalm.utils.basic_helper as helper
from paddlepalm.utils import reader_helper, saver
from paddlepalm.distribute import gpu_dev_count, data_feeder, decode_fake
# from paddlepalm.default_settings import *

DEBUG=False


class Trainer(object):
    """
    The core unit to start a training/predicting session for single task. A trainer is to build computation graph, manage training and evaluation process, achieve model/checkpoint saving and pretrain_model/checkpoint loading.
    """

    def __init__(self, name, mix_ratio=1.0, reuse_head_with=None):
        """Create a new trainer.

        Args:
            name: string. The name of the trainer(training task).
            mix_ratio: sampling weight of this trainer in multi-task learning mode. Default is 1.0.
            reuse_head_with: reuse parameters of task head with another trainer. Default is None, not reuse with others.

        """

        self._name = name
        self._pred_reader = None
        self._task_head = None
        self._pred_head = None
      
        self._train_reader = None
        self._dist_train_init = False
        self._predict_reader = None
        self._train_iterator = None
        self._predict_iterator = None

        self._train_init = False
        self._predict_init = False
        self._train_init_prog = None
        self._pred_init_prog = None

        self._check_save = lambda: False

        self._task_reuse_scope = name if reuse_head_with is None else reuse_head_with

        self._feeded_var_names = None
        self._target_vars = None
        self._predict_vars = None

        self._num_examples = 0

        self._multi_task = False
        self._as_auxilary = False
        self._task_id = None

        # training process management
        self._mix_ratio = mix_ratio
        self._expected_train_steps = None
        self._expected_train_epochs = None
        self._steps_pur_epoch = None
        self._pred_steps_pur_epoch = None
        self._cur_train_epoch = 0
        self._cur_train_step = 0
        self._train_finish = False

        self._inputname_to_varname = {}
        self._pred_input_name_list = []
        self._pred_input_varname_list = []
        self._pred_fetch_name_list = []
        self._pred_fetch_var_list = []

        # exe is built when random_init_params called.
        self._exe = None

        self._save_protocol = {
            'input_names': 'self._pred_input_name_list',
            'input_varnames': 'self._pred_input_varname_list',
            'fetch_list': 'self._pred_fetch_name_list'}

        self._lock = False
        self._lock_prog = False
        self._build_forward = False

    def build_forward(self, backbone, task_head):
        """
        Build forward computation graph for training, which usually built from input layer to loss node.

        Args:
            backbone: a Backbone object with phase == 'train', which is used to extract multi-level text features, e.g., contextual word embedding and sentence embedding.
            head: a Head object with phase == 'train', which is used to build task specific output layers.
        
        Return:
            loss_var: a Variable object. The computational graph variable(node) of loss.
        """


        self._task_head = task_head
        self._backbone = backbone

        self._build_forward = True
        
        # create reader, task
        # then check i/o across reader, backbone and task_layer
        task_attrs = []
        pred_task_attrs = []

        task_attr_from_reader = helper.encode_inputs(self._task_head.inputs_attrs['reader'], self.name)

        # merge reader input attrs from backbone and task_instances
        input_names, shape_and_dtypes, name_to_position = reader_helper.merge_input_attrs(backbone.inputs_attr, task_attr_from_reader, insert_taskid=False)
        # shapes: [task_id, shapes_of_backbone, shapes_of_inst1, ..., shapes_of_instN]
        self._shape_and_dtypes = shape_and_dtypes
        self._name_to_position = name_to_position
        self._input_names = input_names

        if DEBUG:
            print('----- for debug -----')
            print('joint input names:')
            print(joint_input_names)
            print('joint input shape and dtypes:')
            print(joint_shape_and_dtypes)

        input_attrs = [[i, j, k] for i, (j,k) in zip(input_names, shape_and_dtypes)]

        train_prog = fluid.Program()
        train_init_prog = fluid.Program()

        if not self._lock_prog:
            self._train_prog = train_prog
            self._train_init_prog = train_init_prog

        if not self._lock_prog:
            with fluid.program_guard(train_prog, train_init_prog):
                net_inputs = reader_helper.create_net_inputs(input_attrs, is_async=False)
                bb_output_vars = backbone.build(net_inputs)
        else:
            net_inputs = reader_helper.create_net_inputs(input_attrs, is_async=False)
            bb_output_vars = backbone.build(net_inputs)
        self._net_inputs = net_inputs
        assert sorted(bb_output_vars.keys()) == sorted(backbone.outputs_attr.keys())

        task_output_vars = {}
        task_inputs = {'backbone': bb_output_vars}
        task_inputs_from_reader = helper.decode_inputs(net_inputs, self.name)
        task_inputs['reader'] = task_inputs_from_reader

        scope = self.name+'.'
        if not self._lock_prog:
            with fluid.program_guard(train_prog, train_init_prog):
                with fluid.unique_name.guard(scope):
                    output_vars = self._build_head(task_inputs, phase='train', scope=scope)
        else:
            with fluid.unique_name.guard(scope):
                output_vars = self._build_head(task_inputs, phase='train', scope=scope)

        output_vars = {self.name+'.'+key: val for key, val in output_vars.items()}
        old = len(task_output_vars) # for debug
        task_output_vars.update(output_vars)
        assert len(task_output_vars) - old == len(output_vars) # for debug

        bb_fetches = {k: v.name for k,v in bb_output_vars.items()}
        task_fetches = {k: v.name for k,v in task_output_vars.items()}
        self._fetches = task_fetches
        self._fetch_names, self._fetch_list = zip(*self._fetches.items())
        if not self._lock_prog:
            with fluid.program_guard(train_prog, train_init_prog):
                loss_var = fluid.layers.reduce_sum(task_output_vars[self.name+'.loss'])
        else:
            loss_var = fluid.layers.reduce_sum(task_output_vars[self.name+'.loss'])

        self._loss_var = loss_var

        if not self._multi_task:
            self._init_exe_prog(for_train=True)

        return loss_var

    def build_predict_forward(self, pred_backbone, pred_head):
        """
        Build computation graph for evaluation and prediction.

        Arguments:
            - pred_backbone: a Backbone object with phase == 'predict'. For evaluating model during training, the predict backbone should keep the same with train backbone.
            - pred_head: a Head object with phase == 'predict'. For evaluating model during training, the predict head should keep the same with train head.
        
        Return:
            - output_vars: dict type. Each value is a computational graph variable(node) argumented by pred_head outputs_attr.
        """
        self._pred_head = pred_head
        self._pred_backbone = pred_backbone
        pred_task_attr_from_reader = helper.encode_inputs(self._pred_head.inputs_attrs['reader'], self.name)

        pred_input_names, pred_shape_and_dtypes, pred_name_to_position = reader_helper.merge_input_attrs(pred_backbone.inputs_attr, pred_task_attr_from_reader, insert_taskid=False)
        pred_input_attrs = [[i, j, k] for i, (j,k) in zip(pred_input_names, pred_shape_and_dtypes)]
        self._pred_shape_and_dtypes = pred_shape_and_dtypes
        self._pred_name_to_position = pred_name_to_position
        self._pred_input_names = pred_input_names

        if not self._lock_prog:
            pred_prog = fluid.Program()
            self._pred_prog = pred_prog
            pred_init_prog = fluid.Program()
            self._pred_init_prog = pred_init_prog

            with fluid.program_guard(pred_prog, pred_init_prog):
                pred_net_inputs = reader_helper.create_net_inputs(pred_input_attrs)
                pred_bb_output_vars = pred_backbone.build(pred_net_inputs)
                self._pred_net_inputs = pred_net_inputs
        else:
            pred_net_inputs = reader_helper.create_net_inputs(pred_input_attrs)
            pred_bb_output_vars = pred_backbone.build(pred_net_inputs)
            self._pred_net_inputs = pred_net_inputs

        # prepare predict vars for saving inference model
        if not self._lock_prog:
            with fluid.program_guard(pred_prog, pred_init_prog):
                cur_inputs = helper.decode_inputs(pred_net_inputs, self.name)
                self._pred_input_name_list, self._pred_input_varname_list = \
                    zip(*[[k, v.name] for k,v in cur_inputs.items()])

                pred_task_inputs = {'backbone': pred_bb_output_vars, 'reader': cur_inputs}
                scope = self.name + '.'
                with fluid.unique_name.guard(scope):
                    output_vars = self._build_head(pred_task_inputs, phase='predict', scope=scope)
        else:
            cur_inputs = helper.decode_inputs(pred_net_inputs, self.name)
            self._pred_input_name_list, self._pred_input_varname_list = \
                zip(*[[k, v.name] for k,v in cur_inputs.items()])

            pred_task_inputs = {'backbone': pred_bb_output_vars, 'reader': cur_inputs}
            scope = self.name + '.'
            with fluid.unique_name.guard(scope):
                output_vars = self._build_head(pred_task_inputs, phase='predict', scope=scope)

        if output_vars is not None:
            self._pred_fetch_name_list, self._pred_fetch_list = zip(*output_vars.items())
        else:
            self._pred_fetch_name_list = []
            self._pred_fetch_var_list = []

        # if not self._multi_task:
        self._init_exe_prog(for_train=False)
        self._exe.run(self._pred_init_prog)

        self._predict_vars = output_vars
            
        return output_vars

    def build_backward(self, optimizer, weight_decay=None, use_ema=False, ema_decay=None):
        """
        Build backward computation graph and training strategy.

        Arguments:
            - optimizer: 
            - weight_decay: optional, default is None (disable weight decay).
            - use_ema: optional, default is False. The flag to control whether to apply Exponential Moving Average strategy on parameter updates.
            - ema_decay: optional, default is None. Only works with use_ema == True. Control decay rate of EMA strategy.

        """
        # build optimizer
        assert self._loss_var is not None and self._train_init_prog is not None, "train graph not foung! You should build_forward first."
        optimizer._set_prog(self._train_prog, self._train_init_prog)
        with fluid.program_guard(self._train_prog, self._train_init_prog):
            param_grads = optimizer._build()

            if weight_decay is not None:

                param_list = dict()

                for param in self._train_prog.global_block().all_parameters():
                    param_list[param.name] = param * 1.0
                    param_list[param.name].stop_gradient = True

                def exclude_from_weight_decay(name):
                    if name.find("layer_norm") > -1:
                        return True
                    bias_suffix = ["_bias", "_b", ".b_0"]
                    for suffix in bias_suffix:
                        if name.endswith(suffix):
                            return True
                    return False

                for param, grad in param_grads:
                    if exclude_from_weight_decay(param.name):
                        continue
                    with param.block.program._optimized_guard(
                        [param, grad]), fluid.framework.name_scope("weight_decay"):
                        updated_param = param - param_list[
                            param.name] * weight_decay * optimizer.get_cur_learning_rate()
                        fluid.layers.assign(output=param, input=updated_param)

            if use_ema:
                ema = fluid.optimizer.ExponentialMovingAverage(ema_decay)
                ema.update()

        self._exe.run(self._train_init_prog)

    def set_as_aux(self):
        """Set the task in this trainer as auxilary task. \nCAUSIOUS: This API only works on multi-task learning mode. Each task is set as target task by default. """
        self._as_auxilary = True

    def fit_reader(self, reader, phase='train'):
        """
        Bind a reader and loaded train/predict data to trainer. 
        
        Args:
            reader: a Reader object. The running phase of the reader should be consistent with `phase` argument of this method.
            phase: running phase. Currently support: train, predict.

        """

        self._check_phase(phase)
        if phase=='train':
            assert self._shape_and_dtypes is not None, "You need to build_forward or build_predict_head first to prepare input features."
        else:
            assert self._pred_shape_and_dtypes is not None, "You need to build_forward     or build_predict_head first to prepare input features."

        batch_size = reader._batch_size

        self._num_epochs = reader.num_epochs
        if phase == 'train':
            self._train_reader = reader
            self._steps_pur_epoch = reader.num_examples // batch_size
            shape_and_dtypes = self._shape_and_dtypes
            name_to_position = self._name_to_position
            if self._task_id is not None:
                self._net_inputs['__task_id'] = self._task_id
            net_inputs = self._net_inputs
            self._train_batch_size = batch_size
            self._num_examples = reader.num_examples
            reader_helper.check_io(self._backbone.inputs_attr, reader.outputs_attr, in_name='backbone', out_name='reader(train)')
            reader_helper.check_io(self._task_head.inputs_attrs['reader'], reader.outputs_attr, in_name='task_head(reader)', out_name='reader(train)')
            reader_helper.check_io(self._task_head.inputs_attrs['backbone'], self._backbone.outputs_attr, in_name='task_head(backbone, train)', out_name='backbone')
        elif phase == 'predict':
            self._predict_reader = reader
            self._pred_steps_pur_epoch = reader.num_examples // batch_size 
            shape_and_dtypes = self._pred_shape_and_dtypes
            name_to_position = self._pred_name_to_position
            net_inputs = self._pred_net_inputs
            self._predict_batch_size = batch_size
            self._pred_num_examples = reader.num_examples
            reader_helper.check_io(self._pred_backbone.inputs_attr, reader.outputs_attr, in_name='backbone', out_name='reader(predict)')
            reader_helper.check_io(self._pred_head.inputs_attrs['reader'], reader.outputs_attr, in_name='task_head(reader)', out_name='reader(predict)')
            reader_helper.check_io(self._pred_head.inputs_attrs['backbone'], self._pred_backbone.outputs_attr, in_name='task_head(backbone, predict)', out_name='backbone')
        else:
            raise NotImplementedError()
            
        print('ok!')

        # merge dataset iterators and create net input vars
        iterator = reader._iterator()
        prefix = self.name

        # merge dataset iterators and create net input vars
        iterator = reader._iterator()
        prefix = self.name

        # 对yield出的数据进行runtime检查和适配
        iterator_fn = reader_helper.create_iterator_fn(iterator, prefix, shape_and_dtypes, name_to_position, return_type='dict')
        self._raw_iterator_fn = iterator_fn
        feed_batch_process_fn = reader_helper.create_feed_batch_process_fn(net_inputs)
        if gpu_dev_count > 1:
            distribute_feeder_fn = data_feeder(iterator_fn, feed_batch_process_fn, phase=phase)
        else:
            distribute_feeder_fn = iterator_fn()

        if phase == 'train':
            self._train_iterator = distribute_feeder_fn
            self._feed_batch_process_fn = feed_batch_process_fn
        elif phase == 'predict':
            self._predict_iterator = distribute_feeder_fn
            self._pred_feed_batch_process_fn = feed_batch_process_fn
        return distribute_feeder_fn

    def load_ckpt(self, model_path):
        """
        load training checkpoint for further training or predicting.

        Args:
            model_path: the path of saved checkpoint/parameters.
        """
        assert self._train_init_prog is not None or self._pred_init_prog is not None, "model graph not built. You should at least build_forward or build_predict_forward to load its checkpoint."

        # if self._train_init_prog is not None:
        #     saver.init_pretraining_params(
        #         self._exe,
        #         model_path,
        #         convert=False,
        #         main_program=self._train_init_prog,
        #         strict=True)
        # elif self._pred_init_prog is not None:
        #     saver.init_pretraining_params(
        #         self._exe,
        #         model_path,
        #         convert=False,
        #         main_program=self._pred_init_prog,
        #         strict=True)
        if self._train_init_prog is not None:
            print('loading checkpoint into train program')
            saver.init_checkpoint(
                self._exe,
                model_path,
                main_program=self._train_init_prog)
        elif self._pred_init_prog is not None:
            saver.init_checkpoint(
                self._exe,
                model_path,
                main_program=self._pred_init_prog)
        else:
            raise Exception("model not found. You should at least build_forward or build_predict_forward to load its checkpoint.")

    def load_predict_model(self, model_path, convert=False):
        """
        load pretrain models(backbone) for training.

        Args:
            model_path: the path of saved pretrained parameters.
        """

        assert self._pred_prog is not None, "training graph not found. You should at least build_forward to load its pretrained parameters."

        saver.init_pretraining_params(
            self._exe,
            model_path,
            convert=convert,
            main_program=self._pred_prog)

    def load_pretrain(self, model_path, convert=False):
        """
        load pretrain models(backbone) for training.

        Args:
            model_path: the path of saved pretrained parameters.
        """
        assert self._train_init_prog is not None, "training graph not found. You should at least build_forward to load its pretrained parameters."

        saver.init_pretraining_params(
            self._exe,
            model_path,
            convert=convert,
            main_program=self._train_init_prog)

    def set_saver(self, save_path, save_steps, save_type='ckpt'):
        """
        create a build-in saver into trainer. A saver will automatically save checkpoint or predict model every `save_steps` training steps.

        Args:
            save_path: a string. the path to save checkpoints or predict models.
            save_steps: an integer. the frequency to save models.
            save_type: a string. The type of saved model. Currently support checkpoint(ckpt) and predict model(predict), default is ckpt. If both two types are needed to save, you can set as "ckpt,predict".

        """
        

        save_type = save_type.split(',')
        if 'predict' in save_type:
            assert self._pred_head is not None, "Predict head not found! You should build_predict_head first if you want to save predict model."
            assert save_path is not None and save_steps is not None, 'save_path and save_steps is required to save model.'
            self._save_predict = True
            if not os.path.exists(save_path):
                os.makedirs(save_path)
        else:
            self._save_predict = False

        if 'ckpt' in save_type:
            if save_path is not None and save_steps is not None:
                self._save_ckpt = True
                if not os.path.exists(save_path):
                    os.makedirs(save_path)
            else:
                "WARNING: save_path or save_steps is not set, model will not be saved during training."
                self._save_ckpt = False
        else:
            self._save_ckpt = False

        def temp_func():
            if (self._save_predict or self._save_ckpt) and self._cur_train_step % save_steps == 0:

                if self._save_predict:
                    self._save(save_path, suffix='pred.step'+str(self._cur_train_step))
                    print('predict model has been saved at '+os.path.join(save_path, 'pred.step'+str(self._cur_train_step)))
                    sys.stdout.flush()
                if self._save_ckpt:
                    fluid.io.save_persistables(self._exe, os.path.join(save_path, 'ckpt.step'+str(self._cur_train_step)), self._train_prog)
                    print('checkpoint has been saved at '+os.path.join(save_path, 'ckpt.step'+str(self._cur_train_step)))
                    sys.stdout.flush()
                return True
            else:
                return False

        self._check_save = temp_func
            
    def train(self, print_steps=5):
        """
        start training.

        Args:
            print_steps: int. Logging frequency of training message, e.g., current step, loss and speed.
        """
        
        iterator = self._train_iterator
        self._distribute_train_prog = fluid.CompiledProgram(self._train_prog).with_data_parallel(loss_name=self._loss_var.name)

        time_begin = time.time()
        for feed in iterator:
            rt_outputs = self.train_one_step(feed)

            task_rt_outputs = {k[len(self.name+'.'):]: v for k,v in rt_outputs.items() if k.startswith(self.name+'.')}
            self._task_head.batch_postprocess(task_rt_outputs)


            if print_steps > 0 and self._cur_train_step % print_steps == 0:
                loss = rt_outputs[self.name+'.loss']
                loss = np.mean(np.squeeze(loss)).tolist()

                time_end = time.time()
                time_cost = time_end - time_begin

                print("step {}/{} (epoch {}), loss: {:.3f}, speed: {:.2f} steps/s".format(
                       (self._cur_train_step-1) % self._steps_pur_epoch + 1 , self._steps_pur_epoch, self._cur_train_epoch,
                       loss, print_steps / time_cost))
                sys.stdout.flush()
                time_begin = time.time() 

            if self._num_epochs is None and not self._multi_task and self._cur_train_step == self._steps_pur_epoch:
                break
        
    def predict(self, output_dir=None, print_steps=1000):
        """
        start predicting.

        Args:
            output_dir: str. The path to save prediction results, default is None. If set as None, the results would output to screen directly. 
            print_steps: int. Logging frequency of predicting message, e.g., current progress and speed.
        """
        iterator = self._predict_iterator
        self._distribute_pred_prog = fluid.CompiledProgram(self._pred_prog).with_data_parallel()


        if output_dir is not None and not os.path.exists(output_dir):
            os.makedirs(output_dir)

        time_begin = time.time()
        
        cur_predict_step = 0
        for feed in iterator:
            rt_outputs = self.predict_one_batch(feed)
            self._pred_head.batch_postprocess(rt_outputs)

            cur_predict_step += 1

            if print_steps > 0 and cur_predict_step % print_steps == 0:
                time_end = time.time()
                time_cost = time_end - time_begin

                print("batch {}/{}, speed: {:.2f} steps/s".format(
                       cur_predict_step, self._pred_steps_pur_epoch,
                       print_steps / time_cost))
                sys.stdout.flush()
                time_begin = time.time()

        if self._pred_head.epoch_inputs_attrs:
            reader_outputs = self._predict_reader.get_epoch_outputs()
        else:
            reader_outputs = None

        results = self._pred_head.epoch_postprocess({'reader':reader_outputs}, output_dir=output_dir)
        return results

    def reset_buffer(self):
        self._pred_head.reset()

    def _check_phase(self, phase):
        assert phase in ['train', 'predict'], "Supported phase: train, predict,"

    def _set_multitask(self):
        self._multi_task = True

    def _set_nomultitask(self):
        self._multi_task = False

    def _set_task_id(self, task_id):
        self._task_id = task_id

    def _init_exe_prog(self, for_train=True):
        if not self._train_init and not self._predict_init:
            on_gpu = gpu_dev_count > 0
            self._exe = helper.build_executor(on_gpu)

        if for_train:
            assert self._train_prog is not None, "train graph not found! You should build_forward first before you random init parameters."
            self._train_init = True
        else:
            assert self._pred_prog is not None, "predict graph not found! You should build_predict_head first before you random init parameters."
            self._predict_init = True

    # def random_init_params(self):
    #     """
    #     randomly initialize model parameters.
    #     """
    #     
    #     if not self._train_init:
    #         self._init_exe_prog()
    #     
    #     print('random init params...')
    #     self._exe.run(self._train_init_prog)

    def get_one_batch(self, phase='train'):
        self._check_phase(phase)
        if phase == 'train':
            return next(self._train_reader)
        elif phase == 'predict':
            return next(self._predict_reader)
        else:
            raise NotImplementedError()

    def _set_exe(self, exe):
        self._exe = exe

    def _set_dist_train(self, prog):
        self._distribute_train_prog = prog

    def _set_dist_pred(self, prog):
        self._distribute_pred_prog = prog

    def _set_fetch_list(self, fetch_list):
        self._fetch_list = fetch_list

    def train_one_step(self, batch):

        if not self._dist_train_init:
            self._distribute_train_prog = fluid.CompiledProgram(self._train_prog).with_data_parallel(loss_name=self._loss_var.name)
            self._dist_train_init = True

        exe = self._exe
        distribute_train_prog = self._distribute_train_prog
        fetch_list = self._fetch_list

        if gpu_dev_count > 1:
            feed, mask = batch
            rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list)
            num_fakes = decode_fake(len(rt_outputs[0]), mask, self._train_batch_size)
            if num_fakes:
                rt_outputs = [i[:-num_fakes] for i in rt_outputs]
        
        else:
            feed = self._feed_batch_process_fn(batch)
            rt_outputs = exe.run(distribute_train_prog, feed=feed, fetch_list=fetch_list)

        rt_outputs = {k:v for k,v in zip(self._fetch_names, rt_outputs)}
        self._cur_train_step += 1
        self._check_save()
        self._cur_train_epoch = (self._cur_train_step-1) // self._steps_pur_epoch
        return rt_outputs

    def predict_one_batch(self, batch):
        if gpu_dev_count > 1:
            feed, mask = batch
            rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list, use_prune=True)
            num_fakes = decode_fake(len(rt_outputs[0]), mask, self._predict_batch_size)
            if num_fakes:
                rt_outputs = [i[:-num_fakes] for i in rt_outputs]
        else:
            feed = self._pred_feed_batch_process_fn(batch)
            rt_outputs = self._exe.run(self._distribute_pred_prog, feed=feed, fetch_list=self._pred_fetch_list, use_prune=True)

        rt_outputs = {k:v for k,v in zip(self._pred_fetch_name_list, rt_outputs)}
        return rt_outputs

    @property
    def name(self):
        return self._name
    
    @property
    def num_examples(self):
        return self._num_examples

    @property
    def mix_ratio(self):
        return self._mix_ratio

    @mix_ratio.setter
    def mix_ratio(self, value):
        self._mix_ratio = value

    @property
    def num_epochs(self):
        return self._num_epochs

    @property
    def cur_train_step(self):
        return self._cur_train_step

    @property
    def cur_train_epoch(self):
        return self._cur_train_epoch

    @property
    def steps_pur_epoch(self):
        return self._steps_pur_epoch

    def _build_head(self, net_inputs, phase, scope=""):
        self._check_phase(phase)
        if phase == 'train':
            output_vars = self._task_head.build(net_inputs, scope_name=scope)
        if phase == 'predict':
            output_vars = self._pred_head.build(net_inputs, scope_name=scope)
        return output_vars
    
    def _save(self, save_path, suffix=None):
        # dirpath = save_path.rstrip('/').rstrip('\\') + suffix
        if suffix is not None:
            dirpath = os.path.join(save_path, suffix)
        else:
            dirpath = save_path
        self._pred_input_varname_list = [str(i) for i in self._pred_input_varname_list]

        prog = self._pred_prog.clone()
        fluid.io.save_inference_model(dirpath, self._pred_input_varname_list, self._pred_fetch_var_list, self._exe, prog)

        conf = {}
        for k, strv in self._save_protocol.items(): 
            d = None
            v = locals()
            exec('d={}'.format(strv), globals(), v)
            conf[k] = v['d']
        with open(os.path.join(dirpath, '__conf__'), 'w') as writer:
            writer.write(json.dumps(conf, indent=1))
        print(self._name + ': predict model saved at ' + dirpath)
        sys.stdout.flush()

    
    def _load(self, infer_model_path=None):
        if infer_model_path is None:
            infer_model_path = self._save_infermodel_path
        for k,v in json.load(open(os.path.join(infer_model_path, '__conf__'))).items(): 
            strv = self._save_protocol[k]
            exec('{}=v'.format(strv))
        pred_prog, self._pred_input_varname_list, self._pred_fetch_var_list = \
            fluid.io.load_inference_model(infer_model_path, self._exe)
        print(self._name+': inference model loaded from ' + infer_model_path)
        sys.stdout.flush()
        return pred_prog


================================================
FILE: paddlepalm/utils/__init__.py
================================================

from . import basic_helper
from . import config_helper


================================================
FILE: paddlepalm/utils/basic_helper.py
================================================
# coding=utf-8
import os
import json
import yaml
from .config_helper import PDConfig
import logging
from paddle import fluid

def get_basename(f):
    return os.path.splitext(f)[0]


def get_suffix(f):
    return os.path.splitext(f)[-1]


def parse_yaml(f, asdict=True, support_cmd_line=False):
    assert os.path.exists(f), "file {} not found.".format(f)
    if support_cmd_line:
        args = PDConfig(yaml_file=f, fuse_args=True)
        args.build()
        return args.asdict() if asdict else args
    else:
        if asdict:
            with open(f, "r") as fin: 
                yaml_config = yaml.load(fin, Loader=yaml.SafeLoader)
            return yaml_config
        else:
            raise NotImplementedError()


def parse_json(f, asdict=True, support_cmd_line=False):
    assert os.path.exists(f), "file {} not found.".format(f)
    if support_cmd_line:
        args = PDConfig(json_file=f, fuse_args=support_cmd_line)
        args.build()
        return args.asdict() if asdict else args
    else:
        if asdict:
            with open(f, "r") as fin: 
                config = json.load(fin)
            return config
        else:
            raise NotImplementedError()
            

def parse_list(string, astype=str):
    assert isinstance(string, str), "{} is not a string.".format(string)
    if ',' not in string:
        return [astype(string)]
    string = string.replace(',', ' ')
    return [astype(i) for i in string.split()]


def try_float(s):
    try:
        float(s)
        return(float(s))
    except:
        return s


# TODO: 增加None机制，允许hidden size、batch size和seqlen设置为None
def check_io(in_attr, out_attr, strict=False, in_name="left", out_name="right"):
    for name, attr in in_attr.items():
        assert name in out_attr, in_name+': '+name+' not found in '+out_name
        if attr != out_attr[name]:
            if strict:
                raise ValueError(name+': shape or dtype not consistent!')
            else:
                logging.warning('{}: shape or dtype not consistent!\n{}:\n{}\n{}:\n{}'.format(name, in_name, attr, out_name, out_attr[name]))


def encode_inputs(inputs, scope_name, sep='.', cand_set=None):
    outputs = {}
    for k, v in inputs.items():
        if cand_set is not None:
            if k in cand_set:
                outputs[k] = v
            if scope_name+sep+k in cand_set:
                outputs[scope_name+sep+k] = v
        else:
            outputs[scope_name+sep+k] = v
    return outputs


def decode_inputs(inputs, scope_name, sep='.', keep_unk_keys=True):
    outputs = {}
    for name, value in inputs.items():
        # var for backbone are also available to tasks
        if keep_unk_keys and sep not in name:
            outputs[name] = value
        # var for this inst
        if name.startswith(scope_name+'.'):
            outputs[name[len(scope_name+'.'):]] = value
    return outputs


def build_executor(on_gpu):
    if on_gpu:
        place = fluid.CUDAPlace(0)
        # dev_count = fluid.core.get_cuda_device_count()
    else:
        place = fluid.CPUPlace()
        # dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
    # return fluid.Executor(place), dev_count
    return fluid.Executor(place)


def fit_attr(conf, fit_attr, strict=False):
    for i, attr in fit_attr.items():
        if i not in conf:
            if strict:
                raise Exception('Argument {} is required to create a controller.'.format(i))
            else:
                continue
        conf[i] = attr(conf[i])
    return conf


================================================
FILE: paddlepalm/utils/config_helper.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import argparse
import json
import yaml
import six
import logging

logging_only_message = "%(message)s"
logging_details = "%(asctime)s.%(msecs)03d %(levelname)s %(module)s - %(funcName)s: %(message)s"


class JsonConfig(object):
    """
    A high-level api for handling json configure file.
    """

    def __init__(self, config_path):
        self._config_dict = self._parse(config_path)

    def _parse(self, config_path):
        try:
            with open(config_path) as json_file:
                config_dict = json.load(json_file)
                assert isinstance(config_dict, dict), "Object in {} is NOT a dict.".format(config_path)
        except:
            raise IOError("Error in parsing bert model config file '%s'" %
                          config_path)
        else:
            return config_dict

    def __getitem__(self, key):
        return self._config_dict[key]

    def asdict(self):
        return self._config_dict

    def print_config(self):
        for arg, value in sorted(six.iteritems(self._config_dict)):
            print('%s: %s' % (arg, value))
        print('------------------------------------------------')


class ArgumentGroup(object):
    def __init__(self, parser, title, des):
        self._group = parser.add_argument_group(title=title, description=des)

    def add_arg(self, name, type, default, help, **kwargs):
        type = str2bool if type == bool else type
        self._group.add_argument(
            "--" + name,
            default=default,
            type=type,
            help=help + ' Default: %(default)s.',
            **kwargs)


class ArgConfig(object):
    """
    A high-level api for handling argument configs.
    """

    def __init__(self):
        parser = argparse.ArgumentParser()

        train_g = ArgumentGroup(parser, "training", "training options.")
        train_g.add_arg("epoch", int, 3, "Number of epoches for fine-tuning.")
        train_g.add_arg("learning_rate", float, 5e-5,
                        "Learning rate used to train with warmup.")
        train_g.add_arg(
            "lr_scheduler",
            str,
            "linear_warmup_decay",
            "scheduler of learning rate.",
            choices=['linear_warmup_decay', 'noam_decay'])
        train_g.add_arg("weight_decay", float, 0.01,
                        "Weight decay rate for L2 regularizer.")
        train_g.add_arg(
            "warmup_proportion", float, 0.1,
            "Proportion of training steps to perform linear learning rate warmup for."
        )
        train_g.add_arg("save_steps", int, 1000,
                        "The steps interval to save checkpoints.")
        train_g.add_arg(
            "loss_scaling", float, 1.0,
            "Loss scaling factor for mixed precision training, only valid when use_fp16 is enabled."
        )
        train_g.add_arg("pred_dir", str, None,
                        "Path to save the prediction results")

        log_g = ArgumentGroup(parser, "logging", "logging related.")
        log_g.add_arg("skip_steps", int, 10,
                      "The steps interval to print loss.")
        log_g.add_arg("verbose", bool, False, "Whether to output verbose log.")

        run_type_g = ArgumentGroup(parser, "run_type", "running type options.")
        run_type_g.add_arg("use_cuda", bool, True,
                           "If set, use GPU for training.")
        run_type_g.add_arg(
            "use_fast_executor", bool, False,
            "If set, use fast parallel executor (in experiment).")
        run_type_g.add_arg(
            "num_iteration_per_drop_scope", int, 1,
            "Ihe iteration intervals to clean up temporary variables.")
        run_type_g.add_arg("do_train", bool, True,
                           "Whether to perform training.")
        run_type_g.add_arg("do_predict", bool, True,
                           "Whether to perform prediction.")

        custom_g = ArgumentGroup(parser, "customize", "customized options.")

        self.custom_g = custom_g

        self.parser = parser

    def add_arg(self, name, dtype, default, descrip):
        self.custom_g.add_arg(name, dtype, default, descrip)

    def build_conf(self):
        return self.parser.parse_args()


def str2bool(v):
    # because argparse does not support to parse "true, False" as python
    # boolean directly
    return v.lower() in ("true", "t", "1")


def print_arguments(args, log=None):
    if not log:
        print('-----------  Configuration Arguments -----------')
        for arg, value in sorted(six.iteritems(vars(args))):
            print('%s: %s' % (arg, value))
        print('------------------------------------------------')
    else:
        log.info('-----------  Configuration Arguments -----------')
        for arg, value in sorted(six.iteritems(vars(args))):
            log.info('%s: %s' % (arg, value))
        log.info('------------------------------------------------')


class PDConfig(object):
    """
    A high-level API for managing configuration files in PaddlePaddle.
    Can jointly work with command-line-arugment, json files and yaml files.
    """

    def __init__(self, json_file=None, yaml_file=None, fuse_args=True):
        """
            Init funciton for PDConfig.
            json_file: the path to the json configure file.
            yaml_file: the path to the yaml configure file.
            fuse_args: if fuse the json/yaml configs with argparse.
        """

        if json_file is not None and yaml_file is not None:
            raise Warning(
                "json_file and yaml_file can not co-exist for now. please only use one configure file type."
            )
            return

        self.args = None
        self.arg_config = {}
        self.json_config = {}
        self.yaml_config = {}

        parser = argparse.ArgumentParser()

        self.yaml_g = ArgumentGroup(parser, "yaml", "options from yaml.")
        self.json_g = ArgumentGroup(parser, "json", "options from json.")
        self.com_g = ArgumentGroup(parser, "custom", "customized options.")

        self.parser = parser

        if json_file is not None:
            assert isinstance(json_file, str)
            self.load_json(json_file, fuse_args=fuse_args)

        if yaml_file is not None:
            assert isinstance(yaml_file, str) or isinstance(yaml_file, list)
            self.load_yaml(yaml_file, fuse_args=fuse_args)

    def load_json(self, file_path, fuse_args=True):

        if not os.path.exists(file_path):
            raise Warning("the json file %s does not exist." % file_path)
            return

        with open(file_path, "r") as fin:
            self.json_config = json.loads(fin.read())
            fin.close()

        if fuse_args:
            for name in self.json_config:
                if not isinstance(self.json_config[name], int) \
                    and not isinstance(self.json_config[name], float) \
                    and not isinstance(self.json_config[name], str) \
                    and not isinstance(self.json_config[name], bool):

                    continue

                self.json_g.add_arg(name,
                                    type(self.json_config[name]),
                                    self.json_config[name],
                                    "This is from %s" % file_path)

    def load_yaml(self, file_path_list, fuse_args=True):

        if isinstance(file_path_list, str):
            file_path_list = [file_path_list]
        for file_path in file_path_list: 
            if not os.path.exists(file_path):
                raise Warning("the yaml file %s does not exist." % file_path)
                return

            with open(file_path, "r") as fin: 
                self.yaml_config = yaml.load(fin, Loader=yaml.SafeLoader)
            if fuse_args:
                for name in self.yaml_config:
                    if not isinstance(self.yaml_config[name], int) \
                        and not isinstance(self.yaml_config[name], float) \
                        and not isinstance(self.yaml_config[name], str) \
                        and not isinstance(self.yaml_config[name], bool):

                        continue

                    self.yaml_g.add_arg(name,
                                        type(self.yaml_config[name]),
                                        self.yaml_config[name],
                                        "This is from %s" % file_path)

    def build(self):
        self.args = self.parser.parse_args()
        self.arg_config = vars(self.args)

    def asdict(self):
        return self.arg_config

    def __add__(self, new_arg):
        assert isinstance(new_arg, list) or isinstance(new_arg, tuple)
        assert len(new_arg) >= 3
        assert self.args is None

        name = new_arg[0]
        dtype = new_arg[1]
        dvalue = new_arg[2]
        desc = new_arg[3] if len(
            new_arg) == 4 else "Description is not provided."

        self.com_g.add_arg(name, dtype, dvalue, desc)

        return self

    def __getattr__(self, name):
        if name in self.arg_config:
            return self.arg_config[name]

        if name in self.json_config:
            return self.json_config[name]

        if name in self.yaml_config:
            return self.yaml_config[name]

        raise Warning("The argument %s is not defined." % name)

    def Print(self):

        print("-" * 70)
        for name in self.arg_config:
            print("{: <25}\t{}".format(str(name), str(self.arg_config[name])))

        for name in self.json_config:
            if name not in self.arg_config:
                print("{: <25}\t{}" %
                      (str(name), str(self.json_config[name])))

        for name in self.yaml_config:
            if name not in self.arg_config:
                print("{: <25}\t{}" %
                      (str(name), str(self.yaml_config[name])))

        print("-" * 70)


if __name__ == "__main__":
    pd_config = PDConfig(yaml_file="./test/bert_config.yaml")
    pd_config += ("my_age", int, 18, "I am forever 18.")
    pd_config.build()

    print(pd_config.do_train)
    print(pd_config.hidden_size)
    print(pd_config.my_age)


================================================
FILE: paddlepalm/utils/plot_helper.py
================================================


================================================
FILE: paddlepalm/utils/print_helper.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

MAXLEN = 70
def print_dict(dic, title=""):

    if title:
        title = ' ' + title + ' '
        left_len = (MAXLEN - len(title)) // 2
        title = '-' * left_len + title
        right_len = MAXLEN - len(title)
        title = title + '-' * right_len
    else:
        title = '-' * MAXLEN
    print(title)
    for name in dic:
        print("{: <25}\t{}".format(str(name), str(dic[name])))
    print("")
    # print("-" * MAXLEN + '\n')


================================================
FILE: paddlepalm/utils/reader_helper.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys
import random
import logging
import numpy as np
import paddle
from paddle import fluid
from paddle.fluid import layers
from paddlepalm.distribute import gpu_dev_count, cpu_dev_count
import six
dev_count = 1 if gpu_dev_count <= 1 else gpu_dev_count


def create_feed_batch_process_fn(net_inputs):
    
    def feed_batch_process_fn(data, id=-1, phase='train', is_multi=False):
        temp = {}
        if dev_count > 1 and phase=='train' and is_multi:
            inputs = net_inputs[id]
        else:
            inputs= net_inputs

        for q, var in inputs.items():
            
            if isinstance(var, str) or (six.PY3 and isinstance(var, bytes)) or (six.PY2 and isinstance(var, unicode)):
                temp[var] = data[q]
            else:
                temp[var.name] = data[q]
        return temp

    return feed_batch_process_fn


# def create_multihead_feed_batch_process_fn(net_inputs):
# 
#     def feed_batch_process_fn(data, id=-1):
#         # temps = {}
#         # for i in range(len(net_inputs)):
#         temp = {}
#         inputs = net_inputs[id] if id != -1 else net_inputs
#         
#         for q, var in inputs.items():
#             if isinstance(var, str) or isinstance(var, unicode):
#                 temp[var] = data[q]
#             else:
#                 temp[var.name] = data[q]
#             # temps[i] = temp
#             
#         return temp
# 
#     return feed_batch_process_fn


def check_io(in_attr, out_attr, strict=False, in_name="left", out_name="right"):
    for name, attr in in_attr.items():
        assert name in out_attr, in_name+': '+name+' not found in '+out_name
        if attr != out_attr[name]:
            if strict:
                raise ValueError(name+': shape or dtype not consistent!')
            else:
                logging.warning('{}: shape or dtype not consistent!\n{}:\n{}\n{}:\n{}'.format(name, in_name, attr, out_name, out_attr[name]))


def _check_and_adapt_shape_dtype(rt_val, attr, message=""):
    if not isinstance(rt_val, np.ndarray):
        if rt_val is None:
            raise Exception(message+": get None value. ")
        rt_val = np.array(rt_val)
        assert rt_val.dtype != np.dtype('O'), message+"yielded data is not a valid tensor (number of elements on some dimension may not consistent): {}".format(rt_val)
        if rt_val.dtype == np.dtype('float64'):
            rt_val = rt_val.astype('float32')
    
    shape, dtype = attr
    assert rt_val.dtype == np.dtype(dtype), message+"yielded data type not consistent with attr settings. Expect: {}, receive: {}.".format(rt_val.dtype, np.dtype(dtype))
    assert len(shape) == rt_val.ndim, message+"yielded data rank(ndim) not consistent with attr settings. Expect: {}, receive: {}.".format(len(shape), rt_val.ndim)
    for rt, exp in zip(rt_val.shape, shape):
        if exp is None or exp < 0:
            continue
        assert rt == exp, "yielded data shape is not consistent with attr settings.Expected:{}Actual:{}".format(exp, rt)
    return rt_val
    

def _zero_batch(attrs):
    pos_attrs = []
    for shape, dtype in attrs:
        pos_shape = [size if size and size > 0 else 1 for size in shape]
        pos_attrs.append([pos_shape, dtype])

    return [np.zeros(shape=shape, dtype=dtype) for shape, dtype in pos_attrs]


def _zero_batch_x(attrs, batch_size):
    pos_attrs = []
    for shape, dtype in attrs:
        pos_shape = [size for size in shape]
        if pos_shape[0] == -1:
            pos_shape[0] = batch_size
        if pos_shape[1] == -1:
            pos_shape[1] = 512 # max seq len
        pos_attrs.append([pos_shape, dtype])

    return [np.zeros(shape=shape, dtype=dtype) for shape, dtype in pos_attrs]


def create_net_inputs(input_attrs, is_async=False, iterator_fn=None, dev_count=1, n_prefetch=1):
    inputs = []
    ret = {}
    for name, shape, dtype in input_attrs:
        p = layers.data(name, shape=shape, dtype=dtype)
        ret[name] = p
        inputs.append(p)

    if is_async:
        assert iterator_fn is not None, "iterator_fn is needed for building async input layer."
        reader = fluid.io.PyReader(inputs, capacity=dev_count, iterable=False)
        reader.decorate_batch_generator(iterator_fn)
        reader.start()

    return ret


def create_iterator_fn(iterator, iterator_prefix, shape_and_dtypes, outname_to_pos, verbose=0, return_type='list'):

    pos_to_outname = {j:i for i,j in outname_to_pos.items()}
    
    def iterator_fn():
        v = verbose
        for outputs in iterator:
            results = [None] * len(outname_to_pos)
            prefix = iterator_prefix
            for outname, val in outputs.items():
                task_outname = prefix + '.' + outname

                if outname in outname_to_pos:
                    idx = outname_to_pos[outname]
                    val = _check_and_adapt_shape_dtype(val, shape_and_dtypes[idx])
                    results[idx] = val

                if task_outname in outname_to_pos:
                    idx = outname_to_pos[task_outname]
                    val = _check_and_adapt_shape_dtype(val, shape_and_dtypes[idx])
                    results[idx] = val
            if return_type == 'list':
                yield results
            elif return_type == 'dict':
                temp = {}
                for pos, i in enumerate(results):
                    temp[pos_to_outname[pos]] = i

                yield temp

    return iterator_fn

def create_multihead_inference_fn(iterators, iterator_prefixes, joint_shape_and_dtypes, names, outname_to_pos, task_name2id, dev_count=1):
    
    def iterator(task_name):
        while True:
            id = task_name2id[task_name]
            # id = np.random.choice(task_ids, p=weights)
            task_id_tensor = np.array([id]).astype("int64")
            
            for i in range(dev_count):
                
                outputs = next(iterators[id]) # dict type

                prefix = iterator_prefixes[id]
                results = {}
                results['__task_id'] = task_id_tensor
                for outname, val in outputs.items():
                    task_outname = prefix + '.' + outname

                    if outname in names[id]:
                        idx = outname_to_pos[id][outname]
                        val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[id][idx], message=outname+': ')
                        results[outname] = val

                    if task_outname in names[id]:
                        idx = outname_to_pos[id][task_outname]
                        val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[id][idx], message=task_outname+': ')
                        results[task_outname] = val

                yield results

    return iterator


def create_multihead_iterator_fn(iterators, iterator_prefixes, joint_shape_and_dtypes, mrs, names, outname_to_pos, dev_count=1, keep_one_task=True):
    task_ids = range(len(iterators))
    weights = [mr / float(sum(mrs)) for mr in mrs]
    if not keep_one_task:
        dev_count = 1

    def iterator():
        while True:
            id = np.random.choice(task_ids, p=weights)
            task_id_tensor = np.array([id]).astype("int64")
            
            for i in range(dev_count):
                
                outputs = next(iterators[id]) # dict type

                prefix = iterator_prefixes[id]
                results = {}
                results['__task_id'] = task_id_tensor
                for outname, val in outputs.items():
                    task_outname = prefix + '.' + outname

                    if outname in names[id]:
                        idx = outname_to_pos[id][outname]
                        val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[id][idx], message=outname+': ')
                        results[outname] = val

                    if task_outname in names[id]:
                        idx = outname_to_pos[id][task_outname]
                        val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[id][idx], message=task_outname+': ')
                        results[task_outname] = val

                yield results

    return iterator


def create_joint_iterator_fn(iterators, iterator_prefixes, joint_shape_and_dtypes, mrs, outname_to_pos, dev_count=1, keep_one_task=True, verbose=0):
    """
        joint_shape_and_dtypes: 本质上是根据bb和parad的attr设定的，并且由reader中的attr自动填充-1（可变）维度得到，因此通过与iterator的校验可以完成runtime的batch正确性检查
    """

    task_ids = range(len(iterators))
    weights = [mr / float(sum(mrs)) for mr in mrs]
    if not keep_one_task:
        dev_count = 1

    results = _zero_batch(joint_shape_and_dtypes)
    outbuf = {}
    for id in task_ids:
        outputs = next(iterators[id]) # dict type
        outbuf[id] = outputs
        prefix = iterator_prefixes[id]
        for outname, val in outputs.items():
            task_outname = prefix + '.' + outname

            if outname in outname_to_pos:
                idx = outname_to_pos[outname]
                val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[idx], message=outname+': ')
                results[idx] = val

            if task_outname in outname_to_pos:
                idx = outname_to_pos[task_outname]
                val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[idx], message=task_outname+': ')
                results[idx] = val

    fake_batch = results
    dev_count_bak = dev_count

    def iterator():
        v = verbose
        has_show_warn = False
        while True:
            id = np.random.choice(task_ids, p=weights)
            results = fake_batch
            if v > 0:
                print('----- debug joint iterator -----')
                print('sampled task id: '+str(id))
            task_id_tensor = np.array([[id]]).astype("int64")
            
            for i in range(dev_count):
                
                results[outname_to_pos['__task_id']] = task_id_tensor
                assert outname_to_pos['__task_id'] == 0

                if id in outbuf:
                    outputs = outbuf[id]
                    del outbuf[id]
                else:
                    outputs = next(iterators[id]) # dict type

                if 'token_ids' in outputs:
                    val1 = len(outputs['token_ids'])
                    val = _check_and_adapt_shape_dtype([val1], [[1], 'int64'])
                    results[outname_to_pos['batch_size']] = val

                    val2 = len(outputs['token_ids'][0])
                    val = _check_and_adapt_shape_dtype([val2], [[1], 'int64'])
                    results[outname_to_pos['seqlen']] = val

                    val = _check_and_adapt_shape_dtype([val1*val2], [[1], 'int64'])
                    results[outname_to_pos['batchsize_x_seqlen']] = val
                else:
                    if not has_show_warn:
                        print('WARNING: token_ids not found in current batch, failed to yield batch_size, seqlen and batchsize_x_seqlen. (This message would be shown only once.)')
                        has_show_warn = True

                prefix = iterator_prefixes[id]
                for outname, val in outputs.items():
                    if v > 0:
                        print('reader generate: '+outname)
                    task_outname = prefix + '.' + outname

                    if outname in outname_to_pos:
                        idx = outname_to_pos[outname]
                        if v > 0:
                            print(outname + ' is insert in idx ' + str(idx))
                        val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[idx], message=outname+': ')
                        results[idx] = val

                    if task_outname in outname_to_pos:
                        idx = outname_to_pos[task_outname]
                        if v > 0:
                            print(task_outname + ' is insert in idx ' + str(idx))
                        val = _check_and_adapt_shape_dtype(val, joint_shape_and_dtypes[idx], message=task_outname+': ')
                        results[idx] = val

                if v > 0:
                    print('yielded batch len and shapes:')
                    print(len(results))
                    for i in results:
                        print(np.shape(i))
                    print('')
                    v -= 1
                yield results

    return iterator


def merge_input_attrs(backbone_attr, task_attrs, insert_taskid=True, insert_batchsize=False, insert_seqlen=False, insert_batchsize_x_seqlen=False):
    """
    Args:
        task_attrs(list[dict]|dict): task input attributes, key=attr_name, val=[shape, dtype], support single task and nested tasks
    """
    if isinstance(task_attrs, dict):
        task_attrs = [task_attrs]

    ret = []
    names = []
    start = 0
    if insert_taskid:
        ret.append(([1, 1], 'int64'))
        names.append('__task_id')
        start += 1
    
    if insert_batchsize:
        ret.append(([1], 'int64'))
        names.append('batch_size')
        start += 1

    if insert_seqlen:
        ret.append(([1], 'int64'))
        names.append('seqlen')
        start += 1

    if insert_batchsize_x_seqlen:
        ret.append(([1], 'int64'))
        names.append(u'batchsize_x_seqlen')
        start += 1
        
    names += sorted(backbone_attr.keys())
    ret.extend([backbone_attr[k] for k in names[start:]])
    name_to_position = {}
    # pos=0 is for task_id, thus we start from 1
    for pos, k in enumerate(names):
        name_to_position[k] = pos
    for task_attr in task_attrs:
        task_names = sorted(task_attr.keys())
        names.extend(task_names)
        ret.extend([task_attr[k] for k in task_names])
        for pos, k in enumerate(task_names, start=len(name_to_position)):
            name_to_position[k] = pos
    return names, ret, name_to_position


================================================
FILE: paddlepalm/utils/saver.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import print_function

import os
import six
import ast
import copy
import tarfile
import shutil

import numpy as np
import paddle.fluid as fluid

def init_checkpoint(exe, init_checkpoint_path, main_program, skip_list = []):
    assert os.path.exists(
        init_checkpoint_path), "[%s] cann't be found." % init_checkpoint_path

    def existed_persitables(var):
        if not fluid.io.is_persistable(var):
            return False
        if var.name in skip_list:
            return False
        return os.path.exists(os.path.join(init_checkpoint_path, var.name))

    fluid.io.load_vars(
        exe,
        init_checkpoint_path,
        main_program=main_program,
        predicate=existed_persitables)
    print("Load model from {}".format(init_checkpoint_path))


def init_pretraining_params(exe,
                            pretraining_params_path,
                            convert,
                            main_program,
                            strict=False):
                            
    assert os.path.exists(pretraining_params_path
                          ), "[%s] cann't be found." % pretraining_params_path

    if convert:
        assert os.path.exists(os.path.join(pretraining_params_path, '__palmmodel__')), "__palmmodel__ not found."

        with tarfile.open(os.path.join(pretraining_params_path, '__palmmodel__'), 'r') as f:
            f.extractall(os.path.join(pretraining_params_path, '.temp'))
        
        log_path = os.path.join(pretraining_params_path, '__palmmodel__')
        pretraining_params_path = os.path.join(pretraining_params_path, '.temp')

    else:
        log_path = pretraining_params_path
    
    print("Loading pretraining parameters from {}...".format(pretraining_params_path))

    def existed_params(var):
        if not isinstance(var, fluid.framework.Parameter):
            return False
        if not os.path.exists(os.path.join(pretraining_params_path, var.name)):
            if strict:
                raise Exception('Error: {} not found in {}.'.format(var.name, log_path))
            else:
                print('Warning: {} not found in {}.'.format(var.name, log_path))
        return os.path.exists(os.path.join(pretraining_params_path, var.name))

    fluid.io.load_vars(
        exe,
        pretraining_params_path,
        main_program=main_program,
        predicate=existed_params)
    if convert:
        shutil.rmtree(pretraining_params_path)
    print('')


================================================
FILE: paddlepalm/utils/textprocess_helper.py
================================================
# -*- coding: UTF-8 -*-
#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

def is_whitespace(c):
    if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
        return True
    return False


================================================
FILE: setup.cfg
================================================
[metadata]

name = paddlepalm

author = zhangyiming
author_email = zhangyiming04@baidu.com

version = 2.1.0

description = PaddlePALM
long_description = file: README.md
long_description_content_type = text/markdown

home_page = https://github.com/PaddlePaddle/PALM
license = Apache 2.0

classifier =
    Private :: Do Not Upload
    Programming Language :: Python
    Programming Language :: Python :: 2
    Programming Language :: Python :: 2.7
    Programming Language :: Python :: 3
    Programming Language :: Python :: 3.5
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7

keywords =
    paddlepaddle
    paddle
    nlp
    pretrain
    multi-task-learning

[options]

packages = find:

include_package_data = True
zip_safe = False

[sdist]
dist_dir = output/dist

[bdist_wheel]
dist_dir = output/dist

[easy_install]
index_url = http://pip.baidu.com/root/baidu/+simple/


================================================
FILE: setup.py
================================================
# -*- coding: UTF-8 -*-
################################################################################
#
#   Copyright (c) 2019  Baidu.com, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
"""
Setup script.
Authors: zhouxiangyang(zhouxiangyang@baidu.com)
Date:    2020/2/4 00:00:01
"""
import setuptools
with open("README.md", "r") as fh:
    long_description = fh.read()
setuptools.setup(
    name="paddlepalm",
    version="2.1.0",
    author="PaddlePaddle",
    author_email="zhangyiming04@baidu.com",
    description="a flexible, general and easy-to-use NLP large-scale pretraining and multi-task learning framework.",
    # long_description=long_description,
    # long_description_content_type="text/markdown",
    url="https://github.com/PaddlePaddle/PALM",
    # packages=setuptools.find_packages(),
    packages = ['paddlepalm', 
        'paddlepalm.backbone', 
        'paddlepalm.backbone.utils', 
        'paddlepalm.optimizer',
        'paddlepalm.reader', 
        'paddlepalm.reader.utils', 
        'paddlepalm.head', 
        'paddlepalm.distribute', 
        'paddlepalm.lr_sched', 
        'paddlepalm.tokenizer', 
        'paddlepalm.utils'],
    package_dir={'paddlepalm':'./paddlepalm',
                 'paddlepalm.backbone':'./paddlepalm/backbone',
                 'paddlepalm.backbone.utils':'./paddlepalm/backbone/utils',
                 'paddlepalm.optimizer':'./paddlepalm/optimizer',
                 'paddlepalm.lr_sched': './paddlepalm/lr_sched',
                 'paddlepalm.distribute': './paddlepalm/distribute',
                 'paddlepalm.reader':'./paddlepalm/reader',
                 'paddlepalm.reader.utils':'./paddlepalm/reader/utils',
                 'paddlepalm.head':'./paddlepalm/head',
                 'paddlepalm.tokenizer':'./paddlepalm/tokenizer',
                 'paddlepalm.utils':'./paddlepalm/utils'},
    platforms = "any",
    license='Apache 2.0',
    classifiers = [
            'License :: OSI Approved :: Apache Software License',
            'Programming Language :: Python',
            'Programming Language :: Python :: 2',
            'Programming Language :: Python :: 2.7',
            'Programming Language :: Python :: 3',
            'Programming Language :: Python :: 3.5',
            'Programming Language :: Python :: 3.6',
            'Programming Language :: Python :: 3.7',
          ],
    install_requires = [
        'paddlepaddle-gpu>=1.8.0'
    ]
)


================================================
FILE: test/test2/config.yaml
================================================
ask_instance: "mrqa, mlm4mrqa, match4mrqa"
target_tag: 1, 0, 0
mix_ratio: 1.0, 0.5, 0.5

save_path: "output_model/secondrun"

backbone: "ernie"
backbone_config_path: "../../pretrain_model/ernie/ernie_config.json"

vocab_path: "../../pretrain_model/ernie/vocab.txt"
do_lower_case: True
max_seq_len: 512

batch_size: 4
num_epochs: 2
optimizer: "adam"
learning_rate: 3e-5
warmup_proportion: 0.1
weight_decay: 0.1

print_every_n_steps: 1


================================================
FILE: test/test2/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json

if __name__ == '__main__':

    max_seqlen = 512
    batch_size = 4
    num_epochs = 2
    lr = 1e-3
    vocab_path = './pretrain/ernie/vocab.txt'

    train_file = './data/cls4mrqa/train.tsv'
    predict_file = './data/cls4mrqa/dev.tsv'

    config = json.load(open('./pretrain/ernie/ernie_config.json'))
    # ernie = palm.backbone.ERNIE(...)
    ernie = palm.backbone.ERNIE.from_config(config)

    # cls_reader2 = palm.reader.cls(train_file_topic, vocab_path, batch_size, max_seqlen)
    # cls_reader3 = palm.reader.cls(train_file_subj, vocab_path, batch_size, max_seqlen)
    # topic_trainer = palm.Trainer('topic_cls', cls_reader2, cls)
    # subj_trainer = palm.Trainer('subj_cls', cls_reader3, cls)

    # 创建该分类任务的reader，由诸多参数控制数据集读入格式、文件数量、预处理规则等
    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen)
    cls_reader2 = palm.reader.ClassifyReader(vocab_path, max_seqlen)
    print(cls_reader.outputs_attr)
    # 不同的backbone会对任务reader有不同的特征要求，例如对于分类任务，基本的输入feature为token_ids和label_ids，但是对于BERT，还要求从输入中额外提取position、segment、input_mask等特征，因此经过register后，reader会自动补充backbone所要求的字段
    cls_reader.register_with(ernie)
    cls_reader2.register_with(ernie)
    print(cls_reader.outputs_attr)

    print("preparing data...")
    print(cls_reader.num_examples)
    cls_reader.load_data(train_file, batch_size)
    cls_reader2.load_data(train_file, batch_size)
    print(cls_reader.num_examples)
    print('done!')

    # 创建任务头（task head），如分类、匹配、机器阅读理解等。每个任务头有跟该任务相关的必选/可选参数。注意，任务头与reader是解耦合的，只要任务头依赖的数据集侧的字段能被reader提供，那么就是合法的
    cls_head = palm.head.Classify(4, 1024, 0.1)
    cls_head2 = palm.head.Classify(4, 1024, 0.1)

    # 根据reader和任务头来创建一个训练器trainer，trainer代表了一个训练任务，内部维护着训练进程、和任务的关键信息，并完成合法性校验，该任务的模型保存、载入等相关规则控制
    trainer = palm.Trainer('cls')
    trainer2 = palm.Trainer('senti_cls')
    mh_trainer = palm.MultiHeadTrainer([trainer, trainer2])

    # match4mrqa.reuse_head_with(mrc4mrqa)

    # data_vars = cls_reader.build()
    # output_vars = ernie.build(data_vars)
    # cls_head.build({'backbone': output_vars, 'reader': data_vars})

    loss_var = mh_trainer.build_forward(ernie, [cls_head, cls_head2])

    n_steps = cls_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    print(warmup_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)

    adam = palm.optimizer.Adam(loss_var, lr, sched)

    mh_trainer.build_backward(optimizer=adam, weight_decay=0.001)
    
    # mh_trainer.random_init_params()
    mh_trainer.load_pretrain('pretrain/ernie/params')

    # trainer.train(iterator_fn, print_steps=1, save_steps=5, save_path='outputs', save_type='ckpt,predict')
    mh_trainer.fit_readers_with_mixratio([cls_reader, cls_reader2], 'cls', 2)
    mh_trainer.train(print_steps=1)
    # trainer.save()


================================================
FILE: test/test2/run.sh
================================================
export CUDA_VISIBLE_DEVICES=3
python run.py 


================================================
FILE: test/test3/config.yaml
================================================
task_instance: "cls1, cls2, cls3, cls4, cls5, cls6"

task_reuse_tag: 0,0,1,1,0,2

save_path: "output_model/thirdrun"

backbone: "ernie"
backbone_config_path: "../../pretrain_model/ernie/ernie_config.json"

vocab_path: "../../pretrain_model/ernie/vocab.txt"
do_lower_case: True
max_seq_len: 512

batch_size: 4
num_epochs: 2
optimizer: "adam"
learning_rate: 3e-5
warmup_proportion: 0.1
weight_decay: 0.1

print_every_n_steps: 1


================================================
FILE: test/test3/run.py
================================================
# coding=utf-8
import paddlepalm as palm
import json

if __name__ == '__main__':

    max_seqlen = 512
    batch_size = 4
    num_epochs = 2
    lr = 1e-3
    vocab_path = './pretrain/ernie/vocab.txt'

    train_file = './data/cls4mrqa/train.tsv'
    predict_file = './data/cls4mrqa/dev.tsv'

    config = json.load(open('./pretrain/ernie/ernie_config.json'))
    # ernie = palm.backbone.ERNIE(...)
    ernie = palm.backbone.ERNIE.from_config(config)

    # cls_reader2 = palm.reader.cls(train_file_topic, vocab_path, batch_size, max_seqlen)
    # cls_reader3 = palm.reader.cls(train_file_subj, vocab_path, batch_size, max_seqlen)
    # topic_trainer = palm.Trainer('topic_cls', cls_reader2, cls)
    # subj_trainer = palm.Trainer('subj_cls', cls_reader3, cls)

    # 创建该分类任务的reader，由诸多参数控制数据集读入格式、文件数量、预处理规则等
    cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen)
<<<<<<< HEAD:test/test2/run.py
    cls_reader2 = palm.reader.ClassifyReader(vocab_path, max_seqlen)
=======
    predict_cls_reader = palm.reader.ClassifyReader(vocab_path, max_seqlen, phase='predict')
>>>>>>> remotes/upstream/r0.3-api:test/test3/run.py
    print(cls_reader.outputs_attr)
    print(predict_cls_reader.outputs_attr)
    # 不同的backbone会对任务reader有不同的特征要求，例如对于分类任务，基本的输入feature为token_ids和label_ids，但是对于BERT，还要求从输入中额外提取position、segment、input_mask等特征，因此经过register后，reader会自动补充backbone所要求的字段
    cls_reader.register_with(ernie)
    cls_reader2.register_with(ernie)
    print(cls_reader.outputs_attr)
<<<<<<< HEAD:test/test2/run.py

    print("preparing data...")
    print(cls_reader.num_examples)
    cls_reader.load_data(train_file, batch_size)
    cls_reader2.load_data(train_file, batch_size)
=======
    print(predict_cls_reader.outputs_attr)

    print("preparing data...")
    print(cls_reader.num_examples)
    cls_reader.load_data(train_file, batch_size, num_epochs=num_epochs)
>>>>>>> remotes/upstream/r0.3-api:test/test3/run.py
    print(cls_reader.num_examples)
    print('done!')

    # 创建任务头（task head），如分类、匹配、机器阅读理解等。每个任务头有跟该任务相关的必选/可选参数。注意，任务头与reader是解耦合的，只要任务头依赖的数据集侧的字段能被reader提供，那么就是合法的
    cls_head = palm.head.Classify(4, 1024, 0.1)
<<<<<<< HEAD:test/test2/run.py
    cls_head2 = palm.head.Classify(4, 1024, 0.1)

    # 根据reader和任务头来创建一个训练器trainer，trainer代表了一个训练任务，内部维护着训练进程、和任务的关键信息，并完成合法性校验，该任务的模型保存、载入等相关规则控制
    trainer = palm.Trainer('cls')
    trainer2 = palm.Trainer('senti_cls')
    mh_trainer = palm.MultiHeadTrainer([trainer, trainer2])
=======

    # 根据reader和任务头来创建一个训练器trainer，trainer代表了一个训练任务，内部维护着训练进程、和任务的关键信息，并完成合法性校验，该任务的模型保存、载入等相关规则控制
    trainer = palm.Trainer('senti_cls')
>>>>>>> remotes/upstream/r0.3-api:test/test3/run.py

    # match4mrqa.reuse_head_with(mrc4mrqa)

    # data_vars = cls_reader.build()
    # output_vars = ernie.build(data_vars)
    # cls_head.build({'backbone': output_vars, 'reader': data_vars})

<<<<<<< HEAD:test/test2/run.py
    loss_var = mh_trainer.build_forward(ernie, [cls_head, cls_head2])

    n_steps = cls_reader.num_examples * num_epochs // batch_size
    warmup_steps = int(0.1 * n_steps)
    print(warmup_steps)
    sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
=======
    loss_var = trainer.build_forward(ernie, cls_head)

    # controller.build_forward()
    # Error! a head/backbone can be only build once! Try NOT to call build_forward method for any Trainer!

    # n_steps = cls_reader.num_examples * num_epochs // batch_size
    # warmup_steps = int(0.1 * n_steps)
    # print(warmup_steps)
    # sched = palm.lr_sched.TriangularSchedualer(warmup_steps, n_steps)
    sched = None
>>>>>>> remotes/upstream/r0.3-api:test/test3/run.py

    adam = palm.optimizer.Adam(loss_var, lr, sched)

    mh_trainer.build_backward(optimizer=adam, weight_decay=0.001)
    
    # mh_trainer.random_init_params()
    mh_trainer.load_pretrain('pretrain/ernie/params')

    # trainer.train(iterator_fn, print_steps=1, save_steps=5, save_path='outputs', save_type='ckpt,predict')
<<<<<<< HEAD:test/test2/run.py
    mh_trainer.fit_readers_with_mixratio([cls_reader, cls_reader2], 'cls', 2)
    mh_trainer.train(print_steps=1)
    # trainer.save()

=======
    trainer.fit_reader(cls_reader)
    trainer.train(print_steps=1)
    # trainer.save()

    print('prepare to predict...')
    pred_ernie = palm.backbone.ERNIE.from_config(config, phase='pred')
    cls_pred_head = palm.head.Classify(4, 1024, phase='pred')
    trainer.build_predict_forward(pred_ernie, cls_pred_head)

    predict_cls_reader.load_data(predict_file, 8)
    print(predict_cls_reader.num_examples)
    predict_cls_reader.register_with(pred_ernie)
    trainer.fit_reader(predict_cls_reader, phase='predict')
    print('predicting..')
    trainer.predict(print_steps=20)


    # controller = palm.Controller([mrqa, match4mrqa, mlm4mrqa])

    # loss = controller.build_forward(bb, mask_task=[])

    # n_steps = controller.estimate_train_steps(basetask=mrqa, num_epochs=2, batch_size=8, dev_count=4)
    # adam = palm.optimizer.Adam(loss)
    # sched = palm.schedualer.LinearWarmup(learning_rate, max_train_steps=n_steps, warmup_steps=0.1*n_steps)
    # 
    # controller.build_backward(optimizer=adam, schedualer=sched, weight_decay=0.001, use_ema=True, ema_decay=0.999)

    # controller.random_init_params()
    # controller.load_pretrain('../../pretrain_model/ernie/params')
    # controller.train()


    # controller = palm.Controller(config='config.yaml', task_dir='tasks', for_train=False)
    # controller.pred('mrqa', inference_model_dir='output_model/secondrun/mrqa/infer_model')


>>>>>>> remotes/upstream/r0.3-api:test/test3/run.py


================================================
FILE: test/test3/run.sh
================================================
export CUDA_VISIBLE_DEVICES=3

python run.py