Repository: tensorflow/tensor2tensor
Branch: master
Commit: bafdc1b67730
Files: 553
Total size: 8.3 MB

Directory structure:
gitextract_w47bzecb/

├── .gitignore
├── .travis.yml
├── AUTHORS
├── CONTRIBUTING.md
├── ISSUE_TEMPLATE.md
├── LICENSE
├── README.md
├── docs/
│   ├── cloud_mlengine.md
│   ├── cloud_tpu.md
│   ├── distributed_training.md
│   ├── index.md
│   ├── multi_problem.md
│   ├── new_model.md
│   ├── new_problem.md
│   ├── overview.md
│   ├── tutorials/
│   │   └── asr_with_transformer.md
│   └── walkthrough.md
├── floyd.yml
├── floyd_requirements.txt
├── oss_scripts/
│   ├── oss_integration_test.sh
│   ├── oss_pip_install.sh
│   ├── oss_release.sh
│   └── oss_tests.sh
├── pylintrc
├── setup.py
└── tensor2tensor/
    ├── __init__.py
    ├── bin/
    │   ├── __init__.py
    │   ├── build_vocab.py
    │   ├── make_tf_configs.py
    │   ├── t2t-avg-all
    │   ├── t2t-bleu
    │   ├── t2t-datagen
    │   ├── t2t-decoder
    │   ├── t2t-eval
    │   ├── t2t-exporter
    │   ├── t2t-insights-server
    │   ├── t2t-make-tf-configs
    │   ├── t2t-query-server
    │   ├── t2t-trainer
    │   ├── t2t-translate-all
    │   ├── t2t_attack.py
    │   ├── t2t_avg_all.py
    │   ├── t2t_bleu.py
    │   ├── t2t_datagen.py
    │   ├── t2t_decoder.py
    │   ├── t2t_distill.py
    │   ├── t2t_eval.py
    │   ├── t2t_prune.py
    │   ├── t2t_trainer.py
    │   ├── t2t_trainer_test.py
    │   └── t2t_translate_all.py
    ├── data_generators/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── algorithmic.py
    │   ├── algorithmic_math.py
    │   ├── algorithmic_math_deepmind.py
    │   ├── algorithmic_math_test.py
    │   ├── algorithmic_math_two_variables.py
    │   ├── algorithmic_test.py
    │   ├── all_problems.py
    │   ├── allen_brain.py
    │   ├── allen_brain_test.py
    │   ├── audio.py
    │   ├── audio_encoder.py
    │   ├── audio_test.py
    │   ├── babi_qa.py
    │   ├── bair_robot_pushing.py
    │   ├── celeba.py
    │   ├── celeba_test.py
    │   ├── celebahq.py
    │   ├── cifar.py
    │   ├── cipher.py
    │   ├── cleaner_en_xx.py
    │   ├── cnn_dailymail.py
    │   ├── cola.py
    │   ├── common_voice.py
    │   ├── common_voice_test.py
    │   ├── conll_ner.py
    │   ├── desc2code.py
    │   ├── desc2code_test.py
    │   ├── dialog_abstract.py
    │   ├── dialog_cornell.py
    │   ├── dialog_dailydialog.py
    │   ├── dialog_opensubtitles.py
    │   ├── dialog_personachat.py
    │   ├── dna_encoder.py
    │   ├── dna_encoder_test.py
    │   ├── enwik8.py
    │   ├── fsns.py
    │   ├── function_docstring.py
    │   ├── gene_expression.py
    │   ├── gene_expression_test.py
    │   ├── generator_utils.py
    │   ├── generator_utils_test.py
    │   ├── google_robot_pushing.py
    │   ├── gym_env.py
    │   ├── gym_env_test.py
    │   ├── ice_parsing.py
    │   ├── image_lsun.py
    │   ├── image_utils.py
    │   ├── image_utils_test.py
    │   ├── imagenet.py
    │   ├── imagenet_test.py
    │   ├── imdb.py
    │   ├── inspect_tfrecord.py
    │   ├── lambada.py
    │   ├── librispeech.py
    │   ├── lm1b.py
    │   ├── lm1b_imdb.py
    │   ├── lm1b_mnli.py
    │   ├── mnist.py
    │   ├── moving_mnist.py
    │   ├── mrpc.py
    │   ├── mscoco.py
    │   ├── mscoco_test.py
    │   ├── multi_problem.py
    │   ├── multi_problem_v2.py
    │   ├── multi_problem_v2_test.py
    │   ├── multinli.py
    │   ├── ocr.py
    │   ├── ops/
    │   │   ├── pack_sequences_ops.cc
    │   │   ├── pack_sequences_ops_test.py
    │   │   ├── subword_text_encoder.cc
    │   │   ├── subword_text_encoder.h
    │   │   ├── subword_text_encoder_ops.cc
    │   │   ├── subword_text_encoder_ops_test.py
    │   │   ├── subword_text_encoder_test.cc
    │   │   └── testdata/
    │   │       └── subwords
    │   ├── paraphrase_ms_coco.py
    │   ├── paraphrase_ms_coco_test.py
    │   ├── pointer_generator_word.py
    │   ├── problem.py
    │   ├── problem_hparams.py
    │   ├── problem_test.py
    │   ├── program_search.py
    │   ├── program_search_test.py
    │   ├── ptb.py
    │   ├── qnli.py
    │   ├── quora_qpairs.py
    │   ├── rte.py
    │   ├── scitail.py
    │   ├── seq2edits.py
    │   ├── snli.py
    │   ├── speech_recognition.py
    │   ├── squad.py
    │   ├── sst_binary.py
    │   ├── stanford_nli.py
    │   ├── style_transfer.py
    │   ├── style_transfer_test.py
    │   ├── subject_verb_agreement.py
    │   ├── test_data/
    │   │   ├── 1.csv
    │   │   ├── corpus-1.txt
    │   │   ├── corpus-2.txt
    │   │   ├── vocab-1.txt
    │   │   └── vocab-2.txt
    │   ├── text_encoder.py
    │   ├── text_encoder_build_subword.py
    │   ├── text_encoder_test.py
    │   ├── text_problems.py
    │   ├── text_problems_test.py
    │   ├── timeseries.py
    │   ├── timeseries_data_generator.py
    │   ├── timeseries_data_generator_test.py
    │   ├── timeseries_test.py
    │   ├── tokenizer.py
    │   ├── tokenizer_test.py
    │   ├── transduction_problems.py
    │   ├── transduction_problems_test.py
    │   ├── translate.py
    │   ├── translate_encs.py
    │   ├── translate_encs_cubbitt.py
    │   ├── translate_ende.py
    │   ├── translate_ende_test.py
    │   ├── translate_enes.py
    │   ├── translate_enet.py
    │   ├── translate_enfr.py
    │   ├── translate_enid.py
    │   ├── translate_enmk.py
    │   ├── translate_enro.py
    │   ├── translate_entn.py
    │   ├── translate_envi.py
    │   ├── translate_enzh.py
    │   ├── translate_test.py
    │   ├── video_generated.py
    │   ├── video_utils.py
    │   ├── video_utils_test.py
    │   ├── vqa.py
    │   ├── vqa_utils.py
    │   ├── wiki.py
    │   ├── wiki_lm.py
    │   ├── wiki_multi_problems.py
    │   ├── wiki_revision.py
    │   ├── wiki_revision_utils.py
    │   ├── wikifact/
    │   │   └── README.md
    │   ├── wikisum/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── delete_instances.sh
    │   │   ├── generate_vocab.py
    │   │   ├── get_references_commoncrawl.py
    │   │   ├── get_references_web.py
    │   │   ├── get_references_web_single_group.py
    │   │   ├── html.py
    │   │   ├── parallel_launch.py
    │   │   ├── produce_examples.py
    │   │   ├── test_data/
    │   │   │   ├── para_bad1.txt
    │   │   │   └── para_good1.txt
    │   │   ├── utils.py
    │   │   ├── utils_test.py
    │   │   ├── validate_data.py
    │   │   └── wikisum.py
    │   ├── wikitext103.py
    │   ├── wnli.py
    │   ├── wsj_parsing.py
    │   ├── yelp_full.py
    │   └── yelp_polarity.py
    ├── envs/
    │   ├── __init__.py
    │   ├── env_problem.py
    │   ├── env_problem_utils.py
    │   ├── env_problem_utils_test.py
    │   ├── gym_env_problem.py
    │   ├── gym_env_problem_test.py
    │   ├── gym_spaces_utils.py
    │   ├── gym_spaces_utils_test.py
    │   ├── mujoco_problems.py
    │   ├── mujoco_problems_test.py
    │   ├── rendered_env_problem.py
    │   ├── rendered_env_problem_test.py
    │   ├── tic_tac_toe_env.py
    │   ├── tic_tac_toe_env_problem.py
    │   ├── tic_tac_toe_env_problem_test.py
    │   ├── tic_tac_toe_env_test.py
    │   ├── time_step.py
    │   ├── time_step_test.py
    │   ├── trajectory.py
    │   └── trajectory_test.py
    ├── insights/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── graph.py
    │   ├── insight_configuration.proto
    │   ├── polymer/
    │   │   ├── .bowerrc
    │   │   ├── attention_visualization/
    │   │   │   ├── attention-visualization.html
    │   │   │   └── attention-visualization.js
    │   │   ├── bower.json
    │   │   ├── common-types.js
    │   │   ├── explore_view/
    │   │   │   ├── explore-view.html
    │   │   │   └── explore-view.js
    │   │   ├── graph_visualization/
    │   │   │   ├── graph-visualization.html
    │   │   │   └── graph-visualization.js
    │   │   ├── index.html
    │   │   ├── insights_app/
    │   │   │   ├── insights-app.html
    │   │   │   └── insights-app.js
    │   │   ├── language_selector/
    │   │   │   ├── language-selector-content.html
    │   │   │   ├── language-selector-content.js
    │   │   │   ├── language-selector.html
    │   │   │   └── language-selector.js
    │   │   ├── processing_visualization/
    │   │   │   ├── processing-visualization.html
    │   │   │   └── processing-visualization.js
    │   │   ├── query_card/
    │   │   │   ├── query-card.html
    │   │   │   └── query-card.js
    │   │   ├── tensor2tensor.html
    │   │   └── translation_result/
    │   │       ├── translation-result.html
    │   │       └── translation-result.js
    │   ├── query_processor.py
    │   ├── server.py
    │   └── transformer_model.py
    ├── layers/
    │   ├── __init__.py
    │   ├── area_attention.py
    │   ├── area_attention_test.py
    │   ├── common_attention.py
    │   ├── common_attention_test.py
    │   ├── common_audio.py
    │   ├── common_hparams.py
    │   ├── common_image_attention.py
    │   ├── common_image_attention_test.py
    │   ├── common_layers.py
    │   ├── common_layers_test.py
    │   ├── common_video.py
    │   ├── common_video_test.py
    │   ├── discretization.py
    │   ├── discretization_test.py
    │   ├── latent_layers.py
    │   ├── latent_layers_test.py
    │   ├── message_passing_attention.py
    │   ├── modalities.py
    │   ├── modalities_test.py
    │   ├── ngram.py
    │   ├── ngram_test.py
    │   ├── transformer_glow_layers.py
    │   ├── transformer_glow_layers_ops.py
    │   ├── transformer_glow_layers_ops_test.py
    │   ├── transformer_glow_layers_test.py
    │   ├── transformer_layers.py
    │   ├── transformer_memory.py
    │   ├── transformer_memory_test.py
    │   ├── vq_discrete.py
    │   └── vqa_layers.py
    ├── metrics/
    │   ├── __init__.py
    │   ├── video_conditional_fvd.py
    │   └── video_conditional_fvd_test.py
    ├── models/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── basic.py
    │   ├── basic_test.py
    │   ├── bytenet.py
    │   ├── bytenet_test.py
    │   ├── distillation.py
    │   ├── evolved_transformer.py
    │   ├── evolved_transformer_test.py
    │   ├── image_transformer.py
    │   ├── image_transformer_2d.py
    │   ├── image_transformer_2d_test.py
    │   ├── image_transformer_test.py
    │   ├── lstm.py
    │   ├── lstm_test.py
    │   ├── mtf_image_transformer.py
    │   ├── mtf_image_transformer_test.py
    │   ├── mtf_resnet.py
    │   ├── mtf_transformer.py
    │   ├── mtf_transformer2.py
    │   ├── mtf_transformer_test.py
    │   ├── neural_architecture_search/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── nas_layers.py
    │   │   ├── nas_layers_test.py
    │   │   ├── nas_model.py
    │   │   └── nas_model_test.py
    │   ├── neural_assistant.py
    │   ├── neural_gpu.py
    │   ├── neural_gpu_test.py
    │   ├── research/
    │   │   ├── __init__.py
    │   │   ├── adafactor_experiments.py
    │   │   ├── aligned.py
    │   │   ├── attention_lm.py
    │   │   ├── attention_lm_moe.py
    │   │   ├── autoencoders.py
    │   │   ├── autoencoders_test.py
    │   │   ├── cycle_gan.py
    │   │   ├── gene_expression.py
    │   │   ├── gene_expression_test.py
    │   │   ├── glow.py
    │   │   ├── glow_init_hook.py
    │   │   ├── glow_ops.py
    │   │   ├── glow_ops_test.py
    │   │   ├── glow_test.py
    │   │   ├── lm_experiments.py
    │   │   ├── moe.py
    │   │   ├── moe_experiments.py
    │   │   ├── multiquery_paper.py
    │   │   ├── neural_stack.py
    │   │   ├── neural_stack_test.py
    │   │   ├── residual_shuffle_exchange.py
    │   │   ├── rl.py
    │   │   ├── shuffle_network.py
    │   │   ├── similarity_transformer.py
    │   │   ├── super_lm.py
    │   │   ├── transformer_aux.py
    │   │   ├── transformer_aux_test.py
    │   │   ├── transformer_moe.py
    │   │   ├── transformer_nat.py
    │   │   ├── transformer_parallel.py
    │   │   ├── transformer_revnet.py
    │   │   ├── transformer_revnet_test.py
    │   │   ├── transformer_seq2edits.py
    │   │   ├── transformer_sketch.py
    │   │   ├── transformer_symshard.py
    │   │   ├── transformer_vae.py
    │   │   ├── transformer_vae_flow_prior.py
    │   │   ├── transformer_vae_flow_prior_ops.py
    │   │   ├── transformer_vae_test.py
    │   │   ├── universal_transformer.py
    │   │   ├── universal_transformer_test.py
    │   │   ├── universal_transformer_util.py
    │   │   ├── vqa_attention.py
    │   │   ├── vqa_attention_test.py
    │   │   ├── vqa_recurrent_self_attention.py
    │   │   └── vqa_self_attention.py
    │   ├── resnet.py
    │   ├── resnet_test.py
    │   ├── revnet.py
    │   ├── revnet_test.py
    │   ├── shake_shake.py
    │   ├── slicenet.py
    │   ├── slicenet_test.py
    │   ├── text_cnn.py
    │   ├── transformer.py
    │   ├── transformer_test.py
    │   ├── vanilla_gan.py
    │   ├── video/
    │   │   ├── __init__.py
    │   │   ├── base.py
    │   │   ├── base_vae.py
    │   │   ├── basic_deterministic.py
    │   │   ├── basic_deterministic_params.py
    │   │   ├── basic_deterministic_test.py
    │   │   ├── basic_recurrent.py
    │   │   ├── basic_recurrent_test.py
    │   │   ├── basic_stochastic.py
    │   │   ├── basic_stochastic_test.py
    │   │   ├── emily.py
    │   │   ├── emily_test.py
    │   │   ├── epva.py
    │   │   ├── epva_params.py
    │   │   ├── next_frame_glow.py
    │   │   ├── nfg_conv3d_test.py
    │   │   ├── nfg_conv_lstm_test.py
    │   │   ├── nfg_conv_test.py
    │   │   ├── nfg_interpolate.py
    │   │   ├── nfg_test_utils.py
    │   │   ├── nfg_uncond_test.py
    │   │   ├── savp.py
    │   │   ├── savp_params.py
    │   │   ├── savp_test.py
    │   │   ├── sv2p.py
    │   │   ├── sv2p_params.py
    │   │   ├── sv2p_test.py
    │   │   └── tests_utils.py
    │   ├── xception.py
    │   └── xception_test.py
    ├── notebooks/
    │   ├── Transformer_translate.ipynb
    │   ├── asr_transformer.ipynb
    │   ├── hello_t2t-rl.ipynb
    │   ├── hello_t2t.ipynb
    │   └── t2t_problem.ipynb
    ├── problems.py
    ├── problems_colab.py
    ├── problems_test.py
    ├── rl/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── batch_dqn_agent_test.py
    │   ├── batch_runner_test.py
    │   ├── datagen_with_agent.py
    │   ├── dopamine_connector.py
    │   ├── envs/
    │   │   ├── __init__.py
    │   │   ├── in_graph_batch_env.py
    │   │   ├── py_func_batch_env.py
    │   │   ├── simulated_batch_env.py
    │   │   ├── simulated_batch_gym_env.py
    │   │   └── tf_atari_wrappers.py
    │   ├── evaluator.py
    │   ├── evaluator_test.py
    │   ├── gym_utils.py
    │   ├── gym_utils_test.py
    │   ├── player.py
    │   ├── player_utils.py
    │   ├── policy_learner.py
    │   ├── ppo.py
    │   ├── ppo_learner.py
    │   ├── restarter.py
    │   ├── restarter_test.py
    │   ├── rl_utils.py
    │   ├── trainer_model_based.py
    │   ├── trainer_model_based_agent_only.py
    │   ├── trainer_model_based_params.py
    │   ├── trainer_model_based_recurrent_test.py
    │   ├── trainer_model_based_stochastic_test.py
    │   ├── trainer_model_based_sv2p_test.py
    │   ├── trainer_model_based_test.py
    │   ├── trainer_model_free.py
    │   ├── trainer_model_free_test.py
    │   └── trainer_model_free_tictactoe_test.py
    ├── serving/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── export.py
    │   ├── query.py
    │   └── serving_utils.py
    ├── test_data/
    │   ├── example_usr_dir/
    │   │   ├── __init__.py
    │   │   ├── my_submodule.py
    │   │   └── requirements.txt
    │   ├── transformer_test_ckpt/
    │   │   ├── checkpoint
    │   │   ├── flags.txt
    │   │   ├── hparams.json
    │   │   ├── model.ckpt-1.data-00000-of-00002
    │   │   ├── model.ckpt-1.data-00001-of-00002
    │   │   ├── model.ckpt-1.index
    │   │   └── model.ckpt-1.meta
    │   ├── vocab.translate_ende_wmt32k.32768.subwords
    │   └── vocab.translate_ende_wmt8k.8192.subwords
    ├── utils/
    │   ├── __init__.py
    │   ├── adafactor.py
    │   ├── adafactor_test.py
    │   ├── adv_attack_utils.py
    │   ├── avg_checkpoints.py
    │   ├── beam_search.py
    │   ├── beam_search_test.py
    │   ├── bleu_hook.py
    │   ├── bleu_hook_test.py
    │   ├── checkpoint_compatibility_test.py
    │   ├── cloud_mlengine.py
    │   ├── compute_video_metrics.py
    │   ├── contrib.py
    │   ├── data_reader.py
    │   ├── data_reader_test.py
    │   ├── decoding.py
    │   ├── devices.py
    │   ├── diet.py
    │   ├── diet_test.py
    │   ├── expert_utils.py
    │   ├── expert_utils_test.py
    │   ├── flags.py
    │   ├── get_cnndm_rouge.sh
    │   ├── get_ende_bleu.sh
    │   ├── get_rouge.py
    │   ├── hparam.py
    │   ├── hparam_test.py
    │   ├── hparams_lib.py
    │   ├── hparams_lib_test.py
    │   ├── learning_rate.py
    │   ├── metrics.py
    │   ├── metrics_hook.py
    │   ├── metrics_hook_test.py
    │   ├── metrics_test.py
    │   ├── misc_utils.py
    │   ├── misc_utils_test.py
    │   ├── mlperf_log.py
    │   ├── mlperf_tags.py
    │   ├── mtf_model.py
    │   ├── multistep_optimizer.py
    │   ├── multistep_optimizer_test.py
    │   ├── multistep_with_adamoptimizer.py
    │   ├── multistep_with_adamoptimizer_test.py
    │   ├── optimize.py
    │   ├── optimize_test.py
    │   ├── partial_checkpoint_load_hook.py
    │   ├── pruning_utils.py
    │   ├── quantization.py
    │   ├── registry.py
    │   ├── registry_test.py
    │   ├── restore_hook.py
    │   ├── rouge.py
    │   ├── rouge_test.py
    │   ├── sari_hook.py
    │   ├── sari_hook_test.py
    │   ├── scheduled_sampling.py
    │   ├── t2t_model.py
    │   ├── t2t_model_test.py
    │   ├── test_utils.py
    │   ├── test_utils_test.py
    │   ├── trainer_lib.py
    │   ├── trainer_lib_test.py
    │   ├── update_ops_hook.py
    │   ├── usr_dir.py
    │   ├── video/
    │   │   ├── prediction2gif.py
    │   │   └── reward_confusion.py
    │   ├── video2gif.py
    │   ├── video_metrics.py
    │   ├── video_metrics_test.py
    │   ├── yellowfin.py
    │   └── yellowfin_test.py
    └── visualization/
        ├── TransformerVisualization.ipynb
        ├── __init__.py
        ├── attention.js
        ├── attention.py
        ├── visualization.py
        └── visualization_test.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Compiled python modules.
*.pyc

# Byte-compiled
__pycache__/
.cache/

# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
.eggs/

# PyPI distribution artifacts.
build/
dist/

# Sublime project files
*.sublime-project
*.sublime-workspace

# Tests
.pytest_cache/

# Other
*.DS_Store


================================================
FILE: .travis.yml
================================================
sudo: required
language: python
cache: pip
git:
  depth: 3
  quiet: true
services:
  - docker
python:
  - "3.6"
env:
  global:
    - T2T_PROBLEM=algorithmic_reverse_binary40_test
    - T2T_DATA_DIR=/tmp/t2t-data
    - T2T_TRAIN_DIR=/tmp/t2t-train
    - TF_LATEST="1.15.*"
    # This is necessary to have gsutil work with Python 2.7
    - BOTO_CONFIG=/dev/null
  matrix:
    - TF_VERSION="1.15.*"
install:
  - ./oss_scripts/oss_pip_install.sh
script:
  - ./oss_scripts/oss_tests.sh
  - ./oss_scripts/oss_integration_test.sh

  # Conditional commands should each be in a separate block to get proper
  # errors on Travis.
  #
  # TODO(afrozm): Re-enable if this becomes an issue.
  # - if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
  #       pylint -j 2 tensor2tensor;
  #   fi


================================================
FILE: AUTHORS
================================================
# This is the list of T2T authors for copyright purposes.
#
# This does not necessarily list everyone who has contributed code, since in
# some cases, their employer may be the copyright holder.  To see the full list
# of contributors, see the revision history in source control.

Google Inc.
Artit Wangperawong

================================================
FILE: CONTRIBUTING.md
================================================
# How to Contribute

# Issues

* Please tag your issue with `bug`, `feature request`, or `question` to help us
  effectively respond.
* Please include the versions of TensorFlow and Tensor2Tensor you are running
  (run `pip list | grep tensor`).
* Please provide the command line you ran as well as the log output.
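
If `grep` is not available (for example on Windows), the version listing requested above can be approximated from Python. This is a minimal sketch, not part of the project: the helper name `tensor_packages` is hypothetical, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata


def tensor_packages(substring="tensor"):
    """Return sorted (name, version) pairs for installed packages.

    Hypothetical helper: keeps only distributions whose name contains
    `substring` (case-insensitive), mirroring `pip list | grep tensor`.
    """
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken or missing metadata.
        if name and substring in name.lower():
            found.add((name, dist.version or "unknown"))
    return sorted(found)


if __name__ == "__main__":
    for name, version in tensor_packages():
        print(f"{name}=={version}")
```

Paste the printed lines into the issue alongside the command line and log output.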

# Pull Requests

We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.

## Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.

## Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.


================================================
FILE: ISSUE_TEMPLATE.md
================================================
### Description

...

### Environment information

```
OS: <your answer here>

$ pip freeze | grep tensor
# your output here

$ python -V
# your output here
```

### For bugs: reproduction and error logs

```
# Steps to reproduce:
...
```

```
# Error logs:
...
```


================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
# Tensor2Tensor

[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
[![Travis](https://img.shields.io/travis/tensorflow/tensor2tensor.svg)](https://travis-ci.org/tensorflow/tensor2tensor)
[![Run on FH](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/run)

[Tensor2Tensor](https://github.com/tensorflow/tensor2tensor), or
[T2T](https://github.com/tensorflow/tensor2tensor) for short, is a library
of deep learning models and datasets designed to make deep learning more
accessible and [accelerate ML
research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).


T2T was developed by researchers and engineers in the
[Google Brain team](https://research.google.com/teams/brain/) and a community
of users. It is now deprecated &mdash; we keep it running and welcome
bug-fixes, but encourage users to use the successor library [Trax](https://github.com/google/trax).

### Quick Start

[This iPython notebook](https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb)
explains T2T and runs in your browser using a free VM from Google,
no installation needed. Alternatively, here is a one-command version that
installs T2T, downloads MNIST, trains a model and evaluates it:

```
pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100
```

### Contents

* [Suggested Datasets and Models](#suggested-datasets-and-models)
  * [Mathematical Language Understanding](#mathematical-language-understanding)
  * [Story, Question and Answer](#story-question-and-answer)
  * [Image Classification](#image-classification)
  * [Image Generation](#image-generation)
  * [Language Modeling](#language-modeling)
  * [Sentiment Analysis](#sentiment-analysis)
  * [Speech Recognition](#speech-recognition)
  * [Summarization](#summarization)
  * [Translation](#translation)
* [Basics](#basics)
  * [Walkthrough](#walkthrough)
  * [Installation](#installation)
  * [Features](#features)
* [T2T Overview](#t2t-overview)
  * [Problems](#problems)
  * [Models](#models)
  * [Hyperparameter Sets](#hyperparameter-sets)
  * [Trainer](#trainer)
* [Adding your own components](#adding-your-own-components)
* [Adding a dataset](#adding-a-dataset)
* [Papers](#papers)
* [Run on FloydHub](#run-on-floydhub)

## Suggested Datasets and Models

Below we list a number of tasks that can be solved with T2T when
you train the appropriate model on the appropriate problem.
We give the problem and model below and we suggest a setting of
hyperparameters that we know works well in our setup. We usually
run either on Cloud TPUs or on 8-GPU machines; you might need
to modify the hyperparameters if you run on a different setup.

### Mathematical Language Understanding

For evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use

* the [MLU](https://art.wangperawong.com/mathematical_language_understanding_train.tar.gz) data-set:
 `--problem=algorithmic_math_two_variables`

You can try solving the problem with different transformer models and hyperparameters as described in the [paper](https://arxiv.org/abs/1812.02825):
* Standard transformer:
`--model=transformer`
`--hparams_set=transformer_tiny`
* Universal transformer:
`--model=universal_transformer`
`--hparams_set=universal_transformer_tiny`
* Adaptive universal transformer:
`--model=universal_transformer`
`--hparams_set=adaptive_universal_transformer_tiny`

### Story, Question and Answer

For answering questions based on a story, use

* the [bAbi](https://research.fb.com/downloads/babi/) data-set:
 `--problem=babi_qa_concat_task1_1k`

You can choose the bAbi task from the range [1,20] and the subset from 1k or
10k. To combine test data from all tasks into a single test set, use
`--problem=babi_qa_concat_all_tasks_10k`

### Image Classification

For image classification, we have a number of standard data-sets:

* ImageNet (a large data-set): `--problem=image_imagenet`, or one
   of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,
   `image_imagenet32`)
* CIFAR-10: `--problem=image_cifar10` (or
    `--problem=image_cifar10_plain` to turn off data augmentation)
* CIFAR-100: `--problem=image_cifar100`
* MNIST: `--problem=image_mnist`

For ImageNet, we suggest using ResNet or Xception, i.e.,
use `--model=resnet --hparams_set=resnet_50` or
`--model=xception --hparams_set=xception_base`.
ResNet should reach above 76% top-1 accuracy on ImageNet.

For CIFAR and MNIST, we suggest trying the shake-shake model:
`--model=shake_shake --hparams_set=shakeshake_big`.
This setting trained for `--train_steps=700000` should yield
close to 97% accuracy on CIFAR-10.

### Image Generation

For (un)conditional image generation, we have a number of standard data-sets:

* CelebA: `--problem=img2img_celeba` for image-to-image translation, namely,
    superresolution from 8x8 to 32x32.
* CelebA-HQ: `--problem=image_celeba256_rev` for a downsampled 256x256.
* CIFAR-10: `--problem=image_cifar10_plain_gen_rev` for class-conditional
    32x32 generation.
* LSUN Bedrooms: `--problem=image_lsun_bedrooms_rev`
* MS-COCO: `--problem=image_text_ms_coco_rev` for text-to-image generation.
* Small ImageNet (a large data-set): `--problem=image_imagenet32_gen_rev` for
    32x32 or `--problem=image_imagenet64_gen_rev` for 64x64.

We suggest using the Image Transformer, i.e., `--model=imagetransformer`, or
the Image Transformer Plus, i.e., `--model=imagetransformerpp`, which uses a
discretized mixture of logistics, or a variational auto-encoder, i.e.,
`--model=transformer_ae`.
For CIFAR-10, using `--hparams_set=imagetransformer_cifar10_base` or
`--hparams_set=imagetransformer_cifar10_base_dmol` yields 2.90 bits per
dimension. For ImageNet-32, using
`--hparams_set=imagetransformer_imagenet32_base` yields 3.77 bits per dimension.

### Language Modeling

For language modeling, we have these data-sets in T2T:

* PTB (a small data-set): `--problem=languagemodel_ptb10k` for
    word-level modeling and `--problem=languagemodel_ptb_characters`
    for character-level modeling.
* LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for
    subword-level modeling and `--problem=languagemodel_lm1b_characters`
    for character-level modeling.

We suggest starting with `--model=transformer` on this task and using
`--hparams_set=transformer_small` for PTB and
`--hparams_set=transformer_base` for LM1B.

### Sentiment Analysis

For the task of recognizing the sentiment of a sentence, use

* the IMDB data-set: `--problem=sentiment_imdb`

We suggest using `--model=transformer_encoder` here and, since it is
a small data-set, trying `--hparams_set=transformer_tiny` and training for
a few steps (e.g., `--train_steps=2000`).

### Speech Recognition

For speech-to-text, we have these data-sets in T2T:

* Librispeech (US English): `--problem=librispeech` for
    the whole set and `--problem=librispeech_clean` for a smaller
    but nicely filtered part.

* Mozilla Common Voice (US English): `--problem=common_voice` for the whole set
    and `--problem=common_voice_clean` for a quality-checked subset.

### Summarization

For summarizing longer text into a shorter one, we have this data-set:

* CNN/DailyMail articles summarized into a few sentences:
  `--problem=summarize_cnn_dailymail32k`

We suggest using `--model=transformer` and
`--hparams_set=transformer_prepend` for this task.
This yields good ROUGE scores.

### Translation

There are a number of translation data-sets in T2T:

* English-German: `--problem=translate_ende_wmt32k`
* English-French: `--problem=translate_enfr_wmt32k`
* English-Czech: `--problem=translate_encs_wmt32k`
* English-Chinese: `--problem=translate_enzh_wmt32k`
* English-Vietnamese: `--problem=translate_envi_iwslt32k`
* English-Spanish: `--problem=translate_enes_wmt32k`

You can get translations in the other direction by appending `_rev` to
the problem name, e.g., for German-English use
`--problem=translate_ende_wmt32k_rev`
(note that you still need to download the original data with t2t-datagen
`--problem=translate_ende_wmt32k`).
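
The naming convention above can be captured in a small helper. This is a minimal sketch: `datagen_problem_name` is a hypothetical function, not part of T2T, and it assumes that a reversed problem differs from the original only by a trailing `_rev`.

```python
def datagen_problem_name(problem_name: str) -> str:
    """Return the problem name to pass to t2t-datagen.

    Hypothetical helper: strips one trailing "_rev" so that data is
    generated for the original (forward) problem.
    """
    suffix = "_rev"
    if problem_name.endswith(suffix):
        return problem_name[: -len(suffix)]
    return problem_name


print(datagen_problem_name("translate_ende_wmt32k_rev"))
```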

For all translation problems, we suggest trying the Transformer model:
`--model=transformer`. At first it is best to try the base setting,
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
this should reach a BLEU score of about 28 on the English-German data-set,
which is close to the state of the art. If training on a single GPU, try the
`--hparams_set=transformer_base_single_gpu` setting. For very good results
or larger data-sets (e.g., for English-French), try the big model
with `--hparams_set=transformer_big`.

See this [example](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/Transformer_translate.ipynb) to see how translation works.

## Basics

### Walkthrough

Here's a walkthrough of training a good English-to-German translation
model using the Transformer model from [*Attention Is All You
Need*](https://arxiv.org/abs/1706.03762) on WMT data.

```
pip install tensor2tensor

# See what problems, models, and hyperparameter sets are available.
# You can easily swap between them (and add new ones).
t2t-trainer --registry_help

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
# *  If you run out of memory, add --hparams='batch_size=1024'.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=translation.en

# See the translations
cat translation.en

# Evaluate the BLEU score
# Note: Report this BLEU score in papers, not the internal approx_bleu metric.
t2t-bleu --translation=translation.en --reference=ref-translation.de
```

### Installation


```
# Assumes tensorflow or tensorflow-gpu installed
pip install tensor2tensor

# Installs with tensorflow-gpu requirement
pip install tensor2tensor[tensorflow_gpu]

# Installs with tensorflow (cpu) requirement
pip install tensor2tensor[tensorflow]
```

Binaries:

```
# Data generator
t2t-datagen

# Trainer
t2t-trainer --registry_help
```

Library usage:

```
python -c "from tensor2tensor.models.transformer import Transformer"
```

### Features

* Many state-of-the-art and baseline models are built-in and new models can be
  added easily (open an issue or pull request!).
* Many datasets across modalities - text, audio, image - available for
  generation and use, and new ones can be added easily (open an issue or pull
  request for public datasets!).
* Models can be used with any dataset and input mode (or even multiple); all
  modality-specific processing (e.g. embedding lookups for text tokens) is done
  with `bottom` and `top` transformations, which are specified per-feature in the
  model.
* Support for multi-GPU machines and synchronous (1 master, many workers) and
  asynchronous (independent workers synchronizing through a parameter server)
  [distributed training](https://tensorflow.github.io/tensor2tensor/distributed_training.html).
* Easily swap amongst datasets and models by command-line flag with the data
  generation script `t2t-datagen` and the training script `t2t-trainer`.
* Train on [Google Cloud ML](https://tensorflow.github.io/tensor2tensor/cloud_mlengine.html) and [Cloud TPUs](https://tensorflow.github.io/tensor2tensor/cloud_tpu.html).

## T2T overview

### Problems

**Problems** consist of features such as inputs and targets, and metadata such
as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem
features are given by a dataset, which is stored as a `TFRecord` file with
`tensorflow.Example` protocol buffers. All
problems are imported in
[`all_problems.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/all_problems.py)
or are registered with `@registry.register_problem`. Run
[`t2t-datagen`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t-datagen)
to see the list of available problems and download them.

### Models

**`T2TModel`s** define the core tensor-to-tensor computation. They apply a
default transformation to each input and output so that models may deal with
modality-independent tensors (e.g. embeddings at the input; and a linear
transform at the output to produce logits for a softmax over classes). All
models are imported in the
[`models` subpackage](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/__init__.py),
inherit from [`T2TModel`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py),
and are registered with
[`@registry.register_model`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).

### Hyperparameter Sets

**Hyperparameter sets** are encoded in
[`HParams`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/hparam.py)
objects, and are registered with
[`@registry.register_hparams`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).
Every model and problem has an `HParams`. A basic set of hyperparameters is
defined in
[`common_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/layers/common_hparams.py)
and hyperparameter set functions can compose other hyperparameter set functions.
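
To illustrate how one hyperparameter set function can build on another, here is a minimal sketch that uses plain dictionaries in place of `HParams` objects. The function names (`toy_base`, `toy_tiny`) and all values are made up for illustration; they are not T2T's registered sets.

```python
def toy_base():
    """A hypothetical base set; the values are illustrative only."""
    return {"hidden_size": 512, "num_hidden_layers": 6, "batch_size": 4096}


def toy_tiny():
    """Compose on top of toy_base by overriding a few fields."""
    hparams = toy_base()
    hparams["hidden_size"] = 128
    hparams["num_hidden_layers"] = 2
    return hparams


print(toy_tiny())
```

In T2T itself the same pattern applies to functions that return `HParams` objects and are registered with `@registry.register_hparams`.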

### Trainer

The **trainer** binary is the entrypoint for training, evaluation, and
inference. Users can easily switch between problems, models, and hyperparameter
sets by using the `--model`, `--problem`, and `--hparams_set` flags. Specific
hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
related flags control local and distributed training/evaluation
([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).

## Adding your own components

T2T's components are registered using a central registration mechanism that
enables easily adding new ones and easily swapping amongst them by command-line
flag. You can add your own components without editing the T2T codebase by
specifying the `--t2t_usr_dir` flag in `t2t-trainer`.

You can do so for models, hyperparameter sets, modalities, and problems. Please
do submit a pull request if your component might be useful to others.

See the [`example_usr_dir`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/test_data/example_usr_dir)
for an example user directory.

## Adding a dataset

To add a new dataset, subclass
[`Problem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py)
and register it with `@registry.register_problem`. See
[`TranslateEndeWmt8k`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/translate_ende.py)
for an example. Also see the [data generators
README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).

## Run on FloydHub

[![Run on FloydHub](https://static.floydhub.com/button/button.svg)](https://floydhub.com/run)

Click this button to open a [Workspace](https://blog.floydhub.com/workspaces/) on [FloydHub](https://www.floydhub.com/?utm_medium=readme&utm_source=tensor2tensor&utm_campaign=jul_2018). You can use the workspace to develop and test your code on a fully configured cloud GPU machine.

Tensor2Tensor comes preinstalled in the environment, so you can simply open a [Terminal](https://docs.floydhub.com/guides/workspace/#using-terminal) and run your code.

```bash
# Test the quick-start on a Workspace's Terminal with this command
t2t-trainer \
  --generate_data \
  --data_dir=./t2t_data \
  --output_dir=./t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100
```

Note: Ensure compliance with the FloydHub [Terms of Service](https://www.floydhub.com/about/terms).

## Papers

When referencing Tensor2Tensor, please cite [this
paper](https://arxiv.org/abs/1803.07416).

```
@article{tensor2tensor,
  author    = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and
    Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and
    \L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and
    Noam Shazeer and Jakob Uszkoreit},
  title     = {Tensor2Tensor for Neural Machine Translation},
  journal   = {CoRR},
  volume    = {abs/1803.07416},
  year      = {2018},
  url       = {http://arxiv.org/abs/1803.07416},
}
```

Tensor2Tensor was used to develop a number of state-of-the-art models
and deep learning methods. Here we list some papers that were based on T2T
from the start and benefited from its features and architecture in ways
described in the [Google Research Blog post introducing
T2T](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).

* [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
* [Depthwise Separable Convolutions for Neural Machine
   Translation](https://arxiv.org/abs/1706.03059)
* [One Model To Learn Them All](https://arxiv.org/abs/1706.05137)
* [Discrete Autoencoders for Sequence Models](https://arxiv.org/abs/1801.09797)
* [Generating Wikipedia by Summarizing Long
   Sequences](https://arxiv.org/abs/1801.10198)
* [Image Transformer](https://arxiv.org/abs/1802.05751)
* [Training Tips for the Transformer Model](https://arxiv.org/abs/1804.00247)
* [Self-Attention with Relative Position Representations](https://arxiv.org/abs/1803.02155)
* [Fast Decoding in Sequence Models using Discrete Latent Variables](https://arxiv.org/abs/1803.03382)
* [Adafactor: Adaptive Learning Rates with Sublinear Memory Cost](https://arxiv.org/abs/1804.04235)
* [Universal Transformers](https://arxiv.org/abs/1807.03819)
* [Attending to Mathematical Language with Transformers](https://arxiv.org/abs/1812.02825)
* [The Evolved Transformer](https://arxiv.org/abs/1901.11117)
* [Model-Based Reinforcement Learning for Atari](https://arxiv.org/abs/1903.00374)
* [VideoFlow: A Flow-Based Generative Model for Video](https://arxiv.org/abs/1903.01434)

*NOTE: This is not an official Google product.*


================================================
FILE: docs/cloud_mlengine.md
================================================
# Running on Cloud ML Engine

Google Cloud Platform offers a managed training environment for TensorFlow
models called [Cloud ML Engine](https://cloud.google.com/ml-engine/) and
you can easily launch Tensor2Tensor on it, including for hyperparameter tuning.

# Launch

It's the same `t2t-trainer` you know and love with the addition of the
`--cloud_mlengine` flag, which by default will launch on a 1-GPU machine
in the default compute region. See the
[docs for `gcloud compute`](https://cloud.google.com/compute/docs/gcloud-compute/#set_default_zone_and_region_in_your_local_client)
to learn how to set the default compute region.

```
# Note that both the data dir and output dir have to be on GCS
DATA_DIR=gs://my-bucket/data
OUTPUT_DIR=gs://my-bucket/train
t2t-trainer \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --cloud_mlengine
```

Passing `--worker_gpu=4` or `--worker_gpu=8` automatically launches on
machines with 4 or 8 GPUs.

You can additionally pass the `--cloud_mlengine_master_type` flag to select another
kind of machine (see the [docs for
`masterType`](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs#traininginput)
for options, including
[ML Engine machine
types](https://cloud.google.com/ml-engine/docs/training-overview)
and their
[specs](https://cloud.google.com/compute/docs/machine-types)).
If you provide this flag yourself, make sure you pass the
correct value for `--worker_gpu` (for non-GPU machines, you should pass
`--worker_gpu=0`).

**Note**: `t2t-trainer` currently only supports launching on single machines,
possibly with multiple GPUs. Multi-machine setups are not yet supported out of
the box with the `--cloud_mlengine` flag, though multi-machine should in
principle work just fine. Contributions/testers welcome.


## `--t2t_usr_dir`

Launching on Cloud ML Engine works with `--t2t_usr_dir` as well, as long as the
directory is fully self-contained (i.e. the imports only refer to other modules
in the directory). If there are additional PyPI dependencies that you need, you
can include a `requirements.txt` file in the directory specified by
`t2t_usr_dir`.

# Hyperparameter Tuning

Hyperparameter tuning with `t2t-trainer` and Cloud ML Engine is also a breeze
with `--hparams_range` and the `--autotune_*` flags:

```
t2t-trainer \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --cloud_mlengine \
  --hparams_range=transformer_base_range \
  --autotune_objective='metrics-translate_ende_wmt32k/neg_log_perplexity' \
  --autotune_maximize \
  --autotune_max_trials=100 \
  --autotune_parallel_trials=3
```

The `--hparams_range` specifies the search space and should be registered with
`@register_ranged_hparams`. It defines a `RangedHParams` object that sets
search ranges and scales for various parameters. See `transformer_base_range`
in
[`transformer.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py)
for an example.

The metric name passed as `--autotune_objective` should be exactly what you'd
see in TensorBoard. To minimize a metric, set `--autotune_maximize=False`.

You control how many total trials to run with `--autotune_max_trials` and the
number of jobs to launch in parallel with `--autotune_parallel_trials`.

Happy tuning!


================================================
FILE: docs/cloud_tpu.md
================================================
# Running on Cloud TPUs

Tensor2Tensor supports running on Google Cloud Platform's TPUs, chips
specialized for ML training. See the official tutorials for [running the
T2T Transformer for text on Cloud TPUs](https://cloud.google.com/tpu/docs/tutorials/transformer) and
[Transformer for Speech Recognition](https://cloud.google.com/tpu/docs/tutorials/automated-speech-recognition).

## Other models on TPU

Many of Tensor2Tensor's models work on TPU.

You can provision a VM and TPU with `ctpu up`. Use the `t2t-trainer` command
on the VM as usual with the additional flags `--use_tpu` and
`--cloud_tpu_name=$TPU_NAME`.

Note that because the `TPUEstimator` does not catch the `OutOfRangeError`
during evaluation, you should ensure that `--eval_steps` is small enough to
not exhaust the evaluation data.
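
As a rough guide for choosing `--eval_steps`, the sketch below computes the largest step count that consumes only full batches. It assumes each evaluation step reads a fixed number of examples; the helper name `max_safe_eval_steps` and the numbers are placeholders, and the effective batch size on TPU depends on your hyperparameters.

```python
def max_safe_eval_steps(num_eval_examples: int, eval_batch_size: int) -> int:
    """Largest step count that consumes only full batches of eval data."""
    if eval_batch_size <= 0:
        raise ValueError("eval_batch_size must be positive")
    return num_eval_examples // eval_batch_size


# Illustrative numbers only: 10,000 eval examples, 1,024 examples per step.
print(max_safe_eval_steps(10_000, 1024))
```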

A non-exhaustive list of T2T models that work on TPU:

* Image generation: `imagetransformer` with `imagetransformer_base_tpu` (or
  `imagetransformer_tiny_tpu`)
* Super-resolution: `img2img_transformer` with `img2img_transformer_base_tpu`
  (or `img2img_transformer_tiny_tpu`)
* `resnet` with `resnet_50` (or `resnet_18` or `resnet_34`)
* `revnet` with `revnet_104` (or `revnet_38_cifar`)
* `shake_shake` with `shakeshake_tpu` (or `shakeshake_small`)

## Example invocation

Use `ctpu up` to bring up the VM and TPU machines; once the machines are ready
it will SSH you into the VM and you can run the following:

```
# DATA_DIR and OUT_DIR should be GCS buckets
# TPU_NAME should have been set automatically by the ctpu tool

t2t-trainer \
  --model=shake_shake \
  --hparams_set=shakeshake_tpu \
  --problem=image_cifar10 \
  --train_steps=180000 \
  --eval_steps=9 \
  --local_eval_frequency=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --use_tpu \
  --cloud_tpu_name=$TPU_NAME
```


================================================
FILE: docs/distributed_training.md
================================================
# Distributed Training

The `t2t-trainer` supports both synchronous and asynchronous distributed
training.

Note that it is almost always more efficient to train on a single machine with
multiple GPUs/TPUs. Async training is less stable than sync training, and sync
training is much faster on 1 machine than on multiple. For these reasons, we
almost always train on single machines with multiple GPUs/TPUs.

T2T uses TensorFlow Estimators and so distributed training is configured with
the `TF_CONFIG` environment variable that is read by the
[RunConfig](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/estimator/run_config.py)
along with a set of flags that T2T uses to distribute the computation.

## Shared output directory

When using multiple machines, it is necessary that all nodes use the same
`--output_dir`, which means that it should be set to a Google Cloud Storage
bucket (`gs://...`) or a directory on a shared network filesystem.

## Utility to produce `TF_CONFIG` and flags

[`t2t-make-tf-configs`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t-make-tf-configs)
generates the `TF_CONFIG` json strings and the necessary command-line flags for
the jobs.

Given a set of master and parameter server addresses, the script outputs, for
each job, a line with the `TF_CONFIG` environment variable and the command-line
flags necessary for distributed training. For each job, you should invoke the
`t2t-trainer` with the `TF_CONFIG` value and flags that are output.

## Eval jobs

Eval jobs should set the following flags and do not need the `TF_CONFIG`
environment variable to be set as the eval jobs run locally and do not
communicate to the other jobs (the eval jobs read the model checkpoints that the
trainer writes out):

- `--schedule=continuous_eval_on_train_data` or
  `--schedule=continuous_eval` (for dev data)
- `--worker_job='/job:localhost'`
- `--output_dir=$TRAIN_DIR`

**Note that evaluation does not work distributed.** That is, distributed jobs
should always use `--schedule=train`.

## Examples

### Sync training across multiple workers

In this scenario, you wish to do synchronous training across multiple workers.
Note that it is easier to simply use 1 worker with multiple GPUs and set
`--worker_gpu=8`, but there may be cases where you may want to have multiple
machines.

You will need 1 `ip:port` for the master and then 1 `ip:port` for each worker.

For this example we'll use 2 workers and these addresses:

```
# Master
10.0.0.1:5555

# Worker 1
10.0.0.2:5555

# Worker 2
10.0.0.3:5555
```

Next we generate the `TF_CONFIG` and command-line-flags for each job.

```
$ t2t-make-tf-configs --masters='10.0.0.1:5555' --ps='10.0.0.2:5555,10.0.0.3:5555'
Assuming SYNC distributed training with a single master and 2 workers
'{"cluster": {"master": ["10.0.0.1:5555"], "ps": ["10.0.0.2:5555", "10.0.0.3:5555"]}, "environment": "cloud", "task": {"index": 0, "type": "master"}}'      --master=grpc://10.0.0.1:5555 --ps_replicas=2 --worker_replicas=1 --worker_gpu=0 --worker_id=0 --ps_gpu=1 --sync --schedule=train --worker_job='/job:master'
'{"cluster": {"master": ["10.0.0.1:5555"], "ps": ["10.0.0.2:5555", "10.0.0.3:5555"]}, "environment": "cloud", "task": {"index": 0, "type": "ps"}}'  --schedule=run_std_server
'{"cluster": {"master": ["10.0.0.1:5555"], "ps": ["10.0.0.2:5555", "10.0.0.3:5555"]}, "environment": "cloud", "task": {"index": 1, "type": "ps"}}'  --schedule=run_std_server
```

The output here is 1 line per job. Each line contains the `TF_CONFIG` to set
for that job as well as the command-line flags to set for that job.

It is a bit confusing that the workers are being passed to the `--ps` flag, but
this is correct. When running in `--sync` mode, the `ps` are actually the
workers. The next example shows that when `--sync=False`, i.e. async mode,
the `ps` are in fact used as parameter servers.
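
To make the structure of these `TF_CONFIG` values concrete, the following sketch rebuilds the three JSON strings from the sync example with the standard `json` module. `sync_tf_configs` is a hypothetical helper that mirrors the sample output; it is not the implementation of `t2t-make-tf-configs`.

```python
import json


def sync_tf_configs(master, workers):
    """Build one TF_CONFIG JSON string per job for the sync layout.

    Hypothetical helper: the workers are listed under "ps", as in the
    sample output for sync mode.
    """
    cluster = {"master": [master], "ps": list(workers)}
    tasks = [("master", 0)] + [("ps", i) for i in range(len(workers))]
    return [
        json.dumps(
            {
                "cluster": cluster,
                "environment": "cloud",
                "task": {"index": index, "type": job_type},
            },
            sort_keys=True,
        )
        for job_type, index in tasks
    ]


for config in sync_tf_configs("10.0.0.1:5555", ["10.0.0.2:5555", "10.0.0.3:5555"]):
    print(config)
```

Every job shares the same `cluster` description; only the `task` entry changes from one job to the next.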

Here's how we would start each job on its respective machine (the
commands below assume that you're ssh'd into that job's machine):

**Master**:

```
$ export TF_CONFIG='{"cluster": {"master": ["10.0.0.1:5555"], "ps": ["10.0.0.2:5555", "10.0.0.3:5555"]}, "environment": "cloud", "task": {"index": 0, "type": "master"}}'
$ t2t-trainer \
    --master=grpc://10.0.0.1:5555 \
    --ps_replicas=2 \
    --worker_replicas=1 \
    --worker_gpu=0 \
    --worker_id=0 \
    --ps_gpu=1 \
    --sync \
    --schedule=train \
    --worker_job='/job:master' \
    --model=transformer \
    --hparams_set=transformer_base \
    --problem=translate_ende_wmt32k
```

**Worker 1**:

```
$ export TF_CONFIG='{"cluster": {"master": ["10.0.0.1:5555"], "ps": ["10.0.0.2:5555", "10.0.0.3:5555"]}, "environment": "cloud", "task": {"index": 0, "type": "ps"}}'
$ t2t-trainer --schedule=run_std_server
```

**Worker 2**:

```
$ export TF_CONFIG='{"cluster": {"master": ["10.0.0.1:5555"], "ps": ["10.0.0.2:5555", "10.0.0.3:5555"]}, "environment": "cloud", "task": {"index": 1, "type": "ps"}}'
$ t2t-trainer --schedule=run_std_server
```

Note that if you have more than 1 GPU on each worker machine, make sure to
modify the `--ps_gpu` passed to the master.

### Async training across multiple workers

In this scenario, you wish to do asynchronous training across multiple workers
with 1+ shared parameter servers.

Note that async training is usually less stable than sync training and for that
reason we almost always prefer sync training, but there may be cases where you
want to do async distributed training.

For this example we'll use 2 workers and 2 parameter servers:

```
# Worker 1
10.0.0.1:5555

# Worker 2
10.0.0.2:5555

# PS 1
10.0.0.3:5555

# PS 2
10.0.0.4:5555
```

Next we generate the `TF_CONFIG` and command-line flags for each job.

```
$ t2t-make-tf-configs --masters='10.0.0.1:5555,10.0.0.2:5555' --ps='10.0.0.3:5555,10.0.0.4:5555'
Assuming ASYNC distributed training with 2 workers and 2 parameter servers
'{"task": {"index": 0, "type": "chief"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}' --master=grpc://10.0.0.1:5555 --ps_replicas=2 --worker_replicas=2 --worker_gpu=1 --worker_id=0 --ps_gpu=0  --schedule=train --worker_job='/job:chief'
'{"task": {"index": 0, "type": "worker"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'        --master=grpc://10.0.0.2:5555 --ps_replicas=2 --worker_replicas=2 --worker_gpu=1 --worker_id=1 --ps_gpu=0 --schedule=train --worker_job='/job:worker'
'{"task": {"index": 0, "type": "ps"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'    --schedule=run_std_server
'{"task": {"index": 1, "type": "ps"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'    --schedule=run_std_server
```

Here's how we would start each job on its respective machine (the
commands below assume that you're ssh'd into that job's machine):

**Worker 1**:

```
$ export TF_CONFIG='{"task": {"index": 0, "type": "chief"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'
$ t2t-trainer \
    --master=grpc://10.0.0.1:5555 \
    --ps_replicas=2 \
    --worker_replicas=2 \
    --worker_gpu=1 \
    --worker_id=0 \
    --ps_gpu=0 \
    --schedule=train \
    --worker_job='/job:chief' \
    --model=transformer \
    --hparams_set=transformer_base \
    --problem=translate_ende_wmt32k
```

**Worker 2**:

```
$ export TF_CONFIG='{"task": {"index": 0, "type": "worker"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'
$ t2t-trainer \
    --master=grpc://10.0.0.2:5555 \
    --ps_replicas=2 \
    --worker_replicas=2 \
    --worker_gpu=1 \
    --worker_id=1 \
    --ps_gpu=0 \
    --schedule=train \
    --worker_job='/job:worker' \
    --model=transformer \
    --hparams_set=transformer_base \
    --problem=translate_ende_wmt32k
```

**PS 1**:

```
$ export TF_CONFIG='{"task": {"index": 0, "type": "ps"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'
$ t2t-trainer --schedule=run_std_server
```

**PS 2**:

```
$ export TF_CONFIG='{"task": {"index": 1, "type": "ps"}, "cluster": {"chief": ["10.0.0.1:5555"], "ps": ["10.0.0.3:5555", "10.0.0.4:5555"], "worker": ["10.0.0.2:5555"]}, "environment": "cloud"}'
$ t2t-trainer --schedule=run_std_server
```

Increase `--worker_gpu` on each of the workers if you have multiple GPUs. If the
parameter servers are also using GPUs, set `--ps_gpu` to the number of GPUs on
the parameter servers.


================================================
FILE: docs/index.md
================================================
# Tensor2Tensor Documentation

[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)

[Tensor2Tensor](https://github.com/tensorflow/tensor2tensor), or
[T2T](https://github.com/tensorflow/tensor2tensor) for short, is a library
of deep learning models and datasets designed to make deep learning more
accessible and [accelerate ML
research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).


## Introduction

* [Walkthrough](walkthrough.md): Install and run.
* [IPython notebook](https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb): Get hands-on experience.

## Basics

* [Overview](overview.md): How all parts of T2T code are connected.
* [New Problem](new_problem.md): Train T2T models on your data.
* [New Model](new_model.md): Create your own T2T model.

## Training in the cloud

* [Training on Google Cloud ML](cloud_mlengine.md)
* [Training on Google Cloud TPUs](cloud_tpu.md)
* [Distributed Training](distributed_training.md)

## Solving your task

Below we list a number of tasks that can be solved with T2T when
you train the appropriate model on the appropriate problem.
We give the problem and model below and we suggest a setting of
hyperparameters that we know works well in our setup. We usually
run either on Cloud TPUs or on 8-GPU machines; you might need
to modify the hyperparameters if you run on a different setup.

### Image Classification

For image classification, we have a number of standard data-sets:
* ImageNet (a large data-set): `--problem=image_imagenet`, or one
   of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,
   `image_imagenet32`)
* CIFAR-10: `--problem=image_cifar10` (or
    `--problem=image_cifar10_plain` to turn off data augmentation)
* CIFAR-100: `--problem=image_cifar100`
* MNIST: `--problem=image_mnist`

For ImageNet, we suggest using ResNet or Xception, i.e.,
`--model=resnet --hparams_set=resnet_50` or
`--model=xception --hparams_set=xception_base`.
ResNet should reach above 76% top-1 accuracy on ImageNet.

For CIFAR and MNIST, we suggest trying the shake-shake model:
`--model=shake_shake --hparams_set=shakeshake_big`.
This setting, trained for `--train_steps=700000`, should yield
close to 97% accuracy on CIFAR-10.

### Language Modeling

For language modeling, we have these data-sets in T2T:
* PTB (a small data-set): `--problem=languagemodel_ptb10k` for
    word-level modeling and `--problem=languagemodel_ptb_characters`
    for character-level modeling.
* LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for
    subword-level modeling and `--problem=languagemodel_lm1b_characters`
    for character-level modeling.

We suggest starting with `--model=transformer` on this task, using
`--hparams_set=transformer_small` for PTB and
`--hparams_set=transformer_base` for LM1B.

### Sentiment Analysis

For the task of recognizing the sentiment of a sentence, use
* the IMDB data-set: `--problem=sentiment_imdb`

We suggest using `--model=transformer_encoder` here and, since it is
a small data-set, trying `--hparams_set=transformer_tiny` and training for
a few steps (e.g., `--train_steps=2000`).

### Speech Recognition

For speech-to-text, we have these data-sets in T2T:
* Librispeech (English speech to text): `--problem=librispeech` for
    the whole set and `--problem=librispeech_clean` for a smaller
    but nicely filtered part.

### Summarization

For summarizing longer text into a shorter one, we have these data-sets:
* CNN/DailyMail articles summarized into a few sentences:
  `--problem=summarize_cnn_dailymail32k`

We suggest using `--model=transformer` and
`--hparams_set=transformer_prepend` for this task.
This yields good ROUGE scores.

### Translation

There are a number of translation data-sets in T2T:
* English-German: `--problem=translate_ende_wmt32k`
* English-French: `--problem=translate_enfr_wmt32k`
* English-Czech: `--problem=translate_encs_wmt32k`
* English-Chinese: `--problem=translate_enzh_wmt32k`
* English-Vietnamese: `--problem=translate_envi_iwslt32k`
* English-Spanish: `--problem=translate_enes_wmt32k`

You can get translations in the other direction by appending `_rev` to
the problem name, e.g., for German-English use
`--problem=translate_ende_wmt32k_rev`.

For all translation problems, we suggest trying the Transformer model:
`--model=transformer`. At first it is best to try the base setting,
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
this should reach a BLEU score of about 28 on the English-German data-set,
which is close to the state of the art. If training on a single GPU, try the
`--hparams_set=transformer_base_single_gpu` setting. For very good results
or larger data-sets (e.g., for English-French), try the big model
with `--hparams_set=transformer_big`.


================================================
FILE: docs/multi_problem.md
================================================
# Multi-problem training

Multi-problem training is possible by defining [MultiProblem](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/multi_problem.py) sub-classes that specify a list of [Problem](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py) objects to include in training. In some cases, multi-problem training can be used to improve performance compared to training on individual problems.

In the following sections we'll discuss MultiProblem from a usage perspective followed by that of someone wishing to build upon it.

Please note the [T2T Walkthrough](https://github.com/tensorflow/tensor2tensor/blob/master/docs/walkthrough.md) documentation is a good place to start to understand the variety of component concepts we'll build on here.

## Usage

### Problem definition and datagen

In this discussion we'll consider the following (large) multi-problem that includes ten different sub-problems. These include:

1. A [language modeling](https://en.wikipedia.org/wiki/Language_model) [problem](https://github.com/tensorflow/tensor2tensor/blob/0dff89d64c3406d42717280cb9135a5ce7af793c/tensor2tensor/data_generators/wiki_lm.py#L223) operating on a corpus of German, English, French, and Romanian language wikipedia articles.
2. Multiple compatible pairwise language translation problems (En -> De, En -> Fr, En -> Ro, De -> En, Fr -> En, Ro -> En)
3. A compatible [version](https://github.com/tensorflow/tensor2tensor/blob/ef12bee72270b322165d073c39a650a189de39aa/tensor2tensor/data_generators/cnn_dailymail.py#L267) of the combined CNN/DailyMail news article summarization problem.
4. A compatible [version](https://github.com/tensorflow/tensor2tensor/blob/ef12bee72270b322165d073c39a650a189de39aa/tensor2tensor/data_generators/multinli.py#L155) of the [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) textual entailment classification problem.
5. A compatible [version](https://github.com/tensorflow/tensor2tensor/blob/1de13dbebccb415d89b0658e18a57e9607bafd32/tensor2tensor/data_generators/squad.py#L126) of the [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) question/answer problem.

```python

@registry.register_problem
class LanguagemodelMultiWikiTranslate(multi_problem.MultiProblem):
  """Wiki multi-lingual LM and multiple translations."""

  def __init__(self, was_reversed=False, was_copy=False):
    super(LanguagemodelMultiWikiTranslate, self).__init__(
        was_reversed, was_copy)
    self.task_list.append(wiki_lm.LanguagemodelDeEnFrRoWiki64k())
    self.task_list.append(translate_ende.TranslateEndeWmtMulti64k())
    self.task_list.append(translate_enfr.TranslateEnfrWmtMulti64k())
    self.task_list.append(translate_enro.TranslateEnroWmtMultiTiny64k())
    self.task_list.append(translate_ende.TranslateEndeWmtMulti64k(
        was_reversed=True))
    self.task_list.append(translate_enfr.TranslateEnfrWmtMulti64k(
        was_reversed=True))
    self.task_list.append(translate_enro.TranslateEnroWmtMultiTiny64k(
        was_reversed=True))
    self.task_list.append(
        cnn_dailymail.SummarizeCnnDailymailWikiLMMultiVocab64k())
    self.task_list.append(multinli.MultiNLIWikiLMMultiVocab64k())
    self.task_list.append(squad.SquadConcatMulti64k())

  @property
  def vocab_type(self):
    return text_problems.VocabType.SUBWORD

```

The word "compatible" was used a lot above! That's because each of these problems has been modified to use the vocabulary produced by the Wikipedia-based language modeling problem, for example:

```python
@registry.register_problem
class SummarizeCnnDailymailWikiLMMultiVocab64k(SummarizeCnnDailymail32k):
  """Summarize CNN and Daily Mail articles using multi-lingual 64k vocab."""

  @property
  def vocab_filename(self):
    return wiki_lm.LanguagemodelDeEnFrRoWiki64k().vocab_filename
```

**Important note:** It's easy to miss the key point that, as currently implemented, the first task in the task list must be a language modeling problem and each included task must be modified to use the resulting vocabulary.

With a properly defined and registered multi-problem we can now run datagen as follows:

```bash

t2t-datagen --problem=languagemodel_multi_wiki_translate

```

This will take approximately the following amount of space (and several hours):

```bash
(t2t) username@instance-2:~$ du -sh /tmp
99G     /tmp
(t2t) username@instance-2:~$ du -sh /tmp/t2t_datagen
81G     /tmp/t2t_datagen
```

### Training

Next we're ready to try training a model on this MultiProblem. Note that because we did not specify `--data_dir` above, the TFExamples were generated into `/tmp` by default, so that's what we'll explicitly provide here.

```bash

t2t-trainer --problem=languagemodel_multi_wiki_translate \
    --model=transformer \
    --hparams_set=transformer_tall_pretrain_lm_tpu_adafactor_large \
    --output_dir ~/t2t_train/transformer_multi_2jan19 \
    --data_dir=/tmp \
    --train_steps=1 \
    --eval_steps=1

```

The `hparams_set` parameter we provided above was [transformer_tall_pretrain_lm_tpu_adafactor_large](https://github.com/tensorflow/tensor2tensor/blob/08e83030acf3ef13d15ad6eaefaa0a67fb20b59d/tensor2tensor/models/transformer.py#L1721), also provided below:

```python

@registry.register_hparams
def transformer_tall_pretrain_lm_tpu_adafactor_large():
  """Hparams for transformer on LM pretraining on TPU, large model."""
  hparams = transformer_tall_pretrain_lm_tpu_adafactor()
  hparams.hidden_size = 1024
  hparams.num_heads = 16
  hparams.filter_size = 32768  # max fitting in 16G memory is 49152, batch 2
  hparams.batch_size = 4
  hparams.multiproblem_mixing_schedule = "constant"
  # Task order: lm/en-de/en-fr/en-ro/de-en/fr-en/ro-en/cnndm/mnli/squad.
  hparams.multiproblem_per_task_threshold = "320,80,160,2,80,160,2,20,5,5"
  return hparams

```

Here it's worth noting a couple of things. First, we have specified a `multiproblem_mixing_schedule` (which is required), consumed by [MultiProblem.mix_data](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/multi_problem.py#L280). When set to "constant", the strategy for sampling examples is not a function of the training step and depends only on the per-task "thresholds", which are equal by default (examples are sampled from each problem with equal probability).

Second, we have also specified the optional `multiproblem_per_task_threshold` parameter, also consumed by `mix_data` and used specifically by [sample_task](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/multi_problem.py#L340), which defines non-uniform thresholds to inform a weighted random sampling. E.g., for two problems with weights 1 and 9, the first would be sampled 1/10 of the time and the other 9/10.
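
The weighting described above can be illustrated with a minimal, pure-Python sketch. This is not the library's `sample_task` implementation (which operates on tensors); the function names `task_probabilities` and `sample_task_index` are hypothetical, and the sketch only assumes that sampling probability is proportional to each task's threshold.

```python
import random


def task_probabilities(per_task_threshold):
  """Turns a comma-separated threshold string into sampling probabilities."""
  weights = [float(w) for w in per_task_threshold.split(",")]
  total = sum(weights)
  return [w / total for w in weights]


def sample_task_index(per_task_threshold, rng=random):
  """Draws one task index with probability proportional to its threshold."""
  probs = task_probabilities(per_task_threshold)
  return rng.choices(range(len(probs)), weights=probs, k=1)[0]


print(task_probabilities("1,9"))  # [0.1, 0.9]

# The schedule from the hparams set above, in task-list order.
schedule = "320,80,160,2,80,160,2,20,5,5"
print(sample_task_index(schedule))
```

Under this assumption, the language modeling task (threshold 320) would be drawn far more often than, say, the En -> Ro translation task (threshold 2).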

### Inference

You can try translating from English to German using a model previously trained on `LanguagemodelMultiWikiTranslate` (the one shown above) ([gs://tensor2tensor-checkpoints/transformer_multi_2jan19/](https://console.cloud.google.com/storage/browser/tensor2tensor-checkpoints/transformer_multi_2jan19/)). Just copy the checkpoint down to a local directory such as the one given via `--output_dir` below:

```bash

t2t-decoder --problem=languagemodel_multi_wiki_translate \
    --model=transformer \
    --hparams_set=transformer_tall_pretrain_lm_tpu_adafactor_large \
    --decode_hparams='batch_size=1,multiproblem_task_id=64510' \
    --hparams="" \
    --output_dir=~/t2t_train/transformer_multi_2jan19 \
    --decode_from_file ~/newstest2014.en \
    --data_dir=~/t2t_train/transformer_multi_2jan19

```

Here we'll point `--data_dir` to the checkpoint directory which includes the vocab file `vocab.languagemodel_de_en_fr_ro_wiki64k.64000.subwords`; typically data_dir would point to the directory containing your TFRecord example dataset(s).

The file passed to `--decode_from_file` is simply a file with one sentence to translate on each line (in its original form, not post-vocabulary-encoded).

A key requirement for multi-problem inference is that we specify the ID of the problem for which we want to perform inference. But wait, why is the task ID 64510? We can see from the code for [`MultiProblem.update_task_ids`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/multi_problem.py#L386) that task IDs are placed at the end of the vocabulary.

```python

class MultiProblem(problem.Problem):
  """MultiProblem base class."""

  ...

  def update_task_ids(self, encoder_vocab_size):
    """Generate task_ids for each problem.
    These ids correspond to the index of the task in the task_list.
    Args:
      encoder_vocab_size: the size of the vocab which is used to compute
        the index offset.
    """
    for idx, task in enumerate(self.task_list):
      task.set_task_id(idx + encoder_vocab_size)
      tf.logging.info("Task %d (%s) has id %d." %
                      (idx, task.name, task.task_id))

```
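
The offset arithmetic in `update_task_ids` can be sketched in plain Python. The encoder vocabulary size of 64509 used here is an assumption, inferred only from the fact that the En -> De task (index 1) has ID 64510; the short task names are taken from the "Task order" comment in the hparams set above, and `assign_task_ids` is a hypothetical helper.

```python
def assign_task_ids(task_names, encoder_vocab_size):
  """Maps each task name to (index in task list + vocab size)."""
  return {
      name: idx + encoder_vocab_size for idx, name in enumerate(task_names)
  }


task_names = ["lm", "en-de", "en-fr", "en-ro", "de-en", "fr-en", "ro-en",
              "cnndm", "mnli", "squad"]
ENCODER_VOCAB_SIZE = 64509  # Assumption: inferred from 64510 - 1.

task_ids = assign_task_ids(task_names, ENCODER_VOCAB_SIZE)
print(task_ids["en-de"])  # 64510
```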

We can look up the task_id that is assigned to each task we may want to use for inference by instantiating the MultiProblem subclass and obtaining the value, in this case via the following:

```python

task_index = 1 # The second task in the list is En -> De
LanguagemodelMultiWikiTranslate().task_list[task_index].task_id

```

For me, running the `t2t-decoder` command provided above gave the following output:

```bash
...

INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference results INPUT: hello world was the news of the day
INFO:tensorflow:Inference results OUTPUT: Hallo Welt war die Nachricht des Tages
INFO:tensorflow:Elapsed Time: 37.15079
INFO:tensorflow:Averaged Single Token Generation Time: 3.3009222 (time 36.3101439 count 11)

...

```


================================================
FILE: docs/new_model.md
================================================
# T2T: Create Your Own Model

[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](../CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)

Here we show how to create your own model in T2T.

## The T2TModel class - abstract base class for models

`T2TModel` has three typical usages:

1.  Estimator: The method `make_estimator_model_fn` builds a `model_fn` for the
    tf.Estimator workflow of training, evaluation, and prediction. It invokes
    the method `call`, which performs the core computation, followed by
    `estimator_spec_train`, `estimator_spec_eval`, or `estimator_spec_predict`
    depending on the tf.Estimator mode.
2.  Layer: The method `call` enables `T2TModel` to be used as a callable by
    itself. It calls the following methods:

    *   `bottom`, which transforms features according to `problem_hparams`'
        input and target `Modality`s;
    *   `body`, which takes features and performs the core model computation to
        return output and any auxiliary loss terms;
    *   `top`, which takes features and the body output, and transforms them
        according to `problem_hparams`' input and target `Modality`s to return
        the final logits;
    *   `loss`, which takes the logits, forms any missing training loss, and
        sums all loss terms.

3.  Inference: The method `infer` enables `T2TModel` to make sequence
    predictions by itself.

## Creating your own model

1.  Create a class that extends `T2TModel`. This example creates a copy of an
    existing basic fully-connected network:

    ```python
    from tensor2tensor.utils import t2t_model

    class MyFC(t2t_model.T2TModel):
        pass
    ```

2.  Implement the `body` method:

    ```python
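    import tensorflow.compat.v1 as tf  # TF1-style API (tf.layers, keep_prob).

    from tensor2tensor.layers import common_layers
    from tensor2tensor.utils import t2t_model


    class MyFC(t2t_model.T2TModel):
      def body(self, features):
        hparams = self.hparams
        x = features["inputs"]
        shape = common_layers.shape_list(x)
        # Flatten the input: in T2T, inputs are 4D tensors.
        x = tf.reshape(x, [-1, shape[1] * shape[2] * shape[3]])
        for i in range(hparams.num_hidden_layers):  # Create the layers.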
          x = tf.layers.dense(x, hparams.hidden_size, name="layer_%d" % i)
          x = tf.nn.dropout(x, keep_prob=1.0 - hparams.dropout)
          x = tf.nn.relu(x)
        return tf.expand_dims(tf.expand_dims(x, axis=1), axis=1)  # 4D For T2T.
    ```

    Method Signature:

    *   Args:

        *   features: dict of str to Tensor, where each Tensor has shape
            [batch_size, ..., hidden_size]. It typically contains keys `inputs`
            and `targets`.

    *   Returns one of:

        *   output: Tensor of pre-logit activations with shape [batch_size, ...,
            hidden_size].
        *   losses: Either single loss as a scalar, a list, a Tensor (to be
            averaged), or a dictionary of losses. If losses is a dictionary with
            the key "training", losses["training"] is considered the final
            training loss and output is considered logits; self.top and
            self.loss will be skipped.

3.  Register your model:

    ```python
    from tensor2tensor.utils import registry

    @registry.register_model
    class MyFC(t2t_model.T2TModel):
       # ...
    ```

4.  Use it with t2t tools as any other model:

    Keep in mind that names are converted from CamelCase to snake_case (`MyFC`
    -> `my_fc`) and that you need to point t2t to the directory containing your
    model with the `--t2t_usr_dir` flag. For example, if you want to train a
    model on gcloud with 1 GPU worker on the IMDB sentiment task, you can run
    your model by executing the following command from your model class
    directory.

    ```bash
    t2t-trainer \
      --model=my_fc \
      --t2t_usr_dir=. \
      --cloud_mlengine --worker_gpu=1 \
      --generate_data \
      --data_dir='gs://data' \
      --output_dir='gs://out' \
      --problem=sentiment_imdb \
      --hparams_set=basic_fc_small \
      --train_steps=10000 \
      --eval_steps=10
    ```
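
The shape handling in the `body` method from step 2 can be traced without TensorFlow. The following is a small sketch that only does the shape arithmetic; the function name `my_fc_shapes` and the example sizes (a batch of 32 images of 28x28x1 and a hidden size of 128) are hypothetical.

```python
from functools import reduce
from operator import mul


def my_fc_shapes(input_shape, hidden_size):
  """Traces tensor shapes through the MyFC body for a 4D input shape."""
  batch = input_shape[0]
  # tf.reshape(x, [-1, h * w * c]) collapses all non-batch dimensions.
  flat = [batch, reduce(mul, input_shape[1:], 1)]
  # Each dense layer maps the last dimension to hidden_size.
  hidden = [batch, hidden_size]
  # Two expand_dims calls at axis=1 restore a 4D tensor for T2T.
  output = [batch, 1, 1, hidden_size]
  return flat, hidden, output


print(my_fc_shapes([32, 28, 28, 1], hidden_size=128))
# ([32, 784], [32, 128], [32, 1, 1, 128])
```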


================================================
FILE: docs/new_problem.md
================================================
# T2T: Train on Your Own Data

[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)

Another good overview of this process, together with training, is given in
[The Cloud ML Poetry Blog
Post](https://cloud.google.com/blog/big-data/2018/02/cloud-poetry-training-and-hyperparameter-tuning-custom-text-models-on-cloud-ml-engine).

Let's add a new dataset together and train the
[Transformer](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/transformer.py)
model on it. We'll give the model a line of poetry, and it will learn to
generate the next line.

# Defining the `Problem`

For each problem we want to tackle we create a new subclass of `Problem` and
register it. Let's call our problem `PoetryLines`.

Since many text-to-text problems share similar methods, there's already a class
called
[`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/text_problems.py)
that extends the base problem class
[`Problem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py)
and makes it easy to add text-to-text problems.

In that same file, there are other base classes that make it easy to add text
classification tasks (`Text2ClassProblem`) and language modeling tasks
(`Text2SelfProblem`).

For our problem, let's create the file `poetry_lines.py`, add our new
problem, `PoetryLines`, which extends `Text2TextProblem`, and register it so
that it is accessible via a command-line flag.

Here's the Problem in full. We'll go step by step through it.

```python
import re

from gutenberg import acquire
from gutenberg import cleanup

from tensor2tensor.data_generators import problem
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry

@registry.register_problem
class PoetryLines(text_problems.Text2TextProblem):
  """Predict next line of poetry from the last line. From Gutenberg texts."""

  @property
  def approx_vocab_size(self):
    return 2**13  # ~8k

  @property
  def is_generate_per_split(self):
    # generate_data will shard the data into TRAIN and EVAL for us.
    return False

  @property
  def dataset_splits(self):
    """Splits of data to produce and number of output shards for each."""
    # 10% evaluation data
    return [{
        "split": problem.DatasetSplit.TRAIN,
        "shards": 9,
    }, {
        "split": problem.DatasetSplit.EVAL,
        "shards": 1,
    }]

  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    del data_dir
    del tmp_dir
    del dataset_split


    books = [
        # bookid, skip N lines
        (19221, 223),
        (15553, 522),
    ]

    for (book_id, toskip) in books:
      text = cleanup.strip_headers(acquire.load_etext(book_id)).strip()
      lines = text.split("\n")[toskip:]
      prev_line = None
      ex_count = 0
      for line in lines:
        # Any line that is all upper case is a title or author name
        if not line or line.upper() == line:
          prev_line = None
          continue

        line = re.sub("[^a-z]+", " ", line.strip().lower())
        if prev_line and line:
          yield {
              "inputs": prev_line,
              "targets": line,
          }
          ex_count += 1
        prev_line = line
```

## Vocabulary specification

The text generated is encoded with a vocabulary for training. By default, it is
a `SubwordTextEncoder` that is built with an approximate vocab size specified by
the user. It's fully invertible (no out-of-vocab tokens) with a fixed-size vocab
which makes it ideal for text problems.

You can also choose to use a character-level encoder or a token encoder where
you provide the vocab file yourself. See `Text2TextProblem.vocab_type`.

Here we specify that we're going to have a vocabulary with approximately 8,000
subwords.

```python
  @property
  def approx_vocab_size(self):
    return 2**13  # ~8k
```

## Splitting data between Train and Eval

By setting `is_generate_per_split=False`, the `generate_samples` method will
only be called once and the data will automatically be split across training and
evaluation data for us. This is useful because for our dataset we don't have
pre-existing "training" and "evaluation" sets. If we did, we'd set
`is_generate_per_split=True` so that `generate_samples` was called once per data
split.

The `dataset_splits` method determines the fraction that goes to each split. The
training data will be generated into 9 files and the evaluation data into 1.
90% of the data will be for training. 10% of the data will be for evaluation.

```python
  @property
  def is_generate_per_split(self):
    # generate_data will shard the data into TRAIN and EVAL for us.
    return False

  @property
  def dataset_splits(self):
    """Splits of data to produce and number of output shards for each."""
    # 10% evaluation data
    return [{
        "split": problem.DatasetSplit.TRAIN,
        "shards": 9,
    }, {
        "split": problem.DatasetSplit.EVAL,
        "shards": 1,
    }]
```
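
The relationship between shard counts and split fractions can be checked with a small sketch. It assumes, as described above, that examples are spread roughly evenly across shards, so each split's share is its shard count divided by the total; plain strings stand in for `problem.DatasetSplit.TRAIN` and `problem.DatasetSplit.EVAL`, and `split_fractions` is a hypothetical helper.

```python
def split_fractions(dataset_splits):
  """Returns each split's approximate share of the data, by shard count."""
  total_shards = sum(split["shards"] for split in dataset_splits)
  return {
      split["split"]: split["shards"] / total_shards
      for split in dataset_splits
  }


splits = [
    {"split": "train", "shards": 9},
    {"split": "eval", "shards": 1},
]
print(split_fractions(splits))  # {'train': 0.9, 'eval': 0.1}
```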

## Generating samples

`generate_samples` is the bulk of the code where we actually produce
dictionaries of poetry line pairs ("inputs" and "targets").

Some problems might require downloading, which can be done into `tmp_dir`. Some
problems may use their own token vocabulary file, in which case it can be copied
into `data_dir` before yielding samples.

Here we iterate through the lines of a couple of books of poetry and produce
pairs of lines for the model to train against.

```python
  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    del data_dir
    del tmp_dir
    del dataset_split

    books = [
        # bookid, skip N lines
        (19221, 223),
        (15553, 522),
    ]

    for (book_id, toskip) in books:
      text = cleanup.strip_headers(acquire.load_etext(book_id)).strip()
      lines = text.split("\n")[toskip:]
      prev_line = None
      ex_count = 0
      for line in lines:
        # Any line that is all upper case is a title or author name
        if not line or line.upper() == line:
          prev_line = None
          continue

        line = re.sub("[^a-z]+", " ", line.strip().lower())
        if prev_line and line:
          yield {
              "inputs": prev_line,
              "targets": line,
          }
          ex_count += 1
        prev_line = line
```
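
To see the pairing logic in isolation, here is a sketch that applies the same steps to a short in-memory list of lines instead of downloaded Gutenberg texts. The function name `pair_lines` and the sample lines are hypothetical; the filtering and normalization mirror the loop above.

```python
import re


def pair_lines(lines):
  """Yields (previous line, current line) pairs as inputs/targets dicts."""
  prev_line = None
  for line in lines:
    # Empty or all-upper-case lines (titles, author names) break the chain.
    if not line or line.upper() == line:
      prev_line = None
      continue

    # Lower-case the line and replace runs of non-letters with one space.
    line = re.sub("[^a-z]+", " ", line.strip().lower())
    if prev_line and line:
      yield {"inputs": prev_line, "targets": line}
    prev_line = line


sample = [
    "A HYPOTHETICAL TITLE",
    "Roses are red,",
    "Violets are blue;",
    "",
    "A new stanza starts here",
    "and continues on.",
]
for example in pair_lines(sample):
  print(example)
# {'inputs': 'roses are red ', 'targets': 'violets are blue '}
# {'inputs': 'a new stanza starts here', 'targets': 'and continues on '}
```

Note that no pair spans the blank line between stanzas, and that trailing punctuation becomes a trailing space because every run of non-letter characters is replaced by a single space.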

That's all for the problem specification! We're ready to generate the data.

# Run data generation

You can generate data for your problem with `t2t-datagen` and the
`--t2t_usr_dir` flag, which points to the directory containing an `__init__.py`
file that imports the `poetry_lines` file we just wrote. See setup below.

```bash
USR_DIR=...
PROBLEM=poetry_lines
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
mkdir -p $DATA_DIR $TMP_DIR

t2t-datagen \
  --t2t_usr_dir=$USR_DIR \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM
```

`PROBLEM` is the name of the class that was registered with
`@registry.register_problem`, but converted from `CamelCase` to `snake_case`.
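
The name conversion can be approximated with a short regex-based sketch. This is an illustration under the assumption of a simple two-pass rule, not necessarily the exact code used by the T2T registry; `camel_to_snake` is a hypothetical name.

```python
import re

# Pass 1: split before a capital letter that starts a lower-case word.
_FIRST_CAP = re.compile(r"(.)([A-Z][a-z0-9]+)")
# Pass 2: split between a lower-case letter or digit and a capital.
_ALL_CAP = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name):
  """Converts a CamelCase class name to snake_case."""
  partial = _FIRST_CAP.sub(r"\1_\2", name)
  return _ALL_CAP.sub(r"\1_\2", partial).lower()


print(camel_to_snake("PoetryLines"))  # poetry_lines
```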

`USR_DIR` is a directory with the `poetry_lines.py` file and an
`__init__.py` file that imports it (`from . import poetry_lines`).

If you plan to contribute problems to the tensor2tensor repository, you can
clone the repository and install it in developer mode with `pip install -e .`.

# Train!

You can train exactly as you do in the [walkthrough](walkthrough.md) with flags
`--problem=poetry_lines` and `--t2t_usr_dir=$USR_DIR`.

All done. Let us know what amazing poetry your model writes!


================================================
FILE: docs/overview.md
================================================
# T2T: Life of an Example

[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)

This doc explains how a training example flows through T2T, from data generation
to training, evaluation, and decoding.

Some key files and their functions:

*   [`t2t_trainer.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t_trainer.py) and [`trainer_lib.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/trainer_lib.py):
    Main entrypoint for training and evaluation.  Constructs and runs all the
    main components of the system (the `Problem`, the `HParams`, the
    `Estimator`, the `Experiment`, the `input_fn`s and `model_fn`).
*   [`common_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/layers/common_hparams.py):
    `basic_params1` serves as the base for all model hyperparameters. Registered
    model hparams functions always start with this default set of
    hyperparameters.
*   [`problem.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py):
    Every dataset in T2T subclasses `Problem`. `Problem.input_fn` is the
    Estimator input function.
*   [`t2t_model.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py):
    Every model in T2T subclasses `T2TModel`. `T2TModel.estimator_model_fn` is
    the Estimator model function.

## Data Generation

The `t2t-datagen` binary is the entrypoint for data generation. It simply looks
up the `Problem` specified by `--problem` and calls
`Problem.generate_data(data_dir, tmp_dir)`.

All `Problem`s are expected to generate 2 sharded `TFRecords` files - 1 for
training and 1 for evaluation - with `tensorflow.Example` protocol buffers. The
expected names of the files are given by `Problem.{training, dev}_filepaths`.
Typically, the features in the `Example` will be `"inputs"` and `"targets"`;
however, some tasks have a different on-disk representation that is converted to
`"inputs"` and `"targets"` online in the input pipeline (e.g. image features are
typically stored with features `"image/encoded"` and `"image/format"` and the
decoding happens in the input pipeline).

For tasks that require a vocabulary, this is also the point at which the
vocabulary is generated and all examples are encoded.

There are several utility functions in
[`generator_utils`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/generator_utils.py)
that are commonly used by `Problem`s to generate data. Several are highlighted
below:

*   `generate_dataset_and_shuffle`: given 2 generators, 1 for training and 1 for
    eval, yielding dictionaries of `<feature name, list< int or float or
    string >>`, will produce sharded and shuffled `TFRecords` files with
    `tensorflow.Example` protos.
*   `maybe_download`: downloads a file at a URL to the given directory and
    filename (see `maybe_download_from_drive` if the URL points to Google
    Drive).
*   `get_or_generate_vocab_inner`: given a target vocabulary size and a
    generator that yields lines or tokens from the dataset, will build a
    `SubwordTextEncoder` along with a backing vocabulary file that can be used
    to map input strings to lists of ids.
    [`SubwordTextEncoder`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/text_encoder.py)
    uses word pieces and its encoding is fully invertible.
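
To make the expected generator output concrete, here is a minimal sketch of a sample generator that yields feature dictionaries of integer lists. The byte-level encoding and the names `toy_encode` and `toy_generator` are assumptions made for this example; a real `Problem` would normally encode text with a vocabulary such as `SubwordTextEncoder`.

```python
def toy_encode(text):
    """Encodes a string as a list of byte values (a stand-in for a real encoder)."""
    return list(text.encode("utf-8"))


def toy_generator(pairs):
    """Yields one feature dictionary per (input, target) pair."""
    for source, target in pairs:
        yield {"inputs": toy_encode(source), "targets": toy_encode(target)}


examples = list(toy_generator([("hi", "yo")]))
print(examples[0])  # {'inputs': [104, 105], 'targets': [121, 111]}
```

Dictionaries of this shape, one generator for training and one for eval, are what the description of `generate_dataset_and_shuffle` above refers to.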

## Data Input Pipeline

Once the data is produced on disk, training, evaluation, and inference (if
decoding from the dataset) consume it by way of the T2T input pipeline, defined
by `Problem.input_fn`.

The entire input pipeline is implemented with the new `tf.data.Dataset` API.

The input function has 2 main parts: first, reading and processing individual
examples, which is done in `Problem.dataset`, and second, batching, which is
done in `Problem.input_fn` after the call to `Problem.dataset`.

`Problem` subclasses may override the entire `input_fn` or portions of it (e.g.
`example_reading_spec` to indicate the names, types, and shapes of features on
disk). Typically they only override portions.

### Batching

Problems that have fixed size features (e.g. image problems) can use
`hp.batch_size` to set the batch size.

Variable length Problems are bucketed by sequence length and then batched out of
those buckets.  This significantly improves performance over a naive batching
scheme for variable length sequences because each example in a batch must be
padded to match the example with the maximum length in the batch.

Controlling hparams:

* `hp.batch_size`: the approximate total number of tokens in
  the batch (i.e. long sequences will have smaller actual batch size and short
  sequences will have a larger actual batch size in order to generally have an
  equal number of tokens in the batch).
* `hp.max_length`: For variable length features, sequences with length longer
  than this will be dropped during training (and also during eval if
  `hp.eval_drop_long_sequences` is `True`). If not set, the maximum length of
  examples is set to `hp.batch_size`.
* `hp.batch_size_multiplier`: multiplier for the maximum length
* `hp.min_length_bucket`: example length for the smallest bucket (i.e. the
  smallest bucket will bucket examples up to this length).
* `hp.length_bucket_step`: controls how spaced out the length buckets are.
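
The interaction between these hparams can be sketched in plain Python. This is an illustrative approximation of length bucketing with a token budget, not T2T's actual implementation; the function names and the `token_budget` parameter (playing the role of `hp.batch_size`) are assumptions made for this example.

```python
import bisect


def bucket_boundaries(max_length, min_length=8, length_bucket_step=1.1):
    """Returns increasing bucket upper bounds, spaced geometrically."""
    boundaries = []
    x = min_length
    while x < max_length:
        boundaries.append(x)
        # Grow by the step factor, but always by at least one.
        x = max(x + 1, int(x * length_bucket_step))
    return boundaries


def bucket_batch_sizes(boundaries, max_length, token_budget):
    """Examples per batch for each bucket, keeping tokens per batch roughly equal."""
    return [max(1, token_budget // b) for b in boundaries + [max_length]]


def bucket_index(length, boundaries):
    """Index of the bucket that holds examples of the given length."""
    return bisect.bisect_left(boundaries, length)


boundaries = bucket_boundaries(max_length=64, min_length=8, length_bucket_step=2.0)
print(boundaries)  # [8, 16, 32]
print(bucket_batch_sizes(boundaries, 64, token_budget=256))  # [32, 16, 8, 4]
```

With a budget of 256 tokens, the shortest bucket holds 32 examples per batch and the longest holds 4, so padding within a batch stays bounded by the width of its bucket.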

## Building the Model

At this point, the input features typically have `"inputs"` and `"targets"`,
each of which is a batched 4-D Tensor (e.g. of shape `[batch_size,
sequence_length, 1, 1]` for text input or `[batch_size, height, width, 3]` for
image input).

The Estimator model function is created by `T2TModel.estimator_model_fn`, which
may be overridden in its entirety by subclasses if desired. Typically,
subclasses only override `T2TModel.body`.

The model function constructs a `T2TModel`, calls it, and then calls
`T2TModel.{estimator_spec_train, estimator_spec_eval, estimator_spec_predict}`
depending on the mode.

A call of a `T2TModel` internally calls `bottom`, `body`, `top`, and `loss`, all
of which can be overridden by subclasses (typically only `body` is).

The default implementations of `bottom`, `top`, and `loss` depend on the
`Modality` specified for the input and target features (e.g.
`SymbolModality.bottom` embeds integer tokens and `SymbolModality.loss` is
`softmax_cross_entropy`).
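
The call order described above can be illustrated with a toy class. `ToyModel` and `DoublingModel` are hypothetical names used only for this sketch; they operate on plain Python lists and share nothing with `T2TModel` beyond the order in which `bottom`, `body`, `top`, and `loss` are invoked.

```python
class ToyModel:
    """Hypothetical stand-in for the bottom -> body -> top -> loss call order."""

    def bottom(self, features):
        # Modality-specific input transform; here, convert integer ids to floats.
        return [float(x) for x in features["inputs"]]

    def body(self, hidden):
        # The core computation; subclasses typically override only this.
        return hidden

    def top(self, hidden):
        # Modality-specific output transform; here, a pass-through.
        return hidden

    def loss(self, outputs, features):
        # Mean squared error against the targets.
        targets = features["targets"]
        return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

    def __call__(self, features):
        outputs = self.top(self.body(self.bottom(features)))
        return outputs, self.loss(outputs, features)


class DoublingModel(ToyModel):
    def body(self, hidden):
        return [2.0 * h for h in hidden]


outputs, loss = DoublingModel()({"inputs": [1, 2], "targets": [2, 4]})
print(outputs, loss)  # [2.0, 4.0] 0.0
```

Overriding only `body` leaves the input and output transforms and the loss untouched, which mirrors how most models are written against the modality defaults.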

## `Estimator` and `Experiment`

The actual training loop and related services (checkpointing, summaries,
continuous evaluation, etc.) are all handled by `Estimator` and `Experiment`
objects. `t2t_trainer.py` is the main entrypoint and uses `trainer_lib.py`
to construct the various components.

## Decoding

* [`t2t_decoder.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t_decoder.py)
* [`decoding.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/decoding.py)

## System Overview for Train/Eval

See `t2t_trainer.py` and `trainer_lib.py`.

* Create HParams
* Create `RunConfig`, including `Parallelism` object (i.e. `data_parallelism`)
* Create `Experiment`, including hooks
* Create `Estimator`
  * `T2TModel.estimator_model_fn`
    * `model(features)`
      * `model.model_fn`
        * `model.bottom`
        * `model.body`
        * `model.top`
        * `model.loss`
    * [TRAIN] `model.estimator_spec_train`
      * `train_op = model.optimize`
    * [EVAL] `model.estimator_spec_eval`
      * Create metrics
* Create input functions
  * `Problem.input_fn`
    * `Problem.dataset`
    * Batching
* Create hooks
* Run Experiment --schedule (e.g. `exp.continuous_train_and_eval()`)
  * `estimator.train`
    * `train_op = model_fn(input_fn(mode=TRAIN))`
    * Run train op
  * `estimator.evaluate`
    * `metrics = model_fn(input_fn(mode=EVAL))`
    * Accumulate metrics


================================================
FILE: docs/tutorials/asr_with_transformer.md
================================================
# Automated Speech Recognition with the Transformer model

See the
[official tutorial](https://cloud.google.com/tpu/docs/tutorials/automated-speech-recognition).


================================================
FILE: docs/walkthrough.md
================================================
# Tensor2Tensor

[![PyPI
version](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)
[![GitHub
Issues](https://img.shields.io/github/issues/tensorflow/tensor2tensor.svg)](https://github.com/tensorflow/tensor2tensor/issues)
[![Contributions
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Gitter](https://img.shields.io/gitter/room/nwjs/nw.js.svg)](https://gitter.im/tensor2tensor/Lobby)
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
[![Travis](https://img.shields.io/travis/tensorflow/tensor2tensor.svg)](https://travis-ci.org/tensorflow/tensor2tensor)
[![Run on FH](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/run)

[Tensor2Tensor](https://github.com/tensorflow/tensor2tensor), or
[T2T](https://github.com/tensorflow/tensor2tensor) for short, is a library
of deep learning models and datasets designed to make deep learning more
accessible and [accelerate ML
research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).


T2T was developed by researchers and engineers in the
[Google Brain team](https://research.google.com/teams/brain/) and a community
of users. It is now deprecated &mdash; we keep it running and welcome
bug-fixes, but encourage users to use the successor library [Trax](https://github.com/google/trax).

### Quick Start

[This iPython notebook](https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb)
explains T2T and runs in your browser using a free VM from Google,
no installation needed. Alternatively, here is a one-command version that
installs T2T, downloads MNIST, trains a model and evaluates it:

```
pip install tensor2tensor && t2t-trainer \
  --generate_data \
  --data_dir=~/t2t_data \
  --output_dir=~/t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100
```

### Contents

* [Suggested Datasets and Models](#suggested-datasets-and-models)
  * [Mathematical Language Understanding](#mathematical-language-understanding)
  * [Story, Question and Answer](#story-question-and-answer)
  * [Image Classification](#image-classification)
  * [Image Generation](#image-generation)
  * [Language Modeling](#language-modeling)
  * [Sentiment Analysis](#sentiment-analysis)
  * [Speech Recognition](#speech-recognition)
  * [Summarization](#summarization)
  * [Translation](#translation)
* [Basics](#basics)
  * [Walkthrough](#walkthrough)
  * [Installation](#installation)
  * [Features](#features)
* [T2T Overview](#t2t-overview)
  * [Problems](#problems)
  * [Models](#models)
  * [Hyperparameter Sets](#hyperparameter-sets)
  * [Trainer](#trainer)
* [Adding your own components](#adding-your-own-components)
* [Adding a dataset](#adding-a-dataset)
* [Papers](#papers)
* [Run on FloydHub](#run-on-floydhub)

## Suggested Datasets and Models

Below we list a number of tasks that can be solved with T2T when
you train the appropriate model on the appropriate problem.
We give the problem and model below and we suggest a setting of
hyperparameters that we know works well in our setup. We usually
run either on Cloud TPUs or on 8-GPU machines; you might need
to modify the hyperparameters if you run on a different setup.

### Mathematical Language Understanding

For evaluating mathematical expressions at the character level involving addition, subtraction and multiplication of both positive and negative decimal numbers with variable digits assigned to symbolic variables, use

* the [MLU](https://art.wangperawong.com/mathematical_language_understanding_train.tar.gz) data-set:
 `--problem=algorithmic_math_two_variables`

You can try solving the problem with different transformer models and hyperparameters as described in the [paper](https://arxiv.org/abs/1812.02825):
* Standard transformer:
`--model=transformer`
`--hparams_set=transformer_tiny`
* Universal transformer:
`--model=universal_transformer`
`--hparams_set=universal_transformer_tiny`
* Adaptive universal transformer:
`--model=universal_transformer`
`--hparams_set=adaptive_universal_transformer_tiny`

### Story, Question and Answer

For answering questions based on a story, use

* the [bAbi](https://research.fb.com/downloads/babi/) data-set:
 `--problem=babi_qa_concat_task1_1k`

You can choose the bAbi task from the range [1,20] and the subset from 1k or
10k. To combine test data from all tasks into a single test set, use
`--problem=babi_qa_concat_all_tasks_10k`

### Image Classification

For image classification, we have a number of standard data-sets:

* ImageNet (a large data-set): `--problem=image_imagenet`, or one
   of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,
   `image_imagenet32`)
* CIFAR-10: `--problem=image_cifar10` (or
    `--problem=image_cifar10_plain` to turn off data augmentation)
* CIFAR-100: `--problem=image_cifar100`
* MNIST: `--problem=image_mnist`

For ImageNet, we suggest using ResNet or Xception, i.e.,
use `--model=resnet --hparams_set=resnet_50` or
`--model=xception --hparams_set=xception_base`.
ResNet should get to above 76% top-1 accuracy on ImageNet.

For CIFAR and MNIST, we suggest trying the shake-shake model:
`--model=shake_shake --hparams_set=shakeshake_big`.
This setting trained for `--train_steps=700000` should yield
close to 97% accuracy on CIFAR-10.

### Image Generation

For (un)conditional image generation, we have a number of standard data-sets:

* CelebA: `--problem=img2img_celeba` for image-to-image translation, namely,
    superresolution from 8x8 to 32x32.
* CelebA-HQ: `--problem=image_celeba256_rev` for a downsampled 256x256.
* CIFAR-10: `--problem=image_cifar10_plain_gen_rev` for class-conditional
    32x32 generation.
* LSUN Bedrooms: `--problem=image_lsun_bedrooms_rev`
* MS-COCO: `--problem=image_text_ms_coco_rev` for text-to-image generation.
* Small ImageNet (a large data-set): `--problem=image_imagenet32_gen_rev` for
    32x32 or `--problem=image_imagenet64_gen_rev` for 64x64.

We suggest using the Image Transformer (`--model=imagetransformer`), the
Image Transformer Plus (`--model=imagetransformerpp`), which uses a
discretized mixture of logistics, or a variational auto-encoder
(`--model=transformer_ae`).
For CIFAR-10, using `--hparams_set=imagetransformer_cifar10_base` or
`--hparams_set=imagetransformer_cifar10_base_dmol` yields 2.90 bits per
dimension. For Imagenet-32, using
`--hparams_set=imagetransformer_imagenet32_base` yields 3.77 bits per dimension.

### Language Modeling

For language modeling, we have these data-sets in T2T:

* PTB (a small data-set): `--problem=languagemodel_ptb10k` for
    word-level modeling and `--problem=languagemodel_ptb_characters`
    for character-level modeling.
* LM1B (a billion-word corpus): `--problem=languagemodel_lm1b32k` for
    subword-level modeling and `--problem=languagemodel_lm1b_characters`
    for character-level modeling.

We suggest starting with `--model=transformer` on this task and using
`--hparams_set=transformer_small` for PTB and
`--hparams_set=transformer_base` for LM1B.

### Sentiment Analysis

For the task of recognizing the sentiment of a sentence, use

* the IMDB data-set: `--problem=sentiment_imdb`

We suggest using `--model=transformer_encoder` here. Since it is
a small data-set, try `--hparams_set=transformer_tiny` and train for
only a few steps (e.g., `--train_steps=2000`).

### Speech Recognition

For speech-to-text, we have these data-sets in T2T:

* Librispeech (US English): `--problem=librispeech` for
    the whole set and `--problem=librispeech_clean` for a smaller
    but nicely filtered part.

* Mozilla Common Voice (US English): `--problem=common_voice` for the whole set
    and `--problem=common_voice_clean` for a quality-checked subset.

### Summarization

For summarizing a longer text into a shorter one, we have this data-set:

* CNN/DailyMail articles summarized into a few sentences:
  `--problem=summarize_cnn_dailymail32k`

We suggest using `--model=transformer` and
`--hparams_set=transformer_prepend` for this task.
This yields good ROUGE scores.

### Translation

There are a number of translation data-sets in T2T:

* English-German: `--problem=translate_ende_wmt32k`
* English-French: `--problem=translate_enfr_wmt32k`
* English-Czech: `--problem=translate_encs_wmt32k`
* English-Chinese: `--problem=translate_enzh_wmt32k`
* English-Vietnamese: `--problem=translate_envi_iwslt32k`
* English-Spanish: `--problem=translate_enes_wmt32k`

You can get translations in the other direction by appending `_rev` to
the problem name, e.g., for German-English use
`--problem=translate_ende_wmt32k_rev`
(note that you still need to download the original data with t2t-datagen
`--problem=translate_ende_wmt32k`).
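
Conceptually, a `_rev` problem reads the same generated data and presents each example with its inputs and targets exchanged, which is why the original data is still required. The sketch below illustrates that idea on a plain dictionary; `reverse_example` is a hypothetical helper for this example and not a T2T function.

```python
def reverse_example(example):
    """Returns a copy of the example with inputs and targets swapped."""
    reversed_example = dict(example)
    reversed_example["inputs"] = example["targets"]
    reversed_example["targets"] = example["inputs"]
    return reversed_example


print(reverse_example({"inputs": "Hello world", "targets": "Hallo Welt"}))
# {'inputs': 'Hallo Welt', 'targets': 'Hello world'}
```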

For all translation problems, we suggest trying the Transformer model:
`--model=transformer`. At first it is best to try the base setting,
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
this should reach a BLEU score of about 28 on the English-German data-set,
which is close to state-of-the-art. If training on a single GPU, try the
`--hparams_set=transformer_base_single_gpu` setting. For very good results
or larger data-sets (e.g., for English-French), try the big model
with `--hparams_set=transformer_big`.

See this [example](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/Transformer_translate.ipynb) to see how translation works.

## Basics

### Walkthrough

Here's a walkthrough training a good English-to-German translation
model using the Transformer model from [*Attention Is All You
Need*](https://arxiv.org/abs/1706.03762) on WMT data.

```
pip install tensor2tensor

# See what problems, models, and hyperparameter sets are available.
# You can easily swap between them (and add new ones).
t2t-trainer --registry_help

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# Train
# *  If you run out of memory, add --hparams='batch_size=1024'.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
echo -e 'Hallo Welt\nAuf Wiedersehen Welt' > ref-translation.de

BEAM_SIZE=4
ALPHA=0.6

t2t-decoder \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --decode_hparams="beam_size=$BEAM_SIZE,alpha=$ALPHA" \
  --decode_from_file=$DECODE_FILE \
  --decode_to_file=translation.en

# See the translations
cat translation.en

# Evaluate the BLEU score
# Note: Report this BLEU score in papers, not the internal approx_bleu metric.
t2t-bleu --translation=translation.en --reference=ref-translation.de
```

### Installation


```
# Assumes tensorflow or tensorflow-gpu installed
pip install tensor2tensor

# Installs with tensorflow-gpu requirement
pip install tensor2tensor[tensorflow_gpu]

# Installs with tensorflow (cpu) requirement
pip install tensor2tensor[tensorflow]
```

Binaries:

```
# Data generator
t2t-datagen

# Trainer
t2t-trainer --registry_help
```

Library usage:

```
python -c "from tensor2tensor.models.transformer import Transformer"
```

### Features

* Many state of the art and baseline models are built-in and new models can be
  added easily (open an issue or pull request!).
* Many datasets across modalities - text, audio, image - available for
  generation and use, and new ones can be added easily (open an issue or pull
  request for public datasets!).
* Models can be used with any dataset and input mode (or even multiple); all
  modality-specific processing (e.g. embedding lookups for text tokens) is done
  with `bottom` and `top` transformations, which are specified per-feature in the
  model.
* Support for multi-GPU machines and synchronous (1 master, many workers) and
  asynchronous (independent workers synchronizing through a parameter server)
  [distributed training](https://tensorflow.github.io/tensor2tensor/distributed_training.html).
* Easily swap amongst datasets and models by command-line flag with the data
  generation script `t2t-datagen` and the training script `t2t-trainer`.
* Train on [Google Cloud ML](https://tensorflow.github.io/tensor2tensor/cloud_mlengine.html) and [Cloud TPUs](https://tensorflow.github.io/tensor2tensor/cloud_tpu.html).

## T2T overview

### Problems

**Problems** consist of features such as inputs and targets, and metadata such
as each feature's modality (e.g. symbol, image, audio) and vocabularies. Problem
features are given by a dataset, which is stored as a `TFRecord` file with
`tensorflow.Example` protocol buffers. All
problems are imported in
[`all_problems.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/all_problems.py)
or are registered with `@registry.register_problem`. Run
[`t2t-datagen`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t-datagen)
to see the list of available problems and download them.

### Models

**`T2TModel`s** define the core tensor-to-tensor computation. They apply a
default transformation to each input and output so that models may deal with
modality-independent tensors (e.g. embeddings at the input; and a linear
transform at the output to produce logits for a softmax over classes). All
models are imported in the
[`models` subpackage](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/__init__.py),
inherit from [`T2TModel`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py),
and are registered with
[`@registry.register_model`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).

### Hyperparameter Sets

**Hyperparameter sets** are encoded in
[`HParams`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/hparam.py)
objects, and are registered with
[`@registry.register_hparams`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).
Every model and problem has an `HParams` object. A basic set of hyperparameters is
defined in
[`common_hparams.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/layers/common_hparams.py)
and hyperparameter set functions can compose other hyperparameter set functions.

### Trainer

The **trainer** binary is the entrypoint for training, evaluation, and
inference. Users can easily switch between problems, models, and hyperparameter
sets by using the `--model`, `--problem`, and `--hparams_set` flags. Specific
hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
related flags control local and distributed training/evaluation
([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).

## Adding your own components

T2T's components are registered using a central registration mechanism that
enables easily adding new ones and easily swapping amongst them by command-line
flag. You can add your own components without editing the T2T codebase by
specifying the `--t2t_usr_dir` flag in `t2t-trainer`.

You can do so for models, hyperparameter sets, modalities, and problems. Please
do submit a pull request if your component might be useful to others.

See the [`example_usr_dir`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/test_data/example_usr_dir)
for an example user directory.

## Adding a dataset

To add a new dataset, subclass
[`Problem`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/problem.py)
and register it with `@registry.register_problem`. See
[`TranslateEndeWmt8k`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/translate_ende.py)
for an example. Also see the [data generators
README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).

## Run on FloydHub

[![Run on FloydHub](https://static.floydhub.com/button/button.svg)](https://floydhub.com/run)

Click this button to open a [Workspace](https://blog.floydhub.com/workspaces/) on [FloydHub](https://www.floydhub.com/?utm_medium=readme&utm_source=tensor2tensor&utm_campaign=jul_2018). You can use the workspace to develop and test your code on a fully configured cloud GPU machine.

Tensor2Tensor comes preinstalled in the environment, so you can simply open a [Terminal](https://docs.floydhub.com/guides/workspace/#using-terminal) and run your code.

```bash
# Test the quick-start on a Workspace's Terminal with this command
t2t-trainer \
  --generate_data \
  --data_dir=./t2t_data \
  --output_dir=./t2t_train/mnist \
  --problem=image_mnist \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --train_steps=1000 \
  --eval_steps=100
```

Note: Ensure compliance with the FloydHub [Terms of Service](https://www.floydhub.com/about/terms).

## Papers

When referencing Tensor2Tensor, please cite [this
paper](https://arxiv.org/abs/1803.07416).

```
@article{tensor2tensor,
  author    = {Ashish Vaswani and Samy Bengio and Eugene Brevdo and
    Francois Chollet and Aidan N. Gomez and Stephan Gouws and Llion Jones and
    \L{}ukasz Kaiser and Nal Kalchbrenner and Niki Parmar and Ryan Sepassi and
    Noam Shazeer and Jakob Uszkoreit},
  title     = {Tensor2Tensor for Neural Machine Translation},
  journal   = {CoRR},
  volume    = {abs/1803.07416},
  year      = {2018},
  url       = {http://arxiv.org/abs/1803.07416},
}
```

Tensor2Tensor was used to develop a number of state-of-the-art models
and deep learning methods. Here we list some papers that were based on T2T
from the start and benefited from its features and architecture in ways
described in the [Google Research Blog post introducing
T2T](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).

* [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
* [Depthwise Separable Convolutions for Neural Machine
   Translation](https://arxiv.org/abs/1706.03059)
* [One Model To Learn Them All](https://arxiv.org/abs/1706.05137)
* [Discrete Autoencoders for Sequence Models](https://arxiv.org/abs/1801.09797)
* [Generating Wikipedia by Summarizing Long
   Sequences](https://arxiv.org/abs/1801.10198)
* [Image Transformer](https://arxiv.org/abs/1802.05751)
* [Training Tips for the Transformer Model](https://arxiv.org/abs/1804.00247)
* [Self-Attention with Relative Position Representations](https://arxiv.org/abs/1803.02155)
* [Fast Decoding in Sequence Models using Discrete Latent Variables](https://arxiv.org/abs/1803.03382)
* [Adafactor: Adaptive Learning Rates with Sublinear Memory Cost](https://arxiv.org/abs/1804.04235)
* [Universal Transformers](https://arxiv.org/abs/1807.03819)
* [Attending to Mathematical Language with Transformers](https://arxiv.org/abs/1812.02825)
* [The Evolved Transformer](https://arxiv.org/abs/1901.11117)
* [Model-Based Reinforcement Learning for Atari](https://arxiv.org/abs/1903.00374)
* [VideoFlow: A Flow-Based Generative Model for Video](https://arxiv.org/abs/1903.01434)

*NOTE: This is not an official Google product.*


================================================
FILE: floyd.yml
================================================
env: tensorflow-1.12
machine: gpu


================================================
FILE: floyd_requirements.txt
================================================
tensor2tensor


================================================
FILE: oss_scripts/oss_integration_test.sh
================================================
#!/bin/bash

# Note that this test script requires docker to be installed and running.

set -v  # print commands as they're executed
set -e  # fail and exit on any command erroring

: "${TF_VERSION:?}"
: "${TF_LATEST:?}"
: "${T2T_DATA_DIR:?}"
: "${T2T_TRAIN_DIR:?}"
: "${T2T_PROBLEM:?}"

# Test --t2t_usr_dir
t2t-trainer --registry_help --t2t_usr_dir=./tensor2tensor/test_data/example_usr_dir 2>&1 | grep my_very_own_hparams && echo passed

# Run data generation, training, and decoding on a dummy problem
t2t-datagen --problem=$T2T_PROBLEM --data_dir=$T2T_DATA_DIR
t2t-trainer --problem=$T2T_PROBLEM --data_dir=$T2T_DATA_DIR --model=transformer --hparams_set=transformer_tiny --train_steps=5 --eval_steps=5 --output_dir=$T2T_TRAIN_DIR
t2t-decoder --problem=$T2T_PROBLEM --data_dir=$T2T_DATA_DIR --model=transformer --hparams_set=transformer_tiny --output_dir=$T2T_TRAIN_DIR --decode_hparams='num_samples=10'

# Test serving
if [[ "$TF_VERSION" == "$TF_LATEST"  ]]
then
  # Export for serving
  pip install tensorflow_hub
  t2t-exporter \
      --problem=$T2T_PROBLEM \
      --data_dir=$T2T_DATA_DIR \
      --model=transformer \
      --hparams_set=transformer_tiny \
      --output_dir=$T2T_TRAIN_DIR

  # Run model server
  server_port=8500
  model_name=my_model
  docker run -d -p $server_port:$server_port \
      --mount type=bind,source=$T2T_TRAIN_DIR/export,target=/models/$model_name \
      -e MODEL_NAME=$model_name -t tensorflow/serving
  sleep 10

  # Query
  pip install tensorflow-serving-api=="$TF_VERSION"
  t2t-query-server \
      --server=localhost:$server_port \
      --servable_name=$model_name \
      --problem=$T2T_PROBLEM \
      --data_dir=$T2T_DATA_DIR \
      --inputs_once='1 0 1 0 1 0'
fi


================================================
FILE: oss_scripts/oss_pip_install.sh
================================================
#!/bin/bash

set -v  # print commands as they're executed
set -e  # fail and exit on any command erroring

: "${TF_VERSION:?}"

# Make sure we have the latest pip and setuptools installed.
pip install -q -U pip
pip install -q -U setuptools

# Make sure we have the latest version of numpy - avoid problems we were
# seeing with Python 3
pip install -q -U numpy
pip install -q "tensorflow==$TF_VERSION"

# Just print the version again to make sure.
python -c 'import tensorflow as tf; print(tf.__version__)'

# First ensure that the base dependencies are sufficient for a full import
pip install -q -e .
t2t-trainer --registry_help 2>&1 >/dev/null
t2t-datagen 2>&1 | grep translate_ende 2>&1 >/dev/null && echo passed

# Then install the test dependencies
pip install -q -e .[tests,allen]
# Make sure to install the atari extras for gym
pip install "gym[atari]"


================================================
FILE: oss_scripts/oss_release.sh
================================================
#!/bin/bash

set -v  # print commands as they're executed
set -e  # fail and exit on any command erroring

GIT_COMMIT_ID=${1:-""}
[[ -z $GIT_COMMIT_ID ]] && echo "Must provide a commit" && exit 1

TMP_DIR=$(mktemp -d)
pushd "$TMP_DIR"

echo "Cloning tensor2tensor and checking out commit $GIT_COMMIT_ID"
git clone https://github.com/tensorflow/tensor2tensor.git
cd tensor2tensor
git checkout $GIT_COMMIT_ID

# Without `python -m` we sometimes get module not callable error:
# https://stackoverflow.com/questions/58451650/pip-no-longer-working-after-update-error-module-object-is-not-callable
python -m pip install wheel twine pyopenssl

# Build the distribution
echo "Building distribution"
python setup.py sdist
python setup.py bdist_wheel --universal

# Publish to PyPI
echo "Publishing to PyPI"
twine upload dist/*

# Cleanup
rm -rf build/ dist/ tensor2tensor.egg-info/
popd
rm -rf "$TMP_DIR"


================================================
FILE: oss_scripts/oss_tests.sh
================================================
#!/bin/bash

set -v  # print commands as they're executed

# Instead of exiting on any failure with "set -e", we'll call set_status after
# each command and exit $STATUS at the end.
STATUS=0
function set_status() {
    local last_status=$?
    if [[ $last_status -ne 0 ]]
    then
      echo "<<<<<<FAILED>>>>>> Exit code: $last_status"
    fi
    STATUS=$(($last_status || $STATUS))
}

# Check env vars set
echo "${TF_VERSION:?}" && \
echo "${TF_LATEST:?}" && \
echo "${TRAVIS_PYTHON_VERSION:?}"
set_status
if [[ $STATUS -ne 0 ]]
then
  exit $STATUS
fi

# Check import
python -c "from tensor2tensor.models import transformer; print(transformer.Transformer.__name__)"
set_status

# We need to run some tests separately (because they enable eager mode or for
# other reasons). We also run the tests in each top-level directory separately
# to get more readable error messages.

# Tested separately:
#   * registry_test
#   * trainer_lib_test
#   * visualization_test
#   * trainer_model_based_test
#   * allen_brain_test
#   * models/research


# algorithmic_math_test: flaky
# subword_text_encoder_ops_test, pack_sequences_ops_test: interface with C++ ops
pytest --disable-warnings \
  --ignore=tensor2tensor/data_generators/algorithmic_math_test.py \
  --ignore=tensor2tensor/data_generators/allen_brain_test.py \
  --ignore=tensor2tensor/data_generators/ops/pack_sequences_ops_test.py \
  --ignore=tensor2tensor/data_generators/ops/subword_text_encoder_ops_test.py \
  --ignore=tensor2tensor/data_generators/problem_test.py \
  --deselect=tensor2tensor/data_generators/generator_utils_test.py::GeneratorUtilsTest::testDatasetPacking \
  tensor2tensor/data_generators
set_status


pytest --disable-warnings \
  --ignore=tensor2tensor/envs/mujoco_problems_test.py \
  --ignore=tensor2tensor/envs/rendered_env_problem_test.py \
  tensor2tensor/envs/
set_status


pytest --disable-warnings \
  --ignore=tensor2tensor/layers/common_attention_test.py \
  --ignore=tensor2tensor/layers/common_layers_test.py \
  --ignore=tensor2tensor/layers/common_video_test.py \
  --ignore=tensor2tensor/layers/discretization_test.py \
  --ignore=tensor2tensor/layers/latent_layers_test.py \
  --ignore=tensor2tensor/layers/modalities_test.py \
  --ignore=tensor2tensor/layers/ngram_test.py \
  tensor2tensor/layers/
set_status


# TODO(davidso): Re-enable EvolvedTransformer when possible.
pytest --disable-warnings \
  --ignore=tensor2tensor/models/evolved_transformer_test.py \
  --ignore=tensor2tensor/models/research \
  --ignore=tensor2tensor/models/video/nfg_conv3d_test.py \
  --ignore=tensor2tensor/models/video/nfg_conv_lstm_test.py \
  --ignore=tensor2tensor/models/video/nfg_conv_test.py \
  --ignore=tensor2tensor/models/video/nfg_uncond_test.py \
  tensor2tensor/models/
set_status


# test_utils.py is not a test, but pytest thinks it is.
pytest --disable-warnings \
  --ignore=tensor2tensor/utils/registry_test.py \
  --ignore=tensor2tensor/utils/t2t_model_test.py \
  --ignore=tensor2tensor/utils/test_utils.py \
  --ignore=tensor2tensor/utils/test_utils_test.py \
  --ignore=tensor2tensor/utils/trainer_lib_test.py \
  tensor2tensor/utils/
set_status


# These tests enable eager, so are tested separately.
pytest --disable-warnings \
  tensor2tensor/data_generators/problem_test.py \
  tensor2tensor/layers/common_attention_test.py \
  tensor2tensor/layers/common_layers_test.py \
  tensor2tensor/layers/common_video_test.py \
  tensor2tensor/layers/discretization_test.py \
  tensor2tensor/layers/latent_layers_test.py \
  tensor2tensor/layers/modalities_test.py \
  tensor2tensor/layers/ngram_test.py \
  tensor2tensor/utils/t2t_model_test.py \
  tensor2tensor/utils/test_utils_test.py \
  --deselect=tensor2tensor/layers/common_layers_test.py::CommonLayersTest::testFactoredTensorImplicitConversion \
  --deselect=tensor2tensor/layers/modalities_test.py::ModalityTest::testSymbolModalityTargetsFactored \
  --deselect=tensor2tensor/layers/common_video_test.py::CommonVideoTest::testGifSummary
set_status


pytest --disable-warnings tensor2tensor/utils/registry_test.py
set_status

pytest --disable-warnings tensor2tensor/utils/trainer_lib_test.py
set_status

pytest --disable-warnings tensor2tensor/visualization/visualization_test.py
set_status

pytest --disable-warnings tensor2tensor/data_generators/allen_brain_test.py
set_status

# All other tests not tested above.

# trax tests need C++
# TODO(afrozm): Enable trax tests; they currently need GLIBCXX_3.4.21.
# Travis Error:
# ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/travis/virtualenv/python3.6.3/lib/python3.6/site-packages/jaxlib/_pywrap_xla.so)
pytest --disable-warnings \
  --ignore=tensor2tensor/bin/t2t_trainer_test.py \
  --ignore=tensor2tensor/data_generators \
  --ignore=tensor2tensor/envs \
  --ignore=tensor2tensor/layers \
  --ignore=tensor2tensor/models \
  --ignore=tensor2tensor/rl \
  --ignore=tensor2tensor/trax \
  --ignore=tensor2tensor/utils \
  --ignore=tensor2tensor/visualization \
  --deselect=tensor2tensor/utils/beam_search_test.py::BeamSearchTest::testTPUBeam
set_status


# TODO(afrozm): Enable this unconditionally?

## Test models/research only against tf-nightly
#if [[ "$TRAVIS_PYTHON_VERSION" == "2.7"  ]]
#then
#  # Ignores:
#  # * Glow requires the CIFAR-10 dataset to be generated
#  pytest --disable-warnings tensor2tensor/models/research \
#    --ignore=tensor2tensor/models/research/glow_test.py
#  set_status
#fi

if [[ "$TF_VERSION" == "$TF_LATEST"  ]]
then
    jupyter nbconvert --ExecutePreprocessor.kernel_name=python3 \
      --ExecutePreprocessor.timeout=600 --to notebook --execute \
      tensor2tensor/notebooks/hello_t2t.ipynb;
    set_status

    jupyter nbconvert --ExecutePreprocessor.kernel_name=python3 \
      --ExecutePreprocessor.timeout=600 --to notebook --execute \
      tensor2tensor/notebooks/t2t_problem.ipynb;
    set_status

    # TODO(afrozm): Once we drop support for 1.10 we can get rid of this.
    pytest --disable-warnings \
      tensor2tensor/utils/beam_search_test.py::BeamSearchTest::testTPUBeam
    set_status

    # TODO(afrozm): Enable other tests in the RL directory.
    # Can't add disable warning here since it parses flags.
    pytest tensor2tensor/rl/trainer_model_based_test.py
    set_status

fi

# Test --t2t_usr_dir
t2t-trainer --registry_help --t2t_usr_dir=./tensor2tensor/test_data/example_usr_dir 2>&1 | grep my_very_own_hparams && echo passed
set_status

exit $STATUS


================================================
FILE: pylintrc
================================================


[MASTER]

# Pickle collected data for later comparisons.
persistent=no

# Set the cache size for astng objects.
cache-size=500

# Ignore Py3 files
ignore=get_references_web.py,get_references_web_single_group.py


[REPORTS]

# Set the output format.
# output-format=sorted-text

# Put messages in a separate file for each module / package specified on the
# command line instead of printing them on stdout. Reports (if any) will be
# written in a file named "pylint_global.[txt|html]".
files-output=no

# Tells whether to display a full report or only the messages.
reports=no

# Disable the report(s) with the given id(s).
disable-report=R0001,R0002,R0003,R0004,R0101,R0102,R0201,R0202,R0220,R0401,R0402,R0701,R0801,R0901,R0902,R0903,R0904,R0911,R0912,R0913,R0914,R0915,R0921,R0922,R0923

# Error message template (continued on second line)
msg-template={msg_id}:{line:3} {obj}: {msg} [{symbol}]


[MESSAGES CONTROL]
# List of checkers and warnings to enable.
enable=indexing-exception,old-raise-syntax

# List of checkers and warnings to disable.
disable=design,similarities,no-self-use,attribute-defined-outside-init,locally-disabled,star-args,pointless-except,bad-option-value,global-statement,fixme,suppressed-message,useless-suppression,locally-enabled,file-ignored,multiple-imports,c-extension-no-member,trailing-newlines,unsubscriptable-object,misplaced-comparison-constant,no-member,abstract-method,no-else-return,missing-docstring,wrong-import-order,protected-access,inconsistent-return-statements,invalid-unary-operand-type,import-error,no-name-in-module,arguments-differ,not-context-manager,unused-argument

[BASIC]

# Required attributes for module, separated by a comma
required-attributes=

# Regular expression which should only match the name
# of functions or classes which do not require a docstring.
no-docstring-rgx=(__.*__|main)

# Min length in lines of a function that requires a docstring.
docstring-min-length=10

# Regular expression which should only match correct module names. The
# leading underscore is sanctioned for private modules by Google's style
# guide.
#
# There are exceptions to the basic rule (_?[a-z][a-z0-9_]*) to cover
# requirements of Python's module system.
module-rgx=^(_?[a-z][a-z0-9_]*)|__init__$

# Regular expression which should only match correct module level names
const-rgx=^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$

# Regular expression which should only match correct class attribute
class-attribute-rgx=^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$

# Regular expression which should only match correct class names
class-rgx=^_?[A-Z][a-zA-Z0-9]*$

# Regular expression which should only match correct function names.
# 'camel_case' and 'snake_case' group names are used for consistency of naming
# styles across functions and methods.
function-rgx=^(?:(?P<exempt>setUp|tearDown|setUpModule|tearDownModule)|(?P<camel_case>_?[A-Z][a-zA-Z0-9]*)|(?P<snake_case>_?[a-z][a-z0-9_]*))$


# Regular expression which should only match correct method names.
# 'camel_case' and 'snake_case' group names are used for consistency of naming
# styles across functions and methods. 'exempt' indicates a name which is
# consistent with all naming styles.
method-rgx=(?x)
  ^(?:(?P<exempt>_[a-z0-9_]+__|runTest|setUp|tearDown|setUpTestCase
         |tearDownTestCase|setupSelf|tearDownClass|setUpClass
         |(test|assert)_*[A-Z0-9][a-zA-Z0-9_]*|next)
     |(?P<camel_case>_{0,2}[A-Z][a-zA-Z0-9_]*)
     |(?P<snake_case>_{0,2}[a-z][a-z0-9_]*))$


# Regular expression which should only match correct instance attribute names
attr-rgx=^_{0,2}[a-z][a-z0-9_]*$

# Regular expression which should only match correct argument names
argument-rgx=^[a-z][a-z0-9_]*$

# Regular expression which should only match correct variable names
variable-rgx=^[a-z][a-z0-9_]*$

# Regular expression which should only match correct list comprehension /
# generator expression variable names
inlinevar-rgx=^[a-z][a-z0-9_]*$

# Good variable names which should always be accepted, separated by a comma
good-names=main,_

# Bad variable names which should always be refused, separated by a comma
bad-names=

# List of builtin function names that should not be used, separated by a comma
bad-functions=input,apply,reduce

# List of decorators that define properties, such as abc.abstractproperty.
property-classes=abc.abstractproperty


[TYPECHECK]

# Tells whether missing members accessed in mixin class should be ignored. A
# mixin class is detected if its name ends with "mixin" (case insensitive).
ignore-mixin-members=yes

# List of decorators that create context managers from functions, such as
# contextlib.contextmanager.
contextmanager-decorators=contextlib.contextmanager,contextlib2.contextmanager


[VARIABLES]

# Tells whether we should check for unused import in __init__ files.
init-import=no

# A regular expression matching names used for dummy variables (i.e. not used).
dummy-variables-rgx=^\*{0,2}(_$|unused_|dummy_)

# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=


[CLASSES]

# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,__new__,setUp

# "class_" is also valid as the first argument to a class method.
valid-classmethod-first-arg=cls,class_


[EXCEPTIONS]

overgeneral-exceptions=StandardError,Exception,BaseException


[IMPORTS]

# Deprecated modules which should not be used, separated by a comma
deprecated-modules=regsub,TERMIOS,Bastion,rexec,sets


[FORMAT]

# Maximum number of characters on a single line.
max-line-length=80

# Regexp for a line that is allowed to be longer than the limit.
# This "ignore" regex is today composed of several independent parts:
# (1) Long import lines
# (2) URLs in comments or pydocs. Detecting URLs by regex is a hard problem and
#     no amount of tweaking will make a perfect regex AFAICT. This one is a good
#     compromise.
# (3) Constant string literals at the start of files don't need to be broken
#     across lines. Allowing long paths and urls to be on a single
#     line. Also requires that the string not be a triplequoted string.
ignore-long-lines=(?x)
  (^\s*(import|from)\s
   |^\s*(\#\ )?<?(https?|ftp):\/\/[^\s\/$.?#].[^\s]*>?$
   |^[a-zA-Z_][a-zA-Z0-9_]*\s*=\s*("[^"]\S+"|'[^']\S+')
   )

# Maximum number of lines in a module
max-module-lines=99999

# String used as indentation unit. We differ from PEP8's normal 4 spaces.
indent-string='  '

# Do not warn about multiple statements on a single line for constructs like
#   if test: stmt
single-line-if-stmt=y

# Make sure : in dicts and trailing commas are checked for whitespace.
no-space-check=


[LOGGING]

# Add logging modules.
logging-modules=logging,absl.logging


[MISCELLANEOUS]

# List of note tags to take into consideration, separated by a comma.
notes=


# Maximum line length for lambdas
short-func-length=1

# List of module members that should be marked as deprecated.
# All of the string functions are listed in 4.1.4 Deprecated string functions
# in the Python 2.4 docs.
deprecated-members=string.atof,string.atoi,string.atol,string.capitalize,string.expandtabs,string.find,string.rfind,string.index,string.rindex,string.count,string.lower,string.split,string.rsplit,string.splitfields,string.join,string.joinfields,string.lstrip,string.rstrip,string.strip,string.swapcase,string.translate,string.upper,string.ljust,string.rjust,string.center,string.zfill,string.replace,sys.exitfunc,sys.maxint


# List of exceptions that do not need to be mentioned in the Raises section of
# a docstring.
ignore-exceptions=AssertionError,NotImplementedError,StopIteration,TypeError


# Number of spaces of indent required when the last token on the preceding line
# is an open (, [, or {.
indent-after-paren=4


================================================
FILE: setup.py
================================================
"""Install tensor2tensor."""

from setuptools import find_packages
from setuptools import setup

setup(
    name='tensor2tensor',
    version='1.15.7',
    description='Tensor2Tensor',
    long_description=(
        'Tensor2Tensor, or T2T for short, is a library of '
        'deep learning models and datasets designed to make deep '
        'learning more accessible and accelerate ML research. '
        'T2T was developed by researchers and engineers in the Google '
        'Brain team and a community of users. It is now in maintenance '
        'mode -- we keep it running and welcome bug-fixes, but encourage '
        'users to use the successor library Trax.'),
    author='Google Inc.',
    author_email='no-reply@google.com',
    url='http://github.com/tensorflow/tensor2tensor',
    license='Apache 2.0',
    packages=find_packages(),
    package_data={
        'tensor2tensor.data_generators': ['test_data/*'],
        'tensor2tensor.data_generators.wikisum': ['test_data/*'],
        'tensor2tensor.visualization': [
            'attention.js', 'TransformerVisualization.ipynb'
        ],
    },
    scripts=[
        'tensor2tensor/bin/t2t-trainer',
        'tensor2tensor/bin/t2t-datagen',
        'tensor2tensor/bin/t2t-decoder',
        'tensor2tensor/bin/t2t-make-tf-configs',
        'tensor2tensor/bin/t2t-eval',
        'tensor2tensor/bin/t2t-exporter',
        'tensor2tensor/bin/t2t-query-server',
        'tensor2tensor/bin/t2t-insights-server',
        'tensor2tensor/bin/t2t-avg-all',
        'tensor2tensor/bin/t2t-bleu',
        'tensor2tensor/bin/t2t-translate-all',
    ],
    install_requires=[
        'absl-py',
        'bz2file',
        'dopamine-rl',
        'flask',
        'future',
        'gevent',
        'gin-config',
        'google-api-python-client',
        'gunicorn',
        'gym',
        'h5py',
        'kfac',
        'mesh-tensorflow',
        'numpy',
        'oauth2client',
        'opencv-python',
        'Pillow',
        'pypng',
        'requests',
        'scipy',
        'six>=1.12.0',
        'sympy',
        'tensorflow-addons',
        'tensorflow-datasets',
        'tensorflow-gan',
        'tensorflow-probability==0.7.0',
        'tf_slim',
        'tqdm',
    ],
    extras_require={
        'tensorflow': ['tensorflow>=1.15.0'],
        'tensorflow-hub': ['tensorflow-hub>=0.1.1'],
        'tests': [
            # Needed to fix a Travis pytest error.
            # https://github.com/Julian/jsonschema/issues/449#issuecomment-411406525
            'attrs>=17.4.0',
            'pytest>=3.8.0',
            'mock',
            'jupyter',
            'matplotlib',
            # Need atari extras for Travis tests, but because gym is already in
            # install_requires, pip skips the atari extras, so we instead do an
            # explicit pip install gym[atari] for the tests.
            # 'gym[atari]',
        ],
        'allen': ['Pillow==5.1.0', 'pandas==0.23.0'],
    },
    classifiers=[
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'Intended Audience :: Science/Research',
        'License :: OSI Approved :: Apache Software License',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
    ],
    dependency_links=[
        'git+https://github.com/tensorflow/cleverhans.git#egg=cleverhans'
    ],
    keywords='tensorflow machine learning',
)


================================================
FILE: tensor2tensor/__init__.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.



================================================
FILE: tensor2tensor/bin/__init__.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.



================================================
FILE: tensor2tensor/bin/build_vocab.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Build vocab for a subclass of Text2TextProblem.

build_vocab \
    --problem=program_search_algolisp \
    --data_dir=~/t2t_data \
    --tmp_dir=~/t2t_data/tmp
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

from tensor2tensor import problems as problems_lib  # pylint: disable=unused-import
from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry
import tensorflow.compat.v1 as tf

flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_string("data_dir", "/tmp/t2t/data_dir",
                    "Directory to place the generated vocabulary file in.")

flags.DEFINE_string("tmp_dir", "/tmp/t2t/tmp_dir",
                    "Temporary storage directory.")

flags.DEFINE_string("problem", None,
                    "Problem to generate the vocabulary file for.")

flags.mark_flag_as_required("problem")


def main(_):
  problem = registry.problem(FLAGS.problem)

  # We make the assumption that the problem is a subclass of Text2TextProblem.
  assert isinstance(problem, text_problems.Text2TextProblem)

  data_dir = os.path.expanduser(FLAGS.data_dir)
  tmp_dir = os.path.expanduser(FLAGS.tmp_dir)

  tf.gfile.MakeDirs(data_dir)
  tf.gfile.MakeDirs(tmp_dir)

  tf.logging.info("Saving vocabulary to data_dir: %s", data_dir)

  problem.get_or_create_vocab(data_dir, tmp_dir)

  tf.logging.info("Saved vocabulary file: %s",
                  os.path.join(data_dir, problem.vocab_filename))


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/make_tf_configs.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Output command line arguments and json-encoded TF_CONFIGs.

Usage:

`t2t-make-tf-configs --masters="server1:1234" --ps="server3:2134,server4:2334"`

Prints a one-line summary of the assumed training mode, then 1 line per job to
stdout: first the masters, then the parameter servers. Each job line has the
single-quoted TF_CONFIG, then a tab, then the command line flags for that job.

If there is a single master, it will have the `--sync` flag.
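
A minimal sketch of how a launcher script might consume the output described
above. The helper name `parse_job_line` is hypothetical and not part of
Tensor2Tensor. It assumes each job line is a single-quoted JSON TF_CONFIG, a
tab, and then the flags; lines without a tab (such as the training-mode
summary) are skipped.

```python
import json
import shlex

TAB = chr(9)  # The tab character, spelled without a backslash escape.


def parse_job_line(line):
  # Hypothetical helper (not part of T2T): split one output line into
  # (tf_config_dict, flags_list). Returns None for lines without a tab.
  if TAB not in line:
    return None
  quoted_config, flags = line.split(TAB, 1)
  # The TF_CONFIG JSON is wrapped in single quotes for shell use.
  tf_config = json.loads(quoted_config.strip("'"))
  # shlex.split also removes the shell quoting around --worker_job values.
  return tf_config, shlex.split(flags)


# Illustrative input only; the addresses mirror the usage example above.
sample_config = {
    "cluster": {"ps": ["server3:2134"], "master": ["server1:1234"]},
    "task": {"type": "master", "index": 0},
    "environment": "cloud",
}
sample_line = "'%s'%s%s" % (json.dumps(sample_config), TAB,
                            "--master=grpc://server1:1234 --sync")
tf_config, job_flags = parse_job_line(sample_line)
print(tf_config["task"]["type"], job_flags)
```

Under these assumptions, a launcher would typically re-serialize each parsed
TF_CONFIG with `json.dumps`, export it as the `TF_CONFIG` environment variable,
and start the job with the returned flags.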
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import json
import tensorflow.compat.v1 as tf

flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_string("masters", "", "Comma-separated list of master addresses")
flags.DEFINE_string("ps", "", "Comma-separated list of ps addresses")


def main(_):
  if not (FLAGS.masters and FLAGS.ps):
    raise ValueError("Must provide --masters and --ps")

  masters = FLAGS.masters.split(",")
  ps = FLAGS.ps.split(",")

  is_sync = len(masters) == 1
  if is_sync:
    print("Assuming SYNC distributed training with a single master and %d "
          "workers" % len(ps))
    cluster = {"ps": ps, "master": masters}
  else:
    print("Assuming ASYNC distributed training with %d workers and %d "
          "parameter servers" % (len(masters), len(ps)))
    cluster = {"ps": ps, "chief": [masters[0]], "worker": masters[1:]}

  # Trainer configs
  for idx, addr in enumerate(masters):
    cmd_line_flags = [
        "--master=grpc://%s" % addr,
        "--ps_replicas=%d" % len(ps),
        "--worker_replicas=%d" % len(masters),
        "--worker_gpu=%d" % (0 if is_sync else 1),
        "--worker_id=%d" % idx,
        "--ps_gpu=%d" % (1 if is_sync else 0),
        "--sync" if is_sync else "",
        "--schedule=train",
    ]
    if is_sync:
      task_type = "master"
      cmd_line_flags.append("--worker_job='/job:master'")
    else:
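      if idx == 0:
        # The first master is the chief; its task index is already 0.
        task_type = "chief"
        cmd_line_flags.append("--worker_job='/job:chief'")
      else:
        # Remaining masters are workers, indexed from 0 within the worker job.
        task_type = "worker"
        idx -= 1
        cmd_line_flags.append("--worker_job='/job:worker'")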

    tf_config = json.dumps({
        "cluster": cluster,
        "task": {
            "type": task_type,
            "index": idx
        },
        "environment": "cloud",
    })
    # Skip the empty placeholder left when --sync is not set.
    cmd_line_flags = " ".join(flag for flag in cmd_line_flags if flag)
    print("'%s'\t%s" % (tf_config, cmd_line_flags))

  # Std server configs
  for idx, addr in enumerate(ps):
    tf_config = json.dumps({
        "cluster": cluster,
        "task": {
            "type": "ps",
            "index": idx
        },
        "environment": "cloud",
    })
    cmd_line_flags = "--schedule=run_std_server"
    print("'%s'\t%s" % (tf_config, cmd_line_flags))


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-avg-all
================================================
#!/usr/bin/env python
"""t2t-avg-all."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_avg_all

import tensorflow.compat.v1 as tf

def main(argv):
  t2t_avg_all.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-bleu
================================================
#!/usr/bin/env python
"""t2t-bleu."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_bleu

import tensorflow.compat.v1 as tf

def main(argv):
  t2t_bleu.main(argv)



if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-datagen
================================================
#!/usr/bin/env python
"""Data generation for Tensor2Tensor.

This script is used to generate data to train your models
for a number of problems for which open-source data is available.

For example, to generate data for MNIST run this:

t2t-datagen \
  --problem=image_mnist \
  --data_dir=~/t2t_data \
  --tmp_dir=~/t2t_data/tmp
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_datagen

import tensorflow.compat.v1 as tf

def main(argv):
  t2t_datagen.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-decoder
================================================
#!/usr/bin/env python
"""t2t-decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_decoder

import tensorflow.compat.v1 as tf

def main(argv):
  t2t_decoder.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-eval
================================================
#!/usr/bin/env python
"""Run t2t-eval from a trained checkpoint.

This script runs evaluation from a trained checkpoint. For example, to
evaluate on the test set when the trained checkpoint is in /output_dir:

t2t-eval \
  --problem=image_mnist \
  --model=imagetransformer \
  --data_dir=~/t2t \
  --output_dir=/output_dir \
  --eval_use_test_set=True
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_eval

import tensorflow.compat.v1 as tf

def main(argv):
  t2t_eval.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-exporter
================================================
#!/usr/bin/env python
"""t2t-exporter."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.serving import export

import tensorflow.compat.v1 as tf

def main(argv):
  export.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-insights-server
================================================
#!/usr/bin/env python
"""t2t-insights-server."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.insights import server

import tensorflow.compat.v1 as tf

def main(argv):
  server.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-make-tf-configs
================================================
#!/usr/bin/env python
"""t2t-make-tf-configs."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import make_tf_configs

import tensorflow.compat.v1 as tf

def main(argv):
  make_tf_configs.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-query-server
================================================
#!/usr/bin/env python
"""t2t-query-server."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.serving import query

import tensorflow.compat.v1 as tf

def main(argv):
  query.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t-trainer
================================================
#!/usr/bin/env python
"""Trainer for Tensor2Tensor.

This script is used to train your models in Tensor2Tensor.

For example, to train a shake-shake model on MNIST run this:

t2t-trainer \
  --generate_data \
  --problem=image_mnist \
  --data_dir=~/t2t_data \
  --tmp_dir=~/t2t_data/tmp \
  --model=shake_shake \
  --hparams_set=shake_shake_quick \
  --output_dir=~/t2t_train/mnist1 \
  --train_steps=1000 \
  --eval_steps=100
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_trainer

import tensorflow.compat.v1 as tf

def main(argv):
  t2t_trainer.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run(main)


================================================
FILE: tensor2tensor/bin/t2t-translate-all
================================================
#!/usr/bin/env python
"""t2t-translate-all."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_translate_all

import tensorflow as tf

def main(argv):
  t2t_translate_all.main(argv)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_attack.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Adversarially attack a model.

This script adversarially attacks a model and evaluates accuracy at various
  epsilons.

Params such as which epsilons to evaluate at and the attack algorithm are
  specified by attack_params, see models/resnet.py for examples.

--ignore_incorrect will only attack those examples that are already correctly
  classified by the model.
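
A minimal NumPy sketch of the masked accuracy this flag implies (illustrative
only; `masked_accuracy` and its arguments are hypothetical names, not part of
this script, which computes the same quantity with tf.metrics.accuracy and
`weights=mask`):

```python
import numpy as np


def masked_accuracy(labels, clean_preds, adv_preds):
  # Weight 1.0 for examples the model classified correctly before the attack.
  mask = (clean_preds == labels).astype(np.float32)
  # 1.0 where the prediction on the adversarial example is still correct.
  correct = (adv_preds == labels).astype(np.float32)
  # Weighted mean; guard against an all-zero mask.
  return float((mask * correct).sum() / max(mask.sum(), 1.0))
```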

--surrogate_attack will attack a model (A) and evaluate adversarial examples for
  A on a different model (B).

Example run:
- train a resnet on cifar10:
    bin/t2t_trainer.py --problem=image_cifar10 --hparams_set=resnet_cifar_32 \
      --model=resnet

- evaluate robustness using the FGSM attack:
    bin/t2t_attack.py --attack_params_set=resnet_fgsm --problem=image_cifar10\
      --hparams_set=resnet_cifar_32 --model=resnet
"""

import os

from tensor2tensor.bin import t2t_trainer
from tensor2tensor.data_generators import problem as problem_lib  # pylint: disable=unused-import
from tensor2tensor.utils import adv_attack_utils
from tensor2tensor.utils import cloud_mlengine
from tensor2tensor.utils import registry
from tensor2tensor.utils import t2t_model
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir

import tensorflow.compat.v1 as tf
from tensorflow.compat.v1 import estimator as tf_estimator

flags = tf.flags
FLAGS = flags.FLAGS

# See flags.py for additional command-line flags.
flags.DEFINE_string("attack_params_set", None,
                    "Which attack parameters to use.")
flags.DEFINE_boolean("surrogate_attack", False,
                     "Perform an attack on a surrogate model.")
flags.DEFINE_string("surrogate_model", None, "Surrogate model to attack.")
flags.DEFINE_string("surrogate_hparams_set", None,
                    "Surrogate model's hyperparameter set.")
flags.DEFINE_string("surrogate_output_dir", None,
                    "Directory storing surrogate model's weights.")
flags.DEFINE_boolean(
    "ignore_incorrect", False, "Ignore examples that are "
    "incorrectly classified to begin with.")


def create_attack_params():
  return registry.attack_params(FLAGS.attack_params_set)


def create_attack(attack):
  return registry.attack(attack)


def create_surrogate_hparams():
  return trainer_lib.create_hparams(FLAGS.surrogate_hparams_set, None)


def create_surrogate_run_config(hp):
  """Create a run config.

  Args:
    hp: model hyperparameters
  Returns:
    a run config
  """
  save_ckpt_steps = max(FLAGS.iterations_per_loop, FLAGS.local_eval_frequency)
  save_ckpt_secs = FLAGS.save_checkpoints_secs or None
  if save_ckpt_secs:
    save_ckpt_steps = None
  assert FLAGS.surrogate_output_dir
  # the various custom getters we have written do not play well together yet.
  # TODO(noam): ask rsepassi for help here.
  daisy_chain_variables = (
      hp.daisy_chain_variables and hp.activation_dtype == "float32" and
      hp.weight_dtype == "float32")
  return trainer_lib.create_run_config(
      model_name=FLAGS.model,
      model_dir=os.path.expanduser(FLAGS.surrogate_output_dir),
      master=FLAGS.master,
      iterations_per_loop=FLAGS.iterations_per_loop,
      num_shards=FLAGS.tpu_num_shards,
      log_device_placement=FLAGS.log_device_placement,
      save_checkpoints_steps=save_ckpt_steps,
      save_checkpoints_secs=save_ckpt_secs,
      keep_checkpoint_max=FLAGS.keep_checkpoint_max,
      keep_checkpoint_every_n_hours=FLAGS.keep_checkpoint_every_n_hours,
      num_gpus=FLAGS.worker_gpu,
      gpu_order=FLAGS.gpu_order,
      num_async_replicas=FLAGS.worker_replicas,
      gpu_mem_fraction=FLAGS.worker_gpu_memory_fraction,
      enable_graph_rewriter=FLAGS.enable_graph_rewriter,
      use_tpu=FLAGS.use_tpu,
      schedule=FLAGS.schedule,
      no_data_parallelism=hp.no_data_parallelism,
      daisy_chain_variables=daisy_chain_variables,
      ps_replicas=FLAGS.ps_replicas,
      ps_job=FLAGS.ps_job,
      ps_gpu=FLAGS.ps_gpu,
      sync=FLAGS.sync,
      worker_id=FLAGS.worker_id,
      worker_job=FLAGS.worker_job,
      random_seed=FLAGS.random_seed,
      tpu_infeed_sleep_secs=FLAGS.tpu_infeed_sleep_secs,
      inter_op_parallelism_threads=FLAGS.inter_op_parallelism_threads,
      log_step_count_steps=FLAGS.log_step_count_steps,
      intra_op_parallelism_threads=FLAGS.intra_op_parallelism_threads)


def prepare_data(problem, hparams, params, config):
  """Construct input pipeline."""
  input_fn = problem.make_estimator_input_fn(
      tf_estimator.ModeKeys.EVAL, hparams, force_repeat=True)
  dataset = input_fn(params, config)
  features, _ = dataset.make_one_shot_iterator().get_next()
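  # The "_rev" problem used in main() swaps inputs and targets, so the images
  # arrive under "targets" and the class labels under "inputs".
  inputs, labels = features["targets"], features["inputs"]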
  inputs = tf.to_float(inputs)
  input_shape = inputs.shape.as_list()
  inputs = tf.reshape(inputs, [hparams.batch_size] + input_shape[1:])
  labels = tf.reshape(labels, [hparams.batch_size])
  return inputs, labels, features


def main(argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)
  t2t_trainer.maybe_log_registry_and_exit()

  if FLAGS.cloud_mlengine:
    cloud_mlengine.launch()
    return

  if FLAGS.generate_data:
    t2t_trainer.generate_data()

  if cloud_mlengine.job_dir():
    FLAGS.output_dir = cloud_mlengine.job_dir()

  if argv:
    t2t_trainer.set_hparams_from_args(argv[1:])

  if FLAGS.surrogate_attack:
    tf.logging.warn("Performing surrogate model attack.")
    sur_hparams = create_surrogate_hparams()
    trainer_lib.add_problem_hparams(sur_hparams, FLAGS.problem)

  hparams = t2t_trainer.create_hparams()
  trainer_lib.add_problem_hparams(hparams, FLAGS.problem)

  attack_params = create_attack_params()
  attack_params.add_hparam(attack_params.epsilon_name, 0.0)

  if FLAGS.surrogate_attack:
    sur_config = create_surrogate_run_config(sur_hparams)
  config = t2t_trainer.create_run_config(hparams)
  params = {
      "batch_size": hparams.batch_size,
      "use_tpu": FLAGS.use_tpu,
  }

  # add "_rev" as a hack to avoid image standardization
  problem = registry.problem(FLAGS.problem + "_rev")

  inputs, labels, features = prepare_data(problem, hparams, params, config)

  sess = tf.Session()

  if FLAGS.surrogate_attack:
    sur_model_fn = t2t_model.T2TModel.make_estimator_model_fn(
        FLAGS.surrogate_model, sur_hparams, use_tpu=FLAGS.use_tpu)
    sur_ch_model = adv_attack_utils.T2TAttackModel(
        sur_model_fn, features, params, sur_config, scope="surrogate")
    # Dummy call to construct graph
    sur_ch_model.get_probs(inputs)

    checkpoint_path = os.path.expanduser(FLAGS.surrogate_output_dir)
    tf.train.init_from_checkpoint(
        tf.train.latest_checkpoint(checkpoint_path), {"/": "surrogate/"})
    sess.run(tf.global_variables_initializer())

  other_vars = set(tf.global_variables())

  model_fn = t2t_model.T2TModel.make_estimator_model_fn(
      FLAGS.model, hparams)
  ch_model = adv_attack_utils.T2TAttackModel(model_fn, features, params, config)

  acc_mask = None
  probs = ch_model.get_probs(inputs)
  if FLAGS.ignore_incorrect:
    preds = tf.argmax(probs, -1, output_type=labels.dtype)
    preds = tf.reshape(preds, labels.shape)
    acc_mask = tf.to_float(tf.equal(labels, preds))
  one_hot_labels = tf.one_hot(labels, probs.shape[-1])

  if FLAGS.surrogate_attack:
    attack = create_attack(attack_params.attack)(sur_ch_model, sess=sess)
  else:
    attack = create_attack(attack_params.attack)(ch_model, sess=sess)

  new_vars = set(tf.global_variables()) - other_vars

  # Restore weights
  saver = tf.train.Saver(new_vars)
  checkpoint_path = os.path.expanduser(FLAGS.output_dir)
  saver.restore(sess, tf.train.latest_checkpoint(checkpoint_path))

  # reuse variables
  tf.get_variable_scope().reuse_variables()

  def compute_accuracy(x, l, mask):
    """Compute model accuracy."""
    preds = ch_model.get_probs(x)
    preds = tf.squeeze(preds)
    preds = tf.argmax(preds, -1, output_type=l.dtype)

    _, acc_update_op = tf.metrics.accuracy(l, preds, weights=mask)

    if FLAGS.surrogate_attack:
      preds = sur_ch_model.get_probs(x)
      preds = tf.squeeze(preds)
      preds = tf.argmax(preds, -1, output_type=l.dtype)
      acc_update_op = tf.tuple((acc_update_op,
                                tf.metrics.accuracy(l, preds, weights=mask)[1]))

    sess.run(tf.local_variables_initializer())
    for i in range(FLAGS.eval_steps):
      tf.logging.info(
          "\tEvaluating batch [%d / %d]" % (i + 1, FLAGS.eval_steps))
      acc = sess.run(acc_update_op)
    if FLAGS.surrogate_attack:
      tf.logging.info("\tFinal acc: (%.4f, %.4f)" % (acc[0], acc[1]))
    else:
      tf.logging.info("\tFinal acc: %.4f" % acc)
    return acc

  epsilon_acc_pairs = []
  for epsilon in attack_params.attack_epsilons:
    tf.logging.info("Attacking @ eps=%.4f" % epsilon)
    attack_params.set_hparam(attack_params.epsilon_name, epsilon)
    adv_x = attack.generate(inputs, y=one_hot_labels, **attack_params.values())
    acc = compute_accuracy(adv_x, labels, acc_mask)
    epsilon_acc_pairs.append((epsilon, acc))

  for epsilon, acc in epsilon_acc_pairs:
    if FLAGS.surrogate_attack:
      tf.logging.info(
          "Accuracy @ eps=%.4f: (%.4f, %.4f)" % (epsilon, acc[0], acc[1]))
    else:
      tf.logging.info("Accuracy @ eps=%.4f: %.4f" % (epsilon, acc))


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_avg_all.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Script to continuously average last N checkpoints in a given directory."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from collections import deque
import os
import shutil
import numpy as np
import six
from six.moves import zip  # pylint: disable=redefined-builtin
from tensor2tensor.utils import bleu_hook
import tensorflow.compat.v1 as tf

flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_string("model_dir", "",
                    "Directory to load model checkpoints from.")
flags.DEFINE_string("output_dir", "avg/",
                    "Directory to output the averaged checkpoints to.")
flags.DEFINE_integer("n", 8, "How many checkpoints should be averaged?")
flags.DEFINE_integer("min_steps", 0, "Ignore checkpoints with fewer steps.")
flags.DEFINE_integer("wait_minutes", 0,
                     "Wait up to N minutes for a new checkpoint.")


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)

  model_dir = os.path.expanduser(FLAGS.model_dir)
  output_dir = os.path.expanduser(FLAGS.output_dir)
  out_base_file = os.path.join(output_dir, "model.ckpt")

  # Copy flags.txt with the original time, so t2t-bleu can report correct
  # relative time.
  tf.gfile.MakeDirs(output_dir)
  if (not os.path.exists(os.path.join(output_dir, "flags.txt")) and
      os.path.exists(os.path.join(model_dir, "flags.txt"))):
    shutil.copy2(os.path.join(model_dir, "flags.txt"),
                 os.path.join(output_dir, "flags.txt"))

  models_processed = 0
  queue = deque()
  for model in bleu_hook.stepfiles_iterator(model_dir, FLAGS.wait_minutes,
                                            FLAGS.min_steps):
    if models_processed == 0:
      var_list = tf.train.list_variables(model.filename)
      avg_values = {}
      for (name, shape) in var_list:
        if not (name.startswith("global_step") or
                name.startswith("train_stats/")):
          avg_values[name] = np.zeros(shape)
    models_processed += 1

    tf.logging.info("Loading [%d]: %s" % (models_processed, model.filename))
    reader = tf.train.load_checkpoint(model.filename)
    for name in avg_values:
      avg_values[name] += reader.get_tensor(name) / FLAGS.n
    queue.append(model)
    if len(queue) < FLAGS.n:
      continue

    out_file = "%s-%d" % (out_base_file, model.steps)
    tf_vars = []
    tf.logging.info("Averaging %s" % (out_file))
    for (name, value) in six.iteritems(avg_values):
      # TODO(martinpopel): dtype=var_dtypes[name]
      tf_vars.append(tf.get_variable(name, shape=value.shape))
    placeholders = [tf.placeholder(v.dtype, shape=v.shape) for v in tf_vars]
    assign_ops = [tf.assign(v, p) for (v, p) in zip(tf_vars, placeholders)]

    global_step = tf.get_variable(
        "global_step",
        initializer=tf.constant(model.steps, dtype=tf.int64),
        trainable=False)
    with tf.variable_scope("train_stats"):
      tf.get_variable("problem_0_steps", initializer=0, trainable=False)
    saver = tf.train.Saver(tf.global_variables())

    tf.logging.info("Running session for %s" % (out_file))
    with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      for p, assign_op, (name, value) in zip(
          placeholders, assign_ops, six.iteritems(avg_values)):
        sess.run(assign_op, {p: value})
      tf.logging.info("Storing to %s" % out_file)
      saver.save(sess, out_base_file, global_step=global_step)
    os.utime(out_file + ".index", (model.mtime, model.mtime))

    tf.reset_default_graph()
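    # Slide the window: drop the oldest checkpoint and subtract its
    # contribution, leaving the scaled sum of the remaining n - 1 checkpoints.
    first_model = queue.popleft()

    reader = tf.train.load_checkpoint(first_model.filename)
    for name in avg_values:
      avg_values[name] -= reader.get_tensor(name) / FLAGS.n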

if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_bleu.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Evaluate BLEU score for all checkpoints/translations in a given directory.

This script can be used in two ways.


To evaluate one already translated file:

```
t2t-bleu --translation=my-wmt13.de --reference=wmt13_deen.de
```

To evaluate all translations in a given directory (translated by
`t2t-translate-all`):

```
t2t-bleu
  --translations_dir=my-translations
  --reference=wmt13_deen.de
  --event_dir=events
```

In addition to the above-mentioned required parameters,
there are optional parameters:
 * bleu_variant: cased (case-sensitive), uncased, both (default).
 * tag_suffix: Default="", so the tags will be BLEU_cased and BLEU_uncased.
   tag_suffix can be used e.g. for different beam sizes if these should be
   plotted in different graphs.
 * min_steps: Don't evaluate checkpoints with fewer steps.
   Default=-1 means check the `last_evaluated_step.txt` file, which contains
   the number of steps of the last successfully evaluated checkpoint.
 * report_zero: Store BLEU=0 and guess its time based on the oldest file in the
   translations_dir. Default=True. This is useful, so TensorBoard reports
   correct relative time for the remaining checkpoints. This flag is set to
   False if min_steps is > 0.
 * wait_minutes: Wait up to N minutes for a new translated file. Default=0.
   This is useful for continuous evaluation of a running training, in which case
   this should be equal to save_checkpoints_secs/60 plus time needed for
   translation plus some reserve.
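
As a rough illustration of the wait_minutes rule of thumb above (a sketch only;
`suggest_wait_minutes` and its arguments are hypothetical and not part of
t2t-bleu):

```python
import math


def suggest_wait_minutes(save_checkpoints_secs, translation_minutes,
                         reserve_minutes=5):
  # Checkpoint interval in minutes, plus translation time, plus some reserve.
  total = save_checkpoints_secs / 60.0 + translation_minutes + reserve_minutes
  return int(math.ceil(total))
```

For example, with checkpoints every 1800 seconds and roughly 10 minutes per
translation run, this suggests `--wait_minutes=45`.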
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time
from tensor2tensor.utils import bleu_hook
import tensorflow.compat.v1 as tf


flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_string("source", None,
                    "Path to the source-language file to be translated")
flags.DEFINE_string("reference", None, "Path to the reference translation file")
flags.DEFINE_string("translation", None,
                    "Path to the MT system translation file")
flags.DEFINE_string("translations_dir", None,
                    "Directory with translated files to be evaluated.")
flags.DEFINE_string("event_dir", None, "Where to store the event file.")

flags.DEFINE_string("bleu_variant", "both",
                    "Possible values: cased(case-sensitive), uncased, "
                    "both(default).")
flags.DEFINE_string("tag_suffix", "",
                    "What to add to BLEU_cased and BLEU_uncased tags.")
flags.DEFINE_integer("min_steps", -1,
                     "Don't evaluate checkpoints with fewer steps.")
flags.DEFINE_integer("wait_minutes", 0,
                     "Wait up to N minutes for a new checkpoint, cf. "
                     "save_checkpoints_secs.")
flags.DEFINE_bool("report_zero", None,
                  "Store BLEU=0 and guess its time based on the oldest file.")


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)
  if FLAGS.translation:
    if FLAGS.translations_dir:
      raise ValueError(
          "Cannot specify both --translation and --translations_dir.")
    if FLAGS.bleu_variant in ("uncased", "both"):
      bleu = 100 * bleu_hook.bleu_wrapper(FLAGS.reference, FLAGS.translation,
                                          case_sensitive=False)
      print("BLEU_uncased = %6.2f" % bleu)
    if FLAGS.bleu_variant in ("cased", "both"):
      bleu = 100 * bleu_hook.bleu_wrapper(FLAGS.reference, FLAGS.translation,
                                          case_sensitive=True)
      print("BLEU_cased = %6.2f" % bleu)
    return

  if not FLAGS.translations_dir:
    raise ValueError(
        "Either --translation or --translations_dir must be specified.")
  transl_dir = os.path.expanduser(FLAGS.translations_dir)
  if not os.path.exists(transl_dir):
    exit_time = time.time() + FLAGS.wait_minutes * 60
    tf.logging.info("Translation dir %s does not exist, waiting till %s.",
                    transl_dir, time.asctime(time.localtime(exit_time)))
    while not os.path.exists(transl_dir):
      time.sleep(10)
      if time.time() > exit_time:
        raise ValueError("Translation dir %s does not exist" % transl_dir)

  last_step_file = os.path.join(FLAGS.event_dir, "last_evaluated_step.txt")
  if FLAGS.min_steps == -1:
    if tf.gfile.Exists(last_step_file):
      with open(last_step_file) as ls_file:
        FLAGS.min_steps = int(ls_file.read())
    else:
      FLAGS.min_steps = 0
  if FLAGS.report_zero is None:
    FLAGS.report_zero = FLAGS.min_steps == 0

  writer = tf.summary.FileWriter(FLAGS.event_dir)
  for transl_file in bleu_hook.stepfiles_iterator(
      transl_dir, FLAGS.wait_minutes, FLAGS.min_steps, path_suffix=""):
    # report_zero handling must be inside the for-loop,
    # so we are sure the transl_dir is already created.
    if FLAGS.report_zero:
      all_files = (os.path.join(transl_dir, f) for f in os.listdir(transl_dir))
      start_time = min(
          os.path.getmtime(f) for f in all_files if os.path.isfile(f))
      values = []
      if FLAGS.bleu_variant in ("uncased", "both"):
        values.append(tf.Summary.Value(
            tag="BLEU_uncased" + FLAGS.tag_suffix, simple_value=0))
      if FLAGS.bleu_variant in ("cased", "both"):
        values.append(tf.Summary.Value(
            tag="BLEU_cased" + FLAGS.tag_suffix, simple_value=0))
      writer.add_event(tf.summary.Event(summary=tf.Summary(value=values),
                                        wall_time=start_time, step=0))
      FLAGS.report_zero = False

    filename = transl_file.filename
    tf.logging.info("Evaluating " + filename)
    values = []
    if FLAGS.bleu_variant in ("uncased", "both"):
      bleu = 100 * bleu_hook.bleu_wrapper(FLAGS.reference, filename,
                                          case_sensitive=False)
      values.append(tf.Summary.Value(tag="BLEU_uncased" + FLAGS.tag_suffix,
                                     simple_value=bleu))
      tf.logging.info("%s: BLEU_uncased = %6.2f" % (filename, bleu))
    if FLAGS.bleu_variant in ("cased", "both"):
      bleu = 100 * bleu_hook.bleu_wrapper(FLAGS.reference, filename,
                                          case_sensitive=True)
      values.append(tf.Summary.Value(tag="BLEU_cased" + FLAGS.tag_suffix,
                                     simple_value=bleu))
      tf.logging.info("%s: BLEU_cased = %6.2f" % (transl_file.filename, bleu))
    writer.add_event(tf.summary.Event(
        summary=tf.Summary(value=values),
        wall_time=transl_file.mtime, step=transl_file.steps))
    writer.flush()
    with open(last_step_file, "w") as ls_file:
      ls_file.write(str(transl_file.steps) + "\n")


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_datagen.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Produces the training and dev data for --problem into --data_dir.

Produces sharded and shuffled TFRecord files of tensorflow.Example protocol
buffers for a variety of registered datasets.

All Problems are registered with @registry.register_problem or are in
_SUPPORTED_PROBLEM_GENERATORS in this file. Each entry maps a string name
(selectable on the command-line with --problem) to a function that takes 2
arguments - input_directory and mode (one of "train" or "dev") - and yields for
each training example a dictionary mapping string feature names to lists of
{string, int, float}. The generator will be run once for each mode.
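
A minimal sketch of a generator that follows the contract described above
(illustrative only; `toy_copy_generator` and its feature names are hypothetical
and not registered anywhere in this file):

```python
def toy_copy_generator(input_directory, mode):
  # Unused here; a real generator would read raw data from this directory.
  del input_directory
  num_cases = 3 if mode == 'train' else 1
  for i in range(num_cases):
    sequence = [i + 1, i + 2, i + 3]
    # Each example maps string feature names to lists of ints.
    yield {'inputs': sequence, 'targets': list(sequence)}
```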
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import multiprocessing
import os
import random
import tempfile

import numpy as np

from tensor2tensor import problems as problems_lib  # pylint: disable=unused-import
from tensor2tensor.data_generators import generator_utils
from tensor2tensor.envs import env_problem_utils
from tensor2tensor.utils import registry
from tensor2tensor.utils import usr_dir

try:
  # pylint: disable=g-import-not-at-top
  from tensor2tensor.data_generators import algorithmic_math
  from tensor2tensor.data_generators import audio
  from tensor2tensor.data_generators import snli
  from tensor2tensor.data_generators import wsj_parsing
  # pylint: enable=g-import-not-at-top
except ImportError:
  pass

# Importing here to prevent pylint's ungrouped-imports warning.
import tensorflow.compat.v1 as tf  # pylint: disable=g-import-not-at-top

flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_string("data_dir", "", "Data directory.")
flags.DEFINE_string("tmp_dir", "/tmp/t2t_datagen",
                    "Temporary storage directory.")
flags.DEFINE_string("problem", "",
                    "The name of the problem to generate data for.")
flags.DEFINE_string("exclude_problems", "",
                    "Comma-separated list of problems to exclude.")
flags.DEFINE_integer(
    "num_shards", 0, "How many shards to use. Ignored for "
    "registered Problems.")
flags.DEFINE_integer("max_cases", 0,
                     "Maximum number of cases to generate (unbounded if 0).")
flags.DEFINE_integer(
    "env_problem_max_env_steps", 0,
    "Maximum number of steps to take for environment-based problems. "
    "Actions are chosen randomly")
flags.DEFINE_integer(
    "env_problem_batch_size", 0,
    "Number of environments to simulate for environment-based problems.")
flags.DEFINE_bool("only_list", False,
                  "If true, we only list the problems that will be generated.")
flags.DEFINE_integer("random_seed", 429459, "Random seed to use.")
flags.DEFINE_integer("task_id", -1, "For distributed data generation.")
flags.DEFINE_integer("task_id_start", -1, "For distributed data generation.")
flags.DEFINE_integer("task_id_end", -1, "For distributed data generation.")
flags.DEFINE_integer(
    "num_concurrent_processes", None,
    "Applies only to problems for which multiprocess_generate=True.")
flags.DEFINE_string(
    "t2t_usr_dir", "", "Path to a Python module that will be imported. The "
    "__init__.py file should include the necessary imports. "
    "The imported files should contain registrations, "
    "e.g. @registry.register_problem calls, that will then be "
    "available to t2t-datagen.")

# Mapping from problems that we can generate data for to their generators.
# pylint: disable=g-long-lambda
_SUPPORTED_PROBLEM_GENERATORS = {
    "algorithmic_algebra_inverse":
        (lambda: algorithmic_math.algebra_inverse(26, 0, 2, 100000),
         lambda: algorithmic_math.algebra_inverse(26, 3, 3, 10000),
         lambda: None),  # test set
    "parsing_english_ptb8k":
        (lambda: wsj_parsing.parsing_token_generator(
            FLAGS.data_dir, FLAGS.tmp_dir, True, 2**13, 2**9),
         lambda: wsj_parsing.parsing_token_generator(
             FLAGS.data_dir, FLAGS.tmp_dir, False, 2**13, 2**9),
         lambda: None),  # test set
    "parsing_english_ptb16k":
        (lambda: wsj_parsing.parsing_token_generator(
            FLAGS.data_dir, FLAGS.tmp_dir, True, 2**14, 2**9),
         lambda: wsj_parsing.parsing_token_generator(
             FLAGS.data_dir, FLAGS.tmp_dir, False, 2**14, 2**9),
         lambda: None),  # test set
    "inference_snli32k":
        (lambda: snli.snli_token_generator(FLAGS.tmp_dir, True, 2**15),
         lambda: snli.snli_token_generator(FLAGS.tmp_dir, False, 2**15),
         lambda: None),  # test set
    "audio_timit_characters_test": (lambda: audio.timit_generator(
        FLAGS.data_dir, FLAGS.tmp_dir, True, 1718
    ), lambda: audio.timit_generator(FLAGS.data_dir, FLAGS.tmp_dir, False, 626),
                                    lambda: None),  # test set
    "audio_timit_tokens_8k_test": (lambda: audio.timit_generator(
        FLAGS.data_dir,
        FLAGS.tmp_dir,
        True,
        1718,
        vocab_filename="vocab.endefr.%d" % 2**13,
        vocab_size=2**13), lambda: audio.timit_generator(
            FLAGS.data_dir,
            FLAGS.tmp_dir,
            False,
            626,
            vocab_filename="vocab.endefr.%d" % 2**13,
            vocab_size=2**13), lambda: None),  # test set
    "audio_timit_tokens_32k_test": (lambda: audio.timit_generator(
        FLAGS.data_dir,
        FLAGS.tmp_dir,
        True,
        1718,
        vocab_filename="vocab.endefr.%d" % 2**15,
        vocab_size=2**15), lambda: audio.timit_generator(
            FLAGS.data_dir,
            FLAGS.tmp_dir,
            False,
            626,
            vocab_filename="vocab.endefr.%d" % 2**15,
            vocab_size=2**15), lambda: None),  # test set
}

# pylint: enable=g-long-lambda


def set_random_seed():
  """Set the random seed from flag everywhere."""
  tf.set_random_seed(FLAGS.random_seed)
  random.seed(FLAGS.random_seed)
  np.random.seed(FLAGS.random_seed)


def main(_):
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)

  # Calculate the list of problems to generate.
  problems = sorted(
      list(_SUPPORTED_PROBLEM_GENERATORS) + registry.list_base_problems() +
      registry.list_env_problems())
  for exclude in FLAGS.exclude_problems.split(","):
    if exclude:
      problems = [p for p in problems if exclude not in p]
  if FLAGS.problem and FLAGS.problem[-1] == "*":
    problems = [p for p in problems if p.startswith(FLAGS.problem[:-1])]
  elif FLAGS.problem and "," in FLAGS.problem:
    problems = [p for p in problems if p in FLAGS.problem.split(",")]
  elif FLAGS.problem:
    problems = [p for p in problems if p == FLAGS.problem]
  else:
    problems = []

  # Remove TIMIT if paths are not given.
  if not getattr(FLAGS, "timit_paths", None):
    problems = [p for p in problems if "timit" not in p]
  # Remove parsing if paths are not given.
  if not getattr(FLAGS, "parsing_path", None):
    problems = [p for p in problems if "parsing_english_ptb" not in p]

  if not problems:
    problems_str = "\n  * ".join(
        sorted(
            list(_SUPPORTED_PROBLEM_GENERATORS) +
            registry.list_base_problems() + registry.list_env_problems()))
    error_msg = ("You must specify one of the supported problems to "
                 "generate data for:\n  * " + problems_str + "\n")
    error_msg += ("TIMIT and parsing need data_sets specified with "
                  "--timit_paths and --parsing_path.")
    raise ValueError(error_msg)

  if not FLAGS.data_dir:
    FLAGS.data_dir = tempfile.gettempdir()
    tf.logging.warning(
        "It is strongly recommended to specify --data_dir. "
        "Data will be written to default data_dir=%s.", FLAGS.data_dir)
  FLAGS.data_dir = os.path.expanduser(FLAGS.data_dir)
  tf.gfile.MakeDirs(FLAGS.data_dir)

  tf.logging.info("Generating problems:\n%s" %
                  registry.display_list_by_prefix(problems, starting_spaces=4))
  if FLAGS.only_list:
    return
  for problem in problems:
    set_random_seed()

    if problem in _SUPPORTED_PROBLEM_GENERATORS:
      generate_data_for_problem(problem)
    elif problem in registry.list_base_problems():
      generate_data_for_registered_problem(problem)
    elif problem in registry.list_env_problems():
      generate_data_for_env_problem(problem)
    else:
      tf.logging.error("Problem %s is not a supported problem for datagen.",
                       problem)


def generate_data_for_problem(problem):
  """Generate data for a problem in _SUPPORTED_PROBLEM_GENERATORS."""
  training_gen, dev_gen, test_gen = _SUPPORTED_PROBLEM_GENERATORS[problem]

  num_train_shards = FLAGS.num_shards or 10
  tf.logging.info("Generating training data for %s.", problem)
  train_output_files = generator_utils.train_data_filenames(
      problem + generator_utils.UNSHUFFLED_SUFFIX, FLAGS.data_dir,
      num_train_shards)
  generator_utils.generate_files(training_gen(), train_output_files,
                                 FLAGS.max_cases)
  num_dev_shards = int(num_train_shards * 0.1)
  tf.logging.info("Generating development data for %s.", problem)
  dev_output_files = generator_utils.dev_data_filenames(
      problem + generator_utils.UNSHUFFLED_SUFFIX, FLAGS.data_dir,
      num_dev_shards)
  generator_utils.generate_files(dev_gen(), dev_output_files)
  num_test_shards = int(num_train_shards * 0.1)
  test_output_files = []
  test_gen_data = test_gen()
  if test_gen_data is not None:
    tf.logging.info("Generating test data for %s.", problem)
    test_output_files = generator_utils.test_data_filenames(
        problem + generator_utils.UNSHUFFLED_SUFFIX, FLAGS.data_dir,
        num_test_shards)
    generator_utils.generate_files(test_gen_data, test_output_files)
  all_output_files = train_output_files + dev_output_files + test_output_files
  generator_utils.shuffle_dataset(all_output_files)


def generate_data_in_process(arg):
  problem_name, data_dir, tmp_dir, task_id = arg
  problem = registry.problem(problem_name)
  problem.generate_data(data_dir, tmp_dir, task_id)


def generate_data_for_env_problem(problem_name):
  """Generate data for `EnvProblem`s."""
  assert FLAGS.env_problem_max_env_steps > 0, ("--env_problem_max_env_steps "
                                               "should be greater than zero")
  assert FLAGS.env_problem_batch_size > 0, ("--env_problem_batch_size should be"
                                            " greater than zero")
  problem = registry.env_problem(problem_name)
  task_id = None if FLAGS.task_id < 0 else FLAGS.task_id
  data_dir = os.path.expanduser(FLAGS.data_dir)
  tmp_dir = os.path.expanduser(FLAGS.tmp_dir)
  # TODO(msaffar): Handle large values for env_problem_batch_size where we
  #  cannot create that many environments within the same process.
  problem.initialize(batch_size=FLAGS.env_problem_batch_size)
  env_problem_utils.play_env_problem_randomly(
      problem, num_steps=FLAGS.env_problem_max_env_steps)
  problem.generate_data(data_dir=data_dir, tmp_dir=tmp_dir, task_id=task_id)


def generate_data_for_registered_problem(problem_name):
  """Generate data for a registered problem."""
  tf.logging.info("Generating data for %s.", problem_name)
  if FLAGS.num_shards:
    raise ValueError("--num_shards should not be set for registered Problem.")
  problem = registry.problem(problem_name)
  task_id = None if FLAGS.task_id < 0 else FLAGS.task_id
  data_dir = os.path.expanduser(FLAGS.data_dir)
  tmp_dir = os.path.expanduser(FLAGS.tmp_dir)
  if task_id is None and problem.multiprocess_generate:
    if FLAGS.task_id_start != -1:
      assert FLAGS.task_id_end != -1
      task_id_start = FLAGS.task_id_start
      task_id_end = FLAGS.task_id_end
    else:
      task_id_start = 0
      task_id_end = problem.num_generate_tasks
    pool = multiprocessing.Pool(processes=FLAGS.num_concurrent_processes)
    problem.prepare_to_generate(data_dir, tmp_dir)
    args = [(problem_name, data_dir, tmp_dir, task_id)
            for task_id in range(task_id_start, task_id_end)]
    pool.map(generate_data_in_process, args)
  else:
    problem.generate_data(data_dir, tmp_dir, task_id)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_decoder.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Decode from trained T2T models.

This binary performs inference using the Estimator API.

Example usage to decode from dataset:

  t2t-decoder \
      --data_dir ~/data \
      --problem=algorithmic_identity_binary40 \
      --model=transformer \
      --hparams_set=transformer_base

Set FLAGS.decode_interactive or FLAGS.decode_from_file for alternative decode
sources.
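
When scoring with --score_file, each line holds either a target alone or an
input and a target separated by a single tab. The sketch below illustrates that
line format only. `parse_score_line` is a hypothetical helper that is not
defined in this module, and returning None for a missing input is an assumption
made for the illustration:

```python
def parse_score_line(line):
  # Hypothetical helper mirroring the tab-splitting rule in score_file().
  fields = line.split("\t")
  if len(fields) > 2:
    raise ValueError("Each line must have at most one tab separator.")
  if len(fields) == 1:
    # Targets only (assumption: signal the missing input with None).
    return None, fields[0].strip()
  return fields[0].strip(), fields[1].strip()


print(parse_score_line("hello world\thallo welt\n"))
print(parse_score_line("targets only\n"))
```

One score is written per input line, so the output file has as many lines as
the file being scored.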
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
from tensor2tensor.bin import t2t_trainer
from tensor2tensor.data_generators import problem  # pylint: disable=unused-import
from tensor2tensor.data_generators import text_encoder
from tensor2tensor.utils import decoding
from tensor2tensor.utils import registry
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir

import tensorflow.compat.v1 as tf
from tensorflow.compat.v1 import estimator as tf_estimator

flags = tf.flags
FLAGS = flags.FLAGS

# Additional flags in bin/t2t_trainer.py and utils/flags.py
flags.DEFINE_string("checkpoint_path", None,
                    "Path to the model checkpoint. Overrides output_dir.")
flags.DEFINE_bool("keep_timestamp", False,
                  "Set the mtime of the decoded file to the "
                  "checkpoint_path+'.index' mtime.")
flags.DEFINE_bool("decode_interactive", False,
                  "Interactive local inference mode.")
flags.DEFINE_integer("decode_shards", 1, "Number of decoding replicas.")
flags.DEFINE_string("score_file", "", "File to score. Each line in the file "
                    "must be in the format input \t target.")
flags.DEFINE_bool("decode_in_memory", False, "Decode in memory.")
flags.DEFINE_bool("disable_grappler_optimizations", False,
                  "Disable Grappler if need be to avoid tensor format errors.")


def create_hparams():
  hparams_path = None
  if FLAGS.output_dir:
    hparams_path = os.path.join(FLAGS.output_dir, "hparams.json")
  return trainer_lib.create_hparams(
      FLAGS.hparams_set,
      FLAGS.hparams,
      data_dir=os.path.expanduser(FLAGS.data_dir),
      problem_name=FLAGS.problem,
      hparams_path=hparams_path)


def create_decode_hparams():
  decode_hp = decoding.decode_hparams(FLAGS.decode_hparams)
  decode_hp.shards = FLAGS.decode_shards
  decode_hp.shard_id = FLAGS.worker_id
  decode_in_memory = FLAGS.decode_in_memory or decode_hp.decode_in_memory
  decode_hp.decode_in_memory = decode_in_memory
  decode_hp.decode_to_file = FLAGS.decode_to_file
  decode_hp.decode_reference = FLAGS.decode_reference
  return decode_hp


def decode(estimator, hparams, decode_hp):
  """Decode from estimator. Interactive, from file, or from dataset."""
  if FLAGS.decode_interactive:
    if estimator.config.use_tpu:
      raise ValueError("TPU can only decode from dataset.")
    decoding.decode_interactively(estimator, hparams, decode_hp,
                                  checkpoint_path=FLAGS.checkpoint_path)
  elif FLAGS.decode_from_file:
    decoding.decode_from_file(estimator, FLAGS.decode_from_file, hparams,
                              decode_hp, FLAGS.decode_to_file,
                              checkpoint_path=FLAGS.checkpoint_path)
    if FLAGS.checkpoint_path and FLAGS.keep_timestamp:
      ckpt_time = os.path.getmtime(FLAGS.checkpoint_path + ".index")
      os.utime(FLAGS.decode_to_file, (ckpt_time, ckpt_time))
  else:
    decoding.decode_from_dataset(
        estimator,
        FLAGS.problem,
        hparams,
        decode_hp,
        decode_to_file=FLAGS.decode_to_file,
        dataset_split="test" if FLAGS.eval_use_test_set else None,
        checkpoint_path=FLAGS.checkpoint_path)


def score_file(filename):
  """Score each line in a file and return the scores."""
  # Prepare model.
  hparams = create_hparams()
  encoders = registry.problem(FLAGS.problem).feature_encoders(FLAGS.data_dir)
  has_inputs = "inputs" in encoders

  # Prepare features for feeding into the model.
  if has_inputs:
    inputs_ph = tf.placeholder(dtype=tf.int32)  # Just length dimension.
    batch_inputs = tf.reshape(inputs_ph, [1, -1, 1, 1])  # Make it 4D.
  targets_ph = tf.placeholder(dtype=tf.int32)  # Just length dimension.
  batch_targets = tf.reshape(targets_ph, [1, -1, 1, 1])  # Make it 4D.
  if has_inputs:
    features = {"inputs": batch_inputs, "targets": batch_targets}
  else:
    features = {"targets": batch_targets}

  # Prepare the model and the graph when model runs on features.
  model = registry.model(FLAGS.model)(hparams, tf_estimator.ModeKeys.EVAL)
  _, losses = model(features)
  saver = tf.train.Saver()

  with tf.Session() as sess:
    # Load weights from checkpoint.
    if FLAGS.checkpoint_path is None:
      ckpts = tf.train.get_checkpoint_state(FLAGS.output_dir)
      ckpt = ckpts.model_checkpoint_path
    else:
      ckpt = FLAGS.checkpoint_path
    saver.restore(sess, ckpt)
    # Run on each line.
    with tf.gfile.Open(filename) as f:
      lines = f.readlines()
    results = []
    for line in lines:
      tab_split = line.split("\t")
      if len(tab_split) > 2:
        raise ValueError("Each line must have at most one tab separator.")
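      if len(tab_split) == 1:
        if has_inputs:
          # Without this check, `inputs` would be undefined (or stale from the
          # previous line) when the feed is built below.
          raise ValueError("This problem has inputs; each line must be in "
                           "the format input<TAB>target.")
        targets = tab_split[0].strip()
      else:
        targets = tab_split[1].strip()
        inputs = tab_split[0].strip()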
      # Run encoders and append EOS symbol.
      targets_numpy = encoders["targets"].encode(
          targets) + [text_encoder.EOS_ID]
      if has_inputs:
        inputs_numpy = encoders["inputs"].encode(inputs) + [text_encoder.EOS_ID]
      # Prepare the feed.
      if has_inputs:
        feed = {inputs_ph: inputs_numpy, targets_ph: targets_numpy}
      else:
        feed = {targets_ph: targets_numpy}
      # Get the score.
      np_loss = sess.run(losses["training"], feed)
      results.append(np_loss)
  return results


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)


  if FLAGS.score_file:
    filename = os.path.expanduser(FLAGS.score_file)
    if not tf.gfile.Exists(filename):
      raise ValueError("The file to score doesn't exist: %s" % filename)
    # Validate the output flag before running the expensive scoring pass.
    if not FLAGS.decode_to_file:
      raise ValueError("To score a file, specify --decode_to_file for results.")
    results = score_file(filename)
    with tf.gfile.Open(os.path.expanduser(FLAGS.decode_to_file), "w") as f:
      for score in results:
        f.write("%.6f\n" % score)
    return

  hp = create_hparams()
  decode_hp = create_decode_hparams()
  run_config = t2t_trainer.create_run_config(hp)
  if FLAGS.disable_grappler_optimizations:
    run_config.session_config.graph_options.rewrite_options.disable_meta_optimizer = True

  # summary-hook in tf.estimator.EstimatorSpec requires
  # hparams.model_dir to be set.
  hp.add_hparam("model_dir", run_config.model_dir)

  estimator = trainer_lib.create_estimator(
      FLAGS.model,
      hp,
      run_config,
      decode_hparams=decode_hp,
      use_tpu=FLAGS.use_tpu)

  decode(estimator, hp, decode_hp)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_distill.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Perform distillation for a teacher to student.

This script is intended to be used with --model=distillation. See the model for
example hyperparameters and usage.

If only output_dir is specified, then teacher_dir is `output_dir/teacher`, and
the student_dir is `output_dir/student`. Logs are written inside `output_dir`.
If teacher_dir is also specified explicitly, the student_dir is still
`output_dir/student` and the logs are written into `output_dir`. If student_dir
is further specified, the logs are written into student_dir unless output_dir is
explicitly specified, which only contains the logs in this case.
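
The directory fallback described above can be sketched as follows. The helper
name `resolve_distill_dirs` is hypothetical and not part of this script; it
only restates how main() picks the teacher and student directories from the
flags:

```python
import os


def resolve_distill_dirs(output_dir, teacher_dir=None, student_dir=None):
  # Hypothetical helper: an explicitly given directory wins, otherwise fall
  # back to a subdirectory of output_dir.
  teacher = teacher_dir or os.path.join(output_dir, "teacher")
  student = student_dir or os.path.join(output_dir, "student")
  return teacher, student


print(resolve_distill_dirs("/tmp/run"))
print(resolve_distill_dirs("/tmp/run", teacher_dir="/models/teacher"))
```

Passing only output_dir yields both subdirectories; passing teacher_dir as
well changes only the teacher location.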
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
from tensor2tensor import models  # pylint: disable=unused-import
from tensor2tensor import problems as problems_lib  # pylint: disable=unused-import
from tensor2tensor.bin import t2t_trainer
from tensor2tensor.utils import cloud_mlengine
from tensor2tensor.utils import flags as t2t_flags  # pylint: disable=unused-import
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir

import tensorflow.compat.v1 as tf

flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_bool(
    "skip_teacher_training", False,
    "By default, we train teacher model. If set to True, skip the training.")
flags.DEFINE_string(
    "teacher_dir", None,
    "Directory to teacher network. If not specified, `output_dir/teacher` is "
    "used instead.")
flags.DEFINE_string(
    "student_dir", None,
    "Directory to student network. If not specified, `output_dir/student` is "
    "used instead.")


def main(argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)
  t2t_trainer.maybe_log_registry_and_exit()

  if FLAGS.cloud_mlengine:
    cloud_mlengine.launch()
    return

  if FLAGS.generate_data:
    t2t_trainer.generate_data()

  if cloud_mlengine.job_dir():
    FLAGS.output_dir = cloud_mlengine.job_dir()

  if argv:
    t2t_trainer.set_hparams_from_args(argv[1:])

  root_output_dir = FLAGS.output_dir

  if FLAGS.teacher_dir:
    teacher_dir = FLAGS.teacher_dir
  else:
    teacher_dir = os.path.join(root_output_dir, "teacher")

  # Train Teacher ============
  if FLAGS.skip_teacher_training:
    tf.logging.info("training teacher skipped")
  else:
    hparams = t2t_trainer.create_hparams()
    hparams.distill_phase = "train"
    FLAGS.output_dir = teacher_dir

    exp_fn = t2t_trainer.create_experiment_fn()
    run_config = t2t_trainer.create_run_config(hparams)
    exp = exp_fn(run_config, hparams)
    if t2t_trainer.is_chief():
      t2t_trainer.save_metadata(hparams)
    t2t_trainer.execute_schedule(exp)

  # ==========================
  # Train Student ============
  hparams = t2t_trainer.create_hparams()
  hparams.add_hparam("teacher_dir", teacher_dir)
  hparams.distill_phase = "distill"
  if FLAGS.student_dir:
    student_dir = FLAGS.student_dir
  else:
    student_dir = os.path.join(root_output_dir, "student")
  FLAGS.output_dir = student_dir
  hparams.add_hparam("student_dir", student_dir)

  exp_fn = t2t_trainer.create_experiment_fn()
  run_config = t2t_trainer.create_run_config(hparams)
  exp = exp_fn(run_config, hparams)

  if t2t_trainer.is_chief():
    t2t_trainer.save_metadata(hparams)
  t2t_trainer.execute_schedule(exp)
  # ==========================


def create_teacher_experiment(run_config, hparams, argv):
  """Creates experiment function."""
  tf.logging.info("training teacher")
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)
  t2t_trainer.maybe_log_registry_and_exit()

  if FLAGS.cloud_mlengine:
    return cloud_mlengine.launch()

  if FLAGS.generate_data:
    t2t_trainer.generate_data()

  if cloud_mlengine.job_dir():
    FLAGS.output_dir = cloud_mlengine.job_dir()

  if argv:
    t2t_trainer.set_hparams_from_args(argv[1:])

  hparams.distill_phase = "train"
  exp_fn = t2t_trainer.create_experiment_fn()
  exp = exp_fn(run_config, hparams)
  return exp


def create_student_experiment(run_config, hparams, argv):
  """Creates experiment function."""
  tf.logging.info("training student")
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)
  t2t_trainer.maybe_log_registry_and_exit()

  if FLAGS.cloud_mlengine:
    return cloud_mlengine.launch()

  if FLAGS.generate_data:
    t2t_trainer.generate_data()

  if cloud_mlengine.job_dir():
    FLAGS.output_dir = cloud_mlengine.job_dir()

  if argv:
    t2t_trainer.set_hparams_from_args(argv[1:])

  hparams.add_hparam("teacher_dir", FLAGS.teacher_dir)
  hparams.add_hparam("student_dir", FLAGS.student_dir)
  hparams.distill_phase = "distill"
  exp_fn = t2t_trainer.create_experiment_fn()
  exp = exp_fn(run_config, hparams)
  return exp


def create_experiment_fn(argv, train_teacher):

  def teacher_experiment_fn(run_config, hparams):
    return create_teacher_experiment(run_config, hparams, argv)

  def student_experiment_fn(run_config, hparams):
    return create_student_experiment(run_config, hparams, argv)

  return teacher_experiment_fn if train_teacher else student_experiment_fn


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_eval.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Perform evaluation on trained T2T models using the Estimator API."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensor2tensor.bin import t2t_trainer          # pylint: disable=unused-import
from tensor2tensor.data_generators import problem  # pylint: disable=unused-import
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir
import tensorflow.compat.v1 as tf
from tensorflow.compat.v1 import estimator as tf_estimator

flags = tf.flags
FLAGS = flags.FLAGS


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)

  hparams = trainer_lib.create_hparams(
      FLAGS.hparams_set, FLAGS.hparams, data_dir=FLAGS.data_dir,
      problem_name=FLAGS.problem)

  # Evaluate on the test split if --eval_use_test_set is set; otherwise leave
  # the split unset (None).
  dataset_split = "test" if FLAGS.eval_use_test_set else None
  dataset_kwargs = {"dataset_split": dataset_split}
  eval_input_fn = hparams.problem.make_estimator_input_fn(
      tf_estimator.ModeKeys.EVAL, hparams, dataset_kwargs=dataset_kwargs)
  config = t2t_trainer.create_run_config(hparams)

  # summary-hook in tf.estimator.EstimatorSpec requires
  # hparams.model_dir to be set.
  hparams.add_hparam("model_dir", config.model_dir)

  estimator = trainer_lib.create_estimator(
      FLAGS.model, hparams, config, use_tpu=FLAGS.use_tpu)
  ckpt_iter = trainer_lib.next_checkpoint(
      hparams.model_dir, FLAGS.eval_timeout_mins)
  for ckpt_path in ckpt_iter:
    predictions = estimator.evaluate(
        eval_input_fn, steps=FLAGS.eval_steps, checkpoint_path=ckpt_path)
    tf.logging.info(predictions)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_prune.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""Prune T2TModels using some heuristic.

This supports a very common form of pruning known as magnitude-based pruning.
It ranks individual weights or units according to their magnitudes and zeros
out the smallest k% of weights, effectively removing them from the graph.

Example run:
- train a resnet on cifar10:
    bin/t2t_trainer.py --problem=image_cifar10 --hparams_set=resnet_cifar_32 \
      --model=resnet

- evaluate different pruning percentages using weight-level pruning:
    bin/t2t_prune.py --pruning_params_set=resnet_weight --problem=image_cifar10\
      --hparams_set=resnet_cifar_32 --model=resnet
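
As a rough illustration of weight-level magnitude pruning, the sketch below
zeros the given fraction of smallest-magnitude entries in a flat Python list.
The function name `magnitude_prune` and the list-based representation are
assumptions for this example; it does not describe how pruning_utils.sparsify
is implemented:

```python
def magnitude_prune(weights, sparsity):
  # Zero out the `sparsity` fraction of weights with the smallest magnitude.
  # Illustrative only: works on a flat Python list, not on TF variables.
  num_to_prune = int(len(weights) * sparsity)
  # Indices ordered by absolute value, smallest first.
  order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
  pruned = set(order[:num_to_prune])
  return [0.0 if i in pruned else w for i, w in enumerate(weights)]


print(magnitude_prune([0.5, -0.1, 2.0, 0.05], sparsity=0.5))
# prints [0.5, 0.0, 2.0, 0.0]
```

Evaluating accuracy after pruning at several sparsity levels shows how much
of the network can be zeroed before quality degrades.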
"""

import os

from tensor2tensor.bin import t2t_trainer
from tensor2tensor.data_generators import problem as problem_lib  # pylint: disable=unused-import
from tensor2tensor.utils import pruning_utils
from tensor2tensor.utils import registry
from tensor2tensor.utils import t2t_model
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir

import tensorflow.compat.v1 as tf
from tensorflow.compat.v1 import estimator as tf_estimator

flags = tf.flags
FLAGS = flags.FLAGS

# See flags.py for additional command-line flags.
flags.DEFINE_string("pruning_params_set", None,
                    "Which pruning parameters to use.")


def create_pruning_params():
  return registry.pruning_params(FLAGS.pruning_params_set)


def create_pruning_strategy(name):
  return registry.pruning_strategy(name)


def main(argv):
  tf.logging.set_verbosity(tf.logging.INFO)
  trainer_lib.set_random_seed(FLAGS.random_seed)
  usr_dir.import_usr_dir(FLAGS.t2t_usr_dir)
  t2t_trainer.maybe_log_registry_and_exit()


  if FLAGS.generate_data:
    t2t_trainer.generate_data()

  if argv:
    t2t_trainer.set_hparams_from_args(argv[1:])
  hparams = t2t_trainer.create_hparams()
  trainer_lib.add_problem_hparams(hparams, FLAGS.problem)
  pruning_params = create_pruning_params()
  pruning_strategy = create_pruning_strategy(pruning_params.strategy)

  config = t2t_trainer.create_run_config(hparams)
  params = {"batch_size": hparams.batch_size}

  # add "_rev" as a hack to avoid image standardization
  problem = registry.problem(FLAGS.problem)
  input_fn = problem.make_estimator_input_fn(tf_estimator.ModeKeys.EVAL,
                                             hparams)
  dataset = input_fn(params, config).repeat()
  features, labels = dataset.make_one_shot_iterator().get_next()

  sess = tf.Session()

  model_fn = t2t_model.T2TModel.make_estimator_model_fn(
      FLAGS.model, hparams, use_tpu=FLAGS.use_tpu)
  spec = model_fn(
      features,
      labels,
      tf_estimator.ModeKeys.EVAL,
      params=hparams,
      config=config)

  # Restore weights
  saver = tf.train.Saver()
  checkpoint_path = os.path.expanduser(FLAGS.output_dir or
                                       FLAGS.checkpoint_path)
  saver.restore(sess, tf.train.latest_checkpoint(checkpoint_path))

  def eval_model():
    preds = spec.predictions["predictions"]
    preds = tf.argmax(preds, -1, output_type=labels.dtype)
    _, acc_update_op = tf.metrics.accuracy(labels=labels, predictions=preds)
    sess.run(tf.initialize_local_variables())
    for _ in range(FLAGS.eval_steps):
      acc = sess.run(acc_update_op)
    return acc

  pruning_utils.sparsify(sess, eval_model, pruning_strategy, pruning_params)


if __name__ == "__main__":
  tf.logging.set_verbosity(tf.logging.INFO)
  tf.app.run()


================================================
FILE: tensor2tensor/bin/t2t_trainer.py
================================================
# coding=utf-8
# Copyright 2023 The Tensor2Tensor Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Train and evaluate."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import contextlib
import os
import sys
from tensor2tensor import models  # pylint: disable=unused-import
from tensor2tensor import problems as problems_lib  # pylint: disable=unused-import
from tensor2tensor.data_generators import problem  # pylint: disable=unused-import

from tensor2tensor.utils import cloud_mlengine
from tensor2tensor.utils import contrib
from tensor2tensor.utils import decoding
from tensor2tensor.utils import flags as t2t_flags  # pylint: disable=unused-import
from tensor2tensor.utils import hparams_lib
from tensor2tensor.utils import mlperf_log
from tensor2tensor.utils import registry
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import usr_dir
import tensorflow.compat.v1 as tf
from tensorflow.compat.v1 import estimator as tf_estimator


flags = tf.flags
FLAGS = flags.FLAGS

# See utils/flags.py for additional command-line flags.
flags.DEFINE_string("t2t_usr_dir", None,
                    "Path to a Python module that will be imported. The "
                    "__init__.py file should inclu

    │   │   ├── index.html
    │   │   ├── insights_app/
    │   │   │   ├── insights-app.html
    │   │   │   └── insights-app.js
    │   │   ├── language_selector/
    │   │   │   ├── language-selector-content.html
    │   │   │   ├── language-selector-content.js
    │   │   │   ├── language-selector.html
    │   │   │   └── language-selector.js
    │   │   ├── processing_visualization/
    │   │   │   ├── processing-visualization.html
    │   │   │   └── processing-visualization.js
    │   │   ├── query_card/
    │   │   │   ├── query-card.html
    │   │   │   └── query-card.js
    │   │   ├── tensor2tensor.html
    │   │   └── translation_result/
    │   │       ├── translation-result.html
    │   │       └── translation-result.js
    │   ├── query_processor.py
    │   ├── server.py
    │   └── transformer_model.py
    ├── layers/
    │   ├── __init__.py
    │   ├── area_attention.py
    │   ├── area_attention_test.py
    │   ├── common_attention.py
    │   ├── common_attention_test.py
    │   ├── common_audio.py
    │   ├── common_hparams.py
    │   ├── common_image_attention.py
    │   ├── common_image_attention_test.py
    │   ├── common_layers.py
    │   ├── common_layers_test.py
    │   ├── common_video.py
    │   ├── common_video_test.py
    │   ├── discretization.py
    │   ├── discretization_test.py
    │   ├── latent_layers.py
    │   ├── latent_layers_test.py
    │   ├── message_passing_attention.py
    │   ├── modalities.py
    │   ├── modalities_test.py
    │   ├── ngram.py
    │   ├── ngram_test.py
    │   ├── transformer_glow_layers.py
    │   ├── transformer_glow_layers_ops.py
    │   ├── transformer_glow_layers_ops_test.py
    │   ├── transformer_glow_layers_test.py
    │   ├── transformer_layers.py
    │   ├── transformer_memory.py
    │   ├── transformer_memory_test.py
    │   ├── vq_discrete.py
    │   └── vqa_layers.py
    ├── metrics/
    │   ├── __init__.py
    │   ├── video_conditional_fvd.py
    │   └── video_conditional_fvd_test.py
    ├── models/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── basic.py
    │   ├── basic_test.py
    │   ├── bytenet.py
    │   ├── bytenet_test.py
    │   ├── distillation.py
    │   ├── evolved_transformer.py
    │   ├── evolved_transformer_test.py
    │   ├── image_transformer.py
    │   ├── image_transformer_2d.py
    │   ├── image_transformer_2d_test.py
    │   ├── image_transformer_test.py
    │   ├── lstm.py
    │   ├── lstm_test.py
    │   ├── mtf_image_transformer.py
    │   ├── mtf_image_transformer_test.py
    │   ├── mtf_resnet.py
    │   ├── mtf_transformer.py
    │   ├── mtf_transformer2.py
    │   ├── mtf_transformer_test.py
    │   ├── neural_architecture_search/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── nas_layers.py
    │   │   ├── nas_layers_test.py
    │   │   ├── nas_model.py
    │   │   └── nas_model_test.py
    │   ├── neural_assistant.py
    │   ├── neural_gpu.py
    │   ├── neural_gpu_test.py
    │   ├── research/
    │   │   ├── __init__.py
    │   │   ├── adafactor_experiments.py
    │   │   ├── aligned.py
    │   │   ├── attention_lm.py
    │   │   ├── attention_lm_moe.py
    │   │   ├── autoencoders.py
    │   │   ├── autoencoders_test.py
    │   │   ├── cycle_gan.py
    │   │   ├── gene_expression.py
    │   │   ├── gene_expression_test.py
    │   │   ├── glow.py
    │   │   ├── glow_init_hook.py
    │   │   ├── glow_ops.py
    │   │   ├── glow_ops_test.py
    │   │   ├── glow_test.py
    │   │   ├── lm_experiments.py
    │   │   ├── moe.py
    │   │   ├── moe_experiments.py
    │   │   ├── multiquery_paper.py
    │   │   ├── neural_stack.py
    │   │   ├── neural_stack_test.py
    │   │   ├── residual_shuffle_exchange.py
    │   │   ├── rl.py
    │   │   ├── shuffle_network.py
    │   │   ├── similarity_transformer.py
    │   │   ├── super_lm.py
    │   │   ├── transformer_aux.py
    │   │   ├── transformer_aux_test.py
    │   │   ├── transformer_moe.py
    │   │   ├── transformer_nat.py
    │   │   ├── transformer_parallel.py
    │   │   ├── transformer_revnet.py
    │   │   ├── transformer_revnet_test.py
    │   │   ├── transformer_seq2edits.py
    │   │   ├── transformer_sketch.py
    │   │   ├── transformer_symshard.py
    │   │   ├── transformer_vae.py
    │   │   ├── transformer_vae_flow_prior.py
    │   │   ├── transformer_vae_flow_prior_ops.py
    │   │   ├── transformer_vae_test.py
    │   │   ├── universal_transformer.py
    │   │   ├── universal_transformer_test.py
    │   │   ├── universal_transformer_util.py
    │   │   ├── vqa_attention.py
    │   │   ├── vqa_attention_test.py
    │   │   ├── vqa_recurrent_self_attention.py
    │   │   └── vqa_self_attention.py
    │   ├── resnet.py
    │   ├── resnet_test.py
    │   ├── revnet.py
    │   ├── revnet_test.py
    │   ├── shake_shake.py
    │   ├── slicenet.py
    │   ├── slicenet_test.py
    │   ├── text_cnn.py
    │   ├── transformer.py
    │   ├── transformer_test.py
    │   ├── vanilla_gan.py
    │   ├── video/
    │   │   ├── __init__.py
    │   │   ├── base.py
    │   │   ├── base_vae.py
    │   │   ├── basic_deterministic.py
    │   │   ├── basic_deterministic_params.py
    │   │   ├── basic_deterministic_test.py
    │   │   ├── basic_recurrent.py
    │   │   ├── basic_recurrent_test.py
    │   │   ├── basic_stochastic.py
    │   │   ├── basic_stochastic_test.py
    │   │   ├── emily.py
    │   │   ├── emily_test.py
    │   │   ├── epva.py
    │   │   ├── epva_params.py
    │   │   ├── next_frame_glow.py
    │   │   ├── nfg_conv3d_test.py
    │   │   ├── nfg_conv_lstm_test.py
    │   │   ├── nfg_conv_test.py
    │   │   ├── nfg_interpolate.py
    │   │   ├── nfg_test_utils.py
    │   │   ├── nfg_uncond_test.py
    │   │   ├── savp.py
    │   │   ├── savp_params.py
    │   │   ├── savp_test.py
    │   │   ├── sv2p.py
    │   │   ├── sv2p_params.py
    │   │   ├── sv2p_test.py
    │   │   └── tests_utils.py
    │   ├── xception.py
    │   └── xception_test.py
    ├── notebooks/
    │   ├── Transformer_translate.ipynb
    │   ├── asr_transformer.ipynb
    │   ├── hello_t2t-rl.ipynb
    │   ├── hello_t2t.ipynb
    │   └── t2t_problem.ipynb
    ├── problems.py
    ├── problems_colab.py
    ├── problems_test.py
    ├── rl/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── batch_dqn_agent_test.py
    │   ├── batch_runner_test.py
    │   ├── datagen_with_agent.py
    │   ├── dopamine_connector.py
    │   ├── envs/
    │   │   ├── __init__.py
    │   │   ├── in_graph_batch_env.py
    │   │   ├── py_func_batch_env.py
    │   │   ├── simulated_batch_env.py
    │   │   ├── simulated_batch_gym_env.py
    │   │   └── tf_atari_wrappers.py
    │   ├── evaluator.py
    │   ├── evaluator_test.py
    │   ├── gym_utils.py
    │   ├── gym_utils_test.py
    │   ├── player.py
    │   ├── player_utils.py
    │   ├── policy_learner.py
    │   ├── ppo.py
    │   ├── ppo_learner.py
    │   ├── restarter.py
    │   ├── restarter_test.py
    │   ├── rl_utils.py
    │   ├── trainer_model_based.py
    │   ├── trainer_model_based_agent_only.py
    │   ├── trainer_model_based_params.py
    │   ├── trainer_model_based_recurrent_test.py
    │   ├── trainer_model_based_stochastic_test.py
    │   ├── trainer_model_based_sv2p_test.py
    │   ├── trainer_model_based_test.py
    │   ├── trainer_model_free.py
    │   ├── trainer_model_free_test.py
    │   └── trainer_model_free_tictactoe_test.py
    ├── serving/
    │   ├── README.md
    │   ├── __init__.py
    │   ├── export.py
    │   ├── query.py
    │   └── serving_utils.py
    ├── test_data/
    │   ├── example_usr_dir/
    │   │   ├── __init__.py
    │   │   ├── my_submodule.py
    │   │   └── requirements.txt
    │   ├── transformer_test_ckpt/
    │   │   ├── checkpoint
    │   │   ├── flags.txt
    │   │   ├── hparams.json
    │   │   ├── model.ckpt-1.data-00000-of-00002
    │   │   ├── model.ckpt-1.data-00001-of-00002
    │   │   ├── model.ckpt-1.index
    │   │   └── model.ckpt-1.meta
    │   ├── vocab.translate_ende_wmt32k.32768.subwords
    │   └── vocab.translate_ende_wmt8k.8192.subwords
    ├── utils/
    │   ├── __init__.py
    │   ├── adafactor.py
    │   ├── adafactor_test.py
    │   ├── adv_attack_utils.py
    │   ├── avg_checkpoints.py
    │   ├── beam_search.py
    │   ├── beam_search_test.py
    │   ├── bleu_hook.py
    │   ├── bleu_hook_test.py
    │   ├── checkpoint_compatibility_test.py
    │   ├── cloud_mlengine.py
    │   ├── compute_video_metrics.py
    │   ├── contrib.py
    │   ├── data_reader.py
    │   ├── data_reader_test.py
    │   ├── decoding.py
    │   ├── devices.py
    │   ├── diet.py
    │   ├── diet_test.py
    │   ├── expert_utils.py
    │   ├── expert_utils_test.py
    │   ├── flags.py
    │   ├── get_cnndm_rouge.sh
    │   ├── get_ende_bleu.sh
    │   ├── get_rouge.py
    │   ├── hparam.py
    │   ├── hparam_test.py
    │   ├── hparams_lib.py
    │   ├── hparams_lib_test.py
    │   ├── learning_rate.py
    │   ├── metrics.py
    │   ├── metrics_hook.py
    │   ├── metrics_hook_test.py
    │   ├── metrics_test.py
    │   ├── misc_utils.py
    │   ├── misc_utils_test.py
    │   ├── mlperf_log.py
    │   ├── mlperf_tags.py
    │   ├── mtf_model.py
    │   ├── multistep_optimizer.py
    │   ├── multistep_optimizer_test.py
    │   ├── multistep_with_adamoptimizer.py
    │   ├── multistep_with_adamoptimizer_test.py
    │   ├── optimize.py
    │   ├── optimize_test.py
    │   ├── partial_checkpoint_load_hook.py
    │   ├── pruning_utils.py
    │   ├── quantization.py
    │   ├── registry.py
    │   ├── registry_test.py
    │   ├── restore_hook.py
    │   ├── rouge.py
    │   ├── rouge_test.py
    │   ├── sari_hook.py
    │   ├── sari_hook_test.py
    │   ├── scheduled_sampling.py
    │   ├── t2t_model.py
    │   ├── t2t_model_test.py
    │   ├── test_utils.py
    │   ├── test_utils_test.py
    │   ├── trainer_lib.py
    │   ├── trainer_lib_test.py
    │   ├── update_ops_hook.py
    │   ├── usr_dir.py
    │   ├── video/
    │   │   ├── prediction2gif.py
    │   │   └── reward_confusion.py
    │   ├── video2gif.py
    │   ├── video_metrics.py
    │   ├── video_metrics_test.py
    │   ├── yellowfin.py
    │   └── yellowfin_test.py
    └── visualization/
        ├── TransformerVisualization.ipynb
        ├── __init__.py
        ├── attention.js
        ├── attention.py
        ├── visualization.py
        └── visualization_test.py

SYMBOL INDEX (7112 symbols across 448 files)

FILE: tensor2tensor/bin/build_vocab.py
  function main (line 50) | def main(_):

FILE: tensor2tensor/bin/make_tf_configs.py
  function main (line 42) | def main(_):

FILE: tensor2tensor/bin/t2t_attack.py
  function create_attack_params (line 72) | def create_attack_params():
  function create_attack (line 76) | def create_attack(attack):
  function create_surrogate_hparams (line 80) | def create_surrogate_hparams():
  function create_surrogate_run_config (line 84) | def create_surrogate_run_config(hp):
  function prepare_data (line 135) | def prepare_data(problem, hparams, params, config):
  function main (line 149) | def main(argv):

FILE: tensor2tensor/bin/t2t_avg_all.py
  function main (line 43) | def main(_):

FILE: tensor2tensor/bin/t2t_bleu.py
  function main (line 91) | def main(_):

FILE: tensor2tensor/bin/t2t_datagen.py
  function set_random_seed (line 154) | def set_random_seed():
  function main (line 161) | def main(_):
  function generate_data_for_problem (line 224) | def generate_data_for_problem(problem):
  function generate_data_in_process (line 254) | def generate_data_in_process(arg):
  function generate_data_for_env_problem (line 260) | def generate_data_for_env_problem(problem_name):
  function generate_data_for_registered_problem (line 278) | def generate_data_for_registered_problem(problem_name):

FILE: tensor2tensor/bin/t2t_decoder.py
  function create_hparams (line 66) | def create_hparams():
  function create_decode_hparams (line 78) | def create_decode_hparams():
  function decode (line 89) | def decode(estimator, hparams, decode_hp):
  function score_file (line 114) | def score_file(filename):
  function main (line 174) | def main(_):

FILE: tensor2tensor/bin/t2t_distill.py
  function main (line 59) | def main(argv):
  function create_teacher_experiment (line 122) | def create_teacher_experiment(run_config, hparams, argv):
  function create_student_experiment (line 148) | def create_student_experiment(run_config, hparams, argv):
  function create_experiment_fn (line 176) | def create_experiment_fn(argv, train_teacher):

FILE: tensor2tensor/bin/t2t_eval.py
  function main (line 33) | def main(_):

FILE: tensor2tensor/bin/t2t_prune.py
  function create_pruning_params (line 53) | def create_pruning_params():
  function create_pruning_strategy (line 57) | def create_pruning_strategy(name):
  function main (line 61) | def main(argv):

FILE: tensor2tensor/bin/t2t_trainer.py
  function set_hparams_from_args (line 150) | def set_hparams_from_args(args):
  function create_hparams (line 177) | def create_hparams():
  function create_experiment_fn (line 188) | def create_experiment_fn():
  function create_run_config (line 221) | def create_run_config(hp, output_dir=None):
  function generate_data (line 294) | def generate_data():
  function profile_context (line 307) | def profile_context():
  function maybe_log_registry_and_exit (line 318) | def maybe_log_registry_and_exit():
  function is_chief (line 324) | def is_chief():
  function save_metadata (line 329) | def save_metadata(hparams):
  function execute_schedule (line 367) | def execute_schedule(exp):
  function run_std_server (line 375) | def run_std_server():
  function main (line 380) | def main(argv):

FILE: tensor2tensor/bin/t2t_trainer_test.py
  class TrainerTest (line 29) | class TrainerTest(tf.test.TestCase):
    method setUpClass (line 32) | def setUpClass(cls):
    method testTrain (line 35) | def testTrain(self):

FILE: tensor2tensor/bin/t2t_translate_all.py
  function main (line 66) | def main(_):

FILE: tensor2tensor/data_generators/algorithmic.py
  class AlgorithmicProblem (line 34) | class AlgorithmicProblem(problem.Problem):
    method num_symbols (line 38) | def num_symbols(self):
    method generator (line 41) | def generator(self, nbr_symbols, max_length, nbr_cases):
    method train_length (line 46) | def train_length(self):
    method dev_length (line 50) | def dev_length(self):
    method train_size (line 54) | def train_size(self):
    method dev_size (line 58) | def dev_size(self):
    method num_shards (line 62) | def num_shards(self):
    method generate_data (line 65) | def generate_data(self, data_dir, _, task_id=-1):
    method hparams (line 84) | def hparams(self, defaults, unused_model_hparams):
  class AlgorithmicIdentityBinary40 (line 96) | class AlgorithmicIdentityBinary40(AlgorithmicProblem):
    method num_symbols (line 100) | def num_symbols(self):
    method generator (line 103) | def generator(self, nbr_symbols, max_length, nbr_cases):
  class AlgorithmicIdentityDecimal40 (line 126) | class AlgorithmicIdentityDecimal40(AlgorithmicIdentityBinary40):
    method num_symbols (line 130) | def num_symbols(self):
  class AlgorithmicIdentityVocab95Train20Eval30 (line 135) | class AlgorithmicIdentityVocab95Train20Eval30(AlgorithmicIdentityBinary40):
    method num_symbols (line 139) | def num_symbols(self):
    method train_length (line 143) | def train_length(self):
    method dev_length (line 147) | def dev_length(self):
    method train_size (line 151) | def train_size(self):
  class AlgorithmicShiftDecimal40 (line 156) | class AlgorithmicShiftDecimal40(AlgorithmicProblem):
    method num_symbols (line 160) | def num_symbols(self):
    method generator (line 163) | def generator(self, nbr_symbols, max_length, nbr_cases):
    method dev_length (line 186) | def dev_length(self):
  class AlgorithmicReverseBinary40 (line 191) | class AlgorithmicReverseBinary40(AlgorithmicProblem):
    method num_symbols (line 195) | def num_symbols(self):
    method generator (line 198) | def generator(self, nbr_symbols, max_length, nbr_cases):
  class AlgorithmicReverseDecimal40 (line 221) | class AlgorithmicReverseDecimal40(AlgorithmicReverseBinary40):
    method num_symbols (line 225) | def num_symbols(self):
  function zipf_distribution (line 229) | def zipf_distribution(nbr_symbols, alpha):
  function zipf_random_sample (line 247) | def zipf_random_sample(distr_map, sample_len):
  function reverse_generator_nlplike (line 264) | def reverse_generator_nlplike(nbr_symbols,
  class AlgorithmicReverseNlplike8k (line 299) | class AlgorithmicReverseNlplike8k(AlgorithmicProblem):
    method num_symbols (line 303) | def num_symbols(self):
    method generator (line 306) | def generator(self, nbr_symbols, max_length, nbr_cases):
    method train_length (line 311) | def train_length(self):
    method dev_length (line 315) | def dev_length(self):
  class AlgorithmicReverseNlplike32k (line 320) | class AlgorithmicReverseNlplike32k(AlgorithmicReverseNlplike8k):
    method num_symbols (line 324) | def num_symbols(self):
    method generator (line 327) | def generator(self, nbr_symbols, max_length, nbr_cases):
  function lower_endian_to_number (line 332) | def lower_endian_to_number(l, base):
  function number_to_lower_endian (line 337) | def number_to_lower_endian(n, base):
  function random_number_lower_endian (line 344) | def random_number_lower_endian(length, base):
  class AlgorithmicAdditionBinary40 (line 353) | class AlgorithmicAdditionBinary40(AlgorithmicProblem):
    method num_symbols (line 357) | def num_symbols(self):
    method generator (line 360) | def generator(self, base, max_length, nbr_cases):  # pylint: disable=a...
  class AlgorithmicAdditionDecimal40 (line 394) | class AlgorithmicAdditionDecimal40(AlgorithmicAdditionBinary40):
    method num_symbols (line 398) | def num_symbols(self):
  class AlgorithmicMultiplicationBinary40 (line 403) | class AlgorithmicMultiplicationBinary40(AlgorithmicProblem):
    method num_symbols (line 407) | def num_symbols(self):
    method generator (line 410) | def generator(self, base, max_length, nbr_cases):  # pylint: disable=a...
  class AlgorithmicMultiplicationDecimal40 (line 445) | class AlgorithmicMultiplicationDecimal40(AlgorithmicMultiplicationBinary...
    method num_symbols (line 449) | def num_symbols(self):
  class AlgorithmicReverseBinary40Test (line 454) | class AlgorithmicReverseBinary40Test(AlgorithmicReverseBinary40):
    method train_length (line 458) | def train_length(self):
    method dev_length (line 462) | def dev_length(self):
    method train_size (line 466) | def train_size(self):
    method dev_size (line 470) | def dev_size(self):
    method num_shards (line 474) | def num_shards(self):
  class AlgorithmicSortProblem (line 479) | class AlgorithmicSortProblem(AlgorithmicProblem):
    method num_symbols (line 483) | def num_symbols(self):
    method train_length (line 487) | def train_length(self):
    method dev_length (line 491) | def dev_length(self):
    method unique (line 495) | def unique(self):
    method generator (line 499) | def generator(self, nbr_symbols, max_length, nbr_cases):
    method eval_metrics (line 535) | def eval_metrics(self):
  class TinyAlgo (line 541) | class TinyAlgo(AlgorithmicIdentityBinary40):
    method generate_data (line 544) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method setup_for_test (line 557) | def setup_for_test(cls):

FILE: tensor2tensor/data_generators/algorithmic_math.py
  class ExprOp (line 31) | class ExprOp(object):
    method __init__ (line 34) | def __init__(self, symbol, precedence, associative=False):
    method __str__ (line 48) | def __str__(self):
    method __eq__ (line 51) | def __eq__(self, other):
  class ExprNode (line 55) | class ExprNode(object):
    method __init__ (line 61) | def __init__(self, left, right, op):
    method __str__ (line 69) | def __str__(self):
    method is_in (line 81) | def is_in(self, expr):
  function is_in_expr (line 90) | def is_in_expr(expr, find):
  function random_expr_with_required_var (line 95) | def random_expr_with_required_var(depth, required_var, optional_list, ops):
  function random_expr (line 132) | def random_expr(depth, vlist, ops):
  function algebra_inverse_solve (line 158) | def algebra_inverse_solve(left, right, var, solve_ops):
  function format_sympy_expr (line 214) | def format_sympy_expr(sympy_expr, functions=None):
  function generate_algebra_inverse_sample (line 236) | def generate_algebra_inverse_sample(vlist, ops, solve_ops, min_depth,
  function generate_algebra_simplify_sample (line 277) | def generate_algebra_simplify_sample(vlist, ops, min_depth, max_depth):
  function generate_calculus_integrate_sample (line 302) | def generate_calculus_integrate_sample(vlist, ops, min_depth, max_depth,
  function math_dataset_init (line 358) | def math_dataset_init(alphabet_size=26, digits=None, functions=None):
  function algebra_inverse (line 439) | def algebra_inverse(alphabet_size=26, min_depth=0, max_depth=2,
  function algebra_simplify (line 480) | def algebra_simplify(alphabet_size=26,
  function calculus_integrate (line 520) | def calculus_integrate(alphabet_size=26,

FILE: tensor2tensor/data_generators/algorithmic_math_deepmind.py
  class AlgorithmicMathDeepmindAll (line 40) | class AlgorithmicMathDeepmindAll(text_problems.Text2TextProblem):
    method vocab_type (line 44) | def vocab_type(self):
    method dataset_splits (line 48) | def dataset_splits(self):
    method is_generate_per_split (line 58) | def is_generate_per_split(self):
    method generate_samples (line 61) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/algorithmic_math_test.py
  class AlgorithmicMathTest (line 29) | class AlgorithmicMathTest(tf.test.TestCase):
    method testAlgebraInverse (line 31) | def testAlgebraInverse(self):
    method testAlgebraSimplify (line 49) | def testAlgebraSimplify(self):
    method testCalculusIntegrate (line 61) | def testCalculusIntegrate(self):

FILE: tensor2tensor/data_generators/algorithmic_math_two_variables.py
  function _download_mlu_data (line 60) | def _download_mlu_data(tmp_dir, data_dir):
  class AlgorithmicMathTwoVariables (line 89) | class AlgorithmicMathTwoVariables(text_problems.Text2TextProblem):
    method vocab_type (line 93) | def vocab_type(self):
    method dataset_splits (line 97) | def dataset_splits(self):
    method is_generate_per_split (line 107) | def is_generate_per_split(self):
    method generate_samples (line 110) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/algorithmic_test.py
  class AlgorithmicTest (line 28) | class AlgorithmicTest(tf.test.TestCase):
    method testIdentityGenerator (line 30) | def testIdentityGenerator(self):
    method testReverseGenerator (line 38) | def testReverseGenerator(self):
    method testZipfDistribution (line 46) | def testZipfDistribution(self):
    method testReverseGeneratorNlpLike (line 54) | def testReverseGeneratorNlpLike(self):
    method testLowerEndianToNumber (line 61) | def testLowerEndianToNumber(self):
    method testNumberToLowerEndian (line 70) | def testNumberToLowerEndian(self):
    method testAdditionGenerator (line 79) | def testAdditionGenerator(self):
    method testMultiplicationGenerator (line 90) | def testMultiplicationGenerator(self):
    method testSortGenerator (line 101) | def testSortGenerator(self):

FILE: tensor2tensor/data_generators/all_problems.py
  function _is_import_err_msg (line 114) | def _is_import_err_msg(err_str, module):
  function _handle_errors (line 124) | def _handle_errors(errors):
  function import_modules (line 140) | def import_modules(modules):

FILE: tensor2tensor/data_generators/allen_brain.py
  function PIL_Image (line 77) | def PIL_Image():  # pylint: disable=invalid-name
  function _get_case_file_paths (line 82) | def _get_case_file_paths(tmp_dir, case, training_fraction=0.95):
  function maybe_download_image_dataset (line 125) | def maybe_download_image_dataset(image_ids, target_dir):
  function random_square_mask (line 167) | def random_square_mask(shape, fraction):
  function _generator (line 193) | def _generator(tmp_dir, training, size=_BASE_EXAMPLE_IMAGE_SIZE,
  class Img2imgAllenBrain (line 265) | class Img2imgAllenBrain(problem.Problem):
    method train_shards (line 277) | def train_shards(self):
    method dev_shards (line 281) | def dev_shards(self):
    method training_fraction (line 285) | def training_fraction(self):
    method num_channels (line 289) | def num_channels(self):
    method input_dim (line 294) | def input_dim(self):
    method output_dim (line 300) | def output_dim(self):
    method inpaint_fraction (line 305) | def inpaint_fraction(self):
    method preprocess_example (line 310) | def preprocess_example(self, example, mode, hparams):
    method feature_encoders (line 339) | def feature_encoders(self, data_dir):
    method example_reading_spec (line 346) | def example_reading_spec(self):
    method eval_metrics (line 362) | def eval_metrics(self):
    method generate_data (line 370) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method hparams (line 377) | def hparams(self, defaults, unused_model_hparams):
    method generator (line 387) | def generator(self, tmp_dir, is_training):
  class Img2imgAllenBrainDim48to64 (line 397) | class Img2imgAllenBrainDim48to64(Img2imgAllenBrain):
    method dataset_filename (line 400) | def dataset_filename(self):
    method input_dim (line 404) | def input_dim(self):
    method output_dim (line 408) | def output_dim(self):
  class Img2imgAllenBrainDim8to32 (line 413) | class Img2imgAllenBrainDim8to32(Img2imgAllenBrain):
    method dataset_filename (line 416) | def dataset_filename(self):
    method input_dim (line 420) | def input_dim(self):
    method output_dim (line 424) | def output_dim(self):
  class Img2imgAllenBrainDim16to16Paint1 (line 429) | class Img2imgAllenBrainDim16to16Paint1(Img2imgAllenBrain):
    method dataset_filename (line 432) | def dataset_filename(self):
    method input_dim (line 436) | def input_dim(self):
    method output_dim (line 440) | def output_dim(self):
    method inpaint_fraction (line 444) | def inpaint_fraction(self):

FILE: tensor2tensor/data_generators/allen_brain_test.py
  function mock_raw_image (line 37) | def mock_raw_image(x_dim=1024, y_dim=1024, num_channels=3,
  function mock_raw_data (line 70) | def mock_raw_data(tmp_dir, raw_dim=1024, num_channels=3, num_images=1):
  class TemporaryDirectory (line 96) | class TemporaryDirectory(object):
    method __enter__ (line 99) | def __enter__(self):
    method __exit__ (line 103) | def __exit__(self, exc_type, exc_value, traceback):
  class TestAllenBrain (line 107) | class TestAllenBrain(tf.test.TestCase):
    method setUp (line 110) | def setUp(self):
    method test_generator_produces_examples (line 116) | def test_generator_produces_examples(self):
    method test_generate_data_produces_examples_of_correct_shape (line 127) | def test_generate_data_produces_examples_of_correct_shape(self):
    method test_transformer2d_single_step_e2e (line 161) | def test_transformer2d_single_step_e2e(self):
  class TestImageMock (line 247) | class TestImageMock(tf.test.TestCase):
    method test_image_mock_produces_expected_shape (line 250) | def test_image_mock_produces_expected_shape(self):
  class TestMockRawData (line 279) | class TestMockRawData(tf.test.TestCase):
    method test_runs (line 282) | def test_runs(self):

FILE: tensor2tensor/data_generators/audio.py
  function _get_timit (line 42) | def _get_timit(directory):
  function _collect_data (line 54) | def _collect_data(directory, input_ext, target_ext):
  function _get_audio_data (line 75) | def _get_audio_data(filepath):
  function _get_text_data (line 87) | def _get_text_data(filepath):
  function timit_generator (line 96) | def timit_generator(data_dir,

FILE: tensor2tensor/data_generators/audio_encoder.py
  class AudioEncoder (line 25) | class AudioEncoder(object):
    method __init__ (line 28) | def __init__(self, num_reserved_ids=0, sample_rate=16000):
    method num_reserved_ids (line 33) | def num_reserved_ids(self):
    method encode (line 36) | def encode(self, s):
    method decode (line 71) | def decode(self, ids):
    method decode_list (line 87) | def decode_list(self, ids):
    method vocab_size (line 99) | def vocab_size(self):

FILE: tensor2tensor/data_generators/audio_test.py
  class AudioTest (line 29) | class AudioTest(tf.test.TestCase):
    method testDataCollection (line 31) | def testDataCollection(self):

FILE: tensor2tensor/data_generators/babi_qa.py
  function _normalize_string (line 84) | def _normalize_string(raw_str):
  function _prepare_babi_data (line 98) | def _prepare_babi_data(tmp_dir, data_dir):
  function _build_vocab (line 126) | def _build_vocab(generator, vocab_dir, vocab_name):
  function _babi_parser (line 152) | def _babi_parser(tmp_dir,
  class FeatureNames (line 256) | class FeatureNames(object):
    method features (line 263) | def features(cls):
  class BabiQa (line 269) | class BabiQa(text_problems.QuestionAndContext2TextProblem):
    method __init__ (line 272) | def __init__(self, *args, **kwargs):
    method babi_subset (line 279) | def babi_subset(self):
    method babi_task_id (line 288) | def babi_task_id(self):
    method dataset_filename (line 296) | def dataset_filename(self):
    method vocab_file (line 300) | def vocab_file(self):
    method dataset_splits (line 304) | def dataset_splits(self):
    method is_generate_per_split (line 314) | def is_generate_per_split(self):
    method joint_training (line 318) | def joint_training(self):
    method vocab_type (line 323) | def vocab_type(self):
    method get_labels_encoder (line 326) | def get_labels_encoder(self, data_dir):
    method generate_samples (line 338) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_encoded_samples (line 364) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method feature_encoders (line 388) | def feature_encoders(self, data_dir):
    method generate_text_for_vocab (line 403) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method hparams (line 417) | def hparams(self, defaults, unused_model_hparams):
    method example_reading_spec (line 431) | def example_reading_spec(self):
    method eval_metrics (line 437) | def eval_metrics(self):
  class BabiQaConcat (line 446) | class BabiQaConcat(BabiQa):
    method preprocess_example (line 449) | def preprocess_example(self, example, unused_mode, unused_model_hparams):
    method hparams (line 456) | def hparams(self, defaults, unused_model_hparams):
  function _problems_to_register (line 467) | def _problems_to_register():
  function _register_babi_problems (line 510) | def _register_babi_problems():

FILE: tensor2tensor/data_generators/bair_robot_pushing.py
  function PIL_Image (line 46) | def PIL_Image():  # pylint: disable=invalid-name
  class VideoBairRobotPushing (line 52) | class VideoBairRobotPushing(video_utils.VideoProblem):
    method num_channels (line 56) | def num_channels(self):
    method frame_height (line 60) | def frame_height(self):
    method frame_width (line 64) | def frame_width(self):
    method is_generate_per_split (line 68) | def is_generate_per_split(self):
    method total_number_of_frames (line 73) | def total_number_of_frames(self):
    method max_frames_per_video (line 76) | def max_frames_per_video(self, hparams):
    method random_skip (line 80) | def random_skip(self):
    method only_keep_videos_from_0th_frame (line 84) | def only_keep_videos_from_0th_frame(self):
    method use_not_breaking_batching (line 88) | def use_not_breaking_batching(self):
    method dataset_splits (line 92) | def dataset_splits(self):
    method extra_reading_spec (line 100) | def extra_reading_spec(self):
    method hparams (line 111) | def hparams(self, defaults, unused_model_hparams):
    method parse_frames (line 118) | def parse_frames(self, filenames):
    method generate_samples (line 149) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class VideoBairRobotPushingWithActions (line 180) | class VideoBairRobotPushingWithActions(VideoBairRobotPushing):
    method extra_reading_spec (line 184) | def extra_reading_spec(self):

FILE: tensor2tensor/data_generators/celeba.py
  class ImageCeleba (line 33) | class ImageCeleba(image_utils.ImageProblem):
    method hparams (line 57) | def hparams(self, defaults, unused_model_hparams):
    method generator (line 67) | def generator(self, tmp_dir, how_many, start_from=0):
    method train_shards (line 137) | def train_shards(self):
    method dev_shards (line 141) | def dev_shards(self):
    method test_shards (line 145) | def test_shards(self):
    method generate_data (line 148) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  class ImageCelebaMultiResolution (line 166) | class ImageCelebaMultiResolution(ImageCeleba):
    method dataset_filename (line 172) | def dataset_filename(self):
    method preprocess_example (line 175) | def preprocess_example(self, example, mode, hparams):
  class Img2imgCeleba (line 209) | class Img2imgCeleba(ImageCeleba):
    method dataset_filename (line 212) | def dataset_filename(self):
    method preprocess_example (line 215) | def preprocess_example(self, example, unused_mode, unused_hparams):
  class Img2imgCeleba64 (line 229) | class Img2imgCeleba64(Img2imgCeleba):
    method preprocess_example (line 232) | def preprocess_example(self, example, unused_mode, unused_hparams):
  class ImageCeleba32 (line 246) | class ImageCeleba32(Img2imgCeleba):
    method preprocess_example (line 249) | def preprocess_example(self, example, unused_mode, unused_hparams):
  class ImageCeleba64 (line 262) | class ImageCeleba64(Img2imgCeleba):
    method preprocess_example (line 265) | def preprocess_example(self, example, unused_mode, unused_hparams):

FILE: tensor2tensor/data_generators/celeba_test.py
  class CelebaTest (line 30) | class CelebaTest(parameterized.TestCase, tf.test.TestCase):
    method testCelebaMultiResolutionPreprocessExample (line 36) | def testCelebaMultiResolutionPreprocessExample(self, resize_method):

FILE: tensor2tensor/data_generators/celebahq.py
  class ImageCelebahq128 (line 33) | class ImageCelebahq128(image_utils.ImageProblem):
    method dataset_filename (line 36) | def dataset_filename(self):
    method example_reading_spec (line 39) | def example_reading_spec(self):
    method filepattern (line 48) | def filepattern(self, data_dir, mode, shard=None):
    method generate_data (line 74) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method hparams (line 80) | def hparams(self, defaults, unused_model_hparams):
    method preprocess_example (line 87) | def preprocess_example(self, example, mode, hparams):
  class ImageCelebahq128Dmol (line 94) | class ImageCelebahq128Dmol(ImageCelebahq128):
    method eval_metrics (line 97) | def eval_metrics(self):
  class ImageCelebahq256 (line 104) | class ImageCelebahq256(ImageCelebahq128):
    method dataset_filename (line 107) | def dataset_filename(self):
    method preprocess_example (line 110) | def preprocess_example(self, example, mode, hparams):
  class ImageCelebahq256Dmol (line 117) | class ImageCelebahq256Dmol(ImageCelebahq256):
    method eval_metrics (line 120) | def eval_metrics(self):

FILE: tensor2tensor/data_generators/cifar.py
  function _get_cifar (line 56) | def _get_cifar(directory, url):
  function cifar_generator (line 63) | def cifar_generator(cifar_version, tmp_dir, training, how_many, start_fr...
  class ImageCifar10Tune (line 118) | class ImageCifar10Tune(mnist.ImageMnistTune):
    method num_channels (line 122) | def num_channels(self):
    method class_labels (line 126) | def class_labels(self):
    method preprocess_example (line 132) | def preprocess_example(self, example, mode, unused_hparams):
    method generator (line 142) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageCifar10 (line 150) | class ImageCifar10(ImageCifar10Tune):
    method generator (line 152) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageCifar10Plain (line 160) | class ImageCifar10Plain(ImageCifar10):
    method preprocess_example (line 162) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar10PlainGen (line 172) | class ImageCifar10PlainGen(ImageCifar10Plain):
    method dataset_filename (line 175) | def dataset_filename(self):
    method preprocess_example (line 178) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar10PlainGenFlat (line 185) | class ImageCifar10PlainGenFlat(ImageCifar10PlainGen):
    method preprocess_example (line 188) | def preprocess_example(self, example, mode, unused_hparams):
    method hparams (line 197) | def hparams(self, defaults, model_hparams):
  class ImageCifar10PlainRandomShift (line 206) | class ImageCifar10PlainRandomShift(ImageCifar10Plain):
    method dataset_filename (line 209) | def dataset_filename(self):
    method preprocess_example (line 212) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar10PlainGenDmol (line 222) | class ImageCifar10PlainGenDmol(ImageCifar10PlainGen):
    method dataset_filename (line 225) | def dataset_filename(self):
    method eval_metrics (line 228) | def eval_metrics(self):
  class ImageCifar10Plain8 (line 235) | class ImageCifar10Plain8(ImageCifar10):
    method dataset_filename (line 238) | def dataset_filename(self):
    method preprocess_example (line 241) | def preprocess_example(self, example, mode, unused_hparams):
  class Img2imgCifar10 (line 251) | class Img2imgCifar10(ImageCifar10):
    method dataset_filename (line 254) | def dataset_filename(self):
    method preprocess_example (line 257) | def preprocess_example(self, example, unused_mode, unused_hparams):
    method hparams (line 264) | def hparams(self, defaults, unused_model_hparams):
  class ImageCifar100Tune (line 276) | class ImageCifar100Tune(mnist.ImageMnistTune):
    method num_classes (line 280) | def num_classes(self):
    method num_channels (line 284) | def num_channels(self):
    method class_labels (line 288) | def class_labels(self):
    method preprocess_example (line 392) | def preprocess_example(self, example, mode, unused_hparams):
    method generator (line 402) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageCifar100 (line 410) | class ImageCifar100(ImageCifar100Tune):
    method generator (line 412) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageCifar100Plain (line 420) | class ImageCifar100Plain(ImageCifar100):
    method preprocess_example (line 422) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar100PlainGen (line 432) | class ImageCifar100PlainGen(ImageCifar100Plain):
    method dataset_filename (line 435) | def dataset_filename(self):
    method preprocess_example (line 438) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar100Plain8 (line 445) | class ImageCifar100Plain8(ImageCifar100):
    method dataset_filename (line 448) | def dataset_filename(self):
    method preprocess_example (line 451) | def preprocess_example(self, example, mode, unused_hparams):
  class Img2imgCifar100 (line 461) | class Img2imgCifar100(ImageCifar100):
    method dataset_filename (line 464) | def dataset_filename(self):
    method preprocess_example (line 467) | def preprocess_example(self, example, unused_mode, unused_hparams):
    method hparams (line 474) | def hparams(self, defaults, unused_model_hparams):
  class ImageCifar20Tune (line 487) | class ImageCifar20Tune(mnist.ImageMnistTune):
    method num_classes (line 491) | def num_classes(self):
    method num_channels (line 495) | def num_channels(self):
    method class_labels (line 499) | def class_labels(self):
    method preprocess_example (line 523) | def preprocess_example(self, example, mode, unused_hparams):
    method generator (line 533) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageCifar20 (line 541) | class ImageCifar20(ImageCifar20Tune):
    method generator (line 543) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageCifar20Plain (line 551) | class ImageCifar20Plain(ImageCifar20):
    method preprocess_example (line 553) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar20PlainGen (line 563) | class ImageCifar20PlainGen(ImageCifar20Plain):
    method dataset_filename (line 566) | def dataset_filename(self):
    method preprocess_example (line 569) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageCifar20Plain8 (line 576) | class ImageCifar20Plain8(ImageCifar20):
    method dataset_filename (line 579) | def dataset_filename(self):
    method preprocess_example (line 582) | def preprocess_example(self, example, mode, unused_hparams):

FILE: tensor2tensor/data_generators/cipher.py
  class AlgorithmicCipherShift5 (line 29) | class AlgorithmicCipherShift5(algorithmic.AlgorithmicProblem):
    method num_symbols (line 33) | def num_symbols(self):
    method distribution (line 37) | def distribution(self):
    method shift (line 41) | def shift(self):
    method generator (line 44) | def generator(self, nbr_symbols, max_length, nbr_cases):
    method train_length (line 53) | def train_length(self):
    method dev_length (line 57) | def dev_length(self):
  class AlgorithmicCipherVigenere5 (line 62) | class AlgorithmicCipherVigenere5(algorithmic.AlgorithmicProblem):
    method num_symbols (line 66) | def num_symbols(self):
    method distribution (line 70) | def distribution(self):
    method key (line 74) | def key(self):
    method generator (line 77) | def generator(self, nbr_symbols, max_length, nbr_cases):
    method train_length (line 86) | def train_length(self):
    method dev_length (line 90) | def dev_length(self):
  class AlgorithmicCipherShift200 (line 95) | class AlgorithmicCipherShift200(AlgorithmicCipherShift5):
    method num_symbols (line 99) | def num_symbols(self):
    method distribution (line 103) | def distribution(self):
  class AlgorithmicCipherVigenere200 (line 110) | class AlgorithmicCipherVigenere200(AlgorithmicCipherVigenere5):
    method num_symbols (line 114) | def num_symbols(self):
    method distribution (line 118) | def distribution(self):
    method key (line 124) | def key(self):
  class ShiftEncryptionLayer (line 128) | class ShiftEncryptionLayer(object):
    method __init__ (line 131) | def __init__(self, vocab, shift):
    method encrypt_character (line 147) | def encrypt_character(self, character):
    method decrypt_character (line 150) | def decrypt_character(self, character):
  function generate_plaintext_random (line 154) | def generate_plaintext_random(plain_vocab, distribution, train_samples,
  function encipher_shift (line 180) | def encipher_shift(plaintext, plain_vocab, shift):
  function encipher_vigenere (line 203) | def encipher_vigenere(plaintext, plain_vocab, key):

FILE: tensor2tensor/data_generators/cleaner_en_xx.py
  function paracrawl_v3_pairs (line 66) | def paracrawl_v3_pairs(paracrawl_file):
  function _raw_sentences (line 89) | def _raw_sentences(paracrawl_file):
  function clean_en_xx_pairs (line 113) | def clean_en_xx_pairs(en_xx_pairs):
  function _regex_filter (line 145) | def _regex_filter(sentence):
  function _is_match (line 165) | def _is_match(sentence, regex):
  function _split_sentences (line 169) | def _split_sentences(s1, s2):

FILE: tensor2tensor/data_generators/cnn_dailymail.py
  function _maybe_download_corpora (line 67) | def _maybe_download_corpora(tmp_dir, dataset_split):
  function example_splits (line 110) | def example_splits(url_file, all_files):
  function example_generator (line 137) | def example_generator(all_files, urls_path, sum_token):
  function _story_summary_split (line 176) | def _story_summary_split(story):
  function write_raw_text_to_files (line 183) | def write_raw_text_to_files(all_files, urls_path, dataset_split, tmp_dir):
  class SummarizeCnnDailymail32k (line 211) | class SummarizeCnnDailymail32k(text_problems.Text2TextProblem):
    method generate_text_for_vocab (line 214) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method dataset_splits (line 221) | def dataset_splits(self):
    method is_generate_per_split (line 234) | def is_generate_per_split(self):
    method generate_samples (line 237) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SummarizeCnnDailymailWikiLMSharedVocab (line 247) | class SummarizeCnnDailymailWikiLMSharedVocab(SummarizeCnnDailymail32k):
    method use_vocab_from_other_problem (line 251) | def use_vocab_from_other_problem(self):
  class SummarizeCnnDailymailWikiLMSharedVocab64k (line 256) | class SummarizeCnnDailymailWikiLMSharedVocab64k(SummarizeCnnDailymail32k):
    method use_vocab_from_other_problem (line 260) | def use_vocab_from_other_problem(self):
  class SummarizeCnnDailymailWikiLMMultiVocab64k (line 265) | class SummarizeCnnDailymailWikiLMMultiVocab64k(SummarizeCnnDailymail32k):
    method use_vocab_from_other_problem (line 269) | def use_vocab_from_other_problem(self):
  class SummarizeCnnDailymailMulti64kPacked1k (line 274) | class SummarizeCnnDailymailMulti64kPacked1k(SummarizeCnnDailymail32k):
    method use_vocab_from_other_problem (line 278) | def use_vocab_from_other_problem(self):
    method packed_length (line 282) | def packed_length(self):
    method num_training_examples (line 286) | def num_training_examples(self):
    method inputs_prefix (line 290) | def inputs_prefix(self):
    method targets_prefix (line 294) | def targets_prefix(self):
  class SummarizeFracCnnDailymailWikiLMSharedVocab64k (line 299) | class SummarizeFracCnnDailymailWikiLMSharedVocab64k(SummarizeCnnDailymai...
    method use_vocab_from_other_problem (line 303) | def use_vocab_from_other_problem(self):
    method fraction_of_data (line 306) | def fraction_of_data(self):
    method generate_samples (line 309) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SummarizeFrac0p1CnnDailymailWikiLMSharedVocab64k (line 328) | class SummarizeFrac0p1CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 331) | def fraction_of_data(self):
  class SummarizeFrac1CnnDailymailWikiLMSharedVocab64k (line 336) | class SummarizeFrac1CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 339) | def fraction_of_data(self):
  class SummarizeFrac2CnnDailymailWikiLMSharedVocab64k (line 344) | class SummarizeFrac2CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 347) | def fraction_of_data(self):
  class SummarizeFrac5CnnDailymailWikiLMSharedVocab64k (line 352) | class SummarizeFrac5CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 355) | def fraction_of_data(self):
  class SummarizeFrac10CnnDailymailWikiLMSharedVocab64k (line 360) | class SummarizeFrac10CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 363) | def fraction_of_data(self):
  class SummarizeFrac20CnnDailymailWikiLMSharedVocab64k (line 368) | class SummarizeFrac20CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 371) | def fraction_of_data(self):
  class SummarizeFrac50CnnDailymailWikiLMSharedVocab64k (line 376) | class SummarizeFrac50CnnDailymailWikiLMSharedVocab64k(
    method fraction_of_data (line 379) | def fraction_of_data(self):

FILE: tensor2tensor/data_generators/cola.py
  class Cola (line 35) | class Cola(text_problems.Text2ClassProblem):
    method is_generate_per_split (line 45) | def is_generate_per_split(self):
    method dataset_splits (line 49) | def dataset_splits(self):
    method approx_vocab_size (line 59) | def approx_vocab_size(self):
    method num_classes (line 63) | def num_classes(self):
    method class_labels (line 66) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 71) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 83) | def example_generator(self, filename):
    method generate_samples (line 92) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class ColaCharacters (line 105) | class ColaCharacters(Cola):
    method vocab_type (line 109) | def vocab_type(self):
    method global_task_id (line 112) | def global_task_id(self):

FILE: tensor2tensor/data_generators/common_voice.py
  function _collect_data (line 42) | def _collect_data(directory):
  function _file_exists (line 69) | def _file_exists(path, filename):
  function _is_relative (line 74) | def _is_relative(path, filename):
  class CommonVoice (line 80) | class CommonVoice(speech_recognition.SpeechRecognitionProblem):
    method num_shards (line 89) | def num_shards(self):
    method use_subword_tokenizer (line 93) | def use_subword_tokenizer(self):
    method num_dev_shards (line 97) | def num_dev_shards(self):
    method num_test_shards (line 101) | def num_test_shards(self):
    method use_train_shards_for_dev (line 105) | def use_train_shards_for_dev(self):
    method generator (line 109) | def generator(self,
    method generate_data (line 157) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  class CommonVoiceTrainFullTestClean (line 180) | class CommonVoiceTrainFullTestClean(CommonVoice):
    method training_filepaths (line 183) | def training_filepaths(self, data_dir, num_shards, shuffled):
    method dev_filepaths (line 186) | def dev_filepaths(self, data_dir, num_shards, shuffled):
    method test_filepaths (line 189) | def test_filepaths(self, data_dir, num_shards, shuffled):
    method generate_data (line 192) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method filepattern (line 195) | def filepattern(self, data_dir, mode, shard=None):
  class CommonVoiceClean (line 228) | class CommonVoiceClean(CommonVoice):
  class CommonVoiceNoisy (line 238) | class CommonVoiceNoisy(CommonVoice):
  function set_common_voice_length_hparams (line 247) | def set_common_voice_length_hparams(hparams):

FILE: tensor2tensor/data_generators/common_voice_test.py
  class CommonVoiceTest (line 31) | class CommonVoiceTest(tf.test.TestCase):
    method testCollectData (line 33) | def testCollectData(self):

FILE: tensor2tensor/data_generators/conll_ner.py
  class Conll2002Ner (line 33) | class Conll2002Ner(text_problems.Text2textTmpdir):
    method source_data_files (line 36) | def source_data_files(self, dataset_split):
    method generate_samples (line 40) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class Conll2002EsNer (line 74) | class Conll2002EsNer(Conll2002Ner):
    method source_data_files (line 79) | def source_data_files(self, dataset_split):
  class Conll2002NlNer (line 85) | class Conll2002NlNer(Conll2002Ner):
    method source_data_files (line 90) | def source_data_files(self, dataset_split):

FILE: tensor2tensor/data_generators/desc2code.py
  class Desc2CodeProblem (line 76) | class Desc2CodeProblem(text_problems.Text2TextProblem):
    method dataset_splits (line 80) | def dataset_splits(self):
    method input_vocab_size (line 90) | def input_vocab_size(self):
    method target_vocab_size (line 94) | def target_vocab_size(self):
    method vocab_input_filename (line 98) | def vocab_input_filename(self):
    method vocab_target_filename (line 102) | def vocab_target_filename(self):
    method preprocess_target (line 106) | def preprocess_target(self, target):
    method feature_encoders (line 119) | def feature_encoders(self, data_dir):
    method is_generate_per_split (line 129) | def is_generate_per_split(self):
    method generate_encoded_samples (line 132) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
  class ProgrammingDesc2codePy (line 208) | class ProgrammingDesc2codePy(Desc2CodeProblem):
    method pb_constants (line 212) | def pb_constants(self):
    method preprocess_target (line 215) | def preprocess_target(self, target):
  class ProgrammingDesc2codeCpp (line 221) | class ProgrammingDesc2codeCpp(Desc2CodeProblem):
    method pb_constants (line 225) | def pb_constants(self):
    method preprocess_target (line 228) | def preprocess_target(self, target):
  function generator_samples (line 240) | def generator_samples(tmp_dir, pb_cst):

FILE: tensor2tensor/data_generators/desc2code_test.py
  class Desc2codeTest (line 45) | class Desc2codeTest(tf.test.TestCase):
    method testCppPreprocess (line 47) | def testCppPreprocess(self):

FILE: tensor2tensor/data_generators/dialog_abstract.py
  class DialogAbstract (line 42) | class DialogAbstract(text_problems.Text2TextProblem):
    method vocab_type (line 46) | def vocab_type(self):
    method is_generate_per_split (line 50) | def is_generate_per_split(self):
    method vocab_file (line 54) | def vocab_file(self):
    method vocab_filename (line 58) | def vocab_filename(self):
    method oov_token (line 62) | def oov_token(self):
    method use_subword_tokenizer (line 66) | def use_subword_tokenizer(self):
    method input_space_id (line 70) | def input_space_id(self):
    method target_space_id (line 74) | def target_space_id(self):
    method targeted_vocab_size (line 78) | def targeted_vocab_size(self):
    method targeted_dataset_size (line 82) | def targeted_dataset_size(self):
    method dataset_split (line 88) | def dataset_split(self):
    method dataset_splits (line 92) | def dataset_splits(self):
    method data_dir (line 105) | def data_dir(self):
    method raw_data_dir (line 109) | def raw_data_dir(self):
    method raw_data (line 113) | def raw_data(self):
    method zipped_data (line 117) | def zipped_data(self):
    method url (line 121) | def url(self):
    method data_dir (line 125) | def data_dir(self, value):
    method raw_data_dir (line 129) | def raw_data_dir(self, value):
    method raw_data (line 133) | def raw_data(self, value):
    method zipped_data (line 137) | def zipped_data(self, value):
    method url (line 141) | def url(self, value):
    method preprocess_data (line 145) | def preprocess_data(self, train_mode):
    method create_data (line 149) | def create_data(self, train_mode):
    method data_pipeline_status (line 152) | def data_pipeline_status(self, train_mode):
    method download_data (line 205) | def download_data(self, train_mode):
    method extract_data (line 224) | def extract_data(self, train_mode):
    method hparams (line 248) | def hparams(self, defaults, unused_model_hparams):
    method eval_metrics (line 273) | def eval_metrics(self):
    method generate_data (line 280) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method generate_samples (line 311) | def generate_samples(self, data_dir, tmp_dir, data_split):
    method save_vocab (line 345) | def save_vocab(self, vocab):
    method open_6_files (line 363) | def open_6_files(self):
    method close_n_files (line 375) | def close_n_files(self, files):
    method clean_line (line 379) | def clean_line(self, line):

FILE: tensor2tensor/data_generators/dialog_cornell.py
  class DialogCornell32k (line 36) | class DialogCornell32k(dialog_abstract.DialogAbstract):
    method targeted_vocab_size (line 43) | def targeted_vocab_size(self):
    method preprocess_data (line 46) | def preprocess_data(self, train_mode):
    method create_data (line 68) | def create_data(self, train_mode):
    method extract_dialog_ids (line 161) | def extract_dialog_ids(self):

FILE: tensor2tensor/data_generators/dialog_dailydialog.py
  class DialogDailydialog16k (line 35) | class DialogDailydialog16k(dialog_abstract.DialogAbstract):
    method preprocess_data (line 42) | def preprocess_data(self, train_mode):
    method create_data (line 62) | def create_data(self, train_mode):

FILE: tensor2tensor/data_generators/dialog_opensubtitles.py
  class DialogOpensubtitles64k2009 (line 37) | class DialogOpensubtitles64k2009(dialog_abstract.DialogAbstract):
    method targeted_vocab_size (line 44) | def targeted_vocab_size(self):
    method dataset_version (line 48) | def dataset_version(self):
    method extract_data (line 52) | def extract_data(self, train_mode):
    method preprocess_data (line 73) | def preprocess_data(self, train_mode):
    method create_data (line 94) | def create_data(self, train_mode):
    method clean_line (line 193) | def clean_line(self, line):
  class DialogOpensubtitles64k2011 (line 217) | class DialogOpensubtitles64k2011(DialogOpensubtitles64k2009):
    method dataset_version (line 220) | def dataset_version(self):
  class DialogOpensubtitles64k2012 (line 226) | class DialogOpensubtitles64k2012(DialogOpensubtitles64k2009):
    method dataset_version (line 229) | def dataset_version(self):
  class DialogOpensubtitles64k2013 (line 235) | class DialogOpensubtitles64k2013(DialogOpensubtitles64k2009):
    method dataset_version (line 238) | def dataset_version(self):
  class DialogOpensubtitles64k2016 (line 244) | class DialogOpensubtitles64k2016(DialogOpensubtitles64k2009):
    method dataset_version (line 247) | def dataset_version(self):
  class DialogOpensubtitles64k2018 (line 253) | class DialogOpensubtitles64k2018(DialogOpensubtitles64k2009):
    method dataset_version (line 256) | def dataset_version(self):

FILE: tensor2tensor/data_generators/dialog_personachat.py
  class DialogPersonachat16k (line 37) | class DialogPersonachat16k(dialog_abstract.DialogAbstract):
    method preprocess_data (line 44) | def preprocess_data(self, train_mode):
    method extract_data (line 63) | def extract_data(self, train_mode):
    method create_data (line 86) | def create_data(self, train_mode):

FILE: tensor2tensor/data_generators/dna_encoder.py
  class DNAEncoder (line 32) | class DNAEncoder(text_encoder.TextEncoder):
    method __init__ (line 44) | def __init__(self,
    method _tokens (line 56) | def _tokens(self):
    method vocab_size (line 67) | def vocab_size(self):
    method encode (line 70) | def encode(self, s):
    method decode (line 88) | def decode(self, ids, strip_extraneous=False):
  class DelimitedDNAEncoder (line 103) | class DelimitedDNAEncoder(DNAEncoder):
    method __init__ (line 109) | def __init__(self, delimiter=",", **kwargs):
    method delimiter (line 115) | def delimiter(self):
    method _tokens (line 118) | def _tokens(self):
    method encode (line 121) | def encode(self, s):

FILE: tensor2tensor/data_generators/dna_encoder_test.py
  class DnaEncoderTest (line 25) | class DnaEncoderTest(tf.test.TestCase):
    method test_encode_decode (line 27) | def test_encode_decode(self):
    method test_delimited_dna_encoder (line 37) | def test_delimited_dna_encoder(self):

FILE: tensor2tensor/data_generators/enwik8.py
  function _maybe_download_corpus (line 33) | def _maybe_download_corpus(tmp_dir):
  class Enwik8L65k (line 55) | class Enwik8L65k(text_problems.Text2SelfProblem):
    method is_generate_per_split (line 62) | def is_generate_per_split(self):
    method vocab_type (line 66) | def vocab_type(self):
    method global_task_id (line 69) | def global_task_id(self):
    method dataset_splits (line 73) | def dataset_splits(self):
    method max_length (line 86) | def max_length(self, model_hparams):
    method sequence_length (line 90) | def sequence_length(self):
    method generate_samples (line 94) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_encoded_samples (line 124) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
  class Enwik8L2k (line 133) | class Enwik8L2k(Enwik8L65k):
    method sequence_length (line 144) | def sequence_length(self):
    method generate_encoded_samples (line 148) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
  class Enwik8L32k (line 153) | class Enwik8L32k(Enwik8L2k):
    method sequence_length (line 156) | def sequence_length(self):
  class Enwik8L16k (line 162) | class Enwik8L16k(Enwik8L2k):
    method sequence_length (line 165) | def sequence_length(self):
  class Enwik8L8k (line 171) | class Enwik8L8k(Enwik8L2k):
    method sequence_length (line 174) | def sequence_length(self):
  class Enwik8L4k (line 180) | class Enwik8L4k(Enwik8L2k):
    method sequence_length (line 183) | def sequence_length(self):
  class Enwik8L1k (line 189) | class Enwik8L1k(Enwik8L2k):
    method sequence_length (line 192) | def sequence_length(self):
  class Enwik8L512 (line 198) | class Enwik8L512(Enwik8L2k):
    method sequence_length (line 201) | def sequence_length(self):

FILE: tensor2tensor/data_generators/fsns.py
  class ImageFSNS (line 35) | class ImageFSNS(image_utils.ImageProblem):
    method generate_data (line 38) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method feature_encoders (line 56) | def feature_encoders(self, data_dir):
    method hparams (line 64) | def hparams(self, defaults, unused_model_hparams):
    method example_reading_spec (line 74) | def example_reading_spec(self):

FILE: tensor2tensor/data_generators/function_docstring.py
  class GithubFunctionDocstring (line 28) | class GithubFunctionDocstring(text_problems.Text2TextProblem):
    method base_url (line 41) | def base_url(self):
    method pair_files_list (line 45) | def pair_files_list(self):
    method is_generate_per_split (line 56) | def is_generate_per_split(self):
    method approx_vocab_size (line 60) | def approx_vocab_size(self):
    method max_samples_for_vocab (line 64) | def max_samples_for_vocab(self):
    method generate_samples (line 68) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method preprocess_example (line 102) | def preprocess_example(self, example, mode, unused_hparams):
    method eval_metrics (line 107) | def eval_metrics(self):

FILE: tensor2tensor/data_generators/gene_expression.py
  class GeneExpressionProblem (line 59) | class GeneExpressionProblem(problem.Problem):
    method download_url (line 63) | def download_url(self):
    method h5_file (line 67) | def h5_file(self):
    method num_output_predictions (line 71) | def num_output_predictions(self):
    method chunk_size (line 76) | def chunk_size(self):
    method feature_encoders (line 79) | def feature_encoders(self, data_dir):
    method num_shards (line 88) | def num_shards(self):
    method generate_data (line 91) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method hparams (line 143) | def hparams(self, defaults, unused_model_hparams):
    method example_reading_spec (line 152) | def example_reading_spec(self):
    method preprocess_example (line 160) | def preprocess_example(self, example, mode, unused_hparams):
    method eval_metrics (line 172) | def eval_metrics(self):
  class GenomicsExpressionCage10 (line 177) | class GenomicsExpressionCage10(GeneExpressionProblem):
    method download_url (line 180) | def download_url(self):
    method h5_file (line 184) | def h5_file(self):
  class GenomicsExpressionGm12878 (line 189) | class GenomicsExpressionGm12878(GeneExpressionProblem):
    method download_url (line 192) | def download_url(self):
    method h5_file (line 196) | def h5_file(self):
  class GenomicsExpressionL262k (line 201) | class GenomicsExpressionL262k(GeneExpressionProblem):
    method h5_file (line 204) | def h5_file(self):
  function generate_shard_args (line 208) | def generate_shard_args(outfiles, num_examples):
  function generate_dataset (line 219) | def generate_dataset(h5_filepath,
  function dataset_generator (line 232) | def dataset_generator(filepath,
  function to_example_dict (line 263) | def to_example_dict(encoder, inputs, mask, outputs):

FILE: tensor2tensor/data_generators/gene_expression_test.py
  class GeneticsTest (line 28) | class GeneticsTest(tf.test.TestCase):
    method _one_hot_bases (line 30) | def _one_hot_bases(self, bases):
    method testRecordToExample (line 40) | def testRecordToExample(self):
    method testGenerateShardArgs (line 58) | def testGenerateShardArgs(self):

FILE: tensor2tensor/data_generators/generator_utils.py
  function to_example (line 47) | def to_example(dictionary):
  function generate_files_distributed (line 74) | def generate_files_distributed(generator,
  function _data_filenames (line 101) | def _data_filenames(output_name, output_dir, num_shards):
  function train_data_filenames (line 108) | def train_data_filenames(problem, output_dir, num_shards):
  function dev_data_filenames (line 112) | def dev_data_filenames(problem, output_dir, num_shards):
  function test_data_filenames (line 116) | def test_data_filenames(problem, output_dir, num_shards):
  function combined_data_filenames (line 120) | def combined_data_filenames(problem, output_dir, num_training_shards):
  function sharded_name (line 126) | def sharded_name(base_name, shard, total_shards):
  function shard_filepath (line 130) | def shard_filepath(fname, num_shards):
  function outputs_exist (line 136) | def outputs_exist(filenames):
  function generate_files (line 143) | def generate_files(generator, output_filenames,
  function download_report_hook (line 205) | def download_report_hook(count, block_size, total_size):
  function maybe_download (line 217) | def maybe_download(directory, filename, uri):
  function maybe_download_from_drive (line 260) | def maybe_download_from_drive(directory, filename, url):
  function gunzip_file (line 310) | def gunzip_file(gz_path, new_path):
  function get_or_generate_vocab_inner (line 330) | def get_or_generate_vocab_inner(data_dir, vocab_filename, vocab_size,
  function get_or_generate_vocab (line 370) | def get_or_generate_vocab(data_dir, tmp_dir, vocab_filename, vocab_size,
  function generate_lines_for_vocab (line 380) | def generate_lines_for_vocab(tmp_dir, sources, file_byte_budget=1e6):
  function get_or_generate_tabbed_vocab (line 425) | def get_or_generate_tabbed_vocab(data_dir, tmp_dir, source_filename,
  function get_or_generate_txt_vocab (line 459) | def get_or_generate_txt_vocab(data_dir, vocab_filename, vocab_size,
  function read_records (line 477) | def read_records(filename):
  function write_records (line 487) | def write_records(records, out_filename):
  function generate_dataset_and_shuffle (line 496) | def generate_dataset_and_shuffle(train_gen,
  function _shuffle_single (line 508) | def _shuffle_single(fname, extra_fn=None):
  function shuffle_dataset (line 525) | def shuffle_dataset(filenames, extra_fn=None):
  class SequencePacker (line 542) | class SequencePacker(object):
    method __init__ (line 548) | def __init__(self, first_sequence, spacing=2):
    method add (line 554) | def add(self, ids):
    method can_fit (line 561) | def can_fit(self, ids, packed_length):
    method to_dict (line 564) | def to_dict(self):
  class SequencePairPacker (line 571) | class SequencePairPacker(object):
    method __init__ (line 577) | def __init__(self, first_sequence_pair, spacing=2):
    method add (line 581) | def add(self, pair):
    method can_fit (line 585) | def can_fit(self, pair, packed_length):
    method to_dict (line 589) | def to_dict(self):
  function pack_examples (line 598) | def pack_examples(examples,
  function pack_dataset (line 672) | def pack_dataset(dataset, length, keys=None, use_custom_ops=False):
  function _pack_with_custom_ops (line 734) | def _pack_with_custom_ops(dataset, keys, length):
  class SequenceDatasetPacker (line 783) | class SequenceDatasetPacker(object):
    method __init__ (line 796) | def __init__(self, packed_length=256, spacing=0, queue_size=10,
    method __call__ (line 805) | def __call__(self, dataset, **kwargs):
    method _concurrent_pack (line 810) | def _concurrent_pack(self, dataset, window_size=None, cycle_length=None,
    method _pack (line 833) | def _pack(self, dataset, window_size=None, cycle_length=None,
    method _standardize (line 869) | def _standardize(self, dataset, keys):
    method _eviction_fn (line 900) | def _eviction_fn(self, _):
    method _scan_initial_state (line 904) | def _scan_initial_state(self):
    method _scanning_pack (line 961) | def _scanning_pack(self, dataset):
    method _compute_auxiliary_structure (line 986) | def _compute_auxiliary_structure(self, contents_and_mask):
    method _finalize (line 1009) | def _finalize(self, _, contents):
  function _scan_step_fn (line 1028) | def _scan_step_fn(state, example, packed_length, queue_size, spacing,
  function make_tmp_dir (line 1156) | def make_tmp_dir(suffix="", prefix="tmp", dir=None):  # pylint: disable=...
  function tfrecord_iterator_for_problem (line 1171) | def tfrecord_iterator_for_problem(problem, data_dir,
  function tfrecord_iterator (line 1179) | def tfrecord_iterator(filenames, gzipped=False, example_spec=None):
  function random_deinterleave (line 1219) | def random_deinterleave(text, separator_symbol="X"):

FILE: tensor2tensor/data_generators/generator_utils_test.py
  function example_generator (line 69) | def example_generator():
  function trim_right (line 74) | def trim_right(x):
  function reference_packing (line 81) | def reference_packing(trim_fn=None):
  class GeneratorUtilsTest (line 94) | class GeneratorUtilsTest(tf.test.TestCase):
    method testGenerateFiles (line 96) | def testGenerateFiles(self):
    method testMaybeDownload (line 113) | def testMaybeDownload(self):
    method testMaybeDownloadFromDrive (line 127) | def testMaybeDownloadFromDrive(self):
    method testGunzipFile (line 141) | def testGunzipFile(self):
    method testGetOrGenerateTxtVocab (line 162) | def testGetOrGenerateTxtVocab(self):
    method testPacking (line 184) | def testPacking(self):
    method testDatasetPacking (line 193) | def testDatasetPacking(self):

FILE: tensor2tensor/data_generators/google_robot_pushing.py
  function PIL_Image (line 47) | def PIL_Image():  # pylint: disable=invalid-name
  class VideoGoogleRobotPushing (line 53) | class VideoGoogleRobotPushing(video_utils.VideoProblem):
    method num_channels (line 57) | def num_channels(self):
    method frame_height (line 61) | def frame_height(self):
    method frame_width (line 65) | def frame_width(self):
    method total_number_of_frames (line 69) | def total_number_of_frames(self):
    method max_number_of_frames_per_video (line 74) | def max_number_of_frames_per_video(self):
    method is_generate_per_split (line 78) | def is_generate_per_split(self):
    method parse_frames (line 81) | def parse_frames(self, filename):
    method get_urls (line 113) | def get_urls(self, count, url_part):
    method generate_samples (line 117) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method hparams (line 134) | def hparams(self, defaults, unused_model_hparams):

FILE: tensor2tensor/data_generators/gym_env.py
  class Observation (line 49) | class Observation(object):
    method __init__ (line 57) | def __init__(self, data, decode_fn):
    method __eq__ (line 61) | def __eq__(self, other):
    method __ne__ (line 68) | def __ne__(self, other):
    method decode (line 72) | def decode(self):
  class _Noncopyable (line 77) | class _Noncopyable(object):
    method __init__ (line 79) | def __init__(self, obj):
    method __deepcopy__ (line 82) | def __deepcopy__(self, memo):
  class EnvSimulationProblem (line 86) | class EnvSimulationProblem(video_utils.VideoProblem):
    method num_actions (line 99) | def num_actions(self):
    method num_rewards (line 103) | def num_rewards(self):
    method hparams (line 107) | def hparams(self, defaults, unused_model_hparams):
  class T2TEnv (line 129) | class T2TEnv(EnvSimulationProblem):
    method __init__ (line 149) | def __init__(self, batch_size, *args, **kwargs):
    method __str__ (line 172) | def __str__(self):
    method start_new_epoch (line 176) | def start_new_epoch(self, epoch, load_data_dir=None):
    method current_epoch_rollouts (line 189) | def current_epoch_rollouts(self, split=None, minimal_rollout_frames=0):
    method _preprocess_observations (line 215) | def _preprocess_observations(self, obs):
    method _decode_png (line 228) | def _decode_png(self, encoded_observation):
    method _encode_observations (line 235) | def _encode_observations(self, observations):
    method _step (line 248) | def _step(self, actions):
    method step (line 265) | def step(self, actions):
    method _reset (line 304) | def _reset(self, indices):
    method reset (line 315) | def reset(self, indices=None):
    method close (line 354) | def close(self):
    method num_channels (line 362) | def num_channels(self):
    method eval_metrics (line 366) | def eval_metrics(self):
    method extra_reading_spec (line 374) | def extra_reading_spec(self):
    method frame_height (line 387) | def frame_height(self):
    method frame_width (line 391) | def frame_width(self):
    method only_keep_videos_from_0th_frame (line 395) | def only_keep_videos_from_0th_frame(self):
    method _generate_frames (line 398) | def _generate_frames(self, rollouts):
    method _calc_num_frames (line 415) | def _calc_num_frames(rollouts):
    method _split_current_epoch (line 418) | def _split_current_epoch(self):
    method splits_and_paths (line 465) | def splits_and_paths(self, data_dir):
    method filepattern (line 487) | def filepattern(self, data_dir, mode, shard=None, only_last=False):
    method generate_data (line 495) | def generate_data(self, data_dir, tmp_dir=None, task_id=-1):
    method _load_epoch_data (line 520) | def _load_epoch_data(self, data_dir):
    method _load_epoch_split (line 535) | def _load_epoch_split(self, split, paths):
  class T2TGymEnv (line 585) | class T2TGymEnv(T2TEnv):
    method __init__ (line 594) | def __init__(self, base_env_name=None, batch_size=1, grayscale=False,
    method hparams (line 660) | def hparams(self, defaults, unused_model_hparams):
    method new_like (line 669) | def new_like(self, **kwargs):
    method base_env_name (line 684) | def base_env_name(self):
    method num_channels (line 688) | def num_channels(self):
    method _derive_observation_space (line 692) | def _derive_observation_space(self, orig_observ_space):
    method __str__ (line 703) | def __str__(self):
    method _encode_observations (line 706) | def _encode_observations(self, observations):
    method _preprocess_observations (line 711) | def _preprocess_observations(self, observations):
    method state (line 720) | def state(self):
    method set_initial_state (line 724) | def set_initial_state(self, initial_state, initial_frames):
    method _step (line 730) | def _step(self, actions):
    method _reset (line 736) | def _reset(self, indices):
    method close (line 765) | def close(self):
  class DummyWorldModelProblem (line 770) | class DummyWorldModelProblem(EnvSimulationProblem):
    method __init__ (line 773) | def __init__(self, action_space, reward_range, frame_height, frame_wid...
    method frame_height (line 781) | def frame_height(self):
    method frame_width (line 786) | def frame_width(self):
  function register_game (line 885) | def register_game(game_name, game_mode="NoFrameskip-v4"):

FILE: tensor2tensor/data_generators/gym_env_test.py
  class TestEnv (line 37) | class TestEnv(gym.Env):
    method __init__ (line 50) | def __init__(self):
    method _generate_ob (line 53) | def _generate_ob(self):
    method step (line 56) | def step(self, action):
    method reset (line 62) | def reset(self):
  class GymEnvTest (line 70) | class GymEnvTest(tf.test.TestCase):
    method setUp (line 78) | def setUp(self):
    method init_batch_and_play (line 84) | def init_batch_and_play(self, env_name, steps_per_epoch=1, epochs=(0,),
    method play (line 102) | def play(self, env, n_steps):
    method test_splits_dataset (line 117) | def test_splits_dataset(self):
    method test_split_preserves_number_of_rollouts (line 125) | def test_split_preserves_number_of_rollouts(self):
    method test_split_preserves_number_of_frames (line 141) | def test_split_preserves_number_of_frames(self):
    method test_generates_data (line 158) | def test_generates_data(self):
    method test_shards_per_epoch (line 171) | def test_shards_per_epoch(self):
    method test_frame_numbers_are_continuous (line 195) | def test_frame_numbers_are_continuous(self):
    method test_clipping (line 217) | def test_clipping(self):
    method test_resize (line 223) | def test_resize(self):
    method test_no_resize_option (line 240) | def test_no_resize_option(self):
    method assert_channels (line 258) | def assert_channels(self, env, obs, n_channels):
    method test_channels (line 265) | def test_channels(self):
    method test_generating_and_loading_preserves_rollouts (line 273) | def test_generating_and_loading_preserves_rollouts(self):

FILE: tensor2tensor/data_generators/ice_parsing.py
  function tabbed_parsing_token_generator (line 37) | def tabbed_parsing_token_generator(data_dir, tmp_dir, train, prefix,
  function tabbed_parsing_character_generator (line 53) | def tabbed_parsing_character_generator(tmp_dir, train):
  class ParsingIcelandic16k (line 63) | class ParsingIcelandic16k(problem.Problem):
    method source_vocab_size (line 67) | def source_vocab_size(self):
    method targeted_vocab_size (line 71) | def targeted_vocab_size(self):
    method input_space_id (line 75) | def input_space_id(self):
    method target_space_id (line 79) | def target_space_id(self):
    method num_shards (line 83) | def num_shards(self):
    method feature_encoders (line 86) | def feature_encoders(self, data_dir):
    method generate_data (line 98) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method hparams (line 109) | def hparams(self, defaults, unused_model_hparams):

FILE: tensor2tensor/data_generators/image_lsun.py
  function pil_image (line 35) | def pil_image():
  function _get_lsun (line 40) | def _get_lsun(directory, category, split_name):
  class ImageLsunBedrooms (line 48) | class ImageLsunBedrooms(image_utils.ImageProblem):
    method num_channels (line 52) | def num_channels(self):
    method generate_data (line 56) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method read_and_convert_to_png (line 64) | def read_and_convert_to_png(self, tmp_dir, split_name):

FILE: tensor2tensor/data_generators/image_utils.py
  function matplotlib_pyplot (line 37) | def matplotlib_pyplot():
  function image_to_tf_summary_value (line 44) | def image_to_tf_summary_value(image, tag):
  function convert_predictions_to_image_summaries (line 66) | def convert_predictions_to_image_summaries(hook_args):
  function resize_by_area (line 92) | def resize_by_area(img, size):
  function make_multiscale (line 98) | def make_multiscale(image, resolutions,
  function make_multiscale_dilated (line 126) | def make_multiscale_dilated(image, resolutions, num_channels=3):
  class ImageProblem (line 155) | class ImageProblem(problem.Problem):
    method num_channels (line 159) | def num_channels(self):
    method vocab_size (line 164) | def vocab_size(self):
    method example_reading_spec (line 168) | def example_reading_spec(self):
    method preprocess_example (line 184) | def preprocess_example(self, example, mode, hparams):
    method eval_metrics (line 189) | def eval_metrics(self):
    method decode_hooks (line 199) | def decode_hooks(self):
  class Image2ClassProblem (line 203) | class Image2ClassProblem(ImageProblem):
    method is_small (line 207) | def is_small(self):
    method num_classes (line 211) | def num_classes(self):
    method train_shards (line 215) | def train_shards(self):
    method dev_shards (line 219) | def dev_shards(self):
    method class_labels (line 223) | def class_labels(self):
    method feature_encoders (line 226) | def feature_encoders(self, data_dir):
    method generator (line 233) | def generator(self, data_dir, tmp_dir, is_training):
    method example_reading_spec (line 236) | def example_reading_spec(self):
    method hparams (line 246) | def hparams(self, defaults, unused_model_hparams):
    method generate_data (line 259) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  function encode_images_as_png (line 267) | def encode_images_as_png(images):
  function image_generator (line 283) | def image_generator(images, labels):
  class Image2TextProblem (line 315) | class Image2TextProblem(ImageProblem):
    method is_character_level (line 319) | def is_character_level(self):
    method vocab_problem (line 323) | def vocab_problem(self):
    method target_space_id (line 327) | def target_space_id(self):
    method train_shards (line 331) | def train_shards(self):
    method dev_shards (line 335) | def dev_shards(self):
    method generator (line 338) | def generator(self, data_dir, tmp_dir, is_training):
    method example_reading_spec (line 341) | def example_reading_spec(self):
    method feature_encoders (line 350) | def feature_encoders(self, data_dir):
    method hparams (line 360) | def hparams(self, defaults, unused_model_hparams):
    method generate_data (line 371) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  function image_augmentation (line 379) | def image_augmentation(images, do_colors=False, crop_size=None):
  function cifar_image_augmentation (line 393) | def cifar_image_augmentation(images):
  function random_shift (line 409) | def random_shift(image, wsr=0.1, hsr=0.1):

FILE: tensor2tensor/data_generators/image_utils_test.py
  class ImageTest (line 28) | class ImageTest(tf.test.TestCase):
    method testImageAugmentation (line 30) | def testImageAugmentation(self):
    method testImageGenerator (line 37) | def testImageGenerator(self):
    method testMakeMultiscaleDivisible (line 75) | def testMakeMultiscaleDivisible(self):
    method testMakeMultiscaleIndivisible (line 84) | def testMakeMultiscaleIndivisible(self):
    method testMakeMultiscaleLarger (line 90) | def testMakeMultiscaleLarger(self):
    method testMakeMultiscaleDilatedDivisible (line 96) | def testMakeMultiscaleDilatedDivisible(self):
    method testMakeMultiscaleDilatedIndivisible (line 105) | def testMakeMultiscaleDilatedIndivisible(self):
    method testMakeMultiscaleDilatedLarger (line 111) | def testMakeMultiscaleDilatedLarger(self):
    method testRandomShift (line 117) | def testRandomShift(self):
    method testImageToSummaryValue (line 122) | def testImageToSummaryValue(self):
    method testConvertPredictionsToImageSummaries (line 128) | def testConvertPredictionsToImageSummaries(self):

FILE: tensor2tensor/data_generators/imagenet.py
  function imagenet_pixelrnn_generator (line 57) | def imagenet_pixelrnn_generator(tmp_dir,
  function imagenet_preprocess_example (line 102) | def imagenet_preprocess_example(example, mode, resize_size=None,
  class ImageImagenet (line 121) | class ImageImagenet(image_utils.Image2ClassProblem):
    method is_small (line 125) | def is_small(self):
    method num_classes (line 129) | def num_classes(self):
    method generate_data (line 132) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method preprocess_example (line 138) | def preprocess_example(self, example, mode, _):
  class ImageImagenetRescaled (line 142) | class ImageImagenetRescaled(ImageImagenet):
    method rescale_size (line 146) | def rescale_size(self):
    method normalize_image (line 151) | def normalize_image(self):
    method dataset_filename (line 155) | def dataset_filename(self):
    method generate_data (line 158) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method preprocess_example (line 162) | def preprocess_example(self, example, mode, _):
  class ImageImagenet224 (line 169) | class ImageImagenet224(ImageImagenetRescaled):
    method rescale_size (line 173) | def rescale_size(self):
  class ImageImagenet224NoNormalization (line 178) | class ImageImagenet224NoNormalization(ImageImagenet224):
    method normalize_image (line 182) | def normalize_image(self):
  class ImageImagenet256 (line 188) | class ImageImagenet256(ImageImagenetRescaled):
    method rescale_size (line 192) | def rescale_size(self):
  class ImageImagenet32 (line 197) | class ImageImagenet32(ImageImagenetRescaled):
    method rescale_size (line 201) | def rescale_size(self):
    method is_small (line 205) | def is_small(self):
    method preprocess_example (line 208) | def preprocess_example(self, example, mode, _):
  class ImageImagenet32Gen (line 222) | class ImageImagenet32Gen(ImageImagenet):
    method train_shards (line 226) | def train_shards(self):
    method dev_shards (line 230) | def dev_shards(self):
    method generate_data (line 233) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method generator (line 240) | def generator(self, data_dir, tmp_dir, is_training):
    method preprocess_example (line 248) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageImagenet64Gen (line 256) | class ImageImagenet64Gen(ImageImagenet):
    method train_shards (line 260) | def train_shards(self):
    method dev_shards (line 264) | def dev_shards(self):
    method generate_data (line 267) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method generator (line 274) | def generator(self, data_dir, tmp_dir, is_training):
    method preprocess_example (line 282) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageImagenetMultiResolutionGen (line 290) | class ImageImagenetMultiResolutionGen(ImageImagenet64Gen):
    method dataset_filename (line 296) | def dataset_filename(self):
    method train_shards (line 300) | def train_shards(self):
    method dev_shards (line 304) | def dev_shards(self):
    method preprocess_example (line 307) | def preprocess_example(self, example, mode, hparams):
  class ImageImagenet64GenFlat (line 336) | class ImageImagenet64GenFlat(ImageImagenet64Gen):
    method dataset_filename (line 339) | def dataset_filename(self):
    method preprocess_example (line 342) | def preprocess_example(self, example, mode, unused_hparams):
    method hparams (line 352) | def hparams(self, defaults, model_hparams):
  class ImageImagenet32Small (line 361) | class ImageImagenet32Small(ImageImagenet):
    method is_small (line 365) | def is_small(self):
    method num_classes (line 369) | def num_classes(self):
    method train_shards (line 373) | def train_shards(self):
    method dev_shards (line 377) | def dev_shards(self):
    method preprocess_example (line 380) | def preprocess_example(self, example, mode, unused_hparams):
  class ImageImagenet64 (line 388) | class ImageImagenet64(ImageImagenet32):
    method rescale_size (line 392) | def rescale_size(self):
  class Img2imgImagenet (line 397) | class Img2imgImagenet(image_utils.ImageProblem):
    method dataset_filename (line 400) | def dataset_filename(self):
    method preprocess_example (line 403) | def preprocess_example(self, example, unused_mode, unused_hparams):
    method generate_data (line 411) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method hparams (line 414) | def hparams(self, defaults, unused_model_hparams):
  function _crop (line 428) | def _crop(image, offset_height, offset_width, crop_height, crop_width):
  function distorted_bounding_box_crop (line 470) | def distorted_bounding_box_crop(image,
  function _random_crop (line 528) | def _random_crop(image, size):
  function _flip (line 547) | def _flip(image):
  function _at_least_x_are_true (line 553) | def _at_least_x_are_true(a, b, x):
  function _do_scale (line 560) | def _do_scale(image, size):
  function _center_crop (line 571) | def _center_crop(image, size):
  function _normalize (line 582) | def _normalize(image):
  function preprocess_for_train (line 592) | def preprocess_for_train(image, image_size=224, normalize=True):
  function preprocess_for_eval (line 611) | def preprocess_for_eval(image, image_size=224, normalize=True):

FILE: tensor2tensor/data_generators/imagenet_test.py
  class ImagenetTest (line 30) | class ImagenetTest(parameterized.TestCase, tf.test.TestCase):
    method testImagenetMultiResolutionPreprocessExample (line 36) | def testImagenetMultiResolutionPreprocessExample(self, resize_method):
    method testImagenetIsNormalized (line 48) | def testImagenetIsNormalized(self):

FILE: tensor2tensor/data_generators/imdb.py
  class SentimentIMDB (line 33) | class SentimentIMDB(text_problems.Text2ClassProblem):
    method is_generate_per_split (line 38) | def is_generate_per_split(self):
    method dataset_splits (line 42) | def dataset_splits(self):
    method approx_vocab_size (line 52) | def approx_vocab_size(self):
    method num_classes (line 56) | def num_classes(self):
    method class_labels (line 59) | def class_labels(self, data_dir):
    method doc_generator (line 63) | def doc_generator(self, imdb_dir, dataset, include_label=False):
    method generate_samples (line 76) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SentimentIMDBCharacters (line 98) | class SentimentIMDBCharacters(SentimentIMDB):
    method vocab_type (line 102) | def vocab_type(self):
    method global_task_id (line 105) | def global_task_id(self):

FILE: tensor2tensor/data_generators/inspect_tfrecord.py
  function main (line 48) | def main(_):

FILE: tensor2tensor/data_generators/lambada.py
  function _prepare_lambada_data (line 57) | def _prepare_lambada_data(tmp_dir, data_dir, vocab_size, vocab_filename):
  function get_dataset_split (line 89) | def get_dataset_split(tmp_dir, split, use_control_set):
  class LambadaLm (line 130) | class LambadaLm(text_problems.Text2SelfProblem):
    method is_generate_per_split (line 134) | def is_generate_per_split(self):
    method dataset_splits (line 143) | def dataset_splits(self):
    method vocab_type (line 161) | def vocab_type(self):
    method vocab_size (line 165) | def vocab_size(self):
    method oov_token (line 170) | def oov_token(self):
    method use_control_set (line 174) | def use_control_set(self):
    method generate_samples (line 178) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class LambadaLmControl (line 211) | class LambadaLmControl(LambadaLm):
    method control_set (line 215) | def control_set(self):
  class LambadaRc (line 221) | class LambadaRc(text_problems.Text2ClassProblem):
    method is_generate_per_split (line 225) | def is_generate_per_split(self):
    method dataset_splits (line 234) | def dataset_splits(self):
    method vocab_type (line 252) | def vocab_type(self):
    method vocab_size (line 256) | def vocab_size(self):
    method oov_token (line 261) | def oov_token(self):
    method use_control_set (line 265) | def use_control_set(self):
    method get_labels_encoder (line 269) | def get_labels_encoder(self, data_dir):
    method generate_samples (line 282) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_encoded_samples (line 316) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method feature_encoders (line 337) | def feature_encoders(self, data_dir):
    method hparams (line 351) | def hparams(self, defaults, unused_model_hparams):
  class LambadaRcControl (line 368) | class LambadaRcControl(LambadaRc):
    method control_set (line 372) | def control_set(self):

FILE: tensor2tensor/data_generators/librispeech.py
  function _collect_data (line 63) | def _collect_data(directory, input_ext, transcription_ext):
  class Librispeech (line 89) | class Librispeech(speech_recognition.SpeechRecognitionProblem):
    method num_shards (line 98) | def num_shards(self):
    method use_subword_tokenizer (line 102) | def use_subword_tokenizer(self):
    method num_dev_shards (line 106) | def num_dev_shards(self):
    method num_test_shards (line 110) | def num_test_shards(self):
    method use_train_shards_for_dev (line 114) | def use_train_shards_for_dev(self):
    method generator (line 118) | def generator(self, data_dir, tmp_dir, datasets,
    method generate_data (line 160) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  class LibrispeechTrainFullTestClean (line 183) | class LibrispeechTrainFullTestClean(Librispeech):
    method training_filepaths (line 186) | def training_filepaths(self, data_dir, num_shards, shuffled):
    method dev_filepaths (line 189) | def dev_filepaths(self, data_dir, num_shards, shuffled):
    method test_filepaths (line 192) | def test_filepaths(self, data_dir, num_shards, shuffled):
    method generate_data (line 195) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method filepattern (line 198) | def filepattern(self, data_dir, mode, shard=None):
  class LibrispeechTrainFullTestOther (line 231) | class LibrispeechTrainFullTestOther(Librispeech):
    method training_filepaths (line 234) | def training_filepaths(self, data_dir, num_shards, shuffled):
    method dev_filepaths (line 237) | def dev_filepaths(self, data_dir, num_shards, shuffled):
    method test_filepaths (line 240) | def test_filepaths(self, data_dir, num_shards, shuffled):
    method generate_data (line 243) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method filepattern (line 246) | def filepattern(self, data_dir, mode, shard=None):
  class LibrispeechCleanSmall (line 279) | class LibrispeechCleanSmall(Librispeech):
  class LibrispeechClean (line 289) | class LibrispeechClean(Librispeech):
  class LibrispeechNoisy (line 299) | class LibrispeechNoisy(Librispeech):
  function add_librispeech_hparams (line 309) | def add_librispeech_hparams(hparams):
  function set_librispeech_length_hparams (line 324) | def set_librispeech_length_hparams(hparams):

FILE: tensor2tensor/data_generators/lm1b.py
  function _original_vocab (line 35) | def _original_vocab(tmp_dir):
  function _replace_oov (line 58) | def _replace_oov(original_vocab, line):
  function _train_data_filenames (line 74) | def _train_data_filenames(tmp_dir):
  function _dev_data_filenames (line 83) | def _dev_data_filenames(tmp_dir):
  function _maybe_download_corpus (line 90) | def _maybe_download_corpus(tmp_dir):
  class LanguagemodelLm1b32k (line 107) | class LanguagemodelLm1b32k(text_problems.Text2SelfProblem):
    method approx_vocab_size (line 115) | def approx_vocab_size(self):
    method max_samples_for_vocab (line 119) | def max_samples_for_vocab(self):
    method is_generate_per_split (line 122) | def is_generate_per_split(self):
    method generate_samples (line 125) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class LanguagemodelLm1b8k (line 142) | class LanguagemodelLm1b8k(LanguagemodelLm1b32k):
    method approx_vocab_size (line 145) | def approx_vocab_size(self):
  class LanguagemodelLm1b32kPacked (line 150) | class LanguagemodelLm1b32kPacked(LanguagemodelLm1b32k):
    method packed_length (line 154) | def packed_length(self):
    method vocab_filename (line 158) | def vocab_filename(self):
  class LanguagemodelLm1b8kPacked (line 163) | class LanguagemodelLm1b8kPacked(LanguagemodelLm1b8k):
    method packed_length (line 171) | def packed_length(self):
    method vocab_filename (line 175) | def vocab_filename(self):
  class LanguagemodelLm1bCharacters (line 180) | class LanguagemodelLm1bCharacters(LanguagemodelLm1b32k):
    method vocab_type (line 188) | def vocab_type(self):
    method global_task_id (line 191) | def global_task_id(self):
  class LanguagemodelLm1bCharactersPacked (line 196) | class LanguagemodelLm1bCharactersPacked(LanguagemodelLm1bCharacters):
    method packed_length (line 204) | def packed_length(self):

FILE: tensor2tensor/data_generators/lm1b_imdb.py
  class LanguagemodelLm1bSentimentIMDB (line 30) | class LanguagemodelLm1bSentimentIMDB(multi_problem.MultiProblem):
    method __init__ (line 33) | def __init__(self, was_reversed=False, was_copy=False):
    method vocab_type (line 39) | def vocab_type(self):

FILE: tensor2tensor/data_generators/lm1b_mnli.py
  class LanguagemodelLm1bMultiNLISubwords (line 30) | class LanguagemodelLm1bMultiNLISubwords(multi_problem.MultiProblem):
    method __init__ (line 33) | def __init__(self, was_reversed=False, was_copy=False):
    method vocab_type (line 40) | def vocab_type(self):
  class LanguagemodelLm1bMultiNLI (line 45) | class LanguagemodelLm1bMultiNLI(multi_problem.MultiProblem):
    method __init__ (line 48) | def __init__(self, was_reversed=False, was_copy=False):
    method vocab_type (line 54) | def vocab_type(self):

FILE: tensor2tensor/data_generators/mnist.py
  function _get_mnist (line 42) | def _get_mnist(directory):
  function _extract_mnist_images (line 51) | def _extract_mnist_images(filename, num_images):
  function _extract_mnist_labels (line 69) | def _extract_mnist_labels(filename, num_labels):
  function mnist_common_generator (line 86) | def mnist_common_generator(tmp_dir,
  function mnist_generator (line 117) | def mnist_generator(tmp_dir, training, how_many, start_from=0):
  class ImageMnistTune (line 136) | class ImageMnistTune(image_utils.Image2ClassProblem):
    method num_channels (line 140) | def num_channels(self):
    method is_small (line 144) | def is_small(self):
    method num_classes (line 148) | def num_classes(self):
    method class_labels (line 152) | def class_labels(self):
    method train_shards (line 156) | def train_shards(self):
    method preprocess_example (line 159) | def preprocess_example(self, example, mode, unused_hparams):
    method generator (line 167) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageMnist (line 175) | class ImageMnist(ImageMnistTune):
    method generator (line 177) | def generator(self, data_dir, tmp_dir, is_training):
  function _get_fashion_mnist (line 191) | def _get_fashion_mnist(directory):
  function fashion_mnist_generator (line 204) | def fashion_mnist_generator(tmp_dir, training, how_many, start_from=0):
  class ImageFashionMnist (line 225) | class ImageFashionMnist(image_utils.Image2ClassProblem):
    method is_small (line 229) | def is_small(self):
    method num_channels (line 233) | def num_channels(self):
    method num_classes (line 237) | def num_classes(self):
    method class_labels (line 241) | def class_labels(self):
    method train_shards (line 245) | def train_shards(self):
    method preprocess_example (line 248) | def preprocess_example(self, example, mode, unused_hparams):
    method generator (line 254) | def generator(self, data_dir, tmp_dir, is_training):

FILE: tensor2tensor/data_generators/moving_mnist.py
  class VideoMovingMnist (line 52) | class VideoMovingMnist(video_utils.VideoProblem):
    method num_channels (line 56) | def num_channels(self):
    method frame_height (line 60) | def frame_height(self):
    method frame_width (line 64) | def frame_width(self):
    method is_generate_per_split (line 68) | def is_generate_per_split(self):
    method total_number_of_frames (line 73) | def total_number_of_frames(self):
    method max_frames_per_video (line 76) | def max_frames_per_video(self, hparams):
    method random_skip (line 80) | def random_skip(self):
    method dataset_splits (line 84) | def dataset_splits(self):
    method extra_reading_spec (line 92) | def extra_reading_spec(self):
    method hparams (line 103) | def hparams(self, defaults, unused_model_hparams):
    method get_test_iterator (line 110) | def get_test_iterator(self, tmp_dir):
    method map_fn (line 120) | def map_fn(self, image, label):
    method get_train_iterator (line 125) | def get_train_iterator(self):
    method generate_samples (line 133) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/mrpc.py
  class MSRParaphraseCorpus (line 34) | class MSRParaphraseCorpus(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 49) | def is_generate_per_split(self):
    method dataset_splits (line 53) | def dataset_splits(self):
    method approx_vocab_size (line 66) | def approx_vocab_size(self):
    method num_classes (line 70) | def num_classes(self):
    method class_labels (line 73) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 77) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 94) | def example_generator(self, filename, dev_ids, dataset_split):
    method generate_samples (line 111) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class MSRParaphraseCorpusCharacters (line 128) | class MSRParaphraseCorpusCharacters(MSRParaphraseCorpus):
    method vocab_type (line 132) | def vocab_type(self):
    method global_task_id (line 135) | def global_task_id(self):

FILE: tensor2tensor/data_generators/mscoco.py
  function _get_mscoco (line 49) | def _get_mscoco(directory):
  function mscoco_generator (line 60) | def mscoco_generator(data_dir,
  class ImageMsCocoCharacters (line 146) | class ImageMsCocoCharacters(image_utils.Image2TextProblem):
    method is_character_level (line 150) | def is_character_level(self):
    method target_space_id (line 154) | def target_space_id(self):
    method train_shards (line 158) | def train_shards(self):
    method dev_shards (line 162) | def dev_shards(self):
    method preprocess_example (line 165) | def preprocess_example(self, example, mode, _):
    method generator (line 168) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageMsCocoTokens32k (line 177) | class ImageMsCocoTokens32k(ImageMsCocoCharacters):
    method is_character_level (line 181) | def is_character_level(self):
    method vocab_problem (line 185) | def vocab_problem(self):
    method target_space_id (line 189) | def target_space_id(self):
    method train_shards (line 193) | def train_shards(self):
    method dev_shards (line 197) | def dev_shards(self):
    method generator (line 200) | def generator(self, data_dir, tmp_dir, is_training):
  class ImageTextMsCocoMultiResolution (line 222) | class ImageTextMsCocoMultiResolution(ImageMsCocoTokens32k):
    method dataset_filename (line 225) | def dataset_filename(self):
    method preprocess_example (line 228) | def preprocess_example(self, example, mode, hparams):
  class ImageTextMsCoco (line 257) | class ImageTextMsCoco(ImageMsCocoTokens32k):
    method dataset_filename (line 261) | def dataset_filename(self):
    method preprocess_example (line 264) | def preprocess_example(self, example, mode, unused_hparams):

FILE: tensor2tensor/data_generators/mscoco_test.py
  class MscocoTest (line 30) | class MscocoTest(parameterized.TestCase, tf.test.TestCase):
    method testMsCocoMultiResolutionPreprocessExample (line 36) | def testMsCocoMultiResolutionPreprocessExample(self, resize_method):

FILE: tensor2tensor/data_generators/multi_problem.py
  class MixingSchedule (line 32) | class MixingSchedule(object):
  function normalize_example_nlp (line 39) | def normalize_example_nlp(task, example, is_infer, vocab_type, vocab_off...
  function flatten_zip_dataset (line 112) | def flatten_zip_dataset(*args):
  class MultiProblem (line 132) | class MultiProblem(problem.Problem):
    method __init__ (line 137) | def __init__(self, was_reversed=False, was_copy=False):
    method generate_data (line 141) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method normalize_example (line 146) | def normalize_example(self, task, example, encoder, hparams, is_infer):
    method filepattern (line 157) | def filepattern(self, data_dir, mode, shard=None):
    method get_hparams (line 161) | def get_hparams(self, model_hparams=None):
    method dataset (line 179) | def dataset(self,
    method eval_metrics (line 375) | def eval_metrics(self):
    method update_task_ids (line 386) | def update_task_ids(self, encoder_vocab_size):
    method get_max_num_classes (line 400) | def get_max_num_classes(self):
  function aggregate_task_losses (line 420) | def aggregate_task_losses(hparams,
  function aggregate_task_lm_losses (line 526) | def aggregate_task_lm_losses(hparams,

FILE: tensor2tensor/data_generators/multi_problem_v2.py
  class MultiProblemV2 (line 67) | class MultiProblemV2(problem.Problem):
    method __init__ (line 70) | def __init__(self, problems, schedule, **kwargs):
    method filepattern (line 82) | def filepattern(self, *args, **kwargs):
    method generate_data (line 86) | def generate_data(self, *args, **kwargs):
    method only_eval_first_problem (line 92) | def only_eval_first_problem(self):
    method normalize_example (line 96) | def normalize_example(self, example, hparams):
    method dataset (line 101) | def dataset(self, mode, hparams=None, global_step=None, **kwargs):
  class MultiText2TextProblem (line 136) | class MultiText2TextProblem(MultiProblemV2, text_problems.Text2TextProbl...
    method normalize_example (line 139) | def normalize_example(self, example, hparams):
    method generate_data_with_shared_vocab (line 183) | def generate_data_with_shared_vocab(self, data_dir, tmp_dir, task_id=-1):
    method packed_length (line 200) | def packed_length(self):
  function get_multi_dataset (line 205) | def get_multi_dataset(datasets, pmf=None):
  function get_schedule_distribution (line 226) | def get_schedule_distribution(schedule, global_step=None):
  function categorical_case (line 256) | def categorical_case(pmf, fns, rand=None):
  function linear_interpolation (line 274) | def linear_interpolation(x, xp, fp, **kwargs):
  function step_interpolation (line 297) | def step_interpolation(x, xp, fp, **kwargs):
  function constant_schedule (line 328) | def constant_schedule(pmf):
  function example_rates_to_pmf (line 341) | def example_rates_to_pmf(example_rates):
  function epoch_rates_to_pmf (line 353) | def epoch_rates_to_pmf(problems, epoch_rates=None):
  function encode_schedule (line 378) | def encode_schedule(schedule):
  function decode_schedule (line 397) | def decode_schedule(string):
  function tuplize (line 413) | def tuplize(nested):

FILE: tensor2tensor/data_generators/multi_problem_v2_test.py
  class MultiProblemV2Test (line 30) | class MultiProblemV2Test(parameterized.TestCase, tf.test.TestCase):
    method test_tuplize (line 42) | def test_tuplize(self, inputs, targets):
    method test_encode_decode_schedule (line 59) | def test_encode_decode_schedule(self, schedule, string):
    method test_linear_interpolation (line 78) | def test_linear_interpolation(self, x, xp, fp, y):
    method test_step_interpolation (line 96) | def test_step_interpolation(self, x, xp, fp, y):
    method test_get_schedule_distribution (line 115) | def test_get_schedule_distribution(self, schedule, steps, pmfs):
    method test_categorical_case (line 138) | def test_categorical_case(self, pmf, fns, rands, targets):
    method test_get_multi_dataset (line 161) | def test_get_multi_dataset(self, pmf, num_datasets, sample_size):
    method test_multi_problem_v2 (line 186) | def test_multi_problem_v2(self, schedule, num_datasets, sample_size):

FILE: tensor2tensor/data_generators/multinli.py
  function _maybe_download_corpora (line 42) | def _maybe_download_corpora(tmp_dir):
  function _example_generator (line 62) | def _example_generator(filename):
  class MultiNLI (line 83) | class MultiNLI(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 87) | def is_generate_per_split(self):
    method dataset_splits (line 91) | def dataset_splits(self):
    method approx_vocab_size (line 101) | def approx_vocab_size(self):
    method num_classes (line 105) | def num_classes(self):
    method class_labels (line 108) | def class_labels(self, data_dir):
    method generate_samples (line 113) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class MultiNLIText2text (line 132) | class MultiNLIText2text(text_problems.Text2TextProblem):
    method is_generate_per_split (line 136) | def is_generate_per_split(self):
    method approx_vocab_size (line 140) | def approx_vocab_size(self):
    method generate_samples (line 143) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class MultiNLIText2textMulti64kPacked1k (line 162) | class MultiNLIText2textMulti64kPacked1k(MultiNLIText2text):
    method packed_length (line 166) | def packed_length(self):
    method use_vocab_from_other_problem (line 170) | def use_vocab_from_other_problem(self):
    method num_training_examples (line 174) | def num_training_examples(self):
  class MultiNLICharacters (line 179) | class MultiNLICharacters(MultiNLI):
    method vocab_type (line 183) | def vocab_type(self):
    method global_task_id (line 186) | def global_task_id(self):
  class MultiNLISharedVocab (line 191) | class MultiNLISharedVocab(MultiNLI):
    method use_vocab_from_other_problem (line 195) | def use_vocab_from_other_problem(self):
  class MultiNLIWikiLMSharedVocab (line 200) | class MultiNLIWikiLMSharedVocab(MultiNLI):
    method use_vocab_from_other_problem (line 204) | def use_vocab_from_other_problem(self):
  class MultiNLIWikiLMSharedVocab64k (line 209) | class MultiNLIWikiLMSharedVocab64k(MultiNLIWikiLMSharedVocab):
    method use_vocab_from_other_problem (line 213) | def use_vocab_from_other_problem(self):
  class MultiNLIWikiLMMultiVocab64k (line 218) | class MultiNLIWikiLMMultiVocab64k(MultiNLIWikiLMSharedVocab):
    method use_vocab_from_other_problem (line 222) | def use_vocab_from_other_problem(self):

FILE: tensor2tensor/data_generators/ocr.py
  class OcrTest (line 31) | class OcrTest(image_utils.Image2TextProblem):
    method is_small (line 35) | def is_small(self):
    method is_character_level (line 39) | def is_character_level(self):
    method target_space_id (line 43) | def target_space_id(self):
    method train_shards (line 47) | def train_shards(self):
    method dev_shards (line 51) | def dev_shards(self):
    method preprocess_example (line 54) | def preprocess_example(self, example, mode, _):
    method generator (line 63) | def generator(self, data_dir, tmp_dir, is_training):

FILE: tensor2tensor/data_generators/ops/pack_sequences_ops.cc
  type tensor2tensor (line 9) | namespace tensor2tensor {
    class PackSequences2Op (line 51) | class PackSequences2Op : public OpKernel {
      method PackSequences2Op (line 53) | explicit PackSequences2Op(
      method Compute (line 57) | void Compute(OpKernelContext* ctx) override {
    type PackingSpec (line 244) | struct PackingSpec {
    class PackSequencesKOp (line 255) | class PackSequencesKOp : public OpKernel {
      method PackSequencesKOp (line 257) | explicit PackSequencesKOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
      method Compute (line 266) | void Compute(OpKernelContext* ctx) override {
      method GetInputLengths (line 386) | std::vector<int> GetInputLengths(
      method GetInputLengths (line 407) | std::vector<int> GetInputLengths(
      method GetInputLengths (line 425) | std::vector<int> GetInputLengths(
      method GetInputLengths (line 442) | std::vector<int> GetInputLengths(
      method SetZero (line 461) | void SetZero(OpKernelContext* ctx, Tensor* inputs) {
      method SetZero (line 482) | void SetZero(OpKernelContext* ctx, Tensor* inputs) {
      method PackSequence (line 496) | void PackSequence(OpKernelContext* ctx, const Tensor& inputs, Tensor...
      method PackSequence (line 524) | void PackSequence(OpKernelContext* ctx, const Tensor& inputs, Tensor...
      method PackSequence (line 554) | void PackSequence(OpKernelContext* ctx,
      method PackSequence (line 568) | void PackSequence(OpKernelContext* ctx,

FILE: tensor2tensor/data_generators/ops/pack_sequences_ops_test.py
  function _pack_sequences_k (line 27) | def _pack_sequences_k(inputs, targets, input_max_length, target_max_leng...
  class PackSequencesOpsTest (line 42) | class PackSequencesOpsTest(tf.test.TestCase):
    method do_test_pack_sequences_length3 (line 44) | def do_test_pack_sequences_length3(self, pack_fn):
    method do_test_pack_sequences_length4 (line 91) | def do_test_pack_sequences_length4(self, pack_fn):
    method do_test_pack_sequences_length5 (line 132) | def do_test_pack_sequences_length5(self, pack_fn):
    method do_test_pack_sequences_length6 (line 178) | def do_test_pack_sequences_length6(self, pack_fn):
    method do_test_pack_sequences_length7 (line 212) | def do_test_pack_sequences_length7(self, pack_fn):
    method do_test_pack_sequences_length_different_lengths (line 246) | def do_test_pack_sequences_length_different_lengths(self, pack_fn):
    method test_pack_sequences2 (line 293) | def test_pack_sequences2(self):
    method test_pack_sequences_k (line 302) | def test_pack_sequences_k(self):
    method test_random_inputs (line 310) | def test_random_inputs(self):
    method test_pack_sequences_k_multi_input (line 350) | def test_pack_sequences_k_multi_input(self):
    method test_pack_sequences_k_int64 (line 419) | def test_pack_sequences_k_int64(self):
    method test_pack_sequences_k_bfloat16 (line 448) | def test_pack_sequences_k_bfloat16(self):

FILE: tensor2tensor/data_generators/ops/subword_text_encoder.cc
  type tensor2tensor (line 11) | namespace tensor2tensor {

FILE: tensor2tensor/data_generators/ops/subword_text_encoder.h
  type tensor2tensor (line 10) | namespace tensor2tensor {

FILE: tensor2tensor/data_generators/ops/subword_text_encoder_ops.cc
  type tensor2tensor (line 9) | namespace tensor2tensor {
    class SubwordTextEncoderEncodeOp (line 31) | class SubwordTextEncoderEncodeOp : public OpKernel {
      method SubwordTextEncoderEncodeOp (line 33) | explicit SubwordTextEncoderEncodeOp(
      method Compute (line 40) | void Compute(OpKernelContext* ctx) override {

FILE: tensor2tensor/data_generators/ops/subword_text_encoder_ops_test.py
  class SubwordTextEncoderOpsTest (line 29) | class SubwordTextEncoderOpsTest(tf.test.TestCase):
    method test_subword_text_encoder_encode (line 31) | def test_subword_text_encoder_encode(self):

FILE: tensor2tensor/data_generators/ops/subword_text_encoder_test.cc
  type tensor2tensor (line 7) | namespace tensor2tensor {
    function TEST (line 10) | TEST(SubwordTextEncoderTest, EncodesSubTokens) {
    function TEST (line 18) | TEST(SubwordTextEncoderTest, EncodesUnicodeSubTokens) {
    function TEST (line 26) | TEST(SubwordTextEncoderTest, EncodesUnicodeCodePoints) {
    function TEST (line 34) | TEST(SubwordTextEncoderTest, EncodesCharactersNotInAlphabet) {

FILE: tensor2tensor/data_generators/paraphrase_ms_coco.py
  function create_combination (line 42) | def create_combination(list_of_sentences):
  class ParaphraseGenerationProblem (line 70) | class ParaphraseGenerationProblem(text_problems.Text2TextProblem):
    method bidirectional (line 74) | def bidirectional(self):
    method prepare_data (line 82) | def prepare_data(self, data_dir, tmp_dir, dataset_split):
    method generate_samples (line 85) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class ParaphraseGenerationMsCocoProblem (line 98) | class ParaphraseGenerationMsCocoProblem(ParaphraseGenerationProblem):
    method is_generate_per_split (line 102) | def is_generate_per_split(self):
    method dataset_splits (line 106) | def dataset_splits(self):
    method approx_vocab_size (line 116) | def approx_vocab_size(self):
    method prepare_data (line 119) | def prepare_data(self, data_dir, tmp_dir, dataset_split):
    method _maybe_download (line 133) | def _maybe_download(self, tmp_dir, dataset_split):
    method _get_captions (line 149) | def _get_captions(self, ms_coco_path):
  class ParaphraseGenerationMsCocoProblem2d (line 164) | class ParaphraseGenerationMsCocoProblem2d(
    method bidirectional (line 168) | def bidirectional(self):
  class ParaphraseGenerationMsCocoProblem1d (line 173) | class ParaphraseGenerationMsCocoProblem1d(
    method bidirectional (line 177) | def bidirectional(self):
  class ParaphraseGenerationMsCocoProblem2dCharacters (line 182) | class ParaphraseGenerationMsCocoProblem2dCharacters(
    method vocab_type (line 186) | def vocab_type(self):
  class ParaphraseGenerationMsCocoProblem1dCharacters (line 191) | class ParaphraseGenerationMsCocoProblem1dCharacters(
    method vocab_type (line 195) | def vocab_type(self):

FILE: tensor2tensor/data_generators/paraphrase_ms_coco_test.py
  class ParaphraseGenerationProblemTest (line 29) | class ParaphraseGenerationProblemTest(tf.test.TestCase):
    method testCombinationPairs (line 31) | def testCombinationPairs(self):
    method testBidirectionalTrue (line 42) | def testBidirectionalTrue(self, data, bidirectional):
    method testBidirectionalFalse (line 59) | def testBidirectionalFalse(self, data, bidirectional):

FILE: tensor2tensor/data_generators/pointer_generator_word.py
  class Text2textCopyableTokens (line 31) | class Text2textCopyableTokens(text_problems.Text2textTmpdirTokens):
    method get_or_create_vocab (line 39) | def get_or_create_vocab(self, data_dir, tmp_dir, force_get=False):
    method generate_encoded_samples (line 45) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method text2text_generate_encoded_oovs (line 51) | def text2text_generate_encoded_oovs(self,
    method example_reading_spec (line 71) | def example_reading_spec(self):
  class TokenTextEncoderOov (line 82) | class TokenTextEncoderOov(text_encoder.TokenTextEncoder):
    method encode (line 93) | def encode(self, s):
    method encode_target (line 138) | def encode_target(self, target, source_oovs):
    method decode_oov (line 175) | def decode_oov(self, ids, source_oov):
    method decode_list_oov (line 178) | def decode_list_oov(self, ids, source_oov_id_to_token):

FILE: tensor2tensor/data_generators/problem.py
  class DatasetSplit (line 47) | class DatasetSplit(object):
  class SpaceID (line 53) | class SpaceID(object):
  class TaskID (line 119) | class TaskID(object):
  function default_model_hparams (line 141) | def default_model_hparams():
  function preprocess_example_common (line 150) | def preprocess_example_common(example, mode, hparams):
  class Problem (line 173) | class Problem(object):
    method generate_data (line 232) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method multiprocess_generate (line 236) | def multiprocess_generate(self):
    method num_generate_tasks (line 241) | def num_generate_tasks(self):
    method num_training_examples (line 246) | def num_training_examples(self):
    method prepare_to_generate (line 250) | def prepare_to_generate(self, data_dir, tmp_dir):
    method hparams (line 264) | def hparams(self, defaults, model_hparams):
    method max_length (line 267) | def max_length(self, model_hparams):
    method tpu_batch_size_per_shard (line 280) | def tpu_batch_size_per_shard(self, model_hparams):
    method batch_size_means_tokens (line 294) | def batch_size_means_tokens(self):
    method skip_random_fraction_when_training (line 313) | def skip_random_fraction_when_training(self):
    method dataset_filename (line 321) | def dataset_filename(self):
    method feature_encoders (line 324) | def feature_encoders(self, data_dir):
    method example_reading_spec (line 331) | def example_reading_spec(self):
    method preprocess_example (line 346) | def preprocess_example(self, example, mode, hparams):
    method eval_metrics (line 362) | def eval_metrics(self):
    method all_metrics_fns (line 369) | def all_metrics_fns(self):
    method eval_metric_fns (line 372) | def eval_metric_fns(self, model_hparams):
    method eval_hooks (line 386) | def eval_hooks(self, features, logits, hparams):
    method task_id (line 391) | def task_id(self):
    method set_task_id (line 396) | def set_task_id(self, new_task_id):
    method preprocess (line 404) | def preprocess(self, dataset, mode, hparams, interleave=True):
    method training_filepaths (line 436) | def training_filepaths(self, data_dir, num_shards, shuffled):
    method dev_filepaths (line 443) | def dev_filepaths(self, data_dir, num_shards, shuffled):
    method test_filepaths (line 450) | def test_filepaths(self, data_dir, num_shards, shuffled):
    method data_filepaths (line 457) | def data_filepaths(self, split, output_dir, num_shards, shuffled):
    method filepattern (line 467) | def filepattern(self, data_dir, mode, shard=None):
    method __init__ (line 496) | def __init__(self, was_reversed=False, was_copy=False):
    method was_reversed (line 513) | def was_reversed(self):
    method get_feature_encoders (line 517) | def get_feature_encoders(self, data_dir=None):
    method get_hparams (line 522) | def get_hparams(self, model_hparams=None):
    method maybe_reverse_features (line 553) | def maybe_reverse_features(self, feature_map):
    method maybe_copy_features (line 576) | def maybe_copy_features(self, feature_map):
    method maybe_reverse_and_copy (line 587) | def maybe_reverse_and_copy(self, example):
    method dataset (line 593) | def dataset(self,
    method decode_example (line 710) | def decode_example(self, serialized_example):
    method decode_hooks (line 741) | def decode_hooks(self):
    method has_inputs (line 752) | def has_inputs(self):
    method feature_info (line 756) | def feature_info(self):
    method make_estimator_input_fn (line 792) | def make_estimator_input_fn(self,
    method _dataset_partition (line 814) | def _dataset_partition(self, mode, config, params):
    method input_fn (line 851) | def input_fn(self,
    method export_assets (line 910) | def export_assets(self):
    method serving_input_fn (line 920) | def serving_input_fn(self, hparams, decode_hparams=None, use_tpu=False):
  class FeatureInfo (line 955) | class FeatureInfo(object):
    method __init__ (line 958) | def __init__(self,
  function _copy_problem_hparams (line 969) | def _copy_problem_hparams(p_hparams):
  function _reverse_problem_hparams (line 984) | def _reverse_problem_hparams(p_hparams):
  function _default_hparams (line 1042) | def _default_hparams():
  function problem_hparams_to_features (line 1078) | def problem_hparams_to_features(problem_hparams):

FILE: tensor2tensor/data_generators/problem_hparams.py
  class AudioTimitProblem (line 35) | class AudioTimitProblem(problem.Problem):
    method example_reading_spec (line 38) | def example_reading_spec(self):
    method preprocess_example (line 47) | def preprocess_example(self, example, mode, hparams):
  class AudioTimitCharactersTune (line 60) | class AudioTimitCharactersTune(AudioTimitProblem):
    method feature_encoders (line 63) | def feature_encoders(self, _):
    method hparams (line 69) | def hparams(self, defaults, model_hparams):
  class AudioTimitTokens8kTune (line 78) | class AudioTimitTokens8kTune(AudioTimitProblem):
    method target_vocab_size (line 82) | def target_vocab_size(self):
    method feature_encoders (line 85) | def feature_encoders(self, data_dir):
    method hparams (line 94) | def hparams(self, defaults, model_hparams):
  class AudioTimitTokens8kTest (line 109) | class AudioTimitTokens8kTest(AudioTimitTokens8kTune):
  class ParsingEnglishPtb8k (line 115) | class ParsingEnglishPtb8k(problem.Problem):
    method target_vocab_size (line 119) | def target_vocab_size(self):
    method feature_encoders (line 122) | def feature_encoders(self, data_dir):
    method hparams (line 131) | def hparams(self, defaults, model_hparams):
  class ParsingEnglishPtb16k (line 146) | class ParsingEnglishPtb16k(problem.Problem):
    method vocab_prefix (line 150) | def vocab_prefix(self):
    method inputs_target_vocab_size (line 154) | def inputs_target_vocab_size(self):
    method targets_target_vocab_size (line 158) | def targets_target_vocab_size(self):
    method feature_encoders (line 161) | def feature_encoders(self, data_dir):
    method hparams (line 175) | def hparams(self, defaults, model_hparams):
  class TestProblem (line 187) | class TestProblem(problem.Problem):
    method __init__ (line 190) | def __init__(self, input_vocab_size, target_vocab_size):
    method hparams (line 195) | def hparams(self, defaults, model_hparams):
  function test_problem_hparams (line 203) | def test_problem_hparams(input_vocab_size=None,

FILE: tensor2tensor/data_generators/problem_test.py
  function assert_tensors_equal (line 37) | def assert_tensors_equal(sess, t1, t2, n):
  class ProblemTest (line 53) | class ProblemTest(parameterized.TestCase, tf.test.TestCase):
    method setUpClass (line 56) | def setUpClass(cls):
    method testNoShuffleDeterministic (line 60) | def testNoShuffleDeterministic(self):
    method testNoShufflePreprocess (line 73) | def testNoShufflePreprocess(self):
    method testProblemHparamsModality (line 90) | def testProblemHparamsModality(self):
    method testProblemHparamsInputOnlyModality (line 100) | def testProblemHparamsInputOnlyModality(self):
    method testProblemHparamsTargetOnlyModality (line 115) | def testProblemHparamsTargetOnlyModality(self):
    method testDataFilenames (line 130) | def testDataFilenames(self):
    method testServingInputFnUseTpu (line 155) | def testServingInputFnUseTpu(self):
    method testInputAndTargetVocabSizesAreReversed (line 194) | def testInputAndTargetVocabSizesAreReversed(self):
    method testInputAndTargetModalitiesAreReversed (line 216) | def testInputAndTargetModalitiesAreReversed(self):

FILE: tensor2tensor/data_generators/program_search.py
  class ProgramSearchAlgolisp (line 35) | class ProgramSearchAlgolisp(text_problems.Text2TextProblem):
    method _extract_filename_from_url (line 55) | def _extract_filename_from_url(url):
    method _flatten_target_programs (line 65) | def _flatten_target_programs(iterable):
    method _parse_json_to_dict (line 78) | def _parse_json_to_dict(json_line):
    method is_generate_per_split (line 98) | def is_generate_per_split(self):
    method maybe_download_dataset (line 102) | def maybe_download_dataset(self, tmp_dir, dataset_split):
    method generate_samples (line 116) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/program_search_test.py
  class ProgramSearchAlgolispStub (line 34) | class ProgramSearchAlgolispStub(program_search.ProgramSearchAlgolisp):
    method maybe_download_dataset (line 66) | def maybe_download_dataset(self, tmp_dir, dataset_split):
  class ProgramSearchAlgolispTest (line 76) | class ProgramSearchAlgolispTest(tf.test.TestCase):
    method setUpClass (line 79) | def setUpClass(cls):
    method tearDownClass (line 86) | def tearDownClass(cls):
    method testEndToEnd (line 90) | def testEndToEnd(self):

FILE: tensor2tensor/data_generators/ptb.py
  function _read_words (line 39) | def _read_words(filename):
  function _build_vocab (line 48) | def _build_vocab(filename, vocab_path, vocab_size):
  function _get_token_encoder (line 67) | def _get_token_encoder(vocab_dir, vocab_name, filename):
  function _maybe_download_corpus (line 75) | def _maybe_download_corpus(tmp_dir, vocab_type):
  class LanguagemodelPtb10k (line 111) | class LanguagemodelPtb10k(text_problems.Text2SelfProblem):
    method dataset_splits (line 115) | def dataset_splits(self):
    method is_generate_per_split (line 125) | def is_generate_per_split(self):
    method vocab_filename (line 129) | def vocab_filename(self):
    method vocab_type (line 133) | def vocab_type(self):
    method generate_samples (line 136) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class LanguagemodelPtbCharacters (line 164) | class LanguagemodelPtbCharacters(LanguagemodelPtb10k):
    method vocab_type (line 168) | def vocab_type(self):

FILE: tensor2tensor/data_generators/qnli.py
  class QuestionNLI (line 35) | class QuestionNLI(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 45) | def is_generate_per_split(self):
    method dataset_splits (line 49) | def dataset_splits(self):
    method approx_vocab_size (line 59) | def approx_vocab_size(self):
    method num_classes (line 63) | def num_classes(self):
    method class_labels (line 66) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 71) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 83) | def example_generator(self, filename):
    method generate_samples (line 96) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class QuestionNLICharacters (line 109) | class QuestionNLICharacters(QuestionNLI):
    method vocab_type (line 113) | def vocab_type(self):
    method global_task_id (line 116) | def global_task_id(self):

FILE: tensor2tensor/data_generators/quora_qpairs.py
  class QuoraQuestionPairs (line 35) | class QuoraQuestionPairs(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 45) | def is_generate_per_split(self):
    method dataset_splits (line 49) | def dataset_splits(self):
    method approx_vocab_size (line 59) | def approx_vocab_size(self):
    method num_classes (line 63) | def num_classes(self):
    method class_labels (line 66) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 70) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 82) | def example_generator(self, filename):
    method generate_samples (line 102) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class QuoraQuestionPairsCharacters (line 115) | class QuoraQuestionPairsCharacters(QuoraQuestionPairs):
    method vocab_type (line 119) | def vocab_type(self):
    method global_task_id (line 122) | def global_task_id(self):

FILE: tensor2tensor/data_generators/rte.py
  class RTE (line 35) | class RTE(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 45) | def is_generate_per_split(self):
    method dataset_splits (line 49) | def dataset_splits(self):
    method approx_vocab_size (line 59) | def approx_vocab_size(self):
    method num_classes (line 63) | def num_classes(self):
    method class_labels (line 66) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 71) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 83) | def example_generator(self, filename):
    method generate_samples (line 96) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class RTECharacters (line 109) | class RTECharacters(RTE):
    method vocab_type (line 113) | def vocab_type(self):
    method global_task_id (line 116) | def global_task_id(self):

FILE: tensor2tensor/data_generators/scitail.py
  class SciTail (line 36) | class SciTail(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 44) | def is_generate_per_split(self):
    method dataset_splits (line 48) | def dataset_splits(self):
    method approx_vocab_size (line 58) | def approx_vocab_size(self):
    method num_classes (line 62) | def num_classes(self):
    method class_labels (line 65) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 70) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 82) | def example_generator(self, filename):
    method generate_samples (line 95) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SciTailCharacters (line 108) | class SciTailCharacters(SciTail):
    method vocab_type (line 112) | def vocab_type(self):
    method global_task_id (line 115) | def global_task_id(self):
  class SciTailSharedVocab (line 120) | class SciTailSharedVocab(SciTail):
    method vocab_filename (line 124) | def vocab_filename(self):

FILE: tensor2tensor/data_generators/seq2edits.py
  function pointer_top (line 33) | def pointer_top(body_output, targets, model_hparams, vocab_size):
  function pointer_bottom (line 39) | def pointer_bottom(x, model_hparams, vocab_size):
  class Seq2editsGec (line 46) | class Seq2editsGec(text_problems.Text2TextProblem):
    method dataset_filename (line 49) | def dataset_filename(self):
    method vocab_file (line 53) | def vocab_file(self):
    method vocab_filename (line 57) | def vocab_filename(self):
    method error_tag_vocab_file (line 61) | def error_tag_vocab_file(self):
    method feature_encoders (line 64) | def feature_encoders(self, data_dir):
    method hparams (line 75) | def hparams(self, defaults, model_hparams):
    method example_reading_spec (line 114) | def example_reading_spec(self):
  class Seq2editsGecPacked256 (line 123) | class Seq2editsGecPacked256(Seq2editsGec):
    method dataset_filename (line 126) | def dataset_filename(self):
    method packed_length (line 130) | def packed_length(self):
    method max_segment_length (line 134) | def max_segment_length(self):
  class Seq2editsGecNoTags (line 139) | class Seq2editsGecNoTags(Seq2editsGec):
    method dataset_filename (line 142) | def dataset_filename(self):
    method hparams (line 145) | def hparams(self, defaults, model_hparams):
  class Seq2editsGecNoTagsPacked256 (line 151) | class Seq2editsGecNoTagsPacked256(Seq2editsGecPacked256):
    method dataset_filename (line 154) | def dataset_filename(self):
    method hparams (line 157) | def hparams(self, defaults, model_hparams):
  class Seq2editsGecDeep (line 163) | class Seq2editsGecDeep(Seq2editsGec):
    method hparams (line 166) | def hparams(self, defaults, model_hparams):
  class Seq2editsGecDeepPacked256 (line 172) | class Seq2editsGecDeepPacked256(Seq2editsGecPacked256):
    method hparams (line 175) | def hparams(self, defaults, model_hparams):
  class Seq2editsGecDeepNoTags (line 181) | class Seq2editsGecDeepNoTags(Seq2editsGec):
    method hparams (line 184) | def hparams(self, defaults, model_hparams):
  class Seq2editsGecDeepNoTagsPacked256 (line 191) | class Seq2editsGecDeepNoTagsPacked256(Seq2editsGecPacked256):
    method hparams (line 194) | def hparams(self, defaults, model_hparams):
  class Seq2editsTextnorm (line 202) | class Seq2editsTextnorm(Seq2editsGec):
    method dataset_filename (line 205) | def dataset_filename(self):
    method source_vocab_file (line 209) | def source_vocab_file(self):
    method target_vocab_file (line 213) | def target_vocab_file(self):
    method error_tag_vocab_file (line 217) | def error_tag_vocab_file(self):
    method feature_encoders (line 220) | def feature_encoders(self, data_dir):
  class Seq2editsTextnormPacked256 (line 235) | class Seq2editsTextnormPacked256(Seq2editsTextnorm):
    method dataset_filename (line 238) | def dataset_filename(self):
    method packed_length (line 242) | def packed_length(self):
    method max_segment_length (line 246) | def max_segment_length(self):
  class Seq2editsTextnormNoTags (line 251) | class Seq2editsTextnormNoTags(Seq2editsTextnorm):
    method hparams (line 254) | def hparams(self, defaults, model_hparams):
  class Seq2editsTextnormNoTagsPacked256 (line 260) | class Seq2editsTextnormNoTagsPacked256(Seq2editsTextnormPacked256):
    method hparams (line 263) | def hparams(self, defaults, model_hparams):

FILE: tensor2tensor/data_generators/snli.py
  function _download_and_parse_dataset (line 51) | def _download_and_parse_dataset(tmp_dir, train):
  function _get_tokens_and_tags (line 63) | def _get_tokens_and_tags(parse_str):
  function _parse_dataset (line 76) | def _parse_dataset(file_path, tmp_dir, train):
  function _get_or_generate_vocab (line 131) | def _get_or_generate_vocab(tmp_dir, vocab_filename, vocab_size):
  function snli_token_generator (line 149) | def snli_token_generator(tmp_dir, train, vocab_size):

FILE: tensor2tensor/data_generators/speech_recognition.py
  class ByteTextEncoderWithEos (line 35) | class ByteTextEncoderWithEos(text_encoder.ByteTextEncoder):
    method encode (line 38) | def encode(self, s):
  class SpeechRecognitionProblem (line 42) | class SpeechRecognitionProblem(problem.Problem):
    method hparams (line 45) | def hparams(self, defaults, model_hparams):
    method is_character_level (line 73) | def is_character_level(self):
    method input_space_id (line 77) | def input_space_id(self):
    method target_space_id (line 81) | def target_space_id(self):
    method feature_encoders (line 84) | def feature_encoders(self, _):
    method example_reading_spec (line 93) | def example_reading_spec(self):
    method preprocess_example (line 103) | def preprocess_example(self, example, mode, hparams):
    method eval_metrics (line 144) | def eval_metrics(self):

FILE: tensor2tensor/data_generators/squad.py
  function _generate_examples (line 39) | def _generate_examples(tmp_dir, dataset_split):
  class SquadText2text (line 89) | class SquadText2text(text_problems.Text2TextProblem):
    method is_generate_per_split (line 93) | def is_generate_per_split(self):
    method generate_samples (line 96) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SquadText2textMulti64kPacked1k (line 107) | class SquadText2textMulti64kPacked1k(SquadText2text):
    method packed_length (line 111) | def packed_length(self):
    method use_vocab_from_other_problem (line 115) | def use_vocab_from_other_problem(self):
    method num_training_examples (line 119) | def num_training_examples(self):
  class Squad (line 124) | class Squad(text_problems.QuestionAndContext2TextProblem):
    method dataset_splits (line 128) | def dataset_splits(self):
    method is_generate_per_split (line 138) | def is_generate_per_split(self):
    method generate_samples (line 141) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SquadConcat (line 152) | class SquadConcat(Squad):
    method dataset_filename (line 155) | def dataset_filename(self):
    method preprocess_example (line 158) | def preprocess_example(self, example, unused_mode, unused_model_hparams):
    method hparams (line 165) | def hparams(self, defaults, unused_model_hparams):
  class SquadConcatMulti64k (line 174) | class SquadConcatMulti64k(SquadConcat):
    method dataset_splits (line 178) | def dataset_splits(self):
    method preprocess_example (line 187) | def preprocess_example(self, example, unused_mode, unused_model_hparams):
    method dataset_filename (line 195) | def dataset_filename(self):
    method use_vocab_from_other_problem (line 199) | def use_vocab_from_other_problem(self):
  class SquadConcatSharedVocab (line 204) | class SquadConcatSharedVocab(SquadConcatMulti64k):
    method dataset_filename (line 207) | def dataset_filename(self):
    method use_vocab_from_other_problem (line 211) | def use_vocab_from_other_problem(self):
  class SquadConcatPositioned (line 216) | class SquadConcatPositioned(SquadConcat):
    method generate_targets (line 219) | def generate_targets(self, targets, context):
    method generate_encoded_samples (line 231) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/sst_binary.py
  class SentimentSSTBinary (line 35) | class SentimentSSTBinary(text_problems.Text2ClassProblem):
    method is_generate_per_split (line 45) | def is_generate_per_split(self):
    method dataset_splits (line 49) | def dataset_splits(self):
    method approx_vocab_size (line 59) | def approx_vocab_size(self):
    method num_classes (line 63) | def num_classes(self):
    method class_labels (line 66) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 71) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 83) | def example_generator(self, filename):
    method generate_samples (line 93) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class SentimentSSTBinaryCharacters (line 106) | class SentimentSSTBinaryCharacters(SentimentSSTBinary):
    method vocab_type (line 110) | def vocab_type(self):
    method global_task_id (line 113) | def global_task_id(self):

FILE: tensor2tensor/data_generators/stanford_nli.py
  class StanfordNLI (line 37) | class StanfordNLI(text_problems.TextConcat2ClassProblem):
    method is_generate_per_split (line 44) | def is_generate_per_split(self):
    method dataset_splits (line 48) | def dataset_splits(self):
    method approx_vocab_size (line 58) | def approx_vocab_size(self):
    method num_classes (line 62) | def num_classes(self):
    method class_labels (line 65) | def class_labels(self, data_dir):
    method _maybe_download_corpora (line 70) | def _maybe_download_corpora(self, tmp_dir):
    method example_generator (line 82) | def example_generator(self, filename):
    method generate_samples (line 99) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class StanfordNLICharacters (line 112) | class StanfordNLICharacters(StanfordNLI):
    method vocab_type (line 116) | def vocab_type(self):
    method global_task_id (line 119) | def global_task_id(self):
  class StanfordNLISharedVocab (line 124) | class StanfordNLISharedVocab(StanfordNLI):
    method vocab_filename (line 128) | def vocab_filename(self):
  class StanfordNLIWikiLMSharedVocab (line 133) | class StanfordNLIWikiLMSharedVocab(StanfordNLI):
    method vocab_filename (line 137) | def vocab_filename(self):
  class StanfordNLIWikiLMSharedVocab64k (line 142) | class StanfordNLIWikiLMSharedVocab64k(StanfordNLIWikiLMSharedVocab):
    method vocab_filename (line 146) | def vocab_filename(self):

FILE: tensor2tensor/data_generators/style_transfer.py
  class StyleTransferProblemShakespeare (line 57) | class StyleTransferProblemShakespeare(text_problems.Text2TextProblem):
    method target (line 61) | def target(self):
    method source (line 65) | def source(self):
    method dataset_url (line 68) | def dataset_url(self, dataset_split):
    method vocab_data_files (line 74) | def vocab_data_files(self):
    method approx_vocab_size (line 79) | def approx_vocab_size(self):
    method dataset_splits (line 83) | def dataset_splits(self):
    method is_generate_per_split (line 94) | def is_generate_per_split(self):
    method generate_samples (line 97) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method source_target_paths (line 118) | def source_target_paths(self, dataset_split, tmp_dir):
  class StyleTransferShakespeareToModern (line 126) | class StyleTransferShakespeareToModern(StyleTransferProblemShakespeare):
    method target (line 130) | def target(self):
    method source (line 134) | def source(self):
  class StyleTransferModernToShakespeare (line 139) | class StyleTransferModernToShakespeare(StyleTransferProblemShakespeare):
    method target (line 143) | def target(self):
    method source (line 147) | def source(self):
  class StyleTransferShakespeareToModernCharacters (line 152) | class StyleTransferShakespeareToModernCharacters(
    method vocab_type (line 156) | def vocab_type(self):
  class StyleTransferModernToShakespeareCharacters (line 161) | class StyleTransferModernToShakespeareCharacters(
    method vocab_type (line 165) | def vocab_type(self):

FILE: tensor2tensor/data_generators/style_transfer_test.py
  class StyleTransferProblemShakespeareTest (line 27) | class StyleTransferProblemShakespeareTest(tf.test.TestCase):
    method testSourceAndTargetPathsTrainModern2Shakespeare (line 29) | def testSourceAndTargetPathsTrainModern2Shakespeare(self):
    method testSourceAndTargetPathsTrainShakespeare2Modern (line 43) | def testSourceAndTargetPathsTrainShakespeare2Modern(self):
    method testSourceAndTargetPathsDevModern2Shakespeare (line 57) | def testSourceAndTargetPathsDevModern2Shakespeare(self):
    method testSourceAndTargetPathsDevShakespeare2Modern (line 71) | def testSourceAndTargetPathsDevShakespeare2Modern(self):

FILE: tensor2tensor/data_generators/subject_verb_agreement.py
  function _build_vocab (line 50) | def _build_vocab(examples, example_field, vocab_dir, vocab_name):
  function load_examples (line 77) | def load_examples(tmp_dir, prop_train=0.09, prop_val=0.01):
  class SvaNumberPrediction (line 115) | class SvaNumberPrediction(text_problems.Text2ClassProblem):
    method is_generate_per_split (line 119) | def is_generate_per_split(self):
    method dataset_splits (line 124) | def dataset_splits(self):
    method train_proportion (line 145) | def train_proportion(self):
    method validation_proportion (line 150) | def validation_proportion(self):
    method vocab_type (line 155) | def vocab_type(self):
    method num_classes (line 159) | def num_classes(self):
    method class_labels (line 162) | def class_labels(self, data_dir):
    method generate_samples (line 167) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method eval_metrics (line 209) | def eval_metrics(self):
  class SvaLanguageModeling (line 220) | class SvaLanguageModeling(text_problems.Text2SelfProblem):
    method is_generate_per_split (line 224) | def is_generate_per_split(self):
    method dataset_splits (line 229) | def dataset_splits(self):
    method train_proportion (line 250) | def train_proportion(self):
    method validation_proportion (line 255) | def validation_proportion(self):
    method vocab_type (line 260) | def vocab_type(self):
    method generate_samples (line 263) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/text_encoder.py
  function native_to_unicode (line 62) | def native_to_unicode(s):
  function unicode_to_native (line 73) | def unicode_to_native(s):
  function is_unicode (line 80) | def is_unicode(s):
  function to_unicode (line 84) | def to_unicode(s, ignore_errors=False):
  function to_unicode_ignore_errors (line 91) | def to_unicode_ignore_errors(s):
  function to_unicode_utf8 (line 95) | def to_unicode_utf8(s):
  function strip_ids (line 99) | def strip_ids(ids, ids_to_strip):
  class TextEncoder (line 107) | class TextEncoder(object):
    method __init__ (line 110) | def __init__(self, num_reserved_ids=NUM_RESERVED_TOKENS):
    method num_reserved_ids (line 114) | def num_reserved_ids(self):
    method encode (line 117) | def encode(self, s):
    method decode (line 133) | def decode(self, ids, strip_extraneous=False):
    method decode_list (line 150) | def decode_list(self, ids):
    method vocab_size (line 172) | def vocab_size(self):
  class ByteTextEncoder (line 176) | class ByteTextEncoder(TextEncoder):
    method encode (line 179) | def encode(self, s):
    method decode (line 188) | def decode(self, ids, strip_extraneous=False):
    method decode_list (line 204) | def decode_list(self, ids):
    method vocab_size (line 217) | def vocab_size(self):
  class ClassLabelEncoder (line 221) | class ClassLabelEncoder(TextEncoder):
    method __init__ (line 224) | def __init__(self, class_labels=None, class_labels_fname=None):
    method encode (line 234) | def encode(self, s):
    method decode (line 238) | def decode(self, ids, strip_extraneous=False):
    method decode_list (line 248) | def decode_list(self, ids):
    method vocab_size (line 252) | def vocab_size(self):
  class OneHotClassLabelEncoder (line 256) | class OneHotClassLabelEncoder(ClassLabelEncoder):
    method encode (line 259) | def encode(self, label_str, on_value=1, off_value=0):  # pylint: disab...
    method decode (line 264) | def decode(self, ids, strip_extraneous=False):
    method vocab_size (line 274) | def vocab_size(self):
  class TokenTextEncoder (line 278) | class TokenTextEncoder(TextEncoder):
    method __init__ (line 281) | def __init__(self,
    method encode (line 314) | def encode(self, s):
    method decode (line 324) | def decode(self, ids, strip_extraneous=False):
    method decode_list (line 327) | def decode_list(self, ids):
    method vocab_size (line 332) | def vocab_size(self):
    method _safe_id_to_token (line 335) | def _safe_id_to_token(self, idx):
    method _init_vocab_from_file (line 338) | def _init_vocab_from_file(self, filename):
    method _init_vocab_from_list (line 353) | def _init_vocab_from_list(self, vocab_list):
    method _init_vocab (line 369) | def _init_vocab(self, token_generator, add_reserved_tokens=True):
    method store_to_file (line 386) | def store_to_file(self, filename):
  function _escape_token (line 400) | def _escape_token(token, alphabet):
  function _unescape_token (line 425) | def _unescape_token(escaped_token):
  class SubwordTextEncoder (line 448) | class SubwordTextEncoder(TextEncoder):
    method __init__ (line 481) | def __init__(self, filename=None):
    method encode (line 494) | def encode(self, s):
    method encode_without_tokenizing (line 505) | def encode_without_tokenizing(self, token_text):
    method decode (line 522) | def decode(self, ids, strip_extraneous=False):
    method decode_list (line 538) | def decode_list(self, ids):
    method vocab_size (line 542) | def vocab_size(self):
    method _tokens_to_subtoken_ids (line 546) | def _tokens_to_subtoken_ids(self, tokens):
    method _token_to_subtoken_ids (line 559) | def _token_to_subtoken_ids(self, token):
    method _subtoken_ids_to_tokens (line 576) | def _subtoken_ids_to_tokens(self, subtokens):
    method _subtoken_id_to_subtoken_string (line 595) | def _subtoken_id_to_subtoken_string(self, subtoken):
    method _escaped_token_to_subtoken_strings (line 601) | def _escaped_token_to_subtoken_strings(self, escaped_token):
    method _escaped_token_to_subtoken_ids (line 633) | def _escaped_token_to_subtoken_ids(self, escaped_token):
    method build_from_generator (line 647) | def build_from_generator(cls,
    method build_to_target_size (line 679) | def build_to_target_size(cls,
    method build_from_token_counts (line 752) | def build_from_token_counts(self,
    method all_subtoken_strings (line 871) | def all_subtoken_strings(self):
    method dump (line 874) | def dump(self):
    method _init_subtokens_from_list (line 881) | def _init_subtokens_from_list(self, subtoken_strings, reserved_tokens=...
    method _init_alphabet_from_tokens (line 914) | def _init_alphabet_from_tokens(self, tokens):
    method _load_from_file_object (line 921) | def _load_from_file_object(self, f):
    method _load_from_file (line 938) | def _load_from_file(self, filename):
    method store_to_file (line 945) | def store_to_file(self, filename, add_single_quotes=True):
  class ImageEncoder (line 954) | class ImageEncoder(object):
    method __init__ (line 957) | def __init__(self, num_reserved_ids=0, height=None, width=None, channe...
    method num_reserved_ids (line 964) | def num_reserved_ids(self):
    method encode (line 967) | def encode(self, s):
    method decode (line 984) | def decode(self, ids, strip_extraneous=False):
    method decode_list (line 1022) | def decode_list(self, ids):
    method vocab_size (line 1034) | def vocab_size(self):
  class RealEncoder (line 1038) | class RealEncoder(object):
    method encode (line 1041) | def encode(self, s):
    method decode (line 1052) | def decode(self, ids, strip_extraneous=False):

FILE: tensor2tensor/data_generators/text_encoder_build_subword.py
  function main (line 53) | def main(unused_argv):

FILE: tensor2tensor/data_generators/text_encoder_test.py
  class NativeToUnicodeTest (line 38) | class NativeToUnicodeTest(tf.test.TestCase):
    method test_native_to_unicode (line 40) | def test_native_to_unicode(self):
  class EscapeUnescapeTokenTest (line 48) | class EscapeUnescapeTokenTest(tf.test.TestCase):
    method test_escape_token (line 50) | def test_escape_token(self):
    method test_unescape_token (line 58) | def test_unescape_token(self):
  class TokenTextEncoderTest (line 66) | class TokenTextEncoderTest(tf.test.TestCase):
    method setUpClass (line 69) | def setUpClass(cls):
    method test_save_and_reload (line 75) | def test_save_and_reload(self):
    method test_reserved_tokens_in_corpus (line 95) | def test_reserved_tokens_in_corpus(self):
  class SubwordTextEncoderTest (line 110) | class SubwordTextEncoderTest(tf.test.TestCase):
    method setUpClass (line 113) | def setUpClass(cls):
    method test_encode_decode (line 119) | def test_encode_decode(self):
    method test_unicode (line 153) | def test_unicode(self):
    method test_small_vocab (line 163) | def test_small_vocab(self):
    method test_long_tokens (line 178) | def test_long_tokens(self):
    method test_custom_reserved_tokens (line 213) | def test_custom_reserved_tokens(self):
    method test_encodable_when_not_in_alphabet (line 233) | def test_encodable_when_not_in_alphabet(self):
    method test_raises_exception_when_not_encodable (line 251) | def test_raises_exception_when_not_encodable(self):
    method test_load_from_file (line 265) | def test_load_from_file(self):
    method test_reserved_token_chars_not_in_alphabet (line 283) | def test_reserved_token_chars_not_in_alphabet(self):
    method test_save_and_reload (line 300) | def test_save_and_reload(self):
    method test_save_and_reload_no_single_quotes (line 320) | def test_save_and_reload_no_single_quotes(self):
    method test_build_from_generator (line 340) | def test_build_from_generator(self):
  class OneHotClassLabelEncoderTest (line 367) | class OneHotClassLabelEncoderTest(tf.test.TestCase):
    method test_one_hot_encode (line 369) | def test_one_hot_encode(self):
    method test_one_hot_decode (line 376) | def test_one_hot_decode(self):

FILE: tensor2tensor/data_generators/text_problems.py
  class VocabType (line 46) | class VocabType(object):
  class Text2TextProblem (line 53) | class Text2TextProblem(problem.Problem):
    method dataset_splits (line 63) | def dataset_splits(self):
    method is_generate_per_split (line 74) | def is_generate_per_split(self):
    method generate_samples (line 91) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method vocab_type (line 115) | def vocab_type(self):
    method approx_vocab_size (line 136) | def approx_vocab_size(self):
    method additional_reserved_tokens (line 141) | def additional_reserved_tokens(self):
    method oov_token (line 151) | def oov_token(self):
    method max_samples_for_vocab (line 156) | def max_samples_for_vocab(self):
    method packed_length (line 169) | def packed_length(self):
    method packed_spacing (line 181) | def packed_spacing(self):
    method has_inputs (line 192) | def has_inputs(self):
    method max_length (line 195) | def max_length(self, model_hparams):
    method feature_encoders (line 199) | def feature_encoders(self, data_dir):
    method generate_text_for_vocab (line 206) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method vocab_filename (line 216) | def vocab_filename(self):
    method use_vocab_from_other_problem (line 228) | def use_vocab_from_other_problem(self):
    method get_or_create_vocab (line 239) | def get_or_create_vocab(self, data_dir, tmp_dir, force_get=False):
    method _pack_fn (line 265) | def _pack_fn(self):
    method _maybe_pack_examples (line 291) | def _maybe_pack_examples(self, generator):
    method generate_encoded_samples (line 302) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method max_subtoken_length (line 316) | def max_subtoken_length(self):
    method batch_size_means_tokens (line 328) | def batch_size_means_tokens(self):
    method already_shuffled (line 332) | def already_shuffled(self):
    method inputs_prefix (line 336) | def inputs_prefix(self):
    method targets_prefix (line 341) | def targets_prefix(self):
    method generate_data (line 345) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method hparams (line 371) | def hparams(self, defaults, unused_model_hparams):
    method example_reading_spec (line 394) | def example_reading_spec(self):
    method eval_metrics (line 409) | def eval_metrics(self):
  class QuestionAndContext2TextProblem (line 418) | class QuestionAndContext2TextProblem(Text2TextProblem):
    method additional_reserved_tokens (line 428) | def additional_reserved_tokens(self):
    method feature_encoders (line 431) | def feature_encoders(self, data_dir):
    method generate_text_for_vocab (line 437) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_encoded_samples (line 446) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method hparams (line 457) | def hparams(self, defaults, unused_model_hparams):
    method example_reading_spec (line 467) | def example_reading_spec(self):
  class Text2SelfProblem (line 475) | class Text2SelfProblem(Text2TextProblem):
    method generate_samples (line 481) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method has_inputs (line 500) | def has_inputs(self):
  class Text2ClassProblem (line 504) | class Text2ClassProblem(Text2TextProblem):
    method generate_samples (line 507) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method num_classes (line 528) | def num_classes(self):
    method class_labels (line 532) | def class_labels(self, data_dir):
    method generate_text_for_vocab (line 539) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_encoded_samples (line 546) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method feature_encoders (line 555) | def feature_encoders(self, data_dir):
    method hparams (line 563) | def hparams(self, defaults, unused_model_hparams):
    method example_reading_spec (line 570) | def example_reading_spec(self):
  class TextConcat2ClassProblem (line 579) | class TextConcat2ClassProblem(Text2ClassProblem):
    method generate_text_for_vocab (line 587) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_encoded_samples (line 595) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
  class Text2RealProblem (line 609) | class Text2RealProblem(Text2TextProblem):
    method ntasks (line 618) | def ntasks(self):
    method generate_samples (line 622) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_text_for_vocab (line 639) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_encoded_samples (line 646) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method feature_encoders (line 654) | def feature_encoders(self, data_dir):
    method hparams (line 662) | def hparams(self, defaults, unused_model_hparams):
    method max_length (line 675) | def max_length(self, model_hparams):
    method preprocess_example (line 678) | def preprocess_example(self, example, unused_mode, unused_hparams):
    method example_reading_spec (line 684) | def example_reading_spec(self):
    method eval_metrics (line 692) | def eval_metrics(self):
  function txt_line_iterator (line 699) | def txt_line_iterator(txt_path):
  function txt_and_label_iterator (line 706) | def txt_and_label_iterator(txt_path):
  function text2text_txt_iterator (line 724) | def text2text_txt_iterator(source_txt_path, target_txt_path):
  function text2text_txt_iterator_with_label (line 731) | def text2text_txt_iterator_with_label(source_txt_path, target_txt_path):
  function text2text_txt_iterator_with_index (line 739) | def text2text_txt_iterator_with_index(source_txt_path, target_txt_path):
  function text2text_distill_iterator (line 747) | def text2text_distill_iterator(source_txt_path, target_txt_path,
  function text2self_txt_iterator (line 756) | def text2self_txt_iterator(txt_path):
  function text2class_txt_iterator (line 761) | def text2class_txt_iterator(source_txt_path, label_txt_path, class_strs=...
  function text2real_txt_iterator (line 786) | def text2real_txt_iterator(source_txt_path, target_txt_path):
  function txt_line_sharded_iterator (line 802) | def txt_line_sharded_iterator(txt_pattern):
  function text2text_txt_sharded_iterator (line 811) | def text2text_txt_sharded_iterator(source_txt_pattern, target_txt_pattern):
  function text2text_txt_tab_iterator (line 828) | def text2text_txt_tab_iterator(txt_path):
  function text2text_generate_encoded (line 849) | def text2text_generate_encoded(sample_generator,
  class Text2textTmpdir (line 867) | class Text2textTmpdir(Text2TextProblem):
    method is_generate_per_split (line 882) | def is_generate_per_split(self):
    method generate_samples (line 885) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method _tmp_dir_override (line 894) | def _tmp_dir_override(self):
  class Text2TextRemotedir (line 898) | class Text2TextRemotedir(Text2textTmpdir):
    method _tmp_dir_override (line 914) | def _tmp_dir_override(self):
  class Text2textTmpdirTokens (line 920) | class Text2textTmpdirTokens(Text2textTmpdir):
    method vocab_type (line 935) | def vocab_type(self):
    method oov_token (line 939) | def oov_token(self):
    method _generate_vocab (line 942) | def _generate_vocab(self, tmp_dir):
    method generate_samples (line 952) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class ChoppedTextProblem (line 962) | class ChoppedTextProblem(Text2SelfProblem):
    method train_text_filepaths (line 972) | def train_text_filepaths(self, tmp_dir):
    method dev_text_filepaths (line 984) | def dev_text_filepaths(self, tmp_dir):
    method sequence_length (line 997) | def sequence_length(self):
    method max_length (line 1001) | def max_length(self, model_hparams):
    method text_filepaths_for_task (line 1004) | def text_filepaths_for_task(self, tmp_dir, task_id):
    method filepath_to_unicode_strings (line 1026) | def filepath_to_unicode_strings(self, filepath):
    method file_generator (line 1044) | def file_generator(self,
    method example_generator (line 1084) | def example_generator(self, encoder, tmp_dir, task_id):
    method remainder_policy (line 1119) | def remainder_policy(self):
    method prepare_to_generate (line 1127) | def prepare_to_generate(self, data_dir, tmp_dir):
    method generate_text_for_vocab (line 1133) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_data (line 1138) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method max_chars_for_vocab (line 1163) | def max_chars_for_vocab(self):
    method num_train_shards (line 1168) | def num_train_shards(self):
    method num_dev_shards (line 1172) | def num_dev_shards(self):
    method max_dev_chars (line 1176) | def max_dev_chars(self):
    method multiprocess_generate (line 1181) | def multiprocess_generate(self):
    method num_generate_tasks (line 1185) | def num_generate_tasks(self):
    method eval_metrics (line 1188) | def eval_metrics(self):
  class DistributedText2TextProblem (line 1192) | class DistributedText2TextProblem(Text2TextProblem):
    method generate_samples (line 1212) | def generate_samples(self, data_dir, tmp_dir, dataset_split, input_fil...
    method input_files (line 1231) | def input_files(self, dataset_split=problem.DatasetSplit.TRAIN):
    method num_output_shards (line 1249) | def num_output_shards(self):
    method split_to_input_filenames (line 1257) | def split_to_input_filenames(self):
    method _task_id_to_output_split (line 1280) | def _task_id_to_output_split(self, task_id):
    method _task_id_to_output_file (line 1291) | def _task_id_to_output_file(self, data_dir, task_id):
    method _divide_equally (line 1305) | def _divide_equally(input_files, num_tasks, task_id):
    method _task_id_to_input_files (line 1326) | def _task_id_to_input_files(self, task_id):
    method generate_text_for_vocab (line 1341) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_encoded_samples (line 1369) | def generate_encoded_samples(self,
    method generate_data (line 1389) | def generate_data(self, data_dir, tmp_dir, task_id=-1):

FILE: tensor2tensor/data_generators/text_problems_test.py
  class Test1 (line 32) | class Test1(text_problems.Text2textTmpdir):
    method name (line 35) | def name(self):
    method approx_vocab_size (line 42) | def approx_vocab_size(self):
    method dataset_splits (line 46) | def dataset_splits(self):
  class TextProblems (line 56) | class TextProblems(tf.test.TestCase):
    method setUpClass (line 59) | def setUpClass(cls):
    method testTxtLineIterator (line 104) | def testTxtLineIterator(self):
    method testText2TextTxtIterator (line 108) | def testText2TextTxtIterator(self):
    method testText2SelfTxtIterator (line 118) | def testText2SelfTxtIterator(self):
    method testText2ClassTxtIterator (line 125) | def testText2ClassTxtIterator(self):
    method testText2ClassTxtIteratorWithStrs (line 135) | def testText2ClassTxtIteratorWithStrs(self):
    method testText2RealTxtIterator (line 146) | def testText2RealTxtIterator(self):
    method testText2TextTxtTabIterator (line 156) | def testText2TextTxtTabIterator(self):
    method testText2TextTmpDir (line 165) | def testText2TextTmpDir(self):
  class FakeDistributedProblem (line 204) | class FakeDistributedProblem(text_problems.DistributedText2TextProblem):
    method __init__ (line 206) | def __init__(self):
    method generate_samples (line 211) | def generate_samples(self, data_dir, tmp_dir, dataset_split, input_fil...
    method is_generate_per_split (line 220) | def is_generate_per_split(self):
    method dataset_splits (line 224) | def dataset_splits(self):
    method input_files (line 236) | def input_files(self, dataset_split=problem_lib.DatasetSplit.TRAIN):
    method setup_for_test (line 244) | def setup_for_test(cls):
  class FakeDistributedProblemNotPerSplit (line 269) | class FakeDistributedProblemNotPerSplit(FakeDistributedProblem):
    method is_generate_per_split (line 272) | def is_generate_per_split(self):
  class DistributedText2TextProblemsTest (line 276) | class DistributedText2TextProblemsTest(tf.test.TestCase):
    method setUp (line 278) | def setUp(self):
    method testOutputSharding (line 281) | def testOutputSharding(self):
    method testInputShardingNoGeneratePerSplit (line 326) | def testInputShardingNoGeneratePerSplit(self):
    method testInputShardingWithGeneratePerSplit (line 357) | def testInputShardingWithGeneratePerSplit(self):
    method testVocabularyIsAllTrain (line 400) | def testVocabularyIsAllTrain(self):

FILE: tensor2tensor/data_generators/timeseries.py
  class TimeseriesProblem (line 33) | class TimeseriesProblem(problem.Problem):
    method feature_encoders (line 36) | def feature_encoders(self, data_dir):
    method is_generate_per_split (line 44) | def is_generate_per_split(self):
    method dataset_splits (line 49) | def dataset_splits(self):
    method has_inputs (line 63) | def has_inputs(self):
    method num_train_shards (line 67) | def num_train_shards(self):
    method num_eval_shards (line 72) | def num_eval_shards(self):
    method num_test_shards (line 77) | def num_test_shards(self):
    method num_series (line 82) | def num_series(self):
    method num_input_timestamps (line 87) | def num_input_timestamps(self):
    method num_target_timestamps (line 92) | def num_target_timestamps(self):
    method timeseries_dataset (line 96) | def timeseries_dataset(self):
    method eval_metrics (line 100) | def eval_metrics(self):
    method normalizing_constant (line 105) | def normalizing_constant(self):
    method preprocess_example (line 109) | def preprocess_example(self, example, unused_mode, unused_hparams):
    method generate_samples (line 124) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method hparams (line 152) | def hparams(self, defaults, unused_model_hparams):
    method generate_data (line 161) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
    method example_reading_spec (line 187) | def example_reading_spec(self):
  class TimeseriesToyProblem (line 197) | class TimeseriesToyProblem(TimeseriesProblem):
    method num_train_shards (line 201) | def num_train_shards(self):
    method num_eval_shards (line 206) | def num_eval_shards(self):
    method num_test_shards (line 211) | def num_test_shards(self):
    method num_series (line 216) | def num_series(self):
    method num_input_timestamps (line 221) | def num_input_timestamps(self):
    method num_target_timestamps (line 226) | def num_target_timestamps(self):
    method timeseries_dataset (line 230) | def timeseries_dataset(self):
  class TimeseriesToyProblemNoInputs (line 237) | class TimeseriesToyProblemNoInputs(TimeseriesToyProblem):
    method has_inputs (line 241) | def has_inputs(self):
    method num_input_timestamps (line 245) | def num_input_timestamps(self):
  class TimeseriesSyntheticDataSeries10Samples100k (line 251) | class TimeseriesSyntheticDataSeries10Samples100k(TimeseriesProblem):
    method num_train_shards (line 255) | def num_train_shards(self):
    method num_eval_shards (line 260) | def num_eval_shards(self):
    method num_series (line 265) | def num_series(self):
    method num_input_timestamps (line 270) | def num_input_timestamps(self):
    method num_target_timestamps (line 275) | def num_target_timestamps(self):
    method normalizing_constant (line 280) | def normalizing_constant(self):
    method timeseries_params (line 284) | def timeseries_params(self):
    method timeseries_dataset (line 360) | def timeseries_dataset(self):

FILE: tensor2tensor/data_generators/timeseries_data_generator.py
  function generate_data (line 24) | def generate_data(timeseries_length, timeseries_params):

FILE: tensor2tensor/data_generators/timeseries_data_generator_test.py
  class TimeseriesDataGeneratorTest (line 29) | class TimeseriesDataGeneratorTest(tf.test.TestCase):
    method testGenerateData (line 31) | def testGenerateData(self):

FILE: tensor2tensor/data_generators/timeseries_test.py
  class TimeseriesTest (line 31) | class TimeseriesTest(tf.test.TestCase):
    method setUpClass (line 34) | def setUpClass(cls):
    method testTimeseriesToyProblem (line 39) | def testTimeseriesToyProblem(self):
    method testTimeseriesToyProblemNoInputs (line 65) | def testTimeseriesToyProblemNoInputs(self):
    method testTimeseriesSyntheticData10Series100kSamples (line 89) | def testTimeseriesSyntheticData10Series100kSamples(self):

FILE: tensor2tensor/data_generators/tokenizer.py
  function encode (line 66) | def encode(text):
  function decode (line 91) | def decode(tokens):
  function _read_filepattern (line 108) | def _read_filepattern(filepattern, max_lines=None, split_on_newlines=True):
  function corpus_token_counts (line 148) | def corpus_token_counts(
  function vocab_token_counts (line 174) | def vocab_token_counts(text_filepattern, max_lines):

FILE: tensor2tensor/data_generators/tokenizer_test.py
  class TokenizerTest (line 35) | class TokenizerTest(tf.test.TestCase):
    method test_encode (line 37) | def test_encode(self):
    method test_decode (line 49) | def test_decode(self):
    method test_invertibility_on_random_strings (line 55) | def test_invertibility_on_random_strings(self):
  class TestTokenCounts (line 61) | class TestTokenCounts(tf.test.TestCase):
    method setUp (line 63) | def setUp(self):
    method test_corpus_token_counts_split_on_newlines (line 68) | def test_corpus_token_counts_split_on_newlines(self):
    method test_corpus_token_counts_no_split_on_newlines (line 90) | def test_corpus_token_counts_no_split_on_newlines(self):
    method test_corpus_token_counts_split_with_max_lines (line 96) | def test_corpus_token_counts_split_with_max_lines(self):
    method test_corpus_token_counts_no_split_with_max_lines (line 103) | def test_corpus_token_counts_no_split_with_max_lines(self):
    method test_vocab_token_counts (line 115) | def test_vocab_token_counts(self):
    method test_vocab_token_counts_with_max_lines (line 127) | def test_vocab_token_counts_with_max_lines(self):

FILE: tensor2tensor/data_generators/transduction_problems.py
  class TransductionProblem (line 49) | class TransductionProblem(text_problems.Text2TextProblem):
    method __init__ (line 53) | def __init__(self, was_reversed=False, was_copy=False):
    method num_symbols (line 59) | def num_symbols(self):
    method min_sequence_length (line 63) | def min_sequence_length(self, dataset_split):
    method max_sequence_length (line 78) | def max_sequence_length(self, dataset_split):
    method num_samples (line 93) | def num_samples(self, dataset_split):
    method num_shards (line 109) | def num_shards(self):
    method is_generate_per_split (line 114) | def is_generate_per_split(self):
    method vocab_type (line 118) | def vocab_type(self):
    method sequence_length (line 121) | def sequence_length(self, dataset_split):
    method build_vocab (line 125) | def build_vocab(self):
    method get_or_create_vocab (line 128) | def get_or_create_vocab(self, data_dir, tmp_dir, force_get=False):
    method generate_random_sequence (line 139) | def generate_random_sequence(self, dataset_split):
    method transpose_sequence (line 143) | def transpose_sequence(self, input_sequence):
    method generate_samples (line 146) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class CopySequence (line 157) | class CopySequence(TransductionProblem):
    method transpose_sequence (line 160) | def transpose_sequence(self, input_sequence):
  class CopySequenceSmall (line 165) | class CopySequenceSmall(CopySequence):
    method num_symbols (line 170) | def num_symbols(self):
    method min_sequence_length (line 173) | def min_sequence_length(self, dataset_split):
    method max_sequence_length (line 180) | def max_sequence_length(self, dataset_split):
    method num_samples (line 187) | def num_samples(self, dataset_split):
  class ReverseSequence (line 196) | class ReverseSequence(TransductionProblem):
    method transpose_sequence (line 200) | def transpose_sequence(self, input_sequence):
  class ReverseSequenceSmall (line 205) | class ReverseSequenceSmall(ReverseSequence):
    method num_symbols (line 210) | def num_symbols(self):
    method min_sequence_length (line 213) | def min_sequence_length(self, dataset_split):
    method max_sequence_length (line 220) | def max_sequence_length(self, dataset_split):
    method num_samples (line 227) | def num_samples(self, dataset_split):
  class FlipBiGramSequence (line 236) | class FlipBiGramSequence(TransductionProblem):
    method sequence_length (line 240) | def sequence_length(self, dataset_split):
    method transpose_sequence (line 258) | def transpose_sequence(self, input_sequence):

FILE: tensor2tensor/data_generators/transduction_problems_test.py
  class TransductionProblem (line 35) | class TransductionProblem(parameterized.TestCase):
    method setUp (line 37) | def setUp(self):
    method tearDown (line 42) | def tearDown(self):
    method testTransduction (line 64) | def testTransduction(self, p, transformation):

FILE: tensor2tensor/data_generators/translate.py
  class TranslateProblem (line 38) | class TranslateProblem(text_problems.Text2TextProblem):
    method is_generate_per_split (line 42) | def is_generate_per_split(self):
    method approx_vocab_size (line 46) | def approx_vocab_size(self):
    method datatypes_to_clean (line 50) | def datatypes_to_clean(self):
    method source_data_files (line 53) | def source_data_files(self, dataset_split):
    method vocab_data_files (line 57) | def vocab_data_files(self):
    method generate_samples (line 61) | def generate_samples(
    method generate_text_for_vocab (line 79) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method decode_hooks (line 84) | def decode_hooks(self):
  function compute_bleu_summaries (line 88) | def compute_bleu_summaries(hook_args):
  function _preprocess_sgm (line 126) | def _preprocess_sgm(line, is_sgm):
  function _clean_sentences (line 144) | def _clean_sentences(sentence_pairs):
  function _tmx_to_source_target (line 151) | def _tmx_to_source_target(tmx_file, source_resfile, target_resfile,
  function compile_data (line 163) | def compile_data(tmp_dir, datasets, filename, datatypes_to_clean=None):
  class TranslateDistillProblem (line 266) | class TranslateDistillProblem(TranslateProblem):
    method is_generate_per_split (line 270) | def is_generate_per_split(self):
    method example_reading_spec (line 273) | def example_reading_spec(self):
    method get_or_create_vocab (line 287) | def get_or_create_vocab(self, data_dir, tmp_dir, force_get=False):
    method generate_encoded_samples (line 295) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_samples (line 309) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class TranslateWmt20Problem (line 317) | class TranslateWmt20Problem(TranslateProblem):
    method is_generate_per_split (line 321) | def is_generate_per_split(self):
    method generate_encoded_samples (line 324) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_text_for_vocab (line 336) | def generate_text_for_vocab(self, data_dir, tmp_dir):
    method generate_samples (line 345) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class TranslateSamanantarProblem (line 350) | class TranslateSamanantarProblem(TranslateWmt20Problem):
    method generate_samples (line 353) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/translate_encs.py
  class TranslateEncsWmt32k (line 58) | class TranslateEncsWmt32k(translate.TranslateProblem):
    method approx_vocab_size (line 62) | def approx_vocab_size(self):
    method source_data_files (line 65) | def source_data_files(self, dataset_split):
    method vocab_data_files (line 69) | def vocab_data_files(self):
  class TranslateEncsWmtCharacters (line 85) | class TranslateEncsWmtCharacters(translate.TranslateProblem):
    method vocab_type (line 89) | def vocab_type(self):
    method generate_samples (line 92) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/translate_encs_cubbitt.py
  class TranslateEncsCubbitt (line 46) | class TranslateEncsCubbitt(translate_encs.TranslateEncsWmt32k):
    method use_vocab_from_other_problem (line 50) | def use_vocab_from_other_problem(self):
    method already_shuffled (line 54) | def already_shuffled(self):
    method skip_random_fraction_when_training (line 58) | def skip_random_fraction_when_training(self):
    method backtranslate_data_filenames (line 62) | def backtranslate_data_filenames(self):
    method dataset_splits (line 68) | def dataset_splits(self):
    method generate_samples (line 78) | def generate_samples(self, data_dir, tmp_dir, dataset_split):

FILE: tensor2tensor/data_generators/translate_ende.py
  class TranslateEndeWmt32k (line 70) | class TranslateEndeWmt32k(translate.TranslateProblem):
    method additional_training_datasets (line 74) | def additional_training_datasets(self):
    method source_data_files (line 78) | def source_data_files(self, dataset_split):
  class TranslateEnde2018Wmt32k (line 85) | class TranslateEnde2018Wmt32k(translate.TranslateProblem):
    method use_vocab_from_other_problem (line 89) | def use_vocab_from_other_problem(self):
    method additional_training_datasets (line 93) | def additional_training_datasets(self):
  class TranslateEndeWmtClean32k (line 99) | class TranslateEndeWmtClean32k(TranslateEndeWmt32k):
    method use_vocab_from_other_problem (line 103) | def use_vocab_from_other_problem(self):
    method datatypes_to_clean (line 107) | def datatypes_to_clean(self):
  class TranslateEndePc32k (line 112) | class TranslateEndePc32k(translate.TranslateProblem):
    method use_vocab_from_other_problem (line 116) | def use_vocab_from_other_problem(self):
    method additional_training_datasets (line 120) | def additional_training_datasets(self):
    method source_data_files (line 124) | def source_data_files(self, dataset_split):
  class TranslateEndePcClean32k (line 132) | class TranslateEndePcClean32k(TranslateEndePc32k):
    method datatypes_to_clean (line 136) | def datatypes_to_clean(self):
  class TranslateEndeWmtPc32k (line 141) | class TranslateEndeWmtPc32k(TranslateEndeWmt32k):
    method use_vocab_from_other_problem (line 145) | def use_vocab_from_other_problem(self):
    method additional_training_datasets (line 149) | def additional_training_datasets(self):
  class TranslateEndeWmtCleanPc32k (line 154) | class TranslateEndeWmtCleanPc32k(TranslateEndeWmtPc32k):
    method datatypes_to_clean (line 158) | def datatypes_to_clean(self):
  class TranslateEndeWmtPcClean32k (line 163) | class TranslateEndeWmtPcClean32k(TranslateEndeWmtPc32k):
    method datatypes_to_clean (line 167) | def datatypes_to_clean(self):
  class TranslateEndeWmtCleanPcClean32k (line 172) | class TranslateEndeWmtCleanPcClean32k(TranslateEndeWmtPcClean32k):
    method datatypes_to_clean (line 176) | def datatypes_to_clean(self):
  class TranslateEndeWmt32kPacked (line 181) | class TranslateEndeWmt32kPacked(TranslateEndeWmt32k):
    method packed_length (line 184) | def packed_length(self):
    method use_vocab_from_other_problem (line 188) | def use_vocab_from_other_problem(self):
  class TranslateEndeWmt8k (line 193) | class TranslateEndeWmt8k(TranslateEndeWmt32k):
    method approx_vocab_size (line 197) | def approx_vocab_size(self):
  class TranslateEndeWmt8kPacked (line 202) | class TranslateEndeWmt8kPacked(TranslateEndeWmt8k):
    method packed_length (line 205) | def packed_length(self):
    method use_vocab_from_other_problem (line 209) | def use_vocab_from_other_problem(self):
  class TranslateEndeWmtCharacters (line 214) | class TranslateEndeWmtCharacters(TranslateEndeWmt8k):
    method vocab_type (line 218) | def vocab_type(self):
  class TranslateEndeWmtMulti64k (line 223) | class TranslateEndeWmtMulti64k(TranslateEndeWmt8k):
    method use_vocab_from_other_problem (line 227) | def use_vocab_from_other_problem(self):
  class TranslateEndeWmtMulti64kPacked1k (line 232) | class TranslateEndeWmtMulti64kPacked1k(TranslateEndeWmtMulti64k):
    method packed_length (line 236) | def packed_length(self):
    method num_training_examples (line 240) | def num_training_examples(self):
    method inputs_prefix (line 244) | def inputs_prefix(self):
    method targets_prefix (line 248) | def targets_prefix(self):

FILE: tensor2tensor/data_generators/translate_ende_test.py
  class TranslateEndeTest (line 28) | class TranslateEndeTest(tf.test.TestCase):
    method test_vocab_size (line 31) | def test_vocab_size(self):
    method test_additional_datasets (line 37) | def test_additional_datasets(self):
    method test_source_data_files (line 43) | def test_source_data_files(self):

FILE: tensor2tensor/data_generators/translate_enes.py
  class TranslateEnesWmt32k (line 58) | class TranslateEnesWmt32k(translate.TranslateProblem):
    method additional_training_datasets (line 62) | def additional_training_datasets(self):
    method source_data_files (line 66) | def source_data_files(self, dataset_split):
    method vocab_data_files (line 71) | def vocab_data_files(self):
  class TranslateEnesWmtClean32k (line 76) | class TranslateEnesWmtClean32k(TranslateEnesWmt32k):
    method use_vocab_from_other_problem (line 80) | def use_vocab_from_other_problem(self):
    method datatypes_to_clean (line 84) | def datatypes_to_clean(self):
  class TranslateEnesWmt32kPacked (line 89) | class TranslateEnesWmt32kPacked(TranslateEnesWmt32k):
    method packed_length (line 92) | def packed_length(self):
    method use_vocab_from_other_problem (line 96) | def use_vocab_from_other_problem(self):
  class TranslateEnesWmt8k (line 101) | class TranslateEnesWmt8k(TranslateEnesWmt32k):
    method approx_vocab_size (line 105) | def approx_vocab_size(self):
  class TranslateEnesWmt8kPacked (line 110) | class TranslateEnesWmt8kPacked(TranslateEnesWmt8k):
    method packed_length (line 113) | def packed_length(self):
    method use_vocab_from_other_problem (line 117) | def use_vocab_from_other_problem(self):
  class TranslateEnesWmtCharacters (line 122) | class TranslateEnesWmtCharacters(TranslateEnesWmt8k):
    method vocab_type (line 126) | def vocab_type(self):

FILE: tensor2tensor/data_generators/translate_enet.py
  class TranslateEnetWmt32k (line 56) | class TranslateEnetWmt32k(translate.TranslateProblem):
    method approx_vocab_size (line 60) | def approx_vocab_size(self):
    method source_data_files (line 63) | def source_data_files(self, dataset_split):
  class TranslateEnetWmtCharacters (line 69) | class TranslateEnetWmtCharacters(translate.TranslateProblem):
    method vocab_type (line 73) | def vocab_type(self):
    method source_data_files (line 76) | def source_data_files(self, dataset_split):

FILE: tensor2tensor/data_generators/translate_enfr.py
  class TranslateEnfrWmtSmall8k (line 82) | class TranslateEnfrWmtSmall8k(translate.TranslateProblem):
    method approx_vocab_size (line 86) | def approx_vocab_size(self):
    method use_small_dataset (line 90) | def use_small_dataset(self):
    method source_data_files (line 93) | def source_data_files(self, dataset_split):
    method vocab_data_files (line 101) | def vocab_data_files(self):
  class TranslateEnfrWmtSmall32k (line 107) | class TranslateEnfrWmtSmall32k(TranslateEnfrWmtSmall8k):
    method approx_vocab_size (line 110) | def approx_vocab_size(self):
  class TranslateEnfrWmt8k (line 115) | class TranslateEnfrWmt8k(TranslateEnfrWmtSmall8k):
    method use_small_dataset (line 118) | def use_small_dataset(self):
  class TranslateEnfrWmt32k (line 123) | class TranslateEnfrWmt32k(TranslateEnfrWmtSmall32k):
    method use_small_dataset (line 126) | def use_small_dataset(self):
  class TranslateEnfrWmt32kPacked (line 131) | class TranslateEnfrWmt32kPacked(TranslateEnfrWmt32k):
    method packed_length (line 134) | def packed_length(self):
    method use_vocab_from_other_problem (line 138) | def use_vocab_from_other_problem(self):
  class TranslateEnfrWmt32kWithBacktranslateFr (line 143) | class TranslateEnfrWmt32kWithBacktranslateFr(TranslateEnfrWmt32k):
    method use_vocab_from_other_problem (line 147) | def use_vocab_from_other_problem(self):
    method already_shuffled (line 151) | def already_shuffled(self):
    method skip_random_fraction_when_training (line 155) | def skip_random_fraction_when_training(self):
    method backtranslate_data_filenames (line 159) | def backtranslate_data_filenames(self):
    method dataset_splits (line 165) | def dataset_splits(self):
    method generate_samples (line 175) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class TranslateEnfrWmt32kWithBacktranslateEn (line 199) | class TranslateEnfrWmt32kWithBacktranslateEn(
    method backtranslate_data_filenames (line 204) | def backtranslate_data_filenames(self):
  class TranslateEnfrWmtSmallCharacters (line 211) | class TranslateEnfrWmtSmallCharacters(translate.TranslateProblem):
    method vocab_type (line 215) | def vocab_type(self):
    method use_small_dataset (line 219) | def use_small_dataset(self):
    method source_data_files (line 222) | def source_data_files(self, dataset_split):
  class TranslateEnfrWmtCharacters (line 232) | class TranslateEnfrWmtCharacters(TranslateEnfrWmtSmallCharacters):
    method use_small_dataset (line 235) | def use_small_dataset(self):
  class TranslateEnfrWmtMulti64k (line 240) | class TranslateEnfrWmtMulti64k(TranslateEnfrWmtSmall32k):
    method use_small_dataset (line 244) | def use_small_dataset(self):
    method use_vocab_from_other_problem (line 248) | def use_vocab_from_other_problem(self):
  class TranslateEnfrWmtMulti64kPacked1k (line 253) | class TranslateEnfrWmtMulti64kPacked1k(TranslateEnfrWmtMulti64k):
    method packed_length (line 257) | def packed_length(self):
    method num_training_examples (line 261) | def num_training_examples(self):
    method inputs_prefix (line 265) | def inputs_prefix(self):
    method targets_prefix (line 269) | def targets_prefix(self):

FILE: tensor2tensor/data_generators/translate_enid.py
  class TranslateEnidIwslt32k (line 76) | class TranslateEnidIwslt32k(translate.TranslateProblem):
    method approx_vocab_size (line 80) | def approx_vocab_size(self):
    method source_data_files (line 83) | def source_data_files(self, dataset_split):

FILE: tensor2tensor/data_generators/translate_enmk.py
  class TranslateEnmkSetimes32k (line 51) | class TranslateEnmkSetimes32k(translate.TranslateProblem):
    method approx_vocab_size (line 55) | def approx_vocab_size(self):
    method source_data_files (line 58) | def source_data_files(self, dataset_split):
  class TranslateEnmkSetimesCharacters (line 64) | class TranslateEnmkSetimesCharacters(translate.TranslateProblem):
    method vocab_type (line 68) | def vocab_type(self):
    method source_data_files (line 71) | def source_data_files(self, dataset_split):

FILE: tensor2tensor/data_generators/translate_enro.py
  class TranslateEnroWmt8k (line 51) | class TranslateEnroWmt8k(translate.TranslateProblem):
    method approx_vocab_size (line 55) | def approx_vocab_size(self):
    method source_data_files (line 58) | def source_data_files(self, dataset_split):
  class TranslateEnroWmt32k (line 64) | class TranslateEnroWmt32k(TranslateEnroWmt8k):
    method approx_vocab_size (line 67) | def approx_vocab_size(self):
  class TranslateEnroWmtCharacters (line 72) | class TranslateEnroWmtCharacters(TranslateEnroWmt8k):
    method vocab_type (line 76) | def vocab_type(self):
  class TranslateEnroWmtMulti64k (line 81) | class TranslateEnroWmtMulti64k(TranslateEnroWmt8k):
    method use_vocab_from_other_problem (line 85) | def use_vocab_from_other_problem(self):
  class TranslateEnroWmtMultiSmall64k (line 90) | class TranslateEnroWmtMultiSmall64k(TranslateEnroWmt8k):
    method dataset_splits (line 94) | def dataset_splits(self):
    method use_vocab_from_other_problem (line 105) | def use_vocab_from_other_problem(self):
    method how_many_examples_to_sample (line 109) | def how_many_examples_to_sample(self):
    method generate_samples (line 112) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class TranslateEnroWmtMultiTiny64k (line 147) | class TranslateEnroWmtMultiTiny64k(TranslateEnroWmtMultiSmall64k):
    method how_many_examples_to_sample (line 151) | def how_many_examples_to_sample(self):
  class TranslateEnroWmtMultiTiny64kPacked1k (line 156) | class TranslateEnroWmtMultiTiny64kPacked1k(TranslateEnroWmtMultiTiny64k):
    method packed_length (line 160) | def packed_length(self):
    method num_training_examples (line 164) | def num_training_examples(self):
    method inputs_prefix (line 168) | def inputs_prefix(self):
    method targets_prefix (line 172) | def targets_prefix(self):

FILE: tensor2tensor/data_generators/translate_entn.py
  class TranslateEntnRma (line 41) | class TranslateEntnRma(translate.TranslateProblem):
    method approx_vocab_size (line 48) | def approx_vocab_size(self):
    method vocab_filename (line 52) | def vocab_filename(self):
    method source_data_files (line 55) | def source_data_files(self, dataset_split):

FILE: tensor2tensor/data_generators/translate_envi.py
  class TranslateEnviIwslt32k (line 49) | class TranslateEnviIwslt32k(translate.TranslateProblem):
    method approx_vocab_size (line 53) | def approx_vocab_size(self):
    method source_data_files (line 56) | def source_data_files(self, dataset_split):

FILE: tensor2tensor/data_generators/translate_enzh.py
  function get_filename (line 155) | def get_filename(dataset):
  class TranslateEnzhWmt32k (line 160) | class TranslateEnzhWmt32k(translate.TranslateProblem):
    method approx_vocab_size (line 180) | def approx_vocab_size(self):
    method source_vocab_name (line 184) | def source_vocab_name(self):
    method target_vocab_name (line 188) | def target_vocab_name(self):
    method get_training_dataset (line 191) | def get_training_dataset(self, tmp_dir):
    method generate_encoded_samples (line 213) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method feature_encoders (line 243) | def feature_encoders(self, data_dir):
  class TranslateEnzhWmt8k (line 255) | class TranslateEnzhWmt8k(TranslateEnzhWmt32k):
    method approx_vocab_size (line 262) | def approx_vocab_size(self):
    method dataset_splits (line 266) | def dataset_splits(self):
    method get_training_dataset (line 278) | def get_training_dataset(self, tmp_dir):

FILE: tensor2tensor/data_generators/translate_test.py
  class TranslateTest (line 31) | class TranslateTest(tf.test.TestCase):
    method setUpClass (line 39) | def setUpClass(cls):
    method testCompileData (line 71) | def testCompileData(self):

FILE: tensor2tensor/data_generators/video_generated.py
  class VideoStochasticShapes10k (line 42) | class VideoStochasticShapes10k(video_utils.VideoProblem):
    method is_generate_per_split (line 46) | def is_generate_per_split(self):
    method frame_height (line 51) | def frame_height(self):
    method frame_width (line 55) | def frame_width(self):
    method total_number_of_frames (line 59) | def total_number_of_frames(self):
    method video_length (line 64) | def video_length(self):
    method random_skip (line 68) | def random_skip(self):
    method only_keep_videos_from_0th_frame (line 72) | def only_keep_videos_from_0th_frame(self):
    method use_not_breaking_batching (line 76) | def use_not_breaking_batching(self):
    method eval_metrics (line 79) | def eval_metrics(self):
    method extra_reading_spec (line 83) | def extra_reading_spec(self):
    method hparams (line 94) | def hparams(self, defaults, unused_model_hparams):
    method get_circle (line 106) | def get_circle(x, y, z, c, s):
    method get_rectangle (line 112) | def get_rectangle(x, y, z, c, s):
    method get_triangle (line 118) | def get_triangle(x, y, z, c, s):
    method generate_stochastic_shape_instance (line 124) | def generate_stochastic_shape_instance(self):
    method generate_samples (line 191) | def generate_samples(self, data_dir, tmp_dir, unused_dataset_split):

FILE: tensor2tensor/data_generators/video_utils.py
  function resize_video_frames (line 49) | def resize_video_frames(images, size):
  function video_augmentation (line 54) | def video_augmentation(features, hue=False, saturate=False, contrast=Fal...
  function create_border (line 83) | def create_border(video, color="blue", border_percent=2):
  function convert_videos_to_summaries (line 108) | def convert_videos_to_summaries(input_videos, output_videos, target_videos,
  function display_video_hooks (line 167) | def display_video_hooks(hook_args):
  function summarize_video_metrics (line 211) | def summarize_video_metrics(hook_args):
  function debug_video_writer_factory (line 240) | def debug_video_writer_factory(output_dir):
  class VideoProblem (line 251) | class VideoProblem(problem.Problem):
    method __init__ (line 254) | def __init__(self, *args, **kwargs):
    method max_frames_per_video (line 264) | def max_frames_per_video(self, hparams):
    method num_channels (line 284) | def num_channels(self):
    method frame_height (line 289) | def frame_height(self):
    method frame_width (line 294) | def frame_width(self):
    method frame_shape (line 299) | def frame_shape(self):
    method total_number_of_frames (line 304) | def total_number_of_frames(self):
    method random_skip (line 315) | def random_skip(self):
    method extra_reading_spec (line 320) | def extra_reading_spec(self):
    method dataset_splits (line 325) | def dataset_splits(self):
    method only_keep_videos_from_0th_frame (line 336) | def only_keep_videos_from_0th_frame(self):
    method avoid_overlapping_frames (line 340) | def avoid_overlapping_frames(self):
    method use_not_breaking_batching (line 345) | def use_not_breaking_batching(self):
    method preprocess_example (line 348) | def preprocess_example(self, example, mode, hparams):
    method decode_hooks (line 357) | def decode_hooks(self):
    method is_generate_per_split (line 361) | def is_generate_per_split(self):
    method example_reading_spec (line 378) | def example_reading_spec(self):
    method serving_input_fn (line 399) | def serving_input_fn(self, hparams):
    method preprocess (line 413) | def preprocess(self, dataset, mode, hparams, interleave=True):
    method eval_metrics (line 537) | def eval_metrics(self):
    method validate_frame (line 544) | def validate_frame(self, frame):
    method generate_samples (line 556) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_encoded_samples (line 575) | def generate_encoded_samples(self, data_dir, tmp_dir, dataset_split):
    method generate_data (line 634) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  class VideoProblemOld (line 665) | class VideoProblemOld(problem.Problem):
    method num_channels (line 669) | def num_channels(self):
    method example_reading_spec (line 673) | def example_reading_spec(self):
    method eval_metrics (line 689) | def eval_metrics(self):
  class VideoAugmentationProblem (line 697) | class VideoAugmentationProblem(VideoProblem):
    method hue (line 706) | def hue(self):
    method contrast (line 710) | def contrast(self):
    method saturate (line 714) | def saturate(self):
    method preprocess (line 717) | def preprocess(self, dataset, mode, hparams, interleave=True):
  class Video2ClassProblem (line 728) | class Video2ClassProblem(VideoProblemOld):
    method is_small (line 732) | def is_small(self):
    method num_classes (line 736) | def num_classes(self):
    method train_shards (line 740) | def train_shards(self):
    method dev_shards (line 744) | def dev_shards(self):
    method class_labels (line 748) | def class_labels(self):
    method image_size (line 752) | def image_size(self):
    method feature_encoders (line 755) | def feature_encoders(self, data_dir):
    method generator (line 762) | def generator(self, data_dir, tmp_dir, is_training):
    method example_reading_spec (line 765) | def example_reading_spec(self):
    method hparams (line 774) | def hparams(self, defaults, unused_model_hparams):
    method generate_data (line 783) | def generate_data(self, data_dir, tmp_dir, task_id=-1):

FILE: tensor2tensor/data_generators/video_utils_test.py
  class VideoUtilsTest (line 31) | class VideoUtilsTest(parameterized.TestCase, tf.test.TestCase):
    method get_predictions (line 33) | def get_predictions(self, num_decodes=2):
    method testVideoAugmentation (line 49) | def testVideoAugmentation(self):
    method testDecodeInMemoryTrue (line 62) | def testDecodeInMemoryTrue(self):
    method testConvertPredictionsToVideoSummaries (line 75) | def testConvertPredictionsToVideoSummaries(self, num_decodes=5,

FILE: tensor2tensor/data_generators/vqa.py
  function _get_vqa_v2_annotations (line 45) | def _get_vqa_v2_annotations(directory,
  function _get_vqa_v2_image_raw_dataset (line 55) | def _get_vqa_v2_image_raw_dataset(directory, image_root_url, image_urls):
  function _get_vqa_v2_image_feature_dataset (line 66) | def _get_vqa_v2_image_feature_dataset(
  class ImageQuestion2MultilabelProblem (line 75) | class ImageQuestion2MultilabelProblem(image_utils.ImageProblem):
    method target_space_id (line 79) | def target_space_id(self):
    method vocab_size (line 83) | def vocab_size(self):
    method num_classes (line 87) | def num_classes(self):
    method vocab_filename (line 91) | def vocab_filename(self):
    method label_filename (line 95) | def label_filename(self):
    method train_shards (line 99) | def train_shards(self):
    method dev_shards (line 103) | def dev_shards(self):
    method source_data_files (line 106) | def source_data_files(self, dataset_split):
    method generator (line 109) | def generator(self, data_dir, tmp_dir, dataset_split):
    method eval_metrics (line 112) | def eval_metrics(self):
    method feature_encoders (line 117) | def feature_encoders(self, data_dir):
    method hparams (line 129) | def hparams(self, defaults, unused_model_hparams):
    method generate_data (line 147) | def generate_data(self, data_dir, tmp_dir, task_id=-1):
  class ImageVqav2Tokens10kLabels3k (line 156) | class ImageVqav2Tokens10kLabels3k(ImageQuestion2MultilabelProblem):
    method source_data_files (line 178) | def source_data_files(self, dataset_split):
    method target_space_id (line 183) | def target_space_id(self):
    method vocab_size (line 187) | def vocab_size(self):
    method num_classes (line 191) | def num_classes(self):
    method vocab_filename (line 195) | def vocab_filename(self):
    method label_filename (line 199) | def label_filename(self):
    method train_shards (line 203) | def train_shards(self):
    method dev_shards (line 207) | def dev_shards(self):
    method example_reading_spec (line 210) | def example_reading_spec(self):
    method preprocess_example (line 227) | def preprocess_example(self, example, mode, hparams):
    method generator (line 236) | def generator(self, data_dir, tmp_dir, dataset_split):
    method vqa_v2_generator (line 240) | def vqa_v2_generator(self, data_dir, tmp_dir, datasets):
  class ImageVqav2RcnnFeatureTokens10kLabels3k (line 292) | class ImageVqav2RcnnFeatureTokens10kLabels3k(ImageVqav2Tokens10kLabels3k):
    method num_boxes (line 298) | def num_boxes(self):
    method feature_dimension (line 302) | def feature_dimension(self):
    method spatial_feature_dimension (line 306) | def spatial_feature_dimension(self):
    method feature_file_field_names (line 310) | def feature_file_field_names(self):
    method preprocess_example (line 318) | def preprocess_example(self, example, mode, hparams):
    method example_reading_spec (line 327) | def example_reading_spec(self):
    method vqa_v2_generator (line 357) | def vqa_v2_generator(self, data_dir, tmp_dir, datasets):

FILE: tensor2tensor/data_generators/vqa_utils.py
  function _smallest_size_at_least (line 37) | def _smallest_size_at_least(height, width, smallest_side):
  function _aspect_preserving_resize (line 67) | def _aspect_preserving_resize(image, smallest_side):
  function _flip (line 93) | def _flip(image):
  function _distort_color (line 99) | def _distort_color(image, color_ordering=0, scope=None):
  function _apply_with_random_selector (line 144) | def _apply_with_random_selector(x, func, num_cases):
  function _mean_image_subtraction (line 164) | def _mean_image_subtraction(image, means):
  function vqa_v2_preprocess_image (line 197) | def vqa_v2_preprocess_image(

FILE: tensor2tensor/data_generators/wiki.py
  class LanguagemodelWikiXmlV8kL1k (line 36) | class LanguagemodelWikiXmlV8kL1k(text_problems.ChoppedTextProblem):
    method maybe_prepare_text (line 43) | def maybe_prepare_text(self, tmp_dir):
    method train_text_filepaths (line 72) | def train_text_filepaths(self, tmp_dir):
    method dev_text_filepaths (line 76) | def dev_text_filepaths(self, tmp_dir):
    method dev_fraction (line 81) | def dev_fraction(self):
    method corpus_url (line 85) | def corpus_url(self):
    method approx_vocab_size (line 90) | def approx_vocab_size(self):
    method sequence_length (line 94) | def sequence_length(self):
    method max_chars_for_vocab (line 99) | def max_chars_for_vocab(self):
  class LanguagemodelWikiXmlV8kL4k (line 106) | class LanguagemodelWikiXmlV8kL4k(LanguagemodelWikiXmlV8kL1k):
    method sequence_length (line 114) | def sequence_length(self):
  class LanguagemodelWikiScramble (line 119) | class LanguagemodelWikiScramble(LanguagemodelWikiXmlV8kL1k):
    method example_generator (line 130) | def example_generator(self, encoder, tmp_dir, task_id):
    method scramble_fraction (line 137) | def scramble_fraction(self):
    method has_inputs (line 141) | def has_inputs(self):
    method input_space_id (line 145) | def input_space_id(self):
    method targeted_vocab_size (line 149) | def targeted_vocab_size(self):
    method remainder_policy (line 153) | def remainder_policy(self):
    method scramble (line 157) | def scramble(self, seq):
  class LanguagemodelWikiScrambleL128 (line 172) | class LanguagemodelWikiScrambleL128(LanguagemodelWikiScramble):
    method sequence_length (line 176) | def sequence_length(self):
    method scramble_fraction (line 180) | def scramble_fraction(self):
  class LanguagemodelWikiScrambleL1k (line 185) | class LanguagemodelWikiScrambleL1k(LanguagemodelWikiScramble):
    method sequence_length (line 189) | def sequence_length(self):
    method scramble_fraction (line 193) | def scramble_fraction(self):
  class LanguagemodelWikiNorefV8kL1k (line 198) | class LanguagemodelWikiNorefV8kL1k(LanguagemodelWikiXmlV8kL1k):
    method filepath_to_unicode_strings (line 219) | def filepath_to_unicode_strings(self, filepath):
    method max_chars_for_vocab (line 239) | def max_chars_for_vocab(self):
  function _dump_to_pages (line 245) | def _dump_to_pages(dump):
  function _page_to_title (line 270) | def _page_to_title(page):
  function _page_to_text (line 289) | def _page_to_text(page):
  function _find_and_replace (line 309) | def _find_and_replace(text, start_string, end_string, replace_fn):
  function _remove_references (line 342) | def _remove_references(text):
  function _remove_triple_quotes (line 347) | def _remove_triple_quotes(text):
  function _remove_double_brackets (line 352) | def _remove_double_brackets(text):
  class LanguagemodelWikiNorefV8kL16k (line 373) | class LanguagemodelWikiNorefV8kL16k(LanguagemodelWikiNorefV8kL1k):
    method sequence_length (line 380) | def sequence_length(self):
  class LanguagemodelWikiNorefV32kL1k (line 386) | class LanguagemodelWikiNorefV32kL1k(LanguagemodelWikiNorefV8kL1k):
    method approx_vocab_size (line 390) | def approx_vocab_size(self):
    method max_chars_for_vocab (line 394) | def max_chars_for_vocab(self):
  class LanguagemodelWikiNorefV32kL16k (line 399) | class LanguagemodelWikiNorefV32kL16k(LanguagemodelWikiNorefV32kL1k):
    method sequence_length (line 406) | def sequence_length(self):
  class LanguagemodelWikiNorefV128kL1k (line 412) | class LanguagemodelWikiNorefV128kL1k(LanguagemodelWikiNorefV8kL1k):
    method approx_vocab_size (line 416) | def approx_vocab_size(self):
    method max_chars_for_vocab (line 420) | def max_chars_for_vocab(self):

FILE: tensor2tensor/data_generators/wiki_lm.py
  function concat_generator (line 33) | def concat_generator(filename, up_threshold, low_threshold=10):
  function mix_generators (line 51) | def mix_generators(generator_list):
  class LanguagemodelEnWiki32k (line 85) | class LanguagemodelEnWiki32k(text_problems.Text2SelfProblem):
    method approx_vocab_size (line 93) | def approx_vocab_size(self):
    method max_samples_for_vocab (line 97) | def max_samples_for_vocab(self):
    method combine_characters_threshold (line 101) | def combine_characters_threshold(self):
    method is_generate_per_split (line 105) | def is_generate_per_split(self):
    method dataset_splits (line 109) | def dataset_splits(self):
    method generate_samples (line 122) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
  class LanguagemodelEnWiki64k (line 147) | class LanguagemodelEnWiki64k(LanguagemodelEnWiki32k):
    method approx_vocab_size (line 151) | def approx_vocab_size(self):
  class LanguagemodelEnWiki64kShorter (line 156) | class LanguagemodelEnWiki64kShorter(LanguagemodelEnWiki64k):
    method combine_characters_threshold (line 160) | def combine_characters_threshold(self):
    method use_vocab_from_other_problem (line 165) | def use_vocab_from_other_problem(self):
  class LanguagemodelDeWiki32k (line 170) | class LanguagemodelDeWiki32k(LanguagemodelEnWiki32k):
  class LanguagemodelDeWiki64k (line 179) | class LanguagemodelDeWiki64k(LanguagemodelDeWiki32k):
    method approx_vocab_size (line 183) | def approx_vocab_size(self):
  class LanguagemodelFrWiki32k (line 188) | class LanguagemodelFrWiki32k(LanguagemodelEnWiki32k):
  class LanguagemodelFrWiki64k (line 197) | class LanguagemodelFrWiki64k(LanguagemodelFrWiki32k):
    method approx_vocab_size (line 201) | def approx_vocab_size(self):
  class LanguagemodelRoWiki32k (line 206) | class LanguagemodelRoWiki32k(LanguagemodelEnWiki32k):
  class LanguagemodelRoWiki64k (line 215) | class LanguagemodelRoWiki64k(LanguagemodelRoWiki32k):
    method approx_vocab_size (line 219) | def approx_vocab_size(self):
  class LanguagemodelDeEnFrRoWiki64k (line 224) | class LanguagemodelDeEnFrRoWiki64k(LanguagemodelEnWiki32k):
    method approx_vocab_size (line 235) | def approx_vocab_size(self):
    method max_samples_for_vocab (line 239) | def max_samples_for_vocab(self):
  class LanguagemodelDeEnFrRoWiki64kFitbPacked1k (line 244) | class LanguagemodelDeEnFrRoWiki64kFitbPacked1k(
    method use_vocab_from_other_problem (line 249) | def use_vocab_from_other_problem(self):
    method has_inputs (line 253) | def has_inputs(self):
    method generate_samples (line 256) | def generate_samples(self, data_dir, tmp_dir, dataset_split):
    method num_training_examples (line 264) | def num_training_examples(self):
    method packed_length (line 268) | def packed_length(self):
    method inputs_prefix (line 272) | def inputs_prefix(self):
    method targets_prefix (line 276) | def targets_prefix(self):

FILE: tensor2tensor/data_generators/wiki_multi_problems.py
  class LanguagemodelEnWikiLMMultiNLISubwords (line 36) | class LanguagemodelEnWikiLMMultiNLISubwords(multi_problem.MultiProblem):
    method __init__ (line 39) | def __init__(self, was_reversed=False, was_copy=False):
    method vocab_type (line 46) | def vocab_type(self):
  class LanguagemodelEnWikiLMMultiNLISubwordsV2 (line 51) | class LanguagemodelEnWikiLMMultiNLISubwordsV2(
    method __init__ (line 55) | def __init__(self, was_reversed=False, was_copy=False):
    method has_inputs (line 65) | def has_inputs(self):
    method use_vocab_from_other_problem (line 69) | def use_vocab_from_other_problem(self):
    method vocab_type (line 73) | def vocab_type(self):
  class LanguagemodelMultiWikiTranslatePacked1k (line 78) | class LanguagemodelMultiWikiTranslatePacked1k(
    method __init__ (line 82) | def __init__(self, was_reversed=False, was_copy=False):
    method problems_and_rates (line 95) | def problems_and_rates(self):
    method has_inputs (line 108) | def has_inputs(self):
    method use_vocab_from_other_problem (line 112) | def use_vocab_from_other_problem(self):
    method vocab_type (line 116) | def vocab_type(self):
    method packed_length (line 120) | def packed_length(self):
  class LanguagemodelMultiWikiTranslatePacked1kV2 (line 125) | class LanguagemodelMultiWikiTranslatePacked1kV2(
    method problems_and_rates (line 130) | def problems_and_rates(self):
  class LanguagemodelEnWikiLMMultiNLISubwords64k (line 144) | class LanguagemodelEnWikiLMMultiNLISubwords64k(multi_problem.MultiProblem):
    method __init__ (line 147) | def __init__(self, was_reversed=False, was_copy=False):
    method vocab_type (line 154) | def vocab_type(self):
  class LanguagemodelEnWikiLMShortMultiNLISubwords64k (line 159) | class LanguagemodelEnWikiLMShortMultiNLISubwords64k(multi_problem.MultiP...
    method __init__ (line 162) | def __init__(self, was_reversed=False, was_copy=False):
    method vocab_type (line 169) |

Condensed preview: 553 files, each showing path, character count, and a content snippet. The full structured content is 8,980K chars.

[
  {
    "path": ".gitignore",
    "chars": 310,
    "preview": "# Compiled python modules.\n*.pyc\n\n# Byte-compiled\n_pycache__/\n.cache/\n\n# Python egg metadata, regenerated from source fi"
  },
  {
    "path": ".travis.yml",
    "chars": 779,
    "preview": "sudo: required\nlanguage: python\ncache: pip\ngit:\n  depth: 3\n  quiet: true\nservices:\n  - docker\npython:\n  - \"3.6\"\nenv:\n  g"
  },
  {
    "path": "AUTHORS",
    "chars": 311,
    "preview": "# This is the list of T2T authors for copyright purposes.\n#\n# This does not necessarily list everyone who has contribute"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1280,
    "preview": "# How to Contribute\n\n# Issues\n\n* Please tag your issue with `bug`, `feature request`, or `question` to help us\n  effecti"
  },
  {
    "path": "ISSUE_TEMPLATE.md",
    "chars": 266,
    "preview": "### Description\n\n...\n\n### Environment information\n\n```\nOS: <your answer here>\n\n$ pip freeze | grep tensor\n# your output "
  },
  {
    "path": "LICENSE",
    "chars": 11358,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "README.md",
    "chars": 19717,
    "preview": "# Tensor2Tensor\n\n[![PyPI\nversion](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)\n["
  },
  {
    "path": "docs/cloud_mlengine.md",
    "chars": 3467,
    "preview": "# Running on Cloud ML Engine\n\nGoogle Cloud Platform offers a managed training environment for TensorFlow\nmodels called ["
  },
  {
    "path": "docs/cloud_tpu.md",
    "chars": 1803,
    "preview": "# Running on Cloud TPUs\n\nTensor2Tensor supports running on Google Cloud Platforms TPUs, chips\nspecialized for ML trainin"
  },
  {
    "path": "docs/distributed_training.md",
    "chars": 8904,
    "preview": "# Distributed Training\n\nThe `t2t-trainer` supports both synchronous and asynchronous distributed\ntraining.\n\nNote that it"
  },
  {
    "path": "docs/index.md",
    "chars": 5361,
    "preview": "# Tensor2Tensor Documentation\n\n[![PyPI\nversion](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/te"
  },
  {
    "path": "docs/multi_problem.md",
    "chars": 9794,
    "preview": "# Multi-problem training\n\nMulti-problem training is possible by defining [MultiProblem](https://github.com/tensorflow/te"
  },
  {
    "path": "docs/new_model.md",
    "chars": 4437,
    "preview": "# T2T: Create Your Own Model\n\n[![PyPI\nversion](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/ten"
  },
  {
    "path": "docs/new_problem.md",
    "chars": 8076,
    "preview": "# T2T: Train on Your Own Data\n\n[![PyPI\nversion](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/te"
  },
  {
    "path": "docs/overview.md",
    "chars": 8397,
    "preview": "# T2T: Life of an Example\n\n[![PyPI\nversion](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor"
  },
  {
    "path": "docs/tutorials/asr_with_transformer.md",
    "chars": 162,
    "preview": "# Automated Speech Recognition with the Transformer model\n\nSee the\n[official tutorial](https://cloud.google.com/tpu/docs"
  },
  {
    "path": "docs/walkthrough.md",
    "chars": 19717,
    "preview": "# Tensor2Tensor\n\n[![PyPI\nversion](https://badge.fury.io/py/tensor2tensor.svg)](https://badge.fury.io/py/tensor2tensor)\n["
  },
  {
    "path": "floyd.yml",
    "chars": 34,
    "preview": "env: tensorflow-1.12\nmachine: gpu\n"
  },
  {
    "path": "floyd_requirements.txt",
    "chars": 14,
    "preview": "tensor2tensor\n"
  },
  {
    "path": "oss_scripts/oss_integration_test.sh",
    "chars": 1722,
    "preview": "#!/bin/bash\n\n# Note that this test script requires docker to be installed and running.\n\nset -v  # print commands as they"
  },
  {
    "path": "oss_scripts/oss_pip_install.sh",
    "chars": 861,
    "preview": "#!/bin/bash\n\nset -v  # print commands as they're executed\nset -e  # fail and exit on any command erroring\n\n: \"${TF_VERSI"
  },
  {
    "path": "oss_scripts/oss_release.sh",
    "chars": 892,
    "preview": "#!/bin/bash\n\nset -v  # print commands as they're executed\nset -e  # fail and exit on any command erroring\n\nGIT_COMMIT_ID"
  },
  {
    "path": "oss_scripts/oss_tests.sh",
    "chars": 6488,
    "preview": "#!/bin/bash\n\nset -v  # print commands as they're executed\n\n# Instead of exiting on any failure with \"set -e\", we'll call"
  },
  {
    "path": "pylintrc",
    "chars": 7866,
    "preview": "\n\n[MASTER]\n\n# Pickle collected data for later comparisons.\npersistent=no\n\n# Set the cache size for astng objects.\ncache-"
  },
  {
    "path": "setup.py",
    "chars": 3405,
    "preview": "\"\"\"Install tensor2tensor.\"\"\"\n\nfrom setuptools import find_packages\nfrom setuptools import setup\n\nsetup(\n    name='tensor"
  },
  {
    "path": "tensor2tensor/__init__.py",
    "chars": 606,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/__init__.py",
    "chars": 606,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/build_vocab.py",
    "chars": 2202,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/make_tf_configs.py",
    "chars": 3371,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t-avg-all",
    "chars": 348,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-avg-all.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __f"
  },
  {
    "path": "tensor2tensor/bin/t2t-bleu",
    "chars": 340,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-bleu.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __futu"
  },
  {
    "path": "tensor2tensor/bin/t2t-datagen",
    "chars": 648,
    "preview": "#!/usr/bin/env python\n\"\"\"Data generation for Tensor2Tensor.\n\nThis script is used to generate data to train your models\nf"
  },
  {
    "path": "tensor2tensor/bin/t2t-decoder",
    "chars": 348,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-decoder.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __f"
  },
  {
    "path": "tensor2tensor/bin/t2t-eval",
    "chars": 665,
    "preview": "#!/usr/bin/env python\n\"\"\"Run t2t-eval from a trained checkpoint.\n\nThis script is used to run evaluation from a trained c"
  },
  {
    "path": "tensor2tensor/bin/t2t-exporter",
    "chars": 343,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-exporter.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __"
  },
  {
    "path": "tensor2tensor/bin/t2t-insights-server",
    "chars": 351,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-insights-server.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\n"
  },
  {
    "path": "tensor2tensor/bin/t2t-make-tf-configs",
    "chars": 364,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-make-tf-configs.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\n"
  },
  {
    "path": "tensor2tensor/bin/t2t-query-server",
    "chars": 345,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-query-server.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfro"
  },
  {
    "path": "tensor2tensor/bin/t2t-trainer",
    "chars": 751,
    "preview": "#!/usr/bin/env python\n\"\"\"Trainer for Tensor2Tensor.\n\nThis script is used to train your models in Tensor2Tensor.\n\nFor exa"
  },
  {
    "path": "tensor2tensor/bin/t2t-translate-all",
    "chars": 367,
    "preview": "#!/usr/bin/env python\n\"\"\"t2t-translate-all.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfr"
  },
  {
    "path": "tensor2tensor/bin/t2t_attack.py",
    "chars": 9996,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_avg_all.py",
    "chars": 4419,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_bleu.py",
    "chars": 7325,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_datagen.py",
    "chars": 12728,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_decoder.py",
    "chars": 7776,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_distill.py",
    "chars": 6020,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_eval.py",
    "chars": 2426,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_prune.py",
    "chars": 4010,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_trainer.py",
    "chars": 17106,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_trainer_test.py",
    "chars": 1367,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/bin/t2t_translate_all.py",
    "chars": 4358,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/README.md",
    "chars": 3420,
    "preview": "# T2T Problems.\n\nThis directory contains `Problem` specifications for a number of problems. We\nuse a naming scheme for t"
  },
  {
    "path": "tensor2tensor/data_generators/__init__.py",
    "chars": 606,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/algorithmic.py",
    "chars": 17892,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/algorithmic_math.py",
    "chars": 21962,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/algorithmic_math_deepmind.py",
    "chars": 3366,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/algorithmic_math_test.py",
    "chars": 3070,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/algorithmic_math_two_variables.py",
    "chars": 4463,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/algorithmic_test.py",
    "chars": 4382,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/all_problems.py",
    "chars": 6231,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/allen_brain.py",
    "chars": 14095,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/allen_brain_test.py",
    "chars": 9434,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/audio.py",
    "chars": 5858,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/audio_encoder.py",
    "chars": 3034,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/audio_test.py",
    "chars": 1932,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/babi_qa.py",
    "chars": 16805,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/bair_robot_pushing.py",
    "chars": 5942,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/celeba.py",
    "chars": 10066,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/celeba_test.py",
    "chars": 1818,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/celebahq.py",
    "chars": 3918,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/cifar.py",
    "chars": 17441,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/cipher.py",
    "chars": 6292,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/cleaner_en_xx.py",
    "chars": 6508,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/cnn_dailymail.py",
    "chars": 11981,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/cola.py",
    "chars": 3449,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/common_voice.py",
    "chars": 8774,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/common_voice_test.py",
    "chars": 1289,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/conll_ner.py",
    "chars": 3318,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/desc2code.py",
    "chars": 10092,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/desc2code_test.py",
    "chars": 1743,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dialog_abstract.py",
    "chars": 13435,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dialog_cornell.py",
    "chars": 5559,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dialog_dailydialog.py",
    "chars": 4581,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dialog_opensubtitles.py",
    "chars": 7756,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dialog_personachat.py",
    "chars": 6938,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dna_encoder.py",
    "chars": 4021,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/dna_encoder_test.py",
    "chars": 1758,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/enwik8.py",
    "chars": 5524,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/fsns.py",
    "chars": 3207,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/function_docstring.py",
    "chars": 3544,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/gene_expression.py",
    "chars": 9825,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/gene_expression_test.py",
    "chars": 2504,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/generator_utils.py",
    "chars": 46409,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/generator_utils_test.py",
    "chars": 6917,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/google_robot_pushing.py",
    "chars": 4584,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/gym_env.py",
    "chars": 31022,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/gym_env_test.py",
    "chars": 9636,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/ice_parsing.py",
    "chars": 4801,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/image_lsun.py",
    "chars": 3651,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/image_utils.py",
    "chars": 14495,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/image_utils_test.py",
    "chars": 6094,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/imagenet.py",
    "chars": 21303,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/imagenet_test.py",
    "chars": 1985,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/imdb.py",
    "chars": 3226,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/inspect_tfrecord.py",
    "chars": 3754,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/lambada.py",
    "chars": 9767,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/librispeech.py",
    "chars": 11321,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/lm1b.py",
    "chars": 6102,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/lm1b_imdb.py",
    "chars": 1526,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/lm1b_mnli.py",
    "chars": 2028,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/mnist.py",
    "chars": 8191,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/moving_mnist.py",
    "chars": 4786,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/mrpc.py",
    "chars": 4598,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/mscoco.py",
    "chars": 9392,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/mscoco_test.py",
    "chars": 1748,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/multi_problem.py",
    "chars": 23027,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/multi_problem_v2.py",
    "chars": 15989,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/multi_problem_v2_test.py",
    "chars": 7417,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/multinli.py",
    "chars": 6436,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/ocr.py",
    "chars": 2661,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/ops/pack_sequences_ops.cc",
    "chars": 22146,
    "preview": "#include \"base/integral_types.h\"\n#include \"third_party/tensorflow/core/framework/op_kernel.h\"\n#include \"third_party/tens"
  },
  {
    "path": "tensor2tensor/data_generators/ops/pack_sequences_ops_test.py",
    "chars": 14868,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/ops/subword_text_encoder.cc",
    "chars": 4818,
    "preview": "#include \"third_party/py/tensor2tensor/data_generators/ops/subword_text_encoder.h\"\n\n#include \"third_party/absl/strings/s"
  },
  {
    "path": "tensor2tensor/data_generators/ops/subword_text_encoder.h",
    "chars": 1668,
    "preview": "#ifndef TENSOR2TESNOR_DATA_GENERATORS_OPS_SUBWORD_TEXT_ENCODER_H_\n#define TENSOR2TESNOR_DATA_GENERATORS_OPS_SUBWORD_TEXT"
  },
  {
    "path": "tensor2tensor/data_generators/ops/subword_text_encoder_ops.cc",
    "chars": 2306,
    "preview": "#include <memory>\n\n#include \"third_party/py/tensor2tensor/data_generators/ops/subword_text_encoder.h\"\n#include \"third_pa"
  },
  {
    "path": "tensor2tensor/data_generators/ops/subword_text_encoder_ops_test.py",
    "chars": 1348,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/ops/subword_text_encoder_test.cc",
    "chars": 1637,
    "preview": "#include \"third_party/py/tensor2tensor/data_generators/ops/subword_text_encoder.h\"\n\n#include \"testing/base/public/gunit."
  },
  {
    "path": "tensor2tensor/data_generators/ops/testdata/subwords",
    "chars": 168,
    "preview": "'<pad>'\n'<eos>'\n'the_'\n'quick_'\n'brow'\n'n_'\n'fox_'\n'jump'\n's_'\n'over_'\n'the_'\n'lazy_'\n'dog_'\n'ɧę'\n'ĻĽÒ_'\n'⻦'\n'⻭'\n'_'\n' '"
  },
  {
    "path": "tensor2tensor/data_generators/paraphrase_ms_coco.py",
    "chars": 5861,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/paraphrase_ms_coco_test.py",
    "chars": 3095,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/pointer_generator_word.py",
    "chars": 7368,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/problem.py",
    "chars": 37696,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/problem_hparams.py",
    "chars": 6680,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/problem_test.py",
    "chars": 8736,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/program_search.py",
    "chars": 4186,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/program_search_test.py",
    "chars": 4236,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/ptb.py",
    "chars": 4923,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/qnli.py",
    "chars": 3575,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/quora_qpairs.py",
    "chars": 3784,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/rte.py",
    "chars": 3603,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/scitail.py",
    "chars": 3690,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/seq2edits.py",
    "chars": 8873,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/snli.py",
    "chars": 5420,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/speech_recognition.py",
    "chars": 5497,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/squad.py",
    "chars": 7341,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/sst_binary.py",
    "chars": 3535,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/stanford_nli.py",
    "chars": 4421,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/style_transfer.py",
    "chars": 4983,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/style_transfer_test.py",
    "chars": 3227,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/subject_verb_agreement.py",
    "chars": 8538,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/test_data/1.csv",
    "chars": 35,
    "preview": "media_name,label\nmy_media,my_label\n"
  },
  {
    "path": "tensor2tensor/data_generators/test_data/corpus-1.txt",
    "chars": 100,
    "preview": "One morning I shot an elephant in my pajamas. How he got in my pajamas, I don't\nknow.\n\nGroucho Marx\n"
  },
  {
    "path": "tensor2tensor/data_generators/test_data/corpus-2.txt",
    "chars": 78,
    "preview": "I haven't slept for 10 days... because that would be too long.\n\nMitch Hedberg\n"
  },
  {
    "path": "tensor2tensor/data_generators/test_data/vocab-1.txt",
    "chars": 27,
    "preview": "lollipop,8\nreverberated,12\n"
  },
  {
    "path": "tensor2tensor/data_generators/test_data/vocab-2.txt",
    "chars": 53,
    "preview": "kattywampus,11\nkaput\nbalderdash,10\njiggery-pokery,14\n"
  },
  {
    "path": "tensor2tensor/data_generators/text_encoder.py",
    "chars": 36336,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/text_encoder_build_subword.py",
    "chars": 2983,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/text_encoder_test.py",
    "chars": 14873,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/text_problems.py",
    "chars": 49548,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/text_problems_test.py",
    "chars": 14237,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/timeseries.py",
    "chars": 10085,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/timeseries_data_generator.py",
    "chars": 2451,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/timeseries_data_generator_test.py",
    "chars": 2634,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/timeseries_test.py",
    "chars": 2992,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/tokenizer.py",
    "chars": 6296,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/tokenizer_test.py",
    "chars": 4584,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/transduction_problems.py",
    "chars": 7692,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/transduction_problems_test.py",
    "chars": 2770,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate.py",
    "chars": 13919,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_encs.py",
    "chars": 3613,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_encs_cubbitt.py",
    "chars": 3839,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_ende.py",
    "chars": 6788,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_ende_test.py",
    "chars": 2312,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enes.py",
    "chars": 3675,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enet.py",
    "chars": 2849,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enfr.py",
    "chars": 7889,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enid.py",
    "chars": 2654,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enmk.py",
    "chars": 2599,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enro.py",
    "chars": 5221,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_entn.py",
    "chars": 1730,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_envi.py",
    "chars": 2073,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_enzh.py",
    "chars": 10367,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/translate_test.py",
    "chars": 2772,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/video_generated.py",
    "chars": 5763,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/video_utils.py",
    "chars": 29238,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/video_utils_test.py",
    "chars": 3874,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/vqa.py",
    "chars": 16949,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/vqa_utils.py",
    "chars": 8213,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wiki.py",
    "chars": 11846,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wiki_lm.py",
    "chars": 8400,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wiki_multi_problems.py",
    "chars": 13041,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wiki_revision.py",
    "chars": 19964,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wiki_revision_utils.py",
    "chars": 19920,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wikifact/README.md",
    "chars": 202,
    "preview": "# Assessing the Factual Accuracy of Generated Text\n\nThis directory will contain the code and scripts to generate data an"
  },
  {
    "path": "tensor2tensor/data_generators/wikisum/README.md",
    "chars": 11600,
    "preview": "# Generating Wikipedia by Summarizing Long Sequences\n\nThis directory contains the code and scripts to generate the datas"
  },
  {
    "path": "tensor2tensor/data_generators/wikisum/__init__.py",
    "chars": 606,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wikisum/delete_instances.sh",
    "chars": 581,
    "preview": "#!/bin/bash\n\n# Delete Google Compute Engine instances with naming structure $NAME-$INDEX\n# (e.g. machines created with p"
  },
  {
    "path": "tensor2tensor/data_generators/wikisum/generate_vocab.py",
    "chars": 1657,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wikisum/get_references_commoncrawl.py",
    "chars": 2517,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "tensor2tensor/data_generators/wikisum/get_references_web.py",
    "chars": 3019,
    "preview": "# coding=utf-8\n# Copyright 2023 The Tensor2Tensor Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  }
]

// ... and 353 more files (download for full content)

About this extraction

This page contains the full source code of the tensorflow/tensor2tensor GitHub repository, extracted and formatted as plain text. The extraction includes 553 files (8.3 MB), approximately 2.2M tokens, and a symbol index with 7112 extracted functions, classes, methods, constants, and types.