[
  {
    "path": "README.md",
    "content": "---\ntitle: tensorflow2官方教程目录导航\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1900\nabbrlink: tensorflow/tensorflow2-zh-readme\n---\n\n> 最全TensorFlow2.0学习路线 [www.mashangxue123.com](https://www.mashangxue123.com)\n\n# tensorflow2.0 官方教程目录导航\n\n\n#### Get started with TensorFlow 2.0\n\n\n#### Effective TensorFlow 2.0（高效的tensorflow 2.0）\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-effective_tf2.html](https://www.mashangxue123.com/tensorflow/tf2-guide-effective_tf2.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/effective_tf2](https://tensorflow.google.cn/beta/guide/effective_tf2)\n> 翻译建议：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/effective_tf2.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/effective_tf2.md)\n\n#### Migrate from TF 1 to TF 2\n\n#### Convert with the upgrade script\n\n#### Get started for beginners (初学者入门 TensorFlow 2.0)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-beginner.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-beginner.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/quickstart/beginner](https://tensorflow.google.cn/beta/tutorials/quickstart/beginner)\n> 翻译建议：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/beginner.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/beginner.md)\n\n#### Get started for experts (专家入门TensorFlow 2.0)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-advanced.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-advanced.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/quickstart/advanced](https://tensorflow.google.cn/beta/tutorials/quickstart/advanced)\n> 
翻译建议：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/advanced.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/advanced.md)\n\n## Beginner tutorials\n\n### ML basics\n\n#### Overview\n\n#### Classify images (训练您的第一个神经网络：基本分类)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_classification.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_classification.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_classification](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_classification.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_classification.md)\n\n#### Classify text (使用Keras和TensorFlow Hub对电影评论进行文本分类)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_with_tfhub](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_with_tfhub)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification_with_tfhub.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification_with_tfhub.md)\n\n#### Classify structured data (结构化数据分类实战：心脏病预测)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-feature_columns.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-feature_columns.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/feature_columns](https://tensorflow.google.cn/beta/tutorials/keras/feature_columns)\n> 
翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/feature_columns.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/feature_columns.md)\n\n#### Regression  (回归项目实战：预测燃油效率 )\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_regression.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_regression.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_regression](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_regression.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_regression.md)\n\n#### Overfitting and underfitting (探索过拟合和欠拟合)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-overfit_and_underfit.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-overfit_and_underfit.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/overfit_and_underfit.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/overfit_and_underfit.md)\n\n#### Save and restore models (tensorflow2保存和加载模型 )\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-save_and_restore_models.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-save_and_restore_models.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/save_and_restore_models](https://tensorflow.google.cn/beta/tutorials/keras/save_and_restore_models)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/save_and_restore_models.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/save_and_restore_models.md)\n\n### 
Images\n\n#### Convolutional Neural Networks (使用TensorFlow2.0实现卷积神经网络CNN对MNIST数字分类)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-intro_to_cnns.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-intro_to_cnns.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/intro_to_cnns](https://tensorflow.google.cn/beta/tutorials/images/intro_to_cnns)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/intro_to_cnns.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/intro_to_cnns.md)\n\n#### Transfer learning with TFHub (基于Keras使用TensorFlow Hub实现迁移学习)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-hub_with_keras.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-hub_with_keras.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/hub_with_keras.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/hub_with_keras.md)\n\n#### Transfer learning with pretrained CNNs (使用预训练的卷积神经网络进行迁移学习)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/transfer_learning](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md)\n\n### Text and sequences\n\n#### Intro to word embeddings (NLP词嵌入Word embedding实战项目)\n\n> 
最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-word_embeddings.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-word_embeddings.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/word_embeddings](https://tensorflow.google.cn/beta/tutorials/text/word_embeddings)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/word_embeddings.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/word_embeddings.md)\n\n#### Classify preprocessed text (文本分类项目实战：电影评论)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification.md)\n\n#### Classify text with a RNN (使用RNN对文本进行分类实践：电影评论)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_classification_rnn.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_classification_rnn.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_classification_rnn.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_classification_rnn.md)\n\n### Estimators\n\n#### Linear models\n\n## Advanced tutorials\n\n### Customization\n\n#### Overview\n#### Tensors and operations (tensorflow2.0张量及其操作、numpy兼容、GPU加速)\n\n> 
最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-basics.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-basics.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/basics](https://tensorflow.google.cn/beta/tutorials/eager/basics)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/basics.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/basics.md)\n\n#### Custom layers (使用Keras自定义层)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_layers.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_layers.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/custom_layers](https://tensorflow.google.cn/beta/tutorials/eager/custom_layers)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_layers.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_layers.md)\n\n#### Automatic differentiation (TF梯度下降法的核心自动微分和梯度带)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-automatic_differentiation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-automatic_differentiation.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/automatic_differentiation](https://tensorflow.google.cn/beta/tutorials/eager/automatic_differentiation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/automatic_differentiation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/automatic_differentiation.md)\n\n#### Custom training: basics (构建tensorflow2.0模型自定义训练的基础步骤)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training.html)\n> 
英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/custom_training](https://tensorflow.google.cn/beta/tutorials/eager/custom_training)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training.md)\n\n#### Custom training: walkthrough (使用Keras演示TensorFlow2.0自定义训练实战)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training_walkthrough.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training_walkthrough.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/custom_training_walkthrough](https://tensorflow.google.cn/beta/tutorials/eager/custom_training_walkthrough)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training_walkthrough.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training_walkthrough.md)\n\n#### TF function and AutoGraph\n\n### Text and sequences\n\n#### Generate text with an RNN (使用RNN生成文本实战：莎士比亚风格诗句)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_generation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_generation.html)\n> 
英文版本：[https://tensorflow.google.cn/beta/tutorials/text/text_generation](https://tensorflow.google.cn/beta/tutorials/text/text_generation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_generation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_generation.md)\n\n#### Neural Machine Translation with Attention (采用注意力机制的神经机器翻译)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-nmt_with_attention.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-nmt_with_attention.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/nmt_with_attention](https://tensorflow.google.cn/beta/tutorials/text/nmt_with_attention)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/nmt_with_attention.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/nmt_with_attention.md)\n\n#### Image captioning （使用注意力机制给图片取标题）\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-image_captioning.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-image_captioning.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/image_captioning](https://tensorflow.google.cn/beta/tutorials/text/image_captioning)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/image_captioning.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/image_captioning.md)\n\n#### Transformer model for language understanding\n\n### Image Generation\n### Image Optimization\n\n#### Style Transfer\n\n### GANs\n#### DCGAN\n#### Pix2Pix\n\n### Auto Encoders\n#### VAE\n\n### Loading data\n#### Load CSV data\n#### Build an image input pipeline\n#### Load text with tf.data\n#### Use TFRecords and tf.Example\n#### Unicode strings\n\n### Distributed training\n#### Distributed training\n#### Distributed training with custom training loops\n#### Multi worker 
training\n\n## Guide\n\n### Eager essentials\n### Variables\n### AutoGraph\n\n### Keras\n#### Keras overview （Keras概述：构建模型，输入数据，训练，评估，回调，保存，分布）\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-keras-overview.html](https://www.mashangxue123.com/tensorflow/tf2-guide-keras-overview.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/keras/overview](https://tensorflow.google.cn/beta/guide/keras/overview)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/overview.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/overview.md)\n\n#### Keras functional API （不用Sequential模型，TensorFlow中的Keras函数式API）\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-keras-functional.html](https://www.mashangxue123.com/tensorflow/tf2-guide-keras-functional.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/keras/functional](https://tensorflow.google.cn/beta/guide/keras/functional)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/functional.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/functional.md)\n\n#### Train and evaluate （使用TensorFlow Keras进行训练和评估）\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-keras-training_and_evaluation.html](https://www.mashangxue123.com/tensorflow/tf2-guide-keras-training_and_evaluation.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation](https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/training_and_evaluation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/training_and_evaluation.md)\n\n#### Write layers and models from scratch\n#### Save and serialize models\n#### Write custom callbacks\n\n### Accelerators\n#### Distribution strategy\n#### Using GPU\n\n### Data Loading\n#### Performance\n\n### Serialization\n#### Checkpoints\n#### Saved models\n\n### Misc\n\n#### Version Compatibility\n\n\n"
  },
  {
    "path": "r2/guide/eager.md",
    "content": "---\ntitle: Eager Execution 概述\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1999\nabbrlink: tensorflow/tf2-guide-eager\n---\n\n# Eager Execution 概述\n\nTensorFlow 的 Eager Execution 是一种命令式编程环境，可立即评估操作，无需构建图：操作会返回具体的值，而不是构建以后再运行的计算图。这样能让您轻松地开始使用 TensorFlow 和调试模型，并且还减少了样板代码。要遵循本指南，请在交互式 python 解释器中运行下面的代码示例。\n\nEager Execution 是一个灵活的机器学习平台，用于研究和实验，可提供：\n\n* *直观的界面* - 自然地组织代码结构并使用 Python 数据结构。快速迭代小模型和小型数据集。\n\n* *更轻松的调试功能* - 直接调用操作以检查正在运行的模型并测试更改。使用标准 Python 调试工具进行即时错误报告。\n\n* *自然控制流程* - 使用 Python 控制流程而不是图控制流程，简化了动态模型的规范。\n\nEager Execution 支持大多数 TensorFlow 操作和 GPU 加速。\n\n注意：如果启用 Eager Execution，某些模型的开销可能会增加。我们正在改进性能；如果发现问题，请报告错误，并分享您的基准测试结果。\n\n\n## 设置和基本用法\n\n升级到最新版本的 TensorFlow：\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\n# pip install tensorflow==2.0.0-alpha0\nimport tensorflow as tf\n```\n\n在Tensorflow 2.0中，默认情况下启用了Eager Execution。\n\n```python\ntf.executing_eagerly()\n```\n\n```\n      True\n```\n\n现在您可以运行TensorFlow操作，结果将立即返回：\n\n```python\nx = [[2.]]\nm = tf.matmul(x, x)\nprint(\"hello, {}\".format(m))\n```\n\n```\n      hello, [[4.]]\n```\n\n启用 Eager Execution 会改变 TensorFlow 操作的行为方式(现在它们会立即评估并将值返回给 Python)。`tf.Tensor` 对象会引用具体值，而不是指向计算图中的节点的符号句柄。由于不需要构建稍后在会话中运行的计算图，因此使用 `print()` 或调试程序很容易检查结果。评估、输出和检查张量值不会中断计算梯度的流程。\n\nEager Execution 适合与 NumPy 一起使用。NumPy 操作接受`tf.Tensor` 参数。TensorFlow [数学运算](https://tensorflow.google.cn/api_guides/python/math_ops) 将 Python 对象和 NumPy 数组转换为 `tf.Tensor` 对象。`tf.Tensor.numpy` 方法返回对象的值作为 NumPy  `ndarray`。\n\n```python\na = tf.constant([[1, 2],\n                 [3, 4]])\nprint(a)\n```\n\n```\n      tf.Tensor(\n      [[1 2]\n       [3 4]], shape=(2, 2), dtype=int32)\n```\n\n\n```python\n# Broadcasting support\nb = tf.add(a, 1)\nprint(b)\n```\n\n```\n      tf.Tensor(\n      [[2 3]\n       [4 5]], shape=(2, 2), dtype=int32)\n```\n\n```python\n# Operator overloading is supported\nprint(a * b)\n```\n\n```\n      tf.Tensor(\n      [[ 2  6]\n       [12 20]], 
shape=(2, 2), dtype=int32)\n```\n\n\n```python\n# 使用NumPy值\nimport numpy as np\n\nc = np.multiply(a, b)\nprint(c)\n```\n\n```\n      [[ 2  6]\n       [12 20]]\n```\n\n\n```python\n# 从张量中获取numpy值：\nprint(a.numpy())\n# => [[1 2]\n#     [3 4]]\n```\n\n## 动态控制流\n\nEager Execution 的一个主要好处是，在执行模型时，主机语言的所有功能都可用。因此，编写 [fizzbuzz](https://baike.baidu.com/item/FizzBuzz%E9%97%AE%E9%A2%98/16083686?fr=aladdin)很容易（举例而言）：\n\n*FizzBuzz问题：举个例子，编写一个程序从1到100.当遇到数字为3的倍数的时候，点击“Fizz”替代数字，5的倍数用“Buzz”代替，既是3的倍数又是5的倍数点击“FizzBuzz”。* \n\n```python\ndef fizzbuzz(max_num):\n  counter = tf.constant(0)\n  max_num = tf.convert_to_tensor(max_num)\n  for num in range(1, max_num.numpy()+1):\n    num = tf.constant(num)\n    if int(num % 3) == 0 and int(num % 5) == 0:\n      print('FizzBuzz')\n    elif int(num % 3) == 0:\n      print('Fizz')\n    elif int(num % 5) == 0:\n      print('Buzz')\n    else:\n      print(num.numpy())\n    counter += 1\n```\n\n```python\nfizzbuzz(15)\n```\n\n```\n1 2 Fizz 4 Buzz Fizz 7 8 Fizz Buzz 11 Fizz 13 14 FizzBuzz\n```\n\n这段代码具有依赖于张量值的条件并在运行时输出这些值。\n\n\n## 构建模型\n\n许多机器学习模型通过组合层来表示。将 TensorFlow 与 Eager Execution 结合使用时，您可以编写自己的层或使用在 `tf.keras.layers` 程序包中提供的层。\n\n虽然您可以使用任何 Python 对象表示层，但 TensorFlow 提供了便利的基类 `tf.keras.layers.Layer`。您可以通过继承它实现自己的层，如果必须强制执行该层，在构造函数中设置 `self.dynamic=True`：\n\n```python\nclass MySimpleLayer(tf.keras.layers.Layer):\n  def __init__(self, output_units):\n    super(MySimpleLayer, self).__init__()\n    self.output_units = output_units\n    self.dynamic = True\n\n  def build(self, input_shape):\n    # The build method gets called the first time your layer is used.\n    # 构建方法在第一次使用图层时被调用。\n    # 在build()上创建变量允许您使其形状取决于输入形状，因此无需用户指定完整形状。 \n    # 如果您已经知道它们的完整形状，则可以在` __init__()`期间创建变量。\n    self.kernel = self.add_variable(\n      \"kernel\", [input_shape[-1], self.output_units])\n\n  def call(self, input):\n    # 覆盖 `call()` 而不是`__call__`，这样我们就可以执行一些记帐。\n    return tf.matmul(input, 
self.kernel)\n```\n\n请使用`tf.keras.layers.Dense`层（而不是上面的`MySimpleLayer`），因为它具有其功能的超集（它也可以添加偏差）。\n\n将层组合成模型时，可以使用 `tf.keras.Sequential` 表示由层线性堆叠的模型。它非常适合用于基本模型：\n\n```python\nmodel = tf.keras.Sequential([\n  tf.keras.layers.Dense(10, input_shape=(784,)),  # must declare input shape\n  tf.keras.layers.Dense(10)\n])\n```\n\n或者，通过继承 `tf.keras.Model` 将模型整理为类。这是一个本身也是层的层容器，允许 `tf.keras.Model`对象包含其他  `tf.keras.Model` 对象。\n\n```python\nclass MNISTModel(tf.keras.Model):\n  def __init__(self):\n    super(MNISTModel, self).__init__()\n    self.dense1 = tf.keras.layers.Dense(units=10)\n    self.dense2 = tf.keras.layers.Dense(units=10)\n\n  def call(self, input):\n    \"\"\"Run the model.\"\"\"\n    result = self.dense1(input)\n    result = self.dense2(result)\n    result = self.dense2(result)  # reuse variables from dense2 layer\n    return result\n\nmodel = MNISTModel()\n```\n\n因为第一次将输入传递给层时已经设置参数，所以不需要为`tf.keras.Model` 类设置输入形状。\n\n`tf.keras.layers` 类会创建并包含自己的模型变量，这些变量与其层对象的生命周期相关联。要共享层变量，请共享其对象。\n\n## Eager 训练\n\n### 计算梯度\n\n[自动微分](https://en.wikipedia.org/wiki/Automatic_differentiation)对于实现机器学习算法（例如用于训练神经网络的[反向传播](https://en.wikipedia.org/wiki/Backpropagation)）来说很有用。在 Eager Execution 期间，请使用 `tf.GradientTape` 跟踪操作以便稍后计算梯度。\n\n`tf.GradientTape`  是一种选择性功能，可在不跟踪时提供最佳性能。由于在每次调用期间都可能发生不同的操作，因此所有前向传播操作都会记录到“磁带”中。要计算梯度，请反向播放磁带，然后放弃。特定的 `tf.GradientTape`  只能计算一个梯度；随后的调用会抛出运行时错误。\n\n```python\nw = tf.Variable([[1.0]])\nwith tf.GradientTape() as tape:\n  loss = w * w\n\ngrad = tape.gradient(loss, w)\nprint(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)\n```\n\n\n### 训练模型\n\n以下示例将创建一个多层模型，该模型会对标准 MNIST 手写数字进行分类。它演示了在 Eager Execution 环境中构建可训练图的优化器和层 API。\n\n```python\n# 获取并格式化mnist数据\n(mnist_images, mnist_labels), _ = tf.keras.datasets.mnist.load_data()\n\ndataset = tf.data.Dataset.from_tensor_slices(\n  (tf.cast(mnist_images[...,tf.newaxis]/255, tf.float32),\n   tf.cast(mnist_labels,tf.int64)))\ndataset = dataset.shuffle(1000).batch(32)\n```\n\n\n```python\n# 
建立模型\nmnist_model = tf.keras.Sequential([\n  tf.keras.layers.Conv2D(16,[3,3], activation='relu',\n                         input_shape=(None, None, 1)),\n  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),\n  tf.keras.layers.GlobalAveragePooling2D(),\n  tf.keras.layers.Dense(10)\n])\n```\n\n即使没有训练，也可以在 Eager Execution 中调用模型并检查输出：\n\n```python\nfor images,labels in dataset.take(1):\n  print(\"Logits: \", mnist_model(images[0:1]).numpy())\n```\n\n```\n      Logits: [[-1.9521490e-02 2.2975644e-02 2.8935237e-02 2.0388789e-02 -1.8511273e-02 -6.4317137e-05 6.0662534e-03 -1.7174225e-02 5.4899108e-02 -2.8871424e-02]]\n```\n\n虽然 keras 模型具有内置训练循环（使用 `fit` 方法），但有时您需要更多自定义设置。下面是一个用 eager 实现的训练循环示例：\n\n```python\noptimizer = tf.keras.optimizers.Adam()\nloss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n\nloss_history = []\n```\n\n\n```python\nfor (batch, (images, labels)) in enumerate(dataset.take(400)):\n  if batch % 10 == 0:\n    print('.', end='')\n  with tf.GradientTape() as tape:\n    logits = mnist_model(images, training=True)\n    loss_value = loss_object(labels, logits)\n\n  loss_history.append(loss_value.numpy().mean())\n  grads = tape.gradient(loss_value, mnist_model.trainable_variables)\n  optimizer.apply_gradients(zip(grads, mnist_model.trainable_variables))\n```\n\n\n```python\nimport matplotlib.pyplot as plt\n\nplt.plot(loss_history)\nplt.xlabel('Batch #')\nplt.ylabel('Loss [entropy]')\n```\n\n```\n      Text(0, 0.5, 'Loss [entropy]')\n```\n\n### 变量和优化器\n\n`tf.Variable` 对象会存储在训练期间访问的可变 `tf.Tensor` 值，以更加轻松地实现自动微分。模型的参数可以作为变量封装在类中。\n\n通过将 `tf.Variable` 与 `tf.GradientTape` 结合使用可以更好地封装模型参数。例如，上面的自动微分示例可以重写为：\n\n```python\nclass Model(tf.keras.Model):\n  def __init__(self):\n    
super(Model, self).__init__()\n    self.W = tf.Variable(5., name='weight')\n    self.B = tf.Variable(10., name='bias')\n  def call(self, inputs):\n    return inputs * self.W + self.B\n\n# 点数约为3 * x + 2的玩具数据集\nNUM_EXAMPLES = 2000\ntraining_inputs = tf.random.normal([NUM_EXAMPLES])\nnoise = tf.random.normal([NUM_EXAMPLES])\ntraining_outputs = training_inputs * 3 + 2 + noise\n\n# 要优化的损失函数\ndef loss(model, inputs, targets):\n  error = model(inputs) - targets\n  return tf.reduce_mean(tf.square(error))\n\ndef grad(model, inputs, targets):\n  with tf.GradientTape() as tape:\n    loss_value = loss(model, inputs, targets)\n  return tape.gradient(loss_value, [model.W, model.B])\n\n# Define:\n# 1. A model.\n# 2. Derivatives of a loss function with respect to model parameters.\n# 3. A strategy for updating the variables based on the derivatives.\nmodel = Model()\noptimizer = tf.keras.optimizers.SGD(learning_rate=0.01)\n\nprint(\"Initial loss: {:.3f}\".format(loss(model, training_inputs, training_outputs)))\n\n# Training loop\nfor i in range(300):\n  grads = grad(model, training_inputs, training_outputs)\n  optimizer.apply_gradients(zip(grads, [model.W, model.B]))\n  if i % 20 == 0:\n    print(\"Loss at step {:03d}: {:.3f}\".format(i, loss(model, training_inputs, training_outputs)))\n\nprint(\"Final loss: {:.3f}\".format(loss(model, training_inputs, training_outputs)))\nprint(\"W = {}, B = {}\".format(model.W.numpy(), model.B.numpy()))\n```\n\n## 在Eager Execution期间将对象用于状态\n\n使用 TF 1.x的 Graph Execution 时，程序状态（如变量）存储在全局集合中，它们的生命周期由 `tf.Session` 对象管理。相反，在Eager Execution期间，状态对象的生命周期由其对应的 Python 对象的生命周期决定。\n\n### 变量是对象\n\n在 Eager Execution 期间，变量会一直存在，直到相应对象的最后一个引用被移除，然后变量被删除。\n\n```python\nif tf.test.is_gpu_available():\n  with tf.device(\"gpu:0\"):\n    v = tf.Variable(tf.random.normal([1000, 1000]))\n    v = None  # v no longer takes up GPU memory\n```\n\n### 基于对象的保存\n\n本节是[训练检查点指南](https://tensorflow.google.cn/beta/guide/checkpoints)的简短版本。\n\n`tf.train.Checkpoint` 可以将 
`tf.Variable` 保存到检查点并从中恢复：\n\n```python\nx = tf.Variable(10.)\ncheckpoint = tf.train.Checkpoint(x=x)\n```\n\n```python\nx.assign(2.)   # 为变量分配新值并保存。\ncheckpoint_path = './ckpt/'\ncheckpoint.save('./ckpt/')\n```\n\n```python\nx.assign(11.)  # 保存后更改变量。\n\n# 从检查点恢复值\ncheckpoint.restore(tf.train.latest_checkpoint(checkpoint_path))\n\nprint(x)  # => 2.0\n```\n\n要保存和加载模型，`tf.train.Checkpoint` 会存储对象的内部状态，而不需要隐藏变量。要记录 `model`、`optimizer` 和全局步的状态，请将它们传递到 `tf.train.Checkpoint`：\n\n```python\nimport os\n\nmodel = tf.keras.Sequential([\n  tf.keras.layers.Conv2D(16,[3,3], activation='relu'),\n  tf.keras.layers.GlobalAveragePooling2D(),\n  tf.keras.layers.Dense(10)\n])\noptimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\ncheckpoint_dir = 'path/to/model_dir'\nif not os.path.exists(checkpoint_dir):\n  os.makedirs(checkpoint_dir)\ncheckpoint_prefix = os.path.join(checkpoint_dir, \"ckpt\")\nroot = tf.train.Checkpoint(optimizer=optimizer,\n                           model=model)\n\nroot.save(checkpoint_prefix)\nroot.restore(tf.train.latest_checkpoint(checkpoint_dir))\n```\n\n注意：在许多训练循环中，在调用`tf.train.Checkpoint.restore`之后创建变量。这些变量将在创建后立即恢复，并且可以使用断言来确保检查点已完全加载。有关详细信息，请参阅[训练检查点指南](https://tensorflow.google.cn/beta/guide/checkpoints)。\n\n### 面向对象的指标\n\n`tf.keras.metrics` 存储为对象。通过将新数据传递给可调用对象来更新指标，并使用 `tf.keras.metrics.result` 方法检索结果，例如：\n\n```python\nm = tf.keras.metrics.Mean(\"loss\")\nm(0)\nm(5)\nm.result()  # => 2.5\nm([8, 9])\nm.result()  # => 5.5\n```\n\n## 自动微分高级内容\n\n### 动态模型\n\n`tf.GradientTape` 也可用于动态模型。这个回溯线搜索算法示例看起来像普通的 NumPy 代码，除了存在梯度并且可微分，尽管控制流比较复杂：\n\n```python\ndef line_search_step(fn, init_x, rate=1.0):\n  with tf.GradientTape() as tape:\n    # Variables are automatically recorded, but manually watch a tensor\n    tape.watch(init_x)\n    value = fn(init_x)\n  grad = tape.gradient(value, init_x)\n  grad_norm = tf.reduce_sum(grad * grad)\n  init_value = value\n  while value > init_value - rate * grad_norm:\n    x = init_x - rate * grad\n    value = fn(x)\n    rate /= 2.0\n  
return x, value\n```\n\n### 自定义梯度\n\n自定义梯度是一种覆盖梯度的简单方法。在正向函数中，定义相对于输入、输出或中间结果的梯度。例如，下面是在反向传播中截断梯度范数的一种简单方式：\n\n```python\n@tf.custom_gradient\ndef clip_gradient_by_norm(x, norm):\n  y = tf.identity(x)\n  def grad_fn(dresult):\n    return [tf.clip_by_norm(dresult, norm), None]\n  return y, grad_fn\n```\n\n自定义梯度通常用于为一系列操作提供数值稳定的梯度：\n\n```python\ndef log1pexp(x):\n  return tf.math.log(1 + tf.exp(x))\n\ndef grad_log1pexp(x):\n  with tf.GradientTape() as tape:\n    tape.watch(x)\n    value = log1pexp(x)\n  return tape.gradient(value, x)\n\n```\n\n\n```python\n# 梯度计算在x = 0时工作正常。\ngrad_log1pexp(tf.constant(0.)).numpy()   # => 0.5\n```\n\n`0.5`\n\n```python\n# 但是，由于数值不稳定，x = 100失败。\ngrad_log1pexp(tf.constant(100.)).numpy()  # => nan\n```\n\n`nan`\n\n在此处，`log1pexp` 函数可以通过自定义梯度进行分析简化。下面的实现重用了在前向传播期间计算的`tf.exp(x)`的值，通过消除冗余计算，变得更加高效：\n\n```python\n@tf.custom_gradient\ndef log1pexp(x):\n  e = tf.exp(x)\n  def grad(dy):\n    return dy * (1 - 1 / (1 + e))\n  return tf.math.log(1 + e), grad\n\ndef grad_log1pexp(x):\n  with tf.GradientTape() as tape:\n    tape.watch(x)\n    value = log1pexp(x)\n  return tape.gradient(value, x)\n\n```\n\n\n```python\n# 和以前一样，梯度计算在x = 0时工作正常。\ngrad_log1pexp(tf.constant(0.)).numpy()    # => 0.5\n```\n\n\n```python\n# 并且梯度计算也适用于x = 100。\ngrad_log1pexp(tf.constant(100.)).numpy()   # => 1.0\n```\n\n## 性能\n\n在Eager Execution期间，计算会自动分流到 GPU。如果要控制计算运行的位置，可以将其放在`tf.device('/gpu:0')`  块（或 CPU 等效块）中：\n\n```python\nimport time\n\ndef measure(x, steps):\n  # TensorFlow在第一次使用时初始化GPU，从计时中排除。\n  tf.matmul(x, x)\n  start = time.time()\n  for i in range(steps):\n    x = tf.matmul(x, x)\n  # tf.matmul can return before completing the matrix multiplication\n  # (e.g., can return after enqueing the operation on a CUDA stream).\n  # The x.numpy() call below will ensure that all enqueued operations\n  # have completed (and will also copy the result to host memory,\n  # so we're including a little more than just the matmul operation\n  # time).\n  _ = x.numpy()\n  end = 
time.time()\n  return end - start\n\nshape = (1000, 1000)\nsteps = 200\nprint(\"Time to multiply a {} matrix by itself {} times:\".format(shape, steps))\n\n# Run on CPU:\nwith tf.device(\"/cpu:0\"):\n  print(\"CPU: {} secs\".format(measure(tf.random.normal(shape), steps)))\n\n# Run on GPU, if available:\nif tf.test.is_gpu_available():\n  with tf.device(\"/gpu:0\"):\n    print(\"GPU: {} secs\".format(measure(tf.random.normal(shape), steps)))\nelse:\n  print(\"GPU: not found\")\n```\n\n```\n      Time to multiply a (1000, 1000) matrix by itself 200 times:\n      CPU: 0.7741374969482422 secs\n      GPU: not found\n```\n\n`tf.Tensor`对象可以复制到不同的设备来执行其操作：\n\n```python\nif tf.test.is_gpu_available():\n  x = tf.random.normal([10, 10])\n\n  x_gpu0 = x.gpu()\n  x_cpu = x.cpu()\n\n  _ = tf.matmul(x_cpu, x_cpu)    # Runs on CPU\n  _ = tf.matmul(x_gpu0, x_gpu0)  # Runs on GPU:0\n\n```\n\n### 基准\n\n对于计算量繁重的模型（如在 GPU 上训练的 [ResNet50](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/eager/python/examples/resnet50)），Eager Execution 性能与 `tf.function` Execution 相当。但是对于计算量较小的模型来说，这种性能差距会越来越大，并且有很多工作要做，以便为具有大量小操作的模型优化热代码路径。\n\n## 使用`tf.function`\n\n虽然Eager Execution使开发和调试更具交互性，但TensorFlow 1.x样式图执行在分布式训练，性能优化和生产部署方面具有优势。为了弥补这一差距，TensorFlow 2.0通过`tf.function` API引入此功能。有关更多信息，请参阅[Autograph指南](https://tensorflow.google.cn/beta/guide/autograph)。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-eager.html](https://www.mashangxue123.com/tensorflow/tf2-guide-eager.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/eager](https://tensorflow.google.cn/beta/guide/eager)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/eager.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/eager.md)"
  },
  {
    "path": "r2/guide/effective_tf2.md",
    "content": "---\ntitle: 高效的TensorFlow 2.0\ntags: \n    - tensorflow2.0\ncategories: \n    - tensorflow2官方教程\ntop: 1902\nabbrlink: tensorflow/tf2-guide-effective_tf2\n---\n\n# 高效的TensorFlow 2.0 (tensorflow2.0官方教程翻译)\n\nTensorFlow 2.0中有多处更改，以提高TensorFlow用户的开发效率。TensorFlow 2.0删除了[冗余 APIs](https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md)，使API更加一致（[统一 RNNs](https://github.com/tensorflow/community/blob/master/rfcs/20180920-unify-rnn-interface.md)、[统一优化器](https://github.com/tensorflow/community/blob/master/rfcs/20181016-optimizer-unification.md)），并通过[Eager execution](https://www.tensorflow.org/guide/eager)模式更好地与Python运行时集成。\n\n许多[RFCs](https://github.com/tensorflow/community/pulls?utf8=%E2%9C%93&q=is%3Apr)已经解释了TensorFlow 2.0所带来的变化。本指南展示了在TensorFlow 2.0中开发应该是什么样子，并假设您对TensorFlow 1.x有一定的了解。\n\n## 1. 主要变化的简要总结\n\n### 1.1. API清理\n\n许多API在tensorflow 2.0中[消失或移动](https://github.com/tensorflow/community/blob/master/rfcs/20180827-api-names.md)。一些主要的变化包括：删除`tf.app`、`tf.flags`和`tf.logging`，转而支持现已开源的[absl-py](https://github.com/abseil/abseil-py)；重新安置`tf.contrib`中的项目；清理主要的 `tf.*`命名空间，将不常用的函数移动到像 `tf.math`这样的子包中。一些API已被2.0版本的等效项替换，如`tf.summary`、`tf.keras.metrics`和`tf.keras.optimizers`。\n自动应用这些重命名的最简单方法是使用[v2升级脚本](https://tensorflow.google.cn/beta/guide/upgrade)。\n\n### 1.2. Eager execution\n\nTensorFlow 1.X要求用户通过进行`tf.*` API调用，手动将抽象语法树（图）拼接在一起，然后通过将一组输出张量和输入张量传递给`session.run()`来手动编译抽象语法树。\nTensorFlow 2.0 默认采用Eager execution模式，代码立即执行（就像Python通常那样）；在2.0中，图和会话应该被视为实现细节。\n\nEager execution的一个值得注意的地方是不再需要`tf.control_dependencies()`，因为所有代码都按顺序执行（在`tf.function`中，带有副作用的代码按编写的顺序执行）。\n\n### 1.3. 
没有更多的全局变量\n\nTensorFlow 1.X严重依赖于隐式全局命名空间。当你调用`tf.Variable()`时，它会被放入默认图中并保留在那里，即使你忘记了指向它的Python变量。\n然后，您可以恢复该`tf.Variable`，但前提是您知道它创建时的名称，如果您无法控制变量的创建，这很难做到。结果，各种机制激增，试图帮助用户再次找到他们的变量，也出现了一些框架来查找用户创建的变量：变量作用域、全局集合、辅助方法如`tf.get_global_step()`、`tf.global_variables_initializer()`、优化器隐式计算所有可训练变量的梯度等等。\n\nTensorFlow 2.0取消了所有这些机制([Variables 2.0 RFC](https://github.com/tensorflow/community/pull/11))，转而采用默认机制：跟踪你自己的变量！如果你失去了对某个`tf.Variable`的追踪，它就会被垃圾回收。\n\n跟踪变量的要求给用户增加了一些额外的工作，但借助Keras对象（见下文），这种负担被降到最低。\n\n### 1.4. Functions, not sessions\n\n`session.run()`调用几乎就像一个函数调用：指定输入和要调用的函数，然后返回一组输出。\n在TensorFlow 2.0中，您可以使用`tf.function()`来装饰Python函数以将其标记为JIT编译，以便TensorFlow将其作为单个图运行([Functions 2.0 RFC](https://github.com/tensorflow/community/pull/20))。这种机制允许TensorFlow 2.0获得图模式的所有好处：\n\n- 性能：函数可以被优化（节点剪枝、内核融合等）\n- 可移植性：函数可以导出/重新导入([SavedModel 2.0 RFC](https://github.com/tensorflow/community/pull/34))，允许用户重用和共享模块化的TensorFlow函数。\n\n```python\n# TensorFlow 1.X\noutputs = session.run(f(placeholder), feed_dict={placeholder: input})\n# TensorFlow 2.0\noutputs = f(input)\n```\n\n凭借自由穿插Python和TensorFlow代码的能力，我们希望用户能够充分利用Python的表现力。但是，可移植的TensorFlow需要在没有Python解释器的环境中执行，例如移动端、C++和JavaScript。为了帮助用户在添加 `@tf.function`时避免重写代码，[AutoGraph](https://tensorflow.google.cn/beta/guide/autograph)会将Python构造的一个子集转换成等效的TensorFlow操作：\n\n* `for`/`while` -> `tf.while_loop` (支持`break` 和 `continue`)\n* `if` -> `tf.cond`\n* `for _ in dataset` -> `dataset.reduce`\n\nAutoGraph支持控制流的任意嵌套，这使得高效、简洁地实现许多复杂的ML程序成为可能，比如序列模型、强化学习、自定义训练循环等等。\n\n## 2. 使用TensorFlow 2.0的建议\n\n### 2.1. 将代码重构为更小的函数\n\nTensorFlow 1.X中常见的使用模式是“kitchen sink”策略：预先安排好所有可能计算的并集，然后通过`session.run()`对所选的张量进行求值。\n\nTensorFlow 2.0中，用户应该根据需要将代码重构为更小的函数。一般来说，没有必要用`tf.function`来修饰每个这样的小函数，只用`tf.function`来修饰高层计算，例如一个训练步骤，或者模型的前向传递。\n\n### 2.2. 
使用Keras层和模型来管理变量\n\nKeras模型和层提供了方便的`variables`和`trainable_variables`属性，它们递归地收集所有的因变量。这使得本地管理变量到使用它们的地方变得非常容易。\n\n对比如下：\n\n```python\ndef dense(x, W, b):\n  return tf.nn.sigmoid(tf.matmul(x, W) + b)\n\n@tf.function\ndef multilayer_perceptron(x, w0, b0, w1, b1, w2, b2 ...):\n  x = dense(x, w0, b0)\n  x = dense(x, w1, b1)\n  x = dense(x, w2, b2)\n  ...\n  \n# 您仍然必须管理w_i和b_i，它们是在代码的其他地方定义的。\n```\n\nKeras版本如下：\n\n```python\n# 每个图层都可以调用，其签名等价于linear(x)\nlayers = [tf.keras.layers.Dense(hidden_size, activation=tf.nn.sigmoid) for _ in range(n)]\nperceptron = tf.keras.Sequential(layers)\n\n# layers[3].trainable_variables => returns [w3, b3]\n# perceptron.trainable_variables => returns [w0, b0, ...]\n```\n\nKeras 层/模型继承自 `tf.train.Checkpointable` 并与`@tf.function`集成，这使得从Keras对象导出保存模型成为可能。\n您不必使用Keras的`.fit()` API来利用这些集成。\n\n下面是一个转移学习示例，演示了Keras如何简化收集相关变量子集的工作。假设你正在训练一个拥有共享trunk的multi-headed模型：\n\n```python\ntrunk = tf.keras.Sequential([...])\nhead1 = tf.keras.Sequential([...])\nhead2 = tf.keras.Sequential([...])\n\npath1 = tf.keras.Sequential([trunk, head1])\npath2 = tf.keras.Sequential([trunk, head2])\n\n# 训练主要数据集\nfor x, y in main_dataset:\n  with tf.GradientTape() as tape:\n    prediction = path1(x)\n    loss = loss_fn_head1(prediction, y)\n  # 同时优化trunk和head1的权重\n  gradients = tape.gradient(loss, path1.trainable_variables)\n  optimizer.apply_gradients(zip(gradients, path1.trainable_variables))\n\n# 微调第二个头部，重用trunk\nfor x, y in small_dataset:\n  with tf.GradientTape() as tape:\n    prediction = path2(x)\n    loss = loss_fn_head2(prediction, y)\n  # 只优化head2的权重，不是trunk的权重\n  gradients = tape.gradient(loss, head2.trainable_variables)\n  optimizer.apply_gradients(zip(gradients, head2.trainable_variables))\n\n# 你可以发布trunk计算，以便他人重用。\ntf.saved_model.save(trunk, output_path)\n```\n\n### 2.3. 
结合tf.data.Datasets和@tf.function\n\n当迭代适合放入内存的训练数据时，可以随意使用常规的Python迭代。除此之外，`tf.data.Dataset`是从磁盘流式传输训练数据的最佳方式。\n数据集是[可迭代对象（但不是迭代器）](https://docs.python.org/3/glossary.html#term-iterable)，在Eager模式下就像其他Python可迭代对象一样工作。\n您可以通过将代码包装在`tf.function()`中来充分利用数据集的异步预取/流式功能，`tf.function()`会使用AutoGraph将Python迭代替换为等效的图操作。\n\n```python\n@tf.function\ndef train(model, dataset, optimizer):\n  for x, y in dataset:\n    with tf.GradientTape() as tape:\n      prediction = model(x)\n      loss = loss_fn(prediction, y)\n    gradients = tape.gradient(loss, model.trainable_variables)\n    optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n```\n\n如果使用Keras的`.fit()` API，就不必担心数据集迭代：\n\n```python\nmodel.compile(optimizer=optimizer, loss=loss_fn)\nmodel.fit(dataset)\n```\n\n### 2.4. 利用AutoGraph和Python控制流程\n\nAutoGraph提供了一种将依赖于数据的控制流转换为图模式等价物（如`tf.cond`和`tf.while_loop`）的方法。\n\n数据依赖的控制流经常出现在序列模型中。`tf.keras.layers.RNN`封装一个RNN单元，允许您静态或动态地展开递归。\n为了演示，您可以像下面这样重新实现动态展开：\n\n```python\nclass DynamicRNN(tf.keras.Model):\n\n  def __init__(self, rnn_cell):\n    super(DynamicRNN, self).__init__()\n    self.cell = rnn_cell\n\n  def call(self, input_data):\n    # [batch, time, features] -> [time, batch, features]\n    input_data = tf.transpose(input_data, [1, 0, 2])\n    outputs = tf.TensorArray(tf.float32, input_data.shape[0])\n    state = self.cell.zero_state(input_data.shape[1], dtype=tf.float32)\n    for i in tf.range(input_data.shape[0]):\n      output, state = self.cell(input_data[i], state)\n      outputs = outputs.write(i, output)\n    return tf.transpose(outputs.stack(), [1, 0, 2]), state\n```\n\n有关AutoGraph功能的更详细概述，请参阅[指南](https://tensorflow.google.cn/beta/guide/autograph)。\n\n### 2.5. 
使用tf.metrics聚合数据和tf.summary来记录它\n\n要记录摘要，请使用`tf.summary.(scalar|histogram|...)` 并使用上下文管理器将其重定向到writer。（如果省略上下文管理器，则不会发生任何事情。）与TF 1.x不同，摘要直接发送给writer；没有单独的`merger`操作，也没有单独的`add_summary()`调用，这意味着必须在调用点提供步骤值。\n\n```python\nsummary_writer = tf.summary.create_file_writer('/tmp/summaries')\nwith summary_writer.as_default():\n  tf.summary.scalar('loss', 0.1, step=42)\n```\n\n要在将数据记录为摘要之前聚合数据，请使用`tf.metrics`，Metrics是有状态的；\n当你调用`.result()`时，它们会累计值并返回累计结果。使用`.reset_states()`清除累计值。\n\n```python\ndef train(model, optimizer, dataset, log_freq=10):\n  avg_loss = tf.keras.metrics.Mean(name='loss', dtype=tf.float32)\n  for images, labels in dataset:\n    loss = train_step(model, optimizer, images, labels)\n    avg_loss.update_state(loss)\n    if tf.equal(optimizer.iterations % log_freq, 0):\n      tf.summary.scalar('loss', avg_loss.result(), step=optimizer.iterations)\n      avg_loss.reset_states()\n\ndef test(model, test_x, test_y, step_num):\n  loss = loss_fn(model(test_x), test_y)\n  tf.summary.scalar('loss', loss, step=step_num)\n\ntrain_summary_writer = tf.summary.create_file_writer('/tmp/summaries/train')\ntest_summary_writer = tf.summary.create_file_writer('/tmp/summaries/test')\n\nwith train_summary_writer.as_default():\n  train(model, optimizer, dataset)\n\nwith test_summary_writer.as_default():\n  test(model, test_x, test_y, optimizer.iterations)\n```\n\n通过将TensorBoard指向摘要日志目录来显示生成的摘要：\n\n```shell\ntensorboard --logdir /tmp/summaries\n```\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-effective_tf2.html](https://www.mashangxue123.com/tensorflow/tf2-guide-effective_tf2.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/effective_tf2](https://tensorflow.google.cn/beta/guide/effective_tf2)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/effective_tf2.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/effective_tf2.md)"
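作为上面 `tf.metrics` 小节的补充，下面是一个最小草图（假设 TF 2.x 环境），演示正文所说的“指标是有状态的”这一点：

```python
import tensorflow as tf

# Metrics 是有状态的：update_state() 累计值，result() 返回累计结果
avg_loss = tf.keras.metrics.Mean(name='loss', dtype=tf.float32)
avg_loss.update_state(2.0)
avg_loss.update_state(4.0)
print(float(avg_loss.result()))  # 3.0
```

如正文所述，在写入摘要后可调用 `.reset_states()` 清除累计值，再进入下一个记录周期。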
  },
  {
    "path": "r2/guide/keras/functional.md",
    "content": "---\ntitle: 不用Sequential模型,TensorFlow中的Keras函数式API\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1999\nabbrlink: tensorflow/tf2-guide-keras-functional\n---\n\n# 不用Sequential模型，TensorFlow中的Keras函数式API (tensorflow2.0官方教程翻译)\n\n## 1. 设置\n\n安装\n\n```\npip install pydot\napt-get install graphviz\n```\n\n导入库\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\ntf.keras.backend.clear_session()  # For easy reset of notebook state.\n```\n\n## 2. 介绍\n\n您已经熟悉使用 `keras.Sequential()` 来创建模型。函数式 API 是一种创建比 `Sequential` 更灵活的模型的方法：它可以处理具有非线性拓扑的模型、具有共享层的模型以及具有多个输入或输出的模型。\n\n它基于这样一种思想：深度学习模型通常是由层组成的有向无环图(DAG)。函数式 API 是一组用于构建层图的工具。\n\n考虑以下模型：\n\n```python\n(input: 784-dimensional vectors)\n       ↧\n[Dense (64 units, relu activation)]\n       ↧\n[Dense (64 units, relu activation)]\n       ↧\n[Dense (10 units, softmax activation)]\n       ↧\n(output: probability distribution over 10 classes)\n```\n\n这是一个简单的三层网络图。\n要使用函数式 API 构建这个模型，首先要创建一个输入节点：\n\n```python\nfrom tensorflow import keras\n\ninputs = keras.Input(shape=(784,))\n```\n\n这里我们只指定数据的形状：784维向量。批量大小总是省略，我们只指定每个样本的形状。对于形状为 `(32, 32, 3)` 的图像输入，我们将使用：\n\n```python\nimg_inputs = keras.Input(shape=(32, 32, 3))\n```\n\n返回的 `inputs` 包含有关您希望提供给模型的输入数据的形状和类型的信息：\n\n```python\ninputs.shape\n\ninputs.dtype\n```\n\n\n```\n    TensorShape([None, 784])\n    tf.float32\n```\n\n\n通过在这个输入对象上调用一个层，可以在层图中创建一个新节点：\n\n```python\nfrom tensorflow.keras import layers\n\ndense = layers.Dense(64, activation='relu')\nx = dense(inputs)\n```\n\n“层调用”操作就像从 `inputs` 向我们创建的这个层绘制一个箭头。我们把输入“传递”到 `dense` 层，得到 `x`。\n\n让我们在层图中再添加几层：\n\n```python\nx = layers.Dense(64, activation='relu')(x)\noutputs = layers.Dense(10, activation='softmax')(x)\n```\n\n此时，我们可以通过在层图中指定模型的输入和输出来创建模型：\n\n```python\nmodel = keras.Model(inputs=inputs, outputs=outputs)\n```\n\n回顾一下，这是我们的完整模型定义过程：\n\n```python\ninputs = keras.Input(shape=(784,), name='img')\nx = layers.Dense(64, 
activation='relu')(inputs)\nx = layers.Dense(64, activation='relu')(x)\noutputs = layers.Dense(10, activation='softmax')(x)\n\nmodel = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')\n```\n\n让我们看一下模型摘要的样子：\n\n```python\nmodel.summary()\n```\n\n```\n    Model: \"mnist_model\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #\n    =================================================================\n    img (InputLayer)             [(None, 784)]             0\n    _________________________________________________________________\n    dense_3 (Dense)              (None, 64)                50240\n    _________________________________________________________________\n    dense_4 (Dense)              (None, 64)                4160\n    _________________________________________________________________\n    dense_5 (Dense)              (None, 10)                650\n    =================================================================\n    Total params: 55,050\n    Trainable params: 55,050\n    Non-trainable params: 0\n    _________________________________________________________________\n```\n\n我们还可以将模型绘制为图形：\n\n```python\nkeras.utils.plot_model(model, 'my_first_model.png')\n```\n\n![png](functional_25_0.png)\n\n\n并可选择在绘制的图形中显示每个图层的输入和输出形状：\n\n```python\nkeras.utils.plot_model(model, 'my_first_model_with_shape_info.png', show_shapes=True)\n```\n\n![png](functional_27_0.png)\n\n这个图和我们编写的代码几乎完全相同。在代码版本中，连接箭头只是由调用操作替换。\n\n \"graph of layers\" 是深度学习模型的非常直观的心理图像，而函数API是一种创建模型的方法，可以很好地反映这种心理图像。\n\n\n## 3. 
训练、评估和推理\n\n对于使用函数API构建的模型和顺序模型，评估和推理的工作方式完全相同。\n\n这是一个快速演示。\n\n在这里，我们加载MNIST图像数据，将其重新整形为矢量，使模型适合数据（同时监控验证分割的性能），最后我们在测试数据上评估我们的模型：\n\n```python\n(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\nx_train = x_train.reshape(60000, 784).astype('float32') / 255\nx_test = x_test.reshape(10000, 784).astype('float32') / 255\n\nmodel.compile(loss='sparse_categorical_crossentropy',\n              optimizer=keras.optimizers.RMSprop(),\n              metrics=['accuracy'])\nhistory = model.fit(x_train, y_train,\n                    batch_size=64,\n                    epochs=5,\n                    validation_split=0.2)\ntest_scores = model.evaluate(x_test, y_test, verbose=0)\nprint('Test loss:', test_scores[0])\nprint('Test accuracy:', test_scores[1])\n```\n\n```\n    Train on 48000 samples, validate on 12000 samples\n    ......\n    Epoch 5/5\n    48000/48000 [==============================] - 3s 55us/sample - loss: 0.0759 - accuracy: 0.9770 - val_loss: 0.1139 - val_accuracy: 0.9670\n    Test loss: 0.100577776569454\n    Test accuracy: 0.9696\n```\n\n有关模型训练和评估的完整指南，请参阅[训练和评估指南](https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation)。\n\n## 4. 保存和序列化\n\n对于使用函数API构建的模型和顺序模型，保存和序列化的工作方式完全相同。\n\n保存函数模型的标准方法是调用model.save()将整个模型保存到一个文件中。稍后，您可以从该文件重新创建相同的模型，即使您不再能够访问创建模型的代码。\n\n这个文件包括:\n- 该模型的架构\n- 模型的权重值（在训练期间学到的）\n- 模型的训练配置（你传递给`compile`的东西），如果有的话\n- 优化器及其状态（如果有的话）（这使您可以从中断的地方重新启动训练）\n\n\n```python\nmodel.save('path_to_my_model.h5')\ndel model\n# Recreate the exact same model purely from the file:\nmodel = keras.models.load_model('path_to_my_model.h5')\n```\n\n有关模型保存的完整指南，请参阅[保存和序列化模型指南](https://tensorflow.google.cn/beta/guide/keras/saving_and_serializing)。\n\n## 5. 
使用相同的层图来定义多个模型\n\n在函数式 API 中，通过在层图中指定模型的输入和输出来创建模型。这意味着一个层图可以用来生成多个模型。\n\n在下面的示例中，我们使用相同的层堆栈来实例化两个模型：将图像输入转换为16维向量的编码器 `encoder` 模型，以及用于训练的端到端自动编码器 `autoencoder` 模型。\n\n```python\nencoder_input = keras.Input(shape=(28, 28, 1), name='img')\nx = layers.Conv2D(16, 3, activation='relu')(encoder_input)\nx = layers.Conv2D(32, 3, activation='relu')(x)\nx = layers.MaxPooling2D(3)(x)\nx = layers.Conv2D(32, 3, activation='relu')(x)\nx = layers.Conv2D(16, 3, activation='relu')(x)\nencoder_output = layers.GlobalMaxPooling2D()(x)\n\nencoder = keras.Model(encoder_input, encoder_output, name='encoder')\nencoder.summary()\n\nx = layers.Reshape((4, 4, 1))(encoder_output)\nx = layers.Conv2DTranspose(16, 3, activation='relu')(x)\nx = layers.Conv2DTranspose(32, 3, activation='relu')(x)\nx = layers.UpSampling2D(3)(x)\nx = layers.Conv2DTranspose(16, 3, activation='relu')(x)\ndecoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)\n\nautoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')\nautoencoder.summary()\n```\n\n请注意，我们使解码架构与编码架构严格对称，因此我们得到的输出形状与输入形状`（28,28,1）`相同。`Conv2D` 层的反面是 `Conv2DTranspose` 层，`MaxPooling2D` 层的反面是 `UpSampling2D` 层。\n\n## 6. 
所有模型都可以调用，就像层一样\n\n您可以将任何模型视为一个图层，方法是在输入或另一个图层的输出上调用它。请注意，通过调用模型，您不仅可以重用模型的体系结构，还可以重用其权重。\n\n让我们看看它是如何运作的。以下是对自动编码器示例的不同看法，该示例创建编码器模型，解码器模型，并在两次调用中链接它们以获取自动编码器模型：\n\n```python\nencoder_input = keras.Input(shape=(28, 28, 1), name='original_img')\nx = layers.Conv2D(16, 3, activation='relu')(encoder_input)\nx = layers.Conv2D(32, 3, activation='relu')(x)\nx = layers.MaxPooling2D(3)(x)\nx = layers.Conv2D(32, 3, activation='relu')(x)\nx = layers.Conv2D(16, 3, activation='relu')(x)\nencoder_output = layers.GlobalMaxPooling2D()(x)\n\nencoder = keras.Model(encoder_input, encoder_output, name='encoder')\nencoder.summary()\n\ndecoder_input = keras.Input(shape=(16,), name='encoded_img')\nx = layers.Reshape((4, 4, 1))(decoder_input)\nx = layers.Conv2DTranspose(16, 3, activation='relu')(x)\nx = layers.Conv2DTranspose(32, 3, activation='relu')(x)\nx = layers.UpSampling2D(3)(x)\nx = layers.Conv2DTranspose(16, 3, activation='relu')(x)\ndecoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)\n\ndecoder = keras.Model(decoder_input, decoder_output, name='decoder')\ndecoder.summary()\n\nautoencoder_input = keras.Input(shape=(28, 28, 1), name='img')\nencoded_img = encoder(autoencoder_input)\ndecoded_img = decoder(encoded_img)\nautoencoder = keras.Model(autoencoder_input, decoded_img, name='autoencoder')\nautoencoder.summary()\n```\n\n如您所见，模型可以嵌套：模型可以包含子模型（因为模型就像一个层）。\n\n模型嵌套的常见用例是集成。作为一个例子，这里是如何将一组模型集成到一个平均其预测的模型中：\n\n```python\ndef get_model():\n  inputs = keras.Input(shape=(128,))\n  outputs = layers.Dense(1, activation='sigmoid')(inputs)\n  return keras.Model(inputs, outputs)\n\nmodel1 = get_model()\nmodel2 = get_model()\nmodel3 = get_model()\n\ninputs = keras.Input(shape=(128,))\ny1 = model1(inputs)\ny2 = model2(inputs)\ny3 = model3(inputs)\noutputs = layers.average([y1, y2, y3])\nensemble_model = keras.Model(inputs=inputs, outputs=outputs)\n```\n\n## 7. 操纵复杂的图形拓扑\n\n\n### 7.1. 
具有多个输入和输出的模型\n\n\nfunctional API使操作多个输入和输出变得容易。使用Sequential API无法处理此问题。\n\n这是一个简单的例子。\n\n假设您正在构建一个系统，按照优先级对定制的发行票据进行排序，并将它们路由到正确的部门。\n\n你的模型将有3个输入：\n- 票证标题（文字输入）\n- 票证的文本正文（文本输入）\n- 用户添加的任何标签（分类输入）\n\n它将有两个输出：\n\n- 优先级在0到1之间（标量sigmoid输出）\n- 应该处理票据的部门(各部门之间的softmax输出)\n\n让我们用Functional API在几行中构建这个模型。\n\n```python\nnum_tags = 12  # Number of unique issue tags\nnum_words = 10000  # Size of vocabulary obtained when preprocessing text data\nnum_departments = 4  # Number of departments for predictions\n\ntitle_input = keras.Input(shape=(None,), name='title')  # Variable-length sequence of ints\nbody_input = keras.Input(shape=(None,), name='body')  # Variable-length sequence of ints\ntags_input = keras.Input(shape=(num_tags,), name='tags')  # Binary vectors of size `num_tags`\n\n# Embed each word in the title into a 64-dimensional vector\ntitle_features = layers.Embedding(num_words, 64)(title_input)\n# Embed each word in the text into a 64-dimensional vector\nbody_features = layers.Embedding(num_words, 64)(body_input)\n\n# Reduce sequence of embedded words in the title into a single 128-dimensional vector\ntitle_features = layers.LSTM(128)(title_features)\n# Reduce sequence of embedded words in the body into a single 32-dimensional vector\nbody_features = layers.LSTM(32)(body_features)\n\n# Merge all available features into a single large vector via concatenation\nx = layers.concatenate([title_features, body_features, tags_input])\n\n# Stick a logistic regression for priority prediction on top of the features\npriority_pred = layers.Dense(1, activation='sigmoid', name='priority')(x)\n# Stick a department classifier on top of the features\ndepartment_pred = layers.Dense(num_departments, activation='softmax', name='department')(x)\n\n# Instantiate an end-to-end model predicting both priority and department\nmodel = keras.Model(inputs=[title_input, body_input, tags_input],\n                    outputs=[priority_pred, 
department_pred])\n```\n\n让我们绘制模型：\n\n```python\nkeras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)\n```\n\n![png](functional_45_0.png)\n\n\n编译此模型时，我们可以为每个输出分配不同的损耗。您甚至可以为每个损失分配不同的权重，以调整它们对总训练损失的贡献。\n\n```python\nmodel.compile(optimizer=keras.optimizers.RMSprop(1e-3),\n              loss=['binary_crossentropy', 'categorical_crossentropy'],\n              loss_weights=[1., 0.2])\n```\n\n由于我们为输出图层指定了名称，因此我们也可以像这样指定损失：\n\n```python\nmodel.compile(optimizer=keras.optimizers.RMSprop(1e-3),\n              loss={'priority': 'binary_crossentropy',\n                    'department': 'categorical_crossentropy'},\n              loss_weights=[1., 0.2])\n```\n\n我们可以通过传递Numpy输入和目标数组列表来训练模型：\n\n```python\nimport numpy as np\n\n# Dummy input data\ntitle_data = np.random.randint(num_words, size=(1280, 10))\nbody_data = np.random.randint(num_words, size=(1280, 100))\ntags_data = np.random.randint(2, size=(1280, num_tags)).astype('float32')\n# Dummy target data\npriority_targets = np.random.random(size=(1280, 1))\ndept_targets = np.random.randint(2, size=(1280, num_departments))\n\nmodel.fit({'title': title_data, 'body': body_data, 'tags': tags_data},\n          {'priority': priority_targets, 'department': dept_targets},\n          epochs=2,\n          batch_size=32)\n```\n\n```\n    ....\n    Epoch 2/2\n    1280/1280 [==============================] - 11s 9ms/sample - loss: 1.2137 - priority_loss: 0.6489 - department_loss: 2.8242\n```\n\n当使用`Dataset`对象调用fit时，它应该产生一个列表元组，如 `([title_data, body_data, tags_data], [priority_targets, dept_targets])` 或者一个字典的元组 `({'title': title_data, 'body': body_data, 'tags': tags_data}, {'priority': priority_targets, 'department': dept_targets})`。\n\n有关更详细的说明，请参阅完整的[训练和评估指南](https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation)。\n\n\n### 7.2. 
一个玩具resnet模型\n\n除了具有多个输入和输出的模型之外，Functional API还可以轻松地操作非线性连接拓扑，也就是说，层不按顺序连接的模型。这也无法使用Sequential API处理（如名称所示）。\n\n一个常见的用例是残差连接。\n\n让我们为CIFAR10构建一个玩具ResNet模型来演示这个\n\n```python\ninputs = keras.Input(shape=(32, 32, 3), name='img')\nx = layers.Conv2D(32, 3, activation='relu')(inputs)\nx = layers.Conv2D(64, 3, activation='relu')(x)\nblock_1_output = layers.MaxPooling2D(3)(x)\n\nx = layers.Conv2D(64, 3, activation='relu', padding='same')(block_1_output)\nx = layers.Conv2D(64, 3, activation='relu', padding='same')(x)\nblock_2_output = layers.add([x, block_1_output])\n\nx = layers.Conv2D(64, 3, activation='relu', padding='same')(block_2_output)\nx = layers.Conv2D(64, 3, activation='relu', padding='same')(x)\nblock_3_output = layers.add([x, block_2_output])\n\nx = layers.Conv2D(64, 3, activation='relu')(block_3_output)\nx = layers.GlobalAveragePooling2D()(x)\nx = layers.Dense(256, activation='relu')(x)\nx = layers.Dropout(0.5)(x)\noutputs = layers.Dense(10, activation='softmax')(x)\n\nmodel = keras.Model(inputs, outputs, name='toy_resnet')\nmodel.summary()\n```\n\n\n让我们绘制模型：\n\n```python\nkeras.utils.plot_model(model, 'mini_resnet.png', show_shapes=True)\n```\n\n![png](functional_56_0.png)\n\n\n我们来训练吧：\n\n```python\n(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()\nx_train = x_train.astype('float32') / 255.\nx_test = x_test.astype('float32') / 255.\ny_train = keras.utils.to_categorical(y_train, 10)\ny_test = keras.utils.to_categorical(y_test, 10)\n\nmodel.compile(optimizer=keras.optimizers.RMSprop(1e-3),\n              loss='categorical_crossentropy',\n              metrics=['acc'])\nmodel.fit(x_train, y_train,\n          batch_size=64,\n          epochs=1,\n          validation_split=0.2)\n```\n\n```\n    Train on 40000 samples, validate on 10000 samples\n    40000/40000 [==============================] - 318s 8ms/sample - loss: 1.9034 - acc: 0.2767 - val_loss: 1.6173 - val_acc: 0.3870\n```\n\n## 8. 
共享图层\n\n函数式API的另一个好用途是使用共享层的模型。共享层是在同一模型中多次重复使用的层实例：它们学习与层图中的多条路径对应的特征。\n\n共享层通常用于编码来自相似空间的输入（例如，两段词汇表相似的不同文本），因为它们可以在这些不同的输入之间共享信息，并且可以用更少的数据训练这样的模型。如果在其中一个输入中看到给定的单词，那将有利于处理通过共享层的所有输入。\n\n要在Functional API中共享层，只需多次调用同一个层实例即可。例如，这是一个在两个不同文本输入之间共享的嵌入 `Embedding` 层：\n\n```python\n# Embedding for 1000 unique words mapped to 128-dimensional vectors\nshared_embedding = layers.Embedding(1000, 128)\n\n# Variable-length sequence of integers\ntext_input_a = keras.Input(shape=(None,), dtype='int32')\n\n# Variable-length sequence of integers\ntext_input_b = keras.Input(shape=(None,), dtype='int32')\n\n# We reuse the same layer to encode both inputs\nencoded_input_a = shared_embedding(text_input_a)\nencoded_input_b = shared_embedding(text_input_b)\n```\n\n## 9. 提取和重用图层中的节点\n\n因为您在Functional API中操作的层图是静态数据结构，所以可以访问和检查它。这就是我们能够将函数式模型绘制为图像的原因。\n\n这也意味着我们可以访问中间层的激活（图中的“节点”）并在其他地方重用它们。例如，这对于特征提取非常有用！\n\n让我们看一个例子。这是一个在ImageNet上预先训练权重的VGG19模型:\n\n```python\nfrom tensorflow.keras.applications import VGG19\n\nvgg19 = VGG19()\n```\n\n这些是模型的中间激活，通过查询图数据结构获得：\n\n```python\nfeatures_list = [layer.output for layer in vgg19.layers]\n```\n\n我们可以使用这些输出来创建一个新的特征提取模型，它返回中间层激活的值（我们可以在3行代码中完成所有这些操作）：\n\n```python\nimport numpy as np\n\nfeat_extraction_model = keras.Model(inputs=vgg19.input, outputs=features_list)\n\nimg = np.random.random((1, 224, 224, 3)).astype('float32')\nextracted_features = feat_extraction_model(img)\n```\n\n除了其他用途，这在[实现神经风格迁移](https://medium.com/tensorflow/neural-style-transfer-creating-art-with-deep-learning-using-tf-keras-and-eager-execution-7d541ac31398)时也能派上用场。\n\n## 10. 
通过编写自定义图层来扩展API\n\ntf.keras拥有广泛的内置层。这里有一些例子：\n\n- 卷积层Convolutional layers: `Conv1D`, `Conv2D`, `Conv3D`, `Conv2DTranspose`, etc.\n- 池化层Pooling layers: `MaxPooling1D`, `MaxPooling2D`, `MaxPooling3D`, `AveragePooling1D`, etc.\n- RNN 层: `GRU`, `LSTM`, `ConvLSTM2D`, etc.\n- `BatchNormalization`, `Dropout`, `Embedding`, etc.\n\n如果找不到所需的内容，可以通过创建自己的图层来扩展API。\n\n所有图层都是`Layer`类的子类，并实现：\n- 一个 `call` 方法，指定由层完成的计算。\n- 一个`build`方法，它创建了图层的权重（请注意，这只是一种样式约定;您也可以在 `__init__` 中创建权重）。\n\n要了解有关从头开始创建图层的更多信息，请查看该指南 [Guide to writing layers and models from scratch](https://tensorflow.google.cn/beta/guide/keras/custom_layers_and_models).\n\n这是一个`Dense`层的简单实现：\n\n```python\nclass CustomDense(layers.Layer):\n\n  def __init__(self, units=32):\n    super(CustomDense, self).__init__()\n    self.units = units\n\n  def build(self, input_shape):\n    self.w = self.add_weight(shape=(input_shape[-1], self.units),\n                             initializer='random_normal',\n                             trainable=True)\n    self.b = self.add_weight(shape=(self.units,),\n                             initializer='random_normal',\n                             trainable=True)\n\n  def call(self, inputs):\n    return tf.matmul(inputs, self.w) + self.b\n\ninputs = keras.Input((4,))\noutputs = CustomDense(10)(inputs)\n\nmodel = keras.Model(inputs, outputs)\n```\n\n如果希望自定义层支持序列化，还应定义 `get_config` 方法，该方法返回层实例的构造函数参数：\n\n```python\nclass CustomDense(layers.Layer):\n\n  def __init__(self, units=32):\n    super(CustomDense, self).__init__()\n    self.units = units\n\n  def build(self, input_shape):\n    self.w = self.add_weight(shape=(input_shape[-1], self.units),\n                             initializer='random_normal',\n                             trainable=True)\n    self.b = self.add_weight(shape=(self.units,),\n                             initializer='random_normal',\n                             trainable=True)\n\n  def call(self, inputs):\n    return tf.matmul(inputs, self.w) + self.b\n\n  def 
get_config(self):\n    return {'units': self.units}\n\n\ninputs = keras.Input((4,))\noutputs = CustomDense(10)(inputs)\n\nmodel = keras.Model(inputs, outputs)\nconfig = model.get_config()\n\nnew_model = keras.Model.from_config(\n    config, custom_objects={'CustomDense': CustomDense})\n```\n\n或者，你也可以实现类方法 `from_config(cls, config)`，它负责在给定配置字典的情况下重新创建一个层实例。`from_config` 的默认实现是：\n\n```python\ndef from_config(cls, config):\n  return cls(**config)\n```\n\n## 11. 何时使用函数式API\n\n如何决定是使用函数式API创建新模型，还是直接子类化 `Model` 类？\n\n通常，函数式API更高级、更易用也更安全，并且具有许多子类化模型所不支持的功能。\n\n但是，在创建不易表达为层的有向无环图的模型时，模型子类化为您提供了更大的灵活性（例如，您无法使用函数式API实现Tree-RNN，您必须直接子类化 `Model`）。\n\n### 11.1. 以下是Functional API的优势：\n\n下面列出的属性对于Sequential模型也是如此（它们也是数据结构），但对于子类化模型（Python字节码，而不是数据结构）则不然。\n\n#### 11.1.1. 它不那么冗长。\n\n不需要 `super(MyClass, self).__init__(...)`，也不需要 `def call(self, ...):` 等。\n\n比较:\n\n```python\ninputs = keras.Input(shape=(32,))\nx = layers.Dense(64, activation='relu')(inputs)\noutputs = layers.Dense(10)(x)\nmlp = keras.Model(inputs, outputs)\n```\n\n使用子类化:\n\n```python\nclass MLP(keras.Model):\n\n  def __init__(self, **kwargs):\n    super(MLP, self).__init__(**kwargs)\n    self.dense_1 = layers.Dense(64, activation='relu')\n    self.dense_2 = layers.Dense(10)\n\n  def call(self, inputs):\n    x = self.dense_1(inputs)\n    return self.dense_2(x)\n\n# Instantiate the model.\nmlp = MLP()\n# Necessary to create the model's state.\n# The model doesn't have a state until it's called at least once.\n_ = mlp(tf.zeros((1, 32)))\n```\n\n\n#### 11.1.2. 它在您定义模型时验证您的模型。\n\n在函数式API中，您的输入规范（shape和dtype）是事先通过 `Input` 创建的，每次调用一个层时，该层都会检查传递给它的规范是否符合其假设，如果不符合，则会引发有用的错误消息。\n\n这可以保证任何用函数式API构建成功的模型都能运行。所有调试（与收敛相关的调试除外）将在模型构建期间静态发生，而不是在执行时发生。这类似于编译器中的类型检查。\n\n#### 11.1.3. 
您的 Functional 模型是可绘图和可检查的。\n\n您可以将模型绘制为图形，并且可以轻松访问此图中的中间节点 - 例如，提取和重用中间层的激活，如前面的示例中所示：\n\n```python\nfeatures_list = [layer.output for layer in vgg19.layers]\nfeat_extraction_model = keras.Model(inputs=vgg19.input, outputs=features_list)\n```\n\n\n#### 11.1.4. 您的 Functional 模型可以序列化或克隆。\n\n因为 Functional 模型是一种数据结构而不是一段代码，所以它可以安全地序列化，并且可以保存为单个文件，允许您重新创建完全相同的模型，而无需访问任何原始代码。有关详细信息，请参阅我们的保存和序列化指南。\n\n### 11.2. 以下是Functional API的弱点:\n\n\n#### 11.2.1. 它不支持动态架构。\n\nFunctional API将模型视为图层的DAG。对于大多数深度学习体系结构都是如此，但并非全部：例如，递归网络或树RNN不遵循此假设，并且无法在Functional API中实现。\n\n#### 11.2.2. 有时，您只需要从头开始编写所有内容。\n\n在编写高级架构时，您可能希望执行“定义层的DAG”范围之外的事情：例如，您可能希望在模型实例上公开多个自定义训练和推理方法。这需要子类化。\n\n---\n\n为了更深入地了解Functional API和Model子类之间的差异，您可以阅读 [What are Symbolic and Imperative APIs in TensorFlow 2.0?](https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021).\n\n## 12. 混合和匹配不同的API样式\n\n重要的是，在Functional API或Model子类化之间进行选择并不是一个二元决策，它将您限制为一类模型。tf.keras API中的所有模型都可以与每个模型进行交互，无论它们是顺序模型，功能模型还是从头开始编写的子类模型/层。\n\n您始终可以使用Functional模型或Sequential模型作为子类 Model/Layer 的一部分：\n\n```python\nunits = 32\ntimesteps = 10\ninput_dim = 5\n\n# Define a Functional model\ninputs = keras.Input((None, units))\nx = layers.GlobalAveragePooling1D()(inputs)\noutputs = layers.Dense(1, activation='sigmoid')(x)\nmodel = keras.Model(inputs, outputs)\n\n\nclass CustomRNN(layers.Layer):\n\n  def __init__(self):\n    super(CustomRNN, self).__init__()\n    self.units = units\n    self.projection_1 = layers.Dense(units=units, activation='tanh')\n    self.projection_2 = layers.Dense(units=units, activation='tanh')\n    # Our previously-defined Functional model\n    self.classifier = model\n\n  def call(self, inputs):\n    outputs = []\n    state = tf.zeros(shape=(inputs.shape[0], self.units))\n    for t in range(inputs.shape[1]):\n      x = inputs[:, t, :]\n      h = self.projection_1(x)\n      y = h + self.projection_2(state)\n      state = y\n      outputs.append(y)\n    features = tf.stack(outputs, 
axis=1)\n    print(features.shape)\n    return self.classifier(features)\n\nrnn_model = CustomRNN()\n_ = rnn_model(tf.zeros((1, timesteps, input_dim)))\n```\n\n    (1, 10, 32)\n\n\n相反，只要它实现了一个遵循以下模式之一的`call`方法，你就可以在Functional API中使用任何子类Layer或Model：\n\n- `call(self, inputs, **kwargs)`  其中 `inputs` 是张量或张量的张量结构（例如张量列表），其中 `**kwargs` 是非张量参数（非输入）。\n- `call(self, inputs, training=None, **kwargs)` 其中 `training` 是一个布尔值，表示该层是否应该在训练模式和推理模式下运行。\n- `call(self, inputs, mask=None, **kwargs)` 其中 `mask` 是布尔掩码张量（例如对RNN有用）。\n- `call(self, inputs, training=None, mask=None, **kwargs)` -- 当然，您可以同时具有屏蔽和特定于训练的行为。\n\n此外，如果在自定义图层或模型上实现 `get_config` 方法，则使用它创建的功能模型仍将是可序列化和可克隆的。\n\n这是一个快速示例，我们在Functional模型中使用从头开始编写的自定义RNN：\n\n```python\nunits = 32\ntimesteps = 10\ninput_dim = 5\nbatch_size = 16\n\n\nclass CustomRNN(layers.Layer):\n\n  def __init__(self):\n    super(CustomRNN, self).__init__()\n    self.units = units\n    self.projection_1 = layers.Dense(units=units, activation='tanh')\n    self.projection_2 = layers.Dense(units=units, activation='tanh')\n    self.classifier = layers.Dense(1, activation='sigmoid')\n\n  def call(self, inputs):\n    outputs = []\n    state = tf.zeros(shape=(inputs.shape[0], self.units))\n    for t in range(inputs.shape[1]):\n      x = inputs[:, t, :]\n      h = self.projection_1(x)\n      y = h + self.projection_2(state)\n      state = y\n      outputs.append(y)\n    features = tf.stack(outputs, axis=1)\n    return self.classifier(features)\n\n# Note that we specify a static batch size for the inputs with the `batch_shape`\n# arg, because the inner computation of `CustomRNN` requires a static batch size\n# (when we create the `state` zeros tensor).\ninputs = keras.Input(batch_shape=(batch_size, timesteps, input_dim))\nx = layers.Conv1D(32, 3)(inputs)\noutputs = CustomRNN()(x)\n\nmodel = keras.Model(inputs, outputs)\n\nrnn_model = CustomRNN()\n_ = rnn_model(tf.zeros((1, 10, 5)))\n```\n\n这就是我们关于函数API的指南的全部内容!\n\n现在，您已经拥有了一套用于构建深度学习模型的强大工具。\n\n> 
最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-keras-functional.html](https://www.mashangxue123.com/tensorflow/tf2-guide-keras-functional.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/keras/functional](https://tensorflow.google.cn/beta/guide/keras/functional)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/functional.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/functional.md)\n"
  },
  {
    "path": "r2/guide/keras/overview.md",
"content": "---\ntitle: Keras概述：构建模型，输入数据，训练，评估，回调\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1999\nabbrlink: tensorflow/tf2-guide-keras-overview\n---\n\n# Keras概述：构建模型，输入数据，训练，评估，回调，保存，分布(tensorflow2.0官方教程翻译)\n\nKeras 是一个用于构建和训练深度学习模型的高阶API。它可用于快速设计原型、高级研究和生产，具有以下三个主要优势：\n\n* 方便用户使用\n\nKeras 具有针对常见用例做出优化的简单而一致的接口。它可针对用户错误提供切实可行的清晰反馈。\n\n* 模块化和可组合\n\n将可配置的构造块连接在一起就可以构建 Keras 模型，并且几乎不受限制。\n\n* 易于扩展\n\n可以编写自定义构造块以表达新的研究创意，并且可以创建新层、损失函数并开发先进的模型。\n\n## 1. 导入 tf.keras\n\n`tf.keras` 是 TensorFlow 对 [Keras API 规范](https://keras.io)的实现。这是一个用于构建和训练模型的高阶 API，包含对 TensorFlow 特定功能（例如 [eager execution](https://tensorflow.google.cn/guide/keras#eager_execution)、[`tf.data` 管道](https://tensorflow.google.cn/api_docs/python/tf/data)和 [Estimators](https://tensorflow.google.cn/guide/estimators)）的一流支持。`tf.keras` 使 TensorFlow 更易于使用，并且不会牺牲灵活性和性能。\n\n首先，导入 `tf.keras` 以设置 TensorFlow 程序：\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\nfrom tensorflow import keras\n```\n\n`tf.keras` 可以运行任何与 Keras 兼容的代码，但请注意：\n\n* 最新版 TensorFlow 中的 `tf.keras` 版本可能与 PyPI 中的最新 keras 版本不同。请查看 `tf.keras.__version__`。\n\n* [保存模型的权重](#weights_only)时，`tf.keras` 默认采用检查点格式。请传递 `save_format='h5'` 以使用 HDF5。\n\n## 2. 构建简单的模型\n\n### 2.1. 
序列模型\n\n在 Keras 中，您可以通过组合层来构建模型。模型（通常）是由层构成的图。最常见的模型类型是层的堆叠：`tf.keras.Sequential` 模型。\n\n要构建一个简单的全连接网络（即多层感知器），请运行以下代码：\n\n```python\nfrom tensorflow.keras import layers\n\nmodel = tf.keras.Sequential()\n# 向模型添加一个64单元的密集连接层：\nmodel.add(layers.Dense(64, activation='relu'))\n# 加上另一个：\nmodel.add(layers.Dense(64, activation='relu'))\n# 添加一个包含10个输出单位的softmax层：\nmodel.add(layers.Dense(10, activation='softmax'))\n```\n\n您可以找到有关如何使用Sequential模型的完整简短示例 [here](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/beginner.ipynb).\n\n要了解如何构建比Sequential模型更高级的模型，请参阅:\n- [Guide to the Keras Functional](https://tensorflow.google.cn/beta/guide/keras/functional)\n- [Guide to writing layers and models from scratch with subclassing](https://tensorflow.google.cn/beta/guide/keras/custom_layers_and_models)\n\n### 2.2. 配置层\n\n我们可以使用很多 `tf.keras.layers`，它们具有一些相同的构造函数参数：\n\n* `activation`：设置层的激活函数。此参数由内置函数的名称指定，或指定为可调用对象。默认情况下，系统不会应用任何激活函数。\n\n* `kernel_initializer` 和 `bias_initializer`：创建层权重（核和偏差）的初始化方案。此参数是一个名称或可调用对象，默认为 \"Glorot uniform\" 初始化器。\n\n* `kernel_regularizer` 和 `bias_regularizer`：应用层权重（核和偏差）的正则化方案，例如 L1 或 L2 正则化。默认情况下，系统不会应用正则化函数。\n\n以下代码使用构造函数参数实例化 `tf.keras.layers. Dense` 层：\n\n```python\n# 创建一个sigmoid层:\nlayers.Dense(64, activation='sigmoid')\n# 或者使用下面的代码创建:\nlayers.Dense(64, activation=tf.keras.activations.sigmoid)\n\n# 将具有因子0.01的L1正则化的线性层应用于核矩阵:\nlayers.Dense(64, kernel_regularizer=tf.keras.regularizers.l1(0.01))\n\n# 将L2正则化系数为0.01的线性层应用于偏置向量：\nlayers.Dense(64, bias_regularizer=tf.keras.regularizers.l2(0.01))\n\n# 一个内核初始化为随机正交矩阵的线性层：\nlayers.Dense(64, kernel_initializer='orthogonal')\n\n# 偏置矢量初始化为2.0s的线性层：\nlayers.Dense(64, bias_initializer=tf.keras.initializers.Constant(2.0))\n```\n\n## 3. 训练和评估\n\n### 3.1. 
设置训练流程\n\n构建好模型后，通过调用 `compile` 方法配置该模型的学习流程：\n\n```python\nmodel = tf.keras.Sequential([\n# 向模型添加一个64单元的密集连接层：\nlayers.Dense(64, activation='relu', input_shape=(32,)),\n# 加上另一个:\nlayers.Dense(64, activation='relu'),\n# 添加具有10个输出单元的softmax层:\nlayers.Dense(10, activation='softmax')])\n\nmodel.compile(optimizer=tf.keras.optimizers.Adam(0.001),\n              loss='categorical_crossentropy',\n              metrics=['accuracy'])\n```\n\n`tf.keras.Model.compile` 采用三个重要参数：\n\n* `optimizer`：此对象会指定训练过程。从 `tf.keras.optimizers` 模块向其传递优化器实例，例如 `tf.keras.optimizers.Adam`、`tf.keras.optimizers.SGD`。如果您只想使用默认参数，还可以通过字符串指定优化器，例如 `'adam'` 或 `'sgd'`。\n\n* `loss`：要在优化期间最小化的函数。常见选择包括均方误差 (`mse`)、`categorical_crossentropy` 和 `binary_crossentropy`。损失函数由名称或通过从 `tf.keras.losses` 模块传递可调用对象来指定。\n\n* `metrics`：用于监控训练。它们是 `tf.keras.metrics` 模块中的字符串名称或可调用对象。\n\n* 此外，为了确保模型以 eager 方式进行训练和评估，您可以在编译时传入 `run_eagerly=True` 参数。\n\n以下代码展示了配置模型以进行训练的几个示例：\n\n```python\n# 配置均方误差回归模型。\nmodel.compile(optimizer=tf.keras.optimizers.Adam(0.01),\n              loss='mse',       # 均方误差\n              metrics=['mae'])  # 平均绝对误差\n\n# 为分类任务配置模型\nmodel.compile(optimizer=tf.keras.optimizers.RMSprop(0.01),\n              loss=tf.keras.losses.CategoricalCrossentropy(),\n              metrics=[tf.keras.metrics.CategoricalAccuracy()])\n```\n\n### 3.2. 
输入 NumPy 数据\n\n对于小型数据集，请使用内存中的[NumPy](https://www.numpy.org/)数组训练和评估模型。使用 fit 方法使模型与训练数据“拟合”：\n\n```python\nimport numpy as np\n\ndata = np.random.random((1000, 32))\nlabels = np.random.random((1000, 10))\n\nmodel.fit(data, labels, epochs=10, batch_size=32)\n```\n\n```\n      ...\n      Epoch 10/10\n      1000/1000 [==============================] - 0s 82us/sample - loss: 11.4075 - categorical_accuracy: 0.1690\n```\n\n`tf.keras.Model.fit` 采用三个重要参数：\n\n* `epochs`：以周期为单位进行训练。一个周期是对整个输入数据的一次迭代（以较小的批次完成迭代）。\n\n* `batch_size`：当传递 NumPy 数据时，模型将数据分成较小的批次，并在训练期间迭代这些批次。此整数指定每个批次的大小。请注意，如果样本总数不能被批次大小整除，则最后一个批次可能更小。\n\n* `validation_data`：在对模型进行原型设计时，您需要轻松监控该模型在某些验证数据上达到的效果。传递此参数（输入和标签元组）可以让该模型在每个周期结束时以推理模式显示所传递数据的损失和指标。\n\n下面是使用 `validation_data` 的示例：\n\n```python\nimport numpy as np\n\ndata = np.random.random((1000, 32))\nlabels = np.random.random((1000, 10))\n\nval_data = np.random.random((100, 32))\nval_labels = np.random.random((100, 10))\n\nmodel.fit(data, labels, epochs=10, batch_size=32,\n          validation_data=(val_data, val_labels))\n```\n\n```\n      Train on 1000 samples, validate on 100 samples\n      ...\n            Epoch 10/10\n            1000/1000 [==============================] - 0s 93us/sample - loss: 11.5019 - categorical_accuracy: 0.1220 - val_loss: 11.5879 - val_categorical_accuracy: 0.0800\n            <tensorflow.python.keras.callbacks.History at 0x7fe0642970b8>\n```\n\n\n### 3.3. 
输入 tf.data 数据集\n\n使用 [Datasets API](https://tensorflow.google.cn/guide/datasets) 可扩展为大型数据集或多设备训练。将 `tf.data.Dataset` 实例传递到 `fit` 方法：\n\n```python\n# 实例化玩具数据集实例：\ndataset = tf.data.Dataset.from_tensor_slices((data, labels))\ndataset = dataset.batch(32)\n\n# 在数据集上调用`fit`时，不要忘记指定`steps_per_epoch`。\nmodel.fit(dataset, epochs=10, steps_per_epoch=30)\n```\n\n输出：\n```\n      Epoch 1/10\n      30/30 [==============================] - 0s 7ms/step - loss: 11.4902 - categorical_accuracy: 0.1094\n```\n\n在上方代码中，`fit` 方法使用了 `steps_per_epoch` 参数（表示模型在进入下一个周期之前运行的训练步数）。由于 `Dataset` 会生成批次数据，因此该代码段不需要 `batch_size`。\n\n数据集也可用于验证：\n\n```python\ndataset = tf.data.Dataset.from_tensor_slices((data, labels))\ndataset = dataset.batch(32)\n\nval_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))\nval_dataset = val_dataset.batch(32)\n\nmodel.fit(dataset, epochs=10,\n          validation_data=val_dataset)\n```\n\n```\n      ...\n      Epoch 10/10\n      32/32 [==============================] - 0s 4ms/step - loss: 11.4778 - categorical_accuracy: 0.1560 - val_loss: 11.6653 - val_categorical_accuracy: 0.1300\n\n      <tensorflow.python.keras.callbacks.History at 0x7fdfd8329d30>\n```\n\n### 3.4. 
评估和预测\n\n`tf.keras.Model.evaluate` 和 `tf.keras.Model.predict` 方法可以使用NumPy数据和`tf.data.Dataset`。\n\n要评估所提供数据的推理模式损失和指标，请运行以下代码：\n\n```python\ndata = np.random.random((1000, 32))\nlabels = np.random.random((1000, 10))\n\nmodel.evaluate(data, labels, batch_size=32)\n\nmodel.evaluate(dataset, steps=30)\n```\n\n```\n      1000/1000 [==============================] - 0s 72us/sample - loss: 11.5580 - categorical_accuracy: 0.0960\n      30/30 [==============================] - 0s 2ms/step - loss: 11.4651 - categorical_accuracy: 0.1594\n\n      [11.465100129445394, 0.159375]\n```\n\n要对所提供的数据（NumPy 数组形式）进行推理，预测最后一层的输出，请运行以下代码：\n\n```python\nresult = model.predict(data, batch_size=32)\nprint(result.shape)\n```\n\n```\n      (1000, 10)\n```\n\n有关训练和评估的完整指南，包括如何从头开始编写自定义训练循环，请参阅[训练和评估指南](https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation)。\n\n## 4. 构建高级模型\n\n### 4.1. 函数式 API\n\n`tf.keras.Sequential` 模型是层的简单堆叠，无法表示任意模型。使用 [Keras 函数式 API](https://tensorflow.google.cn/beta/guide/keras/functional) 可以构建复杂的模型拓扑，例如：\n\n* 多输入模型，\n* 多输出模型，\n* 具有共享层的模型（同一层被调用多次），\n* 具有非序列数据流的模型（例如，残差连接）。\n\n使用函数式 API 构建的模型具有以下特征：\n\n1. 层实例可调用并返回张量。\n2. 输入张量和输出张量用于定义 `tf.keras.Model` 实例。\n3. 
此模型的训练方式和 `Sequential` 模型一样。\n\n以下示例使用函数式 API 构建一个简单的全连接网络：\n\n```python\ninputs = tf.keras.Input(shape=(32,))  # 返回输入占位符\n\n# 层实例可在张量上调用，并返回张量。\nx = layers.Dense(64, activation='relu')(inputs)\nx = layers.Dense(64, activation='relu')(x)\npredictions = layers.Dense(10, activation='softmax')(x)\n```\n\n在给定输入和输出的情况下实例化模型。\n\n```python\nmodel = tf.keras.Model(inputs=inputs, outputs=predictions)\n\n# compile步骤指定训练配置\nmodel.compile(optimizer=tf.keras.optimizers.RMSprop(0.001),\n              loss='categorical_crossentropy',\n              metrics=['accuracy'])\n\n# 训练5个周期\nmodel.fit(data, labels, batch_size=32, epochs=5)\n```\n\n```\n      ...\n      Epoch 5/5\n      1000/1000 [==============================] - 0s 81us/sample - loss: 11.4819 - accuracy: 0.1270\n\n      <tensorflow.python.keras.callbacks.History at 0x7fdfd820b898>\n```\n\n### 4.2. 模型子类化\n\n通过对 `tf.keras.Model` 进行子类化，并定义您自己的前向传播来构建完全可自定义的模型。在 `__init__` 方法中创建层并将它们设置为类实例的属性。在 `call` 方法中定义前向传播。\n\n在启用 [eager execution](https://tensorflow.google.cn/beta/guide/eager) 时，模型子类化特别有用，因为可以用命令式方式编写前向传播。\n\n*注意：为了确保前向传播总是以命令式方式运行，你必须在调用父类构造函数时设置 `dynamic=True`*\n\n要点：根据任务选用正确的 API。虽然模型子类化较为灵活，但代价是复杂性更高且用户出错率更高。如果可能，请首选函数式 API。\n\n以下示例展示了使用自定义前向传播进行子类化的 `tf.keras.Model`，其前向传播不必以命令式方式运行：\n\n```python\nclass MyModel(tf.keras.Model):\n\n  def __init__(self, num_classes=10):\n    super(MyModel, self).__init__(name='my_model')\n    self.num_classes = num_classes\n    # 在此处定义层。\n    self.dense_1 = layers.Dense(32, activation='relu')\n    self.dense_2 = layers.Dense(num_classes, activation='sigmoid')\n\n  def call(self, inputs):\n    # 在这里定义你的前向传播\n    # 使用之前定义的层（在`__init__`中）\n    x = self.dense_1(inputs)\n    return self.dense_2(x)\n```\n\n实例化新模型类：\n\n```python\nmodel = MyModel(num_classes=10)\n\n# The compile step specifies the training configuration.\nmodel.compile(optimizer=tf.keras.optimizers.RMSprop(0.001),\n              loss='categorical_crossentropy',\n              metrics=['accuracy'])\n\n# 训练5个周期\nmodel.fit(data, labels, 
batch_size=32, epochs=5)\n```\n\n```\n      ...\n      Epoch 5/5 1000/1000 [==============================] - 0s 74us/sample - loss: 11.4954 - accuracy: 0.1110 \n```\n\n### 4.3. 自定义层\n\n通过继承 `tf.keras.layers.Layer` 并实现以下方法来创建自定义层：\n\n* `__init__`: （可选）定义此层要使用的子层\n\n* `build`: 创建层的权重。使用 `add_weight` 方法添加权重。\n\n* `call`: 定义前向传播。\n\n* 或者，可以通过实现 `get_config` 方法和 `from_config` 类方法序列化层。\n\n下面是一个自定义层的示例，它使用核矩阵实现输入的`matmul`：\n\n```python\nclass MyLayer(layers.Layer):\n\n  def __init__(self, output_dim, **kwargs):\n    self.output_dim = output_dim\n    super(MyLayer, self).__init__(**kwargs)\n\n  def build(self, input_shape):\n    # Create a trainable weight variable for this layer.\n    self.kernel = self.add_weight(name='kernel',\n                                  shape=(input_shape[1], self.output_dim),\n                                  initializer='uniform',\n                                  trainable=True)\n\n  def call(self, inputs):\n    return tf.matmul(inputs, self.kernel)\n\n  def get_config(self):\n    base_config = super(MyLayer, self).get_config()\n    base_config['output_dim'] = self.output_dim\n    return base_config\n\n  @classmethod\n  def from_config(cls, config):\n    return cls(**config)\n```\n\n使用自定义层创建模型：\n\n```python\nmodel = tf.keras.Sequential([\n    MyLayer(10),\n    layers.Activation('softmax')])\n\n# 训练配置\nmodel.compile(optimizer=tf.keras.optimizers.RMSprop(0.001),\n              loss='categorical_crossentropy',\n              metrics=['accuracy'])\n\n# 训练5个周期\nmodel.fit(data, labels, batch_size=32, epochs=5)\n```\n\n了解有关从头开始创建新层和模型的更多信息，在[从头开始编写层和模型指南](https://tensorflow.google.cn/beta/guide/keras/custom_layers_and_models)。\n\n## 5. 
回调\n\n回调是传递给模型的对象，用于在训练期间自定义该模型并扩展其行为。您可以编写自定义回调，也可以使用包含以下方法的内置 `tf.keras.callbacks`：\n\n\n* `tf.keras.callbacks.ModelCheckpoint`: 定期保存模型的检查点。\n\n* `tf.keras.callbacks.LearningRateScheduler`: 动态更改学习速率。\n\n* `tf.keras.callbacks.EarlyStopping`:在验证效果不再改进时中断训练。\n\n* `tf.keras.callbacks.TensorBoard`: 使用  [TensorBoard](https://tensorflow.google.cn/tensorboard) 监控模型的行为。\n\n要使用  `tf.keras.callbacks.Callback`，请将其传递给模型的 `fit` 方法：\n\n```python\ncallbacks = [\n  # 如果`val_loss`在2个以上的周期内停止改进，则进行中断训练\n  tf.keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),\n  # 将TensorBoard日志写入`./logs`目录\n  tf.keras.callbacks.TensorBoard(log_dir='./logs')\n]\nmodel.fit(data, labels, batch_size=32, epochs=5, callbacks=callbacks,\n          validation_data=(val_data, val_labels))\n```\n\n```\n      Train on 1000 samples, validate on 100 samples \n      ...\n      Epoch 5/5 1000/1000 [==============================] - 0s 76us/sample - loss: 11.4813 - accuracy: 0.1190 - val_loss: 11.5753 - val_accuracy: 0.1100 <tensorflow.python.keras.callbacks.History at 0x7fdfd12e7080>\n```\n\n\n## 6. 保存和恢复\n\n### 6.1. 仅限权重\n\n使用 `tf.keras.Model.save_weights`保存并加载模型的权重：\n\n```python\nmodel = tf.keras.Sequential([\nlayers.Dense(64, activation='relu', input_shape=(32,)),\nlayers.Dense(10, activation='softmax')])\n\nmodel.compile(optimizer=tf.keras.optimizers.Adam(0.001),\n              loss='categorical_crossentropy',\n              metrics=['accuracy'])\n```\n\n\n```\n# 将权重保存到TensorFlow检查点文件\nmodel.save_weights('./weights/my_model')\n\n# 恢复模型的状态，这需要具有相同架构的模型。\nmodel.load_weights('./weights/my_model')\n```\n\n默认情况下，会以 [TensorFlow 检查点](https://tensorflow.google.cn/beta/guide/checkpoints)文件格式保存模型的权重。权重也可以另存为 Keras HDF5 格式（Keras 多后端实现的默认格式）：\n\n```\n# 将权重保存到HDF5文件\nmodel.save_weights('my_model.h5', save_format='h5')\n\n# 恢复模型的状态\nmodel.load_weights('my_model.h5')\n```\n\n### 6.2. 
仅限配置\n\n可以保存模型的配置，此操作会对模型架构（不含任何权重）进行序列化。即使没有定义原始模型的代码，保存的配置也可以重新创建并初始化相同的模型。Keras 支持 JSON 和 YAML 序列化格式：\n\n\n\n```python\n# 将模型序列化为JSON格式\njson_string = model.to_json()\njson_string\n```\n\n```\n      '{\"class_name\": \"Sequential\", \"config\": {\"layers\": [{\"class_name\": \"Dense\", \"config\": {\"units\": 64, \"activity_regularizer\": null, \"dtype\": \"float32\",....... \"backend\": \"tensorflow\", \"keras_version\": \"2.2.4-tf\"}'\n```\n\n```python\nimport json\nimport pprint\npprint.pprint(json.loads(json_string))\n```\n\n```\n      {'backend': 'tensorflow', 'class_name': 'Sequential', 'config': {'layers': [{'class_name': 'Dense', 'config': {'activation': 'relu', 'activity_regularizer': None, '......'keras_version': '2.2.4-tf'}\n```\n\n更多运行的输出内容请看英文版https://tensorflow.google.cn/beta/guide/keras/overview\n\n从 json 重新创建模型（刚刚初始化）。\n\n```python\nfresh_model = tf.keras.models.model_from_json(json_string)\n```\n\n将模型序列化为YAML格式，要求您在导入TensorFlow之前安装pyyaml（命令：`pip install -q pyyaml`）：\n\n```\nyaml_string = model.to_yaml()\nprint(yaml_string)\n```\n\n从YAML重新创建模型：\n\n```python\nfresh_model = tf.keras.models.model_from_yaml(yaml_string)\n```\n\n注意：子类化模型不可序列化，因为它们的架构由`call`方法正文中的 Python 代码定义。\n\n\n### 6.3. 
整个模型\n\n整个模型可以保存到一个文件中，其中包含权重值、模型配置乃至优化器配置。这样，您就可以对模型设置检查点并稍后从完全相同的状态继续训练，而无需访问原始代码。\n\n```python\n# 创建一个简单的模型\nmodel = tf.keras.Sequential([\n  layers.Dense(10, activation='softmax', input_shape=(32,)),\n  layers.Dense(10, activation='softmax')\n])\nmodel.compile(optimizer='rmsprop',\n              loss='categorical_crossentropy',\n              metrics=['accuracy'])\nmodel.fit(data, labels, batch_size=32, epochs=5)\n\n\n# 将整个模型保存到HDF5文件\nmodel.save('my_model.h5')\n\n# 重新创建完全相同的模型，包括权重和优化器\nmodel = tf.keras.models.load_model('my_model.h5')\n```\n\n```\n      ...\n      Epoch 5/5\n      1000/1000 [==============================] - 0s 76us/sample - loss: 11.4913 - accuracy: 0.0990\n```\n\n在[保存和序列化模型指南](https://tensorflow.google.cn/beta/guide/keras/saving_and_serializing)中，了解有关Keras模型的保存和序列化的更多信息。\n\n## 7. Eager execution\n\n[Eager execution](https://tensorflow.google.cn/guide/eager) 是一种命令式编程环境，可立即评估操作。这不是Keras所必需的，但是由`tf.keras`支持，对于检查程序和调试很有用。\n\n所有 `tf.keras` 模型构建 API 都与 Eager Execution 兼容。虽然可以使用 `Sequential` 和函数式 API，但 Eager Execution 对模型子类化和构建自定义层特别有用，因为这类 API 需要您以代码形式编写前向传播（而不是通过组合现有层来创建模型）。\n\n请参阅 [Eager Execution 指南](https://tensorflow.google.cn/guide/eager#build_a_model)，了解将 Keras 模型与自定义训练循环和 [tf.GradientTape](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) 搭配使用的[示例](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/advanced.ipynb)。\n\n## 8. 分布\n\n\n### 8.1. 
多个 GPU\n\n`tf.keras` 模型可以使用 `tf.distribute.Strategy`在多个 GPU 上运行。此 API 在多个 GPU 上提供分布式训练，几乎不需要更改现有代码。\n\n目前，`tf.distribute.MirroredStrategy`是唯一受支持的分布策略。`MirroredStrategy` 在单台机器上使用 all-reduce 进行同步训练的图内复制。要使用 `distribute.Strategy`，请将优化器实例化、模型构建和编译嵌套在 Strategy 的 `.scope()` 中，然后照常训练模型。\n\n\n以下示例在单个计算机上的多个GPU之间分发`tf.keras.Model`。\n\n首先，在分布式策略范围内定义模型：\n\n```python\nstrategy = tf.distribute.MirroredStrategy()\n\nwith strategy.scope():\n  model = tf.keras.Sequential()\n  model.add(layers.Dense(16, activation='relu', input_shape=(10,)))\n  model.add(layers.Dense(1, activation='sigmoid'))\n\n  optimizer = tf.keras.optimizers.SGD(0.2)\n\n  model.compile(loss='binary_crossentropy', optimizer=optimizer)\n\nmodel.summary()\n```\n\n```\n      Model: \"sequential_5\"\n      _________________________________________________________________\n      Layer (type)                 Output Shape              Param #\n      =================================================================\n      dense_21 (Dense)             (None, 16)                176\n      _________________________________________________________________\n      dense_22 (Dense)             (None, 1)                 17\n      =================================================================\n      Total params: 193\n      Trainable params: 193\n      Non-trainable params: 0\n      _________________________________________________________________\n```\n\n接下来，像往常一样用数据训练模型：\n\n```python\nx = np.random.random((1024, 10))\ny = np.random.randint(2, size=(1024, 1))\nx = tf.cast(x, tf.float32)\ndataset = tf.data.Dataset.from_tensor_slices((x, y))\ndataset = dataset.shuffle(buffer_size=1024).batch(32)\n\nmodel.fit(dataset, epochs=1)\n```\n\n```\n32/32 [==============================] - 3s 82ms/step - loss: 0.7005\n\n<tensorflow.python.keras.callbacks.History at 0x7fdfa057fb00>\n```\n\n有关更多信息，请参阅[TensorFlow中的分布式训练完整指南](https://tensorflow.google.cn/beta/guide/distribute_strategy)。\n\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-keras-overview.html](https://www.mashangxue123.com/tensorflow/tf2-guide-keras-overview.html)\n> 
英文版本：[https://tensorflow.google.cn/beta/guide/keras/overview](https://tensorflow.google.cn/beta/guide/keras/overview)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/overview.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/overview.md)\n"
  },
  {
    "path": "r2/guide/keras/training_and_evaluation.md",
"content": "---\ntitle: 使用 TensorFlow Keras 进行训练和评估\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1999\nabbrlink: tensorflow/tf2-guide-keras-training_and_evaluation\n---\n\n# 使用TensorFlow Keras进行训练和评估 (tensorflow2.0官方教程翻译)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-keras-training_and_evaluation.html](https://www.mashangxue123.com/tensorflow/tf2-guide-keras-training_and_evaluation.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation](https://tensorflow.google.cn/beta/guide/keras/training_and_evaluation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/training_and_evaluation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/keras/training_and_evaluation.md)\n\n本指南涵盖了TensorFlow 2.0在两种主要情况下的训练、评估和预测(推理)模型:\n\n- 使用内置API进行训练和验证时（例如`model.fit()`, `model.evaluate()`, `model.predict()` ）。这将在“使用内置的训练和评估循环”一节中讨论。\n- 使用eager execution和 `GradientTape` 对象从头开始编写自定义循环时。这在 “从零开始编写您自己的训练和评估循环” 小节中有介绍。\n\n一般来说，无论您是使用内置循环还是编写自己的循环，模型训练和评估在每种Keras模型(Sequential顺序模型、使用函数式API构建的模型以及通过模型子类化从零开始编写的模型)中都严格按照相同的方式工作。\n\n本指南不包括分布式训练。\n\n## 设置\n\n安装\n```\npip install pydot\napt-get install graphviz\npip install tensorflow-gpu==2.0.0-alpha0\n```\n\n导入\n```\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\ntf.keras.backend.clear_session()  # For easy reset of notebook state.\n```\n\n## 第一部分：使用内置训练和评估循环\n\n将数据传递给模型的内置训练循环时，您应该使用Numpy数组（如果数据很小并且适合内存）或tf.data数据集对象。在接下来的几段中，我们将使用MNIST数据集作为Numpy数组，以演示如何使用优化器，损失和指标。\n\n### API概述：第一个端到端示例\n\n让我们考虑以下模型（这里，我们使用Functional API构建，但它也可以是顺序模型或子类模型）：\n\n```python\nfrom tensorflow import keras\nfrom tensorflow.keras import layers\n\ninputs = keras.Input(shape=(784,), name='digits')\nx = layers.Dense(64, activation='relu', name='dense_1')(inputs)\nx = layers.Dense(64, activation='relu', name='dense_2')(x)\noutputs = layers.Dense(10, activation='softmax', 
name='predictions')(x)\n\nmodel = keras.Model(inputs=inputs, outputs=outputs)\n```\n\n以下是典型的端到端工作流程的外观，包括训练，对原始训练数据生成的保留集的验证，以及最终对测试数据的评估：\n\n\n```python\n# Load a toy dataset for the sake of this example\n(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n\n# Preprocess the data (these are Numpy arrays)\nx_train = x_train.reshape(60000, 784).astype('float32') / 255\nx_test = x_test.reshape(10000, 784).astype('float32') / 255\n\n# Reserve 10,000 samples for validation\nx_val = x_train[-10000:]\ny_val = y_train[-10000:]\nx_train = x_train[:-10000]\ny_train = y_train[:-10000]\n\n# Specify the training configuration (optimizer, loss, metrics)\nmodel.compile(optimizer=keras.optimizers.RMSprop(),  # Optimizer\n              # Loss function to minimize\n              loss=keras.losses.SparseCategoricalCrossentropy(),\n              # List of metrics to monitor\n              metrics=[keras.metrics.SparseCategoricalAccuracy()])\n\n# Train the model by slicing the data into \"batches\"\n# of size \"batch_size\", and repeatedly iterating over\n# the entire dataset for a given number of \"epochs\"\nprint('# Fit model on training data')\nhistory = model.fit(x_train, y_train,\n                    batch_size=64,\n                    epochs=3,\n                    # We pass some validation for\n                    # monitoring validation loss and metrics\n                    # at the end of each epoch\n                    validation_data=(x_val, y_val))\n\n# The returned \"history\" object holds a record\n# of the loss values and metric values during training\nprint('\\nhistory dict:', history.history)\n\n# Evaluate the model on the test data using `evaluate`\nprint('\\n# Evaluate on test data')\nresults = model.evaluate(x_test, y_test, batch_size=128)\nprint('test loss, test acc:', results)\n\n# Generate predictions (probabilities -- the output of the last layer)\n# on new data using `predict`\nprint('\\n# Generate predictions for 3 samples')\npredictions 
= model.predict(x_test[:3])\nprint('predictions shape:', predictions.shape)\n```\n\n### 指定损失，指标和优化程序\n\n要训练合适的模型，您需要指定一个损失函数，一个优化器，以及可选的一些要监控的指标。\n\n您将这些作为 `compile()` 方法的参数传递给模型：\n\n```python\nmodel.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),\n              loss=keras.losses.SparseCategoricalCrossentropy(),\n              metrics=[keras.metrics.SparseCategoricalAccuracy()])\n```\n\n`metrics` 参数应该是一个列表（您的模型可以包含任意数量的度量标准）。\n\n如果您的模型有多个输出，您可以为每个输出指定不同的损失和度量，并且您可以调整每个输出对模型总损失的贡献。\n您将在“将数据传递到多输入、多输出模型”一节中找到更多关于此的详细信息。\n\n注意，在很多情况下，损失和指标是通过字符串标识符指定的，作为一种快捷方式:\n\n```python\nmodel.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),\n              loss='sparse_categorical_crossentropy',\n              metrics=['sparse_categorical_accuracy'])\n```\n\n为了以后的重用，我们将模型定义和编译步骤放在函数中;我们将在本指南的不同示例中多次调用它们。\n\n```python\ndef get_uncompiled_model():\n  inputs = keras.Input(shape=(784,), name='digits')\n  x = layers.Dense(64, activation='relu', name='dense_1')(inputs)\n  x = layers.Dense(64, activation='relu', name='dense_2')(x)\n  outputs = layers.Dense(10, activation='softmax', name='predictions')(x)\n  model = keras.Model(inputs=inputs, outputs=outputs)\n  return model\n\ndef get_compiled_model():\n  model = get_uncompiled_model()\n  model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),\n              loss='sparse_categorical_crossentropy',\n              metrics=['sparse_categorical_accuracy'])\n  return model\n```\n\n#### 许多内置的优化器、损失和指标都是可用的\n\n通常，您不必从头开始创建自己的损失，指标或优化器，因为您需要的可能已经是Keras API的一部分：\nOptimizers优化器:\n- `SGD()` (with or without momentum)\n- `RMSprop()`\n- `Adam()`\n- etc.\n\nLosses损失:\n- `MeanSquaredError()`\n- `KLDivergence()`\n- `CosineSimilarity()`\n- etc.\n\nMetrics指标:\n- `AUC()`\n- `Precision()`\n- `Recall()`\n- etc.\n\n#### 编写自定义损失和指标\n\n如果您需要不属于API的指标，则可以通过继承Metric类轻松创建自定义指标。\n\n您需要实现4种方法：\n- `__init__(self)`,  您将在其中为指标创建状态变量\n- `update_state(self, y_true, y_pred, sample_weight=None)`, 
它使用目标`y_true`和模型预测`y_pred`来更新状态变量。\n- `result(self)`, 它使用状态变量来计算最终结果。\n- `reset_states(self)`, 它重新初始化度量的状态。\n\n状态更新和结果计算是分开的（分别在 `update_state()` 和 `result()` 中），因为在某些情况下，结果计算可能非常昂贵，并且可能只定期进行。\n\n这是一个简单的例子，展示了如何实现一个 `CategoricalTruePositives` 指标，它计算了正确分类为属于给定类的样本数量：\n\n```python\nclass CategoricalTruePositives(keras.metrics.Metric):\n\n    def __init__(self, name='categorical_true_positives', **kwargs):\n      super(CategoricalTruePositives, self).__init__(name=name, **kwargs)\n      self.true_positives = self.add_weight(name='tp', initializer='zeros')\n\n    def update_state(self, y_true, y_pred, sample_weight=None):\n      y_pred = tf.argmax(y_pred, axis=-1)\n      values = tf.equal(tf.cast(y_true, 'int32'), tf.cast(y_pred, 'int32'))\n      values = tf.cast(values, 'float32')\n      if sample_weight is not None:\n        sample_weight = tf.cast(sample_weight, 'float32')\n        values = tf.multiply(values, sample_weight)\n      self.true_positives.assign_add(tf.reduce_sum(values))\n\n    def result(self):\n      return self.true_positives\n\n    def reset_states(self):\n      # The state of the metric will be reset at the start of each epoch.\n      self.true_positives.assign(0.)\n\n\nmodel.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),\n              loss=keras.losses.SparseCategoricalCrossentropy(),\n              metrics=[CategoricalTruePositives()])\nmodel.fit(x_train, y_train,\n          batch_size=64,\n          epochs=3)\n\n```\n\n#### 处理不符合标准签名的损失和指标\n\n绝大多数损失和指标可以从`y_true`和`y_pred`计算，其中`y_pred`是模型的输出。但不是全部。例如，正则化损失可能仅需要激活层（在这种情况下没有目标），并且该激活可能不是模型输出。\n\n在这种情况下，您可以从自定义层的`call`方法中调用 `self.add_loss(loss_value)`。这是一个添加活动正则化的简单示例（请注意，活动正则化是内置于所有Keras层中的 - 此层仅用于提供具体示例）：\n\n```python\nclass ActivityRegularizationLayer(layers.Layer):\n\n  def call(self, inputs):\n    self.add_loss(tf.reduce_sum(inputs) * 0.1)\n    return inputs  # Pass-through layer.\n\ninputs = keras.Input(shape=(784,), name='digits')\nx = layers.Dense(64, activation='relu', 
name='dense_1')(inputs)\n\n# Insert activity regularization as a layer\nx = ActivityRegularizationLayer()(x)\n\nx = layers.Dense(64, activation='relu', name='dense_2')(x)\noutputs = layers.Dense(10, activation='softmax', name='predictions')(x)\n\nmodel = keras.Model(inputs=inputs, outputs=outputs)\nmodel.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),\n              loss='sparse_categorical_crossentropy')\n\n# The displayed loss will be much higher than before\n# due to the regularization component.\nmodel.fit(x_train, y_train,\n          batch_size=64,\n          epochs=1)\n```\n\n\n您可以执行相同的记录度量标准值：\n\n```python\nclass MetricLoggingLayer(layers.Layer):\n\n  def call(self, inputs):\n    # The `aggregation` argument defines\n    # how to aggregate the per-batch values\n    # over each epoch:\n    # in this case we simply average them.\n    self.add_metric(keras.backend.std(inputs),\n                    name='std_of_activation',\n                    aggregation='mean')\n    return inputs  # Pass-through layer.\n\n\ninputs = keras.Input(shape=(784,), name='digits')\nx = layers.Dense(64, activation='relu', name='dense_1')(inputs)\n\n# Insert std logging as a layer.\nx = MetricLoggingLayer()(x)\n\nx = layers.Dense(64, activation='relu', name='dense_2')(x)\noutputs = layers.Dense(10, activation='softmax', name='predictions')(x)\n\nmodel = keras.Model(inputs=inputs, outputs=outputs)\nmodel.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3),\n              loss='sparse_categorical_crossentropy')\nmodel.fit(x_train, y_train,\n          batch_size=64,\n          epochs=1)\n```\n\n    50000/50000 [==============================] - 4s 76us/sample - loss: 0.3366 - std_of_activation: 0.9773\n\n\n\n在 [Functional API](https://tensorflow.google.cn/beta/guide/keras/functional) 中，您还可以调用 `model.add_loss(loss_tensor)`, 或 `model.add_metric(metric_tensor, name, aggregation)`。\n\n这是一个简单的例子：\n\n```python\ninputs = keras.Input(shape=(784,), name='digits')\nx1 = 
layers.Dense(64, activation='relu', name='dense_1')(inputs)\nx2 = layers.Dense(64, activation='relu', name='dense_2')(x1)\noutputs = layers.Dense(10, activation='softmax', name='predictions')(x2)\nmodel = keras.Model(inputs=inputs, outputs=outputs)\n\nmodel.add_loss(tf.reduce_sum(x1) * 0.1)\n\nmodel.add_metric(keras.backend.std(x1),\n                 name='std_of_activation',\n                 aggregation='mean')\n\nmodel.compile(optimizer=keras.optimizers.RMSprop(1e-3),\n              loss='sparse_categorical_crossentropy')\nmodel.fit(x_train, y_train,\n          batch_size=64,\n          epochs=1)\n```\n\n    50000/50000 [==============================] - 4s 80us/sample - loss: 2.5158 - std_of_activation: 0.0020\n\n\n#### 自动设置验证保持集\n\n在您看到的第一个端到端示例中，我们使用 `validation_data` 参数将Numpy数组  `(x_val, y_val)` 的元组传递给模型，以便在每个时期结束时评估验证损失和验证指标。\n\n这是另一个选项：参数 `validation_split` 允许您自动保留部分训练数据以进行验证。参数值表示要为验证保留的数据的分数，因此应将其设置为大于0且小于1的数字。例如，`validation_split=0.2`  表示“使用20％的数据进行验证”，`validation_split=0.6` 表示“使用60％的数据进行验证”。\n\n计算验证的方法是：在任何混洗之前，通过`fit`调用接收的数组的最后x％样本。\n\n在使用Numpy数据进行训练时，您只能使用 `validation_split`。\n\n```python\nmodel = get_compiled_model()\nmodel.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=3)\n```\n\n输出\n\n    Train on 40000 samples, validate on 10000 samples\n    Epoch 1/3\n    40000/40000 [==============================] - 3s 82us/sample - loss: 0.3735 - sparse_categorical_accuracy: 0.8951 - val_loss: 0.2413 - val_sparse_categorical_accuracy: 0.9272\n    Epoch 2/3\n    40000/40000 [==============================] - 3s 82us/sample - loss: 0.1688 - sparse_categorical_accuracy: 0.9499 - val_loss: 0.1781 - val_sparse_categorical_accuracy: 0.9468\n    Epoch 3/3\n    40000/40000 [==============================] - 3s 79us/sample - loss: 0.1232 - sparse_categorical_accuracy: 0.9638 - val_loss: 0.1518 - val_sparse_categorical_accuracy: 0.9539\n\n### 来自tf.data数据集的培训和评估\n\n在过去的几段中，您已经了解了如何处理损失，度量和优化器，并且您已经看到，当您的数据作为Numpy数组传递时，如何在`fit`中使用`validation_data` 和 
`validation_split` 参数\n\n现在让我们看一下您的数据以tf.data数据集的形式出现的情况。\n\ntf.data API是TensorFlow 2.0中的一组实用程序，用于以快速和可伸缩的方式加载和预处理数据。\n\n有关创建数据集的完整指南，请参阅[the tf.data 文档](https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf)。\n\n您可以将数据集实例直接传递给方法 `fit()`, `evaluate()`, 和 `predict()`：\n\n```python\nmodel = get_compiled_model()\n\n# First, let's create a training Dataset instance.\n# For the sake of our example, we'll use the same MNIST data as before.\ntrain_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n# Shuffle and slice the dataset.\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)\n\n# Now we get a test dataset.\ntest_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))\ntest_dataset = test_dataset.batch(64)\n\n# Since the dataset already takes care of batching,\n# we don't pass a `batch_size` argument.\nmodel.fit(train_dataset, epochs=3)\n\n# You can also evaluate or predict on a dataset.\nprint('\\n# Evaluate')\nmodel.evaluate(test_dataset)\n```\n\n输出：\n\n    Epoch 1/3\n    782/782 [==============================] - 5s 7ms/step - loss: 0.3250 - sparse_categorical_accuracy: 0.9074\n    Epoch 2/3\n    782/782 [==============================] - 4s 6ms/step - loss: 0.1484 - sparse_categorical_accuracy: 0.9559\n    Epoch 3/3\n    782/782 [==============================] - 4s 5ms/step - loss: 0.1074 - sparse_categorical_accuracy: 0.9685\n    \n    # Evaluate\n    157/157 [==============================] - 1s 3ms/step - loss: 0.1137 - sparse_categorical_accuracy: 0.9665\n\n请注意，数据集在每个周期的末尾都会重置，因此可以重复使用下一个周期。\n\n如果您只想从此数据集中对特定数量的批次运行训练，则可以传递 `steps_per_epoch`  参数，该参数指定在继续下一个周期之前使用此数据集运行模型的训练步数。\n\n如果这样做，数据集不会在每个周期的末尾重置，而是我们只是继续绘制下一批。数据集最终会耗尽数据（除非它是一个无限循环的数据集）。\n\n```python\nmodel = get_compiled_model()\n\n# Prepare the training dataset\ntrain_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)\n\n# Only use the 100 batches per epoch (that's 64 * 100 
samples)\nmodel.fit(train_dataset.take(100), epochs=3)\n```\n\n    Epoch 1/3\n    100/100 [==============================] - 1s 11ms/step - loss: 0.7733 - sparse_categorical_accuracy: 0.8067\n    Epoch 2/3\n    100/100 [==============================] - 0s 5ms/step - loss: 0.3706 - sparse_categorical_accuracy: 0.8922\n    Epoch 3/3\n    100/100 [==============================] - 1s 5ms/step - loss: 0.3379 - sparse_categorical_accuracy: 0.9011\n\n#### 使用验证数据集\n\n您可以将数据集实例作为`fit`中的`validation_data`参数传递：\n\n```python\nmodel = get_compiled_model()\n\n# Prepare the training dataset\ntrain_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)\n\n# Prepare the validation dataset\nval_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))\nval_dataset = val_dataset.batch(64)\n\nmodel.fit(train_dataset, epochs=3, validation_data=val_dataset)\n```\n\n    Epoch 1/3\n    782/782 [==============================] - 7s 8ms/step - loss: 0.3440 - sparse_categorical_accuracy: 0.9020 - val_loss: 0.1838 - val_sparse_categorical_accuracy: 0.9490\n    Epoch 2/3\n    782/782 [==============================] - 7s 9ms/step - loss: 0.1649 - sparse_categorical_accuracy: 0.9515 - val_loss: 0.1391 - val_sparse_categorical_accuracy: 0.9603\n    Epoch 3/3\n    782/782 [==============================] - 8s 10ms/step - loss: 0.1216 - sparse_categorical_accuracy: 0.9645 - val_loss: 0.1208 - val_sparse_categorical_accuracy: 0.9672\n\n在每个周期结束时，模型将迭代验证数据集并计算验证损失和验证指标。\n\n如果你想只在这个数据集中特定数量的批次上运行验证，你可以传递“validation_steps”参数，它指定了模型在中断验证并进入下一个周期之前，应该与验证数据集一起运行多少个验证步骤:\n\n```python\nmodel = get_compiled_model()\n\n# Prepare the training dataset\ntrain_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)\n\n# Prepare the validation dataset\nval_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))\nval_dataset = 
val_dataset.batch(64)\n\nmodel.fit(train_dataset, epochs=3,\n          # Only run validation using the first 10 batches of the dataset\n          # using the `validation_steps` argument\n          validation_data=val_dataset, validation_steps=10)\n```\n\n    Epoch 1/3\n    782/782 [==============================] - 9s 12ms/step - loss: 0.3359 - sparse_categorical_accuracy: 0.9053 - val_loss: 0.3095 - val_sparse_categorical_accuracy: 0.9187\n    Epoch 2/3\n    782/782 [==============================] - 7s 9ms/step - loss: 0.1593 - sparse_categorical_accuracy: 0.9528 - val_loss: 0.2196 - val_sparse_categorical_accuracy: 0.9438\n    Epoch 3/3\n    782/782 [==============================] - 7s 9ms/step - loss: 0.1158 - sparse_categorical_accuracy: 0.9661 - val_loss: 0.1840 - val_sparse_categorical_accuracy: 0.9469\n\n请注意，验证数据集将在每次使用后重置（这样每个周期评估的始终是相同的样本）。从数据集对象进行训练时，不支持参数 `validation_split`（从训练数据中划分保留集），因为此功能需要能够索引数据集的样本，而这通常是 Dataset API 无法做到的。\n\n### 支持其他输入格式\n\n除了Numpy数组和TensorFlow数据集之外，还可以使用 Pandas DataFrame 或产生批量数据的 Python 生成器来训练Keras模型。\n\n通常，如果数据较小且能放入内存，我们建议您使用Numpy输入数据，否则使用数据集。\n\n### 使用样本权重和类权重\n\n除了输入数据和目标数据之外，还可以在使用 `fit` 时将样本权重或类权重传递给模型：\n\n- 从Numpy数据训练时：通过 `sample_weight` 和 `class_weight` 参数。\n- 从数据集训练时：通过让数据集返回一个元组 `(input_batch, target_batch, sample_weight_batch)`。\n\n\"sample weights\"（样本权重）数组是一个数字数组，用于指定批次中每个样本在计算总损失时应具有多大的权重。它通常用于不平衡的分类问题（目的是给予样本较少的类别更多的权重）。当使用的权重只有1和0时，该数组可以用作损失函数的掩码（完全丢弃某些样本对总损失的贡献）。\n\n\"class weights\"（类权重）字典是同一概念的更具体的实例：它将类索引映射到应用于该类样本的样本权重。例如，如果数据中类“0”的样本数只有类“1”的一半，则可以使用 `class_weight={0: 1., 1: 0.5}`。\n\n这是一个 Numpy 示例，我们使用类权重（class weights）或样本权重（sample weights）来更加重视第5类（即MNIST数据集中的数字“5”）的正确分类。\n\n```python\nimport numpy as np\n\nclass_weight = {0: 1., 1: 1., 2: 1., 3: 1., 4: 1.,\n                # Set weight \"2\" for class \"5\",\n                # making this class 2x more important\n                5: 2.,\n                6: 1., 7: 1., 8: 1., 9: 1.}\nmodel.fit(x_train, y_train,\n          class_weight=class_weight,\n          batch_size=64,\n    
      epochs=4)\n\n# Here's the same example using `sample_weight` instead:\nsample_weight = np.ones(shape=(len(y_train),))\nsample_weight[y_train == 5] = 2.\n\nmodel = get_compiled_model()\nmodel.fit(x_train, y_train,\n          sample_weight=sample_weight,\n          batch_size=64,\n          epochs=4)\n```\n\n    Epoch 1/4\n    50000/50000 [==============================] - 4s 89us/sample - loss: 0.1040 - sparse_categorical_accuracy: 0.9715\n    .....\n    Epoch 4/4\n    50000/50000 [==============================] - 4s 83us/sample - loss: 0.1016 - sparse_categorical_accuracy: 0.9719\n\n\n下面是对应的数据集示例：\n\n```python\nsample_weight = np.ones(shape=(len(y_train),))\nsample_weight[y_train == 5] = 2.\n\n# Create a Dataset that includes sample weights\n# (3rd element in the return tuple).\ntrain_dataset = tf.data.Dataset.from_tensor_slices(\n    (x_train, y_train, sample_weight))\n\n# Shuffle and slice the dataset.\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)\n\nmodel = get_compiled_model()\nmodel.fit(train_dataset, epochs=3)\n```\n\n    Epoch 1/3\n    782/782 [==============================] - 9s 11ms/step - loss: 0.3666 - sparse_categorical_accuracy: 0.9046\n    Epoch 2/3\n    782/782 [==============================] - 7s 9ms/step - loss: 0.1646 - sparse_categorical_accuracy: 0.9539\n    Epoch 3/3\n    782/782 [==============================] - 7s 9ms/step - loss: 0.1178 - sparse_categorical_accuracy: 0.9677\n\n\n\n### 将数据传递到多输入、多输出模型\n\n在前面的例子中，我们考虑的是一个带有单个输入（形状为 `(784,)` 的张量）和单个输出（形状为 `(10,)` 的预测张量）的模型。但是具有多个输入或输出的模型呢？\n\n考虑下面的模型，它有一个形状为 `(32, 32, 3)` 的图像输入（即“（高度，宽度，通道）”）和一个形状为 `(None, 10)` 的时间序列输入（即“（时间步长，特征）”）。我们的模型将根据这些输入的组合计算两个输出：“得分”（形状为 `(1,)`）和在5个类别上的概率分布（形状为 `(5,)`）。\n\n\n```python\nfrom tensorflow import keras\nfrom tensorflow.keras import layers\n\nimage_input = keras.Input(shape=(32, 32, 3), name='img_input')\ntimeseries_input = keras.Input(shape=(None, 10), name='ts_input')\n\nx1 = layers.Conv2D(3, 3)(image_input)\nx1 = 
layers.GlobalMaxPooling2D()(x1)\n\nx2 = layers.Conv1D(3, 3)(timeseries_input)\nx2 = layers.GlobalMaxPooling1D()(x2)\n\nx = layers.concatenate([x1, x2])\n\nscore_output = layers.Dense(1, name='score_output')(x)\nclass_output = layers.Dense(5, activation='softmax', name='class_output')(x)\n\nmodel = keras.Model(inputs=[image_input, timeseries_input],\n                    outputs=[score_output, class_output])\n```\n\n让我们绘制这个模型，这样你就可以清楚地看到我们在这里做的事情（请注意，图中显示的形状是批量形状，而不是每个样本的形状）。\n\n```python\nkeras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)\n```\n\n![png](training_and_evaluation_48_0.png)\n\n在编译时，我们可以将损失函数以列表形式传递，为不同的输出指定不同的损失：\n\n```python\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss=[keras.losses.MeanSquaredError(),\n          keras.losses.CategoricalCrossentropy()])\n```\n\n如果我们只将单个损失函数传递给模型，则相同的损失函数将应用于每个输出，这在这里是不合适的。\n\n指标同理：\n\n```python\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss=[keras.losses.MeanSquaredError(),\n          keras.losses.CategoricalCrossentropy()],\n    metrics=[[keras.metrics.MeanAbsolutePercentageError(),\n              keras.metrics.MeanAbsoluteError()],\n             [keras.metrics.CategoricalAccuracy()]])\n```\n\n由于我们为输出层指定了名称，因此我们还可以通过字典为每个输出指定损失和指标：\n\n```python\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss={'score_output': keras.losses.MeanSquaredError(),\n          'class_output': keras.losses.CategoricalCrossentropy()},\n    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),\n                              keras.metrics.MeanAbsoluteError()],\n             'class_output': [keras.metrics.CategoricalAccuracy()]})\n```\n\n如果您有两个以上的输出，我们建议使用显式名称和字典。\n\n可以使用 `loss_weights` 参数为不同输出的损失赋予不同的权重（例如，在我们的示例中，可以给“得分”损失 2 倍于类别损失的权重，以突出它的重要性）：\n\n```python\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss={'score_output': keras.losses.MeanSquaredError(),\n          'class_output': 
keras.losses.CategoricalCrossentropy()},\n    metrics={'score_output': [keras.metrics.MeanAbsolutePercentageError(),\n                              keras.metrics.MeanAbsoluteError()],\n             'class_output': [keras.metrics.CategoricalAccuracy()]},\n    loss_weights={'score_output': 2., 'class_output': 1.})\n```\n\n如果某些输出仅用于预测而不用于训练，您还可以选择不为其计算损失：\n\n```python\n# List loss version\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss=[None, keras.losses.CategoricalCrossentropy()])\n\n# Or dict loss version\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss={'class_output': keras.losses.CategoricalCrossentropy()})\n```\n\n将数据传递给`fit`中的多输入或多输出模型的工作方式与在`compile`中指定损失函数的方式类似：\n\n你可以传递Numpy数组列表（与接收损失函数的输出一一对应），或者传递将输出名称映射到Numpy训练数据数组的字典。\n\n```python\nmodel.compile(\n    optimizer=keras.optimizers.RMSprop(1e-3),\n    loss=[keras.losses.MeanSquaredError(),\n          keras.losses.CategoricalCrossentropy()])\n\n# Generate dummy Numpy data\nimg_data = np.random.random_sample(size=(100, 32, 32, 3))\nts_data = np.random.random_sample(size=(100, 20, 10))\nscore_targets = np.random.random_sample(size=(100, 1))\nclass_targets = np.random.random_sample(size=(100, 5))\n\n# Fit on lists\nmodel.fit([img_data, ts_data], [score_targets, class_targets],\n          batch_size=32,\n          epochs=3)\n\n# Alternatively, fit on dicts\nmodel.fit({'img_input': img_data, 'ts_input': ts_data},\n          {'score_output': score_targets, 'class_output': class_targets},\n          batch_size=32,\n          epochs=3)\n```\n\n\n这是数据集的用例：与Numpy数组的做法类似，数据集应该返回一个由字典组成的元组。\n\n```python\ntrain_dataset = tf.data.Dataset.from_tensor_slices(\n    ({'img_input': img_data, 'ts_input': ts_data},\n     {'score_output': score_targets, 'class_output': class_targets}))\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(64)\n\nmodel.fit(train_dataset, epochs=3)\n```\n\n\n### 使用回调Using 
callbacks\n\n\nKeras中的回调是在训练期间（周期开始时、批次结束时、周期结束时等）的不同时间点调用的对象，可用于实现以下行为：\n- 在训练期间的不同时间点进行验证（而不仅仅是内置的每周期验证）\n- 定期保存模型检查点，或在模型超过某个精度阈值时保存\n- 当训练似乎进入平台期时，改变模型的学习率\n- 当训练似乎进入平台期时，对顶层进行微调\n- 在训练结束或超出某个性能阈值时发送电子邮件或即时消息通知\n- 等等\n\n回调可以作为列表传递给你对 `fit` 的调用：\n\n```python\nmodel = get_compiled_model()\n\ncallbacks = [\n    keras.callbacks.EarlyStopping(\n        # Stop training when `val_loss` is no longer improving\n        monitor='val_loss',\n        # \"no longer improving\" being defined as \"no better than 1e-2 less\"\n        min_delta=1e-2,\n        # \"no longer improving\" being further defined as \"for at least 2 epochs\"\n        patience=2,\n        verbose=1)\n]\nmodel.fit(x_train, y_train,\n          epochs=20,\n          batch_size=64,\n          callbacks=callbacks,\n          validation_split=0.2)\n```\n\n#### 许多内置回调可供使用\n\n- `ModelCheckpoint`: 定期保存模型。\n- `EarlyStopping`: 当训练不再改进验证指标时停止训练。\n- `TensorBoard`: 定期写入可在TensorBoard中显示的模型日志（更多细节见“可视化”部分）。\n- `CSVLogger`: 将损失和指标数据流式写入CSV文件。\n- 等等\n\n\n\n#### 编写自己的回调\n\n您可以通过扩展基类 `keras.callbacks.Callback` 来创建自定义回调。回调可以通过类属性 `self.model` 访问其关联的模型。\n\n以下是在训练期间保存每批损失值列表的简单示例：\n\n```python\nclass LossHistory(keras.callbacks.Callback):\n\n    def on_train_begin(self, logs):\n        self.losses = []\n\n    def on_batch_end(self, batch, logs):\n        self.losses.append(logs.get('loss'))\n```\n\n### 保存模型检查点\n\n当您在相对较大的数据集上训练模型时，以频繁的间隔保存模型检查点至关重要。\n\n实现此目的的最简单方法是使用 `ModelCheckpoint` 回调：\n\n```python\nmodel = get_compiled_model()\n\ncallbacks = [\n    keras.callbacks.ModelCheckpoint(\n        filepath='mymodel_{epoch}.h5',\n        # Path where to save the model\n        # The two parameters below mean that we will overwrite\n        # the current checkpoint if and only if\n        # the `val_loss` score has improved.\n        save_best_only=True,\n        monitor='val_loss',\n        verbose=1)\n]\nmodel.fit(x_train, y_train,\n          epochs=3,\n          batch_size=64,\n          callbacks=callbacks,\n          
validation_split=0.2)\n```\n\n您也可以编写自己的回调来保存和恢复模型。\n\n有关序列化和保存的完整指南，请参见[保存和序列化模型指南](https://tensorflow.google.cn/beta/guide/keras/saving_and_serializing)。\n\n### 使用学习率计划\n\n在训练深度学习模型时，一个常见的模式是随着训练的进展逐步降低学习率，这通常被称为“学习率衰减”（learning rate decay）。\n\n学习率衰减计划可以是静态的（预先固定，作为当前周期或当前批次索引的函数），也可以是动态的（响应模型当前的行为，特别是验证损失）。\n\n#### 将计划传递给优化器\n\n只需将计划（schedule）对象作为优化器的 `learning_rate` 参数传入，即可使用静态学习率衰减计划：\n\n```python\ninitial_learning_rate = 0.1\nlr_schedule = keras.optimizers.schedules.ExponentialDecay(\n    initial_learning_rate,\n    decay_steps=100000,\n    decay_rate=0.96,\n    staircase=True)\n\noptimizer = keras.optimizers.RMSprop(learning_rate=lr_schedule)\n```\n\n有几个内置的计划可用：`ExponentialDecay`、`PiecewiseConstantDecay`、`PolynomialDecay` 和 `InverseTimeDecay`。\n\n#### 使用回调来实现动态学习率计划\n\n仅靠这些schedule对象无法实现动态学习率计划（例如，当验证损失不再改善时降低学习率），因为优化器无法访问验证指标。\n\n然而，回调确实可以访问所有指标，包括验证指标！\n因此，可以通过使用回调来修改优化器上的当前学习率来实现这种模式。\n事实上，这种模式已经内置在 `ReduceLROnPlateau` 回调中。\n\n### 在训练期间可视化损失和指标\n\n在训练期间密切关注模型的最好方法是使用 [TensorBoard](https://www.tensorflow.org/tensorboard)，这是一个可以在本地运行的基于浏览器的应用程序，它为你提供：\n- 实时绘制训练和评估的损失与指标曲线\n- （可选）可视化各层激活值的直方图\n- （可选）“嵌入”层学到的嵌入空间的三维可视化\n\n如果您已经使用pip安装了TensorFlow，那么您应该能够从命令行启动TensorBoard：\n\n```\ntensorboard --logdir=/full_path_to_your_logs\n```\n\n#### 使用TensorBoard回调\n\n在Keras模型和 `fit` 方法中使用TensorBoard的最简单方法是TensorBoard回调。\n\n在最简单的情况下，只需指定写入日志的位置即可：\n\n```python\ntensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')\nmodel.fit(dataset, epochs=10, callbacks=[tensorboard_cbk])\n```\n\nTensorBoard回调有许多有用的选项，包括是否记录嵌入、直方图以及写入日志的频率：\n\n```python\nkeras.callbacks.TensorBoard(\n  log_dir='/full_path_to_your_logs',\n  histogram_freq=0,  # How often to log histogram visualizations\n  embeddings_freq=0,  # How often to log embedding visualizations\n  update_freq='epoch')  # How often to write logs (default: once per epoch)\n```\n\n## 第二部分：从头开始编写自己的训练和评估循环\n\n如果你需要比 `fit()` 和 `evaluate()` 更低阶的训练和评估循环，你应该自己编写。这实际上非常简单！但是你要准备好自己做更多的调试。\n\n### 
使用GradientTape：第一个端到端的例子\n\n在“GradientTape”范围内调用模型使您可以根据损失值检索图层的可训练权重的梯度。使用优化器实例，您可以使用这些梯度来更新这些变量（可以使用`model.trainable_weights`检索）。\n\n让我们重用第一部分中的初始MNIST模型，让我们使用带有自定义训练循环的小批量梯度训练它。\n\n```python\n# Get the model.\ninputs = keras.Input(shape=(784,), name='digits')\nx = layers.Dense(64, activation='relu', name='dense_1')(inputs)\nx = layers.Dense(64, activation='relu', name='dense_2')(x)\noutputs = layers.Dense(10, activation='softmax', name='predictions')(x)\nmodel = keras.Model(inputs=inputs, outputs=outputs)\n\n# Instantiate an optimizer.\noptimizer = keras.optimizers.SGD(learning_rate=1e-3)\n# Instantiate a loss function.\nloss_fn = keras.losses.SparseCategoricalCrossentropy()\n\n# Prepare the training dataset.\nbatch_size = 64\ntrain_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)\n\n# Iterate over epochs.\nfor epoch in range(3):\n  print('Start of epoch %d' % (epoch,))\n\n  # Iterate over the batches of the dataset.\n  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):\n\n    # Open a GradientTape to record the operations run\n    # during the forward pass, which enables autodifferentiation.\n    with tf.GradientTape() as tape:\n\n      # Run the forward pass of the layer.\n      # The operations that the layer applies\n      # to its inputs are going to be recorded\n      # on the GradientTape.\n      logits = model(x_batch_train)  # Logits for this minibatch\n\n      # Compute the loss value for this minibatch.\n      loss_value = loss_fn(y_batch_train, logits)\n\n    # Use the gradient tape to automatically retrieve\n    # the gradients of the trainable variables with respect to the loss.\n    grads = tape.gradient(loss_value, model.trainable_weights)\n\n    # Run one step of gradient descent by updating\n    # the value of the variables to minimize the loss.\n    optimizer.apply_gradients(zip(grads, model.trainable_weights))\n\n    # Log every 200 
batches.\n    if step % 200 == 0:\n        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))\n        print('Seen so far: %s samples' % ((step + 1) * 64))\n```\n\n### 指标的低级处理\n\n让我们添加指标。您可以很容易地在这样的训练循环中重用内置的指标（或您编写的自定义指标）。这是流程：\n\n- 在循环开始时实例化度量标准\n- 每批后调用 `metric.update_state()` \n- 当需要显示度量的当前值时，调用`metric.result()`\n- 当需要清除度量的状态时（通常在一个周期的末尾），调用`metric.reset_states()`\n\n让我们使用这些知识在每个周期结束时计算验证数据的 `SparseCategoricalAccuracy` ：\n\n```python\n# Get model\ninputs = keras.Input(shape=(784,), name='digits')\nx = layers.Dense(64, activation='relu', name='dense_1')(inputs)\nx = layers.Dense(64, activation='relu', name='dense_2')(x)\noutputs = layers.Dense(10, activation='softmax', name='predictions')(x)\nmodel = keras.Model(inputs=inputs, outputs=outputs)\n\n# Instantiate an optimizer to train the model.\noptimizer = keras.optimizers.SGD(learning_rate=1e-3)\n# Instantiate a loss function.\nloss_fn = keras.losses.SparseCategoricalCrossentropy()\n\n# Prepare the metrics.\ntrain_acc_metric = keras.metrics.SparseCategoricalAccuracy()\nval_acc_metric = keras.metrics.SparseCategoricalAccuracy()\n\n# Prepare the training dataset.\nbatch_size = 64\ntrain_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))\ntrain_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)\n\n# Prepare the validation dataset.\nval_dataset = tf.data.Dataset.from_tensor_slices((x_val, y_val))\nval_dataset = val_dataset.batch(64)\n\n\n# Iterate over epochs.\nfor epoch in range(3):\n  print('Start of epoch %d' % (epoch,))\n\n  # Iterate over the batches of the dataset.\n  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):\n    with tf.GradientTape() as tape:\n      logits = model(x_batch_train)\n      loss_value = loss_fn(y_batch_train, logits)\n    grads = tape.gradient(loss_value, model.trainable_weights)\n    optimizer.apply_gradients(zip(grads, model.trainable_weights))\n\n    # Update training metric.\n    
train_acc_metric(y_batch_train, logits)\n\n    # Log every 200 batches.\n    if step % 200 == 0:\n        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))\n        print('Seen so far: %s samples' % ((step + 1) * 64))\n\n  # Display metrics at the end of each epoch.\n  train_acc = train_acc_metric.result()\n  print('Training acc over epoch: %s' % (float(train_acc),))\n  # Reset training metrics at the end of each epoch\n  train_acc_metric.reset_states()\n\n  # Run a validation loop at the end of each epoch.\n  for x_batch_val, y_batch_val in val_dataset:\n    val_logits = model(x_batch_val)\n    # Update val metrics\n    val_acc_metric(y_batch_val, val_logits)\n  val_acc = val_acc_metric.result()\n  val_acc_metric.reset_states()\n  print('Validation acc: %s' % (float(val_acc),))\n```\n\n### 低水平处理额外损失\n\n您在上一节中看到，通过在`call`方法中调用 `self.add_loss(value)` ，可以通过图层添加正则化损失。\n\n在一般情况下，您需要在自定义训练循环中考虑这些损失（除非您自己编写模型并且您已经知道它不会造成这样的损失）。\n\n回想一下上一节的这个例子，它的特点是一个层会产生正则化损失:\n\n```python\nclass ActivityRegularizationLayer(layers.Layer):\n\n  def call(self, inputs):\n    self.add_loss(1e-2 * tf.reduce_sum(inputs))\n    return inputs\n\ninputs = keras.Input(shape=(784,), name='digits')\nx = layers.Dense(64, activation='relu', name='dense_1')(inputs)\n# Insert activity regularization as a layer\nx = ActivityRegularizationLayer()(x)\nx = layers.Dense(64, activation='relu', name='dense_2')(x)\noutputs = layers.Dense(10, activation='softmax', name='predictions')(x)\n\nmodel = keras.Model(inputs=inputs, outputs=outputs)\n\n```\n\n当您调用模型时，如下所示：\n\n```python\nlogits = model(x_train)\n```\n\n它在前向传递期间产生的损失被添加到  `model.losses`  属性中：\n\n```python\nlogits = model(x_train[:64])\nprint(model.losses)\n```\n\n跟踪损失首先在模型 `__call__` 开始时清除，因此您只能看到在这一次前进过程中产生的损失。例如，重复调用模型然后查询  `losses`  只显示最后一次调用期间创建的最新损失：\n\n```python\nlogits = model(x_train[:64])\nlogits = model(x_train[64: 128])\nlogits = model(x_train[128: 192])\nprint(model.losses)\n```\n\n要在训练期间考虑这些损失，您所要做的就是修改训练循环，将 
`sum(model.losses)` 添加到您的总损失中：\n\n```python\noptimizer = keras.optimizers.SGD(learning_rate=1e-3)\n\nfor epoch in range(3):\n  print('Start of epoch %d' % (epoch,))\n\n  for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):\n    with tf.GradientTape() as tape:\n      logits = model(x_batch_train)\n      loss_value = loss_fn(y_batch_train, logits)\n\n      # Add extra losses created during this forward pass:\n      loss_value += sum(model.losses)\n\n    grads = tape.gradient(loss_value, model.trainable_weights)\n    optimizer.apply_gradients(zip(grads, model.trainable_weights))\n\n    # Log every 200 batches.\n    if step % 200 == 0:\n        print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))\n        print('Seen so far: %s samples' % ((step + 1) * 64))\n```\n\n这就是拼图的最后一块！你已经读完了本指南。\n\n现在，您已经掌握了关于使用内置训练循环以及从头开始编写自己的训练循环的全部知识。\n"
  },
  {
    "path": "r2/guide/migration_guide.md",
"content": "---\ntitle: 将 TF1.x 代码迁移到 TensorFlow 2.0\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1903\nabbrlink: tensorflow/tf2-guide-migration_guide\n---\n\n# 将 TF1.x 代码迁移到 TensorFlow 2.0（tensorflow2.0官方教程翻译）\n\n在TensorFlow 2.0中，仍然可以运行未经修改的1.x代码（contrib除外）：\n\n```python\nimport tensorflow.compat.v1 as tf\ntf.disable_v2_behavior()\n```\n\n但是，这并不能让您利用TensorFlow 2.0中的许多改进。本指南将帮助您升级代码，使其更简单、更高效、更易于维护。\n\n## 自动转换脚本\n\n第一步是尝试运行[升级脚本](https://tensorflow.google.cn/beta/guide/upgrade)。\n\n这将执行把您的代码升级到TensorFlow 2.0的初始步骤，但它并不能使您的代码成为地道的TensorFlow 2.0代码。您的代码仍然可以使用`tf.compat.v1`接口来访问占位符、会话、集合和其他1.x样式的功能。\n\n## 使代码2.0原生化\n\n本指南将介绍把TensorFlow 1.x代码转换为TensorFlow 2.0的几个示例。这些更改将使您的代码能够利用性能优化和简化的API调用。\n在每一种情况下，模式都是：\n\n### 1. 替换`tf.Session.run`调用\n\n每个`tf.Session.run`调用都应该被一个Python函数替换。\n\n* `feed_dict`和`tf.placeholder`成为函数参数。\n* `fetches`成为函数的返回值。\n\n您可以使用标准Python工具（如`pdb`）单步执行和调试函数。\n\n确认其正常工作后，可以添加一个`tf.function`装饰器，使其在图模式下高效运行。有关其工作原理的更多信息，请参阅[Autograph Guide](https://tensorflow.google.cn/beta/guide/autograph)。\n\n### 2. 使用Python对象来跟踪变量和损失\n\n使用`tf.Variable`而不是`tf.get_variable`。\n每个`variable_scope`都可以转换为一个Python对象。通常这将是以下之一：\n\n* `tf.keras.layers.Layer`\n* `tf.keras.Model`\n* `tf.Module`\n\n如果需要聚合变量列表（如 `tf.Graph.get_collection(tf.GraphKeys.VARIABLES)` ），请使用`Layer`和`Model`对象的`.variables`和`.trainable_variables`属性。\n\n这些`Layer`和`Model`类还实现了几个不需要全局集合的其他属性。它们的`.losses`属性可以替代 `tf.GraphKeys.LOSSES` 集合的用法。\n\n有关详细信息，请参阅[keras指南](https://tensorflow.google.cn/beta/guide/keras)。\n\n警告：许多`tf.compat.v1`符号隐式使用全局集合。\n\n### 3. 升级您的训练循环\n\n使用适合您用例的最高阶API。优先使用`tf.keras.Model.fit`，而不是构建自己的训练循环。\n\n这些高阶函数管理了许多低阶细节，如果您自己编写训练循环，这些细节很容易被遗漏。例如，它们会自动收集正则化损失，并在调用模型时设置`training=True`参数。\n\n### 4. 
升级数据输入管道\n\n使用`tf.data`数据集进行数据输入。这些对象是高效的，富有表现力的，并且与张量流很好地集成。\n\n它们可以直接传递给`tf.keras.Model.fit`方法。\n\n```python\nmodel.fit(dataset, epochs=5)\n```\n\n它们可以直接在标准Python上迭代：\n\n```python\nfor example_batch, label_batch in dataset:\n    break\n```\n\n\n## 转换模型\n\n### 设置\n\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\nimport tensorflow as tf\n\nimport tensorflow_datasets as tfds\n```\n\n### 低阶变量和操作执行\n\n低级API使用的示例包括：\n\n* 使用变量范围来控制重用\n* 用`tf.get_variable`创建变量。\n* 显式访问集合\n* 使用以下方法隐式访问集合：\n\n  * `tf.global_variables`\n  * `tf.losses.get_regularization_loss`\n\n* 使用`tf.placeholder`设置图输入\n* 用`session.run`执行图形\n* 手动初始化变量\n\n\n#### 转换前\n\n以下是使用TensorFlow 1.x在代码中看起来像这些模式的内容：\n\n```python\nin_a = tf.placeholder(dtype=tf.float32, shape=(2))\nin_b = tf.placeholder(dtype=tf.float32, shape=(2))\n\ndef forward(x):\n  with tf.variable_scope(\"matmul\", reuse=tf.AUTO_REUSE):\n    W = tf.get_variable(\"W\", initializer=tf.ones(shape=(2,2)),\n                        regularizer=tf.contrib.layers.l2_regularizer(0.04))\n    b = tf.get_variable(\"b\", initializer=tf.zeros(shape=(2)))\n    return W * x + b\n\nout_a = forward(in_a)\nout_b = forward(in_b)\n\nreg_loss = tf.losses.get_regularization_loss(scope=\"matmul\")\n\nwith tf.Session() as sess:\n  sess.run(tf.global_variables_initializer())\n  outs = sess.run([out_a, out_b, reg_loss],\n      \t        feed_dict={in_a: [1, 0], in_b: [0, 1]})\n\n```\n\n#### 转换后\n\n在转换后的代码中：\n\n* 变量是本地Python对象.\n* `forward`函数仍定义计算。\n* `sess.run`调用被替换为对'forward`的调用\n* 可以添加可选的`tf.function`装饰器以提高性能。\n* 正则化是手动计算的，不涉及任何全局集合。\n* **没有会话或占位符**\n\n\n```python\nW = tf.Variable(tf.ones(shape=(2,2)), name=\"W\")\nb = tf.Variable(tf.zeros(shape=(2)), name=\"b\")\n\n@tf.function\ndef forward(x):\n  return W * x + b\n\nout_a = forward([1,0])\nprint(out_a)\n```\n\n\n```python\nout_b = forward([0,1])\n\nregularizer = tf.keras.regularizers.l2(0.04)\nreg_loss = regularizer(W)\n```\n\n### 
基于`tf.layers`的模型\n\n`tf.layers`模块曾用于包含依赖`tf.variable_scope`来定义和重用变量的层函数。\n\n#### 转换前\n\n```python\ndef model(x, training, scope='model'):\n  with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):\n    x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu,\n          kernel_regularizer=tf.contrib.layers.l2_regularizer(0.04))\n    x = tf.layers.max_pooling2d(x, (2, 2), 1)\n    x = tf.layers.flatten(x)\n    x = tf.layers.dropout(x, 0.1, training=training)\n    x = tf.layers.dense(x, 64, activation=tf.nn.relu)\n    x = tf.layers.batch_normalization(x, training=training)\n    x = tf.layers.dense(x, 10, activation=tf.nn.softmax)\n    return x\n\ntrain_out = model(train_data, training=True)\ntest_out = model(test_data, training=False)\n```\n\n#### 转换后\n\n* 简单的层堆栈可以整齐地放入 `tf.keras.Sequential` 中。（对于更复杂的模型，请参见 *自定义层和模型* 以及 *函数式API* 两个教程）\n* 模型会跟踪变量和正则化损失。\n* 转换是一对一的，因为存在从`tf.layers`到`tf.keras.layers`的直接映射。\n\n大多数参数保持不变，但注意以下区别：\n\n* `training` 参数在运行时由模型传递给每一层。\n* 原模型函数的第一个参数（输入 `x`）消失了，这是因为层将构建模型与调用模型分开了。\n\n同时也要注意：\n\n* 如果你使用了来自`tf.contrib`的正则化器或初始化器，它们的参数变化比其他部分更多。\n* 代码不再写入集合，因此像 `tf.losses.get_regularization_loss` 这样的函数将不再返回这些值，这可能会破坏您的训练循环。\n\n```python\nmodel = tf.keras.Sequential([\n    tf.keras.layers.Conv2D(32, 3, activation='relu',\n                           kernel_regularizer=tf.keras.regularizers.l2(0.04),\n                           input_shape=(28, 28, 1)),\n    tf.keras.layers.MaxPooling2D(),\n    tf.keras.layers.Flatten(),\n    tf.keras.layers.Dropout(0.1),\n    tf.keras.layers.Dense(64, activation='relu'),\n    tf.keras.layers.BatchNormalization(),\n    tf.keras.layers.Dense(10, activation='softmax')\n])\n\ntrain_data = tf.ones(shape=(1, 28, 28, 1))\ntest_data = tf.ones(shape=(1, 28, 28, 1))\n```\n\n\n```python\ntrain_out = model(train_data, training=True)\nprint(train_out)\n```\n\n\n```python\ntest_out = model(test_data, training=False)\nprint(test_out)\n```\n\n\n```python\n# 以下是所有可训练的变量。\nlen(model.trainable_variables)\n```\n\n\n```python\n# 
这是正规化损失。\nmodel.losses\n```\n\n### 混合变量和tf.layers\n\n现存的代码通常将较低级别的TF 1.x变量和操作与较高级的 `tf.layers` 混合。\n\n#### 转换前\n```python\ndef model(x, training, scope='model'):\n  with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):\n    W = tf.get_variable(\n      \"W\", dtype=tf.float32,\n      initializer=tf.ones(shape=x.shape),\n      regularizer=tf.contrib.layers.l2_regularizer(0.04),\n      trainable=True)\n    if training:\n      x = x + W\n    else:\n      x = x + W * 0.5\n    x = tf.layers.conv2d(x, 32, 3, activation=tf.nn.relu)\n    x = tf.layers.max_pooling2d(x, (2, 2), 1)\n    x = tf.layers.flatten(x)\n    return x\n\ntrain_out = model(train_data, training=True)\ntest_out = model(test_data, training=False)\n```\n\n#### 转换后\n\n要转换此代码，请遵循将图层映射到图层的模式，如上例所示。\n\n一般模式是：\n\n* 在`__init__`中收集图层参数。\n* 在`build`中构建变量。\n* 在`call`中执行计算，并返回结果。\n\n`tf.variable_scope`实际上是它自己的一层。所以把它重写为`tf.keras.layers.Layer`。\n有关信息请参阅 [指南](https://tensorflow.google.cn/beta/guide/keras/custom_layers_and_models) \n\n```python\n# Create a custom layer for part of the model\nclass CustomLayer(tf.keras.layers.Layer):\n  def __init__(self, *args, **kwargs):\n    super(CustomLayer, self).__init__(*args, **kwargs)\n\n  def build(self, input_shape):\n    self.w = self.add_weight(\n        shape=input_shape[1:],\n        dtype=tf.float32,\n        initializer=tf.keras.initializers.ones(),\n        regularizer=tf.keras.regularizers.l2(0.02),\n        trainable=True)\n\n  # 调用方法有时会在图形模式下使用，训练会变成一个张量 \n  @tf.function\n  def call(self, inputs, training=None):\n    if training:\n      return inputs + self.w\n    else:\n      return inputs + self.w * 0.5\n```\n\n\n```python\ncustom_layer = CustomLayer()\nprint(custom_layer([1]).numpy())\nprint(custom_layer([1], training=True).numpy())\n```\n\n\n```python\ntrain_data = tf.ones(shape=(1, 28, 28, 1))\ntest_data = tf.ones(shape=(1, 28, 28, 1))\n\n# 构建包含自定义层的模型 \nmodel = tf.keras.Sequential([\n    CustomLayer(input_shape=(28, 28, 1)),\n    tf.keras.layers.Conv2D(32, 3, 
activation='relu'),\n    tf.keras.layers.MaxPooling2D(),\n    tf.keras.layers.Flatten(),\n])\n\ntrain_out = model(train_data, training=True)\ntest_out = model(test_data, training=False)\n\n```\n\n需要注意以下几点：\n\n* 子类化的Keras模型和层需要既能在v1图（没有自动控制依赖关系）中运行，也能在eager模式下运行\n\n* 将`call()`包装在`tf.function()`中以获得自动图转换（autograph）和自动控制依赖关系\n\n* 不要忘了`call`需要一个`training`参数（`tf.Tensor`或Python布尔值）\n\n* 使用`self.add_weight()`在构造函数或`def build()`中创建模型变量\n  * 在`build`中，您可以访问输入形状，因此可以创建具有匹配形状的权重。\n  * 使用`tf.keras.layers.Layer.add_weight`可以让Keras跟踪变量和正则化损失。\n\n* 不要在对象中保留`tf.Tensor`。\n  * 它们可能在`tf.function`中或在eager上下文中创建，而这些张量的行为是不同的。\n  * 使用`tf.Variable`保存状态，它们在两种情况下都可用。\n  * `tf.Tensor`仅适用于中间值。\n\n### 关于Slim和contrib.layers的说明\n\n大量较旧的TensorFlow 1.x代码使用 [Slim](https://ai.googleblog.com/2016/08/tf-slim-high-level-library-to-define.html) 库，它与TensorFlow 1.x一起打包为`tf.contrib.layers`。作为`contrib`模块，TensorFlow 2.0中不再提供此功能，即使在`tf.compat.v1`中也是如此。转换使用Slim的代码库比转换使用`tf.layers`的代码库更复杂。事实上，先将Slim代码转换为`tf.layers`、再转换为Keras可能更有意义。\n\n- 删除 `arg_scopes`，所有参数都需要显式指定\n\n- 如果您用到了 `normalizer_fn` 和 `activation_fn`，请将它们拆分为独立的层\n\n- 可分离卷积层映射到一个或多个不同的Keras层（深度卷积、逐点卷积和可分离的Keras层）\n\n- Slim和 `tf.layers` 具有不同的参数名称和默认值\n\n- 有些参数具有不同的尺度\n\n- 如果您使用Slim预训练模型，请尝试使用 `tf.keras.applications` 或 [TFHub](https://tensorflow.org/hub)\n\n一些`tf.contrib`层可能没有被移动到核心TensorFlow，而是被移动到了 [TF附加组件包](https://github.com/tensorflow/addons)。\n\n\n## 训练\n\n有很多方法可以将数据提供给`tf.keras`模型。它们接受Python生成器和Numpy数组作为输入。\n\n将数据提供给模型的推荐方法是使用`tf.data`包，其中包含一组用于处理数据的高性能类。\n\n如果您仍在使用`tf.queue`，它们现在仅作为数据结构受支持，而不能用作输入管道。\n\n### 使用Datasets\n\n[TensorFlow数据集包](https://tensorflow.org/datasets) (`tfds`) 包含用于将预定义数据集加载为 `tf.data.Dataset` 对象的实用程序。\n\n对于此示例，使用 `tfds` 加载MNIST数据集：\n\n```python\ndatasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)\nmnist_train, mnist_test = datasets['train'], datasets['test']\n```\n\n然后为训练准备数据：\n\n  * 重新缩放每个图像\n  * 打乱样本数据的顺序\n  * 将图像和标签组成批次\n\n\n\n```python\nBUFFER_SIZE = 10 # 实际代码中应使用更大的值\nBATCH_SIZE = 64\nNUM_EPOCHS = 5\n\n\ndef scale(image, 
label):\n  image = tf.cast(image, tf.float32)\n  image /= 255\n\n  return image, label\n```\n\n要使示例保持简短，请修剪数据集，使其仅返回5个批次：\n\n```python\ntrain_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE).take(5)\ntest_data = mnist_test.map(scale).batch(BATCH_SIZE).take(5)\n\nSTEPS_PER_EPOCH = 5\n\ntrain_data = train_data.take(STEPS_PER_EPOCH)\ntest_data = test_data.take(STEPS_PER_EPOCH)\n```\n\n\n```python\nimage_batch, label_batch = next(iter(train_data))\n```\n\n### 使用Keras训练循环\n\n如果你不需要对训练过程进行低阶控制，建议使用Keras内置的`fit`、`evaluate`和`predict`方法，无论底层实现是什么（sequential、functional或子类化模型），这些方法都提供了统一的训练接口。\n\n这些方法的优点包括：\n\n-   它们接受Numpy数组、Python生成器和 `tf.data.Datasets`\n\n-   它们自动应用正则化和激活损失\n\n-   它们支持用于多设备训练的 `tf.distribute`\n\n-   它们支持任意可调用对象作为损失和指标\n\n-   它们支持回调，如 `tf.keras.callbacks.TensorBoard` 和自定义回调\n\n-   它们具有高性能，可自动使用TensorFlow计算图\n\n以下是使用数据集训练模型的示例：\n\n```python\nmodel = tf.keras.Sequential([\n    tf.keras.layers.Conv2D(32, 3, activation='relu',\n                           kernel_regularizer=tf.keras.regularizers.l2(0.02),\n                           input_shape=(28, 28, 1)),\n    tf.keras.layers.MaxPooling2D(),\n    tf.keras.layers.Flatten(),\n    tf.keras.layers.Dropout(0.1),\n    tf.keras.layers.Dense(64, activation='relu'),\n    tf.keras.layers.BatchNormalization(),\n    tf.keras.layers.Dense(10, activation='softmax')\n])\n\n# 模型是没有自定义图层的完整模型\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n\nmodel.fit(train_data, epochs=NUM_EPOCHS)\nloss, acc = model.evaluate(test_data)\n\nprint(\"Loss {}, Accuracy {}\".format(loss, acc))\n```\n\n### 编写你自己的训练循环\n\n如果Keras模型的训练步骤适合您，但您需要在该步骤之外进行更多的控制，请考虑在您自己的数据迭代循环中使用 `tf.keras.Model.train_on_batch` 方法。\n\n记住：许多功能都可以作为 `tf.keras.Callback` 来实现。\n\n此方法具有上一节中提到的方法的许多优点，同时允许用户控制外层循环。\n\n您还可以使用 `tf.keras.Model.test_on_batch` 或 `tf.keras.Model.evaluate` 来检查训练期间的性能。\n\n注意：`train_on_batch`和`test_on_batch`默认返回单个批次的损失和指标。如果你传递`reset_metrics = 
False`，它们会返回累积的指标，你必须记住适当地重置指标累加器。还要记住，像 `AUC` 这样的一些指标需要 `reset_metrics = False` 才能正确计算。\n\n继续训练上面的模型：\n\n```python\n# 模型是没有自定义层的完整模型\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n\nmetrics_names = model.metrics_names\n\nfor epoch in range(NUM_EPOCHS):\n  # Reset the metric accumulators\n  model.reset_metrics()\n\n  for image_batch, label_batch in train_data:\n    result = model.train_on_batch(image_batch, label_batch)\n    print(\"train: \",\n          \"{}: {:.3f}\".format(metrics_names[0], result[0]),\n          \"{}: {:.3f}\".format(metrics_names[1], result[1]))\n  for image_batch, label_batch in test_data:\n    result = model.test_on_batch(image_batch, label_batch,\n                                 # return accumulated metrics\n                                 reset_metrics=False)\n  print(\"\\neval: \",\n        \"{}: {:.3f}\".format(metrics_names[0], result[0]),\n        \"{}: {:.3f}\".format(metrics_names[1], result[1]))\n\n\n```\n\n<p id=\"custom_loops\"/>\n\n### 自定义训练步骤\n\n如果您需要更多的灵活性和控制，可以通过实现自己的训练循环来实现，有三个步骤：\n\n1. 迭代Python生成器或`tf.data.Dataset`以获取样本数据；\n\n2. 使用`tf.GradientTape`计算梯度；\n\n3. 
使用`tf.keras.optimizers`将权重更新应用于模型。\n\n记住：\n\n-  始终在子类层和模型的调用方法中包含一个训练参数。\n\n-  确保在正确设置训练参数的情况下调用模型。\n\n-  根据使用情况，在对一批数据运行模型之前，模型变量可能不存在。\n\n-  您需要手动处理模型的正则化损失等事情。\n\n请注意相对于v1的简化：\n\n-  不需要运行变量初始化器，变量在创建时初始化。\n\n-  不需要添加手动控制依赖项，即使在tf.function中，操作也像在eager模式下一样执行。\n\n上面的模型：\n\n```python\nmodel = tf.keras.Sequential([\n    tf.keras.layers.Conv2D(32, 3, activation='relu',\n                           kernel_regularizer=tf.keras.regularizers.l2(0.02),\n                           input_shape=(28, 28, 1)),\n    tf.keras.layers.MaxPooling2D(),\n    tf.keras.layers.Flatten(),\n    tf.keras.layers.Dropout(0.1),\n    tf.keras.layers.Dense(64, activation='relu'),\n    tf.keras.layers.BatchNormalization(),\n    tf.keras.layers.Dense(10, activation='softmax')\n])\n\noptimizer = tf.keras.optimizers.Adam(0.001)\nloss_fn = tf.keras.losses.SparseCategoricalCrossentropy()\n\n@tf.function\ndef train_step(inputs, labels):\n  with tf.GradientTape() as tape:\n    predictions = model(inputs, training=True)\n    regularization_loss = tf.math.add_n(model.losses)\n    pred_loss = loss_fn(labels, predictions)\n    total_loss = pred_loss + regularization_loss\n\n  gradients = tape.gradient(total_loss, model.trainable_variables)\n  optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n\nfor epoch in range(NUM_EPOCHS):\n  for inputs, labels in train_data:\n    train_step(inputs, labels)\n  print(\"Finished epoch\", epoch)\n\n```\n\n### 新型指标\n\n在TensorFlow 2.0中，metrics（指标）是对象。Metrics对象在eager和tf.function中均可运行，一个metric具有以下方法：\n\n* `update_state()` – 添加新的观察结果\n\n* `result()` – 给定观察值，获取metric的当前结果\n\n* `reset_states()` – 清除所有观察值\n\n对象本身是可调用的，与 `update_state` 一样，用新的观察值调用会更新状态，并返回metric的新结果。\n\n你不需要手动初始化metric的变量，而且因为TensorFlow 2.0具有自动控制依赖项，所以您也不需要担心这些。\n\n下面的代码使用metric来跟踪自定义训练循环中观察到的平均损失：\n\n```python\n# 创建metrics\nloss_metric = tf.keras.metrics.Mean(name='train_loss')\naccuracy_metric = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')\n\n@tf.function\ndef train_step(inputs, 
labels):\n  with tf.GradientTape() as tape:\n    predictions = model(inputs, training=True)\n    regularization_loss = tf.math.add_n(model.losses)\n    pred_loss = loss_fn(labels, predictions)\n    total_loss = pred_loss + regularization_loss\n\n  gradients = tape.gradient(total_loss, model.trainable_variables)\n  optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n  # 更新metrics\n  loss_metric.update_state(total_loss)\n  accuracy_metric.update_state(labels, predictions)\n\n\nfor epoch in range(NUM_EPOCHS):\n  # 重置metrics\n  loss_metric.reset_states()\n  accuracy_metric.reset_states()\n\n  for inputs, labels in train_data:\n    train_step(inputs, labels)\n  # 获取metric结果\n  mean_loss = loss_metric.result()\n  mean_accuracy = accuracy_metric.result()\n\n  print('Epoch: ', epoch)\n  print('  loss:     {:.3f}'.format(mean_loss))\n  print('  accuracy: {:.3f}'.format(mean_accuracy))\n\n```\n\n## 保存和加载\n\n\n### Checkpoint兼容性\n\nTensorFlow 2.0使用基于对象的检查点。\n\n如果小心的话，仍然可以加载旧式的基于名称的检查点。代码转换过程可能会导致变量名的更改，但有一些变通的方法。\n\n最简单的方法是将新模型的名称与检查点中的名称对齐：\n\n-   变量仍然都有可以设置的名称参数。\n\n-   Keras模型也接受名称参数，并将其设置为其变量的前缀。\n\n-   `tf.name_scope` 函数可用于设置变量名称前缀。这与 `tf.variable_scope` 非常不同：它只影响名称，不跟踪变量和重用。\n\n如果这不适合您的用例，请尝试使用 `tf.compat.v1.train.init_from_checkpoint` 函数，它接受一个 `assignment_map` 参数，该参数指定从旧名称到新名称的映射。\n\n注意：与基于对象的检查点（可以[延迟加载](https://tensorflow.google.cn/beta/guide/checkpoints#loading_mechanics)）不同，基于名称的检查点要求在调用函数时已构建所有变量。某些模型会推迟构建变量，直到您调用 `build` 或在一批数据上运行模型。\n\n### 保存的模型兼容性\n\n对于保存的模型没有明显的兼容性问题：\n\n-   TensorFlow 1.x saved_models可以在TensorFlow 2.0中工作。\n\n-   如果支持所有操作，TensorFlow 2.0 saved_models甚至可以在TensorFlow 1.x中加载并工作。\n\n## Estimators\n\n### 使用Estimators进行训练\n\nTensorFlow 2.0支持Estimators。使用Estimators时，可以使用TensorFlow 1.x中的 `input_fn()`、`tf.estimator.TrainSpec` 和 `tf.estimator.EvalSpec`。\n\n以下是使用 `input_fn` 以及train和evaluate规范的示例：\n\n#### 创建input_fn和train/eval规范\n\n```python\n# 定义一个estimator的input_fn\ndef input_fn():\n  datasets, info = tfds.load(name='mnist', 
with_info=True, as_supervised=True)\n  mnist_train, mnist_test = datasets['train'], datasets['test']\n\n  BUFFER_SIZE = 10000\n  BATCH_SIZE = 64\n\n  def scale(image, label):\n    image = tf.cast(image, tf.float32)\n    image /= 255\n\n    return image, label[..., tf.newaxis]\n\n  train_data = mnist_train.map(scale).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)\n  return train_data.repeat()\n\n# 定义 train & eval specs\ntrain_spec = tf.estimator.TrainSpec(input_fn=input_fn,\n                                    max_steps=STEPS_PER_EPOCH * NUM_EPOCHS)\neval_spec = tf.estimator.EvalSpec(input_fn=input_fn,\n                                  steps=STEPS_PER_EPOCH)\n\n```\n\n### 使用Keras模型定义\n\n在TensorFlow 2.0中如何构建estimators存在一些差异。\n\n我们建议您使用Keras定义模型，然后使用 `tf.keras.estimator.model_to_estimator` 将您的模型转换为estimator。下面的代码展示了如何在创建和训练estimator时使用这个功能。\n\n```python\ndef make_model():\n  return tf.keras.Sequential([\n    tf.keras.layers.Conv2D(32, 3, activation='relu',\n                           kernel_regularizer=tf.keras.regularizers.l2(0.02),\n                           input_shape=(28, 28, 1)),\n    tf.keras.layers.MaxPooling2D(),\n    tf.keras.layers.Flatten(),\n    tf.keras.layers.Dropout(0.1),\n    tf.keras.layers.Dense(64, activation='relu'),\n    tf.keras.layers.BatchNormalization(),\n    tf.keras.layers.Dense(10, activation='softmax')\n  ])\n```\n\n\n```python\nmodel = make_model()\n\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n\nestimator = tf.keras.estimator.model_to_estimator(\n  keras_model=model\n)\n\ntf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n```\n\n### 使用自定义 `model_fn`\n\n如果您需要维护现有的自定义估算器 `model_fn`，则可以将 `model_fn` 转换为使用Keras模型。\n\n但是出于兼容性原因，自定义 `model_fn` 仍将以1.x样式的图模式运行，这意味着没有eager execution，也没有自动控制依赖。\n\n在自定义 `model_fn` 中使用Keras模型类似于在自定义训练循环中使用它：\n\n-  根据mode参数适当设置训练阶段\n\n-  将模型的 `trainable_variables` 显式传递给优化器\n\n但相对于自定义循环，存在重要差异：\n\n-  使用 `tf.keras.Model.get_losses_for` 
提取损失，而不是使用 `model.losses`\n\n-  使用 `tf.keras.Model.get_updates_for` 提取模型的更新\n\n注意：“更新”是每批数据之后需要应用于模型的更改。例如，`tf.keras.layers.BatchNormalization`层中均值和方差的移动平均值。\n\n以下代码从自定义`model_fn`创建一个估算器，说明所有这些问题。\n\n```python\ndef my_model_fn(features, labels, mode):\n  model = make_model()\n\n  optimizer = tf.compat.v1.train.AdamOptimizer()\n  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()\n\n  training = (mode == tf.estimator.ModeKeys.TRAIN)\n  predictions = model(features, training=training)\n\n  reg_losses = model.get_losses_for(None) + model.get_losses_for(features)\n  total_loss = loss_fn(labels, predictions) + tf.math.add_n(reg_losses)\n\n  accuracy = tf.compat.v1.metrics.accuracy(labels=labels,\n                                           predictions=tf.math.argmax(predictions, axis=1),\n                                           name='acc_op')\n\n  update_ops = model.get_updates_for(None) + model.get_updates_for(features)\n  minimize_op = optimizer.minimize(\n      total_loss,\n      var_list=model.trainable_variables,\n      global_step=tf.compat.v1.train.get_or_create_global_step())\n  train_op = tf.group(minimize_op, update_ops)\n\n  return tf.estimator.EstimatorSpec(\n    mode=mode,\n    predictions=predictions,\n    loss=total_loss,\n    train_op=train_op, eval_metric_ops={'accuracy': accuracy})\n\n# Create the Estimator & Train\nestimator = tf.estimator.Estimator(model_fn=my_model_fn)\ntf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)\n```\n\n## TensorShape\n\n这个类被简化为保存`int`，而不是`tf.compat.v1.Dimension`对象，所以不需要调用`.value()`来获得`int`。\n\n仍然可以从`tf.TensorShape.dims`访问单个`tf.compat.v1.Dimension`对象。\n\n以下演示了TensorFlow 1.x和TensorFlow 2.0之间的区别。\n\n```python\n# 创建一个shape并选择一个索引\ni = 0\nshape = tf.TensorShape([16, None, 256])\nshape\n```\n\nTF 1.x 运行：\n\n```python\nvalue = shape[i].value\n```\n\nTF 2.0 运行：\n\n\n\n```python\nvalue = shape[i]\nvalue\n```\n\nTF 1.x 运行：\n\n```python\nfor dim in shape:\n    value = dim.value\n    print(value)\n```\n\nTF 2.0 
运行：\n\n\n```python\nfor value in shape:\n  print(value)\n```\n\n在TF 1.x中运行（或使用任何其他维度方法）：\n\n```python\ndim = shape[i]\ndim.assert_is_compatible_with(other_dim)\n```\n\nTF 2.0运行：\n\n\n```python\nother_dim = 16\nDimension = tf.compat.v1.Dimension\n\nif shape.rank is None:\n  dim = Dimension(None)\nelse:\n  dim = shape.dims[i]\ndim.is_compatible_with(other_dim) # or any other dimension method\n```\n\n\n```python\nshape = tf.TensorShape(None)\n\nif shape:\n  dim = shape.dims[i]\n  dim.is_compatible_with(other_dim) # or any other dimension method\n```\n\n如果秩(rank)已知，则 `tf.TensorShape` 的布尔值为 `True`，否则为 `False`。\n\n```python\nprint(bool(tf.TensorShape([])))      # 标量 Scalar\nprint(bool(tf.TensorShape([0])))     # 0长度的向量 vector\nprint(bool(tf.TensorShape([1])))     # 1长度的向量 vector\nprint(bool(tf.TensorShape([None])))  # 未知长度的向量\nprint(bool(tf.TensorShape([1, 10, 100])))       # 3D tensor\nprint(bool(tf.TensorShape([None, None, None]))) # 3D tensor with no known dimensions\nprint()\nprint(bool(tf.TensorShape(None)))  # 未知秩的张量\n```\n\n## 其他行为改变\n\n您可能会遇到TensorFlow 2.0中的一些其他行为变化。\n\n\n### ResourceVariables\n\nTensorFlow 2.0默认创建`ResourceVariables`，而不是`RefVariables`。\n\n`ResourceVariables`在写入时会加锁，因此提供更直观的一致性保证。\n\n* 这可能会改变边缘情况下的行为\n* 这可能偶尔会创建额外的副本，可能会有更高的内存使用量\n* 可以通过将`use_resource=False`传递给`tf.Variable`构造函数来禁用它。\n\n### Control Flow\n\n控制流op的实现得到了简化，因此在TensorFlow 2.0中生成的图也不同。\n\n## 结论\n\n回顾一下本节内容：\n\n1. 运行更新脚本\n2. 删除contrib符号\n3. 将模型切换为面向对象的样式（Keras）\n4. 尽可能使用`tf.keras`或`tf.estimator`的训练和评估循环\n5. 
否则，请使用自定义循环，但请务必避免会话和集合。\n\n将代码转换为TensorFlow 2.0需要一些工作，但会有以下改变：\n-   更少的代码行\n-   提高清晰度和简洁性\n-   调试更简单\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-guide-migration_guide.html](https://www.mashangxue123.com/tensorflow/tf2-guide-migration_guide.html)\n> 英文版本：[https://tensorflow.google.cn/beta/guide/migration_guide](https://tensorflow.google.cn/beta/guide/migration_guide)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/migration_guide.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/guide/migration_guide.md)\n"
  },
  {
    "path": "r2/tutorials/eager/automatic_differentiation.md",
    "content": "---\ntitle: TF梯度下降法的核心自动微分和梯度带\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1953\nabbrlink: tensorflow/tf2-tutorials-eager-automatic_differentiation\n---\n\n# TF梯度下降法的核心自动微分和梯度带 (tensorflow2.0官方教程翻译）\n\n在上一个教程中，我们介绍了张量及其操作。在本教程中，我们将介绍自动微分，这是优化机器学习模型的关键技术。\n\n> 备注：在此之前，机器学习社区中很少发挥这个利器，一般都是用Backpropagation(反向传播算法)进行梯度求解，然后使用SGD等进行优化更新。手动实现过backprop算法的同学应该可以体会到其中的复杂性和易错性，一个好的框架应该可以很好地将这部分难点隐藏于用户视角，而自动微分技术恰好可以优雅解决这个问题。梯度下降法（Gradient Descendent）是机器学习的核心算法之一，自动微分则是梯度下降法的核心；梯度下降是通过计算参数与损失函数的梯度并在梯度的方向不断迭代求得极值；\n\n## 1. 导入包\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n```\n\n## 2. 梯度带(Gradient tapes)\n\nTensorFlow提供了 [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) API 用于自动微分(计算与输入变量相关的计算梯度)。\nTensorflow将在 `tf.GradientTape` 上下文中执行的所有操作“records(记录)”到“tape(磁带)”上。然后，TensorFlow使用该磁带和与每个记录操作相关的梯度，使用反向模式微分“记录”计算的梯度。例如：\n\n```python\nx = tf.ones((2, 2))\n\nwith tf.GradientTape() as t:\n  t.watch(x)\n  y = tf.reduce_sum(x)\n  z = tf.multiply(y, y)\n\n# Derivative of z with respect to the original input tensor x\ndz_dx = t.gradient(z, x)\nfor i in [0, 1]:\n  for j in [0, 1]:\n    assert dz_dx[i][j].numpy() == 8.0\n```\n\n您还可以根据在“记录的”tf.GradientTape上下文中计算的中间值请求输出的梯度。\n\n```python\nx = tf.ones((2, 2))\n\nwith tf.GradientTape() as t:\n  t.watch(x)\n  y = tf.reduce_sum(x)\n  z = tf.multiply(y, y)\n\n# Use the tape to compute the derivative of z with respect to the\n# intermediate value y.\ndz_dy = t.gradient(z, y)\nassert dz_dy.numpy() == 8.0\n```\n\n默认情况下，GradientTape持有的资源会在调用 `GradientTape.gradient()` 方法后立即释放。要在同一计算中计算多个梯度，请创建一个持久梯度带，这允许多次调用 `gradient()` 方法，当磁带对象被垃圾收集时释放资源。例如：\n\n```python\nx = tf.constant(3.0)\nwith tf.GradientTape(persistent=True) as t:\n  t.watch(x)\n  y = x * x\n  z = y * y\ndz_dx = t.gradient(z, x)  # 108.0 (4*x^3 at x = 3)\ndy_dx = t.gradient(y, x)  # 6.0\ndel t  # Drop the reference to the tape\n```\n\n### 2.1. 
记录控制流程\n\n因为tapes(磁带)在执行时记录操作，所以Python控制流程（例如使用 `if` 和 `while`）自然会被处理：\n\n```python\ndef f(x, y):\n  output = 1.0\n  for i in range(y):\n    if i > 1 and i < 5:\n      output = tf.multiply(output, x)\n  return output\n\ndef grad(x, y):\n  with tf.GradientTape() as t:\n    t.watch(x)\n    out = f(x, y)\n  return t.gradient(out, x)\n\nx = tf.convert_to_tensor(2.0)\n\nassert grad(x, 6).numpy() == 12.0\nassert grad(x, 5).numpy() == 12.0\nassert grad(x, 4).numpy() == 4.0\n\n```\n\n### 2.2. 高阶梯度\n\n `GradientTape` 上下文管理器内的操作将被记录下来，以便自动微分。如果在该上下文中计算梯度，那么梯度计算也会被记录下来。因此，同样的API也适用于高阶梯度。例如:\n\n```python\nx = tf.Variable(1.0)  # Create a Tensorflow variable initialized to 1.0\n\nwith tf.GradientTape() as t:\n  with tf.GradientTape() as t2:\n    y = x * x * x\n  # Compute the gradient inside the 't' context manager\n  # which means the gradient computation is differentiable as well.\n  dy_dx = t2.gradient(y, x)\nd2y_dx2 = t.gradient(dy_dx, x)\n\nassert dy_dx.numpy() == 3.0\nassert d2y_dx2.numpy() == 6.0\n```\n\n## 3. 下一步\n\n在本教程中，我们介绍了TensorFlow中的梯度计算。有了这个，我们就拥有了构建和训练神经网络所需的足够原语。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-automatic_differentiation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-automatic_differentiation.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/automatic_differentiation](https://tensorflow.google.cn/beta/tutorials/eager/automatic_differentiation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/automatic_differentiation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/automatic_differentiation.md)\n"
  },
  {
    "path": "r2/tutorials/eager/basics.md",
    "content": "---\ntitle: tensorflow2.0张量及其操作、numpy兼容、GPU加速\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1951\nabbrlink: tensorflow/tf2-tutorials-eager-basics\n---\n\n# tensorflow2.0张量及其操作、numpy兼容、GPU加速（tensorflow2.0官方教程翻译）\n\n这是一个基础入门的TensorFlow教程，展示了如何：\n\n* 导入所需的包\n* 创建和使用张量\n* 使用GPU加速\n* 演示 `tf.data.Dataset`\n\n```python\nfrom __future__ import absolute_import, division, print_function\n```\n\n## 1. 导入TensorFlow\n\n要开始，请导入tensorflow模块。从TensorFlow 2.0开始，默认情况下会启用Eager execution，这使得TensorFlow能够实现更加互动的前端，我们将在稍后讨论这些细节。\n\n```python\nimport tensorflow as tf\n```\n\n## 2. 张量\n\n张量是一个多维数组，与NumPy的 `ndarray` 对象类似，`tf.Tensor` 对象具有数据类型和形状，此外，`tf.Tensor` 可以驻留在加速器内存中（如GPU）。TensorFlow提供了丰富的操作库（[tf.add](https://www.tensorflow.org/api_docs/python/tf/add), [tf.matmul](https://www.tensorflow.org/api_docs/python/tf/matmul), [tf.linalg.inv](https://www.tensorflow.org/api_docs/python/tf/linalg/inv) 等），它们使用和生成`tf.Tensor`。这些操作会自动转换本机Python类型，例如：\n\n```python\nprint(tf.add(1, 2))\nprint(tf.add([1, 2], [3, 4]))\nprint(tf.square(5))\nprint(tf.reduce_sum([1, 2, 3]))\n\n# 操作符重载也支持\nprint(tf.square(2) + tf.square(3))\n```\n\n```\n      tf.Tensor(3, shape=(), dtype=int32)\n      tf.Tensor([4 6], shape=(2,), dtype=int32)\n      tf.Tensor(25, shape=(), dtype=int32)\n      tf.Tensor(6, shape=(), dtype=int32)\n      tf.Tensor(13, shape=(), dtype=int32)\n```\n\n每个 `tf.Tensor` 有一个形状和数据类型：\n\n```python\nx = tf.matmul([[1]], [[2, 3]])\nprint(x)\nprint(x.shape)\nprint(x.dtype)\n```\n\n```\n      tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)\n      (1, 2)\n      <dtype: 'int32'>\n```\n\nNumPy数组和 `tf.Tensor` 之间最明显的区别是：\n\n1. 张量可以有加速器内存（如GPU、TPU）支持。\n\n2. 
张量是不可改变的。\n\n\n### 2.1 NumPy兼容性\n\n在TensorFlow的 `tf.Tensor` 和NumPy的 `ndarray` 之间转换很容易：\n\n* TensorFlow操作自动将NumPy ndarray转换为Tensor\n\n* NumPy操作自动将Tensor转换为NumPy ndarray\n\n使用`.numpy（）`方法将张量显式转换为NumPy `ndarrays`。这些转换通常很便宜，因为如果可能的话，数组和`tf.Tensor`共享底层的内存表示。但是，共享底层表示并不总是可行的，因为`tf.Tensor`可以托管在GPU内存中，而NumPy阵列总是由主机内存支持，并且转换涉及从GPU到主机内存的复制。\n\n```python\nimport numpy as np\n\nndarray = np.ones([3, 3])\n\nprint(\"TensorFlow operations convert numpy arrays to Tensors automatically\")\ntensor = tf.multiply(ndarray, 42)\nprint(tensor)\n\n\nprint(\"And NumPy operations convert Tensors to numpy arrays automatically\")\nprint(np.add(tensor, 1))\n\nprint(\"The .numpy() method explicitly converts a Tensor to a numpy array\")\nprint(tensor.numpy())\n```\n\n```\n    TensorFlow operations convert numpy arrays to Tensors automatically\n      tf.Tensor( [[42. 42. 42.] [42. 42. 42.] [42. 42. 42.]], shape=(3, 3), dtype=float64) \n    And NumPy operations convert Tensors to numpy arrays automatically\n      [[43. 43. 43.] [43. 43. 43.] [43. 43. 43.]] \n    The .numpy() method explicitly converts a Tensor to a numpy array \n      [[42. 42. 42.] [42. 42. 42.] [42. 42. 42.]]\n```\n\n## 3. 
GPU加速\n\n使用GPU进行计算可以加速许多TensorFlow操作，如果没有任何注释，TensorFlow会自动决定是使用GPU还是CPU进行操作，如果有必要，可以复制CPU和GPU内存之间的张量，操作产生的张量通常由执行操作的设备的存储器支持，例如：\n\n```python\nx = tf.random.uniform([3, 3])\n\nprint(\"Is there a GPU available: \"),\nprint(tf.test.is_gpu_available())\n\nprint(\"Is the Tensor on GPU #0:  \"),\nprint(x.device.endswith('GPU:0'))\n```\n\n### 3.1 设备名称\n\n\n`Tensor.device`属性提供托管张量内容的设备的完全限定字符串名称。此名称编码许多详细信息，例如正在执行此程序的主机的网络地址的标识符以及该主机中的设备。这是分布式执行TensorFlow程序所必需的。如果张量位于主机上的第N个GPU上，则字符串以 `GPU:<N>`  结尾。\n  \n### 3.2 显式设备放置\n\n在TensorFlow中，*placement* (放置)指的是如何分配（放置）设备以执行各个操作，如上所述，如果没有提供明确的指导，TensorFlow会自动决定执行操作的设备，并在需要时将张量复制到该设备。但是，可以使用 `tf.device` 上下文管理器将TensorFlow操作显式放置在特定设备上，例如：\n\n```python\nimport time\n\ndef time_matmul(x):\n  start = time.time()\n  for loop in range(10):\n    tf.matmul(x, x)\n\n  result = time.time()-start\n\n  print(\"10 loops: {:0.2f}ms\".format(1000*result))\n\n# Force execution on CPU\nprint(\"On CPU:\")\nwith tf.device(\"CPU:0\"):\n  x = tf.random.uniform([1000, 1000])\n  assert x.device.endswith(\"CPU:0\")\n  time_matmul(x)\n\n# Force execution on GPU #0 if available\nif tf.test.is_gpu_available():\n  print(\"On GPU:\")\n  with tf.device(\"GPU:0\"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.\n    x = tf.random.uniform([1000, 1000])\n    assert x.device.endswith(\"GPU:0\")\n    time_matmul(x)\n```\n\n```\n      On CPU: 10 loops: 88.60ms\n```\n\n## 4. 
数据集\n\n本节使用 [`tf.data.Dataset` API](https://www.tensorflow.org/guide/datasets) 构建管道，以便为模型提供数据。 `tf.data.Dataset`  API用于从简单，可重复使用的部分构建高性能，复杂的输入管道，这些部分将为模型的训练或评估循环提供支持。\n\n\n### 4.1 创建源数据集\n\n使用其中一个工厂函数（如 [`Dataset.from_tensors`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensors), [`Dataset.from_tensor_slices`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices)）或使用从[`TextLineDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TextLineDataset) 或  [`TFRecordDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset) 等文件读取的对象创建源数据集。有关详细信息，请参阅[TensorFlow数据集指南](https://www.tensorflow.org/guide/datasets#reading_input_data)。\n\n```python\nds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])\n\n# Create a CSV file\nimport tempfile\n_, filename = tempfile.mkstemp()\n\nwith open(filename, 'w') as f:\n  f.write(\"\"\"Line 1\nLine 2\nLine 3\n  \"\"\")\n\nds_file = tf.data.TextLineDataset(filename)\n```\n\n### 4.2 应用转换\n\n使用 [`map`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map), [`batch`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch), 和 [`shuffle`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle)等转换函数将转换应用于数据集记录。\n\n```python\nds_tensors = ds_tensors.map(tf.square).shuffle(2).batch(2)\n\nds_file = ds_file.batch(2)\n```\n\n### 4.3 迭代（Iterate）\n\n`tf.data.Dataset` 对象支持迭代循环：\n\n\n```python\nprint('Elements of ds_tensors:')\nfor x in ds_tensors:\n  print(x)\n\nprint('\\nElements in ds_file:')\nfor x in ds_file:\n  print(x)\n```\n\n```\n      Elements of ds_tensors:\n        tf.Tensor([1 9], shape=(2,), dtype=int32) \n        tf.Tensor([ 4 25], shape=(2,), dtype=int32) \n        tf.Tensor([16 36], shape=(2,), dtype=int32) \n      Elements in ds_file: \n        tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string) \n        tf.Tensor([b'Line 3' b' '], shape=(2,), dtype=string)\n```\n\n> 
最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-basics.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-basics.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/basics](https://tensorflow.google.cn/beta/tutorials/eager/basics)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/basics.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/basics.md)"
  },
  {
    "path": "r2/tutorials/eager/custom_layers.md",
    "content": "---\ntitle: 使用Keras自定义层\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1952\nabbrlink: tensorflow/tf2-tutorials-eager-custom_layers\n---\n\n# 使用Keras自定义层 (tensorflow2.0官方教程翻译）\n\n我们建议使用 `tf.keras` 作为构建神经网络的高级API，也就是说，大多数TensorFlow API都可用于Eager execution。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n```\n\n## 1. 对图层的常用操作\n\n在编写机器学习模型的代码时，大多数情况下，您希望以比单个操作和单个变量操作更高的抽象级别上进行操作。\n\n许多机器学习模型都可以表示为相对简单的层的组合和叠加，TensorFlow提供了一组公共层和一种简单的方法，让您可以从头开始编写自己的特定于应用程序的层，也可以表示为现有层的组合。\n\nTensorFlow在 `tf.keras` 中包含完整 [Keras](https://keras.io) API，而Keras层在构建自己的模型时非常有用。\n\n\n```python\n# 在tf.keras.layers包中，图层是对象。要构造一个图层，只需构造一个对象。 \n# 大多数层将输出维度/通道的数量作为第一个参数。 \nlayer = tf.keras.layers.Dense(100)\n\n# 输入维度的数量通常是不必要的，因为它可以在第一次使用层时推断出来， \n# 但如果您想手动指定它，则可以提供它，这在某些复杂模型中很有用。 \nlayer = tf.keras.layers.Dense(10, input_shape=(None, 5))\n```\n\n可以在文档([链接](https://www.tensorflow.org/api_docs/python/tf/keras/layers))中看到预先存在的层的完整列表，它包括Dense（完全连接层），Conv2D，LSTM，BatchNormalization，Dropout等等。\n\n```python\n# 要使用图层，只需调用它即可。 \nlayer(tf.zeros([10, 5]))\n```\n\n\n```python\n# 层有许多有用的方法，例如，您可以使用 `layer.variables` 和可训练变量使用 \n# `layer.trainable_variables`检查图层中的所有变量，在这种情况下， \n# 完全连接的层将具有权重和偏差的变量。 \nprint(layer.variables) \n```\n\n```python\n# 变量也可以通过nice accessors访问\nprint(layer.kernel, layer.bias)\n```\n\n## 2. 
使用keras实现自定义层\n\n实现自己的层的最佳方法是扩展`tf.keras.Layer` 类并实现：\n\n  *  `__init__` ，您可以在其中执行所有与输入无关的初始化\n\n  * `build`，您可以在其中了解输入张量的形状，并可以执行其余的初始化\n\n  * `call`，在那里进行正向计算。\n\n\n请注意，您不必等到调用 `build` 来创建变量，您也可以在 `__init__`中创建它们。但是，在 `build` 中创建它们的好处是，它支持根据将要操作的层的输入形状，创建后期变量。另一方面，在 `__init__` 中创建变量意味着需要明确指定创建变量所需的形状。\n\n```python\nclass MyDenseLayer(tf.keras.layers.Layer):\n  def __init__(self, num_outputs):\n    super(MyDenseLayer, self).__init__()\n    self.num_outputs = num_outputs\n\n  def build(self, input_shape):\n    self.kernel = self.add_variable(\"kernel\",\n                                    shape=[int(input_shape[-1]),\n                                           self.num_outputs])\n\n  def call(self, input):\n    return tf.matmul(input, self.kernel)\n\nlayer = MyDenseLayer(10)\nprint(layer(tf.zeros([10, 5])))\nprint(layer.trainable_variables)\n```\n\n如果尽可能使用标准层，则整体代码更易于阅读和维护，因为其他读者将熟悉标准层的行为。如果你想使用 `tf.keras.layers` 中不存在的图层，请考虑提交[github问题](http://github.com/tensorflow/tensorflow/issues/new)，或者最好向我们发送pull request！\n\n\n## 3. 
通过组合层构建模型\n\n在机器学习模型中，许多有趣的类似层的事物都是通过组合现有层来实现的。例如，resnet中的每个残差块都是convolutions、 batch normalizations和shortcut的组合。\n\n创建包含其他层的类似层的事物时使用的主类是 `tf.keras.Model`，实现一个是通过继承自 `tf.keras.Model` 完成的。\n\n```python\nclass ResnetIdentityBlock(tf.keras.Model):\n  def __init__(self, kernel_size, filters):\n    super(ResnetIdentityBlock, self).__init__(name='')\n    filters1, filters2, filters3 = filters\n\n    self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))\n    self.bn2a = tf.keras.layers.BatchNormalization()\n\n    self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')\n    self.bn2b = tf.keras.layers.BatchNormalization()\n\n    self.conv2c = tf.keras.layers.Conv2D(filters3, (1, 1))\n    self.bn2c = tf.keras.layers.BatchNormalization()\n\n  def call(self, input_tensor, training=False):\n    x = self.conv2a(input_tensor)\n    x = self.bn2a(x, training=training)\n    x = tf.nn.relu(x)\n\n    x = self.conv2b(x)\n    x = self.bn2b(x, training=training)\n    x = tf.nn.relu(x)\n\n    x = self.conv2c(x)\n    x = self.bn2c(x, training=training)\n\n    x += input_tensor\n    return tf.nn.relu(x)\n\n\nblock = ResnetIdentityBlock(1, [1, 2, 3])\nprint(block(tf.zeros([1, 2, 3, 3])))\nprint([x.name for x in block.trainable_variables])\n```\n\n```\n      tf.Tensor( [[[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] [[0. 0. 0.] [0. 0. 0.] [0. 0. 
0.]]]], shape=(1, 2, 3, 3), dtype=float32)\n      ['resnet_identity_block/conv2d/kernel:0', 'resnet_identity_block/conv2d/bias:0',\n      'resnet_identity_block/batch_normalization_v2/gamma:0', 'resnet_identity_block/batch_normalization_v2/beta:0',\n      'resnet_identity_block/conv2d_1/kernel:0', 'resnet_identity_block/conv2d_1/bias:0',\n      'resnet_identity_block/batch_normalization_v2_1/gamma:0', 'resnet_identity_block/batch_normalization_v2_1/beta:0',\n      'resnet_identity_block/conv2d_2/kernel:0', 'resnet_identity_block/conv2d_2/bias:0',\n      'resnet_identity_block/batch_normalization_v2_2/gamma:0', 'resnet_identity_block/batch_normalization_v2_2/beta:0']\n```\n\n然而，在大多数情况下，组合许多层的模型只是简单地一层接一层地调用。使用 `tf.keras.Sequential` 只需很少的代码即可完成：\n\n```python\nmy_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),\n                                                    input_shape=(\n                                                        None, None, 3)),\n                             tf.keras.layers.BatchNormalization(),\n                             tf.keras.layers.Conv2D(2, 1,\n                                                    padding='same'),\n                             tf.keras.layers.BatchNormalization(),\n                             tf.keras.layers.Conv2D(3, (1, 1)),\n                             tf.keras.layers.BatchNormalization()])\nmy_seq(tf.zeros([1, 2, 3, 3]))\n```\n\n## 4. 下一步\n\n现在，您可以返回到之前的教程，并调整线性回归示例，以使用结构更好的层和模型。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_layers.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_layers.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/custom_layers](https://tensorflow.google.cn/beta/tutorials/eager/custom_layers)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_layers.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_layers.md)"
  },
  {
    "path": "r2/tutorials/eager/custom_training.md",
    "content": "---\ntitle: 构建tensorflow2.0模型自定义训练的基础步骤\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1954\nabbrlink: tensorflow/tf2-tutorials-eager-custom_training\n---\n\n# 构建tensorflow2.0模型自定义训练的基础步骤（tensorflow2.0官方教程翻译）\n\n在上一个教程中，我们介绍了用于自动微分的TensorFlow API，这是机器学习的基本构建块。在本教程中，我们将使用先前教程中介绍的TensorFlow原语来进行一些简单的机器学习。\n\nTensorFlow还包括一个更高级别的神经网络API(`tf.keras`) ，它提供了有用的抽象来减少样板代码。我们强烈建议使用神经网络的人使用这个更高级别的API。\n但是，在这个简短的教程中，我们从基本原理入手开始介绍神经网络训练，以建立坚实的基础。\n\n## 1. 设置\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n```\n\n## 2. 变量\n\nTensorFlow中的张量是不可变的无状态对象。然而，机器学习模型需要具有变化的状态：随着模型训练，计算预测的相同代码应该随着时间的推移而表现不同（希望具有较低的损失）。要表示需要在计算过程中进行更改的状态，您可以选择依赖Python是有状态编程语言的这一事实：\n\n```python\n# Using python state\nx = tf.zeros([10, 10])\nx += 2  # This is equivalent to x = x + 2, which does not mutate the original\n        # value of x\nprint(x)\n```\n\n但是，TensorFlow内置了有状态操作，这些操作通常比用低级Python表示状态更方便。例如，为了表示模型中的权重，使用TensorFlow变量通常是方便有效的。\n\n变量是一个存储值的对象，当在TensorFlow计算中使用时，它将隐式地从该存储值中读取。有一些操作（`tf.assign_sub`, `tf.scatter_update`等）可以操作存储在TensorFlow变量中的值。\n\n```python\nv = tf.Variable(1.0)\nassert v.numpy() == 1.0\n\n# Re-assign the value\nv.assign(3.0)\nassert v.numpy() == 3.0\n\n# Use `v` in a TensorFlow operation like tf.square() and reassign\nv.assign(tf.square(v))\nassert v.numpy() == 9.0\n```\n\n计算梯度时会自动跟踪使用变量的计算。对于表示嵌入的变量，TensorFlow默认会进行稀疏更新，这样可以提高计算效率和内存效率。\n\n使用变量也是一种快速让代码的读者知道这段状态是可变的方法。\n\n\n## 3. 示例：拟合一个线性模型\n\n现在让我们运用迄今为止学到的几个概念（`Tensor`、`GradientTape`、`Variable`）来构建并训练一个简单的模型。这通常涉及几个步骤：\n\n1. 定义模型\n\n2. 定义损失函数\n\n3. 获取训练数据\n\n4. 运行训练数据并使用“优化器”调整变量以拟合数据。\n\n在本教程中，我们将介绍简单线性模型的一个简单示例：`f(x) = x * W + b`，它有两个变量，`W` 和 `b`。此外，我们将合成数据，使训练后的模型具有 `W = 3.0` 和 `b = 2.0`。\n\n### 3.1. 
定义模型\n\n让我们定义一个简单的类来封装变量和计算\n\n```python\nclass Model(object):\n  def __init__(self):\n    # Initialize variable to (5.0, 0.0)\n    # In practice, these should be initialized to random values.\n    self.W = tf.Variable(5.0)\n    self.b = tf.Variable(0.0)\n\n  def __call__(self, x):\n    return self.W * x + self.b\n\nmodel = Model()\n\nassert model(3.0).numpy() == 15.0\n```\n\n### 3.2. 定义损失函数\n\n损失函数测量给定输入的模型输出与期望输出的匹配程度。让我们使用标准的L2损失：\n\n```python\ndef loss(predicted_y, desired_y):\n  return tf.reduce_mean(tf.square(predicted_y - desired_y))\n```\n\n### 3.3. 获取训练数据\n\n让我们用一些噪音合成训练数据：\n\n```python\nTRUE_W = 3.0\nTRUE_b = 2.0\nNUM_EXAMPLES = 1000\n\ninputs  = tf.random.normal(shape=[NUM_EXAMPLES])\nnoise   = tf.random.normal(shape=[NUM_EXAMPLES])\noutputs = inputs * TRUE_W + TRUE_b + noise\n```\n\n在我们训练模型之前，让我们可以看到模型现在所处的位置。我们将用红色绘制模型的预测，用蓝色绘制训练数据。\n\n```python\nimport matplotlib.pyplot as plt\n\nplt.scatter(inputs, outputs, c='b')\nplt.scatter(inputs, model(inputs), c='r')\nplt.show()\n\nprint('Current loss: '),\nprint(loss(model(inputs), outputs).numpy())\n```\n\n### 3.4. 
定义训练循环\n\n我们现在拥有我们的网络和训练数据。让我们训练它，即使用训练数据来更新模型的变量（`W` 和 `b`），以便使用梯度下降来减少损失。在`tf.train.Optimizer`实现中拥有许多梯度下降方案的变体。我们强烈建议使用这些实现，但本着从基本原理构建的精神，在这个特定的例子中，我们将自己实现基本的数学。\n\n```python\ndef train(model, inputs, outputs, learning_rate):\n  with tf.GradientTape() as t:\n    current_loss = loss(model(inputs), outputs)\n  dW, db = t.gradient(current_loss, [model.W, model.b])\n  model.W.assign_sub(learning_rate * dW)\n  model.b.assign_sub(learning_rate * db)\n```\n\n最后，让我们反复浏览训练数据，看看W和b是如何演变的。\n\n```python\nmodel = Model()\n\n# Collect the history of W-values and b-values to plot later\nWs, bs = [], []\nepochs = range(10)\nfor epoch in epochs:\n  Ws.append(model.W.numpy())\n  bs.append(model.b.numpy())\n  current_loss = loss(model(inputs), outputs)\n\n  train(model, inputs, outputs, learning_rate=0.1)\n  print('Epoch %2d: W=%1.2f b=%1.2f, loss=%2.5f' %\n        (epoch, Ws[-1], bs[-1], current_loss))\n\n# Let's plot it all\nplt.plot(epochs, Ws, 'r',\n         epochs, bs, 'b')\nplt.plot([TRUE_W] * len(epochs), 'r--',\n         [TRUE_b] * len(epochs), 'b--')\nplt.legend(['W', 'b', 'true W', 'true_b'])\nplt.show()\n\n```\n\n```\n      Epoch 0: W=5.00 b=0.00, loss=9.34552 \n      ...\n      Epoch 9: W=3.22 b=1.74, loss=1.14022\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/eager/custom_training_files/output_22_1.png)\n\n\n## 4. 
下一步\n\n在本教程中，我们介绍了变量，并使用到目前为止讨论的TensorFlow原语构建并训练了一个简单的线性模型。\n\n从理论上讲，这几乎是您使用TensorFlow进行机器学习研究所需要的全部内容。在实践中，特别是对于神经网络，像 `tf.keras` 这样的高级API将更加方便，因为它提供了更高级别的构建块（称为“层”）、用于保存和恢复状态的实用程序、一套损失函数、一套优化策略等。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/custom_training](https://tensorflow.google.cn/beta/tutorials/eager/custom_training)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training.md)\n\n\n"
  },
  {
    "path": "r2/tutorials/eager/custom_training_walkthrough.md",
    "content": "---\ntitle: 使用Keras演示TensorFlow2.0自定义训练实战\ntags: \n    - tensorflow2.0\ncategories: \n    - tensorflow2官方教程\ntop: 1955\nabbrlink: tensorflow/tf2-tutorials-eager-custom_training_walkthrough\n---\n\n# 使用Keras演示TensorFlow2.0自定义训练实战 (tensorflow2.0官方教程翻译）\n\n本指南使用机器学习对鸢尾花按品种进行分类。它利用 TensorFlow 的 Eager Execution 来执行以下操作：\n\n1. 构建模型\n2. 使用样本数据训练该模型\n3. 利用该模型对未知数据进行预测。\n\n## 1. TensorFlow 编程\n\n本指南采用了以下高级 TensorFlow 概念：\n\n* 使用TensorFlow的默认  [eager execution](https://www.tensorflow.org/guide/eager) 开发环境,\n\n* 使用 [Datasets API](https://www.tensorflow.org/guide/datasets) 导入数据，\n\n* 使用 TensorFlow 的 [Keras API](https://keras.io/getting-started/sequential-model-guide/) 构建模型和层。\n\n本教程采用了与许多 TensorFlow 程序相似的结构：\n\n1. 导入和解析数据集。\n\n2. 选择模型类型。\n\n3. 训练模型。\n\n4. 评估模型的效果。\n\n5. 使用经过训练的模型进行预测。\n\n## 2. 设置程序\n\n### 2.1. 配置导入\n\n导入所需的 Python 模块（包括 TensorFlow），默认情况下，TensorFlow使用 Eager Execution 来立即评估操作，并返回具体的值，而不是创建稍后执行的计算图。如果您习惯使用 REPL 或 python 交互控制台，对于 Eager Execution 您会用起来得心应手。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport os\nimport matplotlib.pyplot as plt\n\nimport tensorflow as tf\n\nprint(\"TensorFlow version: {}\".format(tf.__version__))\nprint(\"Eager execution: {}\".format(tf.executing_eagerly()))\n```\n\n```\n      TensorFlow version: 2.0.0-alpha0 Eager execution: True\n```\n\n## 3. 
鸢尾花分类问题\n\n想象一下，您是一名植物学家，正在寻找一种能够对所发现的每株鸢尾花进行自动归类的方法。机器学习可提供多种从统计学上分类花卉的算法。例如，一个复杂的机器学习程序可以根据照片对花卉进行分类。我们的要求并不高，我们将根据鸢尾花花萼和花瓣的长度和宽度对其进行分类。\n\n鸢尾属约有 300 个品种，但我们的程序将仅对下列三个品种进行分类：\n\n* 山鸢尾\n* 维吉尼亚鸢尾\n* 变色鸢尾\n\n<table>\n  <tr><td>\n    <img src=\"https://tensorflow.google.cn/images/iris_three_species.jpg\"\n         alt=\"Petal geometry compared for three iris species: Iris setosa, Iris virginica, and Iris versicolor\">\n  </td></tr>\n  <tr><td align=\"center\">\n    <b>图1.</b> <a href=\"https://commons.wikimedia.org/w/index.php?curid=170298\">山鸢尾Iris setosa</a>，<a href=\"https://commons.wikimedia.org/w/index.php?curid=248095\">变色鸢尾Iris versicolor</a>，和 <a href=\"https://www.flickr.com/photos/33397993@N05/3352169862\">维吉尼亚鸢尾Iris virginica</a> <br/>&nbsp;\n  </td></tr>\n</table>\n\n幸运的是，有人已经创建了一个包含 120 株鸢尾花的数据集（其中有花萼和花瓣的测量值）。这是一个在入门级机器学习分类问题中经常使用的经典数据集。\n\n## 4. 导入和解析训练数据集\n\n下载数据集文件并将其转换为可供此 Python 程序使用的结构。\n\n### 4.1. 下载数据集\n\n使用 [tf.keras.utils.get_file](https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file) 函数下载训练数据集文件。该函数会返回下载文件的文件路径。\n\n```python\ntrain_dataset_url = \"https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv\"\n\ntrain_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url),\n                                           origin=train_dataset_url)\n\nprint(\"Local copy of the dataset file: {}\".format(train_dataset_fp))\n```\n\n### 4.2. 检查数据\n\n数据集 `iris_training.csv` 是一个纯文本文件，其中存储了逗号分隔值 (CSV) 格式的表格式数据。请使用 `head -n5` 命令查看前 5 个条目：\n\n```\n!head -n5 {train_dataset_fp}\n```\n\n```\n      120,4,setosa,versicolor,virginica \n      6.4,2.8,5.6,2.2,2 \n      5.0,2.3,3.3,1.0,1 \n      4.9,2.5,4.5,1.7,2 \n      4.9,3.1,1.5,0.1,0\n```\n\n我们可以从该数据集视图中注意到以下信息：\n\n1. 第一行是标题，其中包含数据集信息：\n* 共有 120 个样本。每个样本都有四个特征和一个标签名称，标签名称有三种可能。\n\n2. 
后面的行是数据记录，每个样本各占一行，其中：\n* 前四个字段是特征：即样本的特点。在此数据集中，这些字段存储的是代表花卉测量值的浮点数。\n* 最后一列是标签：即我们想要预测的值。对于此数据集，该值为 0、1 或 2 中的某个整数值（每个值分别对应一个花卉名称）。\n\n我们用代码表示出来：\n\n```python\n# column order in CSV file\ncolumn_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']\n\nfeature_names = column_names[:-1]\nlabel_name = column_names[-1]\n\nprint(\"Features: {}\".format(feature_names))\nprint(\"Label: {}\".format(label_name))\n```\n\n```\n      Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'] Label: species\n```\n\n每个标签都分别与一个字符串名称（例如“setosa”）相关联，但机器学习通常依赖于数字值。标签编号会映射到一个指定的表示法，例如：\n\n* `0`: 山鸢尾\n* `1`: 变色鸢尾\n* `2`: 维吉尼亚鸢尾\n\n如需详细了解特征和标签，请参阅[《机器学习速成课程》的“机器学习术语”部分](https://developers.google.cn/machine-learning/crash-course/framing/ml-terminology)。\n\n```python\nclass_names = ['Iris setosa', 'Iris versicolor', 'Iris virginica']\n```\n\n### 4.3. 创建一个 `tf.data.Dataset`\n\nTensorFlow 的 Dataset API 可处理在向模型加载数据时遇到的许多常见情况。这是一种高阶 API，用于读取数据并将其转换为可供训练使用的格式。如需了解详情，请参阅[数据集快速入门指南](https://tensorflow.google.cn/guide/datasets_for_estimators)。\n\n由于数据集是 CSV 格式的文本文件，请使用 make_csv_dataset 函数将数据解析为合适的格式。由于此函数为训练模型生成数据，默认行为是对数据进行随机处理 (`shuffle=True, shuffle_buffer_size=10000`)，并且无限期重复数据集 (`num_epochs=None`)。我们还设置了 batch_size 参数。\n\n```python\nbatch_size = 32\n\ntrain_dataset = tf.data.experimental.make_csv_dataset(\n    train_dataset_fp,\n    batch_size,\n    column_names=column_names,\n    label_name=label_name,\n    num_epochs=1)\n```\n\n`make_csv_dataset` 函数返回 `(features, label)` 对的 `tf.data.Dataset`，其中 `features` 是一个字典：`{'feature_name': value}`\n\n这些 Dataset 对象便可迭代。我们来看看一批特征：\n\n```python\nfeatures, labels = next(iter(train_dataset))\n\nprint(features)\n```\n\n请注意，类似特征会归为一组，即分为一批。每个样本行的字段都会附加到相应的特征数组中。更改 batch_size 可设置存储在这些特征数组中的样本数。\n\n绘制该批次中的几个特征后，就会开始看到一些集群现象：\n\n```python\nplt.scatter(features['petal_length'],\n            features['sepal_length'],\n            c=labels,\n            cmap='viridis')\n\nplt.xlabel(\"Petal 
length\")\nplt.ylabel(\"Sepal length\")\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/eager/custom_training_walkthrough_files/output_22_0.png)\n\n要简化模型构建步骤，请创建一个函数以将特征字典重新打包为形状为 `(batch_size, num_features)` 的单个数组。\n\n此函数使用 [tf.stack](https://tensorflow.google.cn/api_docs/python/tf/stack) 方法，该方法从张量列表中获取值，并创建指定维度的组合张量。\n\n```python\ndef pack_features_vector(features, labels):\n  \"\"\"Pack the features into a single array.\"\"\"\n  features = tf.stack(list(features.values()), axis=1)\n  return features, labels\n```\n\n然后使用 [tf.data.Dataset.map](https://tensorflow.google.cn/api_docs/python/tf/data/dataset/map) 方法将每个 `(features,label)` 对的 `features` 打包到训练数据集中：\n\n```python\ntrain_dataset = train_dataset.map(pack_features_vector)\n```\n\n`Dataset` 的 features 元素现在是形状为 `(batch_size, num_features)` 的数组。我们来看看前几个样本：\n\n```python\nfeatures, labels = next(iter(train_dataset))\n\nprint(features[:5])\n```\n\n```\n    tf.Tensor( \n    [[4.9 2.4 3.3 1. ] \n    ...\n    [6.6 3. 4.4 1.4]], shape=(5, 4), dtype=float32)\n```\n\n## 5. 选择模型类型\n\n### 5.1. 为何要使用模型？\n\n模型是指特征与标签之间的关系。对于鸢尾花分类问题，模型定义了花萼和花瓣测量值与预测的鸢尾花品种之间的关系。一些简单的模型可以用几行代数进行描述，但复杂的机器学习模型拥有大量难以汇总的参数。\n\n您能否在不使用机器学习的情况下确定四个特征与鸢尾花品种之间的关系？也就是说，您能否使用传统编程技巧（例如大量条件语句）创建模型？也许能，前提是反复分析该数据集，并最终确定花瓣和花萼测量值与特定品种的关系。对于更复杂的数据集来说，这会变得非常困难，或许根本就做不到。一个好的机器学习方法可为您确定模型。如果您将足够多的代表性样本馈送到正确类型的机器学习模型中，该程序便会为您找出相应的关系。\n\n### 5.2. 
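特征打包的纯 Python 示意

`pack_features_vector` 所做的事情，本质上就是把 `{'特征名': 一批取值}` 的字典按列堆叠成 `(batch_size, num_features)` 的二维数组。下面用纯 Python 的 `zip` 做一个最小示意（其中的特征名和数值均为演示假设，与 `tf.stack` 的真实实现无关）：

```python
def pack_features(features, order):
    """把 {特征名: 一批取值} 的字典按 order 给定的列顺序重组为样本行。"""
    columns = [features[name] for name in order]
    # zip(*columns) 逐样本取各列的第 i 个值，等价于 tf.stack(..., axis=1)
    return [list(row) for row in zip(*columns)]

batch = {'sepal_length': [4.9, 6.6], 'petal_length': [3.3, 4.4]}
packed = pack_features(batch, ['sepal_length', 'petal_length'])
print(packed)  # [[4.9, 3.3], [6.6, 4.4]]
```

### 5.2. 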
选择模型\n\n我们需要选择要进行训练的模型类型。模型具有许多类型，挑选合适的类型需要一定的经验。本教程使用神经网络来解决鸢尾花分类问题。神经网络可以发现特征与标签之间的复杂关系。神经网络是一个高度结构化的图，其中包含一个或多个隐藏层。每个隐藏层都包含一个或多个神经元。神经网络有多种类别，该程序使用的是密集型神经网络，也称为全连接神经网络：一个层中的神经元将从上一层中的每个神经元获取输入连接。例如，图 2 显示了一个密集型神经网络，其中包含 1 个输入层、2 个隐藏层以及 1 个输出层：\n\n<table>\n  <tr><td>\n    <img src=\"https://tensorflow.google.cn/images/custom_estimators/full_network.png\"\n         alt=\"A diagram of the network architecture: Inputs, 2 hidden layers, and outputs\">\n  </td></tr>\n  <tr><td align=\"center\">\n    <b>图 2.</b> 包含特征、隐藏层和预测的神经网络 <br/>&nbsp;\n  </td></tr>\n</table>\n\n当图 2 中的模型经过训练并馈送未标记的样本时，它会产生 3 个预测结果：相应鸢尾花属于指定品种的可能性。这种预测称为[推理](https://developers.google.cn/machine-learning/crash-course/glossary#inference)。对于该示例，输出预测结果的总和是 1.0。在图 2 中，该预测结果分解如下：山鸢尾为 0.02，变色鸢尾为 0.95，维吉尼亚鸢尾为 0.03。这意味着该模型预测某个无标签鸢尾花样本是变色鸢尾的概率为 95％。\n\n\n### 5.3. 使用Keras创建模型\n\nTensorFlow `tf.keras` API 是创建模型和层的首选方式。通过该 API，您可以轻松地构建模型并进行实验，而将所有部分连接在一起的复杂工作则由 Keras 处理。\n\n`tf.keras.Sequential` 模型是层的线性堆叠。该模型的构造函数会采用一系列层实例；在本示例中，采用的是 2 个密集层（分别包含 10 个节点）以及 1 个输出层（包含 3 个代表标签预测的节点）。第一个层的 `input_shape` 参数对应该数据集中的特征数量，它是一项必需参数。\n\n```python\nmodel = tf.keras.Sequential([\n  tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(4,)),  # input shape required\n  tf.keras.layers.Dense(10, activation=tf.nn.relu),\n  tf.keras.layers.Dense(3)\n])\n```\n\n[激活函数](https://developers.google.cn/machine-learning/crash-course/glossary#activation_function)可决定层中每个节点的输出形状。这些非线性关系很重要，如果没有它们，模型将等同于单个层。[激活函数有很多](https://tensorflow.google.cn/api_docs/python/tf/keras/activations)，但隐藏层通常使用 [ReLU](https://developers.google.cn/machine-learning/crash-course/glossary#ReLU)。\n\n隐藏层和神经元的理想数量取决于问题和数据集。与机器学习的多个方面一样，选择最佳的神经网络形状需要一定的知识水平和实验基础。一般来说，增加隐藏层和神经元的数量通常会产生更强大的模型，而这需要更多数据才能有效地进行训练。\n\n### 5.4. 
查看模型\n\n我们快速了解一下此模型如何处理一批特征：\n\n```python\npredictions = model(features)\npredictions[:5]\n```\n\n在此示例中，每个样本针对每个类别返回一个 [logit](https://developers.google.cn/machine-learning/crash-course/glossary#logits)。\n\n要将这些 logits 转换为每个类别的概率，请使用 [softmax](https://developers.google.cn/machine-learning/crash-course/glossary#softmax) 函数：\n\n```python\ntf.nn.softmax(predictions[:5])\n```\n\n对每个类别执行 `tf.argmax` 运算可得出预测的类别索引。不过，该模型尚未接受训练，因此这些预测并不理想。\n\n```python\nprint(\"Prediction: {}\".format(tf.argmax(predictions, axis=1)))\nprint(\"    Labels: {}\".format(labels))\n```\n\n## 6. 训练模型\n\n训练是一个机器学习阶段，在此阶段中，模型会逐渐得到优化，也就是说，模型会了解数据集。目标是充分了解训练数据集的结构，以便对未见过的数据进行预测。如果您从训练数据集中获得了过多的信息，预测便会仅适用于模型见过的数据，但是无法泛化。此问题称为[过拟合](https://developers.google.cn/machine-learning/crash-course/glossary#overfitting)，好比将答案死记硬背下来，而不去理解问题的解决方式。\n\n鸢尾花分类问题是监督式机器学习的一个示例：模型通过包含标签的样本加以训练。在非监督式机器学习中，样本不包含标签。相反，模型通常会在特征中发现一些规律。\n\n### 6.1. 定义损失和梯度函数\n\n在训练和评估阶段，我们都需要计算模型的损失。这样可以衡量模型的预测结果与预期标签有多大偏差，也就是说，模型的效果有多差。我们希望尽可能减小或优化这个值。\n\n我们的模型会使用 `tf.keras.losses.SparseCategoricalCrossentropy` 函数计算其损失，此函数会接受模型的类别概率预测结果和预期标签，然后返回样本的平均损失。\n\n```python\nloss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n```\n\n\n```python\ndef loss(model, x, y):\n  y_ = model(x)\n\n  return loss_object(y_true=y, y_pred=y_)\n\n\nl = loss(model, features, labels)\nprint(\"Loss test: {}\".format(l))\n```\n\n使用 [tf.GradientTape](https://tensorflow.google.cn/api_docs/python/tf/GradientTape) 上下文计算用于优化模型的[梯度](https://developers.google.cn/machine-learning/crash-course/glossary#gradient)。\n\n```python\ndef grad(model, inputs, targets):\n  with tf.GradientTape() as tape:\n    loss_value = loss(model, inputs, targets)\n  return loss_value, tape.gradient(loss_value, model.trainable_variables)\n```\n\n### 6.2. 
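softmax 与交叉熵的纯 Python 示意

为了直观理解 `tf.nn.softmax` 和 `SparseCategoricalCrossentropy(from_logits=True)` 各自在做什么，下面给出一个示意性质的纯 Python 草图（函数名与数值均为演示假设，并非 TensorFlow 的实现）：

```python
import math

def softmax(logits):
    m = max(logits)                      # 先减去最大值，保证数值稳定
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]     # 各类别概率，和为 1

def sparse_xent(logits, label):
    # 稀疏分类交叉熵：取正确类别概率的负对数
    return -math.log(softmax(logits)[label])

logits = [2.0, 1.0, 0.1]
print(softmax(logits))                   # 概率和为 1，logit 越大概率越高
print(sparse_xent(logits, 0), sparse_xent(logits, 2))
```

正确类别的 logit 越大，损失越小，这正是训练要最小化的量。

### 6.2. 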
创建优化器\n\n*[优化器](https://developers.google.cn/machine-learning/crash-course/glossary#optimizer)* 会将计算出的梯度应用于模型的变量，以最小化 loss 函数。您可以将损失函数想象为一个曲面（见图 3），我们希望通过到处走动找到该曲面的最低点。梯度指向最高速上升的方向，因此我们将沿相反的方向向下移动。我们以迭代方式计算每个批次的损失和梯度，以在训练过程中调整模型。模型会逐渐找到权重和偏差的最佳组合，从而将损失降至最低。损失越低，模型的预测效果就越好。\n\n<table>\n  <tr><td>\n    <img src=\"https://cs231n.github.io/assets/nn3/opt1.gif\" width=\"70%\"\n         alt=\"Optimization algorithms visualized over time in 3D space.\">\n  </td></tr>\n  <tr><td align=\"center\">\n    <b>图 3.</b> 优化算法在三维空间中随时间推移而变化的可视化效果。<br/>(Source: <a href=\"http://cs231n.github.io/neural-networks-3/\">斯坦福大学 CS231n 课程</a>, MIT License, Image credit: <a href=\"https://twitter.com/alecrad\">Alec Radford</a>)\n  </td></tr>\n</table>\n\nTensorFlow 拥有许多可用于训练的[优化算法](https://www.tensorflow.org/api_guides/python/train)。此模型使用的是 [tf.train.GradientDescentOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer)，它可以实现[随机梯度下降法](https://developers.google.cn/machine-learning/crash-course/glossary#gradient_descent) (SGD)。`learning_rate` 用于设置每次迭代（向下行走）的步长。这是一个超参数，您通常需要调整此参数以获得更好的结果。\n\n我们来设置优化器：\n\n```python\noptimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n```\n\n我们将使用它来计算单个优化步骤：\n\n```python\nloss_value, grads = grad(model, features, labels)\n\nprint(\"Step: {}, Initial Loss: {}\".format(optimizer.iterations.numpy(),\n                                          loss_value.numpy()))\n\noptimizer.apply_gradients(zip(grads, model.trainable_variables))\n\nprint(\"Step: {},         Loss: {}\".format(optimizer.iterations.numpy(),\n                                          loss(model, features, labels).numpy()))\n```\n\n```\n      Step: 0, Initial Loss: 2.3108744621276855 \n      Step: 1, Loss: 1.7618987560272217\n```\n\n### 6.3. 训练循环\n\n一切准备就绪后，就可以开始训练模型了！训练循环会将数据集样本馈送到模型中，以帮助模型做出更好的预测。以下代码块可设置这些训练步骤：\n\n1. 迭代每个周期。通过一次数据集即为一个周期。\n2. 在一个周期中，遍历训练 Dataset 中的每个样本，并获取样本的特征 (x) 和标签 (y)。\n3. 根据样本的特征进行预测，并比较预测结果和标签。衡量预测结果的不准确性，并使用所得的值计算模型的损失和梯度。\n4. 
使用 optimizer 更新模型的变量。\n5. 跟踪一些统计信息以进行可视化。\n6. 对每个周期重复执行以上步骤。\n\nnum_epochs 变量是遍历数据集集合的次数。与直觉恰恰相反的是，训练模型的时间越长，并不能保证模型就越好。num_epochs 是一个可以调整的超参数。选择正确的次数通常需要一定的经验和实验基础。\n\n```python\n## Note: Rerunning this cell uses the same model variables\n\n# keep results for plotting\ntrain_loss_results = []\ntrain_accuracy_results = []\n\nnum_epochs = 201\n\nfor epoch in range(num_epochs):\n  epoch_loss_avg = tf.keras.metrics.Mean()\n  epoch_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()\n\n  # Training loop - using batches of 32\n  for x, y in train_dataset:\n    # Optimize the model\n    loss_value, grads = grad(model, x, y)\n    optimizer.apply_gradients(zip(grads, model.trainable_variables))\n\n    # Track progress\n    epoch_loss_avg(loss_value)  # add current batch loss\n    # compare predicted label to actual label\n    epoch_accuracy(y, model(x))\n\n  # end epoch\n  train_loss_results.append(epoch_loss_avg.result())\n  train_accuracy_results.append(epoch_accuracy.result())\n\n  if epoch % 50 == 0:\n    print(\"Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}\".format(epoch,\n                                                                epoch_loss_avg.result(),\n                                                                epoch_accuracy.result()))\n```\n\n```\n      Epoch 000: Loss: 1.568, Accuracy: 30.000%\n      ...\n      Epoch 200: Loss: 0.049, Accuracy: 97.500%\n```\n\n### 6.4. 
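累积平均指标的纯 Python 示意

训练循环里的 `tf.keras.metrics.Mean` 负责跨批次累积平均损失。其核心行为可以用几行纯 Python 示意（类名为演示假设，并非 Keras 的实现）：

```python
class RunningMean:
    """跨批次累积平均，对应教程中 epoch_loss_avg 的用法。"""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def __call__(self, value):   # 像 epoch_loss_avg(loss_value) 一样调用
        self.total += value
        self.count += 1

    def result(self):
        return self.total / self.count

m = RunningMean()
for batch_loss in [1.5, 1.0, 0.5]:   # 假设一个周期里三个批次的损失
    m(batch_loss)
print(m.result())  # 1.0
```

### 6.4. 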
可视化损失函数随时间推移而变化的情况\n\n虽然输出模型的训练过程有帮助，但查看这一过程往往更有帮助。TensorBoard 是与 TensorFlow 封装在一起的出色可视化工具，不过我们可以使用 matplotlib 模块创建基本图表。\n\n解读这些图表需要一定的经验，不过您确实希望看到损失下降且准确率上升。\n\n```python\nfig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))\nfig.suptitle('Training Metrics')\n\naxes[0].set_ylabel(\"Loss\", fontsize=14)\naxes[0].plot(train_loss_results)\n\naxes[1].set_ylabel(\"Accuracy\", fontsize=14)\naxes[1].set_xlabel(\"Epoch\", fontsize=14)\naxes[1].plot(train_accuracy_results)\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/eager/custom_training_walkthrough_files/output_54_0.png)\n\n## 7. 评估模型的效果\n\n模型已经过训练，现在我们可以获取一些关于其效果的统计信息了。\n\n评估指的是确定模型做出预测的效果。要确定模型在鸢尾花分类方面的效果，请将一些花萼和花瓣测量值传递给模型，并要求模型预测它们所代表的鸢尾花品种。然后，将模型的预测结果与实际标签进行比较。例如，如果模型对一半输入样本的品种预测正确，则准确率为 0.5。图 4 显示的是一个效果更好一些的模型，该模型做出 5 次预测，其中有 4 次正确，准确率为 80%：\n\n<table cellpadding=\"8\" border=\"0\">\n  <colgroup>\n    <col span=\"4\" >\n    <col span=\"1\" bgcolor=\"lightblue\">\n    <col span=\"1\" bgcolor=\"lightgreen\">\n  </colgroup>\n  <tr bgcolor=\"lightgray\">\n    <th colspan=\"4\">样本特征</th>\n    <th colspan=\"1\">标签</th>\n    <th colspan=\"1\" >模型预测</th>\n  </tr>\n  <tr>\n    <td>5.9</td><td>3.0</td><td>4.3</td><td>1.5</td><td align=\"center\">1</td><td align=\"center\">1</td>\n  </tr>\n  <tr>\n    <td>6.9</td><td>3.1</td><td>5.4</td><td>2.1</td><td align=\"center\">2</td><td align=\"center\">2</td>\n  </tr>\n  <tr>\n    <td>5.1</td><td>3.3</td><td>1.7</td><td>0.5</td><td align=\"center\">0</td><td align=\"center\">0</td>\n  </tr>\n  <tr>\n    <td>6.0</td> <td>3.4</td> <td>4.5</td> <td>1.6</td> <td align=\"center\">1</td><td align=\"center\" bgcolor=\"red\">2</td>\n  </tr>\n  <tr>\n    <td>5.5</td><td>2.5</td><td>4.0</td><td>1.3</td><td align=\"center\">1</td><td align=\"center\">1</td>\n  </tr>\n  <tr><td align=\"center\" colspan=\"6\">\n    <b>图4.</b> 准确率为 80% 的鸢尾花分类器。<br/>&nbsp;\n  </td></tr>\n</table>\n\n### 7.1. 
设置测试数据集\n\n评估模型与训练模型相似。最大的区别在于，样本来自一个单独的测试集，而不是训练集。为了公正地评估模型的效果，用于评估模型的样本务必与用于训练模型的样本不同。\n\n测试 Dataset 的设置与训练 Dataset 的设置相似。下载 CSV 文本文件并解析相应的值，然后对数据稍加随机化处理：\n\n```python\ntest_url = \"https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv\"\n\ntest_fp = tf.keras.utils.get_file(fname=os.path.basename(test_url),\n                                  origin=test_url)\n```\n\n\n```python\ntest_dataset = tf.data.experimental.make_csv_dataset(\n    test_fp,\n    batch_size,\n    column_names=column_names,\n    label_name='species',\n    num_epochs=1,\n    shuffle=False)\n\ntest_dataset = test_dataset.map(pack_features_vector)\n```\n\n### 7.2. 根据测试数据集评估模型\n\n与训练阶段不同，模型仅评估测试数据的一个周期。在以下代码单元格中，我们会遍历测试集中的每个样本，然后将模型的预测结果与实际标签进行比较。这是为了衡量模型在整个测试集中的准确率。\n\n```python\ntest_accuracy = tf.keras.metrics.Accuracy()\n\nfor (x, y) in test_dataset:\n  logits = model(x)\n  prediction = tf.argmax(logits, axis=1, output_type=tf.int32)\n  test_accuracy(prediction, y)\n\nprint(\"Test set accuracy: {:.3%}\".format(test_accuracy.result()))\n```\n```\n      Test set accuracy: 96.667%\n```\n\n例如，我们可以看到对于最后一批数据，该模型通常预测正确：\n\n```python\ntf.stack([y,prediction],axis=1)\n```\n\n```\n      <tf.Tensor: id=164408, shape=(30, 2), dtype=int32, numpy= \n      array([[1, 1], \n             [2, 2], \n             [0, 0],..., dtype=int32)>\n```\n\n## 8. 
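准确率计算的纯 Python 示意

`tf.keras.metrics.Accuracy` 做的事情就是逐个比较预测与标签并求正确比例。用纯 Python 对照图 4 的例子（5 次预测、4 次正确）：

```python
def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# 图 4 的例子：第 4 个样本标签为 1，但模型预测为 2
print(accuracy([1, 2, 0, 2, 1], [1, 2, 0, 1, 1]))  # 0.8
```

## 8. 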
使用经过训练的模型进行预测\n\n我们已经训练了一个模型并“证明”它是有效的，但在对鸢尾花品种进行分类方面，这还不够。现在，我们使用经过训练的模型对无标签样本（即包含特征但不包含标签的样本）进行一些预测。\n\n在现实生活中，无标签样本可能来自很多不同的来源，包括应用程序、CSV 文件和数据源。暂时我们将手动提供三个无标签样本以预测其标签。回想一下，标签编号会映射到一个指定的表示法：\n\n* `0`：山鸢尾\n* `1`：变色鸢尾\n* `2`：维吉尼亚鸢尾\n\n```python\npredict_dataset = tf.convert_to_tensor([\n    [5.1, 3.3, 1.7, 0.5,],\n    [5.9, 3.0, 4.2, 1.5,],\n    [6.9, 3.1, 5.4, 2.1]\n])\n\npredictions = model(predict_dataset)\n\nfor i, logits in enumerate(predictions):\n  class_idx = tf.argmax(logits).numpy()\n  p = tf.nn.softmax(logits)[class_idx]\n  name = class_names[class_idx]\n  print(\"Example {} prediction: {} ({:4.1f}%)\".format(i, name, 100*p))\n```\n\n```\n      Example 0 prediction: Iris setosa (100.0%)\n      Example 1 prediction: Iris versicolor (100.0%) \n      Example 2 prediction: Iris virginica (99.5%)\n```\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training_walkthrough.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-custom_training_walkthrough.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/custom_training_walkthrough](https://tensorflow.google.cn/beta/tutorials/eager/custom_training_walkthrough)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training_walkthrough.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/custom_training_walkthrough.md)\n"
  },
  {
    "path": "r2/tutorials/eager/tf_function.md",
    "content": "---\ntitle: tf.function和AutoGraph\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1956\nabbrlink: tensorflow/tf2-tutorials-eager-tf_function\n---\n\n# tf.function和 AutoGraph 提高性能和可部署性 (tensorflow2.0官方教程翻译）\n\n在TensorFlow 2.0中，默认情况下会打开eager execution，这为您提供了一个非常直观和灵活的用户界面（运行一次性操作更容易，更快）但这可能会牺牲性能和可部署性。\n\n为了获得最佳性能并使您的模型可以在任何地方部署，我们提供了 `tf.function` 作为您可以用来从程序中生成图的工具。多亏了AutoGraph，大量的Python代码可以与tf.function一起工作。但仍有一些陷阱需要警惕。\n\n主要的要点和建议是：\n\n- 不要依赖Python副作用，如对象变异或列表追加。\n\n- tf.function最适合TF操作，而不是NumPy操作或Python原语。\n\n- 如果有疑问，`for x in y` 习语可能会有效。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\nimport contextlib\n\n# Some helper code to demonstrate the kinds of errors you might encounter.\n@contextlib.contextmanager\ndef assert_raises(error_class):\n  try:\n    yield\n  except error_class as e:\n    print('Caught expected exception \\n  {}: {}'.format(error_class, e))\n  except Exception as e:\n    print('Got unexpected exception \\n  {}: {}'.format(type(e), e))\n  else:\n    raise Exception('Expected {} to be raised but no error was raised!'.format(\n        error_class))\n```\n\n你定义的 `tf.function` 就像一个核心TensorFlow操作：你可以急切地执行它，你可以在图中使用它，它有梯度等。\n\n```python\n# A function is like an op\n\n@tf.function\ndef add(a, b):\n  return a + b\n\nadd(tf.ones([2, 2]), tf.ones([2, 2]))  #  [[2., 2.], [2., 2.]]\n```\n\n输出：\n\n```output\n      <tf.Tensor: id=14, shape=(2, 2), dtype=float32, numpy= array([[2., 2.], [2., 2.]], dtype=float32)>\n```\n\n代码\n```python\n# Functions have gradients\n\n@tf.function\ndef add(a, b):\n  return a + b\n\nv = tf.Variable(1.0)\nwith tf.GradientTape() as tape:\n  result = add(v, 1.0)\ntape.gradient(result, v)\n```\n\n输出：\n\n```output\n      <tf.Tensor: id=40, shape=(), dtype=float32, numpy=1.0>\n```\n\n代码\n```python\n# You can use functions inside functions\n\n@tf.function\ndef dense_layer(x, w, b):\n  return add(tf.matmul(x, w), 
b)\n\ndense_layer(tf.ones([3, 2]), tf.ones([2, 2]), tf.ones([2]))\n```\n\n输出：\n\n```output\n  <tf.Tensor: id=67, shape=(3, 2), dtype=float32, numpy= array([[3., 3.], [3., 3.], [3., 3.]], dtype=float32)>\n```\n\n\n## 1. 追踪和多态性\n\nPython的动态类型意味着您可以使用各种参数类型调用函数，Python将在每个场景中执行不同的操作。\n另一方面，TensorFlow图需要静态dtypes和形状尺寸。`tf.function` 通过在必要时回溯函数生成正确的图来弥补这一差距。`tf.function` 使用的大多数微妙之处源于这种回溯行为。\n\n您可以使用不同类型的参数调用函数来查看正在发生的事情。\n\n```python\n# Functions are polymorphic\n\n@tf.function\ndef double(a):\n  print(\"Tracing with\", a)\n  return a + a\n\nprint(double(tf.constant(1)))\nprint()\nprint(double(tf.constant(1.1)))\nprint()\nprint(double(tf.constant(\"a\")))\nprint()\n\n```\n\n输出：\n\n```output\n      Tracing with Tensor(\"a:0\", shape=(), dtype=int32) tf.Tensor(2, shape=(), dtype=int32) \n      Tracing with Tensor(\"a:0\", shape=(), dtype=float32) tf.Tensor(2.2, shape=(), dtype=float32) \n      Tracing with Tensor(\"a:0\", shape=(), dtype=string) tf.Tensor(b'aa', shape=(), dtype=string)\n```\n\n要控制跟踪行为，请使用以下技术：\n\n- 创建一个新的`tf.function`：保证单独的`tf.function`对象不共享跟踪。\n\n- 使用`get_concrete_function`方法获取特定的跟踪 \n\n- 调用`tf.function`时指定`input_signature`以确保只构建一个函数图\n\n\n```python\nprint(\"Obtaining concrete trace\")\ndouble_strings = double.get_concrete_function(tf.TensorSpec(shape=None, dtype=tf.string))\nprint(\"Executing traced function\")\nprint(double_strings(tf.constant(\"a\")))\nprint(double_strings(a=tf.constant(\"b\")))\nprint(\"Using a concrete trace with incompatible types will throw an error\")\nwith assert_raises(tf.errors.InvalidArgumentError):\n  double_strings(tf.constant(1))\n```\n\n```python\n@tf.function(input_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),))\ndef next_collatz(x):\n  print(\"Tracing with\", x)\n  return tf.where(tf.equal(x % 2, 0), x // 2, 3 * x + 1)\n\nprint(next_collatz(tf.constant([1, 2])))\n# We specified a 1-D tensor in the input signature, so this should fail.\nwith assert_raises(ValueError):\n  next_collatz(tf.constant([[1, 2], [3, 
4]]))\n\n```\n\n## 2. 什么时候回溯？\n\n多态 `tf.function` 会维护一个由追踪生成的具体函数的缓存。缓存键实际上是从函数 args 和 kwargs 生成的键的元组：为 `tf.Tensor` 参数生成的键是它的形状和类型；为 Python 原语生成的键是它的值；对于所有其他 Python 类型，键基于对象的 `id()`，以便为每个类的实例独立跟踪方法。将来，TensorFlow 可能会为那些可以安全转换为张量的 Python 对象添加更复杂的缓存。\n\n## 3. Python还是Tensor args？\n\n通常，Python参数用于控制超参数和图构造，例如 `num_layers=10`、`training=True` 或 `nonlinearity='relu'`。因此，如果Python参数发生变化，重新回溯计算图是合理的。\n\n但是，Python参数也可能并不用于控制图构造。在这些情况下，Python值的变化可能会触发不必要的回溯。举例来说，下面这个训练循环会被AutoGraph动态展开。尽管存在多个跟踪，但生成的图实际上是相同的，因此这有点低效。\n\n```python\ndef train_one_step():\n  pass\n\n@tf.function\ndef train(num_steps):\n  print(\"Tracing with num_steps = {}\".format(num_steps))\n  for _ in tf.range(num_steps):\n    train_one_step()\n\ntrain(num_steps=10)\ntrain(num_steps=20)\n```\n\n输出：\n\n```output\n      Tracing with num_steps = 10\n      Tracing with num_steps = 20\n```\n\n如果这些参数不影响生成的图的形状，简单的解决方法是将它们转换为张量。\n\n```python\ntrain(num_steps=tf.constant(10))\ntrain(num_steps=tf.constant(20))\n```\n\n输出：\n\n```output\n      Tracing with num_steps = Tensor(\"num_steps:0\", shape=(), dtype=int32)\n```\n\n## 4. 
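追踪缓存键的纯 Python 示意

上述缓存键规则可以用纯 Python 模拟：Python 原语按“值”作键，张量按 (shape, dtype) 作键，只有缓存未命中才重新“追踪”。下面的 `fake_tensor`、`traced_call` 等名字均为演示假设，并非 `tf.function` 的真实实现：

```python
trace_count = 0
_trace_cache = {}

def fake_tensor(shape, dtype):
    return {'shape': shape, 'dtype': dtype}       # 用字典模拟张量元数据

def cache_key(arg):
    if isinstance(arg, dict):                     # “张量”：按形状和类型
        return ('tensor', arg['shape'], arg['dtype'])
    return ('python', arg)                        # Python 原语：按值

def traced_call(arg):
    global trace_count
    key = cache_key(arg)
    if key not in _trace_cache:
        trace_count += 1                          # 缓存未命中，重新“追踪”
        _trace_cache[key] = 'graph for {}'.format(key)
    return _trace_cache[key]

traced_call(10)                             # Python 值 10：追踪
traced_call(20)                             # 值变了：再追踪
traced_call(fake_tensor((2,), 'int32'))     # 张量：追踪
traced_call(fake_tensor((2,), 'int32'))     # 形状、类型相同：复用
print(trace_count)  # 3
```

## 4. 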
`tf.function`中的副作用\n\n> “副作用” 指在满足主要功能（主作用？）的同时，顺便完成了一些其他的副要功能”，也可翻译为“附作用”\n\n通常，Python附作用（如打印或变异对象）仅在跟踪期间发生。那你如何可靠地触发`tf.function`的附作用呢？\n\n一般的经验法则是仅使用Python副作用来调试跟踪。另外，TensorFlow操作如`tf.Variable.assign`，`tf.print`和`tf.summary`是确保TensorFlow运行时，在每次调用时，跟踪和执行代码的最佳方法。通常使用函数样式将产生最佳效果。\n\n```python\n@tf.function\ndef f(x):\n  print(\"Traced with\", x)\n  tf.print(\"Executed with\", x)\n\nf(1)\nf(1)\nf(2)\n\n```\n\n输出：\n\n```output\n  Traced with 1 Executed with 1 Executed with 1 \n  Traced with 2 Executed with 2\n```\n\n如果你想在每次调用 `tf.function` 期间执行Python代码，`tf.py_function`就是一个退出舱口。`tf.py_function`的缺点是它不可移植或特别高效，也不能在分布式（多GPU，TPU）设置中很好地工作。此外，由于必须将`tf.py_function`连接到图中，它会将所有输入/输出转换为张量。\n\n```python\nexternal_list = []\n\ndef side_effect(x):\n  print('Python side effect')\n  external_list.append(x)\n\n@tf.function\ndef f(x):\n  tf.py_function(side_effect, inp=[x], Tout=[])\n\nf(1)\nf(1)\nf(1)\nassert len(external_list) == 3\n# .numpy() call required because py_function casts 1 to tf.constant(1)\nassert external_list[0].numpy() == 1\n\n```\n\n## 5. 
谨防Python状态\n\n许多 Python 语言特性（如生成器和迭代器）依赖于Python运行时来跟踪状态。通常，虽然这些构造在Eager模式下按预期工作，但由于跟踪行为，在`tf.function`中会发生许多意外情况。\n\n举一个例子，推进迭代器状态是一个Python副作用，因此只在跟踪期间发生。\n\n```python\nexternal_var = tf.Variable(0)\n@tf.function\ndef buggy_consume_next(iterator):\n  external_var.assign_add(next(iterator))\n  tf.print(\"Value of external_var:\", external_var)\n\niterator = iter([0, 1, 2, 3])\nbuggy_consume_next(iterator)\n# This reuses the first value from the iterator, rather than consuming the next value.\nbuggy_consume_next(iterator)\nbuggy_consume_next(iterator)\n\n```\n\n如果迭代器完全是在tf.function内部生成并消费的，那么它应该可以正常工作。但是，整个迭代器可能会被追踪进图，从而产生一个巨大的图。这也许正是你想要的。但是如果你是在一个以Python列表表示的大型内存数据集上训练，那么这会生成一个非常大的图，并且`tf.function`不太可能带来加速。\n\n如果你想迭代Python数据，最安全的方法是将它包装在tf.data.Dataset中并使用`for x in y`惯用法。当`y`是张量或tf.data.Dataset时，AutoGraph特别支持安全地转换`for`循环。\n\n```python\ndef measure_graph_size(f, *args):\n  g = f.get_concrete_function(*args).graph\n  print(\"{}({}) contains {} nodes in its graph\".format(\n      f.__name__, ', '.join(map(str, args)), len(g.as_graph_def().node)))\n\n@tf.function\ndef train(dataset):\n  loss = tf.constant(0)\n  for x, y in dataset:\n    loss += tf.abs(y - x) # Some dummy computation.\n  return loss\n\nsmall_data = [(1, 1)] * 2\nbig_data = [(1, 1)] * 10\nmeasure_graph_size(train, small_data)\nmeasure_graph_size(train, big_data)\n\nmeasure_graph_size(train, tf.data.Dataset.from_generator(\n    lambda: small_data, (tf.int32, tf.int32)))\nmeasure_graph_size(train, tf.data.Dataset.from_generator(\n    lambda: big_data, (tf.int32, tf.int32)))\n```\n\n将 Python/NumPy 数据包装成 Dataset 时，请留意 `tf.data.Dataset.from_generator` 与 `tf.data.Dataset.from_tensors` 的区别：前者将数据保留在Python中并通过 `tf.py_function` 获取，这可能会影响性能；后者会将数据的副本捆绑为图中的一个大的 `tf.constant()` 节点，这可能带来内存方面的开销。\n\n通过 TFRecordDataset/CsvDataset 等从文件中读取数据，是最高效的数据消费方式，因为TensorFlow本身可以管理数据的异步加载和预取，而不必涉及Python。\n\n## 6. 
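追踪期迭代器的纯 Python 示意

“迭代器只在追踪期间前进”这一点，可以用“先调用一次并记录结果、之后只重放”的纯 Python 模型来体会（`make_traced` 为演示假设，并非 `tf.function` 的机制本身）：

```python
def make_traced(iterator):
    captured = next(iterator)   # “追踪”阶段：Python 副作用只发生这一次
    def run():
        return captured         # 之后每次“执行”都重放追踪时捕获的值
    return run

it = iter([10, 20, 30])
f = make_traced(it)
print(f(), f(), f())  # 10 10 10
print(next(it))       # 20：迭代器总共只被消费了一个元素
```

## 6. 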
自动控制依赖项\n\n在一般数据流图上，作为编程模型的函数，一个非常吸引人的特性是函数可以为运行时提供有关代码预期行为的更多信息。\n\n例如，当编写具有多个读取和写入相同变量的代码时，数据流图可能不会自然地编码最初预期的操作顺序。在`tf.function`中，我们通过引用原始Python代码中语句的执行顺序来解决执行顺序中的歧义。这样，`tf.function` 中的有状态操作的排序复制了Eager模式的语义。\n\n这意味着不需要添加手动控制依赖项;`tf.function`非常智能，可以为代码添加最小的必要和足够的控制依赖关系，以便正确运行。\n\n```python\n# Automatic control dependencies\n\na = tf.Variable(1.0)\nb = tf.Variable(2.0)\n\n@tf.function\ndef f(x, y):\n  a.assign(y * b)\n  b.assign_add(x * a)\n  return a + b\n\nf(1.0, 2.0)  # 10.0\n\n```\n\n输出：\n\n```output\n      <tf.Tensor: id=466, shape=(), dtype=float32, numpy=10.0>\n```\n\n## 7. 变量\n\n我们可以使用相同的想法来利用代码的预期执行顺序，以便在`tf.function`中非常容易地创建和使用变量。但是有一个非常重要的警告，即使用变量，可以编写在急切模式和图形模式下表现不同的代码。\n\n具体来说，每次调用创建一个新变量时都会发生这种情况。由于跟踪语义，`tf.function`将在每次调用时重用相同的变量，但是eager模式将在每次调用时创建一个新变量。为了防止这个错误，`tf.function`会在检测到危险变量创建行为时引发错误。\n\n```python\n@tf.function\ndef f(x):\n  v = tf.Variable(1.0)\n  v.assign_add(x)\n  return v\n\nwith assert_raises(ValueError):\n  f(1.0)\n```\n\n输出：\n\n```output\n      Caught expected exception <class 'ValueError'>: tf.function-decorated function tried to create variables on non-first call.\n```\n\n```python\n# Non-ambiguous code is ok though\n\nv = tf.Variable(1.0)\n\n@tf.function\ndef f(x):\n  return v.assign_add(x)\n\nprint(f(1.0))  # 2.0\nprint(f(2.0))  # 4.0\n\n```\n\n输出：\n\n```output\n      tf.Tensor(2.0, shape=(), dtype=float32) \n      tf.Tensor(4.0, shape=(), dtype=float32)\n```\n\n\n```python\n# You can also create variables inside a tf.function as long as we can prove\n# that those variables are created only the first time the function is executed.\n\nclass C: pass\nobj = C(); obj.v = None\n\n@tf.function\ndef g(x):\n  if obj.v is None:\n    obj.v = tf.Variable(1.0)\n  return obj.v.assign_add(x)\n\nprint(g(1.0))  # 2.0\nprint(g(2.0))  # 4.0\n```\n\n输出：\n\n```output\n      tf.Tensor(2.0, shape=(), dtype=float32) \n      tf.Tensor(4.0, shape=(), dtype=float32)\n```\n\n```python\n# Variable initializers can depend on function arguments and on values 
of other\n# variables. We can figure out the right initialization order using the same\n# method we use to generate control dependencies.\n\nstate = []\n@tf.function\ndef fn(x):\n  if not state:\n    state.append(tf.Variable(2.0 * x))\n    state.append(tf.Variable(state[0] * 3.0))\n  return state[0] * x * state[1]\n\nprint(fn(tf.constant(1.0)))\nprint(fn(tf.constant(3.0)))\n\n```\n\n输出：\n\n```output\n      tf.Tensor(12.0, shape=(), dtype=float32) \n      tf.Tensor(36.0, shape=(), dtype=float32)\n```\n\n# 使用 AutoGraph\n\n[autograph](https://www.tensorflow.org/guide/autograph) 库与`tf.function`完全集成，它将重写依赖于Tensors的条件和循环，以便在图中动态运行。\n\n`tf.cond`和`tf.while_loop`继续使用`tf.function`，但是当以命令式方式编写时，具有控制流的代码通常更容易编写和理解。\n\n```python\n# Simple loop\n\n@tf.function\ndef f(x):\n  while tf.reduce_sum(x) > 1:\n    tf.print(x)\n    x = tf.tanh(x)\n  return x\n\nf(tf.random.uniform([5]))\n```\n\n\n```python\n# If you're curious you can inspect the code autograph generates.\n# It feels like reading assembly language, though.\n\ndef f(x):\n  while tf.reduce_sum(x) > 1:\n    tf.print(x)\n    x = tf.tanh(x)\n  return x\n\nprint(tf.autograph.to_code(f))\n```\n\n## 8. 
AutoGraph：条件\n\nAutoGraph会将`if`语句转换为等效的`tf.cond`调用。\n如果条件是Tensor，则进行此替换。否则，在跟踪期间执行条件。\n\n```python\ndef test_tf_cond(f, *args):\n  g = f.get_concrete_function(*args).graph\n  if any(node.name == 'cond' for node in g.as_graph_def().node):\n    print(\"{}({}) uses tf.cond.\".format(\n        f.__name__, ', '.join(map(str, args))))\n  else:\n    print(\"{}({}) executes normally.\".format(\n        f.__name__, ', '.join(map(str, args))))\n\n```\n\n\n```python\n@tf.function\ndef hyperparam_cond(x, training=True):\n  if training:\n    x = tf.nn.dropout(x, rate=0.5)\n  return x\n\n@tf.function\ndef maybe_tensor_cond(x):\n  if x < 0:\n    x = -x\n  return x\n\ntest_tf_cond(hyperparam_cond, tf.ones([1], dtype=tf.float32))\ntest_tf_cond(maybe_tensor_cond, tf.constant(-1))\ntest_tf_cond(maybe_tensor_cond, -1)\n\n```\n\n`tf.cond`有许多微妙之处。\n\n- 它的工作原理是跟踪条件的两边，然后根据条件在运行时选择适当的分支。跟踪双方可能导致意外执行Python代码\n\n- 它要求如果一个分支创建下游使用的张量，另一个分支也必须创建该张量。\n\n```python\n@tf.function\ndef f():\n  x = tf.constant(0)\n  if tf.constant(True):\n    x = x + 1\n    print(\"Tracing `then` branch\")\n  else:\n    x = x - 1\n    print(\"Tracing `else` branch\")\n  return x\n\nf()\n```\n\n\n```python\n@tf.function\ndef f():\n  if tf.constant(True):\n    x = tf.ones([3, 3])\n  return x\n\n# Throws an error because both branches need to define `x`.\nwith assert_raises(ValueError):\n  f()\n```\n\n## 9. 
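双分支追踪的纯 Python 示意

“跟踪双方可能导致意外执行Python代码”这一点，可以用一个纯 Python 的 `my_cond` 来体会（演示假设，并非 `tf.cond` 的实现）：“构图”时两个分支函数都会被调用一次，运行时才按谓词选择结果。

```python
traced = []

def my_cond(pred, true_fn, false_fn):
    true_result = true_fn()     # “构图”时追踪 then 分支
    traced.append('then')
    false_result = false_fn()   # “构图”时也追踪 else 分支
    traced.append('else')
    return true_result if pred else false_result

result = my_cond(True, lambda: 'x + 1', lambda: 'x - 1')
print(result)  # x + 1
print(traced)  # ['then', 'else']，两个分支的 Python 代码都执行过
```

## 9. 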
AutoGraph和循环\n\nAutoGraph有一些简单的转换循环规则。\n\n- `for`: 如果iterable是张量，则转换\n\n- `while`: 如果while条件取决于张量，则转换\n\n如果转换了循环，它将使用`tf.while_loop`动态展开，或者在 `for x in tf.data.Dataset` 的特殊情况下，转换为 `tf.data.Dataset.reduce`。\n\n如果未转换循环，则将静态展开。\n\n```python\ndef test_dynamically_unrolled(f, *args):\n  g = f.get_concrete_function(*args).graph\n  if any(node.name == 'while' for node in g.as_graph_def().node):\n    print(\"{}({}) uses tf.while_loop.\".format(\n        f.__name__, ', '.join(map(str, args))))\n  elif any(node.name == 'ReduceDataset' for node in g.as_graph_def().node):\n    print(\"{}({}) uses tf.data.Dataset.reduce.\".format(\n        f.__name__, ', '.join(map(str, args))))\n  else:\n    print(\"{}({}) gets unrolled.\".format(\n        f.__name__, ', '.join(map(str, args))))\n\n```\n\n\n```python\n@tf.function\ndef for_in_range():\n  x = 0\n  for i in range(5):\n    x += i\n  return x\n\n@tf.function\ndef for_in_tfrange():\n  x = tf.constant(0, dtype=tf.int32)\n  for i in tf.range(5):\n    x += i\n  return x\n\n@tf.function\ndef for_in_tfdataset():\n  x = tf.constant(0, dtype=tf.int64)\n  for i in tf.data.Dataset.range(5):\n    x += i\n  return x\n\ntest_dynamically_unrolled(for_in_range)\ntest_dynamically_unrolled(for_in_tfrange)\ntest_dynamically_unrolled(for_in_tfdataset)\n\n```\n\n输出：\n\n```output\n      for_in_range() gets unrolled. \n      for_in_tfrange() uses tf.while_loop. \n      for_in_tfdataset() uses tf.data.Dataset.reduce.\n```\n\n```python\n@tf.function\ndef while_py_cond():\n  x = 5\n  while x > 0:\n    x -= 1\n  return x\n\n@tf.function\ndef while_tf_cond():\n  x = tf.constant(5)\n  while x > 0:\n    x -= 1\n  return x\n\ntest_dynamically_unrolled(while_py_cond)\ntest_dynamically_unrolled(while_tf_cond)\n```\n\n输出：\n\n```output\n      while_py_cond() gets unrolled. 
\n      while_tf_cond() uses tf.while_loop.\n```\n\n如果你有一个取决于张量的`break`或早期`return`子句，那么顶级条件或者iterable也应该是一个张量。\n\n```python\n@tf.function\ndef buggy_while_py_true_tf_break(x):\n  while True:\n    if tf.equal(x, 0):\n      break\n    x -= 1\n  return x\n\n@tf.function\ndef while_tf_true_tf_break(x):\n  while tf.constant(True):\n    if tf.equal(x, 0):\n      break\n    x -= 1\n  return x\n\nwith assert_raises(TypeError):\n  test_dynamically_unrolled(buggy_while_py_true_tf_break, 5)\ntest_dynamically_unrolled(while_tf_true_tf_break, 5)\n\n@tf.function\ndef buggy_py_for_tf_break():\n  x = 0\n  for i in range(5):\n    if tf.equal(i, 3):\n      break\n    x += i\n  return x\n\n@tf.function\ndef tf_for_tf_break():\n  x = 0\n  for i in tf.range(5):\n    if tf.equal(i, 3):\n      break\n    x += i\n  return x\n\nwith assert_raises(TypeError):\n  test_dynamically_unrolled(buggy_py_for_tf_break)\ntest_dynamically_unrolled(tf_for_tf_break)\n\n\n\n```\n\n为了累积动态展开循环的结果，你需要使用`tf.TensorArray`。\n\n```python\nbatch_size = 2\nseq_len = 3\nfeature_size = 4\n\ndef rnn_step(inp, state):\n  return inp + state\n\n@tf.function\ndef dynamic_rnn(rnn_step, input_data, initial_state):\n  # [batch, time, features] -> [time, batch, features]\n  input_data = tf.transpose(input_data, [1, 0, 2])\n  max_seq_len = input_data.shape[0]\n\n  states = tf.TensorArray(tf.float32, size=max_seq_len)\n  state = initial_state\n  for i in tf.range(max_seq_len):\n    state = rnn_step(input_data[i], state)\n    states = states.write(i, state)\n  return tf.transpose(states.stack(), [1, 0, 2])\n  \ndynamic_rnn(rnn_step,\n            tf.random.uniform([batch_size, seq_len, feature_size]),\n            tf.zeros([batch_size, feature_size]))\n```\n\n与`tf.cond`一样，`tf.while_loop`也带有许多细微之处。\n\n- 由于循环可以执行0次，因此必须在循环上方初始化在while_loop下游使用的所有张量 \n\n- 所有循环变量的shape/dtypes必须与每次迭代保持一致\n\n```python\n@tf.function\ndef buggy_loop_var_uninitialized():\n  for i in tf.range(3):\n    x = i\n  return x\n\n@tf.function\ndef f():\n  x = 
tf.constant(0)\n  for i in tf.range(3):\n    x = i\n  return x\n\nwith assert_raises(ValueError):\n  buggy_loop_var_uninitialized()\nf()\n```\n\n\n```python\n@tf.function\ndef buggy_loop_type_changes():\n  x = tf.constant(0, dtype=tf.float32)\n  for i in tf.range(3): # Yields tensors of type tf.int32...\n    x = i\n  return x\n\nwith assert_raises(tf.errors.InvalidArgumentError):\n  buggy_loop_type_changes()\n```\n\n\n```python\n@tf.function\ndef buggy_concat():\n  x = tf.ones([0, 10])\n  for i in tf.range(5):\n    x = tf.concat([x, tf.ones([1, 10])], axis=0)\n  return x\n\nwith assert_raises(ValueError):\n  buggy_concat()\n  \n@tf.function\ndef concat_with_padding():\n  x = tf.zeros([5, 10])\n  for i in tf.range(5):\n    x = tf.concat([x[:i], tf.ones([1, 10]), tf.zeros([4-i, 10])], axis=0)\n    x.set_shape([5, 10])\n  return x\n\nconcat_with_padding()\n\n```\n\n## 10. 下一步\n\n现在重新访问早期的教程并尝试使用 `tf.function` 加速代码！\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-tf_function.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-eager-tf_function.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/eager/tf_function](https://tensorflow.google.cn/beta/tutorials/eager/tf_function)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/tf_function.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/eager/tf_function.md)\n"
  },
  {
    "path": "r2/tutorials/estimators/linear.md",
    "content": "---\ntitle: 使用 Estimator 构建线性模型\ntags: \n    - tensorflow2.0\ncategories: \n    - tensorflow2官方教程\ntop: 1929\nabbrlink: tensorflow/tf2-tutorials-estimators-linear\n---\n\n# 使用 Estimator 构建线性模型\n\n## 1. 概述\n\n这个端到端的演练使用`tf.estimator` API训练逻辑回归模型。该模型通常用作其他更复杂算法的基准。\nEstimator 是可扩展性最强且面向生产的 TensorFlow 模型类型。如需了解详情，请参阅 [Estimator 指南](https://www.tensorflow.org/guide/estimators)。\n\n## 2. 安装和导入\n\n安装sklearn命令:  `pip install sklearn`\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport os\nimport sys\n\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom IPython.display import clear_output\nfrom six.moves import urllib\n```\n\n## 3. 加载泰坦尼克号数据集\n\n您将使用泰坦尼克数据集，其以预测乘客的生存(相当病态)为目标，给出性别、年龄、阶级等特征。\n\n```python\nimport tensorflow.compat.v2.feature_column as fc\n\nimport tensorflow as tf\n\n# 加载数据集\ndftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')\ndfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')\ny_train = dftrain.pop('survived')\ny_eval = dfeval.pop('survived')\n```\n\n## 4. 
探索数据\n\n数据集包含以下特征：\n\n```python\ndftrain.head()\n```\n\n|   | sex    | age  | n_siblings_spouses | parch | fare    | class | deck    | embark_town | alone |\n|---|--------|------|--------------------|-------|---------|-------|---------|-------------|-------|\n| 0 | male   | 22.0 | 1                  | 0     | 7.2500  | Third | unknown | Southampton | n     |\n| 1 | female | 38.0 | 1                  | 0     | 71.2833 | First | C       | Cherbourg   | n     |\n| 2 | female | 26.0 | 0                  | 0     | 7.9250  | Third | unknown | Southampton | y     |\n| 3 | female | 35.0 | 1                  | 0     | 53.1000 | First | C       | Southampton | n     |\n| 4 | male   | 28.0 | 0                  | 0     | 8.4583  | Third | unknown | Queenstown  | y     |\n\n\n```python\ndftrain.describe()\n```\n\n|       | age        | n_siblings_spouses | parch      | fare       |\n|-------|------------|--------------------|------------|------------|\n| count | 627.000000 | 627.000000         | 627.000000 | 627.000000 |\n| mean  | 29.631308  | 0.545455           | 0.379585   | 34.385399  |\n| std   | 12.511818  | 1.151090           | 0.792999   | 54.597730  |\n| min   | 0.750000   | 0.000000           | 0.000000   | 0.000000   |\n| 25%   | 23.000000  | 0.000000           | 0.000000   | 7.895800   |\n| 50%   | 28.000000  | 0.000000           | 0.000000   | 15.045800  |\n| 75%   | 35.000000  | 1.000000           | 0.000000   | 31.387500  |\n| max   | 80.000000  | 8.000000           | 5.000000   | 512.329200 |\n\n\n训练和评估集分别有627和264个样本数据：\n\n```python\ndftrain.shape[0], dfeval.shape[0]\n```\n\n```\n      (627, 
264)\n```\n\n大多数乘客的年龄在二三十岁：\n\n```python\ndftrain.age.hist(bins=20)\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_15_1.png)\n\n\n机上的男性乘客大约是女性乘客的两倍。\n\n```python\ndftrain.sex.value_counts().plot(kind='barh')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_17_1.png)\n\n\n大多数乘客属于“第三”等级（Third）：\n\n```python\ndftrain['class'].value_counts().plot(kind='barh')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_19_1.png)\n\n\n与男性相比，女性的生还几率要高得多，这显然会是模型的一个有预测力的特征：\n\n```python\npd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survive')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_21_1.png)\n\n\n## 5. 模型的特征工程\n\nEstimator使用称为[特征列](https://www.tensorflow.org/guide/feature_columns)的系统来描述模型应如何解释每个原始输入特征。Estimator需要一个数值输入向量，而特征列描述了模型应如何转换每个特征。\n\n选择和构造正确的特征列是学习有效模型的关键。特征列可以是原始特征`dict`中的原始输入之一（基本特征列），也可以是通过在一个或多个基本列上定义的转换创建的任何新列（派生特征列）。\n\n线性Estimator同时使用数值特征和分类特征。特征列适用于所有TensorFlow Estimator，它们的目的是定义用于建模的特征。此外，它们还提供了一些特征工程功能，比如独热编码、归一化和分桶。\n\n\n### 5.1. 
基本特征列\n\n```python\nCATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',\n                       'embark_town', 'alone']\nNUMERIC_COLUMNS = ['age', 'fare']\n\nfeature_columns = []\nfor feature_name in CATEGORICAL_COLUMNS:\n  vocabulary = dftrain[feature_name].unique()\n  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))\n\nfor feature_name in NUMERIC_COLUMNS:\n  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))\n```\n\n`input_function`指定如何将数据转换为以流方式提供输入管道的`tf.data.Dataset`。`tf.data.Dataset`采用多种来源，如数据帧DataFrame，csv格式的文件等。\n\n```python\ndef make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):\n  def input_function():\n    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))\n    if shuffle:\n      ds = ds.shuffle(1000)\n    ds = ds.batch(batch_size).repeat(num_epochs)\n    return ds\n  return input_function\n\ntrain_input_fn = make_input_fn(dftrain, y_train)\neval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)\n```\n\n检查数据集：\n\n```python\nds = make_input_fn(dftrain, y_train, batch_size=10)()\nfor feature_batch, label_batch in ds.take(1):\n  print('Some feature keys:', list(feature_batch.keys()))\n  print()\n  print('A batch of class:', feature_batch['class'].numpy())\n  print()\n  print('A batch of Labels:', label_batch.numpy())\n```\n\n您还可以使用`tf.keras.layers.DenseFeatures`层检查特征列的结果：\n\n```python\nage_column = feature_columns[7]\ntf.keras.layers.DenseFeatures([age_column])(feature_batch).numpy()\n```\n\n```\n      array([[38.],\n             [39.],\n             [28.],\n             [28.],\n             [36.],\n             [71.],\n             [24.],\n             [47.],\n             [23.],\n             [28.]], dtype=float32)\n```\n\n`DenseFeatures`只接受密集张量，要检查分类列，需要先将其转换为指示列：\n\n```python\ngender_column = 
feature_columns[0]\ntf.keras.layers.DenseFeatures([tf.feature_column.indicator_column(gender_column)])(feature_batch).numpy()\n```\n\n```\n      array([[0., 1.],\n             [0., 1.],\n             [1., 0.],\n             [0., 1.],\n             [1., 0.],\n             [1., 0.],\n             [1., 0.],\n             [1., 0.],\n             [1., 0.],\n             [0., 1.]], dtype=float32)\n```       \n\n将所有基本特征添加到模型后，让我们来训练模型。使用`tf.estimator` API训练模型只需一条命令：\n\n```python\nlinear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)\nlinear_est.train(train_input_fn)\nresult = linear_est.evaluate(eval_input_fn)\n\nclear_output()\nprint(result)\n```\n\n```\n        {'accuracy_baseline': 0.625, 'auc': 0.83722067, 'accuracy': 0.7462121, 'recall': 0.6666667, 'global_step': 200, 'prediction/mean': 0.38311505, 'average_loss': 0.47361037, 'precision': 0.66, 'auc_precision_recall': 0.7851523, 'loss': 0.46608958, 'label/mean': 0.375}\n```\n\n### 5.2. 派生特征列\n\n现在模型达到了约75％的准确率。单独使用每个基本特征列可能不足以解释数据。例如，年龄和标签之间的相关性可能因性别而异。因此，如果您只为`gender=\"Male\"`和`gender=\"Female\"`分别学习单一的模型权重，您将无法捕捉每一种年龄-性别组合（例如，无法区分`gender=\"Male\"`且`age=\"30\"`与`gender=\"Male\"`且`age=\"40\"`的情况）。\n\n要学习不同特征组合之间的差异，可以将交叉特征列添加到模型中（也可以在交叉之前先对年龄进行分桶）：\n\n```python\nage_x_gender = tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)\n```\n\n将组合特征添加到模型之后，让我们再次训练模型：\n\n```python\nderived_feature_columns = [age_x_gender]\nlinear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns+derived_feature_columns)\nlinear_est.train(train_input_fn)\nresult = linear_est.evaluate(eval_input_fn)\n\nclear_output()\nprint(result)\n```\n\n```\n      {'accuracy_baseline': 0.625, 'auc': 0.8424855, 'accuracy': 0.7689394, 'recall': 0.6060606, 'global_step': 200, 'prediction/mean': 0.30415845, 'average_loss': 0.49316654, 'precision': 0.73170733, 'auc_precision_recall': 0.7732599, 'loss': 0.48306185, 'label/mean': 0.375}\n```      
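为了直观理解上面用到的 `crossed_column`，下面给出一个纯 Python 的简化示意（这只是概念草图，并非 TensorFlow 的真实内部实现，其中的 `crossed_bucket` 是假设的辅助函数）：把若干特征取值拼接成一个键，再哈希到固定数量的桶中，模型就可以为每个桶（即每种特征组合）学习单独的权重。

```python
import hashlib

def crossed_bucket(feature_values, hash_bucket_size=100):
    # 把特征组合拼接成一个键，例如 (30, 'male') -> "30_X_male"
    key = '_X_'.join(str(v) for v in feature_values).encode('utf-8')
    # 用稳定的哈希把键映射到 [0, hash_bucket_size) 中的某个桶
    return int(hashlib.md5(key).hexdigest(), 16) % hash_bucket_size

bucket_a = crossed_bucket([30, 'male'])
bucket_b = crossed_bucket([40, 'male'])
# 相同的组合总是落入同一个桶；不同的组合通常（但不保证）落入不同的桶
print(bucket_a, bucket_b)
```

注意哈希可能发生碰撞（不同组合落入同一个桶），这正是 `hash_bucket_size` 需要权衡的原因：桶越多碰撞越少，但模型参数也越多。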
\n\n现在准确率达到了77.6%，略好于只使用基本特征训练的模型。您可以尝试使用更多特征和转换，看看能否做得更好。\n\n现在，您可以使用训练好的模型对评估集中的乘客进行预测。TensorFlow模型经过优化，可以一次对一批（或一个集合的）样本进行预测。之前的`eval_input_fn`是使用整个评估集定义的。\n\n```python\npred_dicts = list(linear_est.predict(eval_input_fn))\nprobs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])\n\nprobs.plot(kind='hist', bins=20, title='predicted probabilities')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_42_1.png)\n\n最后，查看结果的受试者工作特征曲线（ROC曲线），这将使我们更好地了解真阳性率和假阳性率之间的权衡。\n\n```python\nfrom sklearn.metrics import roc_curve\nfrom matplotlib import pyplot as plt\n\nfpr, tpr, _ = roc_curve(y_eval, probs)\nplt.plot(fpr, tpr)\nplt.title('ROC curve')\nplt.xlabel('false positive rate')\nplt.ylabel('true positive rate')\nplt.xlim(0,)\nplt.ylim(0,)\n```\n\n`(0, 1.05)`\n\n![png](https://tensorflow.google.cn/beta/tutorials/estimators/linear_files/output_44_1.png)\n\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-estimators-linear.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-estimators-linear.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/estimators/linear](https://tensorflow.google.cn/beta/tutorials/estimators/linear)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/estimators/linear.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/estimators/linear.md)"
  },
  {
    "path": "r2/tutorials/images/hub_with_keras.md",
    "content": "---\ntitle: 基于Keras使用TensorFlow Hub实现迁移学习\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1922\nabbrlink: tensorflow/tf2-tutorials-images-hub_with_keras\n---\n\n# 基于Keras使用TensorFlow Hub实现迁移学习(tensorflow2.0官方教程翻译)\n\n[TensorFlow Hub](http://tensorflow.google.cn/hub)是一种共享预训练模型组件的方法。\n\n> TensorFlow Hub是一个用于促进机器学习模型的可重用部分的发布，探索和使用的库。特别是，它提供经过预先训练的TensorFlow模型，可以在新任务中重复使用。（可以理解为做迁移学习：可以使用较小的数据集训练模型，可以改善泛化和加快训练。）GitHub 地址：[https://github.com/tensorflow/hub](https://github.com/tensorflow/hub)\n\n有关预先训练模型的可搜索列表，请参阅[TensorFlow模块中心TensorFlow Module Hub](https://tfhub.dev/)。\n\n本教程演示：\n1. 如何在tf.keras中使用TensorFlow Hub。\n2. 如何使用TensorFlow Hub进行图像分类。\n3. 如何做简单的迁移学习。\n\n## 1. 安装和导入包\n\n安装命令：`pip install -U tensorflow_hub`\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport matplotlib.pylab as plt\n\nimport tensorflow as tf\n \nimport tensorflow_hub as hub\n\nfrom tensorflow.keras import layers\n```\n\n## 2. ImageNet分类器\n\n### 2.1. 下载分类器\n\n使用`hub.module`加载mobilenet，并使用`tf.keras.layers.Lambda`将其包装为keras层。\n来自tfhub.dev的任何兼容tf2的[图像分类器URL](https://tfhub.dev/s?q=tf2&module-type=image-classification)都可以在这里工作。\n\n```python\nclassifier_url =\"https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification/2\" #@param {type:\"string\"}\n\nIMAGE_SHAPE = (224, 224)\n\nclassifier = tf.keras.Sequential([\n    hub.KerasLayer(classifier_url, input_shape=IMAGE_SHAPE+(3,))\n])\n```\n\n### 2.2. 
在单个图像上运行它\n\n下载单个图像以试用该模型。\n\n```python\nimport numpy as np\nimport PIL.Image as Image\n\ngrace_hopper = tf.keras.utils.get_file('image.jpg','https://storage.googleapis.com/download.tensorflow.org/example_images/grace_hopper.jpg')\ngrace_hopper = Image.open(grace_hopper).resize(IMAGE_SHAPE)\ngrace_hopper = np.array(grace_hopper)/255.0\ngrace_hopper.shape\n```\n`(224, 224, 3)`\n\n添加批量维度，并将图像传递给模型。\n\n```python\nresult = classifier.predict(grace_hopper[np.newaxis, ...])\nresult.shape\n```\n\n结果是1001元素向量的`logits`，对图像属于每个类的概率进行评级。因此，可以使用`argmax`找到排在最前的类别ID：\n\n```python\npredicted_class = np.argmax(result[0], axis=-1)\npredicted_class\n```\n```\n653\n```\n\n### 2.3. 解码预测\n\n\n我们有预测的类别ID，获取`ImageNet`标签，并解码预测\n\n```python\nlabels_path = tf.keras.utils.get_file('ImageNetLabels.txt','https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')\nimagenet_labels = np.array(open(labels_path).read().splitlines())\n\nplt.imshow(grace_hopper)\nplt.axis('off')\npredicted_class_name = imagenet_labels[predicted_class]\n_ = plt.title(\"Prediction: \" + predicted_class_name.title())\n```\n![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_20_0.png)\n\n## 3. 简单的迁移学习\n\n使用TF Hub可以很容易地重新训练模型的顶层以识别数据集中的类。\n\n### 3.1. 
Dataset\n\n对于此示例，您将使用TensorFlow鲜花数据集：\n\n```python\ndata_root = tf.keras.utils.get_file(\n  'flower_photos','https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',\n   untar=True)\n```\n\n将此数据加载到我们的模型中的最简单方法是使用 `tf.keras.preprocessing.image.ImageDataGenerator`,\n\n所有TensorFlow Hub的图像模块都期望浮点输入在“[0,1]”范围内。使用`ImageDataGenerator`的`rescale`参数来实现这一目的。图像大小将在稍后处理。\n\n```python\nimage_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1/255)\nimage_data = image_generator.flow_from_directory(str(data_root), target_size=IMAGE_SHAPE)\n```\n\n```\n    Found 3670 images belonging to 5 classes.\n```\n结果对象是一个返回`image_batch，label_batch`对的迭代器。\n\n```python\nfor image_batch, label_batch in image_data:\n  print(\"Image batch shape: \", image_batch.shape)\n  print(\"Labe batch shape: \", label_batch.shape)\n  break\n```\n\n```\n    Image batch shape:  (32, 224, 224, 3)\n    Labe batch shape:  (32, 5)\n```\n\n### 3.2. 在一批图像上运行分类器\n\n现在在图像批处理上运行分类器。\n\n\n```python\nresult_batch = classifier.predict(image_batch)\nresult_batch.shape  # (32, 1001)\n\npredicted_class_names = imagenet_labels[np.argmax(result_batch, axis=-1)]\npredicted_class_names\n```\n\n```\n      array(['daisy', 'sea urchin', 'ant', 'hamper', 'daisy', 'ringlet',\n             'daisy', 'daisy', 'daisy', 'cardoon', 'lycaenid', 'sleeping bag',\n             'Bedlington terrier', 'daisy', 'daisy', 'picket fence',\n             'coral fungus', 'daisy', 'zucchini', 'daisy', 'daisy', 'bee',\n             'daisy', 'daisy', 'bee', 'daisy', 'picket fence', 'bell pepper',\n             'daisy', 'pot', 'wolf spider', 'greenhouse'], dtype='<U30')\n```\n\n现在检查这些预测如何与图像对齐：\n\n```python\nplt.figure(figsize=(10,9))\nplt.subplots_adjust(hspace=0.5)\nfor n in range(30):\n  plt.subplot(6,5,n+1)\n  plt.imshow(image_batch[n])\n  plt.title(predicted_class_names[n])\n  plt.axis('off')\n_ = plt.suptitle(\"ImageNet 
predictions\")\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_34_0.png)\n\n有关图像属性，请参阅`LICENSE.txt`文件。\n\n结果没有那么完美，但考虑到这些不是模型训练的类（“daisy雏菊”除外），这是合理的。\n\n### 3.3. 下载无头模型\n\nTensorFlow Hub还可以在没有顶级分类层的情况下分发模型。这些可以用来轻松做迁移学习。\n\n来自tfhub.dev的任何[Tensorflow 2兼容图像特征向量URL](https://tfhub.dev/s?module-type=image-feature-vector&q=tf2)都可以在此处使用。\n\n```python\nfeature_extractor_url = \"https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2\" #@param {type:\"string\"}\n```\n\n创建特征提取器。\n\n```python\nfeature_extractor_layer = hub.KerasLayer(feature_extractor_url,\n                                         input_shape=(224,224,3))\n```\n\n它为每个图像返回一个1280长度的向量：\n\n```python\nfeature_batch = feature_extractor_layer(image_batch)\nprint(feature_batch.shape)\n```\n`(32, 1280)`\n\n冻结特征提取器层中的变量，以便训练仅修改新的分类器层。\n\n```python\nfeature_extractor_layer.trainable = False\n```\n\n### 3.4. 附上分类头\n\n现在将中心层包装在`tf.keras.Sequential`模型中，并添加新的分类层。\n\n```python\nmodel = tf.keras.Sequential([\n  feature_extractor_layer,\n  layers.Dense(image_data.num_classes, activation='softmax')\n])\n\nmodel.summary()\n```\n```\n    Model: \"sequential_1\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #   \n    =================================================================\n    keras_layer_1 (KerasLayer)   (None, 1280)              2257984   \n    _________________________________________________________________\n    dense (Dense)                (None, 5)                 6405      \n    =================================================================\n    Total params: 2,264,389\n    Trainable params: 6,405\n    Non-trainable params: 2,257,984\n    _________________________________________________________________\n```\n\n```python\npredictions = model(image_batch)\npredictions.shape\n```\n```\n    TensorShape([32, 5])\n```\n\n### 3.5. 
训练模型\n\n使用compile配置训练过程：\n\n```python\nmodel.compile(\n  optimizer=tf.keras.optimizers.Adam(),\n  loss='categorical_crossentropy',\n  metrics=['acc'])\n```\n\n现在使用`.fit`方法训练模型。\n\n这个例子只是训练两个周期。要显示训练进度，请使用自定义回调单独记录每个批次的损失和准确性，而不是记录周期的平均值。\n\n```python\nclass CollectBatchStats(tf.keras.callbacks.Callback):\n  def __init__(self):\n    self.batch_losses = []\n    self.batch_acc = []\n\n  def on_train_batch_end(self, batch, logs=None):\n    self.batch_losses.append(logs['loss'])\n    self.batch_acc.append(logs['acc'])\n    self.model.reset_metrics()\n\nsteps_per_epoch = np.ceil(image_data.samples/image_data.batch_size)\n\nbatch_stats_callback = CollectBatchStats()\n\nhistory = model.fit(image_data, epochs=2,\n                    steps_per_epoch=steps_per_epoch,\n                    callbacks = [batch_stats_callback])\n```\n\n```\n    Epoch 1/2\n    115/115 [==============================] - 22s 193ms/step - loss: 0.8613 - acc: 0.8438\n    Epoch 2/2\n    115/115 [==============================] - 23s 199ms/step - loss: 0.5083 - acc: 0.7812\n```\n\n现在，即使只是几次训练迭代，我们已经可以看到模型正在完成任务。\n\n```python\nplt.figure()\nplt.ylabel(\"Loss\")\nplt.xlabel(\"Training Steps\")\nplt.ylim([0,2])\nplt.plot(batch_stats_callback.batch_losses)\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_53_1.png)\n\n```python\nplt.figure()\nplt.ylabel(\"Accuracy\")\nplt.xlabel(\"Training Steps\")\nplt.ylim([0,1])\nplt.plot(batch_stats_callback.batch_acc)\n```\n![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_54_1.png?dcb_=0.5728569869098554)\n\n### 3.6. 
检查预测\n\n要重做之前的图，首先获取有序的类名列表：\n\n```python\nclass_names = sorted(image_data.class_indices.items(), key=lambda pair:pair[1])\nclass_names = np.array([key.title() for key, value in class_names])\nclass_names\n```\n```\n    array(['Daisy', 'Dandelion', 'Roses', 'Sunflowers', 'Tulips'],\n          dtype='<U10')\n```\n\n通过模型运行图像批处理，并将索引转换为类名。\n\n```python\npredicted_batch = model.predict(image_batch)\npredicted_id = np.argmax(predicted_batch, axis=-1)\npredicted_label_batch = class_names[predicted_id]\n```\n\n绘制结果\n\n```python\nlabel_id = np.argmax(label_batch, axis=-1)\n\nplt.figure(figsize=(10,9))\nplt.subplots_adjust(hspace=0.5)\nfor n in range(30):\n  plt.subplot(6,5,n+1)\n  plt.imshow(image_batch[n])\n  color = \"green\" if predicted_id[n] == label_id[n] else \"red\"\n  plt.title(predicted_label_batch[n].title(), color=color)\n  plt.axis('off')\n_ = plt.suptitle(\"Model predictions (green: correct, red: incorrect)\")\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras_files/output_61_0.png)\n\n## 4. 
导出你的模型\n\n现在您已经训练了模型，将其导出为已保存的模型：\n\n```python\nimport time\nt = time.time()\n\nexport_path = \"/tmp/saved_models/{}\".format(int(t))\ntf.keras.experimental.export_saved_model(model, export_path)\n\nexport_path\n```\n```\n'/tmp/saved_models/1557794138'\n```\n\n现在确认我们可以重新加载它，它仍然给出相同的结果：\n\n```python\nreloaded = tf.keras.experimental.load_from_saved_model(export_path, custom_objects={'KerasLayer':hub.KerasLayer})\n\nresult_batch = model.predict(image_batch)\nreloaded_result_batch = reloaded.predict(image_batch)\n\nabs(reloaded_result_batch - result_batch).max()\n```\n`0.0`\n\n这个保存的模型可以在以后加载推理，或转换为[TFLite](https://www.tensorflow.google.cn/lite/convert/) 和 [TFjs](https://github.com/tensorflow/tfjs-converter)。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-hub_with_keras.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-hub_with_keras.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras](https://tensorflow.google.cn/beta/tutorials/images/hub_with_keras)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/hub_with_keras.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/hub_with_keras.md)"
  },
  {
    "path": "r2/tutorials/images/intro_to_cnns.md",
    "content": "---\ntitle: 使用TF2.0实现卷积神经网络CNN对MNIST数字分类\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1921\nabbrlink: tensorflow/tf2-tutorials-images-intro_to_cnns\n---\n\n# 使用TensorFlow2.0实现卷积神经网络CNN对MNIST数字分类 (tensorflow2.0官方教程翻译)\n\n本教程演示了如何训练简单的[卷积神经网络](https://developers.google.com/machine-learning/glossary/#convolutional_neural_network)（CNN）来对MNIST数字进行分类。这个简单的网络将在MNIST测试集上实现99％以上的准确率。因为本教程使用[Keras Sequential API](https://www.tensorflow.org/guide/keras)，所以创建和训练我们的模型只需几行代码。\n\n注意：CNN使用GPU训练更快。\n\n## 1. 导入TensorFlow\n\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\nfrom tensorflow.keras import datasets, layers, models\n```\n\n## 2. 下载预处理MNIST数据集\n\n```python\n(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()\n\ntrain_images = train_images.reshape((60000, 28, 28, 1))\ntest_images = test_images.reshape((10000, 28, 28, 1))\n\n# 特征缩放[0, 1]区间 \ntrain_images, test_images = train_images / 255.0, test_images / 255.0\n```\n\n## 3. 
创建卷积基\n\n下面6行代码使用一种常见模式定义了卷积基：由 [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) 和 [MaxPooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) 层堆叠而成。\n\nCNN 的输入是形状为 (image_height, image_width, color_channels) 的张量（不包括批量维度）。MNIST只有一个颜色通道（因为图像是灰度的），而彩色图像有三个颜色通道（R,G,B）。在此示例中，我们将配置CNN以处理形状为 (28, 28, 1) 的输入，即MNIST图像的格式。我们通过将参数input_shape传递给第一层来完成此操作。\n\n```python\nmodel = models.Sequential()\nmodel.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))\nmodel.add(layers.MaxPooling2D((2, 2)))\nmodel.add(layers.Conv2D(64, (3, 3), activation='relu'))\nmodel.add(layers.MaxPooling2D((2, 2)))\nmodel.add(layers.Conv2D(64, (3, 3), activation='relu'))\nmodel.summary() # 显示模型的架构\n```\n\n```\nModel: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\nconv2d (Conv2D)              (None, 26, 26, 32)        320       \n_________________________________________________________________\nmax_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         \n_________________________________________________________________\nconv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     \n_________________________________________________________________\nmax_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         \n_________________________________________________________________\nconv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     \n=================================================================\n...\n```\n\n在上面，你可以看到每个Conv2D和MaxPooling2D层的输出都是形状为（高度，宽度，通道数）的3D张量。随着网络的加深，宽度和高度通常会逐渐缩小；每个Conv2D层的输出通道数量由第一个参数（例如32或64）控制。通常，随着宽度和高度的缩小，我们在计算上可以负担得起在每个Conv2D层中增加更多的输出通道。\n\n## 4. 
在顶部添加密集层\n\n为了完成我们的模型，我们将卷积基最后输出的张量（形状为(3,3,64)）馈送到一个或多个密集层中以执行分类。密集层将矢量作为输入（1D），而当前输出是3D张量。首先，我们将3D输出展平（或展开）为1D，然后在顶部添加一个或多个Dense层。MNIST有10个输出类，因此最终的Dense层有10个输出并使用softmax激活。\n\n```python\nmodel.add(layers.Flatten())\nmodel.add(layers.Dense(64, activation='relu'))\nmodel.add(layers.Dense(10, activation='softmax'))\nmodel.summary() # 显示模型的架构\n```\n\n```\nModel: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\nconv2d (Conv2D)              (None, 26, 26, 32)        320       \n_________________________________________________________________\nmax_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         \n_________________________________________________________________\nconv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     \n_________________________________________________________________\nmax_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         \n_________________________________________________________________\nconv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     \n_________________________________________________________________\nflatten (Flatten)            (None, 576)               0         \n_________________________________________________________________\ndense (Dense)                (None, 64)                36928     \n_________________________________________________________________\ndense_1 (Dense)              (None, 10)                650       \n=================================================================\n...\n```\n\n从上面可以看出，在通过两个密集层之前，我们的(3,3,64)输出被展平为长度为576的矢量。\n\n## 5. 
编译和训练模型\n\n\n```python\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n\nmodel.fit(train_images, train_labels, epochs=5)\n```\n\n```\n...\nEpoch 5/5\n60000/60000 [==============================] - 15s 258us/sample - loss: 0.0190 - accuracy: 0.9941\n```\n\n## 6. 评估模型\n\n```python\ntest_loss, test_acc = model.evaluate(test_images, test_labels)\n```\n```\n10000/10000 [==============================] - 1s 92us/sample - loss: 0.0272 - accuracy: 0.9921\n```\n\n```python\nprint(test_acc)\n```\n\n```\n    0.9921\n```\n\n如你所见，我们这个简单的CNN已经达到了超过99%的测试准确率，对这几行代码来说已经很不错了。另一种编写CNN的方式（使用Keras Subclassing API和GradientTape）参见[这里](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/advanced.ipynb)。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-intro_to_cnns.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-intro_to_cnns.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/intro_to_cnns](https://tensorflow.google.cn/beta/tutorials/images/intro_to_cnns)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/intro_to_cnns.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/intro_to_cnns.md)\n"
  },
  {
    "path": "r2/tutorials/images/segmentation.md",
    "content": "---\ntitle: 图像分割\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1924\nabbrlink: tensorflow/tf2-tutorials-images-intro_to_cnns\n---\n\n# 图像分割 (tensorflow2.0官方教程翻译)\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-segmentation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-segmentation.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/segmentation](https://tensorflow.google.cn/beta/tutorials/images/segmentation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/segmentation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/segmentation.md)\n\n\n本教程重点介绍使用修改后的[U-Net](https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/)进行图像分割的任务。\n\n## 什么是图像分割？\n\n前面的章节我们学习了图像分类，网络算法的任务是为输入图像输出对应的标签或类。但是，假设您想知道对象在图像中的位置，该对象的形状，哪个像素属于哪个对象等。在这种情况下，您将要分割图像，即图像的每个像素都是给了一个标签。\n\n因此，图像分割的任务是训练神经网络以输出图像的逐像素掩模。这有助于以更低的水平（即像素级别）理解图像。图像分割在医学成像，自动驾驶汽车和卫星成像等方面具有许多应用。\n\n将用于本教程的数据集是由Parkhi等人创建的[Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/)。数据集由图像、其对应的标签和像素方式的掩码组成。掩模基本上是每个像素的标签。每个像素分为三类：\n*   第1类：属于宠物的像素。\n*   第2类：与宠物接壤的像素。\n*   第3类：以上都没有/周围像素。\n\n下载依赖项目  https://github.com/tensorflow/examples，\n把文件夹tensorflow_examples放到项目下，下面会导入pix2pix\n\n安装tensorflow：\n\npip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow-gpu==2.0.0-beta1\n\n安装tensorflow_datasets：\n\npip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow_datasets\n\n## 导入各种依赖包\n\n```python\nimport tensorflow as tf\n\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nfrom tensorflow_examples.models.pix2pix import pix2pix\n\nimport tensorflow_datasets as tfds\ntfds.disable_progress_bar()\n\nfrom IPython.display import clear_output\nimport matplotlib.pyplot as plt\n```\n\n## 下载Oxford-IIIT Pets数据集\n\n数据集已包含在TensorFlow数据集中，只需下载即可。分段掩码包含在3.0.0版中，这就是使用此特定版本的原因。\n\n```python\ndataset, info = 
tfds.load('oxford_iiit_pet:3.0.0', with_info=True)\n```\n\n以下代码对图像执行简单的随机翻转增强。另外，图像被归一化到[-1,1]区间（见下面 normalize 中的除以128.0再减1）。\n最后，如上所述，分割掩码中的像素标记为{1,2,3}。为了方便起见，让我们从分割掩码中减去1，得到标签：{0,1,2}。\n\n```python\ndef normalize(input_image, input_mask):\n  input_image = tf.cast(input_image, tf.float32)/128.0 - 1\n  input_mask -= 1\n  return input_image, input_mask\n\n@tf.function\ndef load_image_train(datapoint):\n  input_image = tf.image.resize(datapoint['image'], (128, 128))\n  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))\n\n  if tf.random.uniform(()) > 0.5:\n    input_image = tf.image.flip_left_right(input_image)\n    input_mask = tf.image.flip_left_right(input_mask)\n\n  input_image, input_mask = normalize(input_image, input_mask)\n\n  return input_image, input_mask\n\ndef load_image_test(datapoint):\n  input_image = tf.image.resize(datapoint['image'], (128, 128))\n  input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))\n\n  input_image, input_mask = normalize(input_image, input_mask)\n\n  return input_image, input_mask\n```\n\n数据集本身已经划分好了训练集和测试集，因此我们沿用同样的划分。\n\n```python\nTRAIN_LENGTH = info.splits['train'].num_examples\nBATCH_SIZE = 64\nBUFFER_SIZE = 1000\nSTEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE\n\ntrain = dataset['train'].map(load_image_train, num_parallel_calls=tf.data.experimental.AUTOTUNE)\ntest = dataset['test'].map(load_image_test)\n\ntrain_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()\ntrain_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)\ntest_dataset = test.batch(BATCH_SIZE)\n```\n\n让我们看一个图像示例及其在数据集中对应的掩码。\n\n```python\ndef display(display_list):\n  plt.figure(figsize=(15, 15))\n\n  title = ['Input Image', 'True Mask', 'Predicted Mask']\n\n  for i in range(len(display_list)):\n    plt.subplot(1, len(display_list), i+1)\n    plt.title(title[i])\n    plt.imshow(tf.keras.preprocessing.image.array_to_img(display_list[i]))\n    plt.axis('off')\n  plt.show()\n\nfor image, mask in train.take(1):\n  
sample_image, sample_mask = image, mask\ndisplay([sample_image, sample_mask])\n```\n\n![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_a6u_Rblkteqb_0.png)\n\n\n## 定义模型\n\n这里使用的模型是一个改进的U-Net。U-Net由编码器（下采样器）和解码器（上采样器）组成。为了学习鲁棒特征并减少可训练参数的数量，可以使用预训练模型作为编码器。因此，该任务的编码器将是预训练的MobileNetV2模型，其中间输出将被使用，并且解码器是已经在[Pix2pix tutorial](https://github.com/tensorflow/examples/blob/master/tensorflow_examples/models/pix2pix/pix2pix.py)教程示例中实现的上采样块。\n\n输出三个通道的原因是因为每个像素有三种可能的标签。可以将其视为多分类，其中每个像素被分为三类。\n\n```python\nOUTPUT_CHANNELS = 3\n```\n\n如上所述，编码器将是一个预训练的MobileNetV2模型，它已经准备好并可以在[tf.keras.applications](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/applications)中使用。编码器由模型中间层的特定输出组成。\n请注意，在训练过程中不会训练编码器。\n\n```python\nbase_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)\n\n# Use the activations of these layers\nlayer_names = [\n    'block_1_expand_relu',   # 64x64\n    'block_3_expand_relu',   # 32x32\n    'block_6_expand_relu',   # 16x16\n    'block_13_expand_relu',  # 8x8\n    'block_16_project',      # 4x4\n]\nlayers = [base_model.get_layer(name).output for name in layer_names]\n\n# 创建特征提取模型\ndown_stack = tf.keras.Model(inputs=base_model.input, outputs=layers)\n\ndown_stack.trainable = False\n```\n\n解码器/上采样器只是在TensorFlow示例中实现的一系列上采样块。\n\n```python\nup_stack = [\n    pix2pix.upsample(512, 3),  # 4x4 -> 8x8\n    pix2pix.upsample(256, 3),  # 8x8 -> 16x16\n    pix2pix.upsample(128, 3),  # 16x16 -> 32x32\n    pix2pix.upsample(64, 3),   # 32x32 -> 64x64\n]\n\n\ndef unet_model(output_channels):\n\n  # 这是模型的最后一层\n  last = tf.keras.layers.Conv2DTranspose(\n      output_channels, 3, strides=2,\n      padding='same', activation='softmax')  #64x64 -> 128x128\n\n  inputs = tf.keras.layers.Input(shape=[128, 128, 3])\n  x = inputs\n\n  # 通过该模型进行下采样\n  skips = down_stack(x)\n  x = skips[-1]\n  skips = reversed(skips[:-1])\n\n  # Upsampling and establishing the skip connections\n  for up, skip in zip(up_stack, 
skips):\n    x = up(x)\n    concat = tf.keras.layers.Concatenate()\n    x = concat([x, skip])\n\n  x = last(x)\n\n  return tf.keras.Model(inputs=inputs, outputs=x)\n```\n\n## 训练模型\n\n现在，剩下要做的就是编译和训练模型。这里使用的损失函数是`losses.sparse_categorical_crossentropy`。使用此损失函数的原因是网络试图为每个像素分配一个标签，就像多类预测问题一样。在真实的分割掩码中，每个像素的标签是{0,1,2}之一。这里网络输出三个通道。基本上，每个通道都在试图学习预测一个类，而`losses.sparse_categorical_crossentropy`是这种情况下推荐的损失函数。根据网络的输出，分配给每个像素的标签是值最高的那个通道。这就是create_mask函数所做的事情。\n\n```python\nmodel = unet_model(OUTPUT_CHANNELS)\nmodel.compile(optimizer='adam', loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n```\n\n让我们试试模型，看看它在训练前预测了什么。\n\n```python\ndef create_mask(pred_mask):\n  pred_mask = tf.argmax(pred_mask, axis=-1)\n  pred_mask = pred_mask[..., tf.newaxis]\n  return pred_mask[0]\n\ndef show_predictions(dataset=None, num=1):\n  if dataset:\n    for image, mask in dataset.take(num):\n      pred_mask = model.predict(image)\n      display([image[0], mask[0], create_mask(pred_mask)])\n  else:\n    display([sample_image, sample_mask,\n             create_mask(model.predict(sample_image[tf.newaxis, ...]))])\n\n\nshow_predictions()\n```\n\n![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_X_1CC0T4dho3_0.png)\n\n\n让我们观察模型在训练过程中是如何改进的。为了完成此任务，下面定义了一个回调函数。\n\n```python\nclass DisplayCallback(tf.keras.callbacks.Callback):\n  def on_epoch_end(self, epoch, logs=None):\n    clear_output(wait=True)\n    show_predictions()\n    print ('\\nSample Prediction after epoch {}\\n'.format(epoch+1))\n\n\nEPOCHS = 20\nVAL_SUBSPLITS = 5\nVALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS\n\nmodel_history = model.fit(train_dataset, epochs=EPOCHS,\n                          steps_per_epoch=STEPS_PER_EPOCH,\n                          validation_steps=VALIDATION_STEPS,\n                          validation_data=test_dataset,\n                          
callbacks=[DisplayCallback()])\n```\n\n![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_StKDH_B9t4SD_0.png)\n\n\n我们来看看损失的变化情况：\n\n```python\nloss = model_history.history['loss']\nval_loss = model_history.history['val_loss']\n\nepochs = range(EPOCHS)\n\nplt.figure()\nplt.plot(epochs, loss, 'r', label='Training loss')\nplt.plot(epochs, val_loss, 'bo', label='Validation loss')\nplt.title('Training and Validation Loss')\nplt.xlabel('Epoch')\nplt.ylabel('Loss Value')\nplt.ylim([0, 1])\nplt.legend()\nplt.show()\n```\n\n![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_P_mu0SAbt40Q_0.png)\n\n\n## 作出预测\n\n让我们做一些预测。为了节省时间，训练周期数设置得较小，但您可以将其设置得更高以获得更准确的结果。\n\n```python\nshow_predictions(test_dataset, 1)\n```\n\n预测效果：\n![](https://www.tensorflow.org/beta/tutorials/images/segmentation_files/output_ikrzoG24qwf5_0.png)\n\n\n## 下一步\n\n现在您已经了解了图像分割是什么，以及它是如何工作的，您可以尝试使用不同的中间层输出，甚至是不同的预训练模型。您也可以通过尝试在Kaggle上托管的[Carvana](https://www.kaggle.com/c/carvana-image-masking-challenge/overview)图像掩蔽比赛来挑战自己。\n\n您可能还希望查看[TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection)，以获取可以用您自己的数据重新训练的其他模型。\n"
  },
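上面图像分割教程中 `create_mask` 的逐像素 argmax 步骤，可以用一个只依赖 NumPy 的最小示意来单独说明（以下代码是假设性的示例，数组取值均为虚构，与教程中的 TensorFlow 实现仅在思路上对应）：

```python
import numpy as np

def create_mask(pred):
    # pred: 形状为 (batch, H, W, channels) 的每类得分数组。
    # 分配给每个像素的标签是取值最高的那个通道。
    mask = np.argmax(pred, axis=-1)      # 形状变为 (batch, H, W)
    return mask[..., np.newaxis][0]      # 取第一个批次元素，形状 (H, W, 1)

# 玩具预测：1 张 2x2 的“图像”、3 个通道
pred = np.array([[[[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]],
                  [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]]])
mask = create_mask(pred)
print(mask.shape)   # (2, 2, 1)
```

每个像素的标签正是得分最高的通道下标，这与正文中“分配给像素的标签是值最高的通道”的说明一致。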
  {
    "path": "r2/tutorials/images/transfer_learning.md",
    "content": "---\ntitle: 使用预训练的卷积神经网络进行迁移学习\ntags: tensorflow2.0教程\ncategories: tensorflow2官方教程\ntop: 1923\nabbrlink: tensorflow/tf2-tutorials-images-transfer_learning\n---\n\n# 使用预训练的卷积神经网络进行迁移学习 (tensorflow2.0官方教程翻译)\n\n在本教程中，您将学习如何通过预训练网络的迁移学习对猫狗图像进行分类。主要内容：使用预训练的模型进行特征提取，微调预训练的模型。\n\n预训练模型是一个已保存的网络，之前已在大型数据集（通常是大规模图像分类任务）上训练过。您可以按原样使用预训练模型，也可以通过迁移学习将此模型定制为给定的任务。\n\n迁移学习背后的直觉是，如果一个模型是在足够大且足够通用的数据集上训练的，那么这个模型实际上可以充当视觉世界的通用模型。然后，您可以利用这些已学习的特征映射，而无需在大型数据集上从头开始训练大型模型。\n\n在本节中，您将尝试两种方法来自定义预训练模型：\n1. **特征提取**：使用先前网络学习到的表示从新样本中提取有意义的特征。您只需在预训练模型的基础上添加一个新的分类器（将从头开始训练），以便将先前学习到的特征映射重新用于我们的数据集。\n您不需要(重新)训练整个模型，基础卷积网络已经包含了对图片分类非常有用的特征。然而，预训练模型最后的分类部分是特定于原始分类任务的，也特定于模型所训练的那组类别。\n\n2. **微调**：解冻已冻结模型的顶部若干层，并联合训练新添加的分类器和基础模型的最后几层。这让我们可以“微调”基础模型中的高阶特征表示，使它们与特定任务更相关。\n\n你将要遵循一般的机器学习工作流程：\n1. 检查并理解数据\n2. 构建输入管道，在本例中使用Keras 的 `ImageDataGenerator`\n3. 构建模型\n    * 加载我们的预训练基础模型（和预训练的权重）\n    * 将我们的分类层堆叠在顶部\n4. 训练模型\n5. 评估模型\n\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport os\n\nimport numpy as np\n\nimport matplotlib.pyplot as plt\n\nimport tensorflow as tf\n\nkeras = tf.keras\n```\n\n## 1. 数据预处理\n\n### 1.1. 
下载数据\n\n使用 [TensorFlow Datasets](http://tensorflow.google.cn/datasets)加载猫狗数据集。`tfds` 包是加载预定义数据的最简单方法，如果您有自己的数据，并且有兴趣使用TensorFlow进行导入，请参阅[加载图像数据](https://tensorflow.google.cn/beta/tutorials/load_data/images)。\n\n\n```python\nimport tensorflow_datasets as tfds\n```\n\n`tfds.load`方法下载并缓存数据，并返回`tf.data.Dataset`对象，这些对象提供了强大、高效的方法来处理数据并将其传递到模型中。\n\n由于`\"cats_vs_dogs\"` 没有定义标准分割，因此使用subsplit功能将其分为训练80%、验证10%、测试10%的数据。\n\n```python\nSPLIT_WEIGHTS = (8, 1, 1)\nsplits = tfds.Split.TRAIN.subsplit(weighted=SPLIT_WEIGHTS)\n\n(raw_train, raw_validation, raw_test), metadata = tfds.load(\n    'cats_vs_dogs', split=list(splits),\n    with_info=True, as_supervised=True)\n```\n\n生成的`tf.data.Dataset`对象包含（图像，标签）对。图像具有可变形状和3个通道，标签是标量。\n\n```python\nprint(raw_train)\nprint(raw_validation)\nprint(raw_test)\n```\n\n```\n    <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>\n    <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>\n    <DatasetV1Adapter shapes: ((None, None, 3), ()), types: (tf.uint8, tf.int64)>\n```\n\n显示训练集中的前两个图像和标签：\n\n```python\nget_label_name = metadata.features['label'].int2str\n\nfor image, label in raw_train.take(2):\n  plt.figure()\n  plt.imshow(image)\n  plt.title(get_label_name(label))\n```\n\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_14_0.png)\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_14_1.png)\n\n\n### 1.2. 
格式化数据\n\n使用`tf.image`模块格式化图像，将图像调整为固定的输入大小，并将输入通道重新调整为`[-1,1]`范围。\n\n<!-- TODO(markdaoust): fix the keras_applications preprocessing functions to work in tf2 -->\n\n```python\nIMG_SIZE = 160 # 所有图像将被调整为160x160\n\ndef format_example(image, label):\n  image = tf.cast(image, tf.float32)\n  image = (image/127.5) - 1\n  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))\n  return image, label\n```\n\n使用map方法将此函数应用于数据集中的每一个项：\n\n```python\ntrain = raw_train.map(format_example)\nvalidation = raw_validation.map(format_example)\ntest = raw_test.map(format_example)\n```\n\n打乱和批处理数据：\n\n```python\nBATCH_SIZE = 32\nSHUFFLE_BUFFER_SIZE = 1000\n\ntrain_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)\nvalidation_batches = validation.batch(BATCH_SIZE)\ntest_batches = test.batch(BATCH_SIZE)\n```\n\n检查一批数据：\n\n```python\nfor image_batch, label_batch in train_batches.take(1):\n  pass\n\nimage_batch.shape\n```\n\n```\n    TensorShape([32, 160, 160, 3])\n```\n\n## 2. 从预先训练的网络中创建基础模型\n\n您将从Google开发的**MobileNet V2**模型创建基础模型，它已在ImageNet数据集上预先训练过，这是一个包含1.4M图像和1000个类别的大型Web图像数据集。ImageNet的研究训练数据集中的类别相当随意，其中包括“jackfruit(菠萝蜜)”和“syringe(注射器)”等类别，但这个知识基础将帮助我们区分特定数据集中的猫和狗。\n\n首先，您需要选择用于特征提取的MobileNet V2层。显然，最后一个分类层（位于“顶部”，因为大多数机器学习模型的结构图都是从下到上绘制的）并不是非常有用。相反，您将遵循通常的做法，使用展平操作之前的最后一层，该层称为“瓶颈层”。与最终/顶层相比，瓶颈层保持了很多通用性。\n\n然后，实例化一个预加载了ImageNet训练权重的MobileNet V2模型。通过指定`include_top=False`参数，可以加载不包含顶部分类层的网络，这是特征提取的理想选择。\n\n```python\nIMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)\n\n# 从预先训练的模型MobileNet V2创建基础模型 \nbase_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n                                               include_top=False,\n                                               weights='imagenet')\n```\n\n此特征提取器将每个160x160x3图像转换为5x5x1280的特征块，看看它对示例批量图像的作用：\n\n```python\nfeature_batch = base_model(image_batch)\nprint(feature_batch.shape)\n```\n\n```\n    (32, 5, 5, 1280)\n```\n\n\n## 3. 特征提取\n\n您将冻结上一步创建的卷积基，并将其用作特征提取器，在其上添加分类器并训练顶级分类器。\n\n### 3.1. 
冻结卷积基\n\n在编译和训练模型之前，冻结卷积基是很重要的，通过冻结（或设置`layer.trainable = False`），可以防止在训练期间更新给定图层中的权重。MobileNet V2有很多层，因此将整个模型的可训练标志设置为`False`将冻结所有层。\n\n\n```python\nbase_model.trainable = False\nbase_model.summary() # 看看基础模型架构  \n```\n\n```\n    Model: \"mobilenetv2_1.00_160\"\n    __________________________________________________________________________________________________\n    Layer (type)                    Output Shape         Param #     Connected to\n    ==================================================================================================\n    input_1 (InputLayer)            [(None, 160, 160, 3) 0\n    __________________________________________________________________________________________________\n    Conv1_pad (ZeroPadding2D)       (None, 161, 161, 3)  0           input_1[0][0]\n    __________________________________________________________________________________________________\n    Conv1 (Conv2D)                  (None, 80, 80, 32)   864         Conv1_pad[0][0]\n    __________________________________________________________________________________________________\n    .....（此处省略很多层）\n    __________________________________________________________________________________________________\n    Conv_1_bn (BatchNormalizationV1 (None, 5, 5, 1280)   5120        Conv_1[0][0]\n    __________________________________________________________________________________________________\n    out_relu (ReLU)                 (None, 5, 5, 1280)   0           Conv_1_bn[0][0]\n    ==================================================================================================\n    ...\n```\n\n\n### 3.2. 
添加分类头\n\n要从特征块生成预测，请用5x5在空间位置上进行平均，使用`tf.keras.layers.GlobalAveragePooling2D`层将特征转换为每个图像对应一个1280元素向量。\n\n```python\nglobal_average_layer = tf.keras.layers.GlobalAveragePooling2D()\nfeature_batch_average = global_average_layer(feature_batch)\nprint(feature_batch_average.shape)\n```\n\n`(32, 1280)`\n\n\n应用`tf.keras.layers.Dense`层将这些特征转换为每个图像的单个预测。您不需要激活函数，因为此预测将被视为`logit`或原始预测值。正数预测第1类，负数预测第0类。\n\n```python\nprediction_layer = keras.layers.Dense(1)\nprediction_batch = prediction_layer(feature_batch_average)\nprint(prediction_batch.shape)\n```\n\n``` \n    (32, 1)\n```\n\n\n现在使用`tf.keras.Sequential`堆叠特征提取器和这两个层：\n\n```python\nmodel = tf.keras.Sequential([\n  base_model,\n  global_average_layer,\n  prediction_layer\n])\n```\n\n### 3.3. 编译模型\n\n你必须在训练之前编译模型，由于有两个类，因此使用二进制交叉熵损失：\n\n```python\nbase_learning_rate = 0.0001\nmodel.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),\n              loss='binary_crossentropy',\n              metrics=['accuracy'])\n              \nmodel.summary()\n```\n\n```\n    Model: \"sequential\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #\n    =================================================================\n    mobilenetv2_1.00_160 (Model) (None, 5, 5, 1280)        2257984\n    _________________________________________________________________\n    global_average_pooling2d (Gl (None, 1280)              0\n    _________________________________________________________________\n    dense (Dense)                (None, 1)                 1281\n    =================================================================\n    Total params: 2,259,265\n    Trainable params: 1,281\n    Non-trainable params: 2,257,984\n    _________________________________________________________________\n```\n\nMobileNet中的2.5M参数被冻结，但Dense层中有1.2K可训练参数，它们分为两个`tf.Variable`对象：权重和偏差。\n\n\n```python\nlen(model.trainable_variables)\n```\n\n`2`\n\n\n\n### 3.4. 
训练模型\n\n经过10个周期的训练后，你应该看到约96%的准确率。\n\n<!-- TODO(markdaoust): delete steps_per_epoch in TensorFlow r1.14/r2.0 -->\n\n\n```python\nnum_train, num_val, num_test = (\n  metadata.splits['train'].num_examples*weight/10\n  for weight in SPLIT_WEIGHTS\n)\n\ninitial_epochs = 10\nsteps_per_epoch = round(num_train)//BATCH_SIZE\nvalidation_steps = 20\n\nloss0,accuracy0 = model.evaluate(validation_batches, steps = validation_steps)\n```\n\n```\n    20/20 [==============================] - 4s 219ms/step - loss: 3.1885 - accuracy: 0.6109\n```\n\n\n\n```python\nprint(\"initial loss: {:.2f}\".format(loss0))\nprint(\"initial accuracy: {:.2f}\".format(accuracy0))\n```\n\n```\n    initial loss: 3.19\n    initial accuracy: 0.61\n```\n\n\n\n```python\nhistory = model.fit(train_batches,\n                    epochs=initial_epochs,\n                    validation_data=validation_batches)\n```\n\n```\n    Epoch 1/10\n    581/581 [==============================] - 102s 175ms/step - loss: 1.8917 - accuracy: 0.7606 - val_loss: 0.8860 - val_accuracy: 0.8828\n    ...\n    Epoch 10/10\n    581/581 [==============================] - 96s 165ms/step - loss: 0.4921 - accuracy: 0.9381 - val_loss: 0.1847 - val_accuracy: 0.9719\n```\n\n### 3.5. 
学习曲线\n\n让我们来看一下使用MobileNet V2基础模型作为固定特征提取器时，训练和验证的准确率/损失学习曲线。\n\n```python\nacc = history.history['accuracy']\nval_acc = history.history['val_accuracy']\n\nloss = history.history['loss']\nval_loss = history.history['val_loss']\n\nplt.figure(figsize=(8, 8))\nplt.subplot(2, 1, 1)\nplt.plot(acc, label='Training Accuracy')\nplt.plot(val_acc, label='Validation Accuracy')\nplt.legend(loc='lower right')\nplt.ylabel('Accuracy')\nplt.ylim([min(plt.ylim()),1])\nplt.title('Training and Validation Accuracy')\n\nplt.subplot(2, 1, 2)\nplt.plot(loss, label='Training Loss')\nplt.plot(val_loss, label='Validation Loss')\nplt.legend(loc='upper right')\nplt.ylabel('Cross Entropy')\nplt.ylim([0,1.0])\nplt.title('Training and Validation Loss')\nplt.xlabel('epoch')\nplt.show()\n```\n\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_50_0.png)\n\n\n*注意：如果您想知道为什么验证指标明显优于训练指标，主要原因是像`tf.keras.layers.BatchNormalization`和`tf.keras.layers.Dropout`这样的层会影响训练期间的准确率，而在计算验证损失时，它们会被关闭。*\n\n在较小程度上，这也是因为训练指标报告的是一个周期内的平均值，而验证指标是在周期结束之后评估的，因此验证指标看到的是已经多训练了一小段时间的模型。\n\n## 4. 微调\n\n在特征提取实验中，您只在MobileNet V2基础模型之上训练了少数几层，预训练网络的权重在训练期间并未更新。\n\n进一步提高性能的一种方法是，在训练您添加的分类器的同时，训练（或“微调”）预训练模型顶层的权重。训练过程将迫使权重从通用的特征图调整为专门与我们的数据集相关的特征。\n\n*注意：只有在训练顶级分类器并将预先训练的模型设置为不可训练之后，才应尝试此操作。如果您在预先训练的模型上添加一个随机初始化的分类器并尝试联合训练所有层，则梯度更新的幅度将太大（由于分类器的随机权重），并且您的预训练模型将忘记它学到的东西。*\n\n此外，您应该尝试微调少量顶层而不是整个MobileNet模型。在大多数卷积网络中，层越靠上就越专门化。前几层学习的是非常简单且通用的特征，可以推广到几乎所有类型的图像；层越靠上，特征就越专门针对训练模型所用的数据集。微调的目的是使这些专门化的特征适应新数据集，而不是覆盖已学到的通用特征。\n\n### 4.1. 取消冻结模型的顶层\n\n\n您需要做的就是解冻`base_model`，并将底部的层设置为不可训练。然后重新编译模型（使这些更改生效所必需），并恢复训练。\n\n\n```python\nbase_model.trainable = True\n\n# 看看基础模型有多少层 \nprint(\"Number of layers in the base model: \", len(base_model.layers))\n\n# 从此层开始微调 \nfine_tune_at = 100\n\n# 冻结‘fine_tune_at’层之前的所有层\nfor layer in base_model.layers[:fine_tune_at]:\n  layer.trainable =  False\n```\n```\n    Number of layers in the base model:  155\n```\n\n### 4.2. 
编译模型\n\n使用低得多的训练率（学习率）编译模型：\n\n```python\nmodel.compile(loss='binary_crossentropy',\n              optimizer = tf.keras.optimizers.RMSprop(lr=base_learning_rate/10),\n              metrics=['accuracy'])\n              \nmodel.summary()\n```\n\n```\n    Model: \"sequential\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #\n    =================================================================\n    mobilenetv2_1.00_160 (Model) (None, 5, 5, 1280)        2257984\n    _________________________________________________________________\n    global_average_pooling2d (Gl (None, 1280)              0\n    _________________________________________________________________\n    dense (Dense)                (None, 1)                 1281\n    =================================================================\n    Total params: 2,259,265\n    Trainable params: 1,863,873\n    Non-trainable params: 395,392\n    _________________________________________________________________\n```\n\n```python\nlen(model.trainable_variables)\n```\n\n``` \n   58\n```\n\n\n\n### 4.3. 
继续训练模型\n\n如果之前的训练已经收敛，这一步将使您的准确率再提高几个百分点。\n\n```python\nfine_tune_epochs = 10\ntotal_epochs =  initial_epochs + fine_tune_epochs\n\nhistory_fine = model.fit(train_batches,\n                         epochs=total_epochs,\n                         initial_epoch = initial_epochs,\n                         validation_data=validation_batches)\n```\n```\n    ...\n    Epoch 20/20\n    581/581 [==============================] - 116s 199ms/step - loss: 0.1243 - accuracy: 0.9849 - val_loss: 0.1121 - val_accuracy: 0.9875\n```\n\n让我们看一下训练和验证的准确率/损失学习曲线。当微调MobileNet V2基础模型的最后几层并在其上训练分类器时，验证损失远高于训练损失，因此您可能会遇到一些过拟合。这在一定程度上也是因为新的训练集相对较小，且与原始的MobileNet V2训练数据集类似。\n\n经过微调后，模型精度几乎达到98%。\n\n```python\nacc += history_fine.history['accuracy']\nval_acc += history_fine.history['val_accuracy']\n\nloss += history_fine.history['loss']\nval_loss += history_fine.history['val_loss']\n\nplt.figure(figsize=(8, 8))\nplt.subplot(2, 1, 1)\nplt.plot(acc, label='Training Accuracy')\nplt.plot(val_acc, label='Validation Accuracy')\nplt.ylim([0.8, 1])\nplt.plot([initial_epochs-1,initial_epochs-1],\n          plt.ylim(), label='Start Fine Tuning')\nplt.legend(loc='lower right')\nplt.title('Training and Validation Accuracy')\n\nplt.subplot(2, 1, 2)\nplt.plot(loss, label='Training Loss')\nplt.plot(val_loss, label='Validation Loss')\nplt.ylim([0, 1.0])\nplt.plot([initial_epochs-1,initial_epochs-1],\n         plt.ylim(), label='Start Fine Tuning')\nplt.legend(loc='upper right')\nplt.title('Training and Validation Loss')\nplt.xlabel('epoch')\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning_files/output_67_0.png)\n\n\n## 5. 
小结\n\n* **使用预训练的模型进行特征提取：**\n使用小型数据集时，通常会利用在同一领域的较大数据集上训练的模型所学习的特征。这是通过实例化预先训练的模型，并在顶部添加全连接分类器来完成的。预训练的模型被“冻结”，训练期间仅更新分类器的权重。在这种情况下，卷积基提取了与每幅图像相关的所有特征，您只需训练一个分类器，根据所提取的特征集确定图像类别。\n\n* **微调预训练的模型：**\n为了进一步提高性能，可以通过微调将预训练模型的顶层重新调整为适应新数据集。在这种情况下，您调整了权重，以便模型学习特定于数据集的高级特征。当训练数据集很大并且非常类似于预训练模型训练的原始数据集时，通常建议使用此技术。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-images-transfer_learning.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/images/transfer_learning](https://tensorflow.google.cn/beta/tutorials/images/transfer_learning)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/images/transfer_learning.md)\n"
  },
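上面迁移学习教程中“冻结 `fine_tune_at` 之前所有层”的模式，可以脱离 Keras 用一个极简示意来验证（这里的 `Layer` 类和 155 这个层数只是为演示而虚构的替身，并非 Keras 的真实 API）：

```python
class Layer:
    """用于演示的假设性替身，仅有一个 trainable 开关。"""
    def __init__(self, name):
        self.name = name
        self.trainable = True

# 模拟 base_model.layers：155 层（与教程输出的层数一致）
layers = [Layer('layer_%d' % i) for i in range(155)]

fine_tune_at = 100
# 冻结 fine_tune_at 之前的所有层，只让顶部的层保持可训练
for layer in layers[:fine_tune_at]:
    layer.trainable = False

print(sum(l.trainable for l in layers))  # 55
```

切片 `layers[:fine_tune_at]` 不包含下标 100 本身，因此第 100 层（从 0 计数）起的 55 层保持可训练，这正是教程中微调顶层的意图。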
  {
    "path": "r2/tutorials/keras/basic_classification.md",
    "content": "---\ntitle: 训练您的第一个神经网络：基本分类Fashion MNIST\ncategories: \n    - tensorflow2官方教程\ntags: \n    - tensorflow2.0\ntop: 1911\nabbrlink: tensorflow/tf2-tutorials-keras-basic_classification\n---\n\n# 训练您的第一个神经网络：基本分类Fashion MNIST(tensorflow2.0官方教程翻译)\n\n本指南会训练一个对服饰（例如运动鞋和衬衫）图像进行分类的神经网络模型。即使您不了解所有细节也没关系，本教程只是简要介绍了一个完整的 TensorFlow 程序，而且后续我们会详细介绍。\n\n本指南使用的是[tf.keras](https://tensorflow.google.cn/guide/keras)，它是一种用于在 TensorFlow 中构建和训练模型的高阶 API。\n\n安装\n\n```python\npip install tensorflow==2.0.0-alpha0\n```\n\n导入相关库\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\n# TensorFlow and tf.keras\nimport tensorflow as tf\nfrom tensorflow import keras\n\n# Helper libraries\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nprint(tf.__version__)\n```\n\n## 1. 导入MNIST数据集\n\n本指南使用[Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist)数据集，其中包含 70000 张灰度图像，涵盖 10 个类别。以下图像显示了单件服饰在较低分辨率（28x28 像素）下的效果：\n\n<table>\n  <tr><td>\n    <img src=\"https://tensorflow.google.cn/images/fashion-mnist-sprite.png\"\n         alt=\"Fashion MNIST sprite\"  width=\"600\">\n  </td></tr>\n  <tr><td align=\"center\">\n    <b>Figure 1.</b> <a href=\"https://github.com/zalandoresearch/fashion-mnist\">Fashion-MNIST 样本</a>\n  </td></tr>\n</table>\n\nFashion MNIST 的作用是成为经典 MNIST 数据集的简易替换，后者通常用作计算机视觉机器学习程序的“Hello, World”入门数据集。[MNIST](http://yann.lecun.com/exdb/mnist/)数据集包含手写数字（0、1、2 等）的图像，这些图像的格式与我们在本教程中使用的服饰图像的格式相同。\n\n本指南使用 Fashion MNIST 实现多样化，并且它比常规 [MNIST](http://yann.lecun.com/exdb/mnist/)更具挑战性。这两个数据集都相对较小，用于验证某个算法能否如期正常运行。它们都是测试和调试代码的良好起点。\n\n我们将使用 60000 张图像训练网络，并使用 10000 张图像评估经过学习的网络分类图像的准确率。您可以从 TensorFlow 直接访问 Fashion MNIST，只需导入和加载数据即可：\n\n```python\nfashion_mnist = keras.datasets.fashion_mnist\n\n(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()\n```\n\n加载数据返回4个NumPy数组：\n\n* `train_images`和`train_labels`数组是训练集，即模型用于学习的数据。\n* 测试集 `test_images` 和 `test_labels` 
数组用于测试模型。\n\n图像为28x28的NumPy数组，像素值介于0到255之间。标签是整数数组，介于0到9之间。这些标签对应于图像代表的服饰所属的类别：\n\n<table>\n  <tr>\n    <th>Label</th>\n    <th>Class</th>\n  </tr>\n  <tr>\n    <td>0</td>\n    <td>T-shirt/top(T 恤衫/上衣)</td>\n  </tr>\n  <tr>\n    <td>1</td>\n    <td>Trouser(裤子)</td>\n  </tr>\n    <tr>\n    <td>2</td>\n    <td>Pullover (套衫)</td>\n  </tr>\n    <tr>\n    <td>3</td>\n    <td>Dress(裙子)</td>\n  </tr>\n    <tr>\n    <td>4</td>\n    <td>Coat(外套)</td>\n  </tr>\n    <tr>\n    <td>5</td>\n    <td>Sandal(凉鞋)</td>\n  </tr>\n    <tr>\n    <td>6</td>\n    <td>Shirt(衬衫)</td>\n  </tr>\n    <tr>\n    <td>7</td>\n    <td>Sneaker(运动鞋)</td>\n  </tr>\n    <tr>\n    <td>8</td>\n    <td>Bag(包包)</td>\n  </tr>\n    <tr>\n    <td>9</td>\n    <td>Ankle boot(踝靴)</td>\n  </tr>\n</table>\n\n每个图像都映射到一个标签，由于类名不包含在数据集中，因此将它们存储在此处以便在绘制图像时使用：\n\n```python\nclass_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\n```\n\n## 2. 探索数据\n\n我们先探索数据集的格式，然后再训练模型。以下内容显示训练集中有 60000 张图像，每张图像都表示为 28x28 像素：\n\n```python\ntrain_images.shape\n```\n\n`(60000, 28, 28)`\n\n同样，训练集中有60,000个标签：\n\n```python\nlen(train_labels)\n```\n\n`60000`\n\n每个标签都是0到9之间的整数：\n\n```python\ntrain_labels\n```\n\n`array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)`\n\n测试集中有10,000个图像。同样，每个图像表示为28 x 28像素：\n\n```python\ntest_images.shape\n```\n\n`(10000, 28, 28)`\n\n测试集包含10,000个图像标签：\n\n```python\nlen(test_labels)\n```\n\n`10000`\n\n## 3. 
预处理数据\n\n在训练网络之前必须对数据进行预处理。如果您检查训练集中的第一个图像，您将看到像素值落在0到255的范围内：\n\n```python\nplt.figure()\nplt.imshow(train_images[0])\nplt.colorbar()\nplt.grid(False)\nplt.show()\n```\n\n![](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_21_0.png)\n\n我们将这些值缩小到 0 到 1 之间，然后将其馈送到神经网络模型。为此，将图像数据的类型从整数转换为浮点数，然后除以 255。\n\n务必要以相同的方式对训练集和测试集进行预处理：\n\n```python\ntrain_images = train_images / 255.0\n\ntest_images = test_images / 255.0\n```\n\n为了验证数据的格式是否正确以及我们是否已准备好构建和训练网络，让我们显示训练集中的前25个图像，并在每个图像下方显示类名。\n\n```python\nplt.figure(figsize=(10,10))\nfor i in range(25):\n    plt.subplot(5,5,i+1)\n    plt.xticks([])\n    plt.yticks([])\n    plt.grid(False)\n    plt.imshow(train_images[i], cmap=plt.cm.binary)\n    plt.xlabel(class_names[train_labels[i]])\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_25_0.png)\n\n## 4. 构建模型\n\n构建神经网络需要配置模型的层，然后编译模型。\n\n### 4.1. 设置网络层\n\n神经网络的基本构造块是层。层从馈送到其中的数据中提取表示结果。希望这些表示结果有助于解决手头问题。\n\n深度学习大多是将简单的层连接在一起。大部分层（例如 `tf.keras.layers.Dense`）都具有在训练期间要学习的参数。\n\n```python\nmodel = keras.Sequential([\n    keras.layers.Flatten(input_shape=(28, 28)),\n    keras.layers.Dense(128, activation='relu'),\n    keras.layers.Dense(10, activation='softmax')\n])\n```\n\n该网络中的第一层`tf.keras.layers.Flatten`将图像的格式从二维数组（28 x 28像素）转换为一维数组（28 * 28 = 784个像素）。可以将该层理解为把图像中的像素行拆开，再依次排成一行。该层没有要学习的参数；它只改动数据的格式。\n\n在像素被展平之后，网络由两个`tf.keras.layers.Dense`层的序列组成。这些是密集连接或全连接的神经层。第一个`Dense`层有128个节点（或神经元）。第二个（也是最后一个）层是具有 10 个节点的 `softmax` 层，该层会返回一个具有 10 个概率得分的数组，这些得分的总和为 1。每个节点包含一个得分，表示当前图像属于 10 个类别中某一个的概率。\n\n### 4.2. 编译模型\n\n模型还需要再进行几项设置才可以开始训练。这些设置会添加到模型的编译步骤：\n\n* 损失函数：衡量模型在训练期间的准确率。我们希望尽可能缩小该函数，以“引导”模型朝着正确的方向优化。\n* 优化器：根据模型看到的数据及其损失函数更新模型的方式。\n* 度量标准：用于监控训练和测试步骤。以下示例使用准确率，即图像被正确分类的比例。\n\n```python\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n```\n\n## 5. 训练模型\n\n训练神经网络模型需要以下步骤：\n\n1. 
将训练数据馈送到模型中，在本示例中为 `train_images` 和 `train_labels` 数组。\n2. 模型学习将图像与标签相关联。\n3. 我们要求模型对测试集进行预测，在本示例中为 test_images 数组。我们会验证预测结果是否与 `test_labels` 数组中的标签一致。\n\n要开始训练，请调用 `model.fit` 方法，使模型与训练数据“拟合”：\n\n```python\nmodel.fit(train_images, train_labels, epochs=5)\n```\n\n```shell\nEpoch 1/5\n60000/60000 [==============================] - 5s 87us/step - loss: 0.5033 - acc: 0.8242\n......\nEpoch 5/5\n60000/60000 [==============================] - 5s 88us/step - loss: 0.2941 - acc: 0.8917\n```\n\n在模型训练期间，系统会显示损失和准确率指标。该模型在训练数据上的准确率达到 0.88（即 88%）。\n\n## 6. 评估精度\n\n接下来，比较模型在测试数据集上的表现情况：\n\n```python\ntest_loss, test_acc = model.evaluate(test_images, test_labels)\n\nprint('\\nTest accuracy:', test_acc)\n```\n\n输出：\n\n```output\n10000/10000 [==============================] - 1s 50us/step\nTest accuracy: 0.8734\n```\n\n结果表明，模型在测试数据集上的准确率略低于在训练数据集上的准确率。训练准确率和测试准确率之间的这种差异表示出现过拟合(*overfitting*)。如果机器学习模型在新数据上的表现不如在训练数据上的表现，也就是泛化性不好，就表示出现过拟合。\n\n## 7. 预测\n\n模型经过训练后，我们可以使用它对一些图像进行预测。\n\n```python\npredictions = model.predict(test_images)\n```\n\n在本示例中，模型已经预测了测试集中每张图像的标签。我们来看看第一个预测：\n\n```python\npredictions[0]\n```\n\n输出：\n\n```output\narray([6.2482708e-05, 2.4860196e-08, 9.7165821e-07, 4.7436039e-08,\n       2.0804382e-06, 1.3316551e-02, 9.8731316e-06, 3.4591161e-02,\n       1.2390658e-04, 9.5189297e-01], dtype=float32)\n```\n\n预测结果是一个具有 10 个数字的数组，这些数字说明模型对于图像对应于 10 种不同服饰中每一个服饰的“confidence（置信度）”。我们可以看到哪个标签的置信度值最大：\n\n```python\nnp.argmax(predictions[0])\n```\n\n`9`\n\n因此，模型非常确信这张图像是踝靴或属于 class_names[9]。我们可以检查测试标签以查看该预测是否正确：\n\n```python\ntest_labels[0]\n```\n\n`9`\n\n我们可以将该预测绘制成图来查看全部 10 个通道\n\n```python\ndef plot_image(i, predictions_array, true_label, img):\n  predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]\n  plt.grid(False)\n  plt.xticks([])\n  plt.yticks([])\n\n  plt.imshow(img, cmap=plt.cm.binary)\n\n  predicted_label = np.argmax(predictions_array)\n  if predicted_label == true_label:\n    color = 'blue'\n  else:\n    color = 'red'\n\n  
plt.xlabel(\"{} {:2.0f}% ({})\".format(class_names[predicted_label],\n                                100*np.max(predictions_array),\n                                class_names[true_label]),\n                                color=color)\n\ndef plot_value_array(i, predictions_array, true_label):\n  predictions_array, true_label = predictions_array[i], true_label[i]\n  plt.grid(False)\n  plt.xticks([])\n  plt.yticks([])\n  thisplot = plt.bar(range(10), predictions_array, color=\"#777777\")\n  plt.ylim([0, 1])\n  predicted_label = np.argmax(predictions_array)\n\n  thisplot[predicted_label].set_color('red')\n  thisplot[true_label].set_color('blue')\n```\n\n让我们看看第0个图像，预测和预测数组。\n\n```python\ni = 0\nplt.figure(figsize=(6,3))\nplt.subplot(1,2,1)\nplot_image(i, predictions, test_labels, test_images)\nplt.subplot(1,2,2)\nplot_value_array(i, predictions,  test_labels)\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_48_0.png)\n\n```python\ni = 12\nplt.figure(figsize=(6,3))\nplt.subplot(1,2,1)\nplot_image(i, predictions, test_labels, test_images)\nplt.subplot(1,2,2)\nplot_value_array(i, predictions,  test_labels)\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_49_0.png)\n\n我们用它们的预测绘制几张图像。正确的预测标签为蓝色，错误的预测标签为红色。数字表示预测标签的百分比（总计为 100）。请注意，即使置信度非常高，也有可能预测错误。\n\n```python\n# 绘制前X个测试图像，预测标签和真实标签。 \n# 用蓝色标记正确的预测，用红色标记错误的预测。\nnum_rows = 5\nnum_cols = 3\nnum_images = num_rows*num_cols\nplt.figure(figsize=(2*2*num_cols, 2*num_rows))\nfor i in range(num_images):\n  plt.subplot(num_rows, 2*num_cols, 2*i+1)\n  plot_image(i, predictions, test_labels, test_images)\n  plt.subplot(num_rows, 2*num_cols, 2*i+2)\n  plot_value_array(i, predictions, test_labels)\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_51_0.png)\n\n最后，使用训练的模型对单个图像进行预测。\n\n```python\n# 从测试数据集中获取图像\nimg = 
test_images[0]\n\nprint(img.shape)\n```\n\n`tf.keras`模型已经过优化，可以一次性对样本批次或样本集进行预测。因此，即使我们使用单个图像，仍需要将其添加到列表中：\n\n```python\n# 将图像添加到批次中，它是唯一的成员。 \nimg = (np.expand_dims(img,0))\n\nprint(img.shape)\n```\n\n`(1, 28, 28)`\n\n现在预测此图像的正确标签：\n\n```python\npredictions_single = model.predict(img)\n\nprint(predictions_single)\n```\n\n```python\nplot_value_array(0, predictions_single, test_labels)\n_ = plt.xticks(range(10), class_names, rotation=45)\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification_files/output_58_0.png)\n\n`model.predict`返回一组列表，每个列表对应批次数据中的每张图像。（仅）获取批次数据中相应图像的预测结果：\n\n```python\nnp.argmax(predictions_single[0])\n```\n\n`9`\n\n和前面的一样，模型预测标签为9。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_classification.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_classification.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_classification](https://tensorflow.google.cn/beta/tutorials/keras/basic_classification)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_classification.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_classification.md)\n"
  },
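上面基本分类教程中，模型末层的 `softmax` 会把 10 个节点的输出变成总和为 1 的概率得分，再用 `np.argmax` 取置信度最高的类别。这个计算本身可以用 NumPy 单独演示（以下 logits 数值是虚构的玩具数据）：

```python
import numpy as np

def softmax(logits):
    # 先减去最大值保证数值稳定，再指数化并归一化为概率
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(round(float(probs.sum()), 6))  # 1.0
print(int(np.argmax(probs)))         # 0
```

与教程中的 `predictions[0]` 一样，得分最高的下标就是模型最有把握的类别。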
  {
    "path": "r2/tutorials/keras/basic_regression.md",
    "content": "---\ntitle: 回归项目实战：预测燃油效率\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1914\nabbrlink: tensorflow/tf2-tutorials-keras-basic_regression\n---\n\n# 回归项目实战：预测燃油效率 (tensorflow2.0官方教程翻译)\n\n在*回归*问题中，我们的目标是预测连续值的输出，如价格或概率。\n将此与*分类*问题进行对比，分类的目标是从类列表中选择一个类（例如，判断一张图片中是苹果还是橙子，即识别图片中是哪种水果）。\n\n本章节采用了经典的[Auto MPG](https://archive.ics.uci.edu/ml/datasets/auto+mpg) 数据集，并建立了一个模型来预测20世纪70年代末和80年代初汽车的燃油效率。为此，我们将为该模型提供该时段内许多汽车的描述，此描述包括以下属性：气缸，排量，马力和重量。\n\n此示例使用tf.keras API，有关详细信息，请参阅[Keras指南](https://tensorflow.google.cn/guide/keras)。\n\n```python\n# 使用seaborn进行pairplot数据可视化，安装命令\npip install seaborn\n```\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport pathlib\n\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport seaborn as sns\n\n# tensorflow2 安装命令 pip install tensorflow==2.0.0-alpha0\nimport tensorflow as tf\n\nfrom tensorflow import keras\nfrom tensorflow.keras import layers\n\nprint(tf.__version__)\n```\n\n## 1. Auto MPG数据集\n\n该数据集可从[UCI机器学习库](https://archive.ics.uci.edu/ml/)获得。\n\n### 1.1. 
获取数据\n\n首先下载数据集：\n\n```python\ndataset_path = keras.utils.get_file(\"auto-mpg.data\", \"http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data\")\ndataset_path\n```\n\n用pandas导入数据\n\n```python\ncolumn_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',\n                'Acceleration', 'Model Year', 'Origin']\nraw_dataset = pd.read_csv(dataset_path, names=column_names,\n                      na_values = \"?\", comment='\\t',\n                      sep=\" \", skipinitialspace=True)\n\ndataset = raw_dataset.copy()\ndataset.tail()\n```\n\n|     | MPG  | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Origin |\n|-----|------|-----------|--------------|------------|--------|--------------|------------|--------|\n| 393 | 27.0 | 4         | 140.0        | 86.0       | 2790.0 | 15.6         | 82         | 1      |\n| 394 | 44.0 | 4         | 97.0         | 52.0       | 2130.0 | 24.6         | 82         | 2      |\n| 395 | 32.0 | 4         | 135.0        | 84.0       | 2295.0 | 11.6         | 82         | 1      |\n| 396 | 28.0 | 4         | 120.0        | 79.0       | 2625.0 | 18.6         | 82         | 1      |\n| 397 | 31.0 | 4         | 119.0        | 82.0       | 2720.0 | 19.4         | 82         | 1      |\n\n### 1.2. 
清理数据\n\n数据集包含一些未知值\n\n```python\ndataset.isna().sum()\n```\n\n```output\nMPG             0\nCylinders       0\nDisplacement    0\nHorsepower      6\nWeight          0\nAcceleration    0\nModel Year      0\nOrigin          0\ndtype: int64\n```\n\n这是一个入门教程，所以我们就简单地删除这些行。\n\n```python\ndataset = dataset.dropna()\n```\n\n“Origin”这一列实际上是分类，而不是数字。 所以把它转换为独热编码：\n\n```python\norigin = dataset.pop('Origin')\n```\n\n```python\ndataset['USA'] = (origin == 1)*1.0\ndataset['Europe'] = (origin == 2)*1.0\ndataset['Japan'] = (origin == 3)*1.0\ndataset.tail()\n```\n\n|     | MPG  | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | USA | Europe | Japan |\n|-----|------|-----------|--------------|------------|--------|--------------|------------|-----|--------|-------|\n| 393 | 27.0 | 4         | 140.0        | 86.0       | 2790.0 | 15.6         | 82         | 1.0 | 0.0    | 0.0   |\n| 394 | 44.0 | 4         | 97.0         | 52.0       | 2130.0 | 24.6         | 82         | 0.0 | 1.0    | 0.0   |\n| 395 | 32.0 | 4         | 135.0        | 84.0       | 2295.0 | 11.6         | 82         | 1.0 | 0.0    | 0.0   |\n| 396 | 28.0 | 4         | 120.0        | 79.0       | 2625.0 | 18.6         | 82         | 1.0 | 0.0    | 0.0   |\n| 397 | 31.0 | 4         | 119.0        | 82.0       | 2720.0 | 19.4         | 82         | 1.0 | 0.0    | 0.0   |\n\n### 1.3. 将数据分为训练集和测试集\n\n现在将数据集拆分为训练集和测试集，我们将在模型的最终评估中使用测试集。\n\n```python\ntrain_dataset = dataset.sample(frac=0.8,random_state=0)\ntest_dataset = dataset.drop(train_dataset.index)\n```\n\n### 1.4. 
检查数据\n\n快速浏览训练集中几对列的联合分布：\n\n```python\nsns.pairplot(train_dataset[[\"MPG\", \"Cylinders\", \"Displacement\", \"Weight\"]], diag_kind=\"kde\")\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_20_1.png)\n\n另外查看整体统计数据：\n\n```python\ntrain_stats = train_dataset.describe()\ntrain_stats.pop(\"MPG\")\ntrain_stats = train_stats.transpose()\ntrain_stats\n```\n\n|              | count | mean        | std        | min    | 25%     | 50%    | 75%     | max    |\n|--------------|-------|-------------|------------|--------|---------|--------|---------|--------|\n| Cylinders    | 314.0 | 5.477707    | 1.699788   | 3.0    | 4.00    | 4.0    | 8.00    | 8.0    |\n| Displacement | 314.0 | 195.318471  | 104.331589 | 68.0   | 105.50  | 151.0  | 265.75  | 455.0  |\n| Horsepower   | 314.0 | 104.869427  | 38.096214  | 46.0   | 76.25   | 94.5   | 128.00  | 225.0  |\n| Weight       | 314.0 | 2990.251592 | 843.898596 | 1649.0 | 2256.50 | 2822.5 | 3608.00 | 5140.0 |\n| Acceleration | 314.0 | 15.559236   | 2.789230   | 8.0    | 13.80   | 15.5   | 17.20   | 24.8   |\n| Model Year   | 314.0 | 75.898089   | 3.675642   | 70.0   | 73.00   | 76.0   | 79.00   | 82.0   |\n| USA          | 314.0 | 0.624204    | 0.485101   | 0.0    | 0.00    | 1.0    | 1.00    | 1.0    |\n| Europe       | 314.0 | 0.178344    | 0.383413   | 0.0    | 0.00    | 0.0    | 0.00    | 1.0    |\n| Japan        | 314.0 | 0.197452    | 0.398712   | 0.0    | 0.00    | 0.0    | 0.00    | 1.0    |\n\n### 1.5. 从标签中分割特征\n\n将目标值或“标签”与特征分开，此标签是您训练的模型进行预测的值：\n\n```python\ntrain_labels = train_dataset.pop('MPG')\ntest_labels = test_dataset.pop('MPG')\n```\n\n### 1.6. 
标准化数据\n\n再看一下上面的`train_stats`块，注意每个特征的范围有多么不同。\n\n使用不同的比例和范围对特征进行标准化是一个很好的实践，虽然模型可能在没有特征标准化的情况下收敛，但它使训练更加困难，并且它使得最终模型取决于输入中使用的单位的选择。\n\n注意：尽管我们仅从训练数据集中有意生成这些统计信息，但这些统计信息也将用于标准化测试数据集。我们需要这样做，将测试数据集投影到模型已经训练过的相同分布中。\n\n```python\ndef norm(x):\n  return (x - train_stats['mean']) / train_stats['std']\nnormed_train_data = norm(train_dataset)\nnormed_test_data = norm(test_dataset)\n```\n\n这个标准化数据是我们用来训练模型的数据。\n\n注意：用于标准化输入的统计数据（平均值和标准偏差）需要应用于输入模型的任何其他数据，以及我们之前执行的独热编码。这包括测试集以及模型在生产中使用时的实时数据。\n\n## 2. 模型\n\n### 2.1. 构建模型\n\n让我们建立我们的模型。在这里，我们将使用具有两个密集连接隐藏层的`Sequential`模型，以及返回单个连续值的输出层。模型构建步骤包含在函数`build_model`中，因为我们稍后将创建第二个模型。\n\n```python\ndef build_model():\n  model = keras.Sequential([\n    layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),\n    layers.Dense(64, activation='relu'),\n    layers.Dense(1)\n  ])\n\n  optimizer = tf.keras.optimizers.RMSprop(0.001)\n\n  model.compile(loss='mse',\n                optimizer=optimizer,\n                metrics=['mae', 'mse'])\n  return model\n```\n\n```python\nmodel = build_model()\n```\n\n### 2.2. 
检查模型\n\n使用`.summary`方法打印模型的简单描述\n\n```python\nmodel.summary()\n```\n\n```output\nModel: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense (Dense)                (None, 64)                640       \n_________________________________________________________________\ndense_1 (Dense)              (None, 64)                4160      \n_________________________________________________________________\ndense_2 (Dense)              (None, 1)                 65        \n=================================================================\nTotal params: 4,865\nTrainable params: 4,865\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n现在试试这个模型。从训练数据中取出一批10个样本数据并在调用`model.predict`函数。\n\n```python\nexample_batch = normed_train_data[:10]\nexample_result = model.predict(example_batch)\nexample_result\n```\n\n```output\n      array([[ 0.3297699 ],\n            [ 0.25655937],\n            [-0.12460149],\n            [ 0.32495883],\n            [ 0.50459725],\n            [ 0.10887371],\n            [ 0.57305855],\n            [ 0.57637435],\n            [ 0.12094647],\n            [ 0.6864784 ]], dtype=float32)\n```\n\n这似乎可以工作，它产生预期的shape和类型的结果。\n\n### 2.3. 
训练模型\n\n训练模型1000个周期，并在`history`对象中记录训练和验证准确率：\n\n```python\n# 通过为每个完成的周期打印单个点来显示训练进度 \nclass PrintDot(keras.callbacks.Callback):\n  def on_epoch_end(self, epoch, logs):\n    if epoch % 100 == 0: print('')\n    print('.', end='')\n\nEPOCHS = 1000\n\nhistory = model.fit(\n  normed_train_data, train_labels,\n  epochs=EPOCHS, validation_split = 0.2, verbose=0,\n  callbacks=[PrintDot()])\n```\n\n使用存储在`history`对象中的统计数据可视化模型的训练进度。\n\n```python\nhist = pd.DataFrame(history.history)\nhist['epoch'] = history.epoch\nhist.tail()\n```\n\n|     | loss     | mae      | mse      | val_loss  | val_mae  | val_mse   | epoch |\n|-----|----------|----------|----------|-----------|----------|-----------|-------|\n| 995 | 2.556746 | 0.988013 | 2.556746 | 10.210531 | 2.324411 | 10.210530 | 995   |\n| 996 | 2.597973 | 1.039339 | 2.597973 | 11.257273 | 2.469266 | 11.257273 | 996   |\n| 997 | 2.671929 | 1.040886 | 2.671929 | 10.604957 | 2.446257 | 10.604958 | 997   |\n| 998 | 2.634858 | 1.001898 | 2.634858 | 10.906935 | 2.373279 | 10.906935 | 998   |\n| 999 | 2.741717 | 1.035889 | 2.741717 | 10.698320 | 2.342703 | 10.698319 | 999   |\n\n```python\ndef plot_history(history):\n  hist = pd.DataFrame(history.history)\n  hist['epoch'] = history.epoch\n\n  plt.figure()\n  plt.xlabel('Epoch')\n  plt.ylabel('Mean Abs Error [MPG]')\n  plt.plot(hist['epoch'], hist['mae'],\n           label='Train Error')\n  plt.plot(hist['epoch'], hist['val_mae'],\n           label = 'Val Error')\n  plt.ylim([0,5])\n  plt.legend()\n\n  plt.figure()\n  plt.xlabel('Epoch')\n  plt.ylabel('Mean Square Error [$MPG^2$]')\n  plt.plot(hist['epoch'], hist['mse'],\n           label='Train Error')\n  plt.plot(hist['epoch'], hist['val_mse'],\n           label = 'Val Error')\n  plt.ylim([0,20])\n  plt.legend()\n  
plt.show()\n\n\nplot_history(history)\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_42_0.png)\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_42_1.png)\n\n该图表显示在约100个周期之后，验证误差几乎没有改进，甚至有所恶化。让我们更新`model.fit`调用，以便在验证分数没有提高时自动停止训练。我们将使用`EarlyStopping`回调来测试每个周期的训练状态。如果经过一定数量的周期而没有显示出改进，则自动停止训练。\n\n您可以通过此[链接](https://tensorflow.google.cn/versions/master/api_docs/python/tf/keras/callbacks/EarlyStopping)了解此回调的更多信息。\n\n```python\nmodel = build_model()\n\n# “patience”参数是检查改进的周期量 \nearly_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)\n\nhistory = model.fit(normed_train_data, train_labels, epochs=EPOCHS,\n                    validation_split = 0.2, verbose=0, callbacks=[early_stop, PrintDot()])\n\nplot_history(history)\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_44_1.png)\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_44_2.png)\n\n上图显示在验证集上平均误差通常约为+/-2MPG，这个结果好吗？我们把这个问题留给你判断。\n\n让我们用测试集来看一下模型的泛化效果。我们在训练模型时没有使用测试集，因此它可以告诉我们：当在现实世界中使用该模型时，可以期望怎样的预测效果。\n\n```python\nloss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)\n\nprint(\"Testing set Mean Abs Error: {:5.2f} MPG\".format(mae))\n```\n\n`Testing set Mean Abs Error:  2.09 MPG`\n\n### 2.4. 
预测\n\n最后，使用测试集中的数据预测MPG值：\n\n```python\ntest_predictions = model.predict(normed_test_data).flatten()\n\nplt.scatter(test_labels, test_predictions)\nplt.xlabel('True Values [MPG]')\nplt.ylabel('Predictions [MPG]')\nplt.axis('equal')\nplt.axis('square')\nplt.xlim([0,plt.xlim()[1]])\nplt.ylim([0,plt.ylim()[1]])\n_ = plt.plot([-100, 100], [-100, 100])\n\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_48_0.png)\n\n看起来我们的模型预测得相当好，我们来看看误差分布：\n\n```python\nerror = test_predictions - test_labels\nplt.hist(error, bins = 25)\nplt.xlabel(\"Prediction Error [MPG]\")\n_ = plt.ylabel(\"Count\")\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression_files/output_50_0.png)\n\n上图中的误差分布看起来不太符合高斯（正态）分布，很可能是因为样本数据非常少。\n\n## 3. 结论\n\n本章节介绍了一些处理回归问题的技巧：\n\n* 均方误差（MSE）是用于回归问题的常见损失函数（不同的损失函数用于分类问题）。\n\n* 同样，用于回归的评估指标与分类不同，常见的回归度量是平均绝对误差（MAE)。\n\n* 当数字输入数据特征具有不同范围的值时，应将每个特征独立地缩放到相同范围。\n\n* 如果没有太多训练数据，应选择隐藏层很少的小网络，以避免过拟合。\n\n* 早停法（Early stopping）是防止过拟合的有效技巧。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_regression.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_regression.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_regression](https://tensorflow.google.cn/beta/tutorials/keras/basic_regression)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_regression.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_regression.md)\n"
  },
  {
    "path": "r2/tutorials/keras/basic_text_classification.md",
    "content": "---\ntitle: 文本分类项目实战：电影评论\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1927\nabbrlink: tensorflow/tf2-tutorials-keras-basic_text_classification\n---\n\n# 文本分类项目实战：电影评论 (tensorflow2.0官方教程翻译)\n\n本文会将文本形式的影评分为“正面”或“负面”影评。这是一个二元分类（又称为两类分类）的示例，也是一种重要且广泛适用的机器学习问题。\n\n我们将使用包含来自[网络电影数据库](https://www.imdb.com/)的50,000条电影评论文本的[IMDB数据集](https://tensorflow.google.cn/api_docs/python/tf/keras/datasets/imdb)，这些被分为25,000条训练评论和25,000条评估评论，训练和测试集是平衡的，这意味着它们包含相同数量的正面和负面评论。\n\n本章节使用tf.keras，这是一个高级API，用于在TensorFlow中构建和训练模型，有关使用tf.keras的更高级文本分类教程，请参阅[MLCC文本分类指南](https://developers.google.cn/machine-learning/guides/text-classification/)。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\nfrom tensorflow import keras\n\nimport numpy as np\n\nprint(tf.__version__)\n```\n\n`2.0.0-alpha0`\n\n## 1. 下载IMDB数据集\n\nIMDB数据集与TensorFlow一起打包，它已经被预处理，使得评论（单词序列）已被转换为整数序列，其中每个整数表示字典中的特定单词。\n\n以下代码将IMDB数据集下载到您的计算机（如果您已经下载了它，则使用缓存副本）：\n\n```python\nimdb = keras.datasets.imdb\n\n(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)\n```  \n\n参数 `num_words=10000` 保留训练数据中最常出现的10,000个单词，丢弃罕见的单词以保持数据的大小可管理。\n\n## 2. 探索数据\n\n我们花一点时间来理解数据的格式，数据集经过预处理：每个示例都是一个整数数组，表示电影评论的单词。每个标签都是0或1的整数值，其中0表示负面评论，1表示正面评论。\n\n```python\nprint(\"Training entries: {}, labels: {}\".format(len(train_data), len(train_labels)))\n```\n\n`Training entries: 25000, labels: 25000`\n\n评论文本已转换为整数，其中每个整数表示字典中的特定单词。以下是第一篇评论的内容：\n\n```python\nprint(train_data[0])\n```\n\n`[1, 14, 22, 16, 43, 530, 973, ...., 32, 15, 16, 5345, 19, 178, 32]`\n\n电影评论的长度可能不同，以下代码显示了第一次和第二次评论中的字数。由于对神经网络的输入必须是相同的长度，我们稍后需要解决此问题。\n\n```python\nlen(train_data[0]), len(train_data[1])\n```\n\n`(218, 189)`\n\n### 2.1. 
将整数转换成文本\n\n了解如何将整数转换回文本可能很有用。\n在这里，我们将创建一个辅助函数来查询包含整数到字符串映射的字典对象：\n\n```python\n# 将单词映射到整数索引的字典\nword_index = imdb.get_word_index()\n\n# 前几个索引是保留字\nword_index = {k:(v+3) for k,v in word_index.items()}\nword_index[\"<PAD>\"] = 0\nword_index[\"<START>\"] = 1\nword_index[\"<UNK>\"] = 2  # unknown\nword_index[\"<UNUSED>\"] = 3\n\nreverse_word_index = dict([(value, key) for (key, value) in word_index.items()])\n\ndef decode_review(text):\n    return ' '.join([reverse_word_index.get(i, '?') for i in text])\n```\n\n现在我们可以使用`decode_review`函数显示第一条评论的文本：\n\n```python\ndecode_review(train_data[0])\n```\n\n*\"<START> this film was just brilliant casting location scenery story direction .....that was shared with us all\"*\n\n## 3. 预处理数据\n\n影评（整数数组）必须转换为张量，然后才能馈送到神经网络中。我们可以通过以下两种方法实现这种转换：\n\n* 对数组进行独热编码，将它们转换为由 0 和 1 构成的向量。例如，序列 [3, 5] 将变成一个 10000 维的向量，除索引 3 和 5 转换为 1 之外，其余全转换为 0。然后，将它作为网络的第一层，一个可以处理浮点向量数据的密集层。不过，这种方法会占用大量内存，需要一个大小为 `num_words * num_reviews` 的矩阵。\n\n* 或者，我们可以填充数组，使它们都具有相同的长度，然后创建一个形状为 `max_length * num_reviews` 的整数张量。我们可以使用一个能够处理这种形状的嵌入层作为网络中的第一层。\n\n在本教程中，我们将使用第二种方法。\n\n由于电影评论的长度必须相同，我们将使用[pad_sequences](https://tensorflow.google.cn/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences)函数来标准化长度：\n\n```python\ntrain_data = keras.preprocessing.sequence.pad_sequences(train_data,\n                                                        value=word_index[\"<PAD>\"],\n                                                        padding='post',\n                                                        maxlen=256)\n\ntest_data = keras.preprocessing.sequence.pad_sequences(test_data,\n                                                       value=word_index[\"<PAD>\"],\n                                                       padding='post',\n                                                       maxlen=256)\n```\n\n我们再看一下数据的长度：\n\n```python\nlen(train_data[0]), len(train_data[1])\n```\n\n`(256, 256)`\n\n并查看数据：\n\n```python\nprint(train_data[0])\n```\n\n```output\n[   1   
14   22   16   43  530  973 1622 1385   65  458 4468   66 3941\n    4  173   36  256    5   25  100   43  838  112   50  670    2    9\n  ...\n    0    0    0    0    0    0    0    0    0    0    0    0    0    0\n    0    0    0    0]\n```\n\n## 4. 构建模型\n\n神经网络通过堆叠层创建而成，这需要做出两个架构方面的主要决策：\n\n* 要在模型中使用多少个层？\n* 要针对每个层使用多少个隐藏单元？\n\n在本示例中，输入数据由字词-索引数组构成。要预测的标签是 0 或 1。接下来，我们为此问题构建一个模型：\n\n```python\n# 输入形状是用于电影评论的词汇计数（10,000字）\nvocab_size = 10000\n\nmodel = keras.Sequential()\nmodel.add(keras.layers.Embedding(vocab_size, 16))\nmodel.add(keras.layers.GlobalAveragePooling1D())\nmodel.add(keras.layers.Dense(16, activation='relu'))\nmodel.add(keras.layers.Dense(1, activation='sigmoid'))\n\nmodel.summary()\n```\n\n```\nModel: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\nembedding (Embedding)        (None, None, 16)          160000    \n_________________________________________________________________\nglobal_average_pooling1d (Gl (None, 16)                0         \n_________________________________________________________________\ndense (Dense)                (None, 16)                272       \n_________________________________________________________________\ndense_1 (Dense)              (None, 1)                 17        \n=================================================================\nTotal params: 160,289\nTrainable params: 160,289\nNon-trainable params: 0\n_________________________________________________________________\n```\n这些层按顺序堆叠以构建分类器：\n\n1. 第一层是`Embedding`层。该层采用整数编码的词汇表，并查找每个词索引的嵌入向量。这些向量是作为模型训练学习的，向量为输入数组添加维度，生成的维度为：`(batch, sequence, embedding)`.\n\n2. 接下来，`GlobalAveragePooling1D`层通过对序列维度求平均值，针对每个样本返回一个长度固定的输出向量。这样，模型便能够以尽可能简单的方式处理各种长度的输入。\n\n3. 该长度固定的输出向量会传入一个全连接 (Dense) 层（包含 16 个隐藏单元）\n\n4. 最后一层与单个输出节点密集连接。应用`sigmoid`激活函数后，结果是介于 0 到 1 之间的浮点值，表示概率或置信水平。\n\n### 4.1. 
隐藏单元\n\n上述模型在输入和输出之间有两个中间层（也称为“隐藏”层）。输出（单元、节点或神经元）的数量是相应层的表示法空间的维度。换句话说，该数值表示学习内部表示法时网络所允许的自由度。\n\n如果模型具有更多隐藏单元（更高维度的表示空间）和/或更多层，则说明网络可以学习更复杂的表示法。不过，这会使网络耗费更多计算资源，并且可能导致学习不必要的模式（可以优化在训练数据上的表现，但不会优化在测试数据上的表现）。这称为过拟合，我们稍后会加以探讨。\n\n### 4.2. 损失函数和优化器\n\n模型需要一个损失函数和一个用于训练的优化器。由于这是一个二元分类问题，并且模型输出概率（网络最后一层使用sigmoid 激活函数，仅包含一个单元），那么最好使用`binary_crossentropy`（二元交叉熵）损失。\n\n这不是损失函数的唯一选择，例如，您可以选择`mean_squared_error`（均方误差）。但对于输出概率值的模型，交叉熵（crossentropy）往往是最好\n的选择。交叉熵是来自于信息论领域的概念，用于衡量概率分布之间的距离，在这个例子中就是真实分布与预测值之间的距离。。\n\n在后面，当我们探索回归问题（比如预测房子的价格）时，我们将看到如何使用另一种称为均方误差的损失函数。\n\n现在，配置模型以使用优化器和损失函数：\n\n```python\nmodel.compile(optimizer='adam',\n              loss='binary_crossentropy',\n              metrics=['accuracy'])\n```\n\n## 5. 创建验证集\n\n在训练时，我们想要检查模型在以前没有见过的数据上的准确性。通过从原始训练数据中分离10,000个示例来创建验证集。（为什么不立即使用测试集？我们的目标是仅使用训练数据开发和调整我们的模型，然后仅使用测试数据来评估我们的准确性）。\n\n```python\nx_val = train_data[:10000]\npartial_x_train = train_data[10000:]\n\ny_val = train_labels[:10000]\npartial_y_train = train_labels[10000:]\n```\n\n## 6. 训练模型\n\n以512个样本的小批量训练模型40个周期，这是`x_train`和`y_train`张量中所有样本的40次迭代。在训练期间，监控模型在验证集中的10,000个样本的损失和准确性：\n\n```python\nhistory = model.fit(partial_x_train,\n                    partial_y_train,\n                    epochs=40,\n                    batch_size=512,\n                    validation_data=(x_val, y_val),\n                    verbose=1)\n```\n\n`Epoch 40/40\n15000/15000 [==============================] - 1s 54us/sample - loss: 0.0926 - accuracy: 0.9771 - val_loss: 0.3133 - val_accuracy: 0.8824`\n\n## 7. 评估模型\n\n让我们看看模型的表现，将返回两个值，损失（表示我们的错误的数字，更低的值更好）和准确性。\n\n```\nresults = model.evaluate(test_data, test_labels)\n\nprint(results)\n```\n\n`25000/25000 [==============================] - 1s 45us/sample - loss: 0.3334 - accuracy: 0.8704\n[0.33341303256988525, 0.87036]`\n\n这种相当简单的方法实现了约87％的准确度，使用更先进的方法，模型应该接近95％。\n\n## 8. 
创建准确性和损失随时间变化的图表\n\n`model.fit()`返回一个`History`对象，其中包含一个字典，其中包含训练期间发生的所有事情：\n\n```python\nhistory_dict = history.history\nhistory_dict.keys()\n```\n\n```output\n      dict_keys(['loss', 'val_loss', 'accuracy', 'val_accuracy'])\n```\n\n有四个条目：在训练和验证期间，每个条目对应一个监控指标，我们可以使用这些来绘制训练和验证损失以进行比较，以及训练和验证准确性：\n\n```python\nimport matplotlib.pyplot as plt\n\nacc = history_dict['accuracy']\nval_acc = history_dict['val_accuracy']\nloss = history_dict['loss']\nval_loss = history_dict['val_loss']\n\nepochs = range(1, len(acc) + 1)\n\n# \"bo\" is for \"blue dot\"\nplt.plot(epochs, loss, 'bo', label='Training loss')\n# b is for \"solid blue line\"\nplt.plot(epochs, val_loss, 'b', label='Validation loss')\nplt.title('Training and validation loss')\nplt.xlabel('Epochs')\nplt.ylabel('Loss')\nplt.legend()\n\nplt.show()\n```\n\n*<Figure size 640x480 with 1 Axes>*  \n\n```\nplt.clf()   # clear figure\n\nplt.plot(epochs, acc, 'bo', label='Training acc')\nplt.plot(epochs, val_acc, 'b', label='Validation acc')\nplt.title('Training and validation accuracy')\nplt.xlabel('Epochs')\nplt.ylabel('Accuracy')\nplt.legend()\n\nplt.show()\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_files/output_40_0.png)\n\n在该图中，点表示训练损失和准确度，实线表示验证损失和准确度。\n\n可以注意到，训练损失随着周期数的增加而降低，训练准确率随着周期数的增加而提高。在使用梯度下降法优化模型时，这属于正常现象(该方法应在每次迭代时尽可能降低目标值)。\n\n验证损失和准确率的变化情况并非如此，它们似乎在大约 20 个周期后达到峰值。这是一种过拟合现象：模型在训练数据上的表现要优于在从未见过的数据上的表现。在此之后，模型会过度优化和学习特定于训练数据的表示法，而无法泛化到测试数据。\n\n对于这种特殊情况，我们可以在大约 20 个周期后停止训练，防止出现过拟合。稍后，您将了解如何使用回调自动执行此操作。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification)\n> 
翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification.md)\n"
  },
  {
    "path": "r2/tutorials/keras/basic_text_classification_with_tfhub.md",
    "content": "---\ntitle: 使用Keras和TensorFlow Hub对电影评论进行文本分类\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1918\nabbrlink: tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub\n---\n\n# 使用Keras和TensorFlow Hub对电影评论进行文本分类 (tensorflow2.0官方教程翻译)\n\n此教程本会将文本形式的影评分为“正面”或“负面”影评。这是一个二元分类（又称为两类分类）的示例，也是一种重要且广泛适用的机器学习问题。\n\n本教程演示了使用TensorFlow Hub和Keras进行迁移学习的基本应用。\n\n数据集使用 [IMDB 数据集](https://tensorflow.google.cn/api_docs/python/tf/keras/datasets/imdb)，其中包含来自互联网电影数据库  https://www.imdb.com/ 的50000 条影评文本。我们将这些影评拆分为训练集（25000 条影评）和测试集（25000 条影评）。训练集和测试集之间达成了平衡，意味着它们包含相同数量的正面和负面影评。\n\n此教程使用[tf.keras](https://www.tensorflow.org/guide/keras)，一种用于在 TensorFlow 中构建和训练模型的高阶 API，以及[TensorFlow Hub](https://www.tensorflow.org/hub)，一个用于迁移学习的库和平台。\n\n有关使用 tf.keras 的更高级文本分类教程，请参阅 [MLCC 文本分类指南](https://developers.google.cn/machine-learning/guides/text-classification/)。\n\n导入库：\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport numpy as np\n\nimport tensorflow as tf\n\nimport tensorflow_hub as hub\nimport tensorflow_datasets as tfds\n\nprint(\"Version: \", tf.__version__)\nprint(\"Eager mode: \", tf.executing_eagerly())\nprint(\"Hub version: \", hub.__version__)\nprint(\"GPU is\", \"available\" if tf.test.is_gpu_available() else \"NOT AVAILABLE\")\n```\n\n## 1. 下载 IMDB 数据集\n\n[TensorFlow数据集](https://github.com/tensorflow/datasets)上提供了IMDB数据集。以下代码将IMDB数据集下载到您的机器：\n\n```python\n# 将训练集分成60％和40％，因此我们最终会得到15,000个训练样本，10,000个验证样本和25,000个测试样本。\ntrain_validation_split = tfds.Split.TRAIN.subsplit([6, 4])\n\n(train_data, validation_data), test_data = tfds.load(\n    name=\"imdb_reviews\", \n    split=(train_validation_split, tfds.Split.TEST),\n    as_supervised=True)\n```\n\n## 2. 
探索数据 \n\n我们花点时间来了解一下数据的格式，每个样本都是一个表示电影评论的句子以及一个相应的标签，句子未经过任何预处理。每个标签都是整数值 0 或 1，其中 0 表示负面影评，1 表示正面影评。\n\n我们先打印10个样本。\n\n```python\ntrain_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))\ntrain_examples_batch\n```\n\n我们还打印前10个标签。\n\n```python\ntrain_labels_batch\n```\n\n## 3. 构建模型\n\n神经网络通过堆叠层创建而成，这需要做出三个架构方面的主要决策：\n\n* 如何表示文字？\n* 要在模型中使用多少个层？\n* 要针对每个层使用多少个隐藏单元？\n\n在此示例中，输入数据由句子组成。要预测的标签是0或1。\n\n表示文本的一种方法是将句子转换为嵌入向量。我们可以使用预先训练的文本嵌入作为第一层，这将具有三个优点：\n*  我们不必担心文本预处理；\n*  我们可以从迁移学习中受益；\n*  嵌入具有固定的大小，因此处理起来更简单。\n\n对于此示例，我们将使用来自[TensorFlow Hub](https://www.tensorflow.org/hub) 的预训练文本嵌入模型，名为[google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1).\n\n出于本教程的目的，还有其他三种预训练模型可供测试：\n* [google/tf2-preview/gnews-swivel-20dim-with-oov/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim-with-oov/1) 与 [google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1)相同，但2.5％的词汇量转换为OOV桶。如果模型的任务和词汇表的词汇不完全重叠，这可以提供帮助。\n\n* [google/tf2-preview/nnlm-en-dim50/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1) 一个更大的模型，具有约1M的词汇量和50个维度。\n* [google/tf2-preview/nnlm-en-dim128/1](https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1) 甚至更大的模型，具有约1M的词汇量和128个维度。\n\n让我们首先创建一个使用TensorFlow Hub模型嵌入句子的Keras层，并在几个输入示例上进行尝试。请注意，无论输入文本的长度如何，嵌入的输出形状为：`(num_examples, embedding_dimension)`。\n\n```python\nembedding = \"https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1\"\nhub_layer = hub.KerasLayer(embedding, input_shape=[], \n                           dtype=tf.string, trainable=True)\nhub_layer(train_examples_batch[:3])\n```\n\n现在让我们构建完整的模型：\n\n```python\nmodel = tf.keras.Sequential()\nmodel.add(hub_layer)\nmodel.add(tf.keras.layers.Dense(16, activation='relu'))\nmodel.add(tf.keras.layers.Dense(1, activation='sigmoid'))\n\nmodel.summary()\n```\n\n```output\n            Model: \"sequential\" \n            _________________________________________________________________ \n            Layer (type) 
Output Shape              Param #   \n            =================================================================\n            keras_layer (KerasLayer)     (None, 20)                400020    \n            _________________________________________________________________\n            dense (Dense)                (None, 16)                336       \n            _________________________________________________________________\n            dense_1 (Dense)              (None, 1)                 17        \n            =================================================================\n            Total params: 400,373\n            Trainable params: 400,373\n            Non-trainable params: 0\n            _________________________________________________________________\n```\n\n这些层按顺序堆叠以构建分类器：\n1. 第一层是TensorFlow Hub层。该层使用预先训练的保存模型将句子映射到其嵌入向量。我们正在使用的预训练文本嵌入模型([google/tf2-preview/gnews-swivel-20dim/1](https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1))将句子拆分为标记，嵌入每个标记然后组合嵌入。生成的维度为：`(num_examples, embedding_dimension)`。\n\n2. 这个固定长度的输出向量会传入一个带有16个隐藏单元的全连接（Dense）层。\n3. 最后一层与单个输出节点密集连接。使用`sigmoid`激活函数，该值是0到1之间的浮点数，表示概率或置信度。\n\n让我们编译模型。\n\n### 3.1. 损失函数和优化器\n\n模型在训练时需要一个损失函数和一个优化器。由于这是一个二元分类问题且模型会输出一个概率（应用 S 型激活函数的单个单元层），因此我们将使用 binary_crossentropy 损失函数。\n\n该函数并不是唯一的损失函数，例如，您可以选择 mean_squared_error。但一般来说，binary_crossentropy 更适合处理概率问题，它可测量概率分布之间的“差距”，在本例中则为实际分布和预测之间的“差距”。\n\n稍后，在探索回归问题（比如预测房价）时，我们将了解如何使用另一个称为均方误差的损失函数。\n\n现在，配置模型以使用优化器和损失函数：\n\n```python\nmodel.compile(optimizer='adam',\n              loss='binary_crossentropy',\n              metrics=['accuracy'])\n```\n\n## 4. 训练模型\n\n以每批 512 个样本的小批次训练模型 20 个周期。这将对 x_train 和 y_train 张量中的所有样本进行 20 次迭代。在训练期间，监控模型在验证集的 10000 个样本上的损失和准确率：\n\n```python\nhistory = model.fit(train_data.shuffle(10000).batch(512),\n                    epochs=20,\n                    validation_data=validation_data.batch(512),\n                    verbose=1)\n```\n\n```\n...output\n            Epoch 20/20\n            30/30 [==============================] - 4s 144ms/step - loss: 0.2027 - accuracy: 0.9264 - val_loss: 0.3079 - val_accuracy: 0.8697\n```\n\n## 5. 
评估模型\n\n我们来看看模型的表现如何。模型会返回两个值：损失（表示误差的数字，越低越好）和准确率。\n\n```python\nresults = model.evaluate(test_data.batch(512), verbose=0)\nfor name, value in zip(model.metrics_names, results):\n  print(\"%s: %.3f\" % (name, value))\n```\n\n```\n            loss: 0.324 accuracy: 0.860\n```\n\n使用这种相当简单的方法可实现约 87% 的准确率。如果采用更高级的方法，模型的准确率应该会接近 95%。\n\n## 6. 进一步阅读\n\n要了解处理字符串输入的更一般方法，以及更详细地分析训练过程中的准确性和损失，请查看 https://www.tensorflow.org/tutorials/keras/basic_text_classification\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-basic_text_classification_with_tfhub.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_with_tfhub](https://tensorflow.google.cn/beta/tutorials/keras/basic_text_classification_with_tfhub)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification_with_tfhub.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/basic_text_classification_with_tfhub.md)\n\n"
  },
  {
    "path": "r2/tutorials/keras/feature_columns.md",
    "content": "---\ntitle: 结构化数据分类实战：心脏病预测\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1913\nabbrlink: tensorflow/tf2-tutorials-keras-feature_columns\n---\n# 结构化数据分类实战：心脏病预测(tensorflow2.0官方教程翻译)\n\n本教程演示了如何对结构化数据进行分类（例如CSV格式的表格数据）。\n我们将使用Keras定义模型，并使用[特征列](https://tensorflow.google.cn/guide/feature_columns)作为桥梁，将CSV中的列映射到用于训练模型的特性。\n本教程包含完整的代码：\n\n* 使用[Pandas](https://pandas.pydata.org/)加载CSV文件。 .\n* 构建一个输入管道，使用[tf.data](https://tensorflow.google.cn/guide/datasets)批处理和洗牌行\n* 从CSV中的列映射到用于训练模型的特性。\n* 使用Keras构建、训练和评估模型。\n\n## 1. 数据集\n\n我们将使用克利夫兰诊所心脏病基金会提供的一个小[数据集](https://archive.ics.uci.edu/ml/datasets/heart+Disease) 。CSV中有几百行，每行描述一个患者，每列描述一个属性。我们将使用此信息来预测患者是否患有心脏病，该疾病在该数据集中是二元分类任务。\n\n以下是此[数据集的说明](https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names)。请注意，有数字和分类列。\n\n>Column| Description| Feature Type | Data Type\n>------------|--------------------|----------------------|-----------------\n>Age | Age in years | Numerical | integer\n>Sex | (1 = male; 0 = female) | Categorical | integer\n>CP | Chest pain type (0, 1, 2, 3, 4) | Categorical | integer\n>Trestbpd | Resting blood pressure (in mm Hg on admission to the hospital) | Numerical | integer\n>Chol | Serum cholestoral in mg/dl | Numerical | integer\n>FBS | (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) | Categorical | integer\n>RestECG | Resting electrocardiographic results (0, 1, 2) | Categorical | integer\n>Thalach | Maximum heart rate achieved | Numerical | integer\n>Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical | integer\n>Oldpeak | ST depression induced by exercise relative to rest | Numerical | integer\n>Slope | The slope of the peak exercise ST segment | Numerical | float\n>CA | Number of major vessels (0-3) colored by flourosopy | Numerical | integer\n>Thal | 3 = normal; 6 = fixed defect; 7 = reversable defect | Categorical | string\n>Target | Diagnosis of heart disease (1 = true; 0 = false) | Classification | integer\n\n## 2. 
导入TensorFlow和其他库\n\n安装sklearn依赖库\n\n```python\npip install sklearn\n```\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport numpy as np\nimport pandas as pd\n\nimport tensorflow as tf\n\nfrom tensorflow import feature_column\nfrom tensorflow.keras import layers\nfrom sklearn.model_selection import train_test_split\n```\n\n## 3. 使用Pandas创建数据帧\n\n[Pandas](https://pandas.pydata.org/) 是一个Python库，包含许多有用的实用程序，用于加载和处理结构化数据。我们将使用Pandas从URL下载数据集，并将其加载到数据帧中。\n\n```python\nURL = 'https://storage.googleapis.com/applied-dl/heart.csv'\ndataframe = pd.read_csv(URL)\ndataframe.head()\n```\n\n## 4. 将数据拆分为训练、验证和测试\n\n我们下载的数据集是一个CSV文件，并将其分为训练，验证和测试集。\n\n```python\ntrain, test = train_test_split(dataframe, test_size=0.2)\ntrain, val = train_test_split(train, test_size=0.2)\nprint(len(train), 'train examples')\nprint(len(val), 'validation examples')\nprint(len(test), 'test examples')\n```\n\n```output\n      193 train examples\n      49 validation examples\n      61 test examples\n```\n\n## 5. 使用tf.data创建输入管道\n\n接下来，我们将使用tf.data包装数据帧，这将使我们能够使用特征列作为桥梁从Pandas数据框中的列映射到用于训练模型的特征。如果我们使用非常大的CSV文件（如此之大以至于它不适合内存），我们将使用tf.data直接从磁盘读取它，本教程不涉及这一点。\n\n```python\n# 一种从Pandas Dataframe创建tf.data数据集的使用方法 \ndef df_to_dataset(dataframe, shuffle=True, batch_size=32):\n  dataframe = dataframe.copy()\n  labels = dataframe.pop('target')\n  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))\n  if shuffle:\n    ds = ds.shuffle(buffer_size=len(dataframe))\n  ds = ds.batch(batch_size)\n  return ds\n```\n\n```python\nbatch_size = 5 # 小批量用于演示目的\ntrain_ds = df_to_dataset(train, batch_size=batch_size)\nval_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)\ntest_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)\n```\n\n## 6. 
理解输入管道\n\n现在我们已经创建了输入管道，让我们调用它来查看它返回的数据的格式，我们使用了一小批量来保持输出的可读性。\n\n```python\nfor feature_batch, label_batch in train_ds.take(1):\n  print('Every feature:', list(feature_batch.keys()))\n  print('A batch of ages:', feature_batch['age'])\n  print('A batch of targets:', label_batch )\n```\n\n```output\n      Every feature: ['age', 'chol', 'fbs', 'ca', 'slope', 'restecg', 'sex', 'thal', 'thalach', 'oldpeak', 'exang', 'cp', 'trestbps']\n      A batch of ages: tf.Tensor([58 52 56 35 59], shape=(5,), dtype=int32)\n      A batch of targets: tf.Tensor([1 0 1 0 0], shape=(5,), dtype=int32)\n```\n\n我们可以看到，数据集返回一个字典，其键为列名（来自数据帧），其值为数据帧中对应行的列值。\n\n## 7. 演示几种类型的特征列\n\nTensorFlow提供了许多类型的特征列。在本节中，我们将创建几种类型的特征列，并演示它们如何转换dataframe中的列。\n\n```python\n# 我们将使用此批处理来演示几种类型的特征列 \nexample_batch = next(iter(train_ds))[0]\n\n# 用于创建特征列和转换批量数据 \ndef demo(feature_column):\n  feature_layer = layers.DenseFeatures(feature_column)\n  print(feature_layer(example_batch).numpy())\n```\n\n### 7.1. 数字列\n\n特征列的输出成为模型的输入（使用上面定义的演示函数，我们将能够准确地看到数据帧中每列的转换方式），[数字列](https://tensorflow.google.cn/api_docs/python/tf/feature_column/numeric_column)是最简单的列类型，它用于表示实数值（real valued）特征，使用此列时，模型将从数据帧中接收未更改的列值。\n\n```python\nage = feature_column.numeric_column(\"age\")\ndemo(age)\n```\n\n```output\n      [[58.]\n      [52.]\n      [56.]\n      [35.]\n      [59.]]\n```\n\n在心脏病数据集中，数据帧中的大多数列都是数字。\n\n### 7.2. Bucketized列（桶列）\n\n通常，您不希望将数字直接输入模型，而是根据数值范围将其值分成不同的类别，考虑代表一个人年龄的原始数据，我们可以使用[bucketized列](https://tensorflow.google.cn/api_docs/python/tf/feature_column/bucketized_column)将年龄分成几个桶，而不是将年龄表示为数字列。\n请注意，下面的one-hot(独热编码)值描述了每行匹配的年龄范围。\n\n```python\nage_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])\ndemo(age_buckets)\n```\n\n```output\n      [[0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]\n      [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]\n      [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]\n      [0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]\n      [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]\n```\n\n### 7.3. 
分类列\n\n在该数据集中，thal表示为字符串（例如“固定”，“正常”或“可逆”），我们无法直接将字符串提供给模型，相反，我们必须首先将它们映射到数值。分类词汇表列提供了一种将字符串表示为独热矢量的方法（就像上面用年龄段看到的那样）。词汇表可以使用[categorical_column_with_vocabulary_list](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list)作为列表传递，或者使用[categorical_column_with_vocabulary_file](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_file)从文件加载。\n\n```python\nthal = feature_column.categorical_column_with_vocabulary_list(\n      'thal', ['fixed', 'normal', 'reversible'])\n\nthal_one_hot = feature_column.indicator_column(thal)\ndemo(thal_one_hot)\n```\n\n```output\n      [[0. 0. 1.]\n      [0. 1. 0.]\n      [0. 0. 1.]\n      [0. 0. 1.]\n      [0. 0. 1.]]\n ```\n\n在更复杂的数据集中，许多列将是分类的（例如字符串），在处理分类数据时，特征列最有价值。虽然此数据集中只有一个分类列，但我们将使用它来演示在处理其他数据集时可以使用的几种重要类型的特征列。\n\n### 7.4. 嵌入列\n\n假设我们不是只有几个可能的字符串，而是每个类别有数千（或更多）值。由于多种原因，随着类别数量的增加，使用独热编码训练神经网络变得不可行，我们可以使用嵌入列来克服此限制。\n[嵌入列](https://tensorflow.google.cn/api_docs/python/tf/feature_column/embedding_column)不是将数据表示为多维度的独热矢量，而是将数据表示为低维密集向量，其中每个单元格可以包含任意数字，而不仅仅是0或1.嵌入的大小（在下面的例子中是8）是必须调整的参数。\n\n关键点：当分类列具有许多可能的值时，最好使用嵌入列，我们在这里使用一个用于演示目的，因此您有一个完整的示例，您可以在将来修改其他数据集。\n\n```python\n# 请注意，嵌入列的输入是我们先前创建的分类列 \nthal_embedding = feature_column.embedding_column(thal, dimension=8)\ndemo(thal_embedding)\n```\n\n```output\n[[-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594\n   0.32729048 -0.07209085]\n [ 0.08829682  0.3921798   0.32400072  0.00508362 -0.15642034 -0.17451124\n   0.12631968  0.15029909]\n [-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594\n   0.32729048 -0.07209085]\n [-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594\n   0.32729048 -0.07209085]\n [-0.01019966  0.23583987  0.04172783  0.34261808 -0.02596842  0.05985594\n   0.32729048 -0.07209085]]\n```\n\n### 7.5. 
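嵌入查找的数值示意（补充）

嵌入列本质上是一张可训练的查找表：每个类别对应一行低维稠密向量。下面用NumPy做一个示意性的查找（随机初始化，仅演示查找机制，并非TensorFlow实现）：

```python
import numpy as np

vocab = ['fixed', 'normal', 'reversible']
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 8))   # 3个类别，各映射到8维向量

batch = ['reversible', 'normal', 'reversible']
vectors = embedding[[vocab.index(v) for v in batch]]
print(vectors.shape)   # (3, 8)
```

相同的类别总是查到同一行向量，这也解释了上面的输出中为什么会出现多条完全相同的行。

### 7.5. 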
哈希特征列\n\n表示具有大量值的分类列的另一种方法是使用[categorical_column_with_hash_bucket](https://tensorflow.google.cn/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket).\n此特征列计算输入的哈希值，然后选择一个`hash_bucket_size`存储桶来编码字符串，使用此列时，您不需要提供词汇表，并且可以选择使`hash_buckets`的数量远远小于实际类别的数量以节省空间。\n\n关键点：该技术的一个重要缺点是可能存在冲突，其中不同的字符串被映射到同一个桶，实际上，无论如何，这对某些数据集都有效。\n\n```python\nthal_hashed = feature_column.categorical_column_with_hash_bucket(\n      'thal', hash_bucket_size=1000)\ndemo(feature_column.indicator_column(thal_hashed))\n```\n\n```\n[[0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]]\n```\n\n### 7.6. 交叉特征列\n\n将特征组合成单个特征（也称为[特征交叉](https://developers.google.com/machine-learning/glossary/#feature_cross)），使模型能够为每个特征组合学习单独的权重。\n在这里，我们将创建一个age和thal交叉的新功能，\n请注意，`crossed_column`不会构建所有可能组合的完整表（可能非常大），相反，它由`hashed_column`支持，因此您可以选择表的大小。\n\n```python\ncrossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)\ndemo(feature_column.indicator_column(crossed_feature))\n```\n\n```\n[[0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]\n [0. 0. 0. ... 0. 0. 0.]]\n```\n\n## 8. 
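交叉并哈希的示意（补充）

交叉列可以理解为：先把多个类别拼成一个组合键，再把组合键哈希到固定数量的桶中。下面是一个纯Python示意（使用内建`hash`，真实实现采用确定性的指纹哈希，这里仅演示思路）：

```python
def cross_and_hash(age_bucket, thal, hash_bucket_size=1000):
    # 组合键 -> 哈希 -> 取模，落入 [0, hash_bucket_size) 中的某个桶
    return hash((age_bucket, thal)) % hash_bucket_size

bucket = cross_and_hash(8, 'reversible')
print(0 <= bucket < 1000)   # True
```

## 8. 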
选择要使用的列\n\n我们已经了解了如何使用几种类型的特征列，现在我们将使用它们来训练模型。本教程的目标是向您展示使用特征列所需的完整代码（例如，机制），我们选择了几列来任意训练我们的模型。\n\n关键点：如果您的目标是建立一个准确的模型，请尝试使用您自己的更大数据集，并仔细考虑哪些特征最有意义，以及如何表示它们。\n\n```python\nfeature_columns = []\n\n# numeric 数字列\nfor header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:\n  feature_columns.append(feature_column.numeric_column(header))\n\n# bucketized 分桶列\nage_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])\nfeature_columns.append(age_buckets)\n\n# indicator 指示符列 \nthal = feature_column.categorical_column_with_vocabulary_list(\n      'thal', ['fixed', 'normal', 'reversible'])\nthal_one_hot = feature_column.indicator_column(thal)\nfeature_columns.append(thal_one_hot)\n\n# embedding 嵌入列 \nthal_embedding = feature_column.embedding_column(thal, dimension=8)\nfeature_columns.append(thal_embedding)\n\n# crossed 交叉列 \ncrossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)\ncrossed_feature = feature_column.indicator_column(crossed_feature)\nfeature_columns.append(crossed_feature)\n```\n\n### 8.1. 创建特征层\n\n现在我们已经定义了我们的特征列，我们将使用[DenseFeatures](https://tensorflow.google.cn/versions/r2.0/api_docs/python/tf/keras/layers/DenseFeatures)层将它们输入到我们的Keras模型中。\n\n```python\nfeature_layer = tf.keras.layers.DenseFeatures(feature_columns)\n```\n\n之前，我们使用小批量大小来演示特征列的工作原理，我们创建了一个具有更大批量的新输入管道。\n\n```python\nbatch_size = 32\ntrain_ds = df_to_dataset(train, batch_size=batch_size)\nval_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)\ntest_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)\n```\n\n## 9. 
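特征层输出宽度的估算（补充）

`DenseFeatures`会把各特征列的输出在最后一维上拼接。按上面选择的列，可以用简单的加法估算送入第一个`Dense`层的特征宽度（示意性计算）：

```python
# 各特征列的输出维度（与上面的定义对应）
widths = {
    'numeric': 7,               # 7个数字列，各占1维
    'age_buckets': 10 + 1,      # 10个边界产生11个桶
    'thal_one_hot': 3,          # 词汇表大小为3
    'thal_embedding': 8,        # 嵌入维度为8
    'crossed_indicator': 1000,  # hash_bucket_size=1000
}
print(sum(widths.values()))     # 1029
```

## 9. 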
创建、编译和训练模型\n\n```python\nmodel = tf.keras.Sequential([\n  feature_layer,\n  layers.Dense(128, activation='relu'),\n  layers.Dense(128, activation='relu'),\n  layers.Dense(1, activation='sigmoid')\n])\n\nmodel.compile(optimizer='adam',\n              loss='binary_crossentropy',\n              metrics=['accuracy'])\n\nmodel.fit(train_ds,\n          validation_data=val_ds,\n          epochs=5)\n```\n\n训练过程的输出\n\n```\nEpoch 1/5\n7/7 [==============================] - 1s 79ms/step - loss: 3.8492 - accuracy: 0.4219 - val_loss: 2.7367 - val_accuracy: 0.7143\n......\nEpoch 5/5\n7/7 [==============================] - 0s 34ms/step - loss: 0.6200 - accuracy: 0.7377 - val_loss: 0.6288 - val_accuracy: 0.6327\n\n<tensorflow.python.keras.callbacks.History at 0x7f48c044c5f8>\n```\n\n测试\n\n```python\nloss, accuracy = model.evaluate(test_ds)\nprint(\"Accuracy\", accuracy)\n```\n\n```output\n      2/2 [==============================] - 0s 19ms/step - loss: 0.5538 - accuracy: 0.6721\n      Accuracy 0.6721311\n```\n\n关键点：通常使用更大更复杂的数据集进行深度学习，您将看到最佳结果。使用像这样的小数据集时，我们建议使用决策树或随机森林作为强基线。\n\n本教程的目标不是为了训练一个准确的模型，而是为了演示使用结构化数据的机制，因此您在将来使用自己的数据集时需要使用代码作为起点。\n\n## 10. 下一步\n\n了解有关分类结构化数据的更多信息的最佳方法是亲自尝试，我们建议找到另一个可以使用的数据集，并训练模型使用类似于上面的代码对其进行分类，要提高准确性，请仔细考虑模型中包含哪些特征以及如何表示这些特征。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-feature_columns.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-feature_columns.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/feature_columns](https://tensorflow.google.cn/beta/tutorials/keras/feature_columns)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/feature_columns.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/feature_columns.md)\n"
  },
  {
    "path": "r2/tutorials/keras/overfit_and_underfit.md",
    "content": "---\ntitle: 探索过拟合和欠拟合\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1915\nabbrlink: tensorflow/tf2-tutorials-keras-overfit_and_underfit\n---\n\n# 探索过拟合和欠拟合 (tensorflow2.0官方教程翻译)\n\n在前面的两个例子中（电影影评分类和预测燃油效率），我们看到，在训练许多周期之后，我们的模型对验证数据的准确性会到达峰值，然后开始下降。\n\n换句话说，我们的模型会过度拟合训练数据，学习如果处理过拟合很重要，尽管通常可以在训练集上实现高精度，但我们真正想要的是开发能够很好泛化测试数据（或之前未见过的数据）的模型。\n\n过拟合的反面是欠拟合，当测试数据仍有改进空间会发生欠拟合，出现这种情况的原因有很多：模型不够强大，过度正则化，或者根本没有经过足够长的时间训练，这意味着网络尚未学习训练数据中的相关模式。\n\n如果训练时间过长，模型将开始过度拟合，并从训练数据中学习模式，而这些模式可能并不适用于测试数据，我们需要取得平衡，了解如何训练适当数量的周期，我们将在下面讨论，这是一项有用的技能。\n\n为了防止过拟合，最好的解决方案是使用更多的训练数据，受过更多数据训练的模型自然会更好的泛化。当没有更多的训练数据时，另外一个最佳解决方案是使用正则化等技术，这些限制了模型可以存储的信息的数据量和类型，如果网络只能记住少量模式，那么优化过程将迫使它专注于最突出的模式，这些模式有更好的泛化性。\n\n在本章节中，我们将探索两种常见的正则化技术：权重正则化和dropout丢弃正则化，并使用它们来改进我们的IMDB电影评论分类。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf \nfrom tensorflow import keras\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nprint(tf.__version__)\n```\n\n## 1. 下载IMDB数据集\n\n我们不会像以前一样使用嵌入，而是对句子进行多重编码。这个模型将很快适应训练集。它将用于证明何时发生过拟合，以及如何处理它。\n\n对我们的列表进行多热编码意味着将它们转换为0和1的向量，具体地说，这将意味着例如将序列[3,5]转换为10000维向量，除了索引3和5的值是1之外，其他全零。\n\n```python\nNUM_WORDS = 10000\n\n(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS)\n\ndef multi_hot_sequences(sequences, dimension):\n    # 创建一个全零的形状矩阵 (len(sequences), dimension)\n    results = np.zeros((len(sequences), dimension))\n    for i, word_indices in enumerate(sequences):\n        results[i, word_indices] = 1.0  # 将results[i]的特定值设为1\n    return results\n\n\ntrain_data = multi_hot_sequences(train_data, dimension=NUM_WORDS)\ntest_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)\n```\n\n让我们看一下生成的多热矢量，单词索引按频率排序，因此预计索引零附近有更多的1值，我们可以在下图中看到：\n\n```python\nplt.plot(train_data[0])\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_7_1.png)\n\n## 2. 
演示过度拟合\n\n防止过度拟合的最简单方法是减小模型的大小，即模型中可学习参数的数量（由层数和每层单元数决定）。在深度学习中，模型中可学习参数的数量通常被称为模型的“容量”。直观地，具有更多参数的模型将具有更多的“记忆能力”，因此将能够容易地学习训练样本与其目标之间的完美的字典式映射，没有任何泛化能力的映射，但是在对未见过的数据做出预测时这将是无用的。\n\n始终牢记这一点：深度学习模型往往善于适应训练数据，但真正的挑战是泛化，而不是适应。\n\n另一方面，如果网络具有有限的记忆资源，则将不能容易地学习映射。为了最大限度地减少损失，它必须学习具有更强预测能力的压缩表示。同时，如果您使模型太小，则难以适应训练数据。“太多容量”和“容量不足”之间存在平衡。\n\n不幸的是，没有神奇的公式来确定模型的正确大小或架构（就层数而言，或每层的正确大小），您将不得不尝试使用一系列不同的架构。\n\n要找到合适的模型大小，最好从相对较少的层和参数开始，然后开始增加层的大小或添加新层，直到您看到验证损失的收益递减为止。让我们在电影评论分类网络上试试。\n\n我们将仅适用`Dense`层作为基线创建一个简单模型，然后创建更小和更大的版本，并进行比较。\n\n### 2.1. 创建一个基线模型\n\n```python\nbaseline_model = keras.Sequential([\n    # `input_shape` is only required here so that `.summary` works.\n    keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),\n    keras.layers.Dense(16, activation='relu'),\n    keras.layers.Dense(1, activation='sigmoid')\n])\n\nbaseline_model.compile(optimizer='adam',\n                       loss='binary_crossentropy',\n                       metrics=['accuracy', 'binary_crossentropy'])\n\nbaseline_model.summary()\n```\n\n```output\nModel: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense (Dense)                (None, 16)                160016    \n_________________________________________________________________\ndense_1 (Dense)              (None, 16)                272       \n_________________________________________________________________\ndense_2 (Dense)              (None, 1)                 17        \n=================================================================\nTotal params: 160,305\nTrainable params: 160,305\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n```python\nbaseline_history = baseline_model.fit(train_data,\n                                      train_labels,\n                                      epochs=20,\n        
                              batch_size=512,\n                                      validation_data=(test_data, test_labels),\n                                      verbose=2)\n```\n\n```output\nTrain on 25000 samples, validate on 25000 samples\nEpoch 1/20\n25000/25000 - 3s - loss: 0.4664 - accuracy: 0.8135 - binary_crossentropy: 0.4664 - val_loss: 0.3257 - val_accuracy: 0.8808 - val_binary_crossentropy: 0.3257\n......\nEpoch 20/20\n25000/25000 - 2s - loss: 0.0037 - accuracy: 0.9999 - binary_crossentropy: 0.0037 - val_loss: 0.8219 - val_accuracy: 0.8532 - val_binary_crossentropy: 0.8219\n```\n\n### 2.2. 创建一个更小的模型\n\n让我们创建一个隐藏单元较少的模型，与我们刚刚创建的基线模型进行比较：\n\n```python\nsmaller_model = keras.Sequential([\n    keras.layers.Dense(4, activation='relu', input_shape=(NUM_WORDS,)),\n    keras.layers.Dense(4, activation='relu'),\n    keras.layers.Dense(1, activation='sigmoid')\n])\n\nsmaller_model.compile(optimizer='adam',\n                      loss='binary_crossentropy',\n                      metrics=['accuracy', 'binary_crossentropy'])\n\nsmaller_model.summary()\n```\n\n```output\nModel: \"sequential_1\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense_3 (Dense)              (None, 4)                 40004     \n_________________________________________________________________\ndense_4 (Dense)              (None, 4)                 20        \n_________________________________________________________________\ndense_5 (Dense)              (None, 1)                 5         \n=================================================================\nTotal params: 40,029\nTrainable params: 40,029\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n用相同的数据训练模型：\n\n```python\nsmaller_history = smaller_model.fit(train_data,\n                                    train_labels,\n         
                           epochs=20,\n                                    batch_size=512,\n                                    validation_data=(test_data, test_labels),\n                                    verbose=2)\n```\n\n```output\nTrain on 25000 samples, validate on 25000 samples\nEpoch 1/20\n25000/25000 - 3s - loss: 0.6189 - accuracy: 0.6439 - binary_crossentropy: 0.6189 - val_loss: 0.5482 - val_accuracy: 0.7987 - val_binary_crossentropy: 0.5482\n......\nEpoch 20/20\n25000/25000 - 2s - loss: 0.1857 - accuracy: 0.9880 - binary_crossentropy: 0.1857 - val_loss: 0.5043 - val_accuracy: 0.8632 - val_binary_crossentropy: 0.5043\n```\n\n### 2.3. 创建一个较大的模型\n\n作为练习，您可以创建一个更大的模型，并查看它开始过拟合的速度。\n接下来，让我们在这个基准测试中添加一个容量更大的网络，远远超出问题的范围：\n\n```python\nbigger_model = keras.models.Sequential([\n    keras.layers.Dense(512, activation='relu', input_shape=(NUM_WORDS,)),\n    keras.layers.Dense(512, activation='relu'),\n    keras.layers.Dense(1, activation='sigmoid')\n])\n\nbigger_model.compile(optimizer='adam',\n                     loss='binary_crossentropy',\n                     metrics=['accuracy','binary_crossentropy'])\n\nbigger_model.summary()\n```\n\n```output\nModel: \"sequential_2\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense_6 (Dense)              (None, 512)               5120512   \n_________________________________________________________________\ndense_7 (Dense)              (None, 512)               262656    \n_________________________________________________________________\ndense_8 (Dense)              (None, 1)                 513       \n=================================================================\nTotal params: 5,383,681\nTrainable params: 5,383,681\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n并且，再次使用相同的数据训练模型：\n\n```python\nbigger_history 
= bigger_model.fit(train_data, train_labels,\n                                  epochs=20,\n                                  batch_size=512,\n                                  validation_data=(test_data, test_labels),\n                                  verbose=2)\n```\n输出\n```\nTrain on 25000 samples, validate on 25000 samples\nEpoch 1/20\n25000/25000 - 5s - loss: 0.3392 - accuracy: 0.8581 - binary_crossentropy: 0.3392 - val_loss: 0.2947 - val_accuracy: 0.8802 - val_binary_crossentropy: 0.2947\n......\nEpoch 20/20\n25000/25000 - 5s - loss: 1.1516e-05 - accuracy: 1.0000 - binary_crossentropy: 1.1516e-05 - val_loss: 0.9571 - val_accuracy: 0.8717 - val_binary_crossentropy: 0.9571\n```\n\n### 2.4. 绘制训练和验证损失\n\n<!--TODO(markdaoust): This should be a one-liner with tensorboard -->\n\n实线表示训练损失，虚线表示验证损失（记住：较低的验证损失表示更好的模型）。在这里，较小的网络开始过拟合晚于基线模型（在6个周期之后而不是4个周期），并且一旦开始过拟合，其性能下降得慢得多。\n\n```python\ndef plot_history(histories, key='binary_crossentropy'):\n  plt.figure(figsize=(16,10))\n\n  for name, history in histories:\n    val = plt.plot(history.epoch, history.history['val_'+key],\n                   '--', label=name.title()+' Val')\n    plt.plot(history.epoch, history.history[key], color=val[0].get_color(),\n             label=name.title()+' Train')\n\n  plt.xlabel('Epochs')\n  plt.ylabel(key.replace('_',' ').title())\n  plt.legend()\n\n  plt.xlim([0,max(history.epoch)])\n\n\nplot_history([('baseline', baseline_history),\n              ('smaller', smaller_history),\n              ('bigger', bigger_history)])\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_23_0.png?dcb_=0.12370822350480548)\n\n请注意，较大的网络在仅仅一个周期之后几乎立即开始过度拟合，并且更严重。网络容量越大，能够越快地对训练数据进行建模（导致训练损失低），但过拟合的可能性越大（导致训练和验证损失之间的差异很大）。\n\n## 3. 防止过度拟合的策略\n\n### 3.1. 
添加权重正则化\n\n你可能熟悉奥卡姆剃刀原则：如果对同一件事有两种解释，最可能正确的解释是“最简单”的那个，即做出最少量假设的解释。这也适用于神经网络学习的模型：给定一些训练数据和网络架构，有多组权重值（多个模型）可以解释数据，而简单模型比复杂模型更不容易过拟合。\n\n在这种情况下，“简单模型”是参数值分布的熵更小的模型（或参数更少的模型，如我们在上一节中看到的）。因此，减轻过拟合的一种常见方法是通过强制网络的权重只取较小的值来限制网络的复杂性，这使得权重的分布更加“规则”。这被称为“权重正则化”，它是通过在网络的损失函数中增加与大权重相关的成本来实现的。这种成本有两种：\n\n* [L1 正则化](https://developers.google.cn/machine-learning/glossary/#L1_regularization)，其中添加的成本与权重系数的绝对值成正比（即与权重的“L1范数”成正比）。\n\n* [L2 正则化](https://developers.google.cn/machine-learning/glossary/#L2_regularization)，其中添加的成本与权重系数值的平方成正比（即与权重的平方“L2范数”成正比）。L2正则化在神经网络中也称为权重衰减（weight decay），不要被不同的名称迷惑：权重衰减在数学上与L2正则化完全相同。\n\nL1正则化会引入稀疏性，使一些权重参数变为零；L2正则化则会惩罚权重参数但不会使它们稀疏，这是L2更常用的一个原因。\n\n在`tf.keras`中，通过将权重正则化实例作为关键字参数传递给层来添加权重正则化。我们现在添加L2权重正则化。\n\n```python\nl2_model = keras.models.Sequential([\n    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),\n                       activation='relu', input_shape=(NUM_WORDS,)),\n    keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),\n                       activation='relu'),\n    keras.layers.Dense(1, activation='sigmoid')\n])\n\nl2_model.compile(optimizer='adam',\n                 loss='binary_crossentropy',\n                 metrics=['accuracy', 'binary_crossentropy'])\n\nl2_model_history = l2_model.fit(train_data, train_labels,\n                                epochs=20,\n                                batch_size=512,\n                                validation_data=(test_data, test_labels),\n                                verbose=2)\n```\n\n```output\nTrain on 25000 samples, validate on 25000 samples\nEpoch 1/20\n25000/25000 - 3s - loss: 0.5191 - accuracy: 0.8206 - binary_crossentropy: 0.4785 - val_loss: 0.3855 - val_accuracy: 0.8727 - val_binary_crossentropy: 0.3421\n......\nEpoch 20/20\n25000/25000 - 2s - loss: 0.1567 - accuracy: 0.9718 - binary_crossentropy: 0.0868 - val_loss: 0.5327 - val_accuracy: 0.8561 - val_binary_crossentropy: 0.4631\n```\n\n```l2(0.001)```表示该层的权重矩阵中的每个系数都会将```0.001 * 
weight_coefficient_value**2```添加到网络的总损失中。请注意，由于此惩罚仅在训练时添加，因此训练时该网络的损失将远高于测试时。\n\n这是我们的L2正则化惩罚的影响：\n\n```python\nplot_history([('baseline', baseline_history),\n              ('l2', l2_model_history)])\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_30_0.png?dcb_=0.8386779368853696)\n\n正如你所看到的，即使两个模型具有相同数量的参数，L2正则化模型也比基线模型更能抵抗过拟合。\n\n### 3.2. 添加Dropout（丢弃正则化）\n\nDropout是由Hinton和他在多伦多大学的学生开发的最有效和最常用的神经网络正则化技术之一。将Dropout应用于某一层，就是在训练期间随机“丢弃”（即设置为零）该层的一部分输出特征。假设给定的层在训练期间对某个输入样本通常会返回向量[0.2, 0.5, 1.3, 0.8, 1.1]，在应用Dropout之后，该向量将有几个随机分布的零条目，例如[0, 0.5, 1.3, 0, 1.1]。“丢弃率”是指被置零的特征所占的比例，通常设置在0.2和0.5之间。在测试时不丢弃任何单元，而是将层的输出值按等于丢弃率的因子缩小，以平衡测试时比训练时有更多单元处于活动状态这一事实。\n\n在`tf.keras`中，您可以通过`Dropout`层在网络中引入dropout，它会被应用于其前面一层的输出。\n\n让我们在IMDB网络中添加两个`Dropout`层，看看它们在减少过拟合方面效果如何：\n\n```python\ndpt_model = keras.models.Sequential([\n    keras.layers.Dense(16, activation='relu', input_shape=(NUM_WORDS,)),\n    keras.layers.Dropout(0.5),\n    keras.layers.Dense(16, activation='relu'),\n    keras.layers.Dropout(0.5),\n    keras.layers.Dense(1, activation='sigmoid')\n])\n\ndpt_model.compile(optimizer='adam',\n                  loss='binary_crossentropy',\n                  metrics=['accuracy','binary_crossentropy'])\n\ndpt_model_history = dpt_model.fit(train_data, train_labels,\n                                  epochs=20,\n                                  batch_size=512,\n                                  validation_data=(test_data, test_labels),\n                                  verbose=2)\n```\n\n```output\nTrain on 25000 samples, validate on 25000 samples\nEpoch 1/20\n25000/25000 - 3s - loss: 0.6355 - accuracy: 0.6373 - binary_crossentropy: 0.6355 - val_loss: 0.4929 - val_accuracy: 0.8396 - val_binary_crossentropy: 0.4929\n......\nEpoch 20/20\n25000/25000 - 3s - loss: 0.0729 - accuracy: 0.9738 - binary_crossentropy: 0.0729 - val_loss: 0.5624 - val_accuracy: 0.8747 - val_binary_crossentropy: 0.5624\n```\n\n```python\nplot_history([('baseline', 
baseline_history),\n              ('dropout', dpt_model_history)])\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit_files/output_34_0.png?dcb_=0.9304692927609572)\n\n从上图可以看出，添加dropout是对基线模型的明显改进。\n\n回顾一下，以下是防止神经网络过拟合的最常用方法：\n* 获取更多训练数据\n* 减少网络的容量\n* 添加权重正则化\n* 添加dropout\n\n本指南未涉及的两个重要方法是数据增强和批量标准化（batch normalization）。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-overfit_and_underfit.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-overfit_and_underfit.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit](https://tensorflow.google.cn/beta/tutorials/keras/overfit_and_underfit)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/overfit_and_underfit.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/overfit_and_underfit.md)\n"
  },
  {
    "path": "r2/tutorials/keras/save_and_restore_models.md",
    "content": "---\ntitle: tensorflow2保存和加载模型\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1916\nabbrlink: tensorflow/tf2-tutorials-keras-save_and_restore_models\n---\n\n# tensorflow2保存和加载模型 (tensorflow2.0官方教程翻译)\n\n模型进度可以在训练期间和训练后保存。这意味着模型可以在它停止的地方继续，并避免长时间的训练。保存还意味着您可以共享您的模型，其他人可以重新创建您的工作。当发布研究模型和技术时，大多数机器学习实践者共享:\n* 用于创建模型的代码\n* 以及模型的训练权重或参数\n\n共享此数据有助于其他人了解模型的工作原理，并使用新数据自行尝试。\n\n注意：小心不受信任的代码(TensorFlow模型是代码)。有关详细信息，请参阅[安全使用TensorFlow](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) 。\n\n**选项**\n\n保存TensorFlow模型有多种方法，具体取决于你使用的API。本章节使用tf.keras(一个高级API，用于TensorFlow中构建和训练模型)，有关其他方法，请参阅TensorFlow[保存和还原指南](https://tensorflow.google.cn/guide/saved_model)或[保存在eager中](https://tensorflow.google.cn/guide/eager#object-based_saving)。\n\n## 1. 设置\n\n### 1.1. 安装和导入\n\n需要安装和导入TensorFlow和依赖项\n\n```python\npip install h5py pyyaml\n```\n\n### 1.2. 获取样本数据集\n\n我们将使用[MNIST数据集](http://yann.lecun.com/exdb/mnist/)来训练我们的模型以演示保存权重，要加速这些演示运行，请只使用前1000个样本数据：\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport os\n\n!pip install tensorflow==2.0.0-alpha0\nimport tensorflow as tf\nfrom tensorflow import keras\n\ntf.__version__\n```\n\n```python\n(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()\n\ntrain_labels = train_labels[:1000]\ntest_labels = test_labels[:1000]\n\ntrain_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0\ntest_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0\n```\n\n### 1.3. 
定义模型\n\n让我们构建一个简单的模型，我们将用它来演示保存和加载权重。\n\n```python\n# 返回一个简短的序列模型 \ndef create_model():\n  model = tf.keras.models.Sequential([\n    keras.layers.Dense(512, activation='relu', input_shape=(784,)),\n    keras.layers.Dropout(0.2),\n    keras.layers.Dense(10, activation='softmax')\n  ])\n\n  model.compile(optimizer='adam',\n                loss='sparse_categorical_crossentropy',\n                metrics=['accuracy'])\n\n  return model\n\n\n# 创建基本模型实例\nmodel = create_model()\nmodel.summary()\n```\n\n```python\nModel: \"sequential\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense (Dense)                (None, 512)               401920    \n_________________________________________________________________\ndropout (Dropout)            (None, 512)               0         \n_________________________________________________________________\ndense_1 (Dense)              (None, 10)                5130      \n=================================================================\nTotal params: 407,050\nTrainable params: 407,050\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n## 2. 在训练期间保存检查点\n\n主要用例是在训练期间和训练结束时自动保存检查点，通过这种方式，您可以使用训练有素的模型，而无需重新训练，或者在您离开的地方继续训练，以防止训练过程中断。\n\n`tf.keras.callbacks.ModelCheckpoint`是执行此任务的回调，回调需要几个参数来配置检查点。\n\n### 2.1. 
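检查点路径的格式化（补充）

`ModelCheckpoint`的文件路径支持`str.format`风格的占位符，可以把周期数等训练状态嵌入文件名（本文2.2节的示例会用到）。占位符的展开可以单独演示：

```python
# 纯字符串格式化示意：{epoch:04d} 将周期数补零为4位
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
print(checkpoint_path.format(epoch=5))   # training_2/cp-0005.ckpt
```

### 2.1. 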
检查点回调使用情况\n\n训练模型并将其传递给 `ModelCheckpoint`回调\n\n```python\ncheckpoint_path = \"training_1/cp.ckpt\"\ncheckpoint_dir = os.path.dirname(checkpoint_path)\n\n# 创建一个检查点回调\ncp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,\n                                                 save_weights_only=True,\n                                                 verbose=1)\n\nmodel = create_model()\n\nmodel.fit(train_images, train_labels,  epochs = 10,\n          validation_data = (test_images,test_labels),\n          callbacks = [cp_callback])  # pass callback to training\n\n# 这可能会生成与保存优化程序状态相关的警告。\n# 这些警告（以及整个笔记本中的类似警告）是为了阻止过时使用的，可以忽略。\n```\n\n```output\n  Train on 1000 samples, validate on 1000 samples\n  ......\n  Epoch 10/10\n  960/1000 [===========================>..] - ETA: 0s - loss: 0.0392 - accuracy: 1.0000\n  Epoch 00010: saving model to training_1/cp.ckpt\n  1000/1000 [==============================] - 0s 207us/sample - loss: 0.0393 - accuracy: 1.0000 - val_loss: 0.3976 - val_accuracy: 0.8750\n\n  <tensorflow.python.keras.callbacks.History at 0x7efc3eba7358>\n```\n\n这将创建一个TensorFlow检查点文件集合，这些文件在每个周期结束时更新。\n文件夹checkpoint_dir下的内容如下：（Linux系统使用 `ls`命令查看）\n```\ncheckpoint  cp.ckpt.data-00000-of-00001  cp.ckpt.index\n```\n\n创建一个新的未经训练的模型，仅从权重恢复模型时，必须具有与原始模型具有相同体系结构的模型，由于它是相同的模型架构，我们可以共享权重，尽管它是模型的不同示例。\n\n现在重建一个新的，未经训练的模型，并在测试集中评估它。未经训练的模型将在随机水平(约10%的准确率):\n\n```python\nmodel = create_model()\n\nloss, acc = model.evaluate(test_images, test_labels)\nprint(\"Untrained model, accuracy: {:5.2f}%\".format(100*acc))\n```\n\n```output\n1000/1000 [==============================] - 0s 107us/sample - loss: 2.3224 - accuracy: 0.1230\nUntrained model, accuracy: 12.30%\n```\n\n然后从检查点加载权重，并重新评估：\n\n```python\nmodel.load_weights(checkpoint_path)\nloss,acc = model.evaluate(test_images, test_labels)\nprint(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))\n```\n\n```\n1000/1000 [==============================] - 0s 48us/sample - loss: 0.3976 - accuracy: 0.8750\nRestored model, 
accuracy: 87.50%\n```\n\n### 2.2. 检查点选项\n\n回调提供了几个选项，可以为生成的检查点提供唯一的名称，并调整检查点频率。\n\n训练一个新模型，每5个周期保存一次唯一命名的检查点：\n\n```python\n# 在文件名中包含周期数 (使用 `str.format`)\ncheckpoint_path = \"training_2/cp-{epoch:04d}.ckpt\"\ncheckpoint_dir = os.path.dirname(checkpoint_path)\n\ncp_callback = tf.keras.callbacks.ModelCheckpoint(\n    checkpoint_path, verbose=1, save_weights_only=True,\n    # 每5个周期保存一次权重\n    period=5)\n\nmodel = create_model()\nmodel.save_weights(checkpoint_path.format(epoch=0))\nmodel.fit(train_images, train_labels,\n          epochs = 50, callbacks = [cp_callback],\n          validation_data = (test_images,test_labels),\n          verbose=0)\n```\n\n```output\nEpoch 00005: saving model to training_2/cp-0005.ckpt\n......\nEpoch 00050: saving model to training_2/cp-0050.ckpt\n<tensorflow.python.keras.callbacks.History at 0x7efc7c3bbd30>\n```\n\n现在，查看生成的检查点并选择最新的检查点：\n\n```python\nlatest = tf.train.latest_checkpoint(checkpoint_dir)\nlatest\n```\n\n```output\n      'training_2/cp-0050.ckpt'\n```\n\n注意：默认的TensorFlow格式仅保存最近的5个检查点。\n\n要测试，请重置模型并加载最新的检查点：\n\n```python\nmodel = create_model()\nmodel.load_weights(latest)\nloss, acc = model.evaluate(test_images, test_labels)\nprint(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))\n```\n\n```output\n      1000/1000 [==============================] - 0s 84us/sample - loss: 0.4695 - accuracy: 0.8810\n      Restored model, accuracy: 88.10%\n```\n\n## 3. 这些文件是什么？\n\n上述代码将权重存储到[检查点](https://tensorflow.google.cn/guide/saved_model#save_and_restore_variables)格式的文件集合中，这些文件仅包含二进制格式的训练权重。\n检查点包含：\n* 一个或多个包含模型权重的分片；\n* 索引文件，指示哪些权重存储在哪个分片。\n\n如果您只在一台机器上训练模型，那么您将有一个带有后缀`.data-00000-of-00001`的分片。\n\n## 4. 手动保存权重\n\n上面你看到了如何将权重加载到模型中。手动保存权重同样简单，使用`Model.save_weights`方法即可。\n\n```python\n# 保存权重\nmodel.save_weights('./checkpoints/my_checkpoint')\n\n# 加载权重\nmodel = create_model()\nmodel.load_weights('./checkpoints/my_checkpoint')\n\nloss,acc = model.evaluate(test_images, test_labels)\nprint(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))\n```\n\n## 5. 
保存整个模型\n\n模型和优化器可以保存到包含其状态（权重和变量）和模型配置的文件中，这允许您导出模型，以便可以在不访问原始python代码的情况下使用它。由于恢复了优化器状态，您甚至可以从中断的位置恢复训练。\n\n保存完整的模型非常有用，您可以在TensorFlow.js([HDF5](https://tensorflow.google.cn/js/tutorials/import-keras.html), [Saved Model](https://tensorflow.google.cn/js/tutorials/conversion/import_saved_model)) 中加载它们，然后在Web浏览器中训练和运行它们，或者使用TensorFlow Lite([HDF5](https://tensorflow.google.cn/lite/convert/python_api#exporting_a_tfkeras_file_), [Saved Model](https://tensorflow.google.cn/lite/convert/python_api#exporting_a_savedmodel_))将它们转换为在移动设备上运行。\n\n### 5.1. 作为HDF5文件\n\nKeras使用[HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format)标准提供基本保存格式，出于我们的目的，可以将保存的模型视为单个二进制blob。\n\n```python\nmodel = create_model()\n\nmodel.fit(train_images, train_labels, epochs=5)\n\n# 保存整个模型到HDF5文件 \nmodel.save('my_model.h5')\n```\n\n现在从该文件重新创建模型：\n\n```python\n# 重新创建完全相同的模型，包括权重和优化器\nnew_model = keras.models.load_model('my_model.h5')\nnew_model.summary()\n```\n\n```\nModel: \"sequential_6\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense_12 (Dense)             (None, 512)               401920    \n_________________________________________________________________\ndropout_6 (Dropout)          (None, 512)               0         \n_________________________________________________________________\ndense_13 (Dense)             (None, 10)                5130      \n=================================================================\nTotal params: 407,050\nTrainable params: 407,050\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n检查模型的准确率:\n\n```python\nloss, acc = new_model.evaluate(test_images, test_labels)\nprint(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))\n```\n\n```\n1000/1000 [==============================] - 0s 94us/sample - loss: 0.4137 - accuracy: 0.8540\nRestored model, 
accuracy: 85.40%\n```\n\n此方法可保存模型的所有东西：\n* 权重值\n* 模型的配置（架构）\n* 优化器配置\n\nKeras通过检查架构来保存模型，目前它无法保存TensorFlow优化器（来自`tf.train`）。使用这些时，您需要在加载后重新编译模型，否则您将失去优化程序的状态。\n\n### 5.2. 作为 `saved_model`\n\n注意：这种保存`tf.keras`模型的方法是实验性的，在将来的版本中可能会有所改变。\n\n创建一个新的模型：\n\n```\nmodel = create_model()\n\nmodel.fit(train_images, train_labels, epochs=5)\n```\n\n创建`saved_model`，并将其放在带时间戳的目录中：\n\n```python\nimport time\nsaved_model_path = \"./saved_models/{}\".format(int(time.time()))\n\ntf.keras.experimental.export_saved_model(model, saved_model_path)\nsaved_model_path\n```\n\n```\n    './saved_models/1555630614'\n```\n\n从保存的模型重新加载新的keras模型：\n\n```\nnew_model = tf.keras.experimental.load_from_saved_model(saved_model_path)\nnew_model.summary()\n```\n\n```\nModel: \"sequential_7\"\n_________________________________________________________________\nLayer (type)                 Output Shape              Param #   \n=================================================================\ndense_14 (Dense)             (None, 512)               401920    \n_________________________________________________________________\ndropout_7 (Dropout)          (None, 512)               0         \n_________________________________________________________________\ndense_15 (Dense)             (None, 10)                5130      \n=================================================================\nTotal params: 407,050\nTrainable params: 407,050\nNon-trainable params: 0\n_________________________________________________________________\n```\n\n运行加载的模型进行预测：\n\n```python\nmodel.predict(test_images).shape\n```\n\n```\n(1000, 10)\n```\n\n```python\n# 必须要在评估之前编译模型\n# 如果仅部署已保存的模型，则不需要此步骤 \n\nnew_model.compile(optimizer=model.optimizer,  # keep the optimizer that was loaded\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n\n# 评估加载后的模型 \nloss, acc = new_model.evaluate(test_images, test_labels)\nprint(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))\n```\n\n```\n      
1000/1000 [==============================] - 0s 102us/sample - loss: 0.4367 - accuracy: 0.8570\n      Restored model, accuracy: 85.70%\n```\n\n## 6. 下一步是什么\n\n这是使用`tf.keras`保存和加载的快速指南。\n\n* [tf.keras指南](https://tensorflow.google.cn/guide/keras)显示了有关使用tf.keras保存和加载模型的更多信息。\n\n* 在eager execution期间保存，请参阅在[Saving in eager](https://tensorflow.google.cn/guide/eager#object_based_saving)。\n\n* [保存和还原指南](https://tensorflow.google.cn/guide/saved_model)包含有关TensorFlow保存的低阶详细信息。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-save_and_restore_models.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-keras-save_and_restore_models.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/keras/save_and_restore_models](https://tensorflow.google.cn/beta/tutorials/keras/save_and_restore_models)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/save_and_restore_models.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/keras/save_and_restore_models.md)"
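上面的“保存整个模型”可以用一个不依赖 TensorFlow 的极简示意来理解（假设性的玩具代码，并非 Keras 的实际实现）：一个完整模型 = 配置（架构）+ 权重（状态），两者一起写入单个文件后即可完整恢复，这正是 `model.save('my_model.h5')` / `load_model` 所做的事情。

```python
import io
import pickle
import numpy as np

# 概念示意：一个“模型”由配置（架构）和权重（状态）两部分组成
def make_model(rng):
    return {
        "config": {"units": 4, "activation": "relu"},   # 架构
        "weights": {"W": rng.standard_normal((3, 4)),   # 权重（状态）
                    "b": np.zeros(4)},
    }

def predict(model, x):
    z = x @ model["weights"]["W"] + model["weights"]["b"]
    return np.maximum(z, 0.0)  # relu

rng = np.random.default_rng(0)
model = make_model(rng)

# “保存整个模型”：配置与权重一起写入单个二进制 blob（类比 model.save('my_model.h5')）
buf = io.BytesIO()
pickle.dump(model, buf)

# “重新创建完全相同的模型”（类比 keras.models.load_model）
buf.seek(0)
new_model = pickle.load(buf)

x = rng.standard_normal((2, 3))
assert np.allclose(predict(model, x), predict(new_model, x))
```

由于配置和权重都被恢复，加载后的模型对同一输入给出完全相同的输出；真实的 Keras HDF5 文件额外保存了优化器状态，所以还能从中断处继续训练。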
  },
  {
    "path": "r2/tutorials/quickstart/advanced.md",
"content": "---\ntitle: 专家入门TensorFlow 2.0\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1906\nabbrlink: tensorflow/tf2-tutorials-quickstart-advanced\n---\n\n# 专家入门TensorFlow 2.0使用流程：数据处理、自定义模型、损失、指标、梯度下降 (tensorflow2.0官方教程翻译)\n\n初学者入门教程中使用的tf.keras.Sequential模型，只是简单地堆叠网络层。\n本文是专家级入门，使用 Keras 模型子类 API 构建模型，会使用更底层一点的函数接口，自定义模型、损失、评估指标和梯度下降控制等，流程清晰。\n\n首先，请将TensorFlow库导入您的程序：\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf  # 安装命令 `pip install tensorflow-gpu==2.0.0-alpha0`\n\nfrom tensorflow.keras.layers import Dense, Flatten, Conv2D\nfrom tensorflow.keras import Model\n```\n\n加载并准备[MNIST数据集](http://yann.lecun.com/exdb/mnist/)。\n\n```python\nmnist = tf.keras.datasets.mnist\n\n(x_train, y_train), (x_test, y_test) = mnist.load_data()\nx_train, x_test = x_train / 255.0, x_test / 255.0\n\n# 添加一个通道维度\nx_train = x_train[..., tf.newaxis]\nx_test = x_test[..., tf.newaxis]\n```\n\n使用tf.data批处理和随机打乱数据集：\n\n```python\ntrain_ds = tf.data.Dataset.from_tensor_slices(\n    (x_train, y_train)).shuffle(10000).batch(32)\ntest_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)\n```\n\n使用Keras[模型子类 API](https://tensorflow.google.cn/guide/keras#model_subclassing)构建`tf.keras`模型：\n\n```python\nclass MyModel(Model):\n  def __init__(self):\n    super(MyModel, self).__init__()\n    self.conv1 = Conv2D(32, 3, activation='relu')\n    self.flatten = Flatten()\n    self.d1 = Dense(128, activation='relu')\n    self.d2 = Dense(10, activation='softmax')\n\n  def call(self, x):\n    x = self.conv1(x)\n    x = self.flatten(x)\n    x = self.d1(x)\n    return self.d2(x)\n\nmodel = MyModel()\n```\n\n选择优化器和损失函数进行训练：\n\n```python\nloss_object = tf.keras.losses.SparseCategoricalCrossentropy()\n\noptimizer = tf.keras.optimizers.Adam()\n```\n\n选择指标（metrics）以衡量模型的损失和准确率。这些指标在每个周期（epoch）上累积数值，然后打印整体结果。\n\n```python\ntrain_loss = tf.keras.metrics.Mean(name='train_loss')\ntrain_accuracy = 
tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')\n\ntest_loss = tf.keras.metrics.Mean(name='test_loss')\ntest_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')\n```\n\n使用`tf.GradientTape`训练模型：\n\n```python\n@tf.function\ndef train_step(images, labels):\n  with tf.GradientTape() as tape:\n    predictions = model(images)\n    loss = loss_object(labels, predictions)\n  gradients = tape.gradient(loss, model.trainable_variables)\n  optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n\n  train_loss(loss)\n  train_accuracy(labels, predictions)\n```\n\n现在测试模型：\n\n```python\n@tf.function\ndef test_step(images, labels):\n  predictions = model(images)\n  t_loss = loss_object(labels, predictions)\n\n  test_loss(t_loss)\n  test_accuracy(labels, predictions)\n```\n\n```python\nEPOCHS = 5\n\nfor epoch in range(EPOCHS):\n  for images, labels in train_ds:\n    train_step(images, labels)\n\n  for test_images, test_labels in test_ds:\n    test_step(test_images, test_labels)\n\n  template = 'Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'\n  print (template.format(epoch+1,\n                         train_loss.result(),\n                         train_accuracy.result()*100,\n                         test_loss.result(),\n                         test_accuracy.result()*100))\n```\n\n```\n      Epoch 1, Loss: 0.13177014887332916, Accuracy: 96.06000518798828, Test Loss: 0.05814294517040253, Test Accuracy: 98.04999542236328 \n      ...\n      Epoch 5, Loss: 0.042211469262838364, Accuracy: 98.72000122070312, Test Loss: 0.05708516761660576, Test Accuracy: 98.3239974975586\n```\n\n现在，图像分类器在该数据集上的准确度达到约98％。要了解更多信息，请阅读 [TensorFlow教程](https://tensorflow.google.cn/beta/tutorials/keras).。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-advanced.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-advanced.html)\n> 
英文版本：[https://tensorflow.google.cn/beta/tutorials/quickstart/advanced](https://tensorflow.google.cn/beta/tutorials/quickstart/advanced)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/advanced.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/advanced.md)\n"
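上文 `train_step` 的核心模式（前向传播 → 计算损失 → 求梯度 → 用优化器应用梯度，并用 `Mean` 指标跨批次累积）可以脱离 TensorFlow 来理解。下面是一个用 NumPy 手写梯度的极简示意（假设性的玩具线性回归，变量名均为示例，并非 `tf.GradientTape` 的实现）：

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.standard_normal(256)

w = np.zeros(3)   # 可训练变量（类比 model.trainable_variables）
lr = 0.1

class MeanMetric:  # 类比 tf.keras.metrics.Mean：跨批次累积，再用 result() 汇报
    def __init__(self):
        self.total, self.count = 0.0, 0
    def __call__(self, value):
        self.total += float(value)
        self.count += 1
    def result(self):
        return self.total / self.count

train_loss = MeanMetric()

def train_step(xb, yb):
    global w
    pred = xb @ w                    # 前向传播（类比 model(images)）
    err = pred - yb
    loss = np.mean(err ** 2)         # 损失（类比 loss_object(labels, predictions)）
    grad = 2 * xb.T @ err / len(xb)  # 手算梯度（类比 tape.gradient(loss, variables)）
    w -= lr * grad                   # 应用梯度（类比 optimizer.apply_gradients）
    train_loss(loss)
    return loss

first = train_step(X[:32], y[:32])
for i in range(32, 256, 32):
    train_step(X[i:i + 32], y[i:i + 32])
last = train_step(X[:32], y[:32])
assert last < first  # 同一批数据上的损失随训练下降
```

真正的 `tf.GradientTape` 会自动记录前向计算并反向求导，这里的手算梯度只是为了把训练循环的结构看清楚。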
  },
  {
    "path": "r2/tutorials/quickstart/beginner.md",
    "content": "---\ntitle: 初学者入门 TensorFlow 2.0\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1905\nabbrlink: tensorflow/tf2-tutorials-quickstart-beginner\n---\n\n# 初学者入门 TensorFlow 2.0（tensorflow2.0官方教程翻译）\n\n安装命令：\n\n```shell\npip install tensorflow-gpu==2.0.0-alpha0\n```\n\n要开始，请将TensorFlow库导入您的程序：\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\nimport tensorflow as tf\n```\n\n加载并准备[MNIST数据集](http://yann.lecun.com/exdb/mnist/)，将样本从整数转换为浮点数：\n\n```python\nmnist = tf.keras.datasets.mnist\n\n(x_train, y_train), (x_test, y_test) = mnist.load_data()\nx_train, x_test = x_train / 255.0, x_test / 255.0\n```\n\n通过堆叠图层构建`tf.keras.Sequential`模型。选择用于训练的优化器和损失函数：\n\n```python\nmodel = tf.keras.models.Sequential([\n  tf.keras.layers.Flatten(input_shape=(28, 28)),\n  tf.keras.layers.Dense(128, activation='relu'),\n  tf.keras.layers.Dropout(0.2),\n  tf.keras.layers.Dense(10, activation='softmax')\n])\n\nmodel.compile(optimizer='adam',\n              loss='sparse_categorical_crossentropy',\n              metrics=['accuracy'])\n```\n\n训练和评估模型：\n\n```python\nmodel.fit(x_train, y_train, epochs=5)\n\nmodel.evaluate(x_test, y_test)\n```\n\n现在，图像分类器在该数据集上的准确度达到约98％。 要了解更多信息，请阅读[TensorFlow教程](https://tensorflow.google.cn/beta/tutorials/).。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-beginner.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-quickstart-beginner.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/quickstart/beginner](https://tensorflow.google.cn/beta/tutorials/quickstart/beginner)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/beginner.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/quickstart/beginner.md)\n"
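上面的模型最后一层输出 softmax 概率，编译时使用 'sparse_categorical_crossentropy' 作为损失。下面用 NumPy 写一个单样本的最小示意（假设性代码，并非 Keras 内部实现），说明这两者的计算方式：

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # 减去最大值保证数值稳定
    return e / e.sum()

def sparse_categorical_crossentropy(y_true, probs):
    # y_true 是整数类别标签（而非 one-hot 向量），这正是 "sparse" 的含义
    return -np.log(probs[y_true])

logits = np.array([2.0, 1.0, 0.1])   # 模型对 3 个类别的原始得分（示例数值）
probs = softmax(logits)
assert np.isclose(probs.sum(), 1.0)  # softmax 输出是合法的概率分布

loss_correct = sparse_categorical_crossentropy(0, probs)  # 真实类别得分最高 -> 损失小
loss_wrong = sparse_categorical_crossentropy(2, probs)    # 真实类别得分最低 -> 损失大
assert loss_correct < loss_wrong
```

可以看到，损失只取真实类别对应的概率的负对数，因此标签只需是整数下标，无需先转成 one-hot。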
  },
  {
    "path": "r2/tutorials/text/image_captioning.md",
    "content": "---\ntitle: 使用注意力机制给图片取标题\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1963\nabbrlink: tensorflow/tf2-tutorials-text-image_captioning\n---\n\n# 使用注意力机制给图片取标题 (tensorflow2.0官方教程翻译)\n\n给定如下图像，我们的目标是生成一个标题，例如“冲浪者骑在波浪上”。\n\n![Man Surfing](https://tensorflow.google.cn/images/surf.jpg)\n\n在这里，我们将使用基于注意力的模型。这使我们能够在生成标题时查看模型关注的图像部分。\n\n![Prediction](https://tensorflow.google.cn/images/imcap_prediction.png)\n\n模型体系结构类似于论文[Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](https://arxiv.org/abs/1502.03044).\n\n本教程是一个端到端的例子。当您运行时，它下载 [MS-COCO](http://cocodataset.org/#home) 数据集，使用Inception V3对图像子集进行预处理和缓存，训练一个编解码器模型，并使用训练过的模型对新图像生成标题。\n\n在本例中，您将使用相对较少的数据来训练模型，大约20,000张图像对应30,000个标题(因为数据集中每个图像都有多个标题)。\n\n导入库\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\n# We'll generate plots of attention in order to see which parts of an image\n# our model focuses on during captioning\nimport matplotlib.pyplot as plt\n\n# Scikit-learn includes many helpful utilities\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.utils import shuffle\n\nimport re\nimport numpy as np\nimport os\nimport time\nimport json\nfrom glob import glob\nfrom PIL import Image\nimport pickle\n```\n\n## 1. 
下载并准备MS-COCO数据集\n\n您将使用MS-COCO数据集来训练我们的模型。该数据集包含超过82,000个图像，每个图像至少有5个不同的标题注释。下面的代码自动下载并提取数据集。\n注意：训练集是一个13GB的文件。\n\n```python\nannotation_zip = tf.keras.utils.get_file('captions.zip',\n                                          cache_subdir=os.path.abspath('.'),\n                                          origin = 'http://images.cocodataset.org/annotations/annotations_trainval2014.zip',\n                                          extract = True)\nannotation_file = os.path.dirname(annotation_zip)+'/annotations/captions_train2014.json'\n\nname_of_zip = 'train2014.zip'\nif not os.path.exists(os.path.abspath('.') + '/' + name_of_zip):\n  image_zip = tf.keras.utils.get_file(name_of_zip,\n                                      cache_subdir=os.path.abspath('.'),\n                                      origin = 'http://images.cocodataset.org/zips/train2014.zip',\n                                      extract = True)\n  PATH = os.path.dirname(image_zip)+'/train2014/'\nelse:\n  PATH = os.path.abspath('.')+'/train2014/'\n```\n\n## 2. 
（可选）限制训练集的大小以加快训练速度\n\n对于本例，我们将选择30,000个标题的子集，并使用这些标题和相应的图像来训练我们的模型。与往常一样，如果您选择使用更多的数据，标题质量将会提高。\n\n```python\n# read the json file\nwith open(annotation_file, 'r') as f:\n    annotations = json.load(f)\n\n# storing the captions and the image name in vectors\nall_captions = []\nall_img_name_vector = []\n\nfor annot in annotations['annotations']:\n    caption = '<start> ' + annot['caption'] + ' <end>'\n    image_id = annot['image_id']\n    full_coco_image_path = PATH + 'COCO_train2014_' + '%012d.jpg' % (image_id)\n\n    all_img_name_vector.append(full_coco_image_path)\n    all_captions.append(caption)\n\n# shuffling the captions and image_names together\n# setting a random state\ntrain_captions, img_name_vector = shuffle(all_captions,\n                                          all_img_name_vector,\n                                          random_state=1)\n\n# selecting the first 30000 captions from the shuffled set\nnum_examples = 30000\ntrain_captions = train_captions[:num_examples]\nimg_name_vector = img_name_vector[:num_examples]\n```\n\n\n```python\nlen(train_captions), len(all_captions)\n```\n\n```\n    (30000, 414113)\n```\n\n## 3. 使用InceptionV3预处理图像\n\n接下来，我们将使用InceptionV3（在Imagenet上预训练）对每个图像进行分类。我们将从最后一个卷积层中提取特征。\n\n首先，我们需要将图像转换成inceptionV3期望的格式:\n* 将图像大小调整为299px×299px\n* 使用[preprocess_input](https://www.tensorflow.org/api_docs/python/tf/keras/applications/inception_v3/preprocess_input)方法对图像进行预处理，使图像规范化，使其包含-1到1范围内的像素，这与用于训练InceptionV3的图像的格式相匹配。\n\n```python\ndef load_image(image_path):\n    img = tf.io.read_file(image_path)\n    img = tf.image.decode_jpeg(img, channels=3)\n    img = tf.image.resize(img, (299, 299))\n    img = tf.keras.applications.inception_v3.preprocess_input(img)\n    return img, image_path\n```\n\n## 4. 
初始化InceptionV3并加载预训练的Imagenet权重\n\n现在您将创建一个 tf.keras 模型，其中输出层是 InceptionV3 体系结构中的最后一个卷积层。该层的输出形状为 `8x8x2048` 。使用最后一个卷积层是因为在这个例子中使用了注意力机制。您不会在训练期间执行此初始化，因为它可能会成为瓶颈。\n* 您将每个图像通过网络前向传播，并将得到的特征向量存储在字典中（image_name --> feature_vector）。\n* 在所有图像都通过网络之后，您将该字典序列化（pickle）并保存到磁盘。\n\n```python\nimage_model = tf.keras.applications.InceptionV3(include_top=False,\n                                                weights='imagenet')\nnew_input = image_model.input\nhidden_layer = image_model.layers[-1].output\n\nimage_features_extract_model = tf.keras.Model(new_input, hidden_layer)\n```\n\n## 5. 缓存从InceptionV3中提取的特征\n\n您将使用InceptionV3预处理每个图像并将输出缓存到磁盘。将输出缓存到RAM会更快，但也更占内存，每个图像需要 8 \\* 8 \\* 2048 个浮点数。在撰写本文时，这超出了Colab的内存限制（目前为12GB内存）。\n\n可以通过更复杂的缓存策略（例如，通过分片（sharding）存储图像特征以减少随机访问磁盘 I/O）来提高性能，但这需要更多代码。\n\n使用GPU在Colab中运行大约需要10分钟。如果您想查看进度条，可以：\n* 安装[tqdm](https://github.com/tqdm/tqdm) (```!pip install tqdm```)，\n* 导入它(```from tqdm import tqdm```)，\n* 然后将这一行：\n\n```for img, path in image_dataset:```\n\n改为：\n\n```for img, path in tqdm(image_dataset):```。\n\n\n```python\n# getting the unique images\nencode_train = sorted(set(img_name_vector))\n\n# feel free to change the batch_size according to your system configuration\nimage_dataset = tf.data.Dataset.from_tensor_slices(encode_train)\nimage_dataset = image_dataset.map(\n  load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(16)\n\nfor img, path in image_dataset:\n  batch_features = image_features_extract_model(img)\n  batch_features = tf.reshape(batch_features,\n                              (batch_features.shape[0], -1, batch_features.shape[3]))\n\n  for bf, p in zip(batch_features, path):\n    path_of_feature = p.numpy().decode(\"utf-8\")\n    np.save(path_of_feature, bf.numpy())\n```\n\n## 6. 
对标题进行预处理和标记\n\n* 首先，您将对标题进行标记（例如，通过拆分空格）。这为我们提供了数据中所有独特单词的词汇表（例如，“冲浪”，“足球”等）。\n* 接下来，您将词汇量限制为前5,000个单词（以节省内存）。您将使用令牌“UNK”（未知）替换所有其他单词。\n* 然后，您可以创建单词到索引和索引到单词的映射。\n* 最后，将所有序列填充到与最长序列相同的长度。\n\n```python\n# This will find the maximum length of any caption in our dataset\ndef calc_max_length(tensor):\n    return max(len(t) for t in tensor)\n```\n\n\n```python\n# The steps above is a general process of dealing with text processing\n\n# choosing the top 5000 words from the vocabulary\ntop_k = 5000\ntokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=top_k,\n                                                  oov_token=\"<unk>\",\n                                                  filters='!\"#$%&()*+.,-/:;=?@[\\]^_`{|}~ ')\ntokenizer.fit_on_texts(train_captions)\ntrain_seqs = tokenizer.texts_to_sequences(train_captions)\n```\n\n\n```python\ntokenizer.word_index['<pad>'] = 0\ntokenizer.index_word[0] = '<pad>'\n```\n\n\n```python\n# creating the tokenized vectors\ntrain_seqs = tokenizer.texts_to_sequences(train_captions)\n```\n\n\n```python\n# padding each vector to the max_length of the captions\n# if the max_length parameter is not provided, pad_sequences calculates that automatically\ncap_vector = tf.keras.preprocessing.sequence.pad_sequences(train_seqs, padding='post')\n```\n\n\n```python\n# calculating the max_length\n# used to store the attention weights\nmax_length = calc_max_length(train_seqs)\n```\n\n## 7. 
将数据分解为训练和测试\n\n\n```python\n# Create training and validation sets using 80-20 split\nimg_name_train, img_name_val, cap_train, cap_val = train_test_split(img_name_vector,\n                                                                    cap_vector,\n                                                                    test_size=0.2,\n                                                                    random_state=0)\n```\n\n\n```python\nlen(img_name_train), len(cap_train), len(img_name_val), len(cap_val)\n```\n\n```\n    (24000, 24000, 6000, 6000)\n```\n\n\n## 8. 创建用于训练的tf.data数据集\n\n我们的图片和标题已准备就绪！接下来，让我们创建一个tf.data数据集来用于训练我们的模型。\n\n```python\n# feel free to change these parameters according to your system's configuration\n\nBATCH_SIZE = 64\nBUFFER_SIZE = 1000\nembedding_dim = 256\nunits = 512\nvocab_size = len(tokenizer.word_index) + 1\nnum_steps = len(img_name_train) // BATCH_SIZE\n# shape of the vector extracted from InceptionV3 is (64, 2048)\n# these two variables represent that\nfeatures_shape = 2048\nattention_features_shape = 64\n```\n\n\n```python\n# loading the numpy files\ndef map_func(img_name, cap):\n  img_tensor = np.load(img_name.decode('utf-8')+'.npy')\n  return img_tensor, cap\n```\n\n\n```python\ndataset = tf.data.Dataset.from_tensor_slices((img_name_train, cap_train))\n\n# using map to load the numpy files in parallel\ndataset = dataset.map(lambda item1, item2: tf.numpy_function(\n          map_func, [item1, item2], [tf.float32, tf.int32]),\n          num_parallel_calls=tf.data.experimental.AUTOTUNE)\n\n# shuffling and batching\ndataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE)\ndataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)\n```\n\n## 9. 
模型\n\n\n有趣的事实：下面的解码器与 [注意神经机器翻译的示例](https://tensorflow.google.cn/beta/tutorials/text/nmt_with_attention)中的解码器相同。\n\n模型架构的灵感来自论文 [Show, Attend and Tell](https://arxiv.org/pdf/1502.03044.pdf) 。\n\n* 在这个例子中，你从InceptionV3的下卷积层中提取特征，给我们一个形状矢量(8, 8, 2048).\n* 你将它压成（64,2048）的形状。\n* 然后，该向量通过CNN编码器（由单个完全连接的层组成）。\n* RNN（此处为GRU）参与图像以预测下一个单词。\n\n\n```python\nclass BahdanauAttention(tf.keras.Model):\n  def __init__(self, units):\n    super(BahdanauAttention, self).__init__()\n    self.W1 = tf.keras.layers.Dense(units)\n    self.W2 = tf.keras.layers.Dense(units)\n    self.V = tf.keras.layers.Dense(1)\n\n  def call(self, features, hidden):\n    # features(CNN_encoder output) shape == (batch_size, 64, embedding_dim)\n\n    # hidden shape == (batch_size, hidden_size)\n    # hidden_with_time_axis shape == (batch_size, 1, hidden_size)\n    hidden_with_time_axis = tf.expand_dims(hidden, 1)\n\n    # score shape == (batch_size, 64, hidden_size)\n    score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))\n\n    # attention_weights shape == (batch_size, 64, 1)\n    # we get 1 at the last axis because we are applying score to self.V\n    attention_weights = tf.nn.softmax(self.V(score), axis=1)\n\n    # context_vector shape after sum == (batch_size, hidden_size)\n    context_vector = attention_weights * features\n    context_vector = tf.reduce_sum(context_vector, axis=1)\n\n    return context_vector, attention_weights\n```\n\n\n```python\nclass CNN_Encoder(tf.keras.Model):\n    # Since we have already extracted the features and dumped it using pickle\n    # This encoder passes those features through a Fully connected layer\n    def __init__(self, embedding_dim):\n        super(CNN_Encoder, self).__init__()\n        # shape after fc == (batch_size, 64, embedding_dim)\n        self.fc = tf.keras.layers.Dense(embedding_dim)\n\n    def call(self, x):\n        x = self.fc(x)\n        x = tf.nn.relu(x)\n        return x\n```\n\n\n```python\nclass RNN_Decoder(tf.keras.Model):\n  
def __init__(self, embedding_dim, units, vocab_size):\n    super(RNN_Decoder, self).__init__()\n    self.units = units\n\n    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)\n    self.gru = tf.keras.layers.GRU(self.units,\n                                   return_sequences=True,\n                                   return_state=True,\n                                   recurrent_initializer='glorot_uniform')\n    self.fc1 = tf.keras.layers.Dense(self.units)\n    self.fc2 = tf.keras.layers.Dense(vocab_size)\n\n    self.attention = BahdanauAttention(self.units)\n\n  def call(self, x, features, hidden):\n    # defining attention as a separate model\n    context_vector, attention_weights = self.attention(features, hidden)\n\n    # x shape after passing through embedding == (batch_size, 1, embedding_dim)\n    x = self.embedding(x)\n\n    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)\n    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)\n\n    # passing the concatenated vector to the GRU\n    output, state = self.gru(x)\n\n    # shape == (batch_size, max_length, hidden_size)\n    x = self.fc1(output)\n\n    # x shape == (batch_size * max_length, hidden_size)\n    x = tf.reshape(x, (-1, x.shape[2]))\n\n    # output shape == (batch_size * max_length, vocab)\n    x = self.fc2(x)\n\n    return x, state, attention_weights\n\n  def reset_state(self, batch_size):\n    return tf.zeros((batch_size, self.units))\n```\n\n\n```python\nencoder = CNN_Encoder(embedding_dim)\ndecoder = RNN_Decoder(embedding_dim, units, vocab_size)\n```\n\n\n```python\noptimizer = tf.keras.optimizers.Adam()\nloss_object = tf.keras.losses.SparseCategoricalCrossentropy(\n    from_logits=True, reduction='none')\n\ndef loss_function(real, pred):\n  mask = tf.math.logical_not(tf.math.equal(real, 0))\n  loss_ = loss_object(real, pred)\n\n  mask = tf.cast(mask, dtype=loss_.dtype)\n  loss_ *= mask\n\n  return tf.reduce_mean(loss_)\n```\n\n## 10. 
Checkpoint 检查点\n\n\n```python\ncheckpoint_path = \"./checkpoints/train\"\nckpt = tf.train.Checkpoint(encoder=encoder,\n                           decoder=decoder,\n                           optimizer = optimizer)\nckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)\n```\n\n\n```python\nstart_epoch = 0\nif ckpt_manager.latest_checkpoint:\n  start_epoch = int(ckpt_manager.latest_checkpoint.split('-')[-1])\n```\n\n## 11. 训练\n\n* 您提取各自.npy文件中存储的特性，然后通过编码器传递这些特性。\n* 编码器输出，隐藏状态（初始化为0）和解码器输入（它是开始标记）被传递给解码器。\n* 解码器返回预测和解码器隐藏状态。\n* 然后将解码器隐藏状态传递回模型，并使用预测来计算损失。\n* 使用teacher forcing决定解码器的下一个输入。\n* Teacher forcing 是将目标字作为下一个输入传递给解码器的技术。\n* 最后一步是计算梯度，并将其应用于优化器和反向传播。\n\n\n\n```python\n# adding this in a separate cell because if you run the training cell\n# many times, the loss_plot array will be reset\nloss_plot = []\n```\n\n\n```python\n@tf.function\ndef train_step(img_tensor, target):\n  loss = 0\n\n  # initializing the hidden state for each batch\n  # because the captions are not related from image to image\n  hidden = decoder.reset_state(batch_size=target.shape[0])\n\n  dec_input = tf.expand_dims([tokenizer.word_index['<start>']] * BATCH_SIZE, 1)\n\n  with tf.GradientTape() as tape:\n      features = encoder(img_tensor)\n\n      for i in range(1, target.shape[1]):\n          # passing the features through the decoder\n          predictions, hidden, _ = decoder(dec_input, features, hidden)\n\n          loss += loss_function(target[:, i], predictions)\n\n          # using teacher forcing\n          dec_input = tf.expand_dims(target[:, i], 1)\n\n  total_loss = (loss / int(target.shape[1]))\n\n  trainable_variables = encoder.trainable_variables + decoder.trainable_variables\n\n  gradients = tape.gradient(loss, trainable_variables)\n\n  optimizer.apply_gradients(zip(gradients, trainable_variables))\n\n  return loss, total_loss\n```\n\n\n```python\nEPOCHS = 20\n\nfor epoch in range(start_epoch, EPOCHS):\n    start = time.time()\n    total_loss = 0\n\n   
 for (batch, (img_tensor, target)) in enumerate(dataset):\n        batch_loss, t_loss = train_step(img_tensor, target)\n        total_loss += t_loss\n\n        if batch % 100 == 0:\n            print ('Epoch {} Batch {} Loss {:.4f}'.format(\n              epoch + 1, batch, batch_loss.numpy() / int(target.shape[1])))\n    # storing the epoch end loss value to plot later\n    loss_plot.append(total_loss / num_steps)\n\n    if epoch % 5 == 0:\n      ckpt_manager.save()\n\n    print ('Epoch {} Loss {:.6f}'.format(epoch + 1,\n                                         total_loss/num_steps))\n    print ('Time taken for 1 epoch {} sec\\n'.format(time.time() - start))\n```\n\n```\n    ......\n    Epoch 20 Batch 0 Loss 0.3568\n    Epoch 20 Batch 100 Loss 0.3288\n    Epoch 20 Batch 200 Loss 0.3357\n    Epoch 20 Batch 300 Loss 0.2945\n    Epoch 20 Loss 0.358618\n    Time taken for 1 epoch 186.8766734600067 sec\n    \n```\n\n\n```python\nplt.plot(loss_plot)\nplt.xlabel('Epochs')\nplt.ylabel('Loss')\nplt.title('Loss Plot')\nplt.show()\n```\n\n![png](image_captioning_44_0.png)\n\n\n## 12. 
标题!\n\n* 评估函数类似于训练循环，只是这里不使用 teacher forcing 。解码器在每个时间步长的输入是其先前的预测，以及隐藏状态和编码器的输出。\n* 当模型预测结束令牌时停止预测。\n* 并存储每个时间步的注意力。\n\n\n```python\ndef evaluate(image):\n    attention_plot = np.zeros((max_length, attention_features_shape))\n\n    hidden = decoder.reset_state(batch_size=1)\n\n    temp_input = tf.expand_dims(load_image(image)[0], 0)\n    img_tensor_val = image_features_extract_model(temp_input)\n    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))\n\n    features = encoder(img_tensor_val)\n\n    dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)\n    result = []\n\n    for i in range(max_length):\n        predictions, hidden, attention_weights = decoder(dec_input, features, hidden)\n\n        attention_plot[i] = tf.reshape(attention_weights, (-1, )).numpy()\n\n        predicted_id = tf.argmax(predictions[0]).numpy()\n        result.append(tokenizer.index_word[predicted_id])\n\n        if tokenizer.index_word[predicted_id] == '<end>':\n            return result, attention_plot\n\n        dec_input = tf.expand_dims([predicted_id], 0)\n\n    attention_plot = attention_plot[:len(result), :]\n    return result, attention_plot\n```\n\n\n```python\ndef plot_attention(image, result, attention_plot):\n    temp_image = np.array(Image.open(image))\n\n    fig = plt.figure(figsize=(10, 10))\n\n    len_result = len(result)\n    for l in range(len_result):\n        temp_att = np.resize(attention_plot[l], (8, 8))\n        ax = fig.add_subplot(len_result//2, len_result//2, l+1)\n        ax.set_title(result[l])\n        img = ax.imshow(temp_image)\n        ax.imshow(temp_att, cmap='gray', alpha=0.6, extent=img.get_extent())\n\n    plt.tight_layout()\n    plt.show()\n```\n\n\n```python\n# captions on the validation set\nrid = np.random.randint(0, len(img_name_val))\nimage = img_name_val[rid]\nreal_caption = ' '.join([tokenizer.index_word[i] for i in cap_val[rid] if i not in [0]])\nresult, attention_plot = 
evaluate(image)\n\nprint ('Real Caption:', real_caption)\nprint ('Prediction Caption:', ' '.join(result))\nplot_attention(image, result, attention_plot)\n# opening the image\nImage.open(img_name_val[rid])\n```\n\n```\n    Real Caption: <start> a man gets ready to hit a ball with a bat  <end>\n    Prediction Caption: a baseball player begins to bat <end>\n    真实的标题：一个人准备用球棒击球\n    预测标题:   棒球运动员开始击球\n```\n\n\n![png](image_captioning_48_1.png)\n\n\n![png](image_captioning_48_2.png)\n\n\n\n## 13. 在你自己的图片上试试\n\n为了好玩，下面我们提供了一种方法，您可以使用我们刚刚训练过的模型为您自己的图像添加标题。请记住，它是在相对少量的数据上训练的，您的图像可能与训练数据不同（因此请为奇怪的结果做好准备！）\n\n```python\nimage_url = 'https://tensorflow.org/images/surf.jpg'\nimage_extension = image_url[-4:]\nimage_path = tf.keras.utils.get_file('image'+image_extension,\n                                     origin=image_url)\n\nresult, attention_plot = evaluate(image_path)\nprint ('Prediction Caption:', ' '.join(result))\nplot_attention(image_path, result, attention_plot)\n# opening the image\nImage.open(image_path)\n```\n\n```\n    Prediction Caption: a man riding a surf board in the water <end>\n    预测标题：一名男子在水中骑冲浪板\n```\n\n![png](image_captioning_50_1.png)\n\n![png](image_captioning_50_2.png)\n\n\n# 下一步\n\n恭喜！您刚刚训练了一个注意力机制给图像取标题的模型。接下来，看一下这个[使用注意力机制的神经机器翻译示例](https://tensorflow.google.cn/beta/tutorials/text/nmt_with_attention)。它使用类似的架构来翻译西班牙语和英语句子。您还可以尝试在不同的数据集上训练此笔记本中的代码。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-image_captioning.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-image_captioning.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/image_captioning](https://tensorflow.google.cn/beta/tutorials/text/image_captioning)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/image_captioning.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/image_captioning.md)\n"
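作为补充，上文 `BahdanauAttention.call` 里的三步计算（`score = tanh(W1·features + W2·hidden)`、在 64 个空间位置上做 softmax 得到 `attention_weights`、加权求和得到 `context_vector`）可以用 NumPy 写成如下最小示意（假设性玩具代码，权重为随机初始化而非训练结果）：

```python
import numpy as np

rng = np.random.default_rng(0)
num_locations, feat_dim, hidden_size, units = 64, 256, 512, 16

features = rng.standard_normal((num_locations, feat_dim))  # CNN 编码器输出 (64, embedding_dim)
h = rng.standard_normal(hidden_size)                       # 解码器隐藏状态

W1 = rng.standard_normal((feat_dim, units)) * 0.1          # 三个 Dense 层的权重（示例）
W2 = rng.standard_normal((hidden_size, units)) * 0.1
V = rng.standard_normal((units, 1)) * 0.1

score = np.tanh(features @ W1 + h @ W2)                    # (64, units)
logits = score @ V                                         # (64, 1)
attention_weights = np.exp(logits - logits.max())
attention_weights /= attention_weights.sum()               # 在 64 个空间位置上做 softmax

# 加权求和：每个空间位置的特征按注意力权重聚合
context_vector = (attention_weights * features).sum(axis=0)  # (embedding_dim,)

assert attention_weights.shape == (num_locations, 1)
assert np.isclose(attention_weights.sum(), 1.0)
assert context_vector.shape == (feat_dim,)
```

注意力权重在 64 个位置上恰好归一化为 1，这也是教程中能把 `attention_plot` 直接 resize 成 8×8 灰度图叠加在原图上的原因。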
  },
  {
    "path": "r2/tutorials/text/nmt_with_attention.md",
    "content": "---\ntitle: 采用注意力机制的神经机器翻译\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1962\nabbrlink: tensorflow/tf2-tutorials-text-nmt_with_attention\n---\n\n# 采用注意力机制的神经机器翻译(tensorflow2.0官方教程翻译)\n\n本教程训练一个序列到序列 (seq2seq)模型，实现西班牙语到英语的翻译。这是一个高级示例，要求您对序列到序列模型有一定的了解。\n\n训练模型后，输入一个西班牙语句子将返回对应英文翻译，例如 *\"¿todavia estan en casa?\"* ，返回 *\"are you still at home?\"*\n\n对于一个玩具例子来说，翻译质量是合理的，但是生成的注意情节可能更有趣。这说明在翻译过程中，模型注意到了输入句子的哪些部分:\n\n<img src=\"https://tensorflow.google.cn/images/spanish-english.png\" alt=\"spanish-english attention plot\">\n\n注意：此示例在单个P100 GPU上运行大约需要10分钟。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\nimport tensorflow as tf\n\nimport matplotlib.pyplot as plt\nfrom sklearn.model_selection import train_test_split\n\nimport unicodedata\nimport re\nimport numpy as np\nimport os\nimport io\nimport time\n```\n\n## 1. 下载并准备数据集\n\n我们将使用 http://www.manythings.org/anki/  提供的语言数据集。此数据集包含以下格式的语言翻译对：\n\n```\nMay I borrow this book?\t¿Puedo tomar prestado este libro?\n```\n\n有多种语言可供选择，但我们将使用英语 - 西班牙语数据集。为方便起见，我们在Google Cloud上托管了此数据集的副本，但您也可以下载自己的副本。下载数据集后，以下是我们准备数据的步骤：\n\n1. 为每个句子添加开始和结束标记。\n2. 删除特殊字符来清除句子。\n3. 创建一个单词索引和反向单词索引（从单词→id和id→单词映射的字典）。\n4. 
将每个句子填充到最大长度。\n\n\n```python\n# Download the file\npath_to_zip = tf.keras.utils.get_file(\n    'spa-eng.zip', origin='http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip',\n    extract=True)\n\npath_to_file = os.path.dirname(path_to_zip)+\"/spa-eng/spa.txt\"\n```\n\n```output\n    Downloading data from http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip\n    2646016/2638744 [==============================] - 0s 0us/step\n    2654208/2638744 [==============================] - 0s 0us/step\n```\n\n```python\n# Converts the unicode file to ascii\ndef unicode_to_ascii(s):\n    return ''.join(c for c in unicodedata.normalize('NFD', s)\n        if unicodedata.category(c) != 'Mn')\n\n\ndef preprocess_sentence(w):\n    w = unicode_to_ascii(w.lower().strip())\n\n    # creating a space between a word and the punctuation following it\n    # eg: \"he is a boy.\" => \"he is a boy .\"\n    # Reference:- https://stackoverflow.com/questions/3645931/python-padding-punctuation-with-white-spaces-keeping-punctuation\n    w = re.sub(r\"([?.!,¿])\", r\" \\1 \", w)\n    w = re.sub(r'[\" \"]+', \" \", w)\n\n    # replacing everything with space except (a-z, A-Z, \".\", \"?\", \"!\", \",\")\n    w = re.sub(r\"[^a-zA-Z?.!,¿]+\", \" \", w)\n\n    w = w.rstrip().strip()\n\n    # adding a start and an end token to the sentence\n    # so that the model know when to start and stop predicting.\n    w = '<start> ' + w + ' <end>'\n    return w\n```\n\n\n```python\nen_sentence = u\"May I borrow this book?\"\nsp_sentence = u\"¿Puedo tomar prestado este libro?\"\nprint(preprocess_sentence(en_sentence))\nprint(preprocess_sentence(sp_sentence).encode('utf-8'))\n```\n\n```output\n    <start> may i borrow this book ? <end>\n    <start> ¿ puedo tomar prestado este libro ? <end>\n```\n\n```python\n# 1. Remove the accents\n# 2. Clean the sentences\n# 3. 
Return word pairs in the format: [ENGLISH, SPANISH]\ndef create_dataset(path, num_examples):\n    lines = io.open(path, encoding='UTF-8').read().strip().split('\\n')\n\n    word_pairs = [[preprocess_sentence(w) for w in l.split('\\t')]  for l in lines[:num_examples]]\n\n    return zip(*word_pairs)\n```\n\n\n```python\nen, sp = create_dataset(path_to_file, None)\nprint(en[-1])\nprint(sp[-1])\n```\n\n```output\n    <start> if you want to sound like a native speaker , you must be willing to practice saying the same sentence over and over in the same way that banjo players practice the same phrase over and over until they can play it correctly and at the desired tempo . <end>\n    <start> si quieres sonar como un hablante nativo , debes estar dispuesto a practicar diciendo la misma frase una y otra vez de la misma manera en que un musico de banjo practica el mismo fraseo una y otra vez hasta que lo puedan tocar correctamente y en el tiempo esperado . <end>\n```\n\n\n```python\ndef max_length(tensor):\n    return max(len(t) for t in tensor)\n```\n\n\n```python\ndef tokenize(lang):\n  lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(\n      filters='')\n  lang_tokenizer.fit_on_texts(lang)\n\n  tensor = lang_tokenizer.texts_to_sequences(lang)\n\n  tensor = tf.keras.preprocessing.sequence.pad_sequences(tensor,\n                                                         padding='post')\n\n  return tensor, lang_tokenizer\n```\n\n\n```python\ndef load_dataset(path, num_examples=None):\n    # creating cleaned input, output pairs\n    targ_lang, inp_lang = create_dataset(path, num_examples)\n\n    input_tensor, inp_lang_tokenizer = tokenize(inp_lang)\n    target_tensor, targ_lang_tokenizer = tokenize(targ_lang)\n\n    return input_tensor, target_tensor, inp_lang_tokenizer, targ_lang_tokenizer\n```\n\n### 1.1. 
限制数据集的大小以更快地进行实验（可选）\n\n对 > 100,000个句子的完整数据集进行训练需要很长时间。为了更快地训练，我们可以将数据集的大小限制为30,000个句子（当然，翻译质量会随着数据的减少而降低）：\n\n```python\n# Try experimenting with the size of that dataset\nnum_examples = 30000\ninput_tensor, target_tensor, inp_lang, targ_lang = load_dataset(path_to_file, num_examples)\n\n# Calculate max_length of the target tensors\nmax_length_targ, max_length_inp = max_length(target_tensor), max_length(input_tensor)\n```\n\n\n```python\n# Creating training and validation sets using an 80-20 split\ninput_tensor_train, input_tensor_val, target_tensor_train, target_tensor_val = train_test_split(input_tensor, target_tensor, test_size=0.2)\n\n# Show length\nlen(input_tensor_train), len(target_tensor_train), len(input_tensor_val), len(target_tensor_val)\n```\n\n```output\n    (24000, 24000, 6000, 6000)\n```\n\n\n\n```python\ndef convert(lang, tensor):\n  for t in tensor:\n    if t!=0:\n      print (\"%d ----> %s\" % (t, lang.index_word[t]))\n```\n\n\n```python\nprint (\"Input Language; index to word mapping\")\nconvert(inp_lang, input_tensor_train[0])\nprint ()\nprint (\"Target Language; index to word mapping\")\nconvert(targ_lang, target_tensor_train[0])\n```\n\n```output\n    Input Language; index to word mapping\n    1 ----> <start>\n    8 ----> no\n    38 ----> puedo\n    804 ----> confiar\n    20 ----> en\n    1000 ----> vosotras\n    3 ----> .\n    2 ----> <end>\n    \n    Target Language; index to word mapping\n    1 ----> <start>\n    4 ----> i\n    25 ----> can\n    12 ----> t\n    345 ----> trust\n    6 ----> you\n    3 ----> .\n    2 ----> <end>\n```\n\n### 1.2. 
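创建 `tf.data` 数据集\n\n在用真实数据构建数据集之前，先用一个极简的玩具示例（以下数据纯属演示假设）说明 `from_tensor_slices`、`shuffle` 和 `batch(drop_remainder=True)` 的行为：`drop_remainder=True` 会丢弃最后一个不完整的批次。\n\n```python\nimport tensorflow as tf\n\n# 假设性的玩具数据：10 个长度为 3 的“句子”\ndata = tf.reshape(tf.range(30), (10, 3))\n\ntoy_dataset = tf.data.Dataset.from_tensor_slices(data).shuffle(10)\n# 10 个样本按每批 4 个划分，末尾剩余的 2 个样本被丢弃\ntoy_dataset = toy_dataset.batch(4, drop_remainder=True)\n\nshapes = [tuple(batch.shape) for batch in toy_dataset]\nprint(shapes)  # [(4, 3), (4, 3)]\n```\n\n### 1.2. 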
创建 `tf.data` 数据集\n\n\n```python\nBUFFER_SIZE = len(input_tensor_train)\nBATCH_SIZE = 64\nsteps_per_epoch = len(input_tensor_train)//BATCH_SIZE\nembedding_dim = 256\nunits = 1024\nvocab_inp_size = len(inp_lang.word_index)+1\nvocab_tar_size = len(targ_lang.word_index)+1\n\ndataset = tf.data.Dataset.from_tensor_slices((input_tensor_train, target_tensor_train)).shuffle(BUFFER_SIZE)\ndataset = dataset.batch(BATCH_SIZE, drop_remainder=True)\n```\n\n\n```python\nexample_input_batch, example_target_batch = next(iter(dataset))\nexample_input_batch.shape, example_target_batch.shape\n```\n\n```output\n    (TensorShape([64, 16]), TensorShape([64, 11]))\n```\n\n\n## 2. 编写编码器和解码器模型\n\n我们将实现一个使用注意力机制的编码器-解码器模型，您可以在TensorFlow [神经机器翻译（seq2seq）教程](https://www.tensorflow.org/tutorials/seq2seq)中阅读。此示例使用更新的API集，实现了seq2seq教程中的注意方程式。下图显示了每个输入单词由注意机制分配权重，然后解码器使用该权重来预测句子中的下一个单词。\n\n<img src=\"https://tensorflow.google.cn/images/seq2seq/attention_mechanism.jpg\" width=\"500\" alt=\"attention mechanism\">\n\n\n通过编码器模型输入，该模型给出了形状 *(batch_size, max_length, hidden_size)* 的编码器输出和形状 *(batch_size, hidden_size)* 的编码器隐藏状态。\n\n下面是实现的方程:\n\n<img src=\"https://tensorflow.google.cn/images/seq2seq/attention_equation_0.jpg\" alt=\"attention equation 0\" width=\"800\">\n\n<img src=\"https://tensorflow.google.cn/images/seq2seq/attention_equation_1.jpg\" alt=\"attention equation 1\" width=\"800\">\n\n我们用的是 *Bahdanau attention* 。在写出简化形式之前，我们先来定义符号:\n\n* FC = Fully connected (dense) layer 完全连接（密集）层\n* EO = Encoder output 编码器输出\n* H = hidden state 隐藏的状态\n* X = input to the decoder 输入到解码器\n\n定义伪代码：\n\n* `score = FC(tanh(FC(EO) + FC(H)))`\n\n* `attention weights = softmax(score, axis = 1)`. 默认情况下Softmax应用于最后一个轴，但是我们要在 *第一轴* 上应用它，因为得分的形状是 *(batch_size, max_length, hidden_size)* 。`Max_length` 是我们输入的长度。由于我们尝试为每个输入分配权重，因此应在该轴上应用softmax。\n\n* `context vector = sum(attention weights * EO, axis = 1)`. 
选择轴为1的原因与上述相同。\n\n* `embedding output` = 解码器输入 X 通过嵌入层（embedding layer）得到的输出\n\n* `merged vector = concat(embedding output, context vector)`\n\n* 将合并后的向量送入 GRU\n\n每个步骤中所有向量的形状都已在代码中的注释中指定：\n\n```python\nclass Encoder(tf.keras.Model):\n  def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):\n    super(Encoder, self).__init__()\n    self.batch_sz = batch_sz\n    self.enc_units = enc_units\n    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)\n    self.gru = tf.keras.layers.GRU(self.enc_units,\n                                   return_sequences=True,\n                                   return_state=True,\n                                   recurrent_initializer='glorot_uniform')\n\n  def call(self, x, hidden):\n    x = self.embedding(x)\n    output, state = self.gru(x, initial_state = hidden)\n    return output, state\n\n  def initialize_hidden_state(self):\n    return tf.zeros((self.batch_sz, self.enc_units))\n```\n\n\n```python\nencoder = Encoder(vocab_inp_size, embedding_dim, units, BATCH_SIZE)\n\n# sample input\nsample_hidden = encoder.initialize_hidden_state()\nsample_output, sample_hidden = encoder(example_input_batch, sample_hidden)\nprint ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))\nprint ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))\n```\n\n```output\n    Encoder output shape: (batch size, sequence length, units) (64, 16, 1024)\n    Encoder Hidden state shape: (batch size, units) (64, 1024)\n```\n\n\n```python\nclass BahdanauAttention(tf.keras.Model):\n  def __init__(self, units):\n    super(BahdanauAttention, self).__init__()\n    self.W1 = tf.keras.layers.Dense(units)\n    self.W2 = tf.keras.layers.Dense(units)\n    self.V = tf.keras.layers.Dense(1)\n\n  def call(self, query, values):\n    # hidden shape == (batch_size, hidden size)\n    # hidden_with_time_axis shape == (batch_size, 1, hidden size)\n    # we are doing this to perform addition to calculate 
the score\n    hidden_with_time_axis = tf.expand_dims(query, 1)\n\n    # score shape == (batch_size, max_length, 1)\n    score = self.V(tf.nn.tanh(\n        self.W1(values) + self.W2(hidden_with_time_axis)))\n\n    # attention_weights shape == (batch_size, max_length, 1)\n    # we get 1 at the last axis because we are applying score to self.V\n    attention_weights = tf.nn.softmax(score, axis=1)\n\n    # context_vector shape after sum == (batch_size, hidden_size)\n    context_vector = attention_weights * values\n    context_vector = tf.reduce_sum(context_vector, axis=1)\n\n    return context_vector, attention_weights\n```\n\n\n```python\nattention_layer = BahdanauAttention(10)\nattention_result, attention_weights = attention_layer(sample_hidden, sample_output)\n\nprint(\"Attention result shape: (batch size, units) {}\".format(attention_result.shape))\nprint(\"Attention weights shape: (batch_size, sequence_length, 1) {}\".format(attention_weights.shape))\n```\n\n```output\n    Attention result shape: (batch size, units) (64, 1024)\n    Attention weights shape: (batch_size, sequence_length, 1) (64, 16, 1)\n```\n\n\n```python\nclass Decoder(tf.keras.Model):\n  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):\n    super(Decoder, self).__init__()\n    self.batch_sz = batch_sz\n    self.dec_units = dec_units\n    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)\n    self.gru = tf.keras.layers.GRU(self.dec_units,\n                                   return_sequences=True,\n                                   return_state=True,\n                                   recurrent_initializer='glorot_uniform')\n    self.fc = tf.keras.layers.Dense(vocab_size)\n\n    # used for attention\n    self.attention = BahdanauAttention(self.dec_units)\n\n  def call(self, x, hidden, enc_output):\n    # enc_output shape == (batch_size, max_length, hidden_size)\n    context_vector, attention_weights = self.attention(hidden, enc_output)\n\n    # x shape 
after passing through embedding == (batch_size, 1, embedding_dim)\n    x = self.embedding(x)\n\n    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)\n    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)\n\n    # passing the concatenated vector to the GRU\n    output, state = self.gru(x)\n\n    # output shape == (batch_size * 1, hidden_size)\n    output = tf.reshape(output, (-1, output.shape[2]))\n\n    # output shape == (batch_size, vocab)\n    x = self.fc(output)\n\n    return x, state, attention_weights\n```\n\n\n```python\ndecoder = Decoder(vocab_tar_size, embedding_dim, units, BATCH_SIZE)\n\nsample_decoder_output, _, _ = decoder(tf.random.uniform((64, 1)),\n                                      sample_hidden, sample_output)\n\nprint ('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))\n```\n\n```output\n    Decoder output shape: (batch_size, vocab size) (64, 4935)\n```\n\n## 3. 定义优化器和损失函数\n\n\n```python\noptimizer = tf.keras.optimizers.Adam()\nloss_object = tf.keras.losses.SparseCategoricalCrossentropy(\n    from_logits=True, reduction='none')\n\ndef loss_function(real, pred):\n  mask = tf.math.logical_not(tf.math.equal(real, 0))\n  loss_ = loss_object(real, pred)\n\n  mask = tf.cast(mask, dtype=loss_.dtype)\n  loss_ *= mask\n\n  return tf.reduce_mean(loss_)\n```\n\n## 4. Checkpoints检查点（基于对象的保存）\n\n\n```python\ncheckpoint_dir = './training_checkpoints'\ncheckpoint_prefix = os.path.join(checkpoint_dir, \"ckpt\")\ncheckpoint = tf.train.Checkpoint(optimizer=optimizer,\n                                 encoder=encoder,\n                                 decoder=decoder)\n```\n\n## 5. 训练\n\n1. 通过编码器传递输入，编码器返回编码器输出和编码器隐藏状态。\n2. 编码器输出，编码器隐藏状态和解码器输入（它是开始标记）被传递给解码器。\n3. 解码器返回预测和解码器隐藏状态。\n4. 然后将解码器隐藏状态传递回模型，并使用预测来计算损失。\n5. 使用 *teacher forcing* 决定解码器的下一个输入。\n6. *Teacher forcing* 是将目标字作为下一个输入传递给解码器的技术。\n7. 
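最后一步是计算梯度并将其应用于优化器并反向传播。\n\n其中第 5、6 步的 *teacher forcing* 可以脱离模型单独演示。下面是一个纯示意性的玩具例子（批次数据为假设）：解码器在第 t 步的下一个输入直接取目标序列中第 t 个真实词，而不是模型自己的预测。\n\n```python\n# 假设性的玩具目标批次：2 个句子、每句 4 个词 id，第 0 个词是 <start>（id 为 1）\ntarg = [[1, 5, 6, 2],\n        [1, 7, 8, 2]]\n\ndec_input = [row[0] for row in targ]  # 初始输入为 <start> 标记\nfed = []\nfor t in range(1, len(targ[0])):\n    # （此处解码器会对第 t 个词做出预测并累计损失）\n    # teacher forcing：把第 t 个真实词作为解码器的下一个输入\n    dec_input = [row[t] for row in targ]\n    fed.append(dec_input)\n\nprint(fed)  # [[5, 7], [6, 8], [2, 2]]\n```\n\n7. 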
最后一步是计算梯度并将其应用于优化器并反向传播。\n\n\n```python\n@tf.function\ndef train_step(inp, targ, enc_hidden):\n  loss = 0\n\n  with tf.GradientTape() as tape:\n    enc_output, enc_hidden = encoder(inp, enc_hidden)\n\n    dec_hidden = enc_hidden\n\n    dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * BATCH_SIZE, 1)\n\n    # Teacher forcing - feeding the target as the next input\n    for t in range(1, targ.shape[1]):\n      # passing enc_output to the decoder\n      predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)\n\n      loss += loss_function(targ[:, t], predictions)\n\n      # using teacher forcing\n      dec_input = tf.expand_dims(targ[:, t], 1)\n\n  batch_loss = (loss / int(targ.shape[1]))\n\n  variables = encoder.trainable_variables + decoder.trainable_variables\n\n  gradients = tape.gradient(loss, variables)\n\n  optimizer.apply_gradients(zip(gradients, variables))\n\n  return batch_loss\n```\n\n\n```python\nEPOCHS = 10\n\nfor epoch in range(EPOCHS):\n  start = time.time()\n\n  enc_hidden = encoder.initialize_hidden_state()\n  total_loss = 0\n\n  for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):\n    batch_loss = train_step(inp, targ, enc_hidden)\n    total_loss += batch_loss\n\n    if batch % 100 == 0:\n        print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,\n                                                     batch,\n                                                     batch_loss.numpy()))\n  # saving (checkpoint) the model every 2 epochs\n  if (epoch + 1) % 2 == 0:\n    checkpoint.save(file_prefix = checkpoint_prefix)\n\n  print('Epoch {} Loss {:.4f}'.format(epoch + 1,\n                                      total_loss / steps_per_epoch))\n  print('Time taken for 1 epoch {} sec\\n'.format(time.time() - start))\n```\n\n```output\n    ......    
\n    Epoch 10 Batch 0 Loss 0.1219\n    Epoch 10 Batch 100 Loss 0.1374\n    Epoch 10 Batch 200 Loss 0.1084\n    Epoch 10 Batch 300 Loss 0.0994\n    Epoch 10 Loss 0.1088\n    Time taken for 1 epoch 29.2324090004 sec\n```\n\n\n## 6. 翻译\n\n* 评估函数与训练循环类似，不同之处在于这里不使用 *teacher forcing*：解码器在每个时间步的输入是其上一步的预测，以及隐藏状态和编码器的输出。\n* 当模型预测到结束标记时，停止预测。\n* 同时存储每个时间步的注意力权重。\n\n注意：对于同一个输入，编码器输出只计算一次。\n\n```python\ndef evaluate(sentence):\n    attention_plot = np.zeros((max_length_targ, max_length_inp))\n\n    sentence = preprocess_sentence(sentence)\n\n    inputs = [inp_lang.word_index[i] for i in sentence.split(' ')]\n    inputs = tf.keras.preprocessing.sequence.pad_sequences([inputs],\n                                                           maxlen=max_length_inp,\n                                                           padding='post')\n    inputs = tf.convert_to_tensor(inputs)\n\n    result = ''\n\n    hidden = [tf.zeros((1, units))]\n    enc_out, enc_hidden = encoder(inputs, hidden)\n\n    dec_hidden = enc_hidden\n    dec_input = tf.expand_dims([targ_lang.word_index['<start>']], 0)\n\n    for t in range(max_length_targ):\n        predictions, dec_hidden, attention_weights = decoder(dec_input,\n                                                             dec_hidden,\n                                                             enc_out)\n\n        # storing the attention weights to plot later on\n        attention_weights = tf.reshape(attention_weights, (-1, ))\n        attention_plot[t] = attention_weights.numpy()\n\n        predicted_id = tf.argmax(predictions[0]).numpy()\n\n        result += targ_lang.index_word[predicted_id] + ' '\n\n        if targ_lang.index_word[predicted_id] == '<end>':\n            return result, sentence, attention_plot\n\n        # the predicted ID is fed back into the model\n        dec_input = tf.expand_dims([predicted_id], 0)\n\n    return result, sentence, attention_plot\n```\n\n\n```python\n# function for plotting the attention weights\ndef 
plot_attention(attention, sentence, predicted_sentence):\n    fig = plt.figure(figsize=(10,10))\n    ax = fig.add_subplot(1, 1, 1)\n    ax.matshow(attention, cmap='viridis')\n\n    fontdict = {'fontsize': 14}\n\n    ax.set_xticklabels([''] + sentence, fontdict=fontdict, rotation=90)\n    ax.set_yticklabels([''] + predicted_sentence, fontdict=fontdict)\n\n    plt.show()\n```\n\n\n```python\ndef translate(sentence):\n    result, sentence, attention_plot = evaluate(sentence)\n\n    print('Input: %s' % (sentence))\n    print('Predicted translation: {}'.format(result))\n\n    attention_plot = attention_plot[:len(result.split(' ')), :len(sentence.split(' '))]\n    plot_attention(attention_plot, sentence.split(' '), result.split(' '))\n```\n\n## 7. 恢复最新的检查点并进行测试\n\n```python\n# restoring the latest checkpoint in checkpoint_dir\ncheckpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))\n```\n\n```python\ntranslate(u'hace mucho frio aqui.')\n```\n\n```output\n    Input: <start> hace mucho frio aqui . <end>\n    Predicted translation: it s very cold here . <end>\n```\n\n\n![png](nmt_with_attention_43_1.png)\n\n\n\n```python\ntranslate(u'esta es mi vida.')\n```\n\n```output\n    Input: <start> esta es mi vida . <end>\n    Predicted translation: this is my life . <end>\n```\n\n\n![png](nmt_with_attention_44_1.png)\n\n\n\n```python\ntranslate(u'¿todavia estan en casa?')\n```\n\n```output\n    Input: <start> ¿ todavia estan en casa ? <end>\n    Predicted translation: are you still at home ? <end>\n```\n\n![png](nmt_with_attention_45_1.png)\n\n\n```python\n# wrong translation\ntranslate(u'trata de averiguarlo.')\n```\n\n```output\n    Input: <start> trata de averiguarlo . <end>\n    Predicted translation: try to figure it out . <end>\n```\n\n![png](nmt_with_attention_46_1.png)\n\n\n## 8. 
下一步\n\n* 下载[不同的数据集](http://www.manythings.org/anki/)以试验翻译，例如，英语到德语，或英语到法语。\n* 尝试对更大的数据集进行训练，或使用更多的迭代周期\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-nmt_with_attention.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-nmt_with_attention.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/nmt_with_attention](https://tensorflow.google.cn/beta/tutorials/text/nmt_with_attention)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/nmt_with_attention.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/nmt_with_attention.md)\n"
  },
  {
    "path": "r2/tutorials/text/text_classification_rnn.md",
    "content": "---\ntitle: 使用RNN对文本进行分类实践：电影评论\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1928\nabbrlink: tensorflow/tf2-tutorials-text-text_classification_rnn\n---\n\n# 使用RNN对文本进行分类实践：电影评论 (tensorflow2.0官方教程翻译)\n\n本教程在[IMDB大型影评数据集](http://ai.stanford.edu/~amaas/data/sentiment/) 上训练一个循环神经网络进行情感分类。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\n# !pip install tensorflow-gpu==2.0.0-alpha0\nimport tensorflow_datasets as tfds\nimport tensorflow as tf\n```\n\n导入matplotlib并创建一个辅助函数来绘制图形\n\n```python\nimport matplotlib.pyplot as plt\n\n\ndef plot_graphs(history, string):\n  plt.plot(history.history[string])\n  plt.plot(history.history['val_'+string])\n  plt.xlabel(\"Epochs\")\n  plt.ylabel(string)\n  plt.legend([string, 'val_'+string])\n  plt.show()\n```\n\n## 1. 设置输入管道\n\nIMDB大型电影影评数据集是一个二元分类数据集，所有评论都有正面或负面的情绪标签。\n\n使用[TFDS](https://tensorflow.google.cn/datasets)下载数据集，数据集附带一个内置的子字标记器\n\n\n```python\ndataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,\n                          as_supervised=True)\ntrain_dataset, test_dataset = dataset['train'], dataset['test']\n```\n\n由于这是一个子字标记器，它可以传递任何字符串，并且标记器将对其进行标记。\n\n```python\ntokenizer = info.features['text'].encoder\n\nprint ('Vocabulary size: {}'.format(tokenizer.vocab_size))\n```\n```\n      Vocabulary size: 8185\n```\n\n\n```python\nsample_string = 'TensorFlow is cool.'\n\ntokenized_string = tokenizer.encode(sample_string)\nprint ('Tokenized string is {}'.format(tokenized_string))\n\noriginal_string = tokenizer.decode(tokenized_string)\nprint ('The original string: {}'.format(original_string))\n\nassert original_string == sample_string\n```\n\n```\n      Tokenized string is [6307, 2327, 4043, 4265, 9, 2724, 7975]\n      The original string: TensorFlow is cool.\n```\n\n如果字符串不在字典中，则标记生成器通过将字符串分解为子字符串来对字符串进行编码。\n\n```python\nfor ts in tokenized_string:\n  print ('{} ----> {}'.format(ts, tokenizer.decode([ts])))\n```\n\n```\n    6307 ----> 
Ten\n    2327 ----> sor\n    4043 ----> Fl\n    4265 ----> ow\n    9 ----> is\n    2724 ----> cool\n    7975 ----> .\n```\n\n\n```python\nBUFFER_SIZE = 10000\nBATCH_SIZE = 64\n\ntrain_dataset = train_dataset.shuffle(BUFFER_SIZE)\ntrain_dataset = train_dataset.padded_batch(BATCH_SIZE, train_dataset.output_shapes)\n\ntest_dataset = test_dataset.padded_batch(BATCH_SIZE, test_dataset.output_shapes)\n```\n\n## 2. 创建模型\n\n构建一个`tf.keras.Sequential`模型，从嵌入层开始：嵌入层为每个单词存储一个向量，被调用时将单词索引序列转换为向量序列。这些向量是可训练的，在足够的数据上训练之后，含义相似的单词通常具有相似的向量。\n\n这种索引查找比通过`tf.keras.layers.Dense`层传递独热编码向量的等效操作高效得多。\n\n循环神经网络（RNN）通过逐个迭代元素来处理序列输入：RNN 将一个时间步的输出传递到下一个时间步，作为其输入的一部分。\n\n`tf.keras.layers.Bidirectional`包装器也可以与RNN层一起使用，它将输入沿正向和反向分别通过RNN层传播，然后连接两个方向的输出。这有助于RNN学习长距离依赖。\n\n```python\nmodel = tf.keras.Sequential([\n    tf.keras.layers.Embedding(tokenizer.vocab_size, 64),\n    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),\n    tf.keras.layers.Dense(64, activation='relu'),\n    tf.keras.layers.Dense(1, activation='sigmoid')\n])\n\n# 编译Keras模型以配置训练过程：\nmodel.compile(loss='binary_crossentropy',\n              optimizer='adam',\n              metrics=['accuracy'])\n```\n\n## 3. 
训练模型\n\n```python\nhistory = model.fit(train_dataset, epochs=10,\n                    validation_data=test_dataset)\n```\n\n```\n      ...\n      Epoch 10/10\n      391/391 [==============================] - 70s 180ms/step - loss: 0.3074 - accuracy: 0.8692 - val_loss: 0.5533 - val_accuracy: 0.7873\n```\n\n\n```python\ntest_loss, test_acc = model.evaluate(test_dataset)\n\nprint('Test Loss: {}'.format(test_loss))\nprint('Test Accuracy: {}'.format(test_acc))\n```\n\n```\n          391/Unknown - 19s 47ms/step - loss: 0.5533 - accuracy: 0.7873Test Loss: 0.553319326714\n      Test Accuracy: 0.787320017815\n```\n\n\n上面的模型没有对序列的填充部分应用屏蔽（masking）。如果在填充后的序列上训练、在未填充的序列上测试，就可能导致结果偏斜。理想情况下，模型应该学会忽略填充，但正如您在下面看到的，填充对输出的影响确实很小。\n\n如果预测值 >= 0.5，则为正面评价，否则为负面评价。\n\n```python\ndef pad_to_size(vec, size):\n  zeros = [0] * (size - len(vec))\n  vec.extend(zeros)\n  return vec\n\ndef sample_predict(sentence, pad):\n  # 对传入的句子进行编码\n  tokenized_sample_pred_text = tokenizer.encode(sentence)\n\n  if pad:\n    tokenized_sample_pred_text = pad_to_size(tokenized_sample_pred_text, 64)\n\n  predictions = model.predict(tf.expand_dims(tokenized_sample_pred_text, 0))\n\n  return (predictions)\n```\n\n\n```python\n# 对不带填充的示例文本进行预测 \n\nsample_pred_text = ('The movie was cool. The animation and the graphics '\n                    'were out of this world. I would recommend this movie.')\npredictions = sample_predict(sample_pred_text, pad=False)\nprint (predictions)\n```\n\n```\n        [[ 0.68914342]]\n```\n\n\n```python\n# 对带填充的示例文本进行预测 \n\nsample_pred_text = ('The movie was cool. The animation and the graphics '\n                    'were out of this world. 
I would recommend this movie.')\npredictions = sample_predict(sample_pred_text, pad=True)\nprint (predictions)\n```\n\n```\n       [[ 0.68634349]]\n```\n\n```python\nplot_graphs(history, 'accuracy')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_29_0.png)\n\n\n```python\nplot_graphs(history, 'loss')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_30_0.png)\n\n## 4. 堆叠两个或更多LSTM层\n\nKeras递归层有两种可以用的模式，由`return_sequences`构造函数参数控制：\n\n* 返回每个时间步的连续输出的完整序列（3D张量形状 `(batch_size, timesteps, output_features)`）。\n\n* 仅返回每个输入序列的最后一个输出（2D张量形状 `(batch_size, output_features)`）。\n\n```python\nmodel = tf.keras.Sequential([\n    tf.keras.layers.Embedding(tokenizer.vocab_size, 64),\n    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(\n        64, return_sequences=True)),\n    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),\n    tf.keras.layers.Dense(64, activation='relu'),\n    tf.keras.layers.Dense(1, activation='sigmoid')\n])\n\nmodel.compile(loss='binary_crossentropy',\n              optimizer='adam',\n              metrics=['accuracy'])\n\nhistory = model.fit(train_dataset, epochs=10,\n                    validation_data=test_dataset)\n```\n\n```\n      ...\n      Epoch 10/10\n      391/391 [==============================] - 154s 394ms/step - loss: 0.1120 - accuracy: 0.9643 - val_loss: 0.5646 - val_accuracy: 0.8070\n```\n\n```python\ntest_loss, test_acc = model.evaluate(test_dataset)\n\nprint('Test Loss: {}'.format(test_loss))\nprint('Test Accuracy: {}'.format(test_acc))\n```\n\n```\n            391/Unknown - 45s 115ms/step - loss: 0.5646 - accuracy: 0.8070Test Loss: 0.564571284348\n        Test Accuracy: 0.80703997612\n```\n\n\n```python\n# 在没有填充的情况下预测示例文本\n\nsample_pred_text = ('The movie was not good. The animation and the graphics '\n                    'were terrible. 
I would not recommend this movie.')\npredictions = sample_predict(sample_pred_text, pad=False)\nprint (predictions)\n```\n\n```\n       [[ 0.00393916]]\n```\n\n\n```python\n# 在有填充的情况下预测示例文本\n\nsample_pred_text = ('The movie was not good. The animation and the graphics '\n                    'were terrible. I would not recommend this movie.')\npredictions = sample_predict(sample_pred_text, pad=True)\nprint (predictions)\n```\n\n```\n      [[ 0.01098633]]\n```\n\n\n```python\nplot_graphs(history, 'accuracy')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_38_0.png)\n\n\n```python\nplot_graphs(history, 'loss')\n```\n\n![png](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn_files/output_39_0.png)\n\n查看其它现有的递归层，例如[GRU层](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/GRU)。\n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_classification_rnn.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_classification_rnn.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn](https://tensorflow.google.cn/beta/tutorials/text/text_classification_rnn)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_classification_rnn.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_classification_rnn.md)\n"
  },
  {
    "path": "r2/tutorials/text/text_generation.md",
    "content": "---\ntitle: 使用RNN生成文本实战：莎士比亚风格诗句\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1961\nabbrlink: tensorflow/tf2-tutorials-text-text_generation\n---\n\n# 使用RNN生成文本实战：莎士比亚风格诗句  (tensorflow2.0官方教程翻译）\n\n本教程演示了如何使用基于字符的 RNN 生成文本。我们将使用 Andrej Karpathy 在 [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) 一文中提供的莎士比亚作品数据集。我们根据此数据（“Shakespear”）中的给定字符序列训练一个模型，让它预测序列的下一个字符（“e”）。通过重复调用该模型，可以生成更长的文本序列。\n\n\n注意：启用 GPU 加速可提高执行速度。在 Colab 中依次选择“运行时”>“更改运行时类型”>“硬件加速器”>“GPU”。如果在本地运行，请确保 TensorFlow 的版本为 1.11.0 或更高版本。\n\n本教程中包含使用 [tf.keras](https://tensorflow.google.cn/guide/keras) 和 [Eager Execution](https://tensorflow.google.cn/guide/eager) 实现的可运行代码。以下是本教程中的模型训练了30个周期时的示例输出，并以字符串“Q”开头：\n\n<pre>\nQUEENE:\nI had thought thou hadst a Roman; for the oracle,\nThus by All bids the man against the word,\nWhich are so weak of care, by old care done;\nYour children were in your holy love,\nAnd the precipitation through the bleeding throne.\n\nBISHOP OF ELY:\nMarry, and will, my lord, to weep in such a one were prettiest;\nYet now I was adopted heir\nOf the world's lamentable day,\nTo watch the next way with his father with his face?\n\nESCALUS:\nThe cause why then we are all resolved more sons.\n\nVOLUMNIA:\nO, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,\nAnd love and pale as any will to that word.\n\nQUEEN ELIZABETH:\nBut how long have I heard the soul for this world,\nAnd show his hands of life be proved to stand.\n\nPETRUCHIO:\nI say he look'd on, if I must be content\nTo stay him from the fatal of our country's bliss.\nHis lordship pluck'd from this sentence then for prey,\nAnd then let us twain, being the moon,\nwere she such a case as fills m\n</pre>\n\n虽然有些句子合乎语法规则，但大多数句子都没有意义。该模型尚未学习单词的含义，但请考虑以下几点：\n\n* 该模型是基于字符的模型。在训练之初，该模型都不知道如何拼写英语单词，甚至不知道单词是一种文本单位。\n\n* 输出的文本结构仿照了剧本的结构：文本块通常以讲话者的名字开头，并且像数据集中一样，这些名字全部采用大写字母。\n\n* 
如下文所示，尽管该模型只使用小批次的文本（每批文本包含 100 个字符）训练而成，但它仍然能够生成具有连贯结构的更长文本序列。\n\n## 1. 设置Setup\n\n### 1.1. 导入 TensorFlow 和其他库\n\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\n# !pip install tensorflow-gpu==2.0.0-alpha0\nimport tensorflow as tf\n\nimport numpy as np\nimport os\nimport time\n```\n\n```\n    Collecting tensorflow-gpu==2.0.0-alpha0\n    Successfully installed google-pasta-0.1.4 tb-nightly-1.14.0a20190303 tensorflow-estimator-2.0-preview-1.14.0.dev2019030300 tensorflow-gpu==2.0.0-alpha0-2.0.0.dev20190303\n```\n\n### 1.2. 下载莎士比亚数据集\n\n\n通过更改以下行可使用您自己的数据运行此代码。\n\n```python\npath_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')\n```\n\n### 1.3. 读取数据\n\n首先，我们来看一下文本内容。\n\n```python\n# Read, then decode for py2 compat.\ntext = open(path_to_file, 'rb').read().decode(encoding='utf-8')\n# length of text is the number of characters in it\nprint ('Length of text: {} characters'.format(len(text)))\n```\n\n    Length of text: 1115394 characters\n\n\n\n```python\n# Take a look at the first 250 characters in text\nprint(text[:250])\n```\n\n    First Citizen:\n    Before we proceed any further, hear me speak.\n    \n    All:\n    Speak, speak.\n    \n    First Citizen:\n    You are all resolved rather to die than to famish?\n    \n    All:\n    Resolved. resolved.\n    \n    First Citizen:\n    First, you know Caius Marcius is chief enemy to the people.\n    \n\n\n\n```python\n# The unique characters in the file\nvocab = sorted(set(text))\nprint ('{} unique characters'.format(len(vocab)))\n```\n\n    65 unique characters\n\n\n## 2. 处理文本\n\n### 2.1. 
向量化文本\n\n在训练之前，我们需要将字符串映射到数字表示值。创建两个对照表：一个用于将字符映射到数字，另一个用于将数字映射到字符。\n\n```python\n# Creating a mapping from unique characters to indices\nchar2idx = {u:i for i, u in enumerate(vocab)}\nidx2char = np.array(vocab)\n\ntext_as_int = np.array([char2idx[c] for c in text])\n```\n\n现在，每个字符都有一个对应的整数表示值。请注意，我们按从 0 到 `len(unique)` 的索引映射字符。\n\n```python\nprint('{')\nfor char,_ in zip(char2idx, range(20)):\n    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))\nprint('  ...\\n}')\n```\n\n    {\n      '\\n':   0,\n      ' ' :   1,\n      '!' :   2,\n      ...\n      'F' :  18,\n      'G' :  19,\n      ...\n    }\n\n\n\n```python\n# Show how the first 13 characters from the text are mapped to integers\nprint ('{} ---- characters mapped to int ---- > {}'.format(repr(text[:13]), text_as_int[:13]))\n```\n\n    'First Citizen' ---- characters mapped to int ---- > [18 47 56 57 58  1 15 47 58 47 64 43 52]\n\n\n### 2.2. 预测任务\n\n根据给定的字符或字符序列预测下一个字符最有可能是什么？这是我们要训练模型去执行的任务。模型的输入将是字符序列，而我们要训练模型去预测输出，即每一个时间步的下一个字符。\n\n由于 RNN 会依赖之前看到的元素来维持内部状态，那么根据目前为止已计算过的所有字符，下一个字符是什么？\n\n### 2.3. 
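创建训练样本和目标\n\n在处理真实文本之前，先用一个极简的纯 Python 玩具示例演示“目标是输入右移一个字符”的拆分方式（假设 `seq_length` 为 4、文本为 “Hello”）：\n\n```python\n# 玩具示例：一个长度为 seq_length + 1 的文本块\nchunk = list('Hello')\n\ninput_text = chunk[:-1]   # 前 seq_length 个字符\ntarget_text = chunk[1:]   # 右移一位后的 seq_length 个字符\n\nprint(''.join(input_text))   # Hell\nprint(''.join(target_text))  # ello\n```\n\n### 2.3. 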
创建训练样本和目标\n\n将文本划分为训练样本和训练目标。每个训练样本都包含从文本中选取的 `seq_length` 个字符。\n\n相应的目标也包含相同长度的文本，但是将所选的字符序列向右顺移一个字符。\n\n将文本拆分成文本块，每个块的长度为 `seq_length+1` 个字符。例如，假设 `seq_length` 为 4，我们的文本为“Hello”，则可以将“Hell”创建为训练样本，将“ello”创建为目标。\n\n为此，首先使用`tf.data.Dataset.from_tensor_slices`函数将文本向量转换为字符索引流。\n\n```python\n# The maximum length sentence we want for a single input in characters\nseq_length = 100\nexamples_per_epoch = len(text)//seq_length\n\n# Create training examples / targets\nchar_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)\n\nfor i in char_dataset.take(5):\n  print(idx2char[i.numpy()])\n```\n\n    F\n    i\n    r\n    s\n    t\n\n\n批处理方法可以让我们轻松地将这些单个字符转换为所需大小的序列。\n\n```python\nsequences = char_dataset.batch(seq_length+1, drop_remainder=True)\n\nfor item in sequences.take(5):\n  print(repr(''.join(idx2char[item.numpy()])))\n```\n\n    'First Citizen:\\nBefore we proceed any further, hear me speak.\\n\\nAll:\\nSpeak, speak.\\n\\nFirst Citizen:\\nYou '\n    'are all resolved rather to die than to famish?\\n\\nAll:\\nResolved. 
resolved.\\n\\nFirst Citizen:\\nFirst, you k'\n    \"now Caius Marcius is chief enemy to the people.\\n\\nAll:\\nWe know't, we know't.\\n\\nFirst Citizen:\\nLet us ki\"\n    \"ll him, and we'll have corn at our own price.\\nIs't a verdict?\\n\\nAll:\\nNo more talking on't; let it be d\"\n    'one: away, away!\\n\\nSecond Citizen:\\nOne word, good citizens.\\n\\nFirst Citizen:\\nWe are accounted poor citi'\n\n\n对于每个序列，复制并移动它以创建输入文本和目标文本，方法是使用 `map` 方法将简单函数应用于每个批处理：\n\n```python\ndef split_input_target(chunk):\n    input_text = chunk[:-1]\n    target_text = chunk[1:]\n    return input_text, target_text\n\ndataset = sequences.map(split_input_target)\n```\n\n打印第一个样本输入和目标值：\n\n```python\nfor input_example, target_example in  dataset.take(1):\n  print ('Input data: ', repr(''.join(idx2char[input_example.numpy()])))\n  print ('Target data:', repr(''.join(idx2char[target_example.numpy()])))\n```\n\n    Input data:  'First Citizen:\\nBefore we proceed any further, hear me speak.\\n\\nAll:\\nSpeak, speak.\\n\\nFirst Citizen:\\nYou'\n    Target data: 'irst Citizen:\\nBefore we proceed any further, hear me speak.\\n\\nAll:\\nSpeak, speak.\\n\\nFirst Citizen:\\nYou '\n\n这些向量的每个索引均作为一个时间步来处理。对于时间步 0 的输入，我们收到了映射到字符 “F” 的索引，并尝试预测 “i” 的索引作为下一个字符。在下一个时间步，执行相同的操作，但除了当前字符外，`RNN` 还要考虑上一步的信息。\n\n```python\nfor i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):\n    print(\"Step {:4d}\".format(i))\n    print(\"  input: {} ({:s})\".format(input_idx, repr(idx2char[input_idx])))\n    print(\"  expected output: {} ({:s})\".format(target_idx, repr(idx2char[target_idx])))\n```\n\n    Step    0\n      input: 18 ('F')\n      expected output: 47 ('i')\n    ...\n    Step    4\n      input: 58 ('t')\n      expected output: 1 (' ')\n\n\n### 2.4. 
使用 tf.data 创建批次文本并重排这些批次\n\n我们使用 `tf.data` 将文本拆分为可管理的序列。但在将这些数据馈送到模型中之前，我们需要对数据进行重排，并将其打包成批。\n\n```python\n# Batch size\nBATCH_SIZE = 64\n\n# Buffer size to shuffle the dataset\n# (TF data is designed to work with possibly infinite sequences,\n# so it doesn't attempt to shuffle the entire sequence in memory. Instead,\n# it maintains a buffer in which it shuffles elements).\nBUFFER_SIZE = 10000\n\ndataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)\n\ndataset\n```\n\n```\n    <BatchDataset shapes: ((64, 100), (64, 100)), types: (tf.int64, tf.int64)>\n```\n\n## 3. 实现模型\n\n使用`tf.keras.Sequential`来定义模型。对于这个简单的例子，我们可以使用三个层来定义模型：\n\n* `tf.keras.layers.Embedding`：嵌入层（输入层）。一个可训练的对照表，它会将每个字符的数字映射到具有 `embedding_dim` 个维度的高维度向量；\n\n* `tf.keras.layers.LSTM`：LSTM 层，一种层大小等于单位数（`units = rnn_units`）的 RNN。（在此示例中，您也可以使用 GRU 层。）\n\n* `tf.keras.layers.Dense`：密集层（输出层），带有`vocab_size`个单元输出。\n\n```python\n# Length of the vocabulary in chars\nvocab_size = len(vocab)\n\n# The embedding dimension\nembedding_dim = 256\n\n# Number of RNN units\nrnn_units = 1024\n```\n\n\n```python\ndef build_model(vocab_size, embedding_dim, rnn_units, batch_size):\n  model = tf.keras.Sequential([\n    tf.keras.layers.Embedding(vocab_size, embedding_dim,\n                              batch_input_shape=[batch_size, None]),\n    tf.keras.layers.LSTM(rnn_units,\n                        return_sequences=True,\n                        stateful=True,\n                        recurrent_initializer='glorot_uniform'),\n    tf.keras.layers.Dense(vocab_size)\n  ])\n  return model\n```\n\n\n```python\nmodel = build_model(\n  vocab_size = len(vocab),\n  embedding_dim=embedding_dim,\n  rnn_units=rnn_units,\n  batch_size=BATCH_SIZE)\n```\n\n对于每个字符，模型先查找其嵌入向量，以该嵌入作为输入运行一个时间步的 LSTM，再应用密集层生成预测下一个字符对数似然的 logits：\n\n![A drawing of the data passing through the model](https://raw.githubusercontent.com/mari-linhares/docs/patch-1/site/en/tutorials/sequences/images/text_generation_training.png)\n\n## 4. 
试试这个模型\n\n现在运行模型以检查它的行为是否符合预期。首先检查输出的形状：\n\n\n```python\nfor input_example_batch, target_example_batch in dataset.take(1):\n  example_batch_predictions = model(input_example_batch)\n  print(example_batch_predictions.shape, \"# (batch_size, sequence_length, vocab_size)\")\n```\n\n    (64, 100, 65) # (batch_size, sequence_length, vocab_size)\n\n\n在上面的示例中，输入的序列长度为 `100` ，但模型可以在任何长度的输入上运行：\n\n```python\nmodel.summary()\n```\n\n    Model: \"sequential\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #\n    =================================================================\n    embedding (Embedding)        (64, None, 256)           16640\n    _________________________________________________________________\n    unified_lstm (UnifiedLSTM)   (64, None, 1024)          5246976\n    _________________________________________________________________\n    dense (Dense)                (64, None, 65)            66625\n    =================================================================\n    Total params: 5,330,241\n    Trainable params: 5,330,241\n    Non-trainable params: 0\n    _________________________________________________________________\n\n\n为了从模型中获得实际预测，我们需要从输出分布中进行采样，以获得实际的字符索引。此分布由字符词汇表上的logits定义。\n\n注意：从这个分布中进行_sample_（采样）非常重要，因为取分布的 _argmax_ 很容易使模型陷入循环。\n\n尝试批处理中的第一个样本：\n\n```python\nsampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)\nsampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()\n```\n\n这样，我们就在每个时间步都得到了下一个字符的预测索引：\n\n```python\nsampled_indices\n```\n\n\n    array([21,  2, 58, 40, 42, 32, 39,  7, 18, 38, 30, 58, 23, 58, 37, 10, 23,\n           16, 52, 14, 43,  8, 32, 49, 62, 41, 53, 38, 17, 36, 24, 59, 41, 38,\n            4, 27, 33, 59, 54, 34, 14,  1,  1, 56, 55, 40, 37,  4, 32, 44, 62,\n           59,  1, 10, 20, 29,  2, 48, 37, 26, 10, 22, 58,  5, 26,  9, 23, 26,\n           54, 43, 46, 36, 62, 57,  8, 53, 52, 23, 57, 42, 60, 10, 
43, 11, 45,\n           12, 28, 46, 46, 15, 51,  9, 56,  7, 53, 51,  2,  1, 10, 58])\n\n\n解码这些索引，以查看此未经训练的模型预测的文本：\n\n\n```python\nprint(\"Input: \\n\", repr(\"\".join(idx2char[input_example_batch[0]])))\nprint()\nprint(\"Next Char Predictions: \\n\", repr(\"\".join(idx2char[sampled_indices])))\n```\n\n    Input:\n     'to it far before thy time?\\nWarwick is chancellor and the lord of Calais;\\nStern Falconbridge commands'\n    \n    Next Char Predictions:\n     \"I!tbdTa-FZRtKtY:KDnBe.TkxcoZEXLucZ&OUupVB  rqbY&Tfxu :HQ!jYN:Jt'N3KNpehXxs.onKsdv:e;g?PhhCm3r-om! :t\"\n\n\n## 5. 训练模型\n\n此时，问题可以被视为标准分类问题。给定先前的RNN状态，以及此时间步的输入，预测下一个字符的类。\n\n### 5.1. 添加优化器和损失函数\n\n标准的`tf.keras.losses.sparse_categorical_crossentropy`损失函数在这种情况下有效，因为它应用于预测的最后一个维度。\n\n因为我们的模型返回logits，所以我们需要设置`from_logits`标志。\n\n```python\ndef loss(labels, logits):\n  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)\n\nexample_batch_loss = loss(target_example_batch, example_batch_predictions)\nprint(\"Prediction shape: \", example_batch_predictions.shape, \" # (batch_size, sequence_length, vocab_size)\")\nprint(\"scalar_loss:      \", example_batch_loss.numpy().mean())\n```\n\n    Prediction shape:  (64, 100, 65)  # (batch_size, sequence_length, vocab_size)\n    scalar_loss:       4.174188\n\n使用 `tf.keras.Model.compile` 方法配置训练过程。我们将使用带默认参数的 `tf.keras.optimizers.Adam` 优化器和上面定义的损失函数。\n\n\n```python\nmodel.compile(optimizer='adam', loss=loss)\n```\n\n### 5.2. 配置检查点 \n\n使用`tf.keras.callbacks.ModelCheckpoint`确保在训练期间保存检查点：\n\n```python\n# Directory where the checkpoints will be saved\ncheckpoint_dir = './training_checkpoints'\n# Name of the checkpoint files\ncheckpoint_prefix = os.path.join(checkpoint_dir, \"ckpt_{epoch}\")\n\ncheckpoint_callback=tf.keras.callbacks.ModelCheckpoint(\n    filepath=checkpoint_prefix,\n    save_weights_only=True)\n```\n\n### 5.3. 
开始训练\n\n为了使训练时间合理，使用 10 个周期（epoch）来训练模型。在 Colab 中，将运行时设置为 GPU 以便更快地进行训练。\n\n```python\nEPOCHS=10\n\nhistory = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])\n```\n\n    Epoch 1/10\n    172/172 [==============================] - 31s 183ms/step - loss: 2.7052\n    ......\n    Epoch 10/10\n    172/172 [==============================] - 31s 180ms/step - loss: 1.2276\n\n\n## 6. 生成文本\n\n### 6.1. 加载最新的检查点\n\n为了简化此预测步骤，请使用大小为 1 的批处理。\n\n由于 RNN 状态是从一个时间步传递到下一个时间步的，模型一旦构建就只接受固定大小的批次数据。\n\n要使用不同的 `batch_size` 运行模型，我们需要重建模型并从检查点恢复权重。\n\n```python\ntf.train.latest_checkpoint(checkpoint_dir)\n```\n\n```\n        './training_checkpoints/ckpt_10'\n```\n\n```python\nmodel = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)\n\nmodel.load_weights(tf.train.latest_checkpoint(checkpoint_dir))\n\nmodel.build(tf.TensorShape([1, None]))\n\nmodel.summary()\n```\n\n    Model: \"sequential_1\"\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #\n    =================================================================\n    embedding_1 (Embedding)      (1, None, 256)            16640\n    _________________________________________________________________\n    unified_lstm_1 (UnifiedLSTM) (1, None, 1024)           5246976\n    _________________________________________________________________\n    dense_1 (Dense)              (1, None, 65)             66625\n    =================================================================\n    Total params: 5,330,241\n    Trainable params: 5,330,241\n    Non-trainable params: 0\n    _________________________________________________________________\n\n\n### 6.2. 
预测循环\n\n下面的代码块可生成文本：\n\n* 首先选择一个起始字符串，初始化 RNN 状态，并设置要生成的字符数。\n\n* 使用起始字符串和 RNN 状态获取预测值。\n\n* 然后，使用分类分布（categorical distribution）计算预测字符的索引。将此预测字符用作模型的下一个输入。\n\n* 模型返回的 RNN 状态被馈送回模型中，使模型现在拥有更多上下文，而不是仅有一个字符。在模型预测下一个字符之后，经过修改的 RNN 状态再次被馈送回模型中，模型从先前预测的字符获取更多上下文，从而通过这种方式进行学习。\n\n\n![To generate text the model's output is fed back to the input](https://github.com/mari-linhares/docs/blob/patch-1/site/en/tutorials/sequences/images/text_generation_sampling.png?raw=true)\n\n查看生成的文本后，您会发现模型知道何时应使用大写字母，以及如何构成段落和模仿莎士比亚风格的词汇。由于执行的训练周期较少，因此该模型尚未学会生成连贯的句子。\n\n```python\ndef generate_text(model, start_string):\n  # Evaluation step (generating text using the learned model)\n\n  # Number of characters to generate\n  num_generate = 1000\n\n  # Converting our start string to numbers (vectorizing)\n  input_eval = [char2idx[s] for s in start_string]\n  input_eval = tf.expand_dims(input_eval, 0)\n\n  # Empty string to store our results\n  text_generated = []\n\n  # Low temperatures result in more predictable text.\n  # Higher temperatures result in more surprising text.\n  # Experiment to find the best setting.\n  temperature = 1.0\n\n  # Here batch size == 1\n  model.reset_states()\n  for i in range(num_generate):\n      predictions = model(input_eval)\n      # remove the batch dimension\n      predictions = tf.squeeze(predictions, 0)\n\n      # using a categorical distribution to predict the character returned by the model\n      predictions = predictions / temperature\n      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()\n\n      # We pass the predicted character as the next input to the model\n      # along with the previous hidden state\n      input_eval = tf.expand_dims([predicted_id], 0)\n\n      text_generated.append(idx2char[predicted_id])\n\n  return (start_string + ''.join(text_generated))\n```\n\n\n```python\nprint(generate_text(model, start_string=u\"ROMEO: \"))\n```\n\n    ROMEO: now to have weth hearten sonce,\n    No more than the thing stand perfect your self,\n    Love way 
come. Up, this is d so do in friends:\n    If I fear e this, I poisple\n    My gracious lusty, born once for readyus disguised:\n    But that a pry; do it sure, thou wert love his cause;\n    My mind is come too!\n    \n    POMPEY:\n    Serve my master's him: he hath extreme over his hand in the\n    where they shall not hear they right for me.\n    \n    PROSSPOLUCETER:\n    I pray you, mistress, I shall be construted\n    With one that you shall that we know it, in this gentleasing earls of daiberkers now\n    he is to look upon this face, which leadens from his master as\n    you should not put what you perciploce backzat of cast,\n    Nor fear it sometime but for a pit\n    a world of Hantua?\n    \n    First Gentleman:\n    That we can fall of bastards my sperial;\n    O, she Go seeming that which I have\n    what enby oar own best injuring them,\n    Or thom I do now, I, in heart is nothing gone,\n    Leatt the bark which was done born.\n    \n    BRUTUS:\n    Both Margaret, he is sword of the house person. If born,\n\n\n如果要改进结果，最简单的方法是增加模型训练的时长（请尝试 EPOCHS=30）。\n\n您还可以尝试使用不同的起始字符，或尝试添加另一个 RNN 层以提高模型的准确率，又或者调整温度参数以生成具有一定随机性的预测值。\n\n## 7. 
高级：自定义训练\n\n上述训练过程很简单，但可控性不强。\n\n既然您已经了解了如何手动运行模型，接下来让我们拆解训练循环，并自行实现它。例如，如果要实施课程学习（curriculum learning）以帮助稳定模型的开环输出，这就是一个起点。\n\n我们将使用 `tf.GradientTape` 来跟踪梯度。您可以通过阅读[eager execution guide](https://www.tensorflow.org/guide/eager)来了解有关此方法的更多信息。\n\n该程序的工作原理如下：\n\n* 首先，初始化 RNN 状态。我们通过调用 `tf.keras.Model.reset_states` 方法来完成此操作。\n* 接下来，迭代数据集（逐批）并计算与每个批次关联的预测。\n* 打开 `tf.GradientTape` ，计算该上下文中的预测和损失。\n* 使用 `tf.GradientTape.gradient` 方法计算相对于模型变量的损失梯度。\n* 最后，使用优化器的 `tf.keras.optimizers.Optimizer.apply_gradients` 方法沿梯度下降方向更新一步。\n\n```python\nmodel = build_model(\n  vocab_size = len(vocab),\n  embedding_dim=embedding_dim,\n  rnn_units=rnn_units,\n  batch_size=BATCH_SIZE)\n\n\noptimizer = tf.keras.optimizers.Adam()\n```\n\n\n```python\n@tf.function\ndef train_step(inp, target):\n  with tf.GradientTape() as tape:\n    predictions = model(inp)\n    loss = tf.reduce_mean(\n        tf.keras.losses.sparse_categorical_crossentropy(\n            target, predictions, from_logits=True))\n  grads = tape.gradient(loss, model.trainable_variables)\n  optimizer.apply_gradients(zip(grads, model.trainable_variables))\n\n  return loss\n```\n\n\n```python\n# Training step\nEPOCHS = 10\n\nfor epoch in range(EPOCHS):\n  start = time.time()\n\n  # initializing the hidden state at the start of every epoch\n  # initially hidden is None\n  hidden = model.reset_states()\n\n  for (batch_n, (inp, target)) in enumerate(dataset):\n    loss = train_step(inp, target)\n\n    if batch_n % 100 == 0:\n      template = 'Epoch {} Batch {} Loss {}'\n      print(template.format(epoch+1, batch_n, loss))\n\n  # saving (checkpoint) the model every 5 epochs\n  if (epoch + 1) % 5 == 0:\n    model.save_weights(checkpoint_prefix.format(epoch=epoch))\n\n  print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))\n  print ('Time taken for 1 epoch {} sec\\n'.format(time.time() - start))\n\nmodel.save_weights(checkpoint_prefix.format(epoch=epoch))\n```\n\n```\n    .....\n    Epoch 10 Batch 0 Loss 1.2350478172302246\n    Epoch 10 Batch 100 Loss 1.1610674858093262\n    
Epoch 10 Loss 1.1558\n    Time taken for 1 epoch 14.261839628219604 sec\n```    \n\n> 最新版本：[https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_generation.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-text_generation.html)\n> 英文版本：[https://tensorflow.google.cn/beta/tutorials/text/text_generation](https://tensorflow.google.cn/beta/tutorials/text/text_generation)\n> 翻译建议PR：[https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_generation.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/text_generation.md)\n"
  },
  {
    "path": "r2/tutorials/text/transformer.md",
    "content": "---\ntitle: 用于语言理解的变换器模型\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1964\nabbrlink: tensorflow/tf2-tutorials-text-transformer\n---\n\n# 用于语言理解的变换器模型 (tensorflow2.0官方教程翻译)\n\nThis tutorial trains a <a href=\"https://arxiv.org/abs/1706.03762\" class=\"external\">Transformer model</a> to translate Portuguese to English. This is an advanced example that assumes knowledge of [text generation](text_generation.ipynb) and [attention](nmt_with_attention.ipynb).\n\nThe core idea behind the Transformer model is *self-attention*—the ability to attend to different positions of the input sequence to compute a representation of that sequence. Transformer creates stacks of self-attention layers and is explained below in the sections *Scaled dot product attention* and *Multi-head attention*.\n\nA transformer model handles variable-sized input using stacks of self-attention layers instead of [RNNs](text_classification_rnn.ipynb) or [CNNs](../images/intro_to_cnns.ipynb). This general architecture has a number of advantages:\n\n* It makes no assumptions about the temporal/spatial relationships across the data. This is ideal for processing a set of objects (for example, [StarCraft units](https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/#block-8)).\n* Layer outputs can be calculated in parallel, instead of in series like an RNN.\n* Distant items can affect each other's output without passing through many RNN-steps, or convolution layers (see [Scene Memory Transformer](https://arxiv.org/pdf/1903.03878.pdf) for example).\n* It can learn long-range dependencies. This is a challenge in many sequence tasks.\n\nThe downsides of this architecture are:\n\n* For a time-series, the output for a time-step is calculated from the *entire history* instead of only the inputs and current hidden-state. This _may_ be less efficient.   
\n* If the input *does* have a  temporal/spatial relationship, like text, some positional encoding must be added or the model will effectively see a bag of words. \n\nAfter training the model in this notebook, you will be able to input a Portuguese sentence and return the English translation.\n\n<img src=\"https://www.tensorflow.org/images/tutorials/transformer/attention_map_portuguese.png\" width=\"800\" alt=\"Attention heatmap\">\n\n\n```\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\n!pip install tf-nightly-gpu-2.0-preview\nimport tensorflow_datasets as tfds\nimport tensorflow as tf\n\nimport time\nimport numpy as np\nimport matplotlib.pyplot as plt\n```\n\n## Setup input pipeline\n\nUse [TFDS](https://www.tensorflow.org/datasets) to load the [Portugese-English translation dataset](https://github.com/neulab/word-embeddings-for-nmt) from the [TED Talks Open Translation Project](https://www.ted.com/participate/translate).\n\nThis dataset contains approximately 50000 training examples, 1100 validation examples, and 2000 test examples.\n\n\n```\nexamples, metadata = tfds.load('ted_hrlr_translate/pt_to_en', with_info=True,\n                               as_supervised=True)\ntrain_examples, val_examples = examples['train'], examples['validation']\n```\n\nCreate a custom subwords tokenizer from the training dataset. 
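Conceptually, a subword tokenizer keeps frequent words whole and splits rare or unseen words into smaller known pieces, so any string can be encoded without out-of-vocabulary tokens. The core idea can be sketched in plain Python as a greedy longest-match segmenter; this is a simplification with a hypothetical vocabulary, not the actual `SubwordTextEncoder` implementation (which learns its subword vocabulary from the corpus):

```python
def greedy_subword_split(word, vocab):
    """Split `word` into pieces by greedily matching the longest
    known subword at each position, falling back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            # Accept a known subword, or a single character as a fallback.
            if piece in vocab or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces

vocab = {"trans", "former", "is", "awe", "some"}  # hypothetical vocabulary
print(greedy_subword_split("transformer", vocab))  # ['trans', 'former']
print(greedy_subword_split("awesome", vocab))      # ['awe', 'some']
```

The real encoder below builds its vocabulary from the training corpus, so frequent whole words get their own ids while rare words decompose into pieces, as the `Transformer is awesome.` example further down shows.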
\n\n\n```\ntokenizer_en = tfds.features.text.SubwordTextEncoder.build_from_corpus(\n    (en.numpy() for pt, en in train_examples), target_vocab_size=2**13)\n\ntokenizer_pt = tfds.features.text.SubwordTextEncoder.build_from_corpus(\n    (pt.numpy() for pt, en in train_examples), target_vocab_size=2**13)\n```\n\n\n```\nsample_string = 'Transformer is awesome.'\n\ntokenized_string = tokenizer_en.encode(sample_string)\nprint ('Tokenized string is {}'.format(tokenized_string))\n\noriginal_string = tokenizer_en.decode(tokenized_string)\nprint ('The original string: {}'.format(original_string))\n\nassert original_string == sample_string\n```\n\n    Tokenized string is [7915, 1248, 7946, 7194, 13, 2799, 7877]\n    The original string: Transformer is awesome.\n\n\nThe tokenizer encodes the string by breaking it into subwords if the word is not in its dictionary.\n\n\n```\nfor ts in tokenized_string:\n  print ('{} ----> {}'.format(ts, tokenizer_en.decode([ts])))\n```\n\n    7915 ----> T\n    1248 ----> ran\n    7946 ----> s\n    7194 ----> former \n    13 ----> is \n    2799 ----> awesome\n    7877 ----> .\n\n\n\n```\nBUFFER_SIZE = 20000\nBATCH_SIZE = 64\n```\n\nAdd a start and end token to the input and target. \n\n\n```\ndef encode(lang1, lang2):\n  lang1 = [tokenizer_pt.vocab_size] + tokenizer_pt.encode(\n      lang1.numpy()) + [tokenizer_pt.vocab_size+1]\n\n  lang2 = [tokenizer_en.vocab_size] + tokenizer_en.encode(\n      lang2.numpy()) + [tokenizer_en.vocab_size+1]\n  \n  return lang1, lang2\n```\n\nNote: To keep this example small and relatively fast, drop examples with a length of over 40 tokens.\n\n\n```\nMAX_LENGTH = 40\n```\n\n\n```\ndef filter_max_length(x, y, max_length=MAX_LENGTH):\n  return tf.logical_and(tf.size(x) <= max_length,\n                        tf.size(y) <= max_length)\n```\n\nOperations inside `.map()` run in graph mode and receive a graph tensor that does not have a numpy attribute. 
The `tokenizer` expects a string or Unicode symbol to encode it into integers. Hence, you need to run the encoding inside a `tf.py_function`, which receives an eager tensor having a numpy attribute that contains the string value.\n\n\n```\ndef tf_encode(pt, en):\n  return tf.py_function(encode, [pt, en], [tf.int64, tf.int64])\n```\n\n\n```\ntrain_dataset = train_examples.map(tf_encode)\ntrain_dataset = train_dataset.filter(filter_max_length)\n# cache the dataset to memory to get a speedup while reading from it.\ntrain_dataset = train_dataset.cache()\ntrain_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(\n    BATCH_SIZE, padded_shapes=([-1], [-1]))\ntrain_dataset = train_dataset.prefetch(tf.data.experimental.AUTOTUNE)\n\n\nval_dataset = val_examples.map(tf_encode)\nval_dataset = val_dataset.filter(filter_max_length).padded_batch(\n    BATCH_SIZE, padded_shapes=([-1], [-1]))\n```\n\n\n```\nde_batch, en_batch = next(iter(val_dataset))\nde_batch, en_batch\n```\n\n\n\n\n    (<tf.Tensor: id=311487, shape=(64, 40), dtype=int64, numpy=\n     array([[8214, 1259,    5, ...,    0,    0,    0],\n            [8214,  299,   13, ...,    0,    0,    0],\n            [8214,   59,    8, ...,    0,    0,    0],\n            ..., \n            [8214,   95,    3, ...,    0,    0,    0],\n            [8214, 5157,    1, ...,    0,    0,    0],\n            [8214, 4479, 7990, ...,    0,    0,    0]])>,\n     <tf.Tensor: id=311488, shape=(64, 40), dtype=int64, numpy=\n     array([[8087,   18,   12, ...,    0,    0,    0],\n            [8087,  634,   30, ...,    0,    0,    0],\n            [8087,   16,   13, ...,    0,    0,    0],\n            ..., \n            [8087,   12,   20, ...,    0,    0,    0],\n            [8087,   17, 4981, ...,    0,    0,    0],\n            [8087,   12, 5453, ...,    0,    0,    0]])>)\n\n\n\n## Positional encoding\n\nSince this model doesn't contain any recurrence or convolution, positional encoding is added to give the model some information 
about the relative position of the words in the sentence. \n\nThe positional encoding vector is added to the embedding vector. Embeddings represent a token in a d-dimensional space where tokens with similar meaning will be closer to each other. But the embeddings do not encode the relative position of words in a sentence. So after adding the positional encoding, words will be closer to each other based on the *similarity of their meaning and their position in the sentence*, in the d-dimensional space.\n\nSee the notebook on [positional encoding](https://github.com/tensorflow/examples/blob/master/community/en/position_encoding.ipynb) to learn more about it. The formula for calculating the positional encoding is as follows:\n\n$$\\Large{PE_{(pos, 2i)} = sin(pos / 10000^{2i / d_{model}})} $$\n$$\\Large{PE_{(pos, 2i+1)} = cos(pos / 10000^{2i / d_{model}})} $$\n\n\n```\ndef get_angles(pos, i, d_model):\n  angle_rates = 1 / np.power(10000, (2 * (i//2)) / np.float32(d_model))\n  return pos * angle_rates\n```\n\n\n```\ndef positional_encoding(position, d_model):\n  angle_rads = get_angles(np.arange(position)[:, np.newaxis],\n                          np.arange(d_model)[np.newaxis, :],\n                          d_model)\n  \n  # apply sin to even indices in the array; 2i\n  sines = np.sin(angle_rads[:, 0::2])\n  \n  # apply cos to odd indices in the array; 2i+1\n  cosines = np.cos(angle_rads[:, 1::2])\n  \n  pos_encoding = np.concatenate([sines, cosines], axis=-1)\n  \n  pos_encoding = pos_encoding[np.newaxis, ...]\n    \n  return tf.cast(pos_encoding, dtype=tf.float32)\n```\n\n\n```\npos_encoding = positional_encoding(50, 512)\nprint (pos_encoding.shape)\n\nplt.pcolormesh(pos_encoding[0], cmap='RdBu')\nplt.xlabel('Depth')\nplt.xlim((0, 512))\nplt.ylabel('Position')\nplt.colorbar()\nplt.show()\n```\n\n    (1, 50, 512)\n\n\n\n![png](transformer_files/transformer_27_1.png)\n\n\n## Masking\n\nMask all the pad tokens in the batch of sequences. 
It ensures that the model does not treat padding as the input. The mask indicates where pad value `0` is present: it outputs a `1` at those locations, and a `0` otherwise.\n\n\n```\ndef create_padding_mask(seq):\n  seq = tf.cast(tf.math.equal(seq, 0), tf.float32)\n  \n  # add extra dimensions so that we can add the padding\n  # to the attention logits.\n  return seq[:, tf.newaxis, tf.newaxis, :]  # (batch_size, 1, 1, seq_len)\n```\n\n\n```\nx = tf.constant([[7, 6, 0, 0, 1], [1, 2, 3, 0, 0], [0, 0, 0, 4, 5]])\ncreate_padding_mask(x)\n```\n\n\n\n\n    <tf.Tensor: id=311505, shape=(3, 1, 1, 5), dtype=float32, numpy=\n    array([[[[ 0.,  0.,  1.,  1.,  0.]]],\n    \n    \n           [[[ 0.,  0.,  0.,  1.,  1.]]],\n    \n    \n           [[[ 1.,  1.,  1.,  0.,  0.]]]], dtype=float32)>\n\n\n\nThe look-ahead mask is used to mask the future tokens in a sequence. In other words, the mask indicates which entries should not be used.\n\nThis means that to predict the third word, only the first and second word will be used. Similarly to predict the fourth word, only the first, second and the third word will be used and so on.\n\n\n```\ndef create_look_ahead_mask(size):\n  mask = 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)\n  return mask  # (seq_len, seq_len)\n```\n\n\n```\nx = tf.random.uniform((1, 3))\ntemp = create_look_ahead_mask(x.shape[1])\ntemp\n```\n\n\n\n\n    <tf.Tensor: id=311521, shape=(3, 3), dtype=float32, numpy=\n    array([[ 0.,  1.,  1.],\n           [ 0.,  0.,  1.],\n           [ 0.,  0.,  0.]], dtype=float32)>\n\n\n\n## Scaled dot product attention\n\n<img src=\"https://www.tensorflow.org/images/tutorials/transformer/scaled_attention.png\" width=\"500\" alt=\"scaled_dot_product_attention\">\n\nThe attention function used by the transformer takes three inputs: Q (query), K (key), V (value). 
The equation used to calculate the attention weights is:\n\n$$\\Large{Attention(Q, K, V) = softmax_k(\\frac{QK^T}{\\sqrt{d_k}}) V} $$\n\nThe dot-product attention is scaled by a factor of square root of the depth. This is done because for large values of depth, the dot product grows large in magnitude, pushing the softmax function into regions where it has extremely small gradients, resulting in a very hard softmax. \n\nFor example, consider that `Q` and `K` have a mean of 0 and variance of 1. Their matrix multiplication will have a mean of 0 and variance of `dk`. Hence, *square root of `dk`* is used for scaling (and not any other number) because the matmul of `Q` and `K` should have a mean of 0 and variance of 1, so that we get a gentler softmax.\n\nThe mask is multiplied with *-1e9 (close to negative infinity).* This is done because the mask is summed with the scaled matrix multiplication of Q and K and is applied immediately before a softmax. The goal is to zero out these cells, and large negative inputs to softmax are near zero in the output.\n\n\n```\ndef scaled_dot_product_attention(q, k, v, mask):\n  \"\"\"Calculate the attention weights.\n  q, k, v must have matching leading dimensions.\n  The mask has different shapes depending on its type (padding or look ahead) \n  but it must be broadcastable for addition.\n  \n  Args:\n    q: query shape == (..., seq_len_q, depth)\n    k: key shape == (..., seq_len_k, depth)\n    v: value shape == (..., seq_len_v, depth)\n    mask: Float tensor with shape broadcastable \n          to (..., seq_len_q, seq_len_k). 
Defaults to None.\n    \n  Returns:\n    output, attention_weights\n  \"\"\"\n\n  matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)\n  \n  # scale matmul_qk\n  dk = tf.cast(tf.shape(k)[-1], tf.float32)\n  scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)\n\n  # add the mask to the scaled tensor.\n  if mask is not None:\n    scaled_attention_logits += (mask * -1e9)\n\n  # softmax is normalized on the last axis (seq_len_k) so that the scores\n  # add up to 1.\n  attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)  # (..., seq_len_q, seq_len_k)\n\n  output = tf.matmul(attention_weights, v)  # (..., seq_len_v, depth)\n\n  return output, attention_weights\n```\n\nAs the softmax normalization is done on K, its values decide the amount of importance given to Q.\n\nThe output represents the multiplication of the attention weights and the V (value) vector. This ensures that the words we want to focus on are kept as is and the irrelevant words are flushed out.\n\n\n```\ndef print_out(q, k, v):\n  temp_out, temp_attn = scaled_dot_product_attention(\n      q, k, v, None)\n  print ('Attention weights are:')\n  print (temp_attn)\n  print ('Output is:')\n  print (temp_out)\n```\n\n\n```\nnp.set_printoptions(suppress=True)\n\ntemp_k = tf.constant([[10,0,0],\n                      [0,10,0],\n                      [0,0,10],\n                      [0,0,10]], dtype=tf.float32)  # (4, 3)\n\ntemp_v = tf.constant([[   1,0],\n                      [  10,0],\n                      [ 100,5],\n                      [1000,6]], dtype=tf.float32)  # (4, 2)\n\n# This `query` aligns with the second `key`,\n# so the second `value` is returned.\ntemp_q = tf.constant([[0, 10, 0]], dtype=tf.float32)  # (1, 3)\nprint_out(temp_q, temp_k, temp_v)\n```\n\n    Attention weights are:\n    tf.Tensor([[ 0.  1.  0.  0.]], shape=(1, 4), dtype=float32)\n    Output is:\n    tf.Tensor([[ 10.   
0.]], shape=(1, 2), dtype=float32)\n\n\n\n```\n# This query aligns with a repeated key (third and fourth), \n# so all associated values get averaged.\ntemp_q = tf.constant([[0, 0, 10]], dtype=tf.float32)  # (1, 3)\nprint_out(temp_q, temp_k, temp_v)\n```\n\n    Attention weights are:\n    tf.Tensor([[ 0.   0.   0.5  0.5]], shape=(1, 4), dtype=float32)\n    Output is:\n    tf.Tensor([[ 550.     5.5]], shape=(1, 2), dtype=float32)\n\n\n\n```\n# This query aligns equally with the first and second key, \n# so their values get averaged.\ntemp_q = tf.constant([[10, 10, 0]], dtype=tf.float32)  # (1, 3)\nprint_out(temp_q, temp_k, temp_v)\n```\n\n    Attention weights are:\n    tf.Tensor([[ 0.5  0.5  0.   0. ]], shape=(1, 4), dtype=float32)\n    Output is:\n    tf.Tensor([[ 5.5  0. ]], shape=(1, 2), dtype=float32)\n\n\nPass all the queries together.\n\n\n```\ntemp_q = tf.constant([[0, 0, 10], [0, 10, 0], [10, 10, 0]], dtype=tf.float32)  # (3, 3)\nprint_out(temp_q, temp_k, temp_v)\n```\n\n    Attention weights are:\n    tf.Tensor(\n    [[ 0.   0.   0.5  0.5]\n     [ 0.   1.   0.   0. ]\n     [ 0.5  0.5  0.   0. ]], shape=(3, 4), dtype=float32)\n    Output is:\n    tf.Tensor(\n    [[ 550.     5.5]\n     [  10.     0. ]\n     [   5.5    0. ]], shape=(3, 2), dtype=float32)\n\n\n## Multi-head attention\n\n<img src=\"https://www.tensorflow.org/images/tutorials/transformer/multi_head_attention.png\" width=\"500\" alt=\"multi-head attention\">\n\n\nMulti-head attention consists of four parts:\n*    Linear layers and split into heads.\n*    Scaled dot-product attention.\n*    Concatenation of heads.\n*    Final linear layer.\n\nEach multi-head attention block gets three inputs; Q (query), K (key), V (value). These are put through linear (Dense) layers and split up into multiple heads. \n\nThe `scaled_dot_product_attention` defined above is applied to each head (broadcasted for efficiency). An appropriate mask must be used in the attention step.  
The attention output for each head is then concatenated (using `tf.transpose`, and `tf.reshape`) and put through a final `Dense` layer.\n\nInstead of one single attention head, Q, K, and V are split into multiple heads because it allows the model to jointly attend to information at different positions from different representational spaces. After the split each head has a reduced dimensionality, so the total computation cost is the same as a single head attention with full dimensionality.\n\n\n```\nclass MultiHeadAttention(tf.keras.layers.Layer):\n  def __init__(self, d_model, num_heads):\n    super(MultiHeadAttention, self).__init__()\n    self.num_heads = num_heads\n    self.d_model = d_model\n    \n    assert d_model % self.num_heads == 0\n    \n    self.depth = d_model // self.num_heads\n    \n    self.wq = tf.keras.layers.Dense(d_model)\n    self.wk = tf.keras.layers.Dense(d_model)\n    self.wv = tf.keras.layers.Dense(d_model)\n    \n    self.dense = tf.keras.layers.Dense(d_model)\n        \n  def split_heads(self, x, batch_size):\n    \"\"\"Split the last dimension into (num_heads, depth).\n    Transpose the result such that the shape is (batch_size, num_heads, seq_len, depth)\n    \"\"\"\n    x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))\n    return tf.transpose(x, perm=[0, 2, 1, 3])\n    \n  def call(self, v, k, q, mask):\n    batch_size = tf.shape(q)[0]\n    \n    q = self.wq(q)  # (batch_size, seq_len, d_model)\n    k = self.wk(k)  # (batch_size, seq_len, d_model)\n    v = self.wv(v)  # (batch_size, seq_len, d_model)\n    \n    q = self.split_heads(q, batch_size)  # (batch_size, num_heads, seq_len_q, depth)\n    k = self.split_heads(k, batch_size)  # (batch_size, num_heads, seq_len_k, depth)\n    v = self.split_heads(v, batch_size)  # (batch_size, num_heads, seq_len_v, depth)\n    \n    # scaled_attention.shape == (batch_size, num_heads, seq_len_v, depth)\n    # attention_weights.shape == (batch_size, num_heads, seq_len_q, seq_len_k)\n   
 scaled_attention, attention_weights = scaled_dot_product_attention(\n        q, k, v, mask)\n    \n    scaled_attention = tf.transpose(scaled_attention, perm=[0, 2, 1, 3])  # (batch_size, seq_len_v, num_heads, depth)\n\n    concat_attention = tf.reshape(scaled_attention, \n                                  (batch_size, -1, self.d_model))  # (batch_size, seq_len_v, d_model)\n\n    output = self.dense(concat_attention)  # (batch_size, seq_len_v, d_model)\n        \n    return output, attention_weights\n```\n\nCreate a `MultiHeadAttention` layer to try out. At each location in the sequence, `y`, the `MultiHeadAttention` runs all 8 attention heads across all other locations in the sequence, returning a new vector of the same length at each location.\n\n\n```\ntemp_mha = MultiHeadAttention(d_model=512, num_heads=8)\ny = tf.random.uniform((1, 60, 512))  # (batch_size, encoder_sequence, d_model)\nout, attn = temp_mha(y, k=y, q=y, mask=None)\nout.shape, attn.shape\n```\n\n\n\n\n    (TensorShape([1, 60, 512]), TensorShape([1, 8, 60, 60]))\n\n\n\n## Point wise feed forward network\n\nPoint wise feed forward network consists of two fully-connected layers with a ReLU activation in between.\n\n\n```\ndef point_wise_feed_forward_network(d_model, dff):\n  return tf.keras.Sequential([\n      tf.keras.layers.Dense(dff, activation='relu'),  # (batch_size, seq_len, dff)\n      tf.keras.layers.Dense(d_model)  # (batch_size, seq_len, d_model)\n  ])\n```\n\n\n```\nsample_ffn = point_wise_feed_forward_network(512, 2048)\nsample_ffn(tf.random.uniform((64, 50, 512))).shape\n```\n\n\n\n\n    TensorShape([64, 50, 512])\n\n\n\n## Encoder and decoder\n\n<img src=\"https://www.tensorflow.org/images/tutorials/transformer/transformer.png\" width=\"600\" alt=\"transformer\">\n\nThe transformer model follows the same general pattern as a standard [sequence to sequence with attention model](nmt_with_attention.ipynb). 
\n\n* The input sentence is passed through `N` encoder layers that generate an output for each word/token in the sequence.\n* The decoder attends to the encoder's output and its own input (self-attention) to predict the next word. \n\n### Encoder layer\n\nEach encoder layer consists of sublayers:\n\n1.   Multi-head attention (with padding mask) \n2.   Point wise feed forward networks. \n\nEach of these sublayers has a residual connection around it followed by a layer normalization. Residual connections help in avoiding the vanishing gradient problem in deep networks.\n\nThe output of each sublayer is `LayerNorm(x + Sublayer(x))`. The normalization is done on the `d_model` (last) axis. There are N encoder layers in the transformer.\n\n\n```\nclass EncoderLayer(tf.keras.layers.Layer):\n  def __init__(self, d_model, num_heads, dff, rate=0.1):\n    super(EncoderLayer, self).__init__()\n\n    self.mha = MultiHeadAttention(d_model, num_heads)\n    self.ffn = point_wise_feed_forward_network(d_model, dff)\n\n    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)\n    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)\n    \n    self.dropout1 = tf.keras.layers.Dropout(rate)\n    self.dropout2 = tf.keras.layers.Dropout(rate)\n    \n  def call(self, x, training, mask):\n\n    attn_output, _ = self.mha(x, x, x, mask)  # (batch_size, input_seq_len, d_model)\n    attn_output = self.dropout1(attn_output, training=training)\n    out1 = self.layernorm1(x + attn_output)  # (batch_size, input_seq_len, d_model)\n    \n    ffn_output = self.ffn(out1)  # (batch_size, input_seq_len, d_model)\n    ffn_output = self.dropout2(ffn_output, training=training)\n    out2 = self.layernorm2(out1 + ffn_output)  # (batch_size, input_seq_len, d_model)\n    \n    return out2\n```\n\n\n```\nsample_encoder_layer = EncoderLayer(512, 8, 2048)\n\nsample_encoder_layer_output = sample_encoder_layer(\n    tf.random.uniform((64, 43, 512)), False, 
None)\n\nsample_encoder_layer_output.shape  # (batch_size, input_seq_len, d_model)\n```\n\n\n\n\n    TensorShape([64, 43, 512])\n\n\n\n### Decoder layer\n\nEach decoder layer consists of sublayers:\n\n1.   Masked multi-head attention (with look ahead mask and padding mask)\n2.   Multi-head attention (with padding mask). V (value) and K (key) receive the *encoder output* as inputs. Q (query) receives the *output from the masked multi-head attention sublayer.*\n3.   Point wise feed forward networks\n\nEach of these sublayers has a residual connection around it followed by a layer normalization. The output of each sublayer is `LayerNorm(x + Sublayer(x))`. The normalization is done on the `d_model` (last) axis.\n\nThere are N decoder layers in the transformer.\n\nAs Q receives the output from decoder's first attention block, and K receives the encoder output, the attention weights represent the importance given to the decoder's input based on the encoder's output. In other words, the decoder predicts the next word by looking at the encoder output and self-attending to its own output. 
See the demonstration above in the scaled dot product attention section.\n\n\n```\nclass DecoderLayer(tf.keras.layers.Layer):\n  def __init__(self, d_model, num_heads, dff, rate=0.1):\n    super(DecoderLayer, self).__init__()\n\n    self.mha1 = MultiHeadAttention(d_model, num_heads)\n    self.mha2 = MultiHeadAttention(d_model, num_heads)\n\n    self.ffn = point_wise_feed_forward_network(d_model, dff)\n \n    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)\n    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)\n    self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)\n    \n    self.dropout1 = tf.keras.layers.Dropout(rate)\n    self.dropout2 = tf.keras.layers.Dropout(rate)\n    self.dropout3 = tf.keras.layers.Dropout(rate)\n    \n    \n  def call(self, x, enc_output, training, \n           look_ahead_mask, padding_mask):\n    # enc_output.shape == (batch_size, input_seq_len, d_model)\n\n    attn1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask)  # (batch_size, target_seq_len, d_model)\n    attn1 = self.dropout1(attn1, training=training)\n    out1 = self.layernorm1(attn1 + x)\n    \n    attn2, attn_weights_block2 = self.mha2(\n        enc_output, enc_output, out1, padding_mask)  # (batch_size, target_seq_len, d_model)\n    attn2 = self.dropout2(attn2, training=training)\n    out2 = self.layernorm2(attn2 + out1)  # (batch_size, target_seq_len, d_model)\n    \n    ffn_output = self.ffn(out2)  # (batch_size, target_seq_len, d_model)\n    ffn_output = self.dropout3(ffn_output, training=training)\n    out3 = self.layernorm3(ffn_output + out2)  # (batch_size, target_seq_len, d_model)\n    \n    return out3, attn_weights_block1, attn_weights_block2\n```\n\n\n```\nsample_decoder_layer = DecoderLayer(512, 8, 2048)\n\nsample_decoder_layer_output, _, _ = sample_decoder_layer(\n    tf.random.uniform((64, 50, 512)), sample_encoder_layer_output, \n    False, None, None)\n\nsample_decoder_layer_output.shape  # (batch_size, 
target_seq_len, d_model)\n```\n\n\n\n\n    TensorShape([64, 50, 512])\n\n\n\n### Encoder\n\nThe `Encoder` consists of:\n1.   Input Embedding\n2.   Positional Encoding\n3.   N encoder layers\n\nThe input is put through an embedding which is summed with the positional encoding. The output of this summation is the input to the encoder layers. The output of the encoder is the input to the decoder.\n\n\n```\nclass Encoder(tf.keras.layers.Layer):\n  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, \n               rate=0.1):\n    super(Encoder, self).__init__()\n\n    self.d_model = d_model\n    self.num_layers = num_layers\n    \n    self.embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)\n    self.pos_encoding = positional_encoding(input_vocab_size, self.d_model)\n    \n    \n    self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) \n                       for _ in range(num_layers)]\n  \n    self.dropout = tf.keras.layers.Dropout(rate)\n        \n  def call(self, x, training, mask):\n\n    seq_len = tf.shape(x)[1]\n    \n    # adding embedding and position encoding.\n    x = self.embedding(x)  # (batch_size, input_seq_len, d_model)\n    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))\n    x += self.pos_encoding[:, :seq_len, :]\n\n    x = self.dropout(x, training=training)\n    \n    for i in range(self.num_layers):\n      x = self.enc_layers[i](x, training, mask)\n    \n    return x  # (batch_size, input_seq_len, d_model)\n```\n\n\n```\nsample_encoder = Encoder(num_layers=2, d_model=512, num_heads=8, \n                         dff=2048, input_vocab_size=8500)\n\nsample_encoder_output = sample_encoder(tf.random.uniform((64, 62)), \n                                       training=False, mask=None)\n\nprint (sample_encoder_output.shape)  # (batch_size, input_seq_len, d_model)\n```\n\n    (64, 62, 512)\n\n\n### Decoder\n\n The `Decoder` consists of:\n1.   Output Embedding\n2.   Positional Encoding\n3.   
N decoder layers\n\nThe target is put through an embedding which is summed with the positional encoding. The output of this summation is the input to the decoder layers. The output of the decoder is the input to the final linear layer.\n\n\n```\nclass Decoder(tf.keras.layers.Layer):\n  def __init__(self, num_layers, d_model, num_heads, dff, target_vocab_size, \n               rate=0.1):\n    super(Decoder, self).__init__()\n\n    self.d_model = d_model\n    self.num_layers = num_layers\n    \n    self.embedding = tf.keras.layers.Embedding(target_vocab_size, d_model)\n    self.pos_encoding = positional_encoding(target_vocab_size, self.d_model)\n    \n    self.dec_layers = [DecoderLayer(d_model, num_heads, dff, rate) \n                       for _ in range(num_layers)]\n    self.dropout = tf.keras.layers.Dropout(rate)\n    \n  def call(self, x, enc_output, training, \n           look_ahead_mask, padding_mask):\n\n    seq_len = tf.shape(x)[1]\n    attention_weights = {}\n    \n    x = self.embedding(x)  # (batch_size, target_seq_len, d_model)\n    x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))\n    x += self.pos_encoding[:, :seq_len, :]\n    \n    x = self.dropout(x, training=training)\n\n    for i in range(self.num_layers):\n      x, block1, block2 = self.dec_layers[i](x, enc_output, training,\n                                             look_ahead_mask, padding_mask)\n      \n      attention_weights['decoder_layer{}_block1'.format(i+1)] = block1\n      attention_weights['decoder_layer{}_block2'.format(i+1)] = block2\n    \n    # x.shape == (batch_size, target_seq_len, d_model)\n    return x, attention_weights\n```\n\n\n```\nsample_decoder = Decoder(num_layers=2, d_model=512, num_heads=8, \n                         dff=2048, target_vocab_size=8000)\n\noutput, attn = sample_decoder(tf.random.uniform((64, 26)), \n                              enc_output=sample_encoder_output, \n                              training=False, look_ahead_mask=None, \n                
              padding_mask=None)\n\noutput.shape, attn['decoder_layer2_block2'].shape\n```\n\n\n\n\n    (TensorShape([64, 26, 512]), TensorShape([64, 8, 26, 62]))\n\n\n\n## Create the Transformer\n\nTransformer consists of the encoder, decoder and a final linear layer. The output of the decoder is the input to the linear layer and its output is returned.\n\n\n```\nclass Transformer(tf.keras.Model):\n  def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, \n               target_vocab_size, rate=0.1):\n    super(Transformer, self).__init__()\n\n    self.encoder = Encoder(num_layers, d_model, num_heads, dff, \n                           input_vocab_size, rate)\n\n    self.decoder = Decoder(num_layers, d_model, num_heads, dff, \n                           target_vocab_size, rate)\n\n    self.final_layer = tf.keras.layers.Dense(target_vocab_size)\n    \n  def call(self, inp, tar, training, enc_padding_mask, \n           look_ahead_mask, dec_padding_mask):\n\n    enc_output = self.encoder(inp, training, enc_padding_mask)  # (batch_size, inp_seq_len, d_model)\n    \n    # dec_output.shape == (batch_size, tar_seq_len, d_model)\n    dec_output, attention_weights = self.decoder(\n        tar, enc_output, training, look_ahead_mask, dec_padding_mask)\n    \n    final_output = self.final_layer(dec_output)  # (batch_size, tar_seq_len, target_vocab_size)\n    \n    return final_output, attention_weights\n```\n\n\n```\nsample_transformer = Transformer(\n    num_layers=2, d_model=512, num_heads=8, dff=2048, \n    input_vocab_size=8500, target_vocab_size=8000)\n\ntemp_input = tf.random.uniform((64, 62))\ntemp_target = tf.random.uniform((64, 26))\n\nfn_out, _ = sample_transformer(temp_input, temp_target, training=False, \n                               enc_padding_mask=None, \n                               look_ahead_mask=None,\n                               dec_padding_mask=None)\n\nfn_out.shape  # (batch_size, tar_seq_len, target_vocab_size)\n```\n\n\n\n\n    
TensorShape([64, 26, 8000])\n\n\n\n## Set hyperparameters\n\nTo keep this example small and relatively fast, the values for *num_layers, d_model, and dff* have been reduced. \n\nThe values used in the base model of transformer were; *num_layers=6*, *d_model = 512*, *dff = 2048*. See the [paper](https://arxiv.org/abs/1706.03762) for all the other versions of the transformer.\n\nNote: By changing the values below, you can get the model that achieved state of the art on many tasks.\n\n\n```\nnum_layers = 4\nd_model = 128\ndff = 512\nnum_heads = 8\n\ninput_vocab_size = tokenizer_pt.vocab_size + 2\ntarget_vocab_size = tokenizer_en.vocab_size + 2\ndropout_rate = 0.1\n```\n\n## Optimizer\n\nUse the Adam optimizer with a custom learning rate scheduler according to the formula in the [paper](https://arxiv.org/abs/1706.03762).\n\n$$\\Large{lrate = d_{model}^{-0.5} * min(step{\\_}num^{-0.5}, step{\\_}num * warmup{\\_}steps^{-1.5})}$$\n\n\n\n```\nclass CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):\n  def __init__(self, d_model, warmup_steps=4000):\n    super(CustomSchedule, self).__init__()\n    \n    self.d_model = d_model\n    self.d_model = tf.cast(self.d_model, tf.float32)\n\n    self.warmup_steps = warmup_steps\n    \n  def __call__(self, step):\n    arg1 = tf.math.rsqrt(step)\n    arg2 = step * (self.warmup_steps ** -1.5)\n    \n    return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)\n```\n\n\n```\nlearning_rate = CustomSchedule(d_model)\n\noptimizer = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98, \n                                     epsilon=1e-9)\n```\n\n\n```\ntemp_learning_rate_schedule = CustomSchedule(d_model)\n\nplt.plot(temp_learning_rate_schedule(tf.range(40000, dtype=tf.float32)))\nplt.ylabel(\"Learning Rate\")\nplt.xlabel(\"Train Step\")\n```\n\n\n\n\n    <matplotlib.text.Text at 0x7fa7c8353590>\n\n\n\n\n![png](transformer_files/transformer_82_1.png)\n\n\n## Loss and metrics\n\nSince the target sequences 
are padded, it is important to apply a padding mask when calculating the loss.\n\n\n```\nloss_object = tf.keras.losses.SparseCategoricalCrossentropy(\n    from_logits=True, reduction='none')\n```\n\n\n```\ndef loss_function(real, pred):\n  mask = tf.math.logical_not(tf.math.equal(real, 0))\n  loss_ = loss_object(real, pred)\n\n  mask = tf.cast(mask, dtype=loss_.dtype)\n  loss_ *= mask\n  \n  return tf.reduce_mean(loss_)\n```\n\n\n```\ntrain_loss = tf.keras.metrics.Mean(name='train_loss')\ntrain_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(\n    name='train_accuracy')\n```\n\n## Training and checkpointing\n\n\n```\ntransformer = Transformer(num_layers, d_model, num_heads, dff,\n                          input_vocab_size, target_vocab_size, dropout_rate)\n```\n\n\n```\ndef create_masks(inp, tar):\n  # Encoder padding mask\n  enc_padding_mask = create_padding_mask(inp)\n  \n  # Used in the 2nd attention block in the decoder.\n  # This padding mask is used to mask the encoder outputs.\n  dec_padding_mask = create_padding_mask(inp)\n  \n  # Used in the 1st attention block in the decoder.\n  # It is used to pad and mask future tokens in the input received by \n  # the decoder.\n  look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])\n  dec_target_padding_mask = create_padding_mask(tar)\n  combined_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)\n  \n  return enc_padding_mask, combined_mask, dec_padding_mask\n```\n\nCreate the checkpoint path and the checkpoint manager. 
This will be used to save checkpoints every `n` epochs.\n\n\n```\ncheckpoint_path = \"./checkpoints/train\"\n\nckpt = tf.train.Checkpoint(transformer=transformer,\n                           optimizer=optimizer)\n\nckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)\n\n# if a checkpoint exists, restore the latest checkpoint.\nif ckpt_manager.latest_checkpoint:\n  ckpt.restore(ckpt_manager.latest_checkpoint)\n  print ('Latest checkpoint restored!!')\n```\n\nThe target is divided into `tar_inp` and `tar_real`. `tar_inp` is passed as an input to the decoder. `tar_real` is that same input shifted by 1: at each location in `tar_inp`, `tar_real` contains the next token that should be predicted.\n\nFor example, `sentence` = \"SOS A lion in the jungle is sleeping EOS\"\n\n`tar_inp` =  \"SOS A lion in the jungle is sleeping\"\n\n`tar_real` = \"A lion in the jungle is sleeping EOS\"\n\nThe transformer is an auto-regressive model: it makes predictions one part at a time, and uses its output so far to decide what to do next. \n\nDuring training, this example uses teacher-forcing (like in the [text generation tutorial](./text_generation.ipynb)). 
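\n\nThe `tar_inp`/`tar_real` split described above can be sketched with a hypothetical token sequence (the ids here are made up; `8000` stands for the SOS id and `8001` for the EOS id):\n\n```\nimport tensorflow as tf\n\ntar = tf.constant([[8000, 12, 45, 7, 8001]])  # [SOS, ..., EOS], hypothetical ids\ntar_inp = tar[:, :-1]   # [[8000, 12, 45, 7]]\ntar_real = tar[:, 1:]   # [[12, 45, 7, 8001]]\n```\n\n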
Teacher forcing is passing the true output to the next time step regardless of what the model predicts at the current time step.\n\nAs the transformer predicts each word, *self-attention* allows it to look at the previous words in the input sequence to better predict the next word.\n\nTo prevent the model from peeking at the expected output, the model uses a look-ahead mask.\n\n\n```\nEPOCHS = 20\n```\n\n\n```\n@tf.function\ndef train_step(inp, tar):\n  tar_inp = tar[:, :-1]\n  tar_real = tar[:, 1:]\n  \n  enc_padding_mask, combined_mask, dec_padding_mask = create_masks(inp, tar_inp)\n  \n  with tf.GradientTape() as tape:\n    predictions, _ = transformer(inp, tar_inp, \n                                 True, \n                                 enc_padding_mask, \n                                 combined_mask, \n                                 dec_padding_mask)\n    loss = loss_function(tar_real, predictions)\n\n  gradients = tape.gradient(loss, transformer.trainable_variables)    \n  optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))\n  \n  train_loss(loss)\n  train_accuracy(tar_real, predictions)\n```\n\nPortuguese is used as the input language and English is the target language.\n\n\n```\nfor epoch in range(EPOCHS):\n  start = time.time()\n  \n  train_loss.reset_states()\n  train_accuracy.reset_states()\n  \n  # inp -> portuguese, tar -> english\n  for (batch, (inp, tar)) in enumerate(train_dataset):\n    train_step(inp, tar)\n    \n    if batch % 500 == 0:\n      print ('Epoch {} Batch {} Loss {:.4f} Accuracy {:.4f}'.format(\n          epoch + 1, batch, train_loss.result(), train_accuracy.result()))\n      \n  if (epoch + 1) % 5 == 0:\n    ckpt_save_path = ckpt_manager.save()\n    print ('Saving checkpoint for epoch {} at {}'.format(epoch+1,\n                                                         ckpt_save_path))\n    \n  print ('Epoch {} Loss {:.4f} Accuracy {:.4f}'.format(epoch + 1, \n                                                
train_loss.result(), \n                                                train_accuracy.result()))\n\n  print ('Time taken for 1 epoch: {} secs\\n'.format(time.time() - start))\n```\n\n    Epoch 1 Batch 0 Loss 4.4091 Accuracy 0.0000\n    Epoch 1 Batch 500 Loss 2.8389 Accuracy 0.4932\n    Epoch 1 Loss 2.4228 Accuracy 0.5203\n    Time taken for 1 epoch: 319.307568073 secs\n    \n    Epoch 2 Batch 0 Loss 1.3543 Accuracy 0.6036\n    Epoch 2 Batch 500 Loss 1.1689 Accuracy 0.6405\n    Epoch 2 Loss 1.1440 Accuracy 0.6460\n    Time taken for 1 epoch: 251.259027004 secs\n    \n    Epoch 3 Batch 0 Loss 1.1281 Accuracy 0.6554\n    Epoch 3 Batch 500 Loss 1.0244 Accuracy 0.6723\n    Epoch 3 Loss 1.0115 Accuracy 0.6754\n    Time taken for 1 epoch: 70.5513730049 secs\n    \n    Epoch 4 Batch 0 Loss 1.0113 Accuracy 0.6764\n    Epoch 4 Batch 500 Loss 0.9192 Accuracy 0.6975\n    Epoch 4 Loss 0.9037 Accuracy 0.7019\n    Time taken for 1 epoch: 70.917550087 secs\n    \n    Epoch 5 Batch 0 Loss 0.9030 Accuracy 0.7027\n    Epoch 5 Batch 500 Loss 0.8099 Accuracy 0.7260\n    Saving checkpoint for epoch 5 at ./checkpoints/train/ckpt-1\n    Epoch 5 Loss 0.7974 Accuracy 0.7293\n    Time taken for 1 epoch: 73.0342350006 secs\n    \n    Epoch 6 Batch 0 Loss 0.8077 Accuracy 0.7360\n    Epoch 6 Batch 500 Loss 0.7201 Accuracy 0.7475\n    Epoch 6 Loss 0.7084 Accuracy 0.7503\n    Time taken for 1 epoch: 70.6219291687 secs\n    \n    Epoch 7 Batch 0 Loss 0.7275 Accuracy 0.7451\n    Epoch 7 Batch 500 Loss 0.6304 Accuracy 0.7688\n    Epoch 7 Loss 0.6182 Accuracy 0.7719\n    Time taken for 1 epoch: 72.2072319984 secs\n    \n    Epoch 8 Batch 0 Loss 0.6404 Accuracy 0.7730\n    Epoch 8 Batch 500 Loss 0.5517 Accuracy 0.7887\n    Epoch 8 Loss 0.5430 Accuracy 0.7911\n    Time taken for 1 epoch: 70.9613239765 secs\n    \n    Epoch 9 Batch 0 Loss 0.5784 Accuracy 0.7829\n    Epoch 9 Batch 500 Loss 0.4962 Accuracy 0.8035\n    Epoch 9 Loss 0.4900 Accuracy 0.8052\n    Time taken for 1 epoch: 68.5947010517 secs\n    
\n    Epoch 10 Batch 0 Loss 0.5201 Accuracy 0.7956\n    Epoch 10 Batch 500 Loss 0.4545 Accuracy 0.8145\n    Saving checkpoint for epoch 10 at ./checkpoints/train/ckpt-2\n    Epoch 10 Loss 0.4493 Accuracy 0.8161\n    Time taken for 1 epoch: 68.6737399101 secs\n    \n    Epoch 11 Batch 0 Loss 0.4883 Accuracy 0.8055\n    Epoch 11 Batch 500 Loss 0.4206 Accuracy 0.8240\n    Epoch 11 Loss 0.4164 Accuracy 0.8251\n    Time taken for 1 epoch: 69.4070420265 secs\n    \n    Epoch 12 Batch 0 Loss 0.4413 Accuracy 0.8195\n    Epoch 12 Batch 500 Loss 0.3937 Accuracy 0.8317\n    Epoch 12 Loss 0.3902 Accuracy 0.8328\n    Time taken for 1 epoch: 70.7441010475 secs\n    \n    Epoch 13 Batch 0 Loss 0.4223 Accuracy 0.8236\n    Epoch 13 Batch 500 Loss 0.3716 Accuracy 0.8380\n    Epoch 13 Loss 0.3685 Accuracy 0.8389\n    Time taken for 1 epoch: 71.3240449429 secs\n    \n    Epoch 14 Batch 0 Loss 0.4037 Accuracy 0.8265\n    Epoch 14 Batch 500 Loss 0.3511 Accuracy 0.8442\n    Epoch 14 Loss 0.3483 Accuracy 0.8450\n    Time taken for 1 epoch: 75.1278469563 secs\n    \n    Epoch 15 Batch 0 Loss 0.3782 Accuracy 0.8331\n    Epoch 15 Batch 500 Loss 0.3339 Accuracy 0.8493\n    Saving checkpoint for epoch 15 at ./checkpoints/train/ckpt-3\n    Epoch 15 Loss 0.3318 Accuracy 0.8499\n    Time taken for 1 epoch: 71.4256718159 secs\n    \n    Epoch 16 Batch 0 Loss 0.3577 Accuracy 0.8409\n    Epoch 16 Batch 500 Loss 0.3195 Accuracy 0.8537\n    Epoch 16 Loss 0.3172 Accuracy 0.8544\n    Time taken for 1 epoch: 70.8179049492 secs\n    \n    Epoch 17 Batch 0 Loss 0.3447 Accuracy 0.8466\n    Epoch 17 Batch 500 Loss 0.3055 Accuracy 0.8579\n    Epoch 17 Loss 0.3033 Accuracy 0.8587\n    Time taken for 1 epoch: 68.7967669964 secs\n    \n    Epoch 18 Batch 0 Loss 0.3385 Accuracy 0.8487\n    Epoch 18 Batch 500 Loss 0.2931 Accuracy 0.8620\n    Epoch 18 Loss 0.2910 Accuracy 0.8626\n    Time taken for 1 epoch: 67.865557909 secs\n    \n    Epoch 19 Batch 0 Loss 0.3198 Accuracy 0.8503\n    Epoch 19 Batch 500 Loss 0.2818 
Accuracy 0.8657\n    Epoch 19 Loss 0.2797 Accuracy 0.8665\n    Time taken for 1 epoch: 67.9785480499 secs\n    \n    Epoch 20 Batch 0 Loss 0.3110 Accuracy 0.8557\n    Epoch 20 Batch 500 Loss 0.2726 Accuracy 0.8684\n    Saving checkpoint for epoch 20 at ./checkpoints/train/ckpt-4\n    Epoch 20 Loss 0.2706 Accuracy 0.8692\n    Time taken for 1 epoch: 71.4560930729 secs\n    \n\n\n## Evaluate\n\nThe following steps are used for evaluation:\n\n* Encode the input sentence using the Portuguese tokenizer (`tokenizer_pt`), and add the start and end tokens so the input is equivalent to what the model was trained with. This is the encoder input.\n* The decoder input is the start token, `tokenizer_en.vocab_size`.\n* Calculate the padding masks and the look ahead masks.\n* The `decoder` then outputs the predictions by looking at the `encoder output` and its own output (self-attention).\n* Select the last word and calculate its argmax.\n* Concatenate the predicted word to the decoder input and pass it to the decoder.\n* In this approach, the decoder predicts the next word based on the previous words it predicted.\n\nNote: The model used here has less capacity, to keep the example relatively fast, so the predictions may be less accurate. 
To reproduce the results in the paper, use the entire dataset and the base transformer model or transformer XL by changing the hyperparameters above.\n\n\n```\ndef evaluate(inp_sentence):\n  start_token = [tokenizer_pt.vocab_size]\n  end_token = [tokenizer_pt.vocab_size + 1]\n  \n  # inp sentence is portuguese, hence adding the start and end token\n  inp_sentence = start_token + tokenizer_pt.encode(inp_sentence) + end_token\n  encoder_input = tf.expand_dims(inp_sentence, 0)\n  \n  # as the target is english, the first word to the transformer should be the\n  # english start token.\n  decoder_input = [tokenizer_en.vocab_size]\n  output = tf.expand_dims(decoder_input, 0)\n    \n  for i in range(MAX_LENGTH):\n    enc_padding_mask, combined_mask, dec_padding_mask = create_masks(\n        encoder_input, output)\n  \n    # predictions.shape == (batch_size, seq_len, vocab_size)\n    predictions, attention_weights = transformer(encoder_input, \n                                                 output,\n                                                 False,\n                                                 enc_padding_mask,\n                                                 combined_mask,\n                                                 dec_padding_mask)\n    \n    # select the last word from the seq_len dimension\n    predictions = predictions[:, -1:, :]  # (batch_size, 1, vocab_size)\n\n    predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)\n    \n    # return the result if the predicted_id is equal to the end token\n    if tf.equal(predicted_id, tokenizer_en.vocab_size+1):\n      return tf.squeeze(output, axis=0), attention_weights\n    \n    # concatenate the predicted_id to the output which is given to the decoder\n    # as its input.\n    output = tf.concat([output, predicted_id], axis=-1)\n\n  return tf.squeeze(output, axis=0), attention_weights\n```\n\n\n```\ndef plot_attention_weights(attention, sentence, result, layer):\n  fig = 
plt.figure(figsize=(16, 8))\n  \n  sentence = tokenizer_pt.encode(sentence)\n  \n  attention = tf.squeeze(attention[layer], axis=0)\n  \n  for head in range(attention.shape[0]):\n    ax = fig.add_subplot(2, 4, head+1)\n    \n    # plot the attention weights\n    ax.matshow(attention[head][:-1, :], cmap='viridis')\n\n    fontdict = {'fontsize': 10}\n    \n    ax.set_xticks(range(len(sentence)+2))\n    ax.set_yticks(range(len(result)))\n    \n    ax.set_ylim(len(result)-1.5, -0.5)\n        \n    ax.set_xticklabels(\n        ['<start>']+[tokenizer_pt.decode([i]) for i in sentence]+['<end>'], \n        fontdict=fontdict, rotation=90)\n    \n    ax.set_yticklabels([tokenizer_en.decode([i]) for i in result \n                        if i < tokenizer_en.vocab_size], \n                       fontdict=fontdict)\n    \n    ax.set_xlabel('Head {}'.format(head+1))\n  \n  plt.tight_layout()\n  plt.show()\n```\n\n\n```\ndef translate(sentence, plot=''):\n  result, attention_weights = evaluate(sentence)\n  \n  predicted_sentence = tokenizer_en.decode([i for i in result \n                                            if i < tokenizer_en.vocab_size])  \n\n  print('Input: {}'.format(sentence))\n  print('Predicted translation: {}'.format(predicted_sentence))\n  \n  if plot:\n    plot_attention_weights(attention_weights, sentence, result, plot)\n```\n\n\n```\ntranslate(\"este é um problema que temos que resolver.\")\nprint (\"Real translation: this is a problem we have to solve .\")\n```\n\n    Input: este é um problema que temos que resolver.\n    Predicted translation: this is a problem that we have to solve .\n    Real translation: this is a problem we have to solve .\n\n\n\n```\ntranslate(\"os meus vizinhos ouviram sobre esta ideia.\")\nprint (\"Real translation: and my neighboring homes heard about this idea .\")\n```\n\n    Input: os meus vizinhos ouviram sobre esta ideia.\n    Predicted translation: my neighbors heard about this idea .\n    Real translation: and my neighboring 
homes heard about this idea .\n\n\n\n```\ntranslate(\"vou então muito rapidamente partilhar convosco algumas histórias de algumas coisas mágicas que aconteceram.\")\nprint (\"Real translation: so i 'll just share with you some stories very quickly of some magical things that have happened .\")\n```\n\n    Input: vou então muito rapidamente partilhar convosco algumas histórias de algumas coisas mágicas que aconteceram.\n    Predicted translation: so i 'm going to play with you some magic moments that happened , they have happening .\n    Real translation: so i 'll just share with you some stories very quickly of some magical things that have happened .\n\n\nYou can pass different layers and attention blocks of the decoder to the `plot` parameter.\n\n\n```\ntranslate(\"este é o primeiro livro que eu fiz.\", plot='decoder_layer4_block2')\nprint (\"Real translation: this is the first book i've ever done.\")\n```\n\n    Input: este é o primeiro livro que eu fiz.\n    Predicted translation: this is the first book that i did .\n\n\n\n![png](transformer_files/transformer_107_1.png)\n\n\n    Real translation: this is the first book i've ever done.\n\n\n## Summary\n\nIn this tutorial, you learned about positional encoding, multi-head attention, the importance of masking, and how to create a transformer.\n\nTry using a different dataset to train the transformer. You can also create the base transformer or transformer XL by changing the hyperparameters above. You can also use the layers defined here to create [BERT](https://arxiv.org/abs/1810.04805) and train state of the art models. Furthermore, you can implement beam search to get better predictions.\n"
  },
  {
    "path": "r2/tutorials/text/word_embeddings.md",
    "content": "---\ntitle: NLP词嵌入Word embedding实战项目\ncategories: tensorflow2官方教程\ntags: tensorflow2.0教程\ntop: 1926\nabbrlink: tensorflow/tf2-tutorials-text-word_embeddings\n---\n\n# NLP词嵌入Word embedding实战项目 (tensorflow2.0官方教程翻译)\n\n本文介绍词嵌入向量 Word embedding，包含完整的代码，可以在小型数据集上从零开始训练词嵌入，并使用[Embedding Projector](http://projector.tensorflow.org) 可视化这些嵌入，如下图所示：\n\n<img src=\"https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/text/images/embedding.jpg?raw=1\" alt=\"Screenshot of the embedding projector\" width=\"400\"/>\n\n> 词嵌入向量(Word Embedding)是NLP里面一个重要的概念，我们可以利用 WordEmbedding 将一个单词转换成固定长度的向量表示，从而便于进行数学处理。\n\n## 1. 将文本表示为数字\n\n机器学习模型以向量（数字数组）作为输入，在处理文本时，我们必须首先想出一个策略，将字符串转换为数字（或将文本“向量化”），然后再将其提供给模型。在本节中，我们将研究三种策略。\n\n### 1.1. 独热编码（One-hot encodings）\n\n首先，我们可以用“one-hot”对词汇的每个单词进行编码，想想“the cat sat on the mat”这句话，这个句子中的词汇（或唯一的单词）是（cat, mat, on, sat, the），为了表示每个单词，我们将创建一个长度等于词汇表大小的零向量，然后在对应单词的索引处放置一个1。这种方法如下图所示：\n\n<img src=\"https://raw.githubusercontent.com/tensorflow/docs/master/site/en/r2/tutorials/text/images/one-hot.png\" alt=\"Diagram of one-hot encodings\" width=\"400\" />\n\n为了创建包含句子编码的向量，我们可以连接每个单词的one-hot向量。\n\n关键点：这种方法是低效的，独热编码的向量是稀疏的（也就是说，大多数索引是零）。假设我们有10000个单词，要对每个单词进行独热编码，我们将创建一个向量，其中99.99%的元素为零。\n\n### 1.2. 用唯一的数字编码每个单词\n\n我们尝试第二种方法，使用唯一的数字编码每个单词。继续上面的例子，我们可以将1赋值给“cat”，将2赋值给“mat”，以此类推，然后我们可以将句子“The cat sat on the mat”编码为像[5, 1, 4, 3, 5, 2]这样的密集向量。这种方法是高效的，我们现在有一个稠密的向量（所有元素都是满的），而不是稀疏的向量。\n\n然而，这种方法有两个缺点：\n\n* 整数编码是任意的（它不捕获单词之间的任何关系）。\n\n* 对于模型来说，整数编码的解释是很有挑战性的。例如，线性分类器为每个特征学习单个权重。由于任何两个单词的相似性与它们编码的相似性之间没有关系，所以这种特征权重组合没有意义。\n\n\n### 1.3. 
词嵌入\n\n词嵌入为我们提供了一种使用高效、密集表示的方法，其中相似的单词具有相似的编码，重要的是，我们不必手工指定这种编码，嵌入是浮点值的密集向量（向量的长度是您指定的参数），它们不是手工指定嵌入的值，而是可训练的参数（模型在训练期间学习的权重，与模型学习密集层的权重的方法相同）。通常会看到8维（对于小数据集）的词嵌入，在处理大型数据集时最多可达1024维。更高维度的嵌入可以捕获单词之间的细粒度关系，但需要更多的数据来学习。\n\n<img src=\"https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/text/images/embedding2.png?raw=1\" alt=\"Diagram of an embedding\" width=\"400\" />\n\n上面是词嵌入的图表，每个单词表示为浮点值的4维向量，另一种考虑嵌入的方法是“查找表”，在学习了这些权重之后，我们可以通过查找表中对应的密集向量来编码每个单词。\n\n## 2. 利用 Embedding 层学习词嵌入\n\nKeras可以轻松使用词嵌入。我们来看看 [Embedding](https://tensorflow.google.cn/api_docs/python/tf/keras/layers/Embedding) 层。\n\n```python\nfrom __future__ import absolute_import, division, print_function, unicode_literals\n\n# !pip install tf-nightly-2.0-preview\nimport tensorflow as tf\n\nfrom tensorflow import keras\nfrom tensorflow.keras import layers\n\n# Embedding层至少需要两个参数： \n# 词汇表中可能的单词数量，这里是1000（1+最大单词索引）； \n# embeddings的维数，这里是32。\nembedding_layer = layers.Embedding(1000, 32)\n```\n\nEmbedding层可以理解为一个查找表，它从整数索引（表示特定的单词）映射到密集向量（它们的嵌入）。嵌入的维数（或宽度）是一个参数，您可以用它进行试验，看看什么对您的问题有效，这与您在一个密集层中对神经元数量进行试验的方法非常相似。\n\n创建Embedding层时，嵌入的权重会随机初始化（就像任何其他层一样），在训练期间，它们通过反向传播逐渐调整，一旦经过训练，学习的词嵌入将粗略地编码单词之间的相似性（因为它们是针对您的模型所训练的特定问题而学习的）。\n\n作为输入，Embedding层采用形状`(samples, sequence_length)`的整数2D张量，其中每个条目都是整数序列，它可以嵌入可变长度的序列。您可以向上面的嵌入层输入形状为`(32, 10)`（32个长度为10的序列组成的批次）或`(64, 15)`（64个长度为15的序列组成的批次）的批次，批次中的序列必须具有相同的长度，因此较短的序列应该用零填充，较长的序列应该被截断。\n\n作为输出，Embedding层返回一个形状`(samples, sequence_length, embedding_dimensionality)`的三维浮点张量，这样一个三维张量可以由一个RNN层来处理，也可以先扁平化或池化后再由密集（Dense）层处理。我们将在本教程中展示后一种方法，您可以参考[使用RNN的文本分类](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/text/text_classification_rnn.ipynb)来学习使用RNN处理的方法。\n\n\n## 3. 
从头开始学习嵌入\n\n我们将在 IMDB 影评上训练一个情感分类器，在这个过程中，我们将从头开始学习嵌入，我们先通过下载和预处理数据集的代码快速开始(请参阅本教程[tutorial](https://tensorflow.google.cn/tutorials/keras/basic_text_classification)了解更多细节)。\n\n\n```python\nvocab_size = 10000\nimdb = keras.datasets.imdb\n(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=vocab_size)\n\nprint(train_data[0])\n```\n\n```\n      [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, ...]\n```\n\n导入时，评论文本是整数编码的（每个整数代表字典中的特定单词）。\n\n\n\n### 3.1. 将整数转换回单词\n\n了解如何将整数转换回文本可能很有用，在这里我们将创建一个辅助函数来查询包含整数到字符串映射的字典对象：\n\n```python\n# 将单词映射到整数索引的字典\nword_index = imdb.get_word_index()\n\n# 前几个索引是保留的\nword_index = {k:(v+3) for k,v in word_index.items()}\nword_index[\"<PAD>\"] = 0\nword_index[\"<START>\"] = 1\nword_index[\"<UNK>\"] = 2  # unknown\nword_index[\"<UNUSED>\"] = 3\n\nreverse_word_index = dict([(value, key) for (key, value) in word_index.items()])\n\ndef decode_review(text):\n    return ' '.join([reverse_word_index.get(i, '?') for i in text])\n\ndecode_review(train_data[0])\n```\n\n```\nDownloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json\n1646592/1641221 [==============================] - 0s 0us/step\n\n\"<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ...\"\n```\n\n电影评论可以有不同的长度，我们将使用`pad_sequences`函数来标准化评论的长度：\n\n\n```python\nmaxlen = 500\n\ntrain_data = keras.preprocessing.sequence.pad_sequences(train_data,\n                                                        value=word_index[\"<PAD>\"],\n                                                        padding='post',\n                                                        maxlen=maxlen)\n\ntest_data = keras.preprocessing.sequence.pad_sequences(test_data,\n                                                       
value=word_index[\"<PAD>\"],\n                                                       padding='post',\n                                                       maxlen=maxlen)\n                                                       \nprint(train_data[0])                                                       \n```\n\n检查填充数据的第一个元素：\n\n```\n    [   1   14   22   16   43  530  973 1622 1385   65  458 4468   66 3941\n        4  173   36  256    5   25  100   43  838  112   50  670    2    9\n       ...\n        0    0    0    0    0    0    0    0    0    0]\n```\n\n### 3.2. 创建一个简单的模型\n\n我们将使用 [Keras Sequential API](https://www.tensorflow.org/guide/keras)来定义我们的模型。\n\n* 第一层是`Embedding`层。该层采用整数编码的词汇表，并查找每个词索引的嵌入向量，这些向量是在模型训练期间学习的。这些向量为输出数组增加一个维度，得到的维度是：`(batch, sequence, embedding)`。\n\n* 接下来，`GlobalAveragePooling1D`层通过对序列维度求平均，为每个示例返回固定长度的输出向量，这允许模型以尽可能最简单的方式处理可变长度的输入。\n\n* 该固定长度的输出向量会传入一个具有16个隐藏单元的全连接（`Dense`）层。\n\n* 最后一层与单个输出节点密集连接，使用`sigmoid`激活函数，此值是介于0和1之间的浮点值，表示评论为正的概率（或置信度）。\n\n```python\nembedding_dim=16\n\nmodel = keras.Sequential([\n  layers.Embedding(vocab_size, embedding_dim, input_length=maxlen),\n  layers.GlobalAveragePooling1D(),\n  layers.Dense(16, activation='relu'),\n  layers.Dense(1, activation='sigmoid')\n])\n\nmodel.summary()\n```\n\n```\n      Model: \"sequential\"\n      _________________________________________________________________\n      Layer (type)                 Output Shape              Param #   \n      =================================================================\n      embedding_1 (Embedding)      (None, 500, 16)           160000    \n      _________________________________________________________________\n      global_average_pooling1d (Gl (None, 16)                0         \n      _________________________________________________________________\n      dense (Dense)                (None, 16)                272       \n      _________________________________________________________________\n      dense_1 (Dense)              (None, 1)   
              17        \n      =================================================================\n      Total params: 160,289\n      Trainable params: 160,289\n      Non-trainable params: 0\n      _________________________________________________________________\n```\n\n### 3.3. Compile and train the model\n\n```python\nmodel.compile(optimizer='adam',\n              loss='binary_crossentropy',\n              metrics=['accuracy'])\n\nhistory = model.fit(\n    train_data,\n    train_labels,\n    epochs=30,\n    batch_size=512,\n    validation_split=0.2)\n```\n\n```\n      Train on 20000 samples, validate on 5000 samples\n      ...\n      Epoch 30/30\n      20000/20000 [==============================] - 1s 54us/sample - loss: 0.1639 - accuracy: 0.9449 - val_loss: 0.2840 - val_accuracy: 0.8912\n```\n\nWith this approach our model reaches a validation accuracy of about 89% (note that the model is overfitting: training accuracy is significantly higher).\n\n```python\nimport matplotlib.pyplot as plt\n\nacc = history.history['accuracy']\nval_acc = history.history['val_accuracy']\n\nepochs = range(1, len(acc) + 1)\n\nplt.figure(figsize=(12,9))\nplt.plot(epochs, acc, 'bo', label='Training acc')\nplt.plot(epochs, val_acc, 'b', label='Validation acc')\nplt.title('Training and validation accuracy')\nplt.xlabel('Epochs')\nplt.ylabel('Accuracy')\nplt.legend(loc='lower right')\nplt.ylim((0.5,1))\n\nplt.show()\n```\n\n*(Figure: training and validation accuracy over epochs.)*\n\n## 4. 
Retrieve the learned embeddings\n\nNext, let's retrieve the word embeddings learned during training. This will be a matrix of shape `(vocab_size, embedding_dim)`.\n\n```python\ne = model.layers[0]\nweights = e.get_weights()[0]\nprint(weights.shape) # shape: (vocab_size, embedding_dim)\n```\n```\n    (10000, 16)\n```\n\nWe will now write the weights to disk. To use the [Embedding Projector](http://projector.tensorflow.org), we will upload two files in tab-separated format: a file of vectors (containing the embeddings) and a file of metadata (containing the words).\n\n```python\nimport io\n\nout_v = io.open('vecs.tsv', 'w', encoding='utf-8')\nout_m = io.open('meta.tsv', 'w', encoding='utf-8')\nfor word_num in range(vocab_size):\n  word = reverse_word_index[word_num]\n  embeddings = weights[word_num]\n  out_m.write(word + \"\\n\")\n  out_v.write('\\t'.join([str(x) for x in embeddings]) + \"\\n\")\nout_v.close()\nout_m.close()\n```\n\nIf you are running this tutorial in Colaboratory, you can use the following snippet to download these files to your local machine (or use the file browser, *View -> Table of contents -> File browser*).\n\n\n```python\ntry:\n  from google.colab import files\nexcept ImportError:\n  pass\nelse:\n  files.download('vecs.tsv')\n  files.download('meta.tsv')\n```\n\n## 5. Visualize the embeddings\n\nTo visualize our embeddings, we will upload them to the [Embedding Projector](http://projector.tensorflow.org).\n\nOpen the [Embedding Projector](http://projector.tensorflow.org):\n\n* Click \"Load data\".\n\n* Upload the two files we created above: `vecs.tsv` and `meta.tsv`.\n\nThe embeddings you have trained will now be displayed. You can search for words to find their nearest neighbors. For example, try searching for \"beautiful\"; you may see neighbors like \"wonderful\". Note: your results may differ somewhat, depending on how the weights were randomly initialized before the embedding layer was trained.\n\n*Note: experimentally, you may be able to produce more interpretable embeddings with a simpler model. Try deleting the `Dense(16)` layer, retraining the model, and visualizing the embeddings again.*\n\n<img src=\"https://raw.githubusercontent.com/tensorflow/docs/master/site/en/r2/tutorials/text/images/embedding.jpg\" alt=\"Screenshot of the embedding projector\" width=\"400\"/>\n\n\n## 6. 
Next steps\n\nThis tutorial showed you how to train and visualize word embeddings from scratch on a small dataset.\n\n* To learn more about embeddings in Keras, we recommend this tutorial by François Chollet: [link](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/6.2-understanding-recurrent-neural-networks.ipynb).\n\n* To learn more about text classification (including the overall workflow, and when to use embeddings vs. one-hot encoding), we recommend [Google's text classification guide](https://developers.google.cn/machine-learning/guides/text-classification/step-2-5).\n\n> Latest version: [https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-word_embeddings.html](https://www.mashangxue123.com/tensorflow/tf2-tutorials-text-word_embeddings.html)\n> English version: [https://tensorflow.google.cn/beta/tutorials/text/word_embeddings](https://tensorflow.google.cn/beta/tutorials/text/word_embeddings)\n> Translation suggestion PR: [https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/word_embeddings.md](https://github.com/mashangxue/tensorflow2-zh/edit/master/r2/tutorials/text/word_embeddings.md)\n"
  }
]