[
  {
    "path": "README.md",
    "content": "# Keras governance structure\n\n![Keras logo](https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png)\n\n---\n\n## Design review process\n\nDesign-related communications are expected to happen primarily asynchronously via:\n\n- The Pull Requests used for API proposals.\n- [The Keras mailing list](https://groups.google.com/forum/#!forum/keras-users).\n\nThe process for writing and submitting design proposals is same as the [TensorFlow RFC process](https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md).\n\n- Start from [this template](https://github.com/keras-team/governance/blob/master/rfcs/yyyymmdd-rfc-template.md).\n- Fill in the content. Note that you will need to insert code examples.\n    - Provide enough context information for anyone to undertsand what's going on.\n    - Provide a solid argument as for why the feature is neeed.\n    - Include a code example of the **end-to-end workflow** you have in mind.\n- Open a Pull Request in the [Keras API proposals folder in this repository](https://github.com/keras-team/governance/tree/master/rfcs).\n- Send the Pull Request link to `keras-users@googlegroups.com` with a subject that starts with `[API DESIGN REVIEW]` (all caps) so that we notice it.\n- Wait for comments, and answer them as they come. Edit the proposal as necessary.\n- The proposal will finally be approved or rejected. 
Once approved, you can send out Pull Requests to implement the API changes or ask others to write Pull Requests (targeting `keras-team/keras`).\n\nNote that:\n\n- Anyone is free to send out API proposals.\n- Anyone is free to comment on API proposals or ask questions.\n\n---\n\n## Leadership\n\n### BDFL\n\nRole: final call in decisions related to the Keras API.\n\n- Francois Chollet (fchollet@google.com)\n\n---\n\n## Our mission\n\nThe purpose of our work is to democratize access to machine learning through dependable standards and usable, productive APIs.\nWe seek to empower as many people as possible, from a wide diversity of backgrounds, to take ownership of ML technology and to use it to build their own solutions to their own problems.\n\nExisting machine learning technology has the potential to solve a huge amount of problems in the world today, across every industry, and to help a tremendous amount of people. The potential is sky-high. We've barely even started. So how do we fully realize this potential?\n\nWe believe that we will only fully realize the potential of machine learning if it becomes a tool in everyone's hands -- not just a technology developed behind closed doors by an \"AI industry\", that you could only deploy by waiting for a turnkey cloud API to become available commercially, or by contracting an expensive consulting firm. We can't wait for experts to solve every problem -- experts at large tech companies don't even have visibility into a tiny fraction of the problems that can be solved. End users should solve their own problems. And our mission is to empower them to do just that.\n\nOur mission is to make these capabilities available to anyone with basic computer literacy, for free. 
This is how we maximize the realized potential of these technologies, and how we maximize our positive impact on the world.\n\n---\n\n## Code of conduct\n\nIn the interest of fostering an open and welcoming environment,\nwe as contributors and maintainers pledge to making participation in our project\nand our community a harassment-free experience for everyone.\nAll activity will abide by the [Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md).\n\n---\n\n## Our values and our strengths\n\nWe will be able to reach our milestones because we wield superpowers of a kind that is quite uncommon among developers of ML tools:\n\n- We have empathy for our users.\n- We value and practice good design.\n- We embrace openness and we are dedicated to fostering our developer community.\n- We value and practice good communication.\n- We have a unique brand that users love.\n\n**1) We have empathy for our users.** We know that every decision we make should be made with the user in mind, whether a design decision or a strategy decision. \"What will be the total impact of our choices on the people who rely on our software?\" is the question behind everything we do.\n\nHaving empathy for our users means:\n\n- Being users ourselves -- actively using our product in scenarios similar to those our users face.\n- Understanding our users by being closely in touch with them and having clear visibility into the developer experience: actively seeking community input in everything we do, listening to feedback, talking with users at external talks and developer events.\n- Putting ourselves in our users' shoes: always going above and beyond to be helpful to our users and to improve the user experience of our products.\n\n**2) We value and practice good design.** We are fully aware that a delightful UX is what has made us successful so far and what will keep making us successful in the future. We know that things should be as simple as possible (but no simpler). 
We prefer elegance and minimalism over technical prowess. We follow formal principles for good design.\n\n**3) We embrace openness and we are dedicated to fostering our developer community.** We know that long-term community building is critical to the success of our project, and we know that developer communities don't get built behind closed doors. We understand the necessity of doing our work in the open and the importance of involving open-source contributors at all stages of the development process.\n\n**4) We value and practice good communication.** We understand that great documentation, great code examples, and transparency of governance are essential to Keras adoption and contribute meaningfully to the Keras UX. We value external communication, documentation, and developer relations as much as we value technical contributions.\n\n**5) We value what makes us different and unique, and we value our brand that users love, Keras.** The Keras brand is an essential tool in reaching our goals: it stands for user-friendliness, accessibility, and good design. We are proud of our banner and we will carry it forward.\n"
  },
  {
    "path": "keras_api_design_guidelines.md",
    "content": "# Keras API design guidelines\n\nThese guidelines are meant to help focus design discussions and help us create delightful developer experiences.\n\nThese are meant as guidelines, not rules: each decision should be debated in its own unique context.\n\nSome text remixed from external references:\n\n- [User experience design for APIs](https://blog.keras.io/user-experience-design-for-apis.html)\n- [Notes to Myself on Software Engineering](https://medium.com/s/story/notes-to-myself-on-software-engineering-c890f16f4e4d)\n\n\n---\n\n## Design end-to-end workflows, not individual functions and classes.\n\nWhen developing APIs, start by designing end-to-end workflows, and only sketch out specific function/class signatures at the end.\n\n- The goal is to arrive to workflows that feel like they are purposefully designed and well-optimized, rather than cobbled together to route around the features provided by the API. The workflows should come first, before atomic features. **Features only exist to support a workflow.** No feature should exist to provide a capability “just in case”, “because we can”.\n- **Every design review document should prominently feature a code example of one or two end-to-end workflows showing the canonical use-case for the new API.**\n- Every time we discuss choices surrounding a specific API feature, we should start by asking: **in what workflows will this be used?** Then we should make the choice that makes the most sense with respect to these workflows. We should not make API design decisions about features in isolation. \n- This implies that we will often ask the question: **do users really need to configure this parameter?**, and in many cases, the answer will be “no”, rather than being “yes” by default.\n\n\n---\n\n## Carefully weigh whether a new feature should be included.\n\n\nIt’s okay to say no: just because someone asks for a feature doesn’t mean we should do it. 
Every feature has a cost that goes beyond the initial CL: maintenance cost, documentation cost, and cognitive cost for our users (a sprawling API surface is a major usability issue).\n\nIn particular, in the Keras API, every new feature has to be maintained in perpetuity, and has to be replicated in every implementation of the Keras API (which includes tf.keras, tensorflow.js, and other third-party implementations).\n\nAs such, our criteria for adding a new feature to the API are the following:\n\n- **It should be broadly useful to our users**, rather than a niche feature that is only relevant to a specific vertical of researchers. Niche features should be maintained independently by those who need them (e.g. by extending the API via subclassing), as third-party add-on packages.\n- **It should be widely recognized as a machine learning best practice.** We will not add new layers/etc that were recently published to ArXiv.org, even in case of claims of increased accuracy/etc. We only add new objects that are already commonly used in the machine learning community. Presumably, a new technique that does result in meaningful gains would be broadly adopted after a few months anyway (like ResNet), and that’s when we would be adding it to the core API. SIG-addons maintains a repository of significantly more volatile and independently maintained code to which the barriers to entry are lower.\n- **It should have an owner committed to maintaining it in the long term.** In particular, the code should be maintainable by multiple people on the team, not just by one technical guru.\n\n\nIn addition, when saying yes to a request for supporting a new use case, remember that **literally adding what the user/team requested is often not the optimal choice**. Users are focused on their own specific use case, and we must counter this with a holistic and principled vision of the whole project (see: designing end-to-end workflows, not atomic functions/classes). 
Often, the right answer is to extend an existing feature. **Find the natural place to integrate the new feature in existing APIs.**\n\n\n### Examples:\n\n- We should not have added the self-normalizing activation function to the API. It was added before passing the test of time, and that technique was later shown not to reach broad adoption. **Note that citation count is not a good metric of adoption**; that paper has a high citation count.\n- We should not move to core an API that has debuted somewhere on GitHub or TF-Addons but has failed to gain more than a few users after a few months.\n\n\n---\n\n## Seek to minimize cognitive load for our users.\n\nAlways seek to minimize the cognitive load imposed on our users in the course of using our APIs.\n\nAt a high level:\n\n- **Automate everything that can be automated.**\n- **Minimize the actions & choices required from the user.** Make sure default values for arguments are sensible and reflect best practices (so that users usually wouldn’t have to manually configure these). Don’t expose options that are not important or do not match real use cases, “just in case”.\n- **Design simple and consistent workflows that reflect simple and consistent mental models.**\n\nHere are a few practical rules:\n\n- **No API should deal with internal implementation details.** An API is a language for our users to talk about the problem they care about -- and they don’t care about our internal hacks. For instance, an option like `use_locking` in an optimizer should be avoided. If an argument requires users to understand the implementation (not just what the code is supposed to implement, like SGD in this case), then the argument should not be included in the public API. 
**An API is all about the problem it solves, not about how the code works in the background.**\n- **Introduce as few new concepts as possible.** It's not just that additional data structures require more effort in order to learn about their methods and properties, it's that they multiply the number of **mental models** that are necessary to grok your API. Ideally, you should only need **a single universal mental model around which everything is organized** (in Keras, that's the `Layer`). Definitely avoid having more than 2 or 3 mental models underlying the workflows you design. Likewise, avoid having concepts that are mostly overlapping but subtly different, since the difference will be difficult to convey clearly and will confuse our users (like, say, `Network` and `Model` -- this is why we don't export `Network` as a public API).\n- **Objects that do interchangeable things should have identical or very close APIs.** In particular they should have the same positional arguments. For example, it should be possible to swap one optimizer for another in user code (when leaving all arguments to their default value) without editing the arguments.\n- **If you find yourself proposing a signature with more than 6-7 arguments, consider whether all of these arguments are useful.** How many people and use cases would be affected if you removed one argument? How much would they be affected -- would they be able to easily extend the API (e.g. via subclassing) to support their use case without that built-in argument? Could this API be broken up into smaller, modular objects?\n- **Best-practices should come baked into your API.** The simplest way to use your API (leaving all arguments to their default value, using the most obvious tool for the task, etc) should be as close as possible to the best way of solving the problem. 
In particular, all arguments that can be given a default value should be given a default value, and that default should match the most common use case.\n- **Plain Python types are preferable to custom types.** Use tuples, strings, ints... A custom type requires more knowledge and effort on the part of the user (e.g. `TensorShape`, which is also breaking established conventions of scientific Python). **When using enums, make sure that their values are strings**, so as to make it possible for users to pass plain strings (example: `data_format=\"channels_last\"`, `padding=\"valid\"`).\n- **Explicit, single-level configuration arguments are preferable to nested, hidden configuration arguments.** Avoid something like: `MyLayer(hyperparameter_dict)`, instead use `MyLayer(units, activation=None, ...)`.\n- **No API should rely on TF Variable names or Op names.** These change all the time, and should be considered a convenience, not a part of the TensorFlow & Keras API.\n\nIn particular, naming is important and difficult:\n\n- **The meaning of an argument should be clear from its name and should not require knowledge that only the implementers have.** In particular, argument names should only involve recognized terms of art (“L1 norm” is a term of art), and should not involve implementation-related vocabulary (e.g. “fused batchnorm”).\n- **Avoid `OverlyLongAndSpecificNamingPatterns`.** If you find yourself with argument names that involve more than 3 subparts (e.g. “squared_operator_norm”), reconsider. Argument names should be intuitive and easy to remember.\n- Avoid overly generic names (`x`, `variable`, `parameter`).\n- **Make sure you are consistent in your naming choices.** Naming consistency means both **internal naming consistency** (don’t call `dim` what is called `axis` in other places, don’t call `ndims` what is called `ndim` elsewhere) and **consistency with established conventions for the problem domain (terms of art)**. 
Before settling on a name, make sure to look up existing names used by domain experts (or other APIs). In our case, argument names should be consistent with the broader scientific Python conventions, in particular NumPy.\n\nNote that Keras uses the following naming rules:\n\n- We use the convention `num_*` for counters, though omitting an explicit counter is nicer when there is no ambiguity (e.g. `units`, `epochs`, `filters`). \n- The rank of a tensor is its `ndim`. A specific dimension index is an `axis`. The number of dimensions in a linear projection (or similar) is `units`.\n- By convention Keras layers are named with nouns rather than verbs (e.g. `Normalization` and not `Normalize`, `Convolution` and not `Convolve`).\n- Following Python conventions, classes use capitalized parts (e.g. `ClassName`) and functions and methods use snake case (e.g. `function_name`).\n- If an argument name has a numerical suffix (e.g. `alpha_1`), we put an underscore before the suffix in snake case. The capitalized equivalent would be e.g. `Alpha1`.\n- We use fully spelled-out names, e.g. `attention_scores` and not `attn_scores`. There are a couple of standardized exceptions to this rule, in particular `dim` for \"dimension\" and `num` for \"number\". 
These are sufficiently common that they are not ambiguous to a first-time reader.\n\n\n### Example:\n\n```python\nMyConstructor(\n   per_variable_sparsity_config=[\n      'layer_1/kernel:0.8', 'layer_2/kernel:1.5'])\n```\n\nWhat's wrong with this?\n\n- Overly long argument name\n- Too much cognitive load involved in preparing an appropriate argument value\n- Preparing an argument value requires internal implementation knowledge\n- Reliance on TF variable names (subject to changes at any time, thus breaking this code)\n- Nested config adding indirection\n- Incorrect typing (float values being passed as strings)\n\nPossible alternative:\n\n```python\nobj = MyConstructor()\nobj.configure_sparsity(some_layer.kernel, value=0.8)\nobj.configure_sparsity(some_other_layer.kernel, value=1.5)\n```\n\nWhat's nice about this?\n\n- Object-based variable references.\n- Modular, simple action, with a clear name.\n- Plain Python types.\n\n\n---\n\n## Balance expressivity vs. user-friendliness.\n\n### Simple use cases should be simple, advanced use cases should be possible:\n\n**Don’t increase the cognitive load of common use cases for the sake of niche use cases**, even minimally.\n**Make sure that advanced users have a path to support their use case**, even if this path requires the users to roll out plugins or other API extensions (in particular via subclassing). **It is ok for advanced use cases not to be directly supported in the built-in API options.**\n\n\n### Keep our APIs modular.\n\n**Complex objects should be achievable by composing simple objects with few arguments, that do one thing reliably.** There is a balance to strike between having complex signatures on fewer objects, and having more objects with simpler signatures. A good API has a reasonable number of objects, with reasonably simple signatures (see also: avoiding signatures with more than 6-7 arguments).\n\n**Things that create state or side-effects should be classes. 
Functions should be stateless.**\nFor instance, layers that create weights should not be cast as functions, since it makes the weights (and other elements of state) hard to access, impossible to update, and forces reliance on a global state capturing the side effects of layer-functions.\n\n\n### APIs should be strictly compartmentalized.\n\nFor instance, the optimizer API or the layers API should not contain arguments for configuring distributed training. That should go into the distribution API.\n\n\n---\n\n## Don’t neglect error messages, docstrings, and documentation.\n\nDocumentation and error messages are an integral part of the API. Good docs and helpful error messages are key to a delightful user experience.\n\n- **Catch user errors early and anticipate common mistakes.** Do user input validation as soon as possible. Actively keep track of common mistakes that people make (by screening GitHub and StackOverflow), and either solve them by simplifying our API, adding targeted error messages for these mistakes, or having a \"solutions to common issues\" page in our docs. Consider adding automated fallback behaviors (e.g. casting a wrongly-typed input) instead of raising errors, when applicable. Be nice to our users.\n- **Provide detailed feedback messages upon user error.** Error messages should be contextual, informative, and actionable. Every error message that transparently provides the user with the solution to their problem means one less support ticket, multiplied by how many times users run into the same issue. 
A good error message should answer:\n    - What happened, in what context?\n    - What did the software expect?\n    - How can the user fix it?\n- **A docstring should answer the question: what is this about, and why & how should I use it?** It should assume as little context as possible, and it shouldn’t mention specialized terms without first introducing them (for example, “num_blocks: Number of blocks in the kernel” is not a good argument description if this is the first time you mention “blocks” in your docstring).\n- **Show, don’t tell: your documentation should not talk about how the software works, it should show how to use it.** Show code examples for end-to-end workflows; show code examples for each and every common use case and key feature of your API. **All docstrings should include code examples.**\n- **Deliberately design the user onboarding process for your feature.** How are complete newcomers going to find out the best way to solve their use case with your tool? Have an answer ready. Make sure your onboarding material closely maps to what your users care about: don't teach newcomers how your framework is implemented, teach them how they can use it to solve their own problems. After shipping a CL and writing good docstrings, make sure to create a Colab guide / tutorial showcasing the target workflow, and post it on the docs website or the TF blog.\n- The feature is not ready until:\n    - 1) Users know about it\n    - 2) They know how to use it\n    - 3) They're actually using it to solve the corresponding problem.\n\n\nNote that Keras uses the following rules for writing docstrings:\n\n- For class docstrings, document arguments in an `Arguments:` section in the class docstring, not in `__init__`.\n    - When a user creates a class, they are not calling the `MyLayer.__init__()` method as if it were a regular method, they are calling `MyLayer`. 
We don't want to generate documentation for the `__init__()` method as a standalone method that needs to be called directly; that would be confusing. We also don't need `__init__()` docstrings that always start with \"Initializes a MyLayer class.\", which is useless information. Leaving `__init__()` without a docstring is the best practice.\n    - If constructor arguments are documented in `__init__`, it forces us to programmatically copy the `__init__` docstring when generating docs and concatenate it to the class docstring. This means that the Arguments section becomes the last thing in the docstring, which is bad.\n- The order of information in a class docstring should be:\n    - One-line description of the class, that gives initial context to the user. e.g. `Applies Dropout to the input.` Make sure the one-line description is useful. No `Instantiates an ObscureName class instance.`\n    - Paragraph(s) of more detailed information that tells the user what the object is for and when they need to use it. e.g. `The Dropout layer randomly sets input units to 0 with a frequency of \"rate\" at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by \"1/(1 - rate)\" such that the sum over all inputs is unchanged. [...]`\n    - If there is a reference paper, cite it here.\n    - `Arguments` section.\n    - If it's a layer that has arguments in `call`, the `Call arguments` section.\n    - If it's a `Layer`, `Input shape` and `Output shape` sections.\n    - Example(s).\n    - Lastly, addendum. Information that isn't very important and that most users don't need, but that should be documented somewhere.\n        - e.g. the section \"About the layer's `dtype` attribute\" in the base Layer class.\n        - e.g. warnings about edge cases or compatibility issues.\n        - e.g. 
pointers to further guides and tutorials.\n\n\n### Error messages: a case study\n\n\nThe following would be a very poor error message:\n\n```\nAssertionError: '1 != 3'\n```\n\nIn general, to validate user input, always use `ValueError` and avoid `assert`.\n\nAlso bad:\n\n```\nValueError: 'Invalid target shape (600, 1).'\n```\n\nThe following is better, but still not sufficient, because it does not tell the user what they passed, and does not quite say how to fix it:\n\n```\nValueError: 'categorical_crossentropy requires target.shape[1] == classes'\n```\n\nNow, here's a good example, that says **what was passed**, **what was expected**, and **how to fix the issue**:\n\n```\nValueError: '''You are passing a target array of shape (600, 1) while using as loss `categorical_crossentropy`.\n`categorical_crossentropy` expects targets to be binary matrices (1s and 0s) of shape (samples, classes).\nIf your targets are integer classes, you can convert them to the expected format via:\n\n---\nfrom keras.utils import to_categorical\ny_binary = to_categorical(y_int)\n---\n\nAlternatively, you can use the loss function `sparse_categorical_crossentropy` instead, which does expect integer targets.'''\n```\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "project_setup_best_practices.md",
    "content": "# Best Practices for Managing Keras Projects on GitHub\n\nThis document describes the best practices for managing the projects under\n\"keras-team\" on GitHub which use GitHub as the source of truth, including\n[keras-tuner](https://github.com/keras-team/keras-tuner),\n[autokeras](https://github.com/keras-team/autokeras),\n[keras-cv](https://github.com/keras-team/keras-cv),\n[keras-nlp](https://github.com/keras-team/keras-nlp),\nand maybe more in the future. It covers linting, formating, testing, continuous\nintegration, issues and pull requests tagging, and so on.\n\nThe goal of this document is to:\n* Improve the overall quality of the projects. The fact that projects all\n  follow the same standard for dev process, which may evolve through time, will\n  ensure the quality from all aspects.\n* Unify the external contributing experience. The external open-source\n  contributors may contribute to multiple Keras projects by submitting issues\n  or pull requests. They don't need to learn from different contributing\n  guides.\n* Save time for the project leads. They save time by copying and pasting the\n  same setup and by avoiding the listed caveats.\n\n## Testing\n\n### Testing framework\n\nWe use [pytest](https://docs.pytest.org/en/6.2.x/) for writing tests for the\nprojects, which is the most widely used testing framework for Python in the OSS\nworld. The configuration of pytest is\n[here](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg#L4-L16).\n\n### File locations for the tests\n\nUnit tests should be contained in sibling files, relative to the class or\nutility files they are testing. The name of a test file should follow the\npattern of `*_test.py`. 
For example, the tests for\n`/keras_tuner/engine/hyperparameters.py` are in\n`/keras_tuner/engine/hyperparameters_test.py`.\n\nIntegration tests may be contained in their own `/keras_tuner/integration_tests`\ndirectory, as they may require extra files such as data.\n\nWhile our unit test placement is not suggested in the\n[good practices of pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html)\ndoc, we recommend this approach to improve the discoverability of the unit\ntests for new contributors. This discoverability doubles as a method of\ndocumentation; when users want to see what `util.utility_function()` does, they\ncan simply open the conveniently located sibling file, `util_test.py`.\n\n### Test Coverage\n\nWe use [CodeCov](https://about.codecov.io/) to track the test coverage. You may\nalso refer to\n[these settings](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg#L24-L28)\nin `setup.cfg`. We will see more about it in the continuous integration section.\n\nPytest CodeCov supports a wildcard exclude field, which should be set to\ninclude `*_test.py`, so as to ensure that tests are not included in the code\ncoverage count.\n\n### Useful code snippets\nFix the random seed for all tests:\n[Link1](https://github.com/keras-team/keras-tuner/blob/1.1.0/tests/conftest.py#L8-L17),\n[Link2](https://github.com/keras-team/keras-tuner/blob/master/tests/unit_tests/randomness_test.py),\n[Link3](https://www.tensorflow.org/api_docs/python/tf/keras/utils/set_random_seed).\n\nCreate a temporary path for testing: [Link](https://docs.pytest.org/en/6.2.x/tmpdir.html).\n\n## Code styles\n\n### Importing Keras modules\n\nFor projects based on Keras and TensorFlow, top-level imports are encouraged, as\nshown in the following example.\n\n```py\nimport tensorflow as tf\nfrom tensorflow import keras\n```\n\nExceptions may be acceptable when the module appears many times in the code,\nlike `keras.layers`.\n\n### Linting and formatting\n\nWe 
use\n[black](https://black.readthedocs.io/en/stable/),\n[isort](https://pycqa.github.io/isort/), and\n[flake8](https://flake8.pycqa.org/en/latest/)\nto lint and format the code. black formats the code in general. isort sorts\nthe imports. flake8 performs some additional checks that black doesn't do,\nlike the long lines with a single string. You can see the relevant sections of\n[setup.cfg](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg) for\nthe detailed configuration of these tools.\n\nThe user does not need to know how to use these tools to lint or format the\ncode. We provide them with two shell scripts:\n[`/shell/lint.sh`](https://github.com/keras-team/keras-tuner/blob/master/shell/lint.sh)\nand\n[`/shell/format.sh`](https://github.com/keras-team/keras-tuner/blob/master/shell/format.sh).\nIn these scripts, we also check and add the Apache 2.0 License header to every\nfile.\n\n## Releasing\n\n### Release setups\n\nThe version number of the package is stored only in `/package_name/__init__.py`\nwith a single line of `__version__ = 'master'` on the master branch.\n[example](https://github.com/keras-team/keras-tuner/blob/1e13aabe5b6659340a8ee81328805479a57b2105/keras_tuner/__init__.py#L35)\n\nWe also need the `setup.py` file for the PyPI release.\n[example](https://github.com/keras-team/keras-tuner/blob/1e13aabe5b6659340a8ee81328805479a57b2105/setup.py)\n\nFor the `setup.py` file to grab the current version number from\n`/package_name/__init__.py`, we need additional lines in `setup.cfg`.\n[example](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg#L1-L2)\n\n### Draft a new release\n\nFor releasing a new version of the package, please follow these steps:\n* Create a new branch from the master branch.\n* Modify the `__version__` value in the new branch.\n* Create a new release on GitHub.\n  [Official tutorial](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository)\n\nNote that the 
continuous integration will upload it to PyPI automatically.\n\n### Excluding Sibling Tests\n\nUnit tests are hosted in sibling files relative to the files containing the\ncode they are testing. `setuptools.find_packages()` supports an\n[exclude field](https://github.com/pypa/setuptools/blob/f838bc6a170046c9fdfc2251e5466040a669ca12/setuptools/__init__.py#L52).\nThis field should contain `*_test.py` to ensure that tests are not packaged\nwith the release.\n\n## Continuous integration\n\nWe use [GitHub Actions](https://github.com/features/actions) for continuous\nintegration. It automates running tests, checking the code style, uploading\ntest coverage to CodeCov, and uploading new releases to PyPI.\n\nYou can refer to\n[this file](https://github.com/keras-team/keras-tuner/blob/master/.github/workflows/actions.yml)\nfor how to set it up. We use a single YAML file for all the GitHub Actions to\navoid installing the dependencies multiple times.\n\nTo use this setup, you also need to upload your CodeCov and PyPI credentials to\nthe project. Here is the\n[official tutorial](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository).\n\nMake sure you follow the naming of the following secrets for the GitHub Actions YAML file to work.\nName the CodeCov token as `CODECOV_TOKEN`.\nName the PyPI username and password as `PYPI_USERNAME` and `PYPI_PASSWORD`.\n\nWe should also test against tf-nightly every day to discover bugs and\nincompatibility issues early and well before the stable release of TensorFlow.\nThe CI setup for it is\n[here](https://github.com/keras-team/keras-tuner/blob/master/.github/workflows/nightly.yml).\n\n## Contributing experience\n\nWe will have a common CONTRIBUTING.md in `keras-team/governance` to be\ndistributed to the other repos. 
This\n[GitHub Action](https://github.com/marketplace/actions/file-sync) may be a good\nway to sync a centralized contributing guide to different repos.\nWe should also have\n[this directory](https://github.com/keras-team/keras-tuner/tree/master/.devcontainer)\nto support GitHub Codespaces, which is a trend on GitHub. It provides a\nweb-based IDE that saves contributors from setting up their own dev\nenvironment, which would attract more contributors.\n\n## Issues and pull requests\n\nWe will have the same issue and pull request\n[templates](https://github.com/keras-team/keras/tree/master/.github/ISSUE_TEMPLATE)\nacross projects in `keras-team`. They will also be stored in\n`keras-team/governance` and be distributed to the other repos.\n\nWe also need to confirm whether there is a way to unify the tags across the repos.\n"
  },
  {
    "path": "rfcs/20190502-preprocessing-layers.md",
    "content": "# Keras Preprocessing Layers\n\n| Status        | Accepted      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Mark Omernick (momernick@google.com), Stan Bileschi (bileschi@google.com), Kester Tong (kestert@google.com), Francois Chollet (fchollet@google.com) |\n| **Updated**   | 2019-05-21                                           |\n\n\n## Objective\n\nWe aim at providing additional Keras layers to handle [data preprocessing operations](https://en.wikipedia.org/wiki/Data_pre-processing)\nsuch as text vectorization, data normalization, and data discretization (binning).\nThese operations are currently handled separately from a Keras model via utilities such\nas those from `keras.preprocessing`.\n\nThese new layers will allow users to include data preprocessing directly in their Keras model, so as to create models that map raw data (such as uint8 tensors for images, or string tensors for text) to predictions.\n\n\n## Key benefits\n\nIncluding preprocessing layers in the Keras model means that the same preprocessing steps will be performed when that model is exported and used in serving.\nIt also means the steps will be part of the model when the model is saved and loaded as part of another model.\n\nThis presents the following advantages:\n\n- Model portability (encapsulation for sharing models). With PreprocessingLayers, your Keras Model contains all the preprocessing it requires. If another user wishes to use your model in a different workflow, there is no risk of incorrect preprocessing. Models will be more end-to-end.\n- Serving reliability. The Model object will contain everything you expect to be done at serving time.\n- Simpler optimization using tf.data and tf.Transform. By providing simple, well defined building blocks for preprocessing, we simplify the process of using tf.data and tf.Transform to optimize preprocessing steps. 
Users can offload computation of vocabularies, quantiles, and mean and variance to tf.Transform. They can also use tf.data to move data preprocessing in training off the critical path. The preprocessing layer API is designed to make both of these easy and simple.\n\nIn particular, we expect preprocessing layers to make it easier to serve models in TF.js or in mobile applications. It will also reduce the risk that benchmarks of Keras applications use incorrect preprocessing and subsequently publish invalid findings.\n\n\n## Design overview\n\n### End-to-end workflow overview\n\nCase where a user has a single preprocessing layer to do image normalization.\n\n```python\nnormalization = keras.layers.Normalization(axis=-1)\nnormalization.adapt(data_sample)\n\nmodel = keras.Sequential([\n    normalization,\n    keras.applications.ResNet50(weights=None),\n])\nmodel.fit(data, targets, epochs=10)\n```\n\nCase where a user has a single preprocessing layer to do text vectorization where each input sample is encoded as a sequence of word indices.\n\n```python\nvectorization = keras.layers.TextVectorization(mode='int')\nvectorization.adapt(data_sample)\n\nmodel = keras.Sequential([\n    vectorization,\n    keras.layers.Embedding(128),  # The number of int indices is not specified since it is inferred.\n    keras.layers.LSTM(32),\n    keras.layers.Dense(10, activation='softmax'),\n])\nmodel.fit(data, targets, epochs=10)\n```\n\nCase where a user has a single preprocessing layer to do text vectorization where each input sample is encoded as a dense vector of TF-IDF scores.\n\n```python\nvectorization = keras.layers.TextVectorization(mode='tfidf')\nvectorization.adapt(data_sample)\n\nmodel = keras.Sequential([\n    vectorization,\n    keras.layers.Dense(10, activation='softmax'),\n])\nmodel.fit(data, targets, epochs=10)\n```\n\nCase where a user chains a normalization step with a discretization step.\n\n```python\nnormalization = keras.layers.Normalization()\ndiscretization = 
keras.layers.Discretization()\npreprocessing_stage = keras.layers.PreprocessingStage([normalization,\n                                                       discretization])\npreprocessing_stage.adapt(data_sample)\n\nmodel = keras.Sequential([\n    preprocessing_stage,\n    keras.layers.Dense(10, activation='softmax'),\n])\nmodel.fit(data, targets, epochs=10)\n```\n\n\n### Base class: `PreprocessingLayer`\n\nAll preprocessing layers inherit from a base class: `PreprocessingLayer`, which itself inherits from `Layer`.\n\nThis class presents a few key differences compared to regular layers:\n\n**Separate training mechanism**\n\nThe internal state of a `PreprocessingLayer` is not affected by backpropagation: all of its weights are non-trainable. A `PreprocessingLayer` has to be trained in a separate step, as follows:\n\n```python\npreprocessing_layer.adapt(data_sample)\n```\n\n**Possible non-differentiability**\n\nPreprocessing layers extend Keras by allowing preprocessing to be part of the model. Unlike existing layers, these computations are not always differentiable, e.g. both `Discretization` and `TextVectorization` are non-differentiable.\n\nAs a result, all preprocessing layers are treated as frozen when used as part of a model. In addition, if a non-differentiable layer is used in the middle of a model (rather than at the start), the model will raise an exception related to differentiability when trying to compute gradients (e.g. 
as part of `fit`).\n\n\n### New layers\n\n- `PreprocessingLayer` base class: implements shared logic, in particular the `adapt` method for setting the state of the layer.\n- `PreprocessingStage` class: makes it possible to chain multiple preprocessing layers together while training them in one single `adapt` call (by doing cascading training of the underlying layers).\n- `Normalization`: normalizes data feature-wise by subtracting the mean of some sample dataset and dividing by the variance.\n- `Discretization`: transforms continuous data into one-hot encoded binary vectors representing the different \"bins\" that the continuous data belongs to.\n- `TextVectorization`: transforms string data into either dense vectors (e.g. TF-IDF transform) or sequences of token indices (e.g. to be passed to an `Embedding` layer).\n\n\n## Design details\n\n### Detailed layer signatures\n\n#### PreprocessingLayer\n\n```python\ndef adapt(self, data, reset_state=True):\n    \"\"\"Fits the state of the preprocessing layer to the data being passed.\n\n    Arguments:\n        data: The data to train on. It can be passed either as a tf.data Dataset,\n            or as a numpy array (or a dict or list of arrays in case of multi-input\n            preprocessing stages).\n        reset_state: Optional argument specifying whether to clear the state of the\n            layer at the start of the call to `adapt`, or whether to start from\n            the existing state. 
This argument may not be relevant to all\n            preprocessing layers: a subclass of PreprocessingLayer may choose to\n            only implement `adapt(self, data)`.\n    \"\"\"\n```\n\n#### PreprocessingStage\n\nThere are two ways to instantiate a `PreprocessingStage` layer: either `Sequential` style (pass a list of preprocessing layer instances) or Functional style (pass the inputs and outputs of a DAG of preprocessing layers).\n\nIf any layers other than `PreprocessingLayer` instances are included in a `PreprocessingStage`, these layers will be treated as frozen both during `adapt` and later during `fit`.\n\n\n#### Normalization\n\n```python\ndef __init__(self, axis=-1, **kwargs):\n    \"\"\"Feature-wise normalization of the data.\n\n    Arguments:\n        axis: Integer or tuple of integers, the axis or axes\n            that should be normalized (typically the features axis).\n\n    Input shape and type:\n        dtype: floating point.\n        shape: any shape with rank >= 2 is accepted.\n\n    Output shape and type:\n        dtype: same as input.\n        shape: same as input.\n\n    What happens in `adapt`:\n        Compute mean and variance of the data\n        and store them as the layer's weights.\n    \"\"\"\n```\n\n#### Discretization\n\n```python\ndef __init__(self, bins=None, strategy='quantiles', sparse=False, **kwargs):\n    \"\"\"Maps continuous data into one-hot binary vectors of bin indicators.\n\n    Each non-overlapping bin covers\n    a contiguous portion of the dimension considered.\n    Bin boundaries can be provided by the user or learned as quantiles.\n\n    Arguments:\n        bins: int | List<float>\n            If bins is an int, then bin boundaries are to be learned,\n            and the width of the output will be exactly bins.\n            For instance, setting bins to 4 implies that\n            inputs are to be sorted into quantiles,\n            and three boundaries are to be learned,\n            corresponding to the 25th, 50th, 
and 75th percentile values.\n            If, instead, bins is a list of floats, then those are\n            the bin boundary values and nothing is to be learned.\n            The width of the output will in that case be len(bins) + 1.\n        strategy: callable | 'quantiles'\n            If strategy is the string 'quantiles' (default),\n            then bin boundaries will be learned such that each bin\n            receives an approximately equal number of sample input values.\n            `strategy` may also be a callable that takes\n            (float value, list[float] boundaries) and returns\n            an int bucket_index which represents\n            which bucket to map `value` to.\n        sparse: If True, the layer will output a SparseTensor.\n            Otherwise it will be dense.\n            This does not change the shape or structure of the output.\n            Specifically tf.sparse.to_dense(output) will be the same for both.\n\n    Input shape and type:\n        dtype: floating point.\n        shape: [batch_size, ..., features]\n\n    Output shape and type:\n        dtype: int\n        shape: [batch_size, ..., features, num_bins]\n            i.e., the same as the input shape,\n            with an additional dimension corresponding to\n            the number of bins, which is equal to either\n            the bins constructor argument (if it is an integer),\n            or the length of the bins constructor argument plus 1,\n            if it is a list.\n\n    What happens in `adapt`:\n        We use a streaming quantile estimator to update the bin boundaries\n        so that statistically an element is about equally likely\n        to fall into any bin.\n        Multiple calls to update continue to mutate\n        the layer based on all data seen so far.\n    \"\"\"\n```\n\n#### TextVectorization\n\nThis layer has basic options for managing text in the Keras model.\nIt is expected that more advanced users needing custom control will use 
Keras-compatible layers provided by tf.text.\n\nThis layer transforms a batch of strings (one sample = one string) into either a list of token indices\n(one sample = 1D int tensor), or a dense representation (one sample = 1D float vector).\n\nThe processing of each sample unfolds as:\n- Standardize each sample (usually lowercasing + punctuation stripping)\n- Split each sample into substrings (usually words)\n- Recombine substrings into tokens (usually ngrams)\n- Index tokens (associate a unique int value with each token)\n- Transform each sample using this index, either into a vector of ints or a dense float vector.\n\n\n```python\ndef __init__(self,\n             tokens=None,\n             standardize='lower_and_strip_punctuation',\n             split='whitespace',\n             ngrams=1,\n             mode='int',\n             max_length=None):\n    \"\"\"Transforms text into dense vectors or sequences of word indices.\n\n    Arguments:\n        tokens: None (default) | int | list<string>\n            If tokens is an int, then this layer will learn\n            an internal vocabulary of size (tokens - 2),\n            such that each of the most frequent (tokens - 2) words\n            is assigned to one of the values in [0, tokens).\n            The output will have a total of `tokens` possible values,\n            once the out-of-vocabulary value (1)\n            and the reserved masking value (0) are taken into account.\n            If tokens is None, the number of tokens is automatically inferred\n            from the training data (the output will have a number\n            of possible values equal to the total number of unique tokens\n            seen in the data, plus 2).\n            If, instead, tokens is a list of strings, then it constitutes\n            exactly a map from string to integer,\n            and there is nothing to be learned.\n            The vocabulary output width will be len(tokens) + 2,\n            accounting for the out-of-vocabulary 
value (1)\n            and the reserved masking value (0).\n        standardize: 'lower_and_strip_punctuation' (default) | None | callable string -> string\n            if standardize is the string \"lower_and_strip_punctuation\",\n            each sample is converted to lowercase\n            and the following characters are stripped from each sample\n            before splitting: '!\"#$%&()*+,-./:;<=>?@[\\\\]^_`{|}~\\t\\n'\n            if it is a callable, that callable is used\n            to preprocess each input string before splitting.\n        split: 'whitespace' (default) | None | Callable string -> list<string>\n            if split is 'whitespace', then the string\n            will be split on whitespace characters.\n            if split is None, then each string is treated as a single token.\n            if, instead, split is a function from strings to lists of strings,\n            then that function will be applied to each string in the input.\n        ngrams: 1 (default) | 2 | 3\n            Controls the ngram functionality of this layer.\n            This layer performs ngrams by concatenating strings\n            with no separator and no begin or end tokens;\n            the ngramming algorithm is not configurable.\n            if ngrams is an int N = 2 or 3,\n            the substrings returned by the split function\n            are combined into N-grams before being indexed.\n        mode: 'int' (default) | 'count' | 'binary' | 'tfidf'\n            controls how the integerized words are\n            reduced and packed into an output vector.\n            if mode is 'count', then the output vector will be\n            of length tokens, and the element at position i will\n            summarize how many times the string mapping to\n            integer i occurred in the split input.\n            If, instead, mode is 'binary',\n            then the output vector will be the same as for 'count'\n            but will contain a 1 if the count is greater 
than 0.\n            if, instead, mode is 'tfidf',\n            then the output vector will be the same as for 'count',\n            but instead of counts of tokens, will contain\n            the weighted count where weights are determined\n            by the TF-IDF algorithm.\n            if, instead, mode is 'int', then the output vector is\n            an int tensor where each int is the index of one token\n            in the input string.\n        max_length: None (default) | int.\n            Only used if mode='int'. If set to an int,\n            the output int tensors are of shape [..., max_length],\n            with longer sequences being truncated at the end and\n            shorter sequences being right-padded.\n            If set to None, output sequences are\n            of shape [..., max_length_in_batch],\n            where max_length_in_batch is the length\n            of the longest sequence in the current batch:\n            shorter sequences get right-padded.\n\n    Input shape and type:\n        dtype: string.\n        shape: (batch_size, ..., 1)\n\n    Output shape and type:\n        if `mode='int'`:\n            dtype: int\n            shape: (batch_size, ..., max_length), where max_length\n                is the length of the longest token sequence in the current batch, or\n                the value of the argument `max_length` if it was passed.\n        else:\n            dtype: floating point.\n            shape: (batch_size, ..., num_tokens)\n\n    What happens in `adapt`:\n        We build an index mapping tokens to token indices,\n        and in the case of `mode='count'` and `mode='tfidf'`,\n        we keep track of how many times each token has appeared.\n    \"\"\"\n```\n\n### Writing a subclass of `PreprocessingLayer`\n\nThe following 4 methods should be overridden:\n\n- `__init__`: constructor of the layer, used to configure its behavior.\n- `build(self, inputs_shape)`: creates the state variables of the layer.\n- `call(self, inputs)`: 
transforms the inputs (should only be called after `adapt` has been called).\n- `adapt(self, data, [reset_state=True])`: sets the state of the layer given the data provided (either as a tf.data dataset or numpy array(s)). The `reset_state` argument is optional and may be ignored.\n\n\n### Handling of async prefetching\n\nSome preprocessing ops are CPU-only and benefit from being executed asynchronously on the accelerator host (as opposed to the accelerator itself, e.g. GPU or TPU),\nwith a batch of data being preprocessed on the host while the previous batch is being processed by the accelerator. This pattern is known as \"async prefetching\".\n\nThis is normally done as part of a tf.data pipeline. The current proposal implies moving some of that preprocessing inside the model itself, which is normally\nexecuted end-to-end on an accelerator.\n\nThis means that we need a way to lift the preprocessing part of the model into a tf.data pipeline during model training. In `fit`, we can do this automatically.\nIn custom training loops, we will expect the user to do it manually (see subsection \"Custom training loops\").\n\nWe propose the addition of two new methods on the `Model` class:\n\n```python\ndef get_preprocessing_stage(self):\n    \"\"\"Retrieves the preprocessing part of the model.\n\n    This is the part of the model that should be executed asynchronously\n    on the device host during training.\n\n    Returns:\n        Instance of `PreprocessingLayer` or `PreprocessingStage`.\n        May be None if the model does not start with preprocessing layers.\n    \"\"\"\n    pass\n\ndef get_main_stage(self):\n    \"\"\"Retrieves the main processing part of the model.\n\n    This is the part of the model that should be executed\n    on the accelerator device.\n\n    Returns:\n        Model instance.\n    \"\"\"\n    pass\n```\n\nThus, for any model that starts with preprocessing layers, the following:\n\n```python\noutputs = model(inputs)\n```\n\nis functionally equivalent 
to:\n\n```python\npreprocessed_inputs = model.get_preprocessing_stage()(inputs)\noutputs = model.get_main_stage()(preprocessed_inputs)\n```\n\n\n#### Examples:\n\nSequential model with a preprocessing layer:\n\n```python\nvectorization = keras.layers.TextVectorization()\nvectorization.adapt(data_sample)\n\nmodel = keras.Sequential([\n    vectorization,\n    keras.layers.Dense(10, activation='softmax'),\n])\n\n# This is the `vectorization` layer.\npreproc_stage = model.get_preprocessing_stage()\n# Model containing the `Dense` layer only.\nmain_stage = model.get_main_stage()\n```\n\nFunctional model with 2 branches, each with a preprocessing layer:\n\n```python\nnormalization_a = layers.Normalization()\nnormalization_b = layers.Normalization()\nnormalization_a.adapt(data_a)\nnormalization_b.adapt(data_b)\n\ninput_a = Input(shape_a)\ninput_b = Input(shape_b)\nnormed_a = normalization_a(input_a)\nnormed_b = normalization_b(input_b)\na = layers.Dense(32)(normed_a)\nb = layers.Dense(32)(normed_b)\nc = layers.concatenate([a, b])\noutputs = layers.Dense(1, activation='sigmoid')(c)\n\nmodel = Model([input_a, input_b], outputs)\n\n# `PreprocessingStage` instance\n# mapping `[input_a, input_b]` to `[normed_a, normed_b]`\npreproc_stage = model.get_preprocessing_stage()\n\n# Model instance mapping `[normed_a, normed_b]` to `outputs`.\nmain_stage = model.get_main_stage()\n```\n\nSubclassed model with a preprocessing layer:\n\n```python\nclass MyModel(Model):\n\n    def __init__(self, **kwargs):\n        super(MyModel, self).__init__(**kwargs)\n        self.preproc_layer = layers.Normalization()\n        self.submodel = MySubmodel()\n\n    def call(self, inputs):\n        return self.submodel(self.preproc_layer(inputs))\n\n    def get_preprocessing_stage(self):\n        return self.preproc_layer\n\n    def get_main_stage(self):\n        return self.submodel\n```\n\n\n#### The case of the built-in `fit` loop\n\n\nWhen calling `fit` or `evaluate` with a Dataset on a model that contains 
preprocessing layers,\nthe lifting happens automatically and the user-facing workflow doesn't change.\n\n```python\nmodel.fit(dataset, epochs=10)\n```\n\n#### Custom training loops\n\nWhen writing custom training loops, the user must manually do the lifting of the preprocessing stage\ninto the data pipeline:\n\n```python\nmodel = Model(...)\npreproc_stage = model.get_preprocessing_stage()\nmain_model = model.get_main_stage()\n\npreproc_dataset = Dataset(...)\npreproc_stage.adapt(preproc_dataset)\n\n# Map the preprocessing stage on the dataset.\ndataset = Dataset(...)\ndataset = dataset.map(preproc_stage)\n\n# Regular training loop (using `main_model`).\nfor x, y in dataset:\n    with GradientTape() as tape:\n        y_pred = main_model(x)\n        loss = loss_fn(y, y_pred)\n        ...\n```\n\nIn general, you won't have to refer to `get_preprocessing_stage` and `get_main_stage` directly, because you will\nprobably already have direct handles on your preprocessing layer and the rest of the model:\n\n```python\nnormalization = layers.Normalization()\nnormalization.adapt(preproc_dataset)\ndataset = dataset.map(normalization)\n\nfor x, y in dataset:\n    with GradientTape() as tape:\n        y_pred = model(x)\n        loss = loss_fn(y, y_pred)\n        ...\n```\n\n\n## Questions and Discussion Topics\n\n### Naming Discussion\n\n#### Naming conventions to follow for preprocessing layers\n\n[RESOLUTION: we will use option A]\n\nWe have two possible sets of names for the layers:\n\n##### Option A: Normalization, Discretization, TextVectorization\n\nPros: consistent with most existing layers, in particular BatchNormalization.\nCons: It's longer.\n\n##### Option B: Normalize, Discretize, VectorizeText\n\nPros: It's shorter.\nCons: Normalize vs BatchNormalization is jarring.\n\n\n#### Using the name \"preprocessing\" or \"processing\"\n\n[RESOLUTION: we will use option A, \"preprocessing\"]\n\nIt has been proposed that we use the name \"processing\" throughout the API 
instead of \"preprocessing\".\n\n##### Option A: \"preprocessing\".\n\nPros:\n1) The meaning of \"preprocessing\" is clear to all users (\"data normalization and stuff\").\n2) We need a clear semantic boundary between the main data processing flow of a model and what goes before it (the preprocessing stage).\n3) It replaces the functionality of the `keras.preprocessing` module, and should be consistent with this naming convention.\n\nCons:\nThe `Normalization` layer, being differentiable, can be used in the middle of a model, rather than at the start.\nHowever, there's nothing weird about keeping the name \"preprocessing\" in this specific case: it is widely understood that a `Normalization` layer is doing \"data preprocessing\", independently of where you use it -- in fact, normalization is the first example that shows up in most definitions of \"data preprocessing\". \n\n\n##### Option B: \"processing\".\n\nPros: The Normalization layer can be used elsewhere in a model than at the start (although it would have to be trained separately).\nCons: It's very generic, and does not clearly convey the difference between \"preprocessing stage\" and \"main processing stage\" required by the async prefetching API.\n\n\n#### Name to use for `adapt` method\n\n[RESOLUTION: decision delayed until implementation]\n\nWe may want to use the name `fit` instead (other suggestions welcome).\n\nPros of using `fit`: consistency with `model.fit()`, and the `fit` method on `ImageDataGenerator` and `Tokenizer` from the `keras.preprocessing` module.\nCons of using `fit`: It may confuse users, since `preprocessing_layer.fit()` would have a different signature.\n\n---\n\n[OTHER ADDITIONS FROM DESIGN REVIEW]\n\n- We should decouple the user-facing `adapt(data)` method (or `fit(data)`), and the implementer-facing method, so as to make it easier to implement support for different data formats.\n\n\n\n"
  },
  {
    "path": "rfcs/20190729-keras-preprocessing-redesign.md",
    "content": "# Keras Preprocessing API\n\n| Status        | Proposed      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Francois Chollet (fchollet@google.com), Frederic Branchaud-Charron (Frederic.Branchaud-Charron@usherbrooke.ca)|\n| **Updated**   | 2019-08-21                                           |\n\n\n## Context\n\n`tf.data.Dataset` is the main API for data loading and preprocessing in TensorFLow. It has two advantages:\n\n- It supports GPU prefetching\n- It supports distribution via the Distribution Strategies API\n\nMeanwhile, `keras.preprocessing` is a major API for data loading and preprocessing in Keras. It is based\non Numpy and Scipy, and it produces instances of the `keras.utils.Sequence` class, which are finite-length,\nresettable Python generators that yield batches of data.\n\nSome features of `keras.preprocessing` are highly useful and don't have straightforward equivalents in `tf.data`\n(in particular image data augmentation and dynamic time series iteration).\n\nIdeally, the utilities in `keras.preprocessing` should be made compatible with `tf.data`.\nThis presents the opportunity to improve on the existing API. In particular we don't have good support\nfor image segmentation use cases today.\n\nSome features are also being supplanted by [preprocessing layers](https://github.com/keras-team/governance/blob/master/rfcs/20190502-preprocessing-layers.md), in particular text processing. 
\nAs a result we may want to move the current API to an API similar to Layers.\n\n\n## Goals\n\n- Unify \"keras.preprocessing\" and the recently introduced [Preprocessing Layers API](https://github.com/keras-team/governance/blob/master/rfcs/20190502-preprocessing-layers.md).\n- Make all features of `keras.preprocessing` compatible with `tf.data`.\n- As a by-product, add required ops to TensorFlow (`tf.image`).\n\n\n## Proposed changes at a high level\n\n\n- Deprecate `ImageDataGenerator` in favor of a new `ImagePipeline` class similar to a `Sequential` model.\n- Inherit from `keras.layers.PreprocessingLayer` for all image transformations.\n- Deprecate the `Tokenizer` class in favor of the `TextVectorization` preprocessing layer.\n- Replace `TimeseriesGenerator` with a function-based API.\n\n\n## Detailed API changes\n\n\n### ImagePipeline\n\n#### Constructor\n\n`ImagePipeline` inherits from `PreprocessingLayer` (or alternatively `keras.models.Sequential`, whose behavior is similar) and takes a list of layers as inputs. In the future it will inherit from `PreprocessingStage`.\n\n`ImagePipeline` is a preprocessing layer that encapsulates a series of image transformations. 
Since some of these transformations may be trained (featurewise normalization), it exposes the method `adapt`, like all other preprocessing layers.\n\n\n```python\n\nclass ImagePipeline(Sequential):\n\n    def __init__(self, layers: List[Layer]):\n        ...\n```\n\n#### Example usage\n\n```python\npreprocessor = ImagePipeline([\n    RandomFlip(horizontal=True),\n    RandomRotation(0.2, fill_mode='constant'),\n    RandomZoom(0.2, fill_mode='constant'),\n    RandomTranslation(0.2, fill_mode='constant'),\n    Normalization(),  # This is the same Normalization introduced in preprocessing layers\n])\npreprocessor.adapt(sample_data)  # optional step in case the object needs to be trained\n\ndataset = preprocessor.from_directory(dir_name, image_size=(512, 512))\nmodel.fit(dataset, epochs=10)\n```\n\n#### Methods\n\n```python\ndef from_directory(\n    self,\n    directory,\n    targets='inferred',\n    target_mode='categorical',\n    class_names='inferred',\n    color_mode='rgb',\n    batch_size=32,\n    image_size=(255, 255),\n    shuffle=True,\n    seed=None,\n    follow_links=False,\n    validation_split=None,\n    subset=None):\n    \"\"\"Generates a Dataset from files in a directory.\n\n    # Arguments:\n        directory: Directory where the data is located.\n            If `targets` is \"inferred\", it should contain\n            subdirectories, each containing images for a class.\n            Otherwise, the directory structure is ignored.\n        targets: Either\n            \"inferred\" (targets are generated from the directory structure),\n            None (no targets),\n            or a list of integer labels of the same size as the number of image\n            files found in the directory.\n        target_mode:\n            - 'categorical' means that the inferred labels are\n                encoded as a categorical vector (e.g. 
for categorical_crossentropy).\n            - 'binary' means that the inferred labels (there can be only 2)\n                are encoded as binary scalars (e.g. for binary_crossentropy).\n        class_names: Only valid if \"targets\" is \"inferred\". This is the explicit\n            list of class names (must match names of subdirectories). Used\n            to control the order of the classes (otherwise alphanumerical order is used).\n        color_mode: One of \"grayscale\", \"rgb\", \"rgba\". Default: \"rgb\".\n            Whether the images will be converted to\n            have 1, 3, or 4 channels.\n        batch_size: Size of the batches of data (default: 32).\n        image_size: Size to resize images to after they are read from disk.\n          Since the pipeline processes batches of images that must all have the same size,\n          this must be provided.\n        shuffle: Whether to shuffle the data (default: True)\n            If set to False, sorts the data in alphanumeric order.\n        seed: Optional random seed for shuffling and transformations.\n        follow_links: Whether to follow links inside\n            subdirectories (default: False).\n        validation_split: Optional float between 0 and 1,\n            fraction of data to reserve for validation.\n        subset: One of \"training\" or \"validation\". 
Only used if `validation_split` is set.\n    \"\"\"\n\ndef from_dataframe(\n    self,\n    dataframe,\n    directory=None,\n    data_column='filename',\n    target_column='class',\n    target_mode='categorical',\n    weight_column=None,\n    color_mode='rgb',\n    batch_size=32,\n    image_size=(255, 255),\n    shuffle=True,\n    seed=None,\n    validation_split=None,\n    subset=None):\n    \"\"\"Generates a Dataset from a Pandas dataframe.\n\n    # Arguments:\n        dataframe: Pandas dataframe instance.\n        directory: The directory that image paths refer to.\n        data_column: Name of column with the paths for the input images.\n        target_column: Name of column with the class information.\n        target_mode:\n            - 'categorical' means that the inferred labels are\n                encoded as a categorical vector (e.g. for categorical_crossentropy).\n            - 'binary' means that the inferred labels (there can be only 2)\n                are encoded as binary scalars (e.g. for binary_crossentropy).\n        weight_column: Name of column with sample weight information.\n        color_mode: One of \"grayscale\", \"rgb\", \"rgba\". Default: \"rgb\".\n            Whether the images will be converted to\n            have 1, 3, or 4 channels.\n        batch_size: Size of the batches of data (default: 32).\n        image_size: Size to resize images to after they are read from disk.\n          Since the pipeline processes batches of images that must all have the same size,\n          this must be provided.\n        shuffle: Whether to shuffle the data (default: True)\n            If set to False, sorts the data in alphanumeric order.\n        seed: Optional random seed for shuffling and transformations.\n        validation_split: Optional float between 0 and 1,\n            fraction of data to reserve for validation.\n        subset: One of \"training\" or \"validation\". 
Only used if `validation_split` is set.\n    \"\"\"\n\ndef preview(self, data, save_to_directory=None, save_prefix=None, save_format='png'):\n    \"\"\"Enables users to preview the image augmentation configuration.\n\n    # Arguments\n        data: Image data. Could be strings (a list of image paths), a list of PIL image instances,\n            a list of arrays, or a list of eager tensors.\n        save_to_directory: Directory to save transformed images. Mandatory if not in a notebook.\n            If in a notebook and this is not specified, images are displayed in-line.\n        save_prefix: String, filename prefix for saved images.\n        save_format: String, extension for saved images.\n    \"\"\"\n```\n\n**Note:** `from_arrays` is not included since it is possible to transform Numpy data simply by calling the `ImagePipeline` object (like a layer).\n\n\n### Layers\n\nThe new data augmentation layers will inherit `keras.layers.Layer` and work in a similar way.\n\n```python\nResizing(height, width)  # Resize while distorting aspect ratio\nCenterCrop(height, width)  # Resize without distorting aspect ratio\nRandomCrop(height, width, seed=None)  # Return a (height, width) crop from a random location\nRescaling(value)  # Divide by `value`\nRandomFlip(horizontal=False, vertical=False, seed=None)\nRandomTranslation(amplitude=0., fill_mode='constant', fill_value=0., seed=None)\nRandomRotation(amplitude=0., fill_mode='constant', fill_value=0., seed=None)\nRandomZoom(amplitude=0., fill_mode='constant', fill_value=0., seed=None)\nRandomBrightness(amplitude=0., seed=None)\nRandomContrast(amplitude=0., seed=None)\nRandomSaturation(amplitude=0., seed=None)\nRandomWidth(amplitude=0., seed=None)  # Expand / shrink width while distorting aspect ratio\nRandomHeight(amplitude=0., seed=None)  # Expand / shrink height while distorting aspect ratio\n```\n\nThe `amplitude` argument may be:\n- a positive float: it is understood as \"fraction of total\" (total is the current width, or 
height, or 180 degrees in the case of `RandomRotation`). E.g. `0.2` results in variations in the [-20%, +20%] range. If larger than 1, it is clipped to 1 for the lower boundary (but not the upper boundary).\n- a tuple of 2 positive floats: understood as a fractional range, e.g. `(0.2, 0.4)` is interpreted as the [-20%, +40%] range. The first float may not be larger than 1.\n\nTo do a random center crop that zooms in and discards part of the image, you would do:\n\n```python\npreprocessor = ImagePipeline([\n  RandomZoom([0., 0.2]),\n  CenterCrop(height, width),\n])\n```\n\n\n#### Notes\n\n- We are dropping support for ZCA whitening as it is no longer popular in the computer vision community.\n- We don't have immediate support for random translations along only one axis.\n- We only plan on implementing support for `data_format='channels_last'`. As such this argument does not appear in the API.\n\n\n#### Example implementation\n\n```python\nclass RandomFlip(PreprocessingLayer):\n\n  def __init__(self, horizontal=False, vertical=False, seed=None):\n    super().__init__()\n    self.horizontal = horizontal\n    self.vertical = vertical\n    # `random_int` and `rng_from_seed` are placeholder helpers, not existing APIs.\n    self.seed = seed or random_int()\n    self._rng = rng_from_seed(self.seed)\n\n  def call(self, inputs, training=None, seed=None):\n    seed = seed or self._rng.sample()\n    if training:\n      if self.horizontal:\n        inputs = tf.image.random_flip_left_right(inputs, seed=seed)\n      if self.vertical:\n        inputs = tf.image.random_flip_up_down(inputs, seed=seed)\n    return inputs\n```\n\n\n\n#### Question: how to support image segmentation in a simple way?\n\n**Requirements:**\n- Image loading and image augmentation should be synced across inputs and targets\n- It should be possible to use different standardization preprocessing (outside of augmentation) across inputs and targets\n\n**Proposal:**\n\n```python\n# Shared spatial transformations for inputs and targets\naugmenter = ImagePipeline([\n    RandomRotation(0.5),\n    
RandomFlip(vertical=True)\n])\n\ninput_pipeline = ImagePipeline([\n    augmenter,\n    RandomBrightness(0.2),\n    RandomContrast(0.2),\n    RandomSaturation(0.2),\n])\ntarget_pipeline = ImagePipeline([\n    augmenter,\n    OneHot(num_classes)\n])\n\ninput_ds = input_pipeline.from_directory(\n    input_dir, targets=None, image_size=(150, 150), batch_size=32,\n    seed=123)  # This seed supersedes the per-layer seed in all transformations\ntarget_ds = target_pipeline.from_directory(\n    target_dir,  # target_dir should have same structure as input_dir.\n    targets=None, image_size=(150, 150), batch_size=32, seed=123)\n\nds = tf.data.Dataset.zip((input_ds, target_ds))\nmodel.fit(ds)\n```\n\nNote that having the `seed` argument in `from_directory` supersede the per-layer argument is achieved by using the seed\nto sample new random ints (scalar tensors from `tf.random.experimental.Generator`) to serve as the `call` argument to each underlying layer.\n\n\n### TimeseriesGenerator\n\n- Deprecate existing `TimeseriesGenerator` class\n- Introduce functional replacement `timeseries_dataset`:\n\n```python\ndef timeseries_dataset(\n      data, targets, length,\n      sampling_rate=1,\n      stride=1,\n      start_index=0,\n      end_index=None,\n      shuffle=False,\n      reverse=False,\n      batch_size=128):\n      \"\"\"Utility function for generating batches of temporal data.\n\n      This function takes in a sequence of data-points gathered at\n      equal intervals, along with time series parameters such as\n      stride, length of history, etc., to produce batches for\n      training/validation.\n\n      # Arguments\n          data: Indexable generator (such as list or Numpy array)\n              containing consecutive data points (timesteps).\n              The data should be at 2D, and axis 0 is expected\n              to be the time dimension.\n          targets: Targets corresponding to timesteps in `data`.\n              It should have the same 
length as `data`.\n          length: Length of the output sequences (in number of timesteps).\n          sampling_rate: Period between successive individual timesteps\n              within sequences. For rate `r`, timesteps\n              `data[i]`, `data[i-r]`, ... `data[i - length]`\n              are used to create a sample sequence.\n          stride: Period between successive output sequences.\n              For stride `s`, consecutive output samples would\n              be centered around `data[i]`, `data[i+s]`, `data[i+2*s]`, etc.\n          start_index: Data points earlier than `start_index` will not be used\n              in the output sequences. This is useful to reserve part of the\n              data for test or validation.\n          end_index: Data points later than `end_index` will not be used\n              in the output sequences. This is useful to reserve part of the\n              data for test or validation.\n          shuffle: Whether to shuffle output samples,\n              or instead draw them in chronological order.\n          reverse: Boolean: if `True`, timesteps in each output sample will be\n              in reverse chronological order.\n          batch_size: Number of timeseries samples in each batch\n              (except maybe the last one).\n\n      # Returns\n          A Dataset instance.\n      \"\"\"\n```\n\n"
  },
  {
    "path": "rfcs/20191212-keras-categorical-inputs.md",
    "content": "# Keras categorical inputs\n\n| Status        | Implemented (https://github.com/tensorflow/community/pull/209) |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com)|\n| **Sponsor**   | Karmel Allison (karmel@google.com), Martin Wicke (wicke@google.com) |\n| **Updated**   | 2019-02-22                                           |\n\n## Objective\n\nThis document proposes 5 new Keras preprocessing layers (KPL) (`StringLookup`, `CategoryCrossing`, `CategoryEncoding`, `Hashing`, `IntegerLookup`) and allow users to:\n* Perform basic feature engineering for categorical inputs\n* Replace feature columns and `tf.keras.layers.DenseFeatures` with proposed layers\n* Introduce sparse inputs that work with Keras linear models and other layers that support sparsity\n\nOther proposed layers for replacement of feature columns such as `tf.feature_column.bucketized_column` and `tf.feature_column.numeric_column` has been discussed [here](https://github.com/keras-team/governance/blob/master/rfcs/20190502-preprocessing-layers.md).\n\nThe proposed layers should support ragged tensors.\n\n## Motivation\n\nSpecifically, by introducing the 5 layers, we aim to address these pain points:\n* Users have to define both feature columns and Keras Inputs for the model, resulting in code duplication and deviation from DRY (Do not repeat yourself) principle. 
See this [GitHub issue](https://github.com/tensorflow/tensorflow/issues/27416).\n* Users with high-dimensional categorical inputs incur a large memory footprint and computation cost if the inputs are wrapped with an indicator column through `tf.keras.layers.DenseFeatures`.\n* Currently there is no way to correctly feed a Keras linear model or dense layer with multivalent categorical inputs, weighted categorical inputs, or shared embedding inputs.\n* Feature columns offer black-box implementations, mix feature engineering with trainable objects, and lead to\n  unintended coding patterns.\n\n## User Benefit\n\nWe expect these user pain points to go away once users migrate off feature columns.\n\n## Example Workflows\n\nTwo example workflows are presented below. These workflows can be found at this [colab](https://colab.sandbox.google.com/drive/1cEJhSYLcc2MKH7itwcDvue4PfvrLN-OR).\n\n### Workflow 1 -- Official guide on how to replace feature columns with KPL\n\nRefer to [tf.feature_column](https://www.tensorflow.org/api_docs/python/tf/feature_column) for a complete list of feature columns.\n\n1. Replacing `tf.feature_column.categorical_column_with_hash_bucket` with `Hashing`\nfrom\n```python\ntf.feature_column.categorical_column_with_hash_bucket(key, hash_bucket_size)\n```\nto\n```python\nkeras_input = tf.keras.Input(shape=(1,), name=key, dtype=dtype)\nhashed_input = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=hash_bucket_size)(keras_input)\n```\n\nNote that the hashed output from KPL will differ from the hashed output of the feature column, given how the seed is chosen. `Hashing` also supports a customized `salt`.\n\n2. `tf.feature_column.categorical_column_with_identity`\nThis feature column merely passes inputs through unchanged, except that it maps out-of-range values to `default_value`. This can easily be done at the data cleaning stage\nrather than as part of feature engineering, and is hence dropped in this proposal.\n\n3. 
Replacing `tf.feature_column.categorical_column_with_vocabulary_file` and `tf.feature_column.categorical_column_with_vocabulary_list` with `StringLookup` or `IntegerLookup`.\nFor string inputs,\nfrom\n```python\ntf.feature_column.categorical_column_with_vocabulary_file(key, vocabulary_file, vocabulary_size, tf.dtypes.string, default_value, num_oov_buckets)\n```\nto\n```python\nkeras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string)\nid_input = tf.keras.layers.experimental.preprocessing.StringLookup(max_tokens=vocabulary_size + num_oov_buckets,\n  num_oov_indices=num_oov_buckets, mask_token=None, vocabulary=vocabulary_file)(keras_input)\n```\n\nSimilarly, from\n```python\ntf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list, tf.dtypes.string, default_value, num_oov_buckets)\n```\nto\n```python\nkeras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string)\nid_input = tf.keras.layers.experimental.preprocessing.StringLookup(max_tokens=len(vocabulary_list) + num_oov_buckets, num_oov_indices=num_oov_buckets,\n  mask_token=None, vocabulary=vocabulary_list)(keras_input)\n```\n\n\nNote that `default_value` is mutually exclusive with `num_oov_buckets`. In the case of `num_oov_buckets=0` and `default_value=-1`, simply set `num_oov_indices=0`. We do not support\nany values other than `default_value=-1`.\n\nNote that the out-of-range indices for `StringLookup` are prepended, i.e., [0, ..., num_oov_indices), whereas for `categorical_column_with_vocabulary_file` they are\nappended, i.e., [vocabulary_size, vocabulary_size + num_oov_buckets). 
The former can give you more flexibility when reloading and adding vocab.\n\nFor integer inputs,\nfrom\n```python\ntf.feature_column.categorical_column_with_vocabulary_file(key, vocabulary_file, vocabulary_size, tf.dtypes.int64, default_value, num_oov_buckets)\n```\nto\n```python\nkeras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.int64)\nid_input = tf.keras.layers.experimental.preprocessing.IntegerLookup(max_values=vocabulary_size + num_oov_buckets, num_oov_indices=num_oov_buckets, mask_value=None, vocabulary=vocabulary_file)(keras_input)\n```\n\nSimilarly, from\n```python\ntf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list, tf.dtypes.int64, default_value, num_oov_buckets)\n```\nto\n```python\nkeras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.int64)\nid_input = tf.keras.layers.experimental.preprocessing.IntegerLookup(max_values=len(vocabulary_list) + num_oov_buckets, num_oov_indices=num_oov_buckets, mask_value=None, vocabulary=vocabulary_list)(keras_input)\n```\n\n\n4. Replacing `tf.feature_column.crossed_column` with `CategoryCrossing` or `Hashing`\nfrom\n```python\ntf.feature_column.crossed_column(keys, hash_bucket_size, hash_key)\n```\nto\n```python\nkeras_inputs = []\nfor key in keys:\n  keras_inputs.append(tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string))\nhashed_input = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=hash_bucket_size)(keras_inputs)\n```\n\nNote that when `hash_bucket_size=0`, no hashing is performed; in this case it should be replaced with:\n```python\nkeras_inputs = []\nfor key in keys:\n  keras_inputs.append(tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string))\ncrossed_input = tf.keras.layers.experimental.preprocessing.CategoryCrossing()(keras_inputs)\n```\n\n5. 
Replacing `tf.feature_column.embedding_column` with `tf.keras.layers.Embedding`\nNote that `combiner=sum` can be replaced with `tf.reduce_sum` and `combiner=mean` with `tf.reduce_mean` after\nthe embedding output. `sqrtn` can also be implemented using tf operations. For example:\n```python\ncategorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)\ntf.feature_column.embedding_column(categorical_column, dimension=dimension, combiner=\"sum\", initializer=initializer,\n  max_norm=max_norm)\n```\ncan be replaced with:\n```python\ncategorical_input = tf.keras.Input(name=key, dtype=tf.string)\nid_input = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary_list)(categorical_input)\nembedding_input = tf.keras.layers.Embedding(input_dim=len(vocabulary_list), output_dim=dimension,\n  embeddings_initializer=initializer, embeddings_constraint=tf.keras.constraints.MaxNorm(max_norm))(id_input)\nembedding_input = tf.reduce_sum(embedding_input, axis=-2)\n```\n\n6. Replacing `tf.feature_column.indicator_column` with `CategoryEncoding`\nfrom\n```python\ncategorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)\ntf.feature_column.indicator_column(categorical_column)\n```\nto\n```python\ncategorical_input = tf.keras.Input(name=key, dtype=tf.string)\nid_input = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary_list)(categorical_input)\nencoded_input = tf.keras.layers.experimental.preprocessing.CategoryEncoding(\n  max_tokens=categorical_column.num_buckets, output_mode=\"count\", sparse=True)(id_input)\n```\n\nNote that `CategoryEncoding` supports one-hot through `output_mode=\"binary\"` as well. This is a much more\nefficient approach than `tf.one_hot` + `tf.reduce_sum(axis=-2)` to reduce the multivalent categorical inputs.\n\nNote that by specifying the `sparse` flag, the output can be either a `tf.Tensor` or a `tf.SparseTensor`.\n\n7. 
Replacing `tf.feature_column.weighted_categorical_column` with `CategoryEncoding`\nfrom\n```python\ncategorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)\ntf.feature_column.weighted_categorical_column(categorical_column, weight_feature_key)\n```\nto\n```python\ncategorical_input = tf.keras.Input(name=key, dtype=tf.string)\nlookup_output = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary_list)(categorical_input)\nweight_input = tf.keras.Input(shape=(1,), dtype=tf.float32, name=weight_feature_key)\nweighted_output = tf.keras.layers.experimental.preprocessing.CategoryEncoding(\n  max_tokens=categorical_column.num_buckets)(lookup_output, weight_input)\n```\n\n8. Replacing `tf.feature_column.shared_embeddings` with a single `tf.keras.layers.Embedding`.\nSimilar to 5, but with multiple categorical inputs:\nfrom\n```python\nwatched_video_id = tf.feature_column.categorical_column_with_vocabulary_list('watched_video_id', video_vocab_list)\nimpression_video_id = tf.feature_column.categorical_column_with_vocabulary_list('impression_video_id', video_vocab_list)\ntf.feature_column.shared_embeddings([watched_video_id, impression_video_id], dimension)\n```\nto\n```python\nwatched_video_input = tf.keras.Input(shape=(1,), name='watched_video_id', dtype=tf.int64)\nimpression_video_input = tf.keras.Input(shape=(1,), name='impression_video_id', dtype=tf.int64)\nembed_layer = tf.keras.layers.Embedding(input_dim=len(video_vocab_list), output_dim=dimension)\nembedded_watched_video_input = embed_layer(watched_video_input)\nembedded_impression_video_input = embed_layer(impression_video_input)\n```\n\n9. Replacing `tf.estimator.LinearXXX` with `CategoryEncoding` and `tf.keras.experimental.LinearModel`.\nLinearClassifier or LinearRegressor treats categorical columns by multi-hot, this can be replaced by encoding layer and Keras linear model, see Workflow 2 for details.\n\n10. 
Replacing `tf.feature_column.numeric_column` and `tf.feature_column.sequence_numeric_column` with `tf.keras.Input` and `Normalization`.\nUse `tf.keras.layers.experimental.preprocessing.Normalization` with `set_weights` on mean and standard deviation.\n\n11. Replacing `tf.feature_column.sequence_categorical_xxx`.\nReplacing `tf.feature_column.sequence_categorical_xxx` is similar to `tf.feature_column.categorical_xxx` except that `tf.keras.Input` should include the time dimension in\n`input_shape` as well.\n\n12. Replacing `tf.feature_column.bucketized_column` with `Discretization`.\nfrom\n```python\nsource_column = tf.feature_column.numeric_column(key)\ntf.feature_column.bucketized_column(source_column, boundaries)\n```\nto\n```python\nkeras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.float32)\nbucketized_input = tf.keras.layers.experimental.preprocessing.Discretization(bins=boundaries)(keras_input)\n```\n\n\n### Workflow 2 -- Complete Example\n\nThis example gives code equivalent to the canned `LinearEstimator` [tutorial](https://www.tensorflow.org/tutorials/estimator/linear) on the Titanic dataset:\n\nRefer to this [colab](https://colab.sandbox.google.com/drive/1cEJhSYLcc2MKH7itwcDvue4PfvrLN-OR) to reproduce.\n\n```python\nimport pandas as pd\nimport tensorflow as tf\n\ndftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')\ny_train = dftrain.pop('survived')\n\nSTRING_CATEGORICAL_COLUMNS = ['sex', 'class', 'deck', 'embark_town', 'alone']\nINT_CATEGORICAL_COLUMNS = ['n_siblings_spouses', 'parch']\nNUMERIC_COLUMNS = ['age', 'fare']\n\nkeras_inputs = {}\nkeras_preproc_inputs = []\nfor key in STRING_CATEGORICAL_COLUMNS:\n  keras_input = tf.keras.Input(shape=(1,), dtype=tf.string, name=key)\n  keras_inputs[key] = keras_input\n  vocab = dftrain[key].unique()\n  keras_preproc_input = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocab, num_oov_indices=0, mask_token=None, name='lookup' + key)(keras_input)\n  keras_preproc_input = 
tf.keras.layers.experimental.preprocessing.CategoryEncoding(max_tokens=len(vocab), output_mode='count', sparse=True, name='encode' + key)(keras_preproc_input)\n  keras_preproc_inputs.append(keras_preproc_input)\n\nfor key in INT_CATEGORICAL_COLUMNS:\n  keras_input = tf.keras.Input(shape=(1,), dtype=tf.int64, name=key)\n  keras_inputs[key] = keras_input\n  vocab = dftrain[key].unique()\n  keras_preproc_input = tf.keras.layers.experimental.preprocessing.IntegerLookup(vocabulary=vocab, num_oov_indices=0, mask_value=None, name='lookup' + key)(keras_input)\n  keras_preproc_input = tf.keras.layers.experimental.preprocessing.CategoryEncoding(max_tokens=len(vocab), output_mode='count', sparse=True, name='encode' + key)(keras_preproc_input)\n  keras_preproc_inputs.append(keras_preproc_input)\n\nfor key in NUMERIC_COLUMNS:\n  keras_input = tf.keras.Input(shape=(1,), dtype=tf.float32, name=key)\n  keras_inputs[key] = keras_input\n  # Numeric features are passed through without preprocessing.\n  keras_preproc_inputs.append(keras_input)\n\nage_x_sex = tf.keras.layers.experimental.preprocessing.CategoryCrossing(name='age_x_sex_crossing')([keras_inputs['age'], keras_inputs['sex']])\nage_x_sex = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=100, name='age_x_sex_hashing')(age_x_sex)\nkeras_output_age_x_sex = tf.keras.layers.experimental.preprocessing.CategoryEncoding(max_tokens=100, output_mode='count', sparse=True, name='age_x_sex_encoding')(age_x_sex)\nkeras_preproc_inputs.append(keras_output_age_x_sex)\n\n\nlinear_model = tf.keras.experimental.LinearModel(units=1, kernel_initializer='zeros', activation='sigmoid')\nlinear_logits = linear_model(keras_preproc_inputs)\nsorted_keras_inputs = tuple(keras_inputs[key] for key in sorted(keras_inputs.keys()))\nmodel = tf.keras.Model(sorted_keras_inputs, linear_logits)\n\nmodel.compile('ftrl', 'binary_crossentropy', metrics=['accuracy'])\n\ndf_dataset = tf.data.Dataset.from_tensor_slices((dict(dftrain), y_train))\ndef encode_map(features, labels):\n  encoded_features = 
tuple(tf.expand_dims(features[key], axis=1) for key in sorted(features.keys()))\n  return (encoded_features, labels)\nencoded_dataset = df_dataset.batch(32).map(encode_map)\n\nmodel.fit(encoded_dataset)\n```\n\n## Design Proposal\n\n```python\n`tf.keras.layers.StringLookup`\nStringLookup(PreprocessingLayer):\n\"\"\"This layer transforms categorical inputs to index space.\n   If input is dense/sparse, then output is dense/sparse.\"\"\"\n\n  def __init__(self, max_tokens=None, num_oov_indices=1, mask_token=\"\",\n               oov_token=\"[UNK]\", vocabulary=None, encoding=None,\n               invert=False, name=None, **kwargs):\n    \"\"\"Constructs a StringLookup layer.\n\n    Args:\n      max_tokens: The maximum size of the vocabulary for this layer. If None,\n              there is no cap on the size of the vocabulary. Note that this vocabulary\n              includes the OOV and mask tokens, so the effective number of tokens is\n              (max_tokens - num_oov_indices - (1 if mask_token is not None else 0))\n      num_oov_indices: The number of out-of-vocabulary tokens to use; defaults to\n              1. If this value is more than 1, OOV inputs are hashed to determine their\n              OOV value; if this value is 0, passing an OOV input will result in a '-1'\n              being returned for that value in the output tensor. (Note that, because\n              the value is -1 and not 0, this will allow you to effectively drop OOV\n              values from categorical encodings.)\n      mask_token: A token that represents masked values, and which is mapped to\n              index 0. Defaults to the empty string \"\". If set to None, no mask term\n              will be added and the OOV tokens, if any, will be indexed from\n              (0...num_oov_indices) instead of (1...num_oov_indices+1).\n      oov_token: The token representing an out-of-vocabulary value. 
Defaults to\n              \"[UNK]\".\n      vocabulary: An optional list of vocabulary terms, or a path to a text file\n              containing a vocabulary to load into this layer. The file should contain\n              one token per line. If the list or file contains the same token multiple\n              times, an error will be thrown.\n      encoding: The Python string encoding to use. Defaults to `'utf-8'`.\n      invert: If true, this layer will map indices to vocabulary items instead\n              of mapping vocabulary items to indices.\n      name: Name of the layer.\n      **kwargs: Keyword arguments to construct a layer.\n\n    Input shape:\n            a string or int tensor of shape `[batch_size, d1, ..., dm]`\n    Output shape:\n            an int tensor of shape `[batch_size, d1, ..., dm]`\n\n    Example:\n      >>> vocab = [\"a\", \"b\", \"c\", \"d\"]\n      >>> data = tf.constant([[\"a\", \"c\", \"d\"], [\"d\", \"z\", \"b\"]])\n      >>> layer = StringLookup(vocabulary=vocab)\n      >>> layer(data)\n      <tf.Tensor: shape=(2, 3), dtype=int64, numpy=\n      array([[2, 4, 5],\n             [5, 1, 3]])>\n    \"\"\"\n    pass\n\n\n`tf.keras.layers.IntegerLookup`\nIntegerLookup(PreprocessingLayer):\n\"\"\"This layer transforms categorical inputs to index space.\n   If input is dense/sparse, then output is dense/sparse.\"\"\"\n\n  def __init__(self, max_values=None, num_oov_indices=1, mask_value=0,\n               oov_value=-1, vocabulary=None, invert=False, name=None, **kwargs):\n    \"\"\"Constructs an IntegerLookup layer.\n\n    Args:\n      max_values: The maximum size of the vocabulary for this layer. If None,\n              there is no cap on the size of the vocabulary. 
Note that this vocabulary\n              includes the OOV and mask values, so the effective number of values is\n              (max_values - num_oov_indices - (1 if mask_value is not None else 0))\n      num_oov_indices: The number of out-of-vocabulary values to use; defaults to\n              1. If this value is more than 1, OOV inputs are modulated to determine\n              their OOV value; if this value is 0, passing an OOV input will result in\n              a '-1' being returned for that value in the output tensor. (Note that,\n              because the value is -1 and not 0, this will allow you to effectively drop\n              OOV values from categorical encodings.)\n      mask_value: A value that represents masked inputs, and which is mapped to\n              index 0. Defaults to 0. If set to None, no mask term will be added and the\n              OOV values, if any, will be indexed from (0...num_oov_indices) instead of\n              (1...num_oov_indices+1).\n      oov_value: The value representing an out-of-vocabulary value. Defaults to -1.\n      vocabulary: An optional list of values, or a path to a text file containing\n              a vocabulary to load into this layer. The file should contain one value\n              per line. 
If the list or file contains the same value multiple times, an\n              error will be thrown.\n      invert: If true, this layer will map indices to vocabulary items instead\n              of mapping vocabulary items to indices.\n      name: Name of the layer.\n      **kwargs: Keyword arguments to construct a layer.\n\n    Input shape:\n            a string or int tensor of shape `[batch_size, d1, ..., dm]`\n    Output shape:\n            an int tensor of shape `[batch_size, d1, ..., dm]`\n\n    Example:\n      >>> vocab = [12, 36, 1138, 42]\n      >>> data = tf.constant([[12, 1138, 42], [42, 1000, 36]])\n      >>> layer = IntegerLookup(vocabulary=vocab)\n      >>> layer(data)\n      <tf.Tensor: shape=(2, 3), dtype=int64, numpy=\n      array([[2, 4, 5],\n             [5, 1, 3]])>\n    \"\"\"\n    pass\n\n\n`tf.keras.layers.CategoryCrossing`\nCategoryCrossing(PreprocessingLayer):\n\"\"\"This layer transforms multiple categorical inputs to categorical outputs\n   by Cartesian product, and hashes the output if necessary.\n   If any of the inputs is sparse, then all outputs will be sparse. Otherwise, all outputs will be dense.\"\"\"\n\n  def __init__(self, depth=None, separator=None, name=None, **kwargs):\n    \"\"\"Constructs a CategoryCrossing layer.\n    Args:\n      depth: depth of input crossing. By default None, all inputs are crossed into\n            one output. It can also be an int or tuple/list of ints. Passing an\n            integer will create combinations of crossed outputs with depth up to that\n            integer, i.e., [1, 2, ..., `depth`], and passing a tuple of integers will\n            create crossed outputs with depth for the specified values in the tuple,\n            i.e., `depth`=(N1, N2) will create all possible crossed outputs with depth\n            equal to N1 or N2. Passing `None` means a single crossed output with all\n            inputs. 
For example, with inputs `a`, `b` and `c`, `depth=2` means the\n            output will be [a;b;c;cross(a, b);cross(b, c);cross(c, a)].\n      separator: A string added between each input being joined. Defaults to '_X_'.\n      name: Name to give to the layer.\n      **kwargs: Keyword arguments to construct a layer.\n\n    Input shape: a list of string or int tensors or sparse tensors of shape\n            `[batch_size, d1, ..., dm]`\n\n    Output shape: a single string or int tensor or sparse tensor of shape\n            `[batch_size, d1, ..., dm]`\n\n    Example: (`depth`=None)\n      If the layer receives three inputs:\n      `a=[[1], [4]]`, `b=[[2], [5]]`, `c=[[3], [6]]`\n      the output will be a string tensor:\n      `[[b'1_X_2_X_3'], [b'4_X_5_X_6']]`\n    \"\"\"\n    pass\n\n`tf.keras.layers.CategoryEncoding`\nCategoryEncoding(PreprocessingLayer):\n\"\"\"This layer transforms categorical inputs from index space to category space.\n   If input is dense/sparse, then output is dense/sparse.\"\"\"\n\n  def __init__(self, max_tokens=None, output_mode=\"binary\", sparse=False, name=None, **kwargs):\n    \"\"\"Constructs a CategoryEncoding layer.\n    Args:\n      max_tokens: The maximum size of the vocabulary for this layer. If None,\n              there is no cap on the size of the vocabulary.\n      output_mode: Specification for the output of the layer.\n              Defaults to \"binary\". 
Values can be \"binary\", \"count\" or \"tf-idf\",\n              configuring the layer as follows:\n              \"binary\": Outputs a single int array per batch, of either vocab_size or\n                max_tokens size, containing 1s in all elements where the token mapped\n                to that index exists at least once in the batch item.\n              \"count\": As \"binary\", but the int array contains a count of the number\n                of times the token at that index appeared in the batch item.\n              \"tf-idf\": As \"binary\", but the TF-IDF algorithm is applied to find the\n                value in each token slot.\n      sparse: Boolean. If true, returns a `SparseTensor` instead of a dense\n              `Tensor`. Defaults to `False`.\n      name: Name to give to the layer.\n     **kwargs: Keyword arguments to construct a layer.\n\n    Input shape: A int tensor of shape `[batch_size, d1, ..., dm-1, dm]`\n    Output shape: a float tensor of shape `[batch_size, d1, ..., dm-1, num_categories]`\n\n    Example:\n      >>> layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(\n      ...           
max_tokens=4, output_mode=\"count\")\n      >>> layer([[0, 1], [0, 0], [1, 2], [3, 1]])\n      <tf.Tensor: shape=(4, 4), dtype=float32, numpy=\n        array([[1., 1., 0., 0.],\n               [2., 0., 0., 0.],\n               [0., 1., 1., 0.],\n               [0., 1., 0., 1.]], dtype=float32)>\n    \"\"\"\n    pass\n\n`tf.keras.layers.Hashing`\nHashing(PreprocessingLayer):\n\"\"\"This layer transforms categorical inputs to hashed output.\n   If input is dense/sparse, then output is dense/sparse.\"\"\"\n  def __init__(self, num_bins, salt=None, name=None, **kwargs):\n    \"\"\"Constructs a Hashing layer.\n\n    Args:\n      num_bins: Number of hash bins.\n      salt: A single unsigned integer or None.\n              If passed, the hash function used will be SipHash64, with these values\n              used as an additional input (known as a \"salt\" in cryptography).\n              These should be non-zero. Defaults to `None` (in that\n              case, the FarmHash64 hash function is used). It also supports\n              tuple/list of 2 unsigned integer numbers, see reference paper for details.\n      name: Name to give to the layer.\n      **kwargs: Keyword arguments to construct a layer.\n\n    Input shape: A single or list of string, int32 or int64 `Tensor`,\n            `SparseTensor` or `RaggedTensor` of shape `[batch_size, ...,]`\n\n    Output shape: An int64 `Tensor`, `SparseTensor` or `RaggedTensor` of shape\n            `[batch_size, ...]`. 
If any input is `RaggedTensor` then output is\n            `RaggedTensor`, otherwise if any input is `SparseTensor` then output is\n            `SparseTensor`, otherwise the output is `Tensor`.\n\n    Example:\n      >>> layer = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=3)\n      >>> inp = [['A'], ['B'], ['C'], ['D'], ['E']]\n      >>> layer(inp)\n      <tf.Tensor: shape=(5, 1), dtype=int64, numpy=\n        array([[1],\n               [0],\n               [1],\n               [1],\n               [2]])>\n    \"\"\"\n    pass\n\n```\n\n### Alternatives Considered\nAn alternative is to provide solutions on top of feature columns. This would make user code slightly cleaner but far less flexible.\n\n### Performance Implications\nEnd-to-end benchmarks should be the same as or faster than the feature column implementations.\n\n### Dependencies\nThis proposal does not add any new dependencies.\n\n### Engineering Impact\nThese changes add more layers and thus increase binary size and build time. They will not impact startup time.\nThis code can be tested on its own and maintained as its own buildable unit.\n\n### Platforms and Environments\nThis proposal should work on all platforms and in all environments.\n\n### Best Practices, Tutorials and Examples\nThis proposal does not change the best engineering practices.\n\n### Compatibility\nNo backward compatibility issues.\n\n### User Impact\nUsers will need to migrate feature-column-based Keras modeling to preprocessing-layer-based Keras modeling, as the example workflow suggests.\n\n## Questions and Meeting Notes\nWe'd like to gather feedback on `IndexLookup`. Specifically, we propose migrating away from the mutually exclusive `num_oov_buckets` and `default_value` and replacing them with `num_oov_tokens`.\n1. Naming for encoding vs. vectorize: encoding can mean many things, vectorize seems too general. We will go with \"CategoryEncoding\".\n2. \"mode\" should be \"count\" or \"avg_count\", instead of \"sum\" and \"mean\".\n3. 
Rename \"sparse_combiner\" to \"mode\", which aligns with scikit-learn.\n4. Have a 'sparse_out' flag for the \"CategoryEncoding\" layer.\n5. Hashing -- we refer to hashing when we mean fingerprinting. Keep using \"Hashing\" for the layer name, but document how it relies on tf.fingerprint, and also provide an option for salt.\n6. Rename \"CategoryLookup\" to \"IndexLookup\".\n\n## Updates on 07/14/20\nMark the RFC as completed, update the layer naming and arguments.\n"
  },
  {
    "path": "rfcs/20200826-keras-nlp-scoping-design.md",
    "content": "# Keras NLP\n\n| Status        | Proposed      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Mark Omernick (momernick@google.com), Francois Chollet (fchollet@google.com), Hongkun Yu (hongkuny@google.com)|\n| **Updated**   | 2020-09-11                                           |\n\n\n## Objective\n\nWe aim at describing the scope of [keras-nlp](https://github.com/keras-team/keras-nlp), especially:\n\n- What use cases `keras-nlp` should cover\n- Boundaries between `keras-nlp` and [tensorflow addons](https://github.com/tensorflow/addons)\n- Boundaries between `keras-nlp` and [tensorflow model garden](https://github.com/tensorflow/models)\n- Boundaries between `keras-nlp` and [tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras).\n- Boundaries between `keras-nlp` and [tf.text](https://www.tensorflow.org/tutorials/tensorflow_text/intro).\n\n## Motivation\n\nNatural Language Processing (NLP) is a major application area for our users.\nIn recent years, Transformer-based models have become the foundation of many NLP workflows.\nThese workflows tend to reuse similar components, for which in some cases third-party packages\nhave been developed by the open-source community.\n\nThese third-party solutions are not always kept up to date or up to the same quality standards as core Keras.\nThey also raise the issue of API standardization.\n\nTo fix this, we want machine learning engineers to have access to a standard Keras-native,\noptimized, and well-tested set of components to build their Transformer-based (and beyond) NLP workflows.\n\nThis provides key user benefits:\n\n- The package would be first-party and thus always up to date with modern best practices.\n- High code quality and testing standards and strict quality control: same level of trust as core Keras\n- A shared API standard across the community\n- Ability for the open-source community to build more advanced 
solutions *on top* of this package instead of reinventing it\n- Ability for research scientists to benefit from subclassing and customizing base components to quickly test new research ideas\n\n## Design Proposal\n\n`keras-nlp` will include most standard Transformer-based modules, specifically:\n\n- Keras layer components such as Transformer encoder and decoder blocks.\n- Keras task components such as masked language modeling, span labeling and named entity recognition.\n- TensorFlow operations such as beam search.\n- Keras optimizer utilities such as widely used learning rate schedules.\n- Data loaders and preprocessing for different datasets, such as SQuAD and GLUE.\n\n### Success criteria for keras-nlp\n\n- Reusable and standardized components that cover the above\n- Easy-to-use API\n- Models run on CPU/GPU/TPU seamlessly\n- State-of-the-art performance\n- Models can be readily deployed to production\n\n### Boundaries between keras-nlp and tf.text\n\n- `tf.text` will contain all pre-processing operations that handle strings, such as the WordPiece tokenizer and n-grams.\n- `keras-nlp` will contain modeling components that cover workflows past the tokenization stage.\n\n### Boundaries between `keras-nlp` and TensorFlow Addons\n\n- Highly experimental modeling, layers, losses, etc., live in Addons (e.g. newly published research code).\n- Components from Addons will graduate to Model Garden, given they get sufficient usage,\nand given that they work on CPU/GPU/TPU. The API interface will remain experimental for a short time after graduation,\nso as to leave us the option to make changes based on user feedback.\n\n### Boundaries between keras-nlp and Model Garden\n\n- End-to-end modeling workflows and model-specific details live in Model Garden\n- Model Garden will re-use most of the building blocks from keras-nlp\n- Components from Model Garden can graduate to keras-nlp, given they get sufficient usage,\nand given that they work on CPU/GPU/TPU. 
The API interface should remain stable after graduation.\n\n### Boundaries between keras-nlp and core Keras\n\n- `keras-nlp` will contain NLP-specific components only\n(by contrast, the `MultiHeadAttention` layer may be used outside of NLP, and thus ships in core Keras).\n- Components from keras-nlp can graduate to Keras core, given their usage expands beyond\n natural language processing.\n\n## Dependencies\n\n- TensorFlow version >= 2.4\n- TensorFlow Datasets\n\n## Backwards compatibility\n\nWe propose to guarantee major release backwards compatibility.\n\n## Maintenance\n\nThe `keras-nlp` codebase will be primarily maintained by the Keras team at Google,\nwith help and contributions from the community. The codebase will be developed\non GitHub as part of the `keras-team` organization. The same process for tracking\nissues and reviewing PRs will be used as for the core Keras repository.\n\n## Performance Benchmark\n\nWe will set up Keras benchmark utilities to help users contribute to this repository.\n\nDetailed design will be shared in a separate document (this document only focuses on scope).\n\n## Questions and Discussion Topics\n\nPlease share any questions or suggestions.\n"
  },
  {
    "path": "rfcs/20200827-keras-cv-scoping-design.md",
    "content": "# Keras CV\n\n| Status        | Proposed      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com) |\n| **Updated**   | 2020-08-27                                           |\n\n\n## Objective\n\nThis document describes the scope of the [keras-cv](https://github.com/keras-team/keras-cv) package, especially:\n- What use cases `keras-cv` should cover\n- Boundaries between `keras-cv` and [TensorFlow Addons](https://github.com/tensorflow/addons)\n- Boundaries between `keras-cv` and [TensorFlow model garden](https://github.com/tensorflow/models)\n- Boundaries between `keras-cv` and [tf.keras.applications](https://keras.io/api/applications/)\n\n## Motivation\n\nComputer vision (CV) is a major application area for our users.\nKeras on its own provides good support for image classification tasks, in particular via `tf.keras.applications`.\nHowever, a Keras-native modeling solutions for more advanced tasks,\nsuch as object detection, instance segmentation, etc., is still lacking.\n\nAs a result, the open-source community has rolled out many different solutions for these use cases,\nmade available via PyPI and GitHub. These third-party solutions are not always kept up to date, and\nmany still rely on the legacy multi-backend Keras. 
They also raise the issue of API standardization.\n\nTo fix this, we want machine learning engineers to have access to a standard Keras-native,\noptimized, and well-tested set of components to build their advanced computer vision models.\n\nThis provides key user benefits:\n\n- The package would be first-party and thus always up to date with modern best practices.\n- High code quality and testing standards and strict quality control: same level of trust as core Keras\n- A shared API standard across the community\n- Ability for the open-source community to build more advanced solutions *on top* of this package instead of reinventing it\n\n## Design Proposal\n\n`keras-cv` will provide components that cover the following areas:\n\n- Object Detection tasks.\n- Instance Segmentation tasks.\n- Semantic Segmentation tasks.\n- Keypoint Detection tasks.\n- Video Classification tasks.\n- Object Tracking tasks.\n\nSpecifically, for Object Detection tasks, `keras-cv` will include most anchor-based modules:\n\n- Common objects such as anchor generator, box matcher.\n- Keras layer components such as ROI generator, NMS postprocessor.\n- Keras backbone components that fill the gap from keras-applications.\n- Keras losses and metrics, such as Focal loss and COCO metrics.\n- Data loaders and preprocessing for different datasets, such as COCO.\n\nFor Semantic Segmentation tasks, `keras-cv` will include:\n\n- Keras head components such as Atrous Spatial Pyramid Pooling (ASPP).\n\n### Success criteria for `keras-cv`\n\n- Cover all modeling tasks listed above\n- Easy-to-use API\n- Models run on CPU/GPU/TPU seamlessly\n- State-of-the-art performance\n- Models can be readily deployed to production\n\n### Boundaries between keras-cv and keras-applications\n\n- keras-applications will be improved to include basic building blocks, such as the MobileNet bottleneck, that\n include feature maps\n- keras-cv will depend on keras-applications for importing backbones.\n\n### Boundaries between keras-cv 
and TensorFlow Addons\n\n- Highly experimental modeling, layers, losses, etc., live in Addons.\n- Components from Addons will graduate to keras-cv, given they see sufficient usage\n and work on CPU/GPU/TPU. The API interface will remain experimental after graduation.\n\n### Boundaries between keras-cv and Model Garden\n\n- End-to-end modeling workflows and model-specific details live in Model Garden\n- Model Garden will re-use most of the building blocks from keras-cv and TensorFlow Addons.\n- Components from Model Garden can graduate to keras-cv, given they are widely adopted\n and perform well on CPU/GPU/TPU. The API interface should remain stable after graduation.\n\n## Dependencies\n\n- TensorFlow version >= 2.4\n- TensorFlow Datasets\n- Keras-applications\n\n## Backwards compatibility\n\nWe propose to guarantee major release backwards compatibility.\n\n## Maintenance & development process\n\nThe `keras-cv` codebase will be primarily maintained by the Keras team at Google,\nwith help and contributions from the community. The codebase will be developed\non GitHub as part of the `keras-team` organization. The same process for tracking\nissues and reviewing PRs will be used as for the core Keras repository.\n\n## Performance benchmark\n\nWe will set up Keras benchmark utilities to help users contribute to this repository.\n\n## Detailed Design\n\nDetailed design will be shared in a separate document (this document only focuses on scope).\n\n## Questions and Discussion Topics\n\nPlease share any questions or suggestions.\n"
  },
  {
    "path": "rfcs/20200920-keras-nlp-bert.md",
    "content": "# keras-nlp Transformer Encoder API\n\n| Status        | Proposed      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com), Hongkun Yu (hongkuny@google.com)|\n| **Sponsor(s)** | Mark Omernick (momernick@google.com)|\n| **Updated**   | 2020-09-21                                           |\n\n\n## Objective\n\nWe aim at providing a set of Keras layers to handle Transformer-Encoder BERT-style models.\n\n## Key Benefits\n\nBERT-style Transformer-Encoders are a state-of-art technique that powers many NLP tasks:\n\n- Single sentence classification task, e.g., sentiment analysis\n- Sentence pair classification task, e.g., next sentence prediction\n- Question answering task, e.g., SQuAD\n- Single sentence tagging task, e.g., named entity recognition\n\nWith this proposal, Keras users will be able to handle the tasks above with a simple API. \n\n## Design overview\n\nThis proposal builds on the assumption that inputs are lookup indices, i.e., `tf.int64` sequences.\nTokenization is not part of this proposal but will be our immediate next step.\n\n### Classification task\n\nCase where a user want to use a pretrained BERT encoder for sentiment analysis:\n\n```python\n# Considering a imbd review dataset\nimport tensorflow as tf\nimport tensorflow_datasets as tfds\nimport keras_nlp\nimport tensorflow_text as tftext\n\nimdb_reviews = tfds.load('imdb_reviews')\ntrain_ds = imdb_reviews['train'].batch(32)\ntest_ds = imdb_reviews['test'].batch(32)\n\n# Tokenization with BertTokenizer\nvocab_path = \"gs://<bucket_name>/<file_path>/vocab.txt\"\ntokenizer = tftext.BertTokenizer(vocab_path, token_out_type=tf.int64, lower_case=False)\nSEQUENCE_LENGTH = 128\ndef preprocess(input_text):\n  token_ids = tokenizer.tokenize_with_offsets(input_text)\n  segment_ids = tf.concat([tf.zeros_like(cls), tf.ones_like(token_ids), tf.ones_like(sep)], axis=1)\n  output_shape 
= [None, SEQUENCE_LENGTH]\n  token_ids = token_ids.merge_dims(-2, -1)\n  segment_ids = segment_ids.merge_dims(-2, -1).to_tensor(shape=output_shape)\n  input_mask = tf.ones_like(token_ids).to_tensor(shape=output_shape)\n  token_ids = token_ids.to_tensor(shape=output_shape)\n  return {\n      'input_ids': token_ids,\n      'input_mask': input_mask,\n      'input_type_ids': segment_ids\n  }\n\nstrategy = tf.distribute.TPUStrategy(...)\nwith strategy.scope():\n  encoder = keras_nlp.encoders.BertEncoder(vocab_size=30522, max_sequence_length=512, type_vocab_size=2)\n  encoder.load_weights(\"gs://<bucket_name>/<file_path>\")\n  token_ids = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_ids', dtype=tf.int32)\n  input_mask = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_mask', dtype=tf.int32)\n  type_ids = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_type_ids', dtype=tf.int32)\n  x = encoder([token_ids, input_mask, type_ids])['pooled_output']\n  x = tf.keras.layers.Dropout(rate=0.1)(x)\n  output = tf.keras.layers.Dense(1, activation='sigmoid')(x)\n  model = tf.keras.Model(inputs=[token_ids, input_mask, type_ids], outputs=output)\n\nmodel.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])\nmodel.fit(train_ds, epochs=5, validation_data=test_ds)\n```\n\n### Pretraining task\n\nWe aim to provide pretrained checkpoints for `BertEncoder` with different datasets and different sizes through TF Hub;\nhowever, users can choose to pretrain a new `BertEncoder` on their own dataset.\n\n```python\nwith strategy.scope():\n  # Keyword arguments are used because the second positional parameter of BertEncoder is hidden_size.\n  encoder = keras_nlp.encoders.BertEncoder(\n      vocab_size=vocab_size, max_sequence_length=max_sequence_length, type_vocab_size=type_vocab_size)\n  token_ids = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='word_token_ids', dtype=tf.int32)\n  input_mask = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_mask', dtype=tf.int32)\n  type_ids = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_type_ids', dtype=tf.int32)\n  masked_lm_positions = 
tf.keras.layers.Input(shape=(None,), name='masked_lm_positions', dtype=tf.int32)\n  outputs = encoder([token_ids, input_mask, type_ids])\n  cls_output, sequence_output = outputs['pooled_output'], outputs['sequence_output']\n  masked_lm = keras_nlp.layers.MaskedLM(embedding_table=encoder.get_embedding_table())\n  lm_output = masked_lm(sequence_output, masked_positions=masked_lm_positions)\n  cls_output = tf.keras.layers.Dense(units=num_classes, activation='softmax')(cls_output)\n  model = tf.keras.Model(inputs=[token_ids, input_mask, type_ids, masked_lm_positions],\n                         outputs={'lm_output': lm_output, 'cls_output': cls_output})\n\nmodel.compile('adam', {'lm_output': 'sparse_categorical_crossentropy', 'cls_output': 'sparse_categorical_crossentropy'})\nmodel.fit(train_ds, epochs=100)\n```\n\n### Other encoder-based networks\n\n`BertEncoder` is the first encoder network we propose in this doc. However, other encoder networks can be easily\nbuilt on top of the `TransformerEncoder` layer. 
For example, a Transformer encoder with cross-layer parameter sharing, as in\n[ALBERT](https://arxiv.org/pdf/1909.11942.pdf), can be built as follows:\n\n```python\ntoken_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_word_ids')\nmask = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_mask')\ntype_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_type_ids')\nword_embeddings = keras_nlp.layers.OnDeviceEmbedding(vocab_size, embedding_width)(token_ids)\nposition_embeddings = keras_nlp.layers.PositionEmbedding(max_sequence_length)(word_embeddings)\ntype_embeddings = keras_nlp.layers.OnDeviceEmbedding(\n  vocab_size=type_vocab_size, embedding_width=embedding_width, use_one_hot=True)(type_ids)\nembeddings = tf.keras.layers.Add()([word_embeddings, position_embeddings, type_embeddings])\nembeddings = tf.keras.layers.LayerNormalization(axis=-1)(embeddings)\nembeddings = tf.keras.layers.Dropout(rate=dropout_rate)(embeddings)\nembeddings = tf.keras.layers.experimental.EinsumDense(\n  '...x,xy->...y', output_shape=hidden_size, bias_axes='y')(embeddings)\ndata = embeddings\nattention_mask = keras_nlp.layers.SelfAttentionMask()([data, mask])\n# A single TransformerEncoder instance is reused so that all layers share weights.\nshared_layer = keras_nlp.layers.TransformerEncoder(num_attention_heads, inner_dim, inner_activation)\nfor _ in range(num_layers):\n  data = shared_layer([data, attention_mask])\nfirst_token_tensor = tf.keras.layers.Lambda(lambda x: tf.squeeze(x[:, 0:1, :], axis=1))(data)\ncls_output = tf.keras.layers.Dense(units=hidden_size, activation='tanh')(first_token_tensor)\noutputs = dict(sequence_output=data, pooled_output=cls_output)\nmodel = tf.keras.Model(inputs=[token_ids, mask, type_ids], outputs=outputs)\n```\n\n## Detailed Design\n\n### Layers -- TransformerEncoder\n\nThis layer encapsulates a single layer of Transformer Encoder.\n\n```python\nclass TransformerEncoder(tf.keras.layers.Layer):\n  \"\"\"TransformerEncoder layer.\n\n  This layer implements the Transformer Encoder from\n  \"Attention Is All You Need\". 
(https://arxiv.org/abs/1706.03762),\n  which combines a `tf.keras.layers.MultiHeadAttention` layer with a\n  two-layer feedforward network.\n\n  References:\n    [Attention Is All You Need](https://arxiv.org/abs/1706.03762)\n    [BERT: Pre-training of Deep Bidirectional Transformers for Language\n     Understanding](https://arxiv.org/abs/1810.04805)\n  \"\"\"\n\n  def __init__(self,\n               num_attention_heads,\n               inner_dim,\n               inner_activation,\n               output_range=None,\n               kernel_initializer=\"glorot_uniform\",\n               bias_initializer=\"zeros\",\n               kernel_regularizer=None,\n               bias_regularizer=None,\n               activity_regularizer=None,\n               kernel_constraint=None,\n               bias_constraint=None,\n               use_bias=True,\n               norm_first=False,\n               norm_epsilon=1e-12,\n               output_dropout=0.0,\n               attention_dropout=0.0,\n               inner_dropout=0.0,\n               attention_initializer=None,\n               **kwargs):\n    \"\"\"Initializes `TransformerEncoder`.\n\n    Arguments:\n      num_attention_heads: Number of attention heads.\n      inner_dim: The output dimension of the first Dense layer in a two-layer\n        feedforward network.\n      inner_activation: The activation for the first Dense layer in a two-layer\n        feedforward network.\n      output_range: the sequence output range, [0, output_range) for slicing the\n        target sequence. 
`None` means the target sequence is not sliced.\n      kernel_initializer: Initializer for dense layer kernels.\n      bias_initializer: Initializer for dense layer biases.\n      kernel_regularizer: Regularizer for dense layer kernels.\n      bias_regularizer: Regularizer for dense layer biases.\n      activity_regularizer: Regularizer for dense layer activity.\n      kernel_constraint: Constraint for dense layer kernels.\n      bias_constraint: Constraint for dense layer biases.\n      use_bias: Whether to enable use_bias in attention layer. If set to False,\n        use_bias in attention layer is disabled.\n      norm_first: Whether to normalize inputs to attention and intermediate\n        dense layers. If set to False, output of attention and intermediate dense\n        layers is normalized.\n      norm_epsilon: Epsilon value to initialize normalization layers.\n      output_dropout: Dropout probability for the post-attention and output\n        dropout.\n      attention_dropout: Dropout probability within the attention layer.\n      inner_dropout: Dropout probability for the first Dense layer in a\n        two-layer feedforward network.\n      attention_initializer: Initializer for kernels of attention layers. 
If set\n        `None`, attention layers use kernel_initializer as initializer for\n        kernel.\n      **kwargs: Keyword arguments.\n    \"\"\"\n```\n\n### Layers -- SelfAttentionMask\n\n```python\nclass SelfAttentionMask(tf.keras.layers.Layer):\n  \"\"\"Create 3D attention mask from a 2D tensor mask.\"\"\"\n\n  def call(self, inputs):\n    \"\"\"\n    Args:\n      inputs[0]: from_tensor: 2D or 3D Tensor of shape\n        [batch_size, from_seq_length, ...].\n      inputs[1]: to_mask: int32 Tensor of shape [batch_size, to_seq_length].\n\n    Returns:\n      float Tensor of shape [batch_size, from_seq_length, to_seq_length].\n    \"\"\"\n```\n\n### Layers -- OnDeviceEmbedding\nThis is an experimental layer that supports either a one-hot `tf.matmul` approach or a `tf.gather` approach.\n\n```python\nclass OnDeviceEmbedding(tf.keras.layers.Layer):\n  \"\"\"Performs an embedding lookup suitable for accelerator devices.\n\n  This layer uses either tf.gather or tf.one_hot to translate integer indices to\n  float embeddings.\n\n  Arguments:\n    vocab_size: Number of elements in the vocabulary.\n    embedding_width: Output size of the embedding layer.\n    initializer: The initializer to use for the embedding weights. Defaults to\n      \"glorot_uniform\".\n    use_one_hot: Whether to use tf.one_hot over tf.gather for the embedding\n      lookup. Defaults to False (that is, using tf.gather). 
Setting this option\n      to True may improve performance, especially on small vocabulary sizes, but\n      will generally require more memory.\n  \"\"\"\n\n  def __init__(self,\n               vocab_size,\n               embedding_width,\n               initializer=\"glorot_uniform\",\n               use_one_hot=False,\n               **kwargs):\n```\n\n### Layers -- PositionEmbedding\n\n```python\nclass PositionEmbedding(tf.keras.layers.Layer):\n  \"\"\"Creates a positional embedding.\n\n  Arguments:\n    max_length: The maximum size of the dynamic sequence.\n    initializer: The initializer to use for the embedding weights. Defaults to\n      \"glorot_uniform\".\n\n  Reference: This layer creates a positional embedding as described in\n  [BERT: Pre-training of Deep Bidirectional Transformers for Language\n  Understanding](https://arxiv.org/abs/1810.04805).\n  \"\"\"\n```\n\n### Layers -- MaskedLM\n\n```python\nclass MaskedLM(tf.keras.layers.Layer):\n  \"\"\"Masked language model network head for BERT modeling.\n\n  This layer implements a masked language model based on the provided\n  transformer based encoder. It assumes that the encoder network being passed\n  has a \"get_embedding_table()\" method.\n\n  Arguments:\n    embedding_table: The embedding table from encoder network.\n    activation: The activation, if any, for the dense layer.\n    initializer: The initializer for the dense layer. Defaults to a Glorot\n      uniform initializer.\n    output: The output style for this layer. 
Can be either 'logits' or\n      'predictions'.\n  \"\"\"\n\n  def __init__(self,\n               embedding_table,\n               activation=None,\n               initializer='glorot_uniform',\n               output='logits',\n               name=None,\n               **kwargs):\n```\n\n### Encoders -- BertEncoder\n\n```python\nclass BertEncoder(tf.keras.Model):\n  \"\"\"Bi-directional Transformer-based encoder network.\n\n  This network implements a bi-directional Transformer-based encoder as\n  described in \"BERT: Pre-training of Deep Bidirectional Transformers for\n  Language Understanding\" (https://arxiv.org/abs/1810.04805). It includes the\n  embedding lookups and transformer layers, but not the masked language model\n  or classification task networks.\n\n  The default values for this object are taken from the BERT-Base implementation\n  in \"BERT: Pre-training of Deep Bidirectional Transformers for Language\n  Understanding\".\n\n  *Note* that the network is constructed by\n  [Keras Functional API](https://keras.io/guides/functional_api/).\n\n  Arguments:\n    vocab_size: The size of the token vocabulary.\n    hidden_size: The size of the transformer hidden layers.\n    num_layers: The number of transformer layers.\n    num_attention_heads: The number of attention heads for each transformer. The\n      hidden size must be divisible by the number of attention heads.\n    max_sequence_length: The maximum sequence length that this encoder can\n      consume. 
If None, max_sequence_length uses the value from sequence length.\n      This determines the variable shape for positional embeddings.\n    type_vocab_size: The number of types that the 'type_ids' input can take.\n    inner_dim: The output dimension of the first Dense layer in a two-layer\n        feedforward network for each transformer.\n    inner_activation: The activation for the first Dense layer in a two-layer\n        feedforward network for each transformer.\n    output_dropout: Dropout probability for the post-attention and output\n        dropout.\n    attention_dropout: The dropout rate to use for the attention layers\n      within the transformer layers.\n    initializer: The initializer to use for all weights in this encoder.\n    output_range: The sequence output range, [0, output_range), by slicing the\n      target sequence of the last transformer layer. `None` means the entire\n      target sequence will attend to the source sequence, which yields the full\n      output.\n    embedding_width: The width of the word embeddings. If the embedding width is\n      not equal to hidden size, embedding parameters will be factorized into two\n      matrices in the shape of ['vocab_size', 'embedding_width'] and\n      ['embedding_width', 'hidden_size'] ('embedding_width' is usually much\n      smaller than 'hidden_size').\n  \"\"\"\n\n  def __init__(\n      self,\n      vocab_size,\n      hidden_size=768,\n      num_layers=12,\n      num_attention_heads=12,\n      max_sequence_length=512,\n      type_vocab_size=16,\n      inner_dim=3072,\n      inner_activation='gelu',\n      output_dropout=0.1,\n      attention_dropout=0.1,\n      initializer='truncated_normal',\n      output_range=None,\n      embedding_width=None,\n      **kwargs):\n```\n\n## Questions and Discussion Topics\n\nGathering feedback on arguments & naming conventions.\n"
  },
  {
    "path": "rfcs/20200928-keras-cv-single-stage-2d-object-detection.md",
    "content": "# keras-cv Single Stage Two-Dimensional Object Detection API\n\n| Status        | Proposed      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com)|\n| **Contributor(s)** | Pengchong Jin (pengchong@google.com)|\n| **Updated**   | 2020-09-28                                           |\n\n## Objective\n\nWe aim at providing the core primitive components for training and serving single-stage two-dimensional object\ndetection models, such as Single-Shot MultiBox Detector (SSD), RetinaNet, and You-Only-Look-Once (YOLO).\nPretrained models will also be provides, similar to keras-applications.\n\n## Key Benefits\n\nSingle-stage object detection models are a state-of-art technique that powers many computer vision tasks, they provide\nfaster detection compared to two-stage models (such as FasterRCNN), while maintaining comparable performance.\n\nWith this proposal, Keras users will be able to build end-to-end models with a simple API.\n\n## Design overview\n\nThis proposal includes the specific core components for building single-stage object detection models. It does not, however, include:\n\n1. Data augmentation, such as image and groundtruth box preprocessing\n2. Model backbone, such as DarkNet, or functions to generate feature maps\n3. Detection heads, such as Feature Pyramid\n4. 
Metrics utilities such as COCO Evaluator, or visualization utilities.\n\nData augmentation will be included as a separate RFC that handles a\nbroader context than object detection.\n\nModel backbones and detection heads are model-specific. We anticipate that heavily used patterns will be analyzed and proposed in\n`keras.applications`; however, users can build them easily using Keras.\n\n#### Training\n\nCase where a user wants to train from scratch:\n\n```python\nimport tensorflow as tf\nimport tensorflow_datasets as tfds\nimport keras_cv\n\n# Considering a COCO dataset\ncoco_dataset = tfds.load('coco/2017')\ntrain_ds, eval_ds = coco_dataset['train'], coco_dataset['validation']\n\ndef preprocess(features):\n  image, gt_boxes, gt_labels = features['image'], features['objects']['bbox'], features['objects']['label']\n  # preprocess image, gt_boxes, gt_labels, such as flip, resize, and padding, and reserve 0 for background label.\n  return image, gt_boxes, gt_labels\n\nanchor_generator = keras_cv.ops.AnchorGenerator(anchor_sizes, scales, aspect_ratios, strides)\nsimilarity_calculator = keras_cv.layers.IouSimilarity()\nbox_matcher = keras_cv.ops.BoxMatcher(positive_threshold, negative_threshold)\ntarget_gather = keras_cv.ops.TargetGather()\nbox_coder = keras_cv.ops.BoxCoder(offset='sigmoid')\n\ndef encode_label(image, gt_boxes, gt_labels):\n  anchor_boxes = anchor_generator(image_size)\n  iou = similarity_calculator(gt_boxes, anchor_boxes)\n  match_indices, match_indicators = box_matcher(iou)\n\n  mask = tf.less_equal(match_indicators, 0)\n  class_mask = tf.expand_dims(mask, -1)\n  box_mask = tf.tile(class_mask, [1, 4])\n\n  class_targets = target_gather(gt_labels, match_indices, class_mask, -1)\n  box_targets = target_gather(gt_boxes, match_indices, box_mask, 0.0)\n  box_targets = box_coder.encode(box_targets, anchor_boxes)\n\n  weights = tf.squeeze(tf.ones_like(gt_labels), axis=-1)\n  ignore_mask = tf.equal(match_indicators, -2)\n  class_weights = 
target_gather(weights, match_indices, ignore_mask, 0.0)\n  box_weights = target_gather(weights, match_indices, mask, 0.0)\n\n  return (image, {'classification': class_targets, 'regression': box_targets},\n          {'classification': class_weights, 'regression': box_weights})\n\nclass RetinaNet(tf.keras.Model):\n  # includes backbone and feature pyramid head.\n  def __init__(self):\n    super().__init__()\n    # self.backbone = Model Backbone that returns dict of feature map\n    # self.fpn = Feature Pyramid Heads\n    # self.head = classification and regression heads\n  \n  def call(self, image, training=None):\n    feature_map = self.backbone(image, training)\n    feature_map = self.fpn(feature_map, training)\n    class_scores, boxes = self.head(feature_map, training)\n    return {'classification': class_scores, 'regression': boxes}\n\n# Shuffle individual examples before batching.\ntransformed_train_ds = train_ds.map(preprocess).map(encode_label).shuffle(1024).batch(128)\ntransformed_eval_ds = eval_ds.map(preprocess).map(encode_label).batch(128)\n\nstrategy = tf.distribute.TPUStrategy(...)\nwith strategy.scope():\n    optimizer = tf.keras.optimizers.SGD(lr_scheduler)\n    model = RetinaNet()\n    model.compile(optimizer=optimizer,\n                  loss={'classification': keras_cv.losses.FocalLoss(), 'regression': tf.keras.losses.Huber()},\n                  metrics=[])\n\nmodel.fit(transformed_train_ds, epochs=120, validation_data=transformed_eval_ds)\nmodel.save(file_path)\n```\n\n#### Serving\n\nCase where a user wants to serve the trained model for a single image.\n\n```python\nloaded_model = tf.keras.models.load_model(file_path)\nbox_coder = keras_cv.ops.BoxCoder(offset='sigmoid')\nanchor_generator = keras_cv.ops.AnchorGenerator()\nanchor_boxes = anchor_generator(image_size)\ndetection_generator = keras_cv.layers.NMSDetectionDecoder()\n\n@tf.function\ndef serving_fn(image):\n  # Add a leading batch dimension of size 1.\n  batched_image = tf.expand_dims(image, axis=0)\n  # The model returns a dict keyed by 'classification' and 'regression'.\n  outputs = loaded_model(batched_image, training=False)\n  raw_boxes, scores = outputs['regression'], outputs['classification']\n  decoded_boxes = box_coder.decode(raw_boxes, 
anchor_boxes)\n  classes, scores, boxes, _ = detection_generator(scores, decoded_boxes)\n  return {'classes': classes, 'scores': scores, 'boxes': boxes}\n```\n\n## Detailed Design\n\nFor the rest of the design, we denote `B` as batch size, `N` as the number of ground truth boxes, and `M` as the number\nof anchor boxes.\n\nWe propose 2 layers, 1 loss and 4 ops in this RFC.\n\n#### Layers -- IouSimilarity\nWe propose an `IouSimilarity` layer that supports ragged tensors directly. Users can also pad ground truth\nboxes or anchor boxes and pass a mask.\n \n```python\nclass IouSimilarity(tf.keras.layers.Layer):\n  \"\"\"Class to compute similarity based on Intersection over Union (IOU) metric.\"\"\"\n \n  def __init__(self, mask_value):\n    \"\"\"Initializes IouSimilarity layer.\n    Args:\n      mask_value: A float mask value to fill where `mask` is True. \n    \"\"\"\n \n  def call(self, groundtruth_boxes, anchors, mask=None):\n    \"\"\"Compute pairwise IOU similarity between ground truth boxes and anchors.\n \n    Args:\n      groundtruth_boxes: A float Tensor of shape [N, 4] or [B, N, 4] representing box coordinates.\n      anchors: A float Tensor of shape [M, 4] or [B, M, 4] representing box coordinates.\n      mask: A boolean tensor with [N, M] or [B, N, M].\n \n    Returns:\n      A float tensor with shape [M, N] or [B, M, N] representing pairwise\n        iou scores, anchor per row and groundtruth_box per column.\n \n    Input shape:\n      groundtruth_boxes: [N, 4], or [B, N, 4]\n      anchors: [M, 4], or [B, M, 4]\n \n    Output shape:\n      [M, N], or [B, M, N]\n    \"\"\"\n```\n\n#### Layers -- NMSDetectionDecoder\n\n```python\nclass NMSDetectionDecoder(tf.keras.layers.Layer):\n  \"\"\"Generates detected boxes with scores and classes for one-stage detector.\"\"\"\n\n  def __init__(self,\n               pre_nms_top_k=5000,\n               pre_nms_score_threshold=0.05,\n               nms_iou_threshold=0.5,\n               max_num_detections=100,\n               use_batched_nms=False,\n               
**kwargs):\n    \"\"\"Initializes a detection generator.\n\n    Args:\n      pre_nms_top_k: int, the number of top-scoring proposals to be kept before\n        applying NMS.\n      pre_nms_score_threshold: float, the score threshold to apply before\n        applying NMS. Proposals whose scores are below this threshold are\n        thrown away.\n      nms_iou_threshold: float in [0, 1], the NMS IoU threshold.\n      max_num_detections: int, the final number of total detections to generate.\n      use_batched_nms: bool, whether or not to use\n        `tf.image.combined_non_max_suppression`.\n      **kwargs: other keyword arguments passed to Layer.\n    \"\"\"\n\n  def call(self, raw_boxes, raw_scores, anchor_boxes, image_shape):\n    \"\"\"Generate final detections.\n\n    Args:\n      raw_boxes: a single Tensor or dict with keys representing FPN levels and values\n        representing box tensors of shape\n        [batch, feature_h, feature_w, num_anchors * 4].\n      raw_scores: a single Tensor or dict with keys representing FPN levels and values\n        representing logit tensors of shape\n        [batch, feature_h, feature_w, num_anchors].\n      anchor_boxes: a tensor of shape of [batch_size, K, 4] representing the\n        corresponding anchor boxes w.r.t. `raw_boxes`.\n      image_shape: a tensor of shape of [batch_size, 2] storing the image height\n        and width w.r.t. the scaled image, i.e. the same image space as\n        `raw_boxes` and `anchor_boxes`.\n\n    Returns:\n    `detection_boxes`: float Tensor of shape [B, max_num_detections, 4]\n      representing top detected boxes in [y1, x1, y2, x2].\n    `detection_scores`: float Tensor of shape [B, max_num_detections]\n      representing sorted confidence scores for detected boxes. 
The values\n      are between [0, 1].\n    `detection_classes`: int Tensor of shape [B, max_num_detections]\n      representing classes for detected boxes.\n    `num_detections`: int Tensor of shape [B] only the first\n      `num_detections` boxes are valid detections\n    \"\"\"\n```\n\n#### Losses -- Focal\n\n```python\nclass FocalLoss(tf.keras.losses.Loss):\n  \"\"\"Implements a Focal loss for classification problems.\n\n  Reference:\n    [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).\n  \"\"\"\n\n  def __init__(self,\n               alpha=0.25,\n               gamma=2.0,\n               reduction=tf.keras.losses.Reduction.AUTO,\n               name=None):\n    \"\"\"Initializes `FocalLoss`.\n\n    Arguments:\n      alpha: The `alpha` weight factor for binary class imbalance.\n      gamma: The `gamma` focusing parameter to re-weight loss.\n      reduction: (Optional) Type of `tf.keras.losses.Reduction` to apply to\n        loss. Default value is `AUTO`. `AUTO` indicates that the reduction\n        option will be determined by the usage context. For almost all cases\n        this defaults to `SUM_OVER_BATCH_SIZE`. When used with\n        `tf.distribute.Strategy`, outside of built-in training loops such as\n        `tf.keras` `compile` and `fit`, using `AUTO` or `SUM_OVER_BATCH_SIZE`\n        will raise an error. Please see this custom training [tutorial](\n          https://www.tensorflow.org/tutorials/distribute/custom_training) for\n            more details.\n      name: Optional name for the op. 
Defaults to 'retinanet_class_loss'.\n    \"\"\"\n\n  def call(self, y_true, y_pred):\n    \"\"\"Invokes the `FocalLoss`.\n\n    Arguments:\n      y_true: A tensor of size [batch, num_anchors, num_classes]\n      y_pred: A tensor of size [batch, num_anchors, num_classes]\n\n    Returns:\n      Summed loss float `Tensor`.\n    \"\"\"\n```\n\n#### Ops -- AnchorGenerator\n\n```python\nclass AnchorGenerator:\n  \"\"\"Utility to generate anchors for a multiple feature maps.\"\"\"\n\n  def __init__(self,\n               anchor_sizes,\n               scales,\n               aspect_ratios,\n               strides,\n               clip_boxes=False):\n    \"\"\"Constructs multiscale anchors.\n\n    Args:\n      anchor_sizes: A list/dict of int represents the anchor size for each scale. The\n        anchor height will be `anchor_size / sqrt(aspect_ratio)`, anchor width\n        will be `anchor_size * sqrt(aspect_ratio)` for each scale.\n      scales: A list/tuple/dict, or a list/tuple/dict of a list/tuple of positive\n        floats representing the actual anchor size to the base `anchor_size`.\n      aspect_ratios: A list/tuple/dict, or a list/tuple/dict of a list/tuple of positive\n        floats representing the ratio of anchor width to anchor height.\n      strides: A list/tuple of ints represent the anchor stride size between\n        center of anchors at each scale.\n      clip_boxes: Boolean to represents whether the anchor coordinates should be\n        clipped to the image size. Defaults to `False`. 
\n\n    Input shape: the size of the image, `[H, W, C]`\n    Output shape: the size of anchors concat on each level, `[(H /\n      strides) * (W / strides), K * 4]`\n    \"\"\"\n  def __call__(self, image_size):\n    \"\"\"\n    Args:\n      image_size: a tuple of 2 for image_height and image_width.\n    Returns:\n      anchors: a dict or single Tensor.\n    \"\"\"\n```\n\n#### Ops -- BoxMatcher\n\n```python\nclass BoxMatcher:\n  \"\"\"Matcher based on highest value.\n\n  This class computes matches from a similarity matrix. Each column is matched\n  to a single row.\n\n  To support object detection target assignment this class enables setting both\n  positive_threshold (upper threshold) and negative_threshold (lower thresholds)\n  defining three categories of similarity which define whether examples are\n  positive, negative, or ignored:\n  (1) similarity >= positive_threshold: Highest similarity. Matched/Positive!\n  (2) positive_threshold > similarity >= negative_threshold: Medium similarity.\n        This is Ignored.\n  (3) negative_threshold > similarity: Lowest similarity for Negative Match.\n  For ignored matches this class sets the values in the Match object to -2.\n  \"\"\"\n\n  def __init__(\n      self,\n      positive_threshold,\n      negative_threshold=None,\n      force_match_for_each_col=False,\n      positive_value=1,\n      negative_value=-1,\n      ignore_value=-2):\n    \"\"\"Construct BoxMatcher.\n\n    Args:\n      positive_threshold: Threshold for positive matches. Positive if\n        sim >= positive_threshold, where sim is the maximum value of the\n        similarity matrix for a given column. Set to None for no threshold.\n      negative_threshold: Threshold for negative matches. Negative if\n        sim < negative_threshold. 
Defaults to positive_threshold when set to None.\n      force_match_for_each_col: If True, ensures that each column is matched to\n        at least one row (which is not guaranteed otherwise if the\n        positive_threshold is high). Defaults to False.\n      positive_value: An integer to fill for positive match indicators.\n      negative_value: An integer to fill for negative match indicators.\n      ignore_value: An integer to fill for ignored match indicators.\n\n    Raises:\n      ValueError: If negative_threshold > positive_threshold.\n    \"\"\"\n\n  def __call__(self, similarity_matrix):\n    \"\"\"Tries to match each column of the similarity matrix to a row.\n\n    Args:\n      similarity_matrix: A float tensor of shape [N, M], or [Batch_size, N, M]\n        representing any similarity metric.\n\n    Returns:\n      matched_indices: A integer tensor of shape [N] with corresponding match indices for each\n        of M columns, the value represent the column index that argmax match in the matrix.\n      matched_indicators: A integer tensor of shape [N] or [B, N]. 
For positive match, the match \n        result will be the `positive_value`, for negative match, the match will be\n        `negative_value`, for ignored match, the match result will be\n        `ignore_value`.\n    \"\"\"\n```\n\n#### Ops -- TargetGather\n\n```python\nclass TargetGather:\n  \"\"\"Labeler for dense object detector.\"\"\"\n\n  def __init__(self):\n    \"\"\"Constructs Anchor Labeler.\"\"\"\n\n  def __call__(self, labels, match_indices, mask, mask_val=0.0):\n    \"\"\"Labels anchors with ground truth inputs.\n\n    Args:\n      labels: An integer tensor with shape [N, dim], or [B, N, dim] representing\n        groundtruth classes.\n      match_indices: An integer tensor with shape [N] or [B, N] representing match\n        ground truth box index.\n      mask: An integer tensor with shape [N] representing match\n        labels, e.g., 1 for positive, -1 for negative, -2 for ignore.\n      mask_val: A Python primitive to fill in places where mask is True.\n    Returns:\n      targets: A tensor with [M, dim] or [B, M, dim] selected from the `match_indices`.\n    \"\"\"\n```\n\n#### Ops -- BoxCoder\n\n```python\nclass BoxCoder:\n  \"\"\"Box coder for RetinaNet, FasterRcnn, SSD, and YOLO.\"\"\"\n\n  # The 'sigmoid' default for `offset` is an assumption taken from the usage examples.\n  def __init__(self, scale_factors=None, offset='sigmoid'):\n    \"\"\"Constructor for BoxCoder.\n\n    Args:\n      scale_factors: List of 4 positive scalars to scale ty, tx, th and tw. If\n        set to None, does not perform scaling. For Faster RCNN, the open-source\n        implementation recommends using [10.0, 10.0, 5.0, 5.0].\n      offset: The offset used to code the box coordinates, it can be 'sigmoid',\n        i.e., coded_coord = coord + sigmoid(tx) which\n        is used for RetinaNet, FasterRcnn, and SSD, or it can be 'linear',\n        i.e., encoded_coord = coord + width * tx which is used for YOLO. 
\n    \"\"\"\n  def encode(self, boxes, anchors):\n    \"\"\"Compute coded_coord from coord.\"\"\"\n  def decode(self, boxes, anchors):\n    \"\"\"Compute coord from coded_coord.\"\"\"\n```\n\n## Questions and Discussion Topics\n\n* Whether `BoxMatcher` should take a list of thresholds (e.g., size 2) and a list of values (e.g., size 3).\n* Gathering feedback on arguments and naming conventions.\n* How to better generalize box coding, to differentiate RCNN-family encoding and YOLO-family encoding.\n* Whether to have `BoxCoder(inverse=False)` and a single call method, or `BoxCoder` with `encode` and `decode` methods."
  },
  {
    "path": "rfcs/20210920-tune-end-to-end-ml-workflows-in-keras-tuner.md",
    "content": "# Tune end-to-end ML workflows in KerasTuner\n\n| Status        | Proposed                                             |\n:-------------- |:---------------------------------------------------- |\n| **Author**    | Haifeng Jin (haifengj@google.com)                    |\n| **Updated**   | 2021-09-20                                           |\n\n## Objective\n\nImproving the user experience of KerasTuner to tune end-to-end workflows.\nReduce the learning curve and code hacks for workflows involves hyperparameters\nin data preprocessing and model fitting.\n\n## Motivation\n\nDifferent users prefer different workflows for their tuning process -- like\nKeras has different getting-started tutorials for engineers and researchers.\nThere are users who prefer to learn more about the framework and to implement\neverything by overriding class methods, and users who prefer to write\neverything from scratch to have a shorter learning curve and better\nconfigurability for the details.  For example, some users would like to\noverride `Model.train_step()` to make the code cleaner, others like to write\nthe training loop from scratch.\n\n\nCurrently, KerasTuner has good support for the users who would like to\nrestructure their code by learning the KerasTuner framework, and for users who\nonly need to do some light customization of the model building process.\nHowever, the support for users who need to write their model building and\ntraining process from scratch is not adequate.\n\n\nMoreover, many users use the hyperparameter tuning library as an intermediate\nstep in their ML process rather than their main API. In their workflow,\nimplementing and training a model with Keras are usually a separate process\nfrom hyperparameter tuning. They would first write the code using Keras, then\ntry to put it into KerasTuner to tune, and put the hyperparameter values back\ninto their Keras model. 
Therefore, we should maximize the code and model\nportability in KerasTuner for these users, and minimize the code changes\nrequired for them to adopt and remove KerasTuner.\n\n### The old workflow\n\nThe current workflow for writing their model training process with KerasTuner\nis as follows. The user defines the model in the `HyperModel.build()` function.\nDefines the data preprocessing and model training by overriding\n`Tuner.run_trial()`. The arguments, like the dataset, are passed through the\n`Tuner.search()` function, and finally received by `Tuner.run_trial()`.\n\n\n```py\nimport keras_tuner as kt\n\nclass MyHyperModel(kt.HyperModel):\n  def build(self, hp):\n    # Model building\n    model = keras.Sequential()\n    model.add(keras.layers.Dense(\n        hp.Choice('units', [8, 16, 32]),\n        activation='relu'))\n    model.add(keras.layers.Dense(1, activation='relu'))\n    model.compile(loss='mse')\n    return model\n\nclass MyTuner(kt.Tuner):\n  def run_trial(self, trial, *fit_args, **fit_kwargs):\n    hp = trial.hyperparameters\n  \n    # data preprocessing       \n    training_data, validation_data = data_preprocessing(\n        hp, *fit_args, **fit_kwargs)\n    model = self.hypermodel.build(hp)\n   \n    # model training\n    model.fit(\n        training_data,\n        epochs=hp.Int(...),\n        validation_data=validation_data,\n        ...)\n       \n    # evaluation and reporting\n    score = model.evaluate(validation_data, ...)\n    self.oracle.update_trial(trial.trial_id, {'score': score})\n    self.save_model(trial.trial_id, model)\n\ntuner = MyTuner(\n    hypermodel=MyHyperModel(),\n    objective=kt.Objective('score', 'min'),\n    ...)\n\n# Passing in the args\ntuner.search(*fit_args, **fit_kwargs)\n```\n\n### Problems\n\nThe key problem of this workflow is that the code is split in two classes. 
Any\ncontrol flow and data flow between data preprocessing, model building, and\nmodel training would all have to pass through the framework and function calls.\nTo use the framework, the user would have to understand how these different\nfunctions are called, and wire their data and information properly between\nthese functions.\n\n### Use cases to improve\n\nThe following use cases are not well supported because of the problem above.\n\n#### Configure and jointly tune data preprocessing and model training\n\nFor example, writing a custom training loop, or tuning the data preprocessing\nsteps, or anything in the training loop like whether to shuffle the training\ndata, they need to override the `Tuner.run_trial()` function, which adds more\nto the learning curve.\n\nFor example, in natural language processing, tokenization and vectorization may\naffect the later model type. They will need to find a way to pass this\ninformation from `Tuner.run_trial()` to HyperModel.build.\n\n#### Tune existing Keras code\n\nIf the users have their code for model building and training ready written using\nKeras, and they want to tune some of the hyperparameters, they would have to\nchange the code a lot to separate their code apart and wire the data flow and\ncontrol flow between the overridden functions.\n\n#### Retrain the model after tuning\n\nIf the user wants to retrain the model using the best hyperparameter values\nfound, there is not a straight-forward way to do it if they used the\nhyperparameter in `Tuner.run_trial()` for data preprocessing and model\ntraining.\n\n## User Benefit\n\nThe use cases described above would all have smooth workflows, without much\nextra code or learning of the framework.\n\n## Design Proposal\n\nWe propose two workflows: the `Tuner` workflow and the `HyperModel` workflow to\nsolve the problems above.\n\nThe `Tuner` workflow is to override `Tuner.run_trial()`. 
The user can put all the\ncode for data preprocessing, model building, and model training in one place in\nthe `Tuner.run_trial()` function. No `HyperModel` is needed. It supports all the\nuse cases mentioned above by providing the maximum freedom to the user.\n\nThe `HyperModel` workflow follows the original `HyperModel` style. It is easier\nto learn and needs less code compared to the first workflow, but covers all the\nuse cases as long as the code for building and training the model is separate.\nThe user only needs to override `HyperModel.fit()` for any tuning of the\ndata preprocessing and model fitting process.\n\n## Detailed Design\n\n### The `Tuner` workflow\n\nHere is an end-to-end code example of the new workflow.\n\nThe user only needs to override `Tuner.run_trial()` to put everything together,\nincluding data preprocessing, model building, and model training. It returns\nthe evaluation results back to the tuner.\n\n```py\nimport os\n\nimport numpy as np\nimport keras_tuner as kt\nfrom tensorflow import keras\n\nclass MyTuner(kt.Tuner):\n  def run_trial(self, trial, x, y, callbacks=None, **kwargs):\n    hp = trial.hyperparameters\n    # Data preprocessing\n    num_features = hp.Int(\"num_features\", 10, 15)\n    x, y = feature_selection(x, y, num_features=num_features)\n    # Model building\n    # Input shape depending on data preprocessing.\n    inputs = keras.Input(shape=(num_features,))\n    outputs = keras.layers.Dense(\n        hp.Choice('units', [8, 16, 32]),\n        activation='relu')(inputs)\n    outputs = keras.layers.Dense(1, activation='relu')(outputs)\n    model = keras.Model(inputs=inputs, outputs=outputs)\n    model.compile(loss='mse',\n                  metrics=['mae'])\n    # Model training\n    # `validation_data` and `sample_weight` are assumed to be defined by the user.\n    history = model.fit(\n        x,\n        y,\n        epochs=100,\n        validation_data=validation_data,\n        # Tune whether to use shuffle.\n        shuffle=hp.Boolean(\"shuffle\"),\n        # Tune whether to use sample_weights.\n        sample_weight=sample_weight if hp.Boolean(\"sample_weight\") else None,\n        # 
The provided callbacks list contains checkpointing and tensorboard.\n        callbacks=callbacks)\n    # Save the model to a unique path with `trial_id`.\n    model.save(os.path.join(trial.trial_id, 'model'))\n    # Returning the evaluation results\n    return np.min(history.history[\"val_mae\"])\n\n# When Tuner.run_trial is overridden,\n# `hypermodel` and `objective` are optional.\ntuner = MyTuner(\n    max_trials=3,\n    executions_per_trial=2,\n    overwrite=True,\n    directory=\"my_dir\",\n    project_name=\"helloworld\",\n)\n\n# Anything passed to `search()` will\n# be passed on to `Tuner.run_trial()`.\ntuner.search(x, y)\n# Get the best model.\nbest_model = tuner.get_best_models()[0]\n```\n\nThere are several important features in this workflow:\n\n* Tune the arguments of `model.fit()`, like `shuffle` and `sample_weight`.\n\n* Share local variables across the workflow. For example, the model building\n  process can access `num_features`, which is a variable in data\n  preprocessing. This solves the problem of joint tuning.\n\n* Use built-in callbacks for convenience. The callbacks argument contains\n  callback functions for checkpointing and TensorBoard setup.\n\n* The return value is flexible. It can be a single value, or a list of values,\n  or a dictionary of metrics, or even a `History` object returned by\n  `model.fit()`.\n\n* The `hypermodel` and `objective` can be optional. The user doesn't need to\n  define a `HyperModel`. If the return value is a single value, it will be\n  minimized by default. Therefore, objective is also optional.\n\n* The user can build a unique path to save each model with `trial.trial_id`.\n\nFor the use case of reusing existing Keras code, the user can use the following\nworkflow, which calls a function using all the hyperparameters. 
The user only\nneeds to write a function to call the existing Keras code and return the\nevaluation results.\n\n```py\nclass MyTuner(kt.Tuner):\n  def run_trial(self, trial, **kwargs):\n    hp = trial.hyperparameters\n    # Saving the model can be handled by the user.\n    # `trial_id` is unique for each trial.\n    return build_and_evaluate_model(\n        hp.Int(\"num_features\", 10, 15),\n        hp.Choice('units', [8, 16, 32]),\n        ...,\n        trial.trial_id,\n    )\n\ntuner = MyTuner(...)\ntuner.search()\n# Retraining the model. `.values` is the dict of hyperparameter names to values;\n# this assumes the function accepts those names as keyword arguments.\nbuild_and_evaluate_model(**tuner.get_best_hyperparameters()[0].values)\n```\n\nIn this workflow, the user can easily retrain the model by calling the function again with the best hyperparameters.\n\n### The HyperModel workflow\n\nUsers who prefer to follow the old workflow can implement a `HyperModel` by overriding the build function and the fit function. The build function builds and returns the model. The fit function does the data preprocessing and model training.\n\nThe following code example implements the same functionality as the code example above.\n\n```py\nimport numpy as np\nimport keras_tuner as kt\nfrom tensorflow import keras\n\nclass MyHyperModel(kt.HyperModel):\n\n  def build(self, hp):\n    # Model building\n    # Input shape depends on a hyperparameter used by data preprocessing.\n    inputs = keras.Input(shape=(hp.Int(\"num_features\", 10, 15),))\n    x = keras.layers.Dense(\n        hp.Choice('units', [8, 16, 32]),\n        activation='relu')(inputs)\n    outputs = keras.layers.Dense(1, activation='relu')(x)\n    model = keras.Model(inputs=inputs, outputs=outputs)\n    model.compile(loss='mse',\n                  metrics=['mae'])\n    return model\n  \n  def fit(self, hp, model, x, y, validation_data, callbacks=None, **kwargs):\n    # Data preprocessing\n    # Get the hyperparameter value used in `build()`.\n    x, y = feature_selection(x, y, num_features=hp.get(\"num_features\"))\n    # Model training\n 
   # Returning the training history\n    # or a similar dictionary if using custom training loop.\n    return model.fit(\n        x,\n        y,\n        epochs=100,\n        validation_data=validation_data,\n        # Tune whether to use shuffle.\n        shuffle=hp.Boolean(\"shuffle\"),\n        # Tune whether to use sample_weights.\n        sample_weight=sample_weight if hp.Boolean(\"sample_weight\") else None,\n        # The provided callbacks list contains checkpointing and tensorboard.\n        callbacks=callbacks)\n\ntuner = kt.RandomSearch(\n    hypermodel=MyHyperModel(),\n    objective=kt.Objective('val_mae', 'min'),\n    max_trials=3,\n    executions_per_trial=2,\n    overwrite=True,\n    directory=\"my_dir\",\n    project_name=\"helloworld\",\n)\n\n# Any arg passed to `search()` would be passed to `fit()`.\ntuner.search(x, y, validation_data=validation_data)\n\n# Exporting the best models.\nmodels = tuner.get_best_models(num_models=2)\n\n# Retraining the model with the second best hyperparameters.\nsecond_best_hp = tuner.get_best_hyperparameters(num_trials=2)[1]\nhypermodel = MyHyperModel()\nmodel = hypermodel.build(second_best_hp)\nhypermodel.fit(\n    hp=second_best_hp,\n    model=model,\n    x=new_x,\n    y=new_y,\n    validation_data=new_validation_data,\n    # Save the model at its best epoch to a custom path\n    callbacks=[keras.callbacks.ModelCheckpoint(\n        filepath=\"path_to_checkpoint\",\n        monitor='val_loss',\n        save_best_only=True)])\n# Save the final model.\nmodel.save(\"path_to_saved_model\")\n```\n\nPlease take note of the following four points:\n\n* Similar to `Tuner.run_trial()`, the return value of the fit function supports\n  all different formats.\n\n* The user can use built-in callbacks just like in `Tuner.run_trial()`.\n\n* `build()` and `fit()` can share hyperparameters. In this example,\n  `num_features` is shared between the two functions. 
In `fit()`, we can use\n  `hp.get()` to obtain the value of a hyperparameter used in `build()`.\n\n* We can easily retrain the model with any hyperparameter value set with\n  `hypermodel.build()` and `hypermodel.fit()`.\n\nWith these proposed workflows, the user now has the maximum flexibility. Any\nstep in an end-to-end machine learning workflow can be tuned. Moreover, the\nchanges needed to tune existing Keras code is minimized.\n\nHere we present HyperModel code examples of three important use cases:\n\n* Text tokenization.\n\n* Custom training loop.\n\n* Fine tuning with pretrained weights.\n\n#### Text tokenization\n\n```py\nimport json\n\n# Save the vocabulary to disk before search.\ntext_vectorizer = layers.TextVectorization()\ntext_vectorizer.adapt(dataset.map(lambda x, y: x))\nwith open('vocab.json', 'w') as f:\n  json.dump(text_vectorizer.get_vocabulary(), f)\n\nclass MyHyperModel(kt.HyperModel):\n  def build(self, hp):\n    inputs = keras.Input(shape=(10,))\n    outputs = layers.Embedding(\n        # max_token is a hyperparameter also used in text vectorization.\n        input_dim=hp.Int(\"max_tokens\", 100, 500, step=100),\n        output_dim=64)(inputs)\n    outputs = layers.LSTM(hp.Int(\"units\", 32, 128, step=32))(outputs)\n    outputs = layers.Dense(1, activation='sigmoid')(outputs)\n    model = keras.Model(inputs, outputs)\n    model.compile(loss='mse')\n    return model\n  \n  def fit(self, hp, model, dataset, validation_data, callbacks, **kwargs):\n    # Load the vocabulary from file.\n    with open('vocab.json', 'r') as f:\n      vocab = json.load(f)\n\n    # Create and adapt the text vectorizer.\n    text_vectorizer = layers.TextVectorization(\n        # The max_tokens is a hyperparameter created in build().\n        vocabulary=vocab[:hp.get(\"max_tokens\")],\n        output_mode=\"int\",\n        output_sequence_length=10)\n  \n    return model.fit(\n        # Convert x from strings to integer vectors.\n        dataset.map(\n            lambda 
x, y: (text_vectorizer(x), y),\n            num_parallel_calls=tf.data.AUTOTUNE),\n        validation_data=validation_data,\n        callbacks=callbacks,\n    )\n```\n\n#### Custom training loop\n\n```py\nclass MyHyperModel(kt.HyperModel):\n  def build(self, hp):\n    inputs = keras.Input(shape=(10,))\n    outputs = layers.Dense(hp.Int(\"units\", 16, 128), activation='relu')(inputs)\n    outputs = layers.Dense(1, activation='sigmoid')(outputs)\n    model = keras.Model(inputs, outputs)\n    return model\n  \n  def fit(self, hp, model, dataset, validation_data, **kwargs):\n    lr = hp.Float(\"learning_rate\", 1e-4, 1e-2, sampling=\"log\", default=1e-3)\n    optimizer = tf.keras.optimizers.Adam(lr)\n    loss_tracker = tf.keras.metrics.Mean()\n    # Track the validation loss\n    val_loss_tracker = tf.keras.metrics.Mean()\n    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()\n    # Record the minimum validation loss during fit.\n    min_val_loss = float(\"inf\")\n  \n    @tf.function\n    def run_train_step(data):\n      images = tf.dtypes.cast(data[0], \"float32\") / 255.0\n      labels = data[1]\n      with tf.GradientTape() as tape:\n        logits = model(images)\n        loss = loss_fn(labels, logits)\n      gradients = tape.gradient(loss, model.trainable_variables)\n      optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n      loss_tracker.update_state(loss)\n  \n    @tf.function\n    def run_val_step(data):\n      images = tf.dtypes.cast(data[0], \"float32\") / 255.0\n      labels = data[1]\n      logits = model(images)\n      loss = loss_fn(labels, logits)\n      val_loss_tracker.update_state(loss)\n  \n    for epoch in range(2):\n      for batch, data in enumerate(dataset):\n        run_train_step(data)\n      print(f\"Epoch loss: {loss_tracker.result().numpy()}\")\n      loss_tracker.reset_states()\n      for batch, data in enumerate(validation_data):\n        run_val_step(data)\n      val_loss = val_loss_tracker.result().numpy()\n   
   min_val_loss = min(min_val_loss, val_loss)\n      print(f\"Epoch val_loss: {val_loss}\")\n      val_loss_tracker.reset_states()\n  \n    return min_val_loss\n```\n\nYou may also subclass `keras.Model` to override `train_step()`.\n\n#### Fine tuning with pretrained weights\n\n```py\nclass MyHyperModel(kt.HyperModel):\n\n  def build(self, hp):\n    return keras.Sequential([\n        keras.applications.ResNet50(\n            weights=\"imagenet\",\n            input_shape=(32, 32, 3),\n            include_top=False,\n        ),\n        layers.GlobalAveragePooling2D(),\n        layers.Dense(hp.Int(\"units\", 32, 128)),\n        layers.Dense(1),\n    ])\n  \n  def fit(self, hp, model, dataset, validation_data, callbacks, **kwargs):\n    # Fit the model with the `base_model` frozen.\n    model.layers[0].trainable = False\n    model.compile(\n        optimizer=\"adam\",\n        loss=keras.losses.BinaryCrossentropy(from_logits=True),\n    )\n    model.fit(dataset, epochs=20)\n    # Fit the model again with some layers in the `base_model` frozen.\n    model.layers[0].trainable = True\n    # `model.layers[0]` is the ResNet50 base model; freeze its first few layers.\n    for layer in model.layers[0].layers[:hp.Int(\"freeze\", 0, 20)]:\n      layer.trainable = False\n    model.compile(\n        # Use a smaller learning rate.\n        optimizer=keras.optimizers.Adam(learning_rate=1e-5),\n        loss=keras.losses.BinaryCrossentropy(from_logits=True),\n    )\n    return model.fit(\n        dataset,\n        epochs=20,\n        callbacks=callbacks,\n        validation_data=validation_data)\n```\n\n### API documentation\n\nThe new APIs in the `HyperModel` and `Tuner` classes are as follows.\n\n```py\nclass HyperModel():\n  def fit(self, hp, model, callbacks, **kwargs):\n    \"\"\"Train the model.\n   \n    Args:\n        hp: HyperParameters.\n        model: `keras.Model` built in the `build()` function.\n        callbacks: A list of prebuilt Keras callbacks for model checkpointing\n          and tensorboard configuration.\n        **kwargs: Anything the user defines. 
They are passed from\n            `Tuner.search()`.\n   \n    Returns:\n        A `History` object, a similar dictionary, or a single value.\n    \"\"\"\n    pass\n\nclass Tuner():\n  def run_trial(self, trial, callbacks, **kwargs):\n    \"\"\"Train the model.\n   \n    Args:\n        trial: Trial. The current Trial object.\n        callbacks: A list of prebuilt Keras callbacks for model checkpointing\n          and TensorBoard configuration.\n        **kwargs: Anything the user defines. They are passed from Tuner.search().\n\n    Returns:\n        A `History` object, a similar dictionary, or a single value.\n    \"\"\"\n```\n\n## Questions and Discussion Topics\n\nDoes the fit function need `trial_id` in the args to do model saving? The user\nmay need this arg to build unique saving paths for the models.\n"
  },
  {
    "path": "rfcs/20220804-keras-cv-two-stage-2d-object-detection.md",
    "content": "# keras-cv Two Stage Two-Dimensional Object Detection API\n\n| Status        | Proposed      |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | Zhenyu Tan (tanzheny@google.com)|\n| **Contributor(s)** | Francois Chollet (fchollet@google.com)|\n| **Updated**   | 2022-08-04                                           |\n\n## Objective\n\nWe aim to provide the core primitive components for training and serving two-stage two-dimensional object\ndetection models, specifically Faster RCNN.\nPretrained models will also be provided, similar to keras-applications.\n\n## Key Benefits\n\nTwo-stage object detection models are a state-of-the-art technique that powers many computer vision tasks. They provide\nmore accurate detection than single-stage models (such as SSD), at the cost of lower inference speed.\n\nWith this proposal, Keras users will be able to build end-to-end models with a simple API.\n\n## Design overview\n\nThis proposal includes the specific core components for building Faster RCNN models. It does not, however, include:\n\n1. Model backbone, such as ResNet, or functions to generate feature maps\n2. Detection heads, such as Feature Pyramid\n3. Metrics utilities such as COCO Evaluator, or visualization utils.\n4. 
Primitive components from the [single-stage detector](https://github.com/keras-team/governance/blob/master/rfcs/20200928-keras-cv-single-stage-2d-object-detection.md); we will reuse those components in this design.\n\nData augmentation with ground truth box processing is currently being developed in KerasCV.\n\nIn this document, region of interest (roi) is used interchangeably with region proposal, or simply proposal.\n\n#### Training\n\nCase where a user wants to train from scratch:\n\n```python\nimport tensorflow as tf\nimport tensorflow_datasets as tfds\nimport keras_cv\n\n# Considering a COCO dataset\ncoco_dataset = tfds.load('coco/2017')\ntrain_ds, eval_ds = coco_dataset['train'], coco_dataset['validation']\n\ndef preprocess(features):\n  image, gt_boxes, gt_labels = features['image'], features['objects']['bbox'], features['objects']['label']\n  # preprocess image, gt_boxes, gt_labels, such as flip, resize, and padding, and reserve 0 for background label.\n  # but a batch of images (typically 2 per GPU) should have same size.\n  return image, gt_boxes, gt_labels\n\nanchor_generator = keras_cv.ops.AnchorGenerator(anchor_sizes, scales, aspect_ratios, strides)\nsimilarity_calculator = keras_cv.layers.IOUSimilarity()\n# positive anchor with IOU > 0.7, negative anchor with IOU <= 0.3\nrpn_box_matcher = keras_cv.ops.BoxMatcher([0.7, 0.3])\n# positive ROI with IOU > 0.5, negative ROI with IOU <= 0.5\nrcnn_box_matcher = keras_cv.ops.BoxMatcher(0.5)\ntarget_gather = keras_cv.ops.TargetGather()\nbox_coder = keras_cv.ops.BoxCoder(offset='sigmoid')\nrpn_sampler = keras_cv.layers.ProposalSampler(positive_fraction=0.5, batch_size=256)\nrcnn_sampler = keras_cv.layers.ProposalSampler(positive_fraction=0.25, batch_size=128)\nrpn_labeler = keras_cv.ops.AnchorLabeler(rpn_sampler, rpn_box_matcher, similarity_calculator, box_coder)\nrcnn_labeler = keras_cv.ops.AnchorLabeler(rcnn_sampler, rcnn_box_matcher, similarity_calculator, box_coder)\nroi_filter = 
keras_cv.layers.ROIFilter(pre_nms_top_k=2000, nms_iou_threshold=0.7, test_pre_nms_top_k=1000)\nroi_pooler = keras_cv.layers.ROIPooler(output_size=[7, 7])\n# Build RPN and ROI Heads, use Keras backbone\nbackbone = tf.keras.applications.ResNet50()\n\ndef encode_rpn_label(image, gt_boxes, gt_labels):\n  anchor_boxes = anchor_generator(image_size)\n  cls_targets, box_targets, cls_weights, box_weights = rpn_labeler(anchor_boxes, gt_boxes, gt_labels)\n  # Keep the image so that `train_step` can unpack (image, targets, weights).\n  return image, (gt_boxes, gt_labels, cls_targets, box_targets), (cls_weights, box_weights)\n\nclass FasterRCNN(tf.keras.Model):\n  # includes backbone and feature pyramid head.\n  # Parameters without defaults must come before `backbone`, which has one.\n  def __init__(self, rpn_head, roi_head, roi_filter, roi_pooler, backbone='resnet50_fpn'):\n    # self.backbone = Model Backbone that returns dict of feature map, or Feature Pyramid Network that wraps it\n    # self.rpn_head = Region Proposal Network that provides objectness scores and bbox offset against anchor boxes\n    # self.roi_filter = A filter layer that shrinks from a dense predictions to topk sparse predictions based on scores\n    # self.roi_head = RCNN detection network that provides softmaxed classification score and bbox offset against rois\n    # self.rpn_cls_loss_fn = a Binary CrossEntropy Keras loss \n    # self.rpn_reg_loss_fn = a Regression Keras loss, e.g., Huber loss\n    # self.rcnn_cls_loss_fn = a Binary CrossEntropy Keras loss\n    # self.rcnn_reg_loss_fn = a Regression Keras loss, e.g., Huber loss\n  \n  def call(self, image, training=None):\n    # returns a single or multi level feature maps\n    feature_map = self.backbone(image, training)\n    # from the region proposal network, returns the predicted objectness scores\n    # and class-agnostic offsets relative to anchor boxes\n    rpn_cls_pred, rpn_bbox_pred = self.rpn_head(feature_map)\n    # apply offset to anchors and recover proposal in (x1, y1, x2, y2) format\n    rpn_rois = box_coder.decode_offset(anchors, rpn_bbox_pred)\n    # select top-k proposals according to 
objectness scores\n    rois, cls_pred = self.roi_filter(rpn_rois, rpn_cls_pred)\n    # pooling feature map with variable sized rois to fixed size feature map\n    feature_map = self.roi_pooler(feature_map, rois)\n    # get class independent scores and bounding boxes offsets relative to proposals\n    rcnn_cls_pred, rcnn_bbox_pred = self.roi_head(feature_map)\n    if not training:\n      rcnn_cls_pred, rcnn_bbox_pred = self.nms_detection_decoder(rois, rcnn_cls_pred, rcnn_bbox_pred, image_shape)\n      return rcnn_cls_pred, rcnn_bbox_pred\n    return {\"rpn_cls_pred\": rpn_cls_pred, \"rpn_bbox_pred\": rpn_bbox_pred, \"rois\": rois,\n            \"rcnn_cls_pred\": rcnn_cls_pred, \"rcnn_bbox_pred\": rcnn_bbox_pred}\n  \n  def train_step(self, data):\n    image, (gt_boxes, gt_labels, rpn_cls_targets, rpn_box_targets), (rpn_cls_weights, rpn_box_weights) = data\n    # Using approximate joint training instead of alternating training\n    with tf.GradientTape() as tape:\n      outputs = self(image, training=True)\n      # Compute RPN losses using targets from input pipeline, this will normalize by N_cls and N_reg as well\n      rpn_cls_loss = self.rpn_cls_loss_fn(rpn_cls_targets, outputs[\"rpn_cls_pred\"], rpn_cls_weights)\n      rpn_box_loss = self.rpn_reg_loss_fn(rpn_box_targets, outputs[\"rpn_bbox_pred\"], rpn_box_weights)\n      # Compute RCNN losses which only picks k-th bbox prediction where k is the predicted class\n      rois = outputs[\"rois\"]\n      rcnn_cls_true, rcnn_box_true, rcnn_cls_weights, rcnn_box_weights = self.rcnn_labeler(rois, gt_boxes, gt_labels)\n      rcnn_cls_loss = self.rcnn_cls_loss_fn(rcnn_cls_true, outputs[\"rcnn_cls_pred\"], rcnn_cls_weights)\n      rcnn_box_loss = self.rcnn_reg_loss_fn(rcnn_box_true, outputs[\"rcnn_bbox_pred\"], rcnn_box_weights)\n      total_loss = rpn_cls_loss + rpn_box_loss + rcnn_cls_loss + rcnn_box_loss\n    self.optimizer.minimize(total_loss, self.trainable_variables, tape=tape)\n    return self.compute_metrics(...)\n      
\n\ntransformed_train_ds = train_ds.map(preprocess).map(encode_rpn_label).batch(128).shuffle(1024)\ntransformed_eval_ds = eval_ds.map(preprocess).map(encode_rpn_label).batch(128)\n\nstrategy = tf.distribute.TPUStrategy(...)\nwith strategy.scope():\n    optimizer = tf.keras.optimizers.SGD(lr_scheduler)\n    model = FasterRCNN(...)\n    model.compile(optimizer=optimizer,\n                  loss={'classification': keras_cv.losses.Focal(), 'regression': tf.keras.losses.Huber()},\n                  metrics=[])\n\nmodel.fit(transformed_train_ds, epochs=100, validation_data=transformed_eval_ds)\nmodel.save(file_path)\n``` \n\n#### Serving\n\nCase where a user wants to serve the trained model for a single image: this is identical to the single-stage object detector.\n\n## Detailed Design\n\nFor the rest of the design, we denote `B` as batch size, `N` as the number of ground truth boxes, and `M` as the number\nof anchor boxes.\n\nWe propose 3 layers and 1 op in this RFC.\n\n#### Layers -- ProposalSampler\nGiven a dense anchor/proposal set, we propose a `ProposalSampler` layer for selecting positive and negative proposals according\nto the required batch size and positive:negative ratio.\n \n```python\nclass ProposalSampler(tf.keras.layers.Layer):\n  \"\"\"Class to select positive and negative proposals.\"\"\"\n \n  def __init__(self, positive_fraction, batch_size, positive_indicator=1, negative_indicator=-1):\n    \"\"\"Initializes ProposalSampler layer.\n    Args:\n      positive_fraction: A float number between [0, 1], 0.5 means positive:negative ratio is 1:1\n      batch_size: the number of samples to generate\n      positive_indicator: for the inputs to the layer, value for positive proposal, default to 1\n      negative_indicator: for the inputs to the layer, value for negative proposal, default to -1\n    \"\"\"\n \n  def call(self, matched_indicators):\n    \"\"\"Get balanced positive and negative samples.\n \n    Args:\n      
matched_indicators: An int Tensor [N] or [B, N] representing positive or negative values.\n \n    Returns:\n      Int tensors with shape [sample_size] or [B, sample_size] representing the selected indices for proposals.\n \n    \"\"\"\n```\n\n#### Layers -- ROIPooler\nWe propose a `ROIPooler` layer to crop feature maps from proposals.\n \n```python\nclass ROIPooler(tf.keras.layers.Layer):\n  \"\"\"Class to extract feature maps from region proposals by quantization.\"\"\"\n \n  def __init__(self, output_size=[7, 7]):\n    \"\"\"Initializes ROIPooler layer.\n    Args:\n      output_size: A tuple representing the output height and width. \n    \"\"\"\n \n  def call(self, feature_maps, rois):\n    \"\"\"Crop feature maps at the given region proposals.\n \n    Args:\n      feature_maps: A float Tensor [H, W, C] or [B, H, W, C] or dict of multiple levels\n      rois: A float or int Tensor [M] or [B, M] representing coordinates within [H, W].\n \n    Returns:\n      A float tensor with shape [output_size] or [B, output_size] representing cropped feature maps.\n    \"\"\"\n```\n\n#### Layers -- ROIFilter\nWe propose a `ROIFilter` layer to select top-k proposals based on some score.\n \n```python\nclass ROIFilter(tf.keras.layers.Layer):\n  \"\"\"Class to select top-k proposals based on some score.\"\"\"\n \n  def __init__(self, \n               pre_nms_top_k: int = 2000,\n               pre_nms_score_threshold: float = 0.0,\n               pre_nms_min_size_threshold: float = 0.0,\n               nms_iou_threshold: float = 0.7,\n               num_proposals: int = 1000,\n               test_pre_nms_top_k: int = 1000,\n               test_pre_nms_score_threshold: float = 0.0,\n               test_pre_nms_min_size_threshold: float = 0.0,\n               test_nms_iou_threshold: float = 0.7,\n               test_num_proposals: int = 1000,\n               use_batched_nms: bool = False,):\n    \"\"\"Initializes ROIFilter layer.\n    Args:\n      pre_nms_top_k: 
An `int` of the number of top-scoring proposals to be kept\n        before applying NMS.\n      pre_nms_score_threshold: A `float` of the score threshold to apply before\n        applying NMS. Proposals whose scores are below this threshold are\n        thrown away.\n      pre_nms_min_size_threshold: A `float` of the threshold of each side of the\n        box (w.r.t. the scaled image). Proposals whose sides are below this\n        threshold are thrown away.\n      nms_iou_threshold: A `float` in [0, 1], the NMS IoU threshold.\n      num_proposals: An `int` of the final number of proposals to generate.\n      test_pre_nms_top_k: An `int` of the number of top-scoring proposals to be\n        kept before applying NMS in testing.\n      test_pre_nms_score_threshold: A `float` of the score threshold to apply\n        before applying NMS in testing. Proposals whose scores are below this\n        threshold are thrown away.\n      test_pre_nms_min_size_threshold: A `float` of the threshold of each side\n        of the box (w.r.t. the scaled image) in testing. 
Proposals whose sides\n        are below this threshold are thrown away.\n      test_nms_iou_threshold: A `float` in [0, 1] of the NMS IoU threshold in\n        testing.\n      test_num_proposals: An `int` of the final number of proposals to generate\n        in testing.\n      use_batched_nms: A `bool` of whether or not to use\n        `tf.image.combined_non_max_suppression`.\n    \"\"\"\n \n  def call(self,\n           rois: Mapping[str, tf.Tensor],\n           roi_scores: Mapping[str, tf.Tensor],\n           image_shape: tf.Tensor):\n    \"\"\"Selects the top-k proposals based on their scores.\n \n    Args:\n      rois: A float Tensor [N] or [B, N] representing region proposals.\n      roi_scores: A float Tensor [N] or [B, N] representing scores for each region.\n      image_shape: An int tensor [2] or [B, 2] representing image size.\n \n    Returns:\n      roi: A `tf.Tensor` of shape [B, num_proposals, 4], the proposed\n        ROIs in the scaled image coordinate.\n      roi_scores: A `tf.Tensor` of shape [B, num_proposals], scores of the\n        proposed ROIs.\n\n    \"\"\"\n```\n\n#### Ops -- AnchorLabeler\n\n```python\nclass AnchorLabeler:\n  \"\"\"Labeler that matches ground truth with anchors and proposals.\"\"\"\n\n  def __init__(self,\n               proposal_sampler,\n               proposal_matcher,\n               similarity_calculator,\n               box_coder):\n    \"\"\"Initializes AnchorLabeler.\n\n    Args:\n      proposal_sampler: a ProposalSampler\n      proposal_matcher: A BoxMatcher\n      similarity_calculator: Such as IOU layer\n      box_coder: a BoxCoder that transforms between different formats\n\n    \"\"\"\n  def __call__(self, proposals, gt_boxes, gt_labels):\n    \"\"\"\n    Args:\n      proposals: a float [N, 4] Tensor representing different proposals.\n      gt_boxes: a float [M, 4] Tensor representing ground truth boxes.\n      gt_labels: an int [M] Tensor representing ground truth labels.\n    Returns:\n      cls_targets: an int [K] Tensor representing mapped proposal labels from ground truth labels.\n      
box_targets: a float [K, 4] Tensor representing mapped proposal boxes from ground truth boxes.\n      cls_weights: a float [K] Tensor representing weights for each cls_targets\n      box_weights: a float [K] or [K, 4] Tensor representing weights for each box_targets\n    \"\"\"\n```\n\n## Questions and Discussion Topics\n* Should we provide a meta arch for FasterRCNN?\n* Should we provide some default out-of-box RPN Head and ROI Head?\n"
  },
  {
    "path": "rfcs/README.md",
    "content": "# Keras API proposal \"Request For Comment\" (RFC) docs\n\nThis folder contains approved API proposals. To propose a new API to be considered for review, you can open a Pull Request in this repository to add a new RFC `.md` doc.\n\n## Process\n\nThe process for writing and submitting design proposals is the same as the [TensorFlow RFC process](https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md).\n\n- Start from [this template](https://github.com/keras-team/governance/blob/master/rfcs/yyyymmdd-rfc-template.md).\n- Fill in the content. Note that you will need to insert code examples.\n    - Provide enough context information for anyone to understand what's going on.\n    - Provide a solid argument as to why the feature is needed.\n    - Include a code example of the **end-to-end workflow** you have in mind.\n- Open a Pull Request in the [Keras API proposals folder in this repository](https://github.com/keras-team/governance/tree/master/rfcs).\n- Send the Pull Request link to `keras-users@googlegroups.com` with a subject that starts with `[API DESIGN REVIEW]` (all caps) so that we notice it.\n- Wait for comments, and answer them as they come. Edit the proposal as necessary.\n- The proposal will finally be approved or rejected during a meeting of the Keras SIG chairs. Once approved, you can send out Pull Requests to implement the API changes or ask others to write Pull Requests (targeting `tf.keras` and `keras-team/keras`).\n\nNote that:\n- Anyone is free to send out API proposals.\n- Anyone is free to comment on API proposals or ask questions.\n- Anyone is free to attend design review meetings as an observer.\n- Participation in design review meetings is restricted to Keras SIG chairs.\n- Design review meeting notes will be posted publicly after each meeting.\n\n## Template\n\nUse [this template](https://github.com/keras-team/governance/blob/master/rfcs/yyyymmdd-rfc-template.md) to draft an RFC.\n"
  },
  {
    "path": "rfcs/yyyymmdd-rfc-template.md",
    "content": "# Title of RFC\n\n| Status        | (Proposed / Accepted / Implemented / Obsolete)       |\n:-------------- |:---------------------------------------------------- |\n| **Author(s)** | My Name (me@example.org), AN Other (you@example.org) |\n| **Sponsor**   | A N Expert (expert@example.org)                      |\n| **Updated**   | YYYY-MM-DD                                           |\n| **Obsoletes** | RFC it replaces, else remove this header             |\n\n## Objective\n\nWhat are we doing and why? What problem will this solve? What are the goals and\nnon-goals? This is your executive summary; keep it short, elaborate below.\n\n## Motivation\n\nWhy is this a valuable problem to solve? What background information is needed\nto show how this design addresses the problem?\n\nWhich users are affected by the problem? Why is it a problem? What data supports\nthis? What related work exists?\n\n## User Benefit\n\nHow will users (or other contributors) benefit from this work? What would be the\nheadline in the release notes or blog post?\n\n## Design Proposal\n\nThis is the meat of the document, where you explain your proposal. If you have\nmultiple alternatives, be sure to use sub-sections for better separation of the\nidea, and list pros/cons to each approach. If there are alternatives that you\nhave eliminated, you should also list those here, and explain why you believe\nyour chosen approach is superior.\n\nFactors to consider include:\n\n* UX and usability\n* How will this change impact users, and how will that be managed?\n* Performance implications\n* Dependencies\n* Maintenance\n* Backwards compatibility\n\n## Detailed Design\n\nThis section is optional. Elaborate on details if they’re important to\nunderstanding the design, but would make it hard to read the proposal section\nabove.\n\n## Questions and Discussion Topics\n\nSeed this with open questions you require feedback on from the RFC process."
  }
]