Repository: keras-team/governance
Branch: master
Commit: cd74cbc33f3a
Files: 14
Total size: 175.1 KB

Directory structure:
gitextract_pu1p5x3w/

├── README.md
├── keras_api_design_guidelines.md
├── project_setup_best_practices.md
└── rfcs/
    ├── 20190502-preprocessing-layers.md
    ├── 20190729-keras-preprocessing-redesign.md
    ├── 20191212-keras-categorical-inputs.md
    ├── 20200826-keras-nlp-scoping-design.md
    ├── 20200827-keras-cv-scoping-design.md
    ├── 20200920-keras-nlp-bert.md
    ├── 20200928-keras-cv-single-stage-2d-object-detection.md
    ├── 20210920-tune-end-to-end-ml-workflows-in-keras-tuner.md
    ├── 20220804-keras-cv-two-stage-2d-object-detection.md
    ├── README.md
    └── yyyymmdd-rfc-template.md

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Keras governance structure

![Keras logo](https://s3.amazonaws.com/keras.io/img/keras-logo-2018-large-1200.png)

---

## Design review process

Design-related communications are expected to happen primarily asynchronously via:

- The Pull Requests used for API proposals.
- [The Keras mailing list](https://groups.google.com/forum/#!forum/keras-users).

The process for writing and submitting design proposals is the same as the [TensorFlow RFC process](https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md).

- Start from [this template](https://github.com/keras-team/governance/blob/master/rfcs/yyyymmdd-rfc-template.md).
- Fill in the content. Note that you will need to insert code examples.
    - Provide enough context information for anyone to understand what's going on.
    - Provide a solid argument for why the feature is needed.
    - Include a code example of the **end-to-end workflow** you have in mind.
- Open a Pull Request in the [Keras API proposals folder in this repository](https://github.com/keras-team/governance/tree/master/rfcs).
- Send the Pull Request link to `keras-users@googlegroups.com` with a subject that starts with `[API DESIGN REVIEW]` (all caps) so that we notice it.
- Wait for comments, and answer them as they come. Edit the proposal as necessary.
- The proposal will finally be approved or rejected. Once approved, you can send out Pull Requests to implement the API changes or ask others to write Pull Requests (targeting `keras-team/keras`).

Note that:

- Anyone is free to send out API proposals.
- Anyone is free to comment on API proposals or ask questions.

---

## Leadership

### BDFL

Role: final call in decisions related to the Keras API.

- Francois Chollet (fchollet@google.com)

---

## Our mission

The purpose of our work is to democratize access to machine learning through dependable standards and usable, productive APIs.
We seek to empower as many people as possible, from a wide diversity of backgrounds, to take ownership of ML technology and to use it to build their own solutions to their own problems.

Existing machine learning technology has the potential to solve a huge number of problems in the world today, across every industry, and to help a tremendous number of people. The potential is sky-high. We've barely even started. So how do we fully realize this potential?

We believe that we will only fully realize the potential of machine learning if it becomes a tool in everyone's hands -- not just a technology developed behind closed doors by an "AI industry", that you could only deploy by waiting for a turnkey cloud API to become available commercially, or by contracting an expensive consulting firm. We can't wait for experts to solve every problem -- experts at large tech companies don't even have visibility into a tiny fraction of the problems that can be solved. End users should solve their own problems. And our mission is to empower them to do just that.

Our mission is to make these capabilities available to anyone with basic computer literacy, for free. This is how we maximize the realized potential of these technologies, and how we maximize our positive impact on the world.

---

## Code of conduct

In the interest of fostering an open and welcoming environment,
we as contributors and maintainers pledge to make participation in our project
and our community a harassment-free experience for everyone.
All activity will abide by the [Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md).

---

## Our values and our strengths

We will be able to reach our milestones because we wield superpowers of a kind that is quite uncommon among developers of ML tools:

- We have empathy for our users.
- We value and practice good design.
- We embrace openness and we are dedicated to fostering our developer community.
- We value and practice good communication. 
- We have a unique brand that users love.

**1) We have empathy for our users.** We know that every decision we make should be made with the user in mind, whether a design decision or a strategy decision. "What will be the total impact of our choices on the people who rely on our software?" is the question behind everything we do.

Having empathy for our users means:

- Being users ourselves -- actively using our product in scenarios similar to those our users face.
- Understanding our users by being closely in touch with them and having clear visibility into the developer experience: actively seeking community input in everything we do, listening to feedback, talking with users at external talks and developer events.
- Putting ourselves in our users' shoes: always going above and beyond to be helpful to our users and to improve the user experience of our products.

**2) We value good design.** We are fully aware that a delightful UX is what has made us successful so far and what will keep making us successful in the future. We know that things should be as simple as possible (but no simpler). We prefer elegance and minimalism over technical prowess. We follow formal principles for good design.

**3) We embrace openness and we are dedicated to fostering our developer community.** We know that long-term community building is critical to the success of our project, and we know that developer communities don't get built behind closed doors. We understand the necessity of doing our work in the open and the importance of involving open-source contributors at all stages of the development process.

**4) We value and practice good communication.** We understand that great documentation, great code examples, and transparency of governance are essential to Keras adoption and contribute meaningfully to the Keras UX. We value external communication, documentation, and developer relations as much as we value technical contributions.

**5) We value what makes us different and unique, and we value our brand that users love, Keras.** The Keras brand is an essential tool in reaching our goals: it stands for user-friendliness, accessibility, and good design. We are proud of our banner and we will carry it forward.


================================================
FILE: keras_api_design_guidelines.md
================================================
# Keras API design guidelines

These guidelines are meant to help focus design discussions and help us create delightful developer experiences.

These are meant as guidelines, not rules: each decision should be debated in its own unique context.

Some text remixed from external references:

- [User experience design for APIs](https://blog.keras.io/user-experience-design-for-apis.html)
- [Notes to Myself on Software Engineering](https://medium.com/s/story/notes-to-myself-on-software-engineering-c890f16f4e4d)


---

## Design end-to-end workflows, not individual functions and classes.

When developing APIs, start by designing end-to-end workflows, and only sketch out specific function/class signatures at the end.

- The goal is to arrive at workflows that feel like they are purposefully designed and well-optimized, rather than cobbled together to route around the features provided by the API. The workflows should come first, before atomic features. **Features only exist to support a workflow.** No feature should exist to provide a capability “just in case” or “because we can”.
- **Every design review document should prominently feature a code example of one or two end-to-end workflows showing the canonical use-case for the new API.**
- Every time we discuss choices surrounding a specific API feature, we should start by asking: **in what workflows will this be used?** Then we should make the choice that makes the most sense with respect to these workflows. We should not make API design decisions about features in isolation. 
- This implies that we will often ask the question: **do users really need to configure this parameter?**, and in many cases, the answer will be “no”, rather than being “yes” by default.


---

## Carefully weigh whether a new feature should be included.


It’s okay to say no: just because someone asks for a feature doesn’t mean we should do it. Every feature has a cost that goes beyond the initial CL: maintenance cost, documentation cost, and cognitive cost for our users (a sprawling API surface is a major usability issue).

In particular, in the Keras API, every new feature has to be maintained in perpetuity, and has to be replicated in every implementation of the Keras API (which includes tf.keras, tensorflow.js, and other third-party implementations).

As such, our criteria for adding a new feature to the API are the following:

- **It should be broadly useful to our users**, rather than a niche feature that is only relevant to a specific vertical of researchers. Niche features should be maintained independently by those who need them (e.g. by extending the API via subclassing), as third-party add-on packages.
- **It should be widely recognized as a machine learning best practice.** We will not add new layers/etc that were recently published to ArXiv.org, even in case of claims of increased accuracy/etc. We only add new objects that are already commonly used in the machine learning community. Presumably, a new technique that does result in meaningful gains would be broadly adopted after a few months anyway (like ResNet), and that’s when we would be adding it to the core API. SIG-addons maintains a repository of significantly more volatile and independently maintained code to which the barriers to entry are lower.
- **It should have an owner committed to maintaining it in the long term.** In particular, the code should be maintainable by multiple people on the team, not just by one technical guru.


In addition, when saying yes to a request for supporting a new use case, remember that **literally adding what the user/team requested is often not the optimal choice**. Users are focused on their own specific use case, and we must counter this with a holistic and principled vision of the whole project (see: designing end-to-end workflows, not atomic functions/classes). Often, the right answer is to extend an existing feature. **Find the natural place to integrate the new feature in existing APIs.**


### Examples:

- We should not have added the self-normalizing activation function to the API. It was added before passing the test of time, and that technique was later shown not to reach broad adoption. **Note that citation count is not a good metric of adoption**; that paper has a high citation count.
- We should not move to core an API that has debuted somewhere on GitHub or TF-Addons but has failed to gain more than a few users after a few months.


---

## Seek to minimize cognitive load for our users.

Always seek to minimize the cognitive load imposed on our users in the course of using our APIs.

At a high level:

- **Automate everything that can be automated.**
- **Minimize the actions & choices required from the user.** Make sure default values for arguments are sensible and reflect best practices (so that users usually wouldn’t have to manually configure these). Don’t expose options that are not important or do not match real use cases, “just in case”.
- **Design simple and consistent workflows that reflect simple and consistent mental models.**

Here are a few practical rules:

- **No API should deal with internal implementation details.** An API is a language for our users to talk about the problem they care about -- and they don’t care about our internal hacks. For instance, an option like `use_locking` in an optimizer should be avoided. If an argument requires users to understand the implementation (not just what the code is supposed to implement, like SGD in this case), then the argument should not be included in the public API. **An API is all about the problem it solves, not about how the code works in the background.**
- **Introduce as few new concepts as possible.** It's not just that additional data structures require more effort in order to learn about their methods and properties, it's that they multiply the number of **mental models** that are necessary to grok your API. Ideally, you should only need **a single universal mental model around which everything is organized** (in Keras, that's the `Layer`). Definitely avoid having more than 2 or 3 mental models underlying the workflows you design. Likewise, avoid having concepts that are mostly overlapping but subtly different, since the difference will be difficult to convey clearly and will confuse our users (like, say, `Network` and `Model` -- this is why we don't export `Network` as a public API).
- **Objects that do interchangeable things should have identical or very close APIs.** In particular they should have the same positional arguments. For example, it should be possible to swap one optimizer for another in user code (when leaving all arguments to their default value) without editing the arguments.
- **If you find yourself proposing a signature with more than 6-7 arguments, consider whether all of these arguments are useful.** How many people and use cases would be affected if you removed one argument? How much would they be affected -- would they be able to easily extend the API (e.g. via subclassing) to support their use case without that built-in argument? Could this API be broken up into smaller, modular objects?
- **Best-practices should come baked into your API.** The simplest way to use your API (leaving all arguments to their default value, using the most obvious tool for the task, etc) should be as close as possible to the best way of solving the problem. In particular, all arguments that can be given a default value should be given a default value, and that default should match the most common use case.
- **Plain Python types are preferable to custom types.** Use tuples, strings, ints... A custom type requires more knowledge and effort on the part of the user (e.g. `TensorShape`, which is also breaking established conventions of scientific Python). **When using enums, make sure that their values are strings**, so as to make it possible for users to pass plain strings (example: `data_format="channels_last"`, `padding="valid"`).
- **Explicit, single-level configuration arguments are preferable to nested, hidden configuration arguments.** Avoid something like: `MyLayer(hyperparameter_dict)`, instead use `MyLayer(units, activation=None, ...)`.
- **No API should rely on TF Variable names or Op names.** These change all the time, and should be considered a convenience, not a part of the TensorFlow & Keras API.
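
As a minimal sketch of how several of these rules combine (explicit single-level arguments, defaults that match the common case, and string-valued options), consider the hypothetical `Pooling` class below. It is plain Python, not an actual Keras layer; the class name, its arguments, and the accepted values are assumptions chosen purely for illustration.

```python
class Pooling:
    """Hypothetical layer-like object, used only to illustrate the rules above."""

    # String-valued options: users pass plain strings, not custom enum types.
    VALID_PADDINGS = ("valid", "same")
    VALID_DATA_FORMATS = ("channels_last", "channels_first")

    def __init__(self, pool_size=2, padding="valid", data_format="channels_last"):
        # Validate early, and say what was passed and what is accepted.
        if padding not in self.VALID_PADDINGS:
            raise ValueError(
                f"Invalid `padding` argument: {padding!r}. "
                f"Expected one of {self.VALID_PADDINGS}."
            )
        if data_format not in self.VALID_DATA_FORMATS:
            raise ValueError(
                f"Invalid `data_format` argument: {data_format!r}. "
                f"Expected one of {self.VALID_DATA_FORMATS}."
            )
        self.pool_size = pool_size
        self.padding = padding
        self.data_format = data_format


# The simplest call relies entirely on defaults.
layer = Pooling()
# Options are set through explicit keyword arguments and plain strings.
custom_layer = Pooling(pool_size=3, padding="same")
```

Every argument here has a default, so `Pooling()` alone is a valid call, and no nested configuration object is needed to change a single option.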

In particular, naming is important and difficult:

- **The meaning of an argument should be clear from its name and should not require knowledge that only the implementers have.** In particular, argument names should only involve recognized terms of art (“L1 norm” is a term of art), and should not involve implementation-related vocabulary (e.g. “fused batchnorm”).
- **Avoid `OverlyLongAndSpecificNamingPatterns`.** If you find yourself with argument names that involve more than 3 subparts (e.g. “squared_operator_norm”), reconsider. Argument names should be intuitive and easy to remember.
- Avoid overly generic names (`x`, `variable`, `parameter`).
- **Make sure you are consistent in your naming choices.** Naming consistency means both **internal naming consistency** (don’t call `dim` what is called `axis` in other places, don’t call `ndims` what is called `ndim` elsewhere) and **consistency with established conventions for the problem domain (terms of art)**. Before settling on a name, make sure to look up existing names used by domain experts (or other APIs). In our case, argument names should be consistent with the broader scientific Python conventions, in particular NumPy.

Note that Keras uses the following naming rules:

- We use the convention `num_*` for counters, though omitting an explicit counter is nicer when there is no ambiguity (e.g. `units`, `epochs`, `filters`). 
- The rank of a tensor is its `ndim`. A specific dimension index is an `axis`. The number of dimensions in a linear projection (or similar) is `units`.
- By convention Keras layers are named with nouns rather than verbs (e.g. `Normalization` and not `Normalize`, `Convolution` and not `Convolve`).
- Following Python conventions, classes use capitalized parts (e.g. `ClassName`) and functions and methods use snake case (e.g. `function_name`).
- If an argument name has a numerical suffix (e.g. `alpha_1`), we put an underscore before the suffix in snake case. The capitalized equivalent would be e.g. `Alpha1`.
- We use fully spelled-out names, e.g. `attention_scores` and not `attn_scores`. There are a couple of standardized exceptions to this rule, in particular `dim` for "dimension" and `num` for "number". These are sufficiently common that they are not ambiguous to a first-time reader.
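
The hypothetical signatures below sketch how these naming rules might look in practice. `AttentionPooling`, `compute_attention_scores`, and all of their arguments are invented for illustration and do not correspond to an actual Keras API.

```python
class AttentionPooling:  # A noun in capitalized parts, not a verb.
    """Hypothetical class; only the argument names matter here."""

    def __init__(self, units, num_heads=1, axis=-1, alpha_1=0.9):
        self.units = units          # Unambiguous, so no `num_` prefix.
        self.num_heads = num_heads  # `num_*` convention for a counter.
        self.axis = axis            # A specific dimension index is an `axis`.
        self.alpha_1 = alpha_1      # Underscore before the numerical suffix.


def compute_attention_scores(query, key):  # Snake case, fully spelled out.
    """Returns the dot product of two equal-length sequences of numbers."""
    return sum(q * k for q, k in zip(query, key))
```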


### Example:

```python
MyConstructor(
   per_variable_sparsity_config=[
      'layer_1/kernel:0.8', 'layer_2/kernel:1.5'])
```

What's wrong with this?

- Overly long argument name
- Too much cognitive load involved in preparing an appropriate argument value
- Preparing an argument value requires internal implementation knowledge
- Reliance on TF variable names (subject to changes at any time, thus breaking this code)
- Nested config adding indirection
- Incorrect typing (float values being passed as strings)

Possible alternative:

```python
obj = MyConstructor()
obj.configure_sparsity(some_layer.kernel, value=0.8)
obj.configure_sparsity(some_other_layer.kernel, value=1.5)
```

What's nice about this?

- Object-based variable references.
- Modular, simple action, with a clear name.
- Plain Python types.


---

## Balance expressivity vs. user-friendliness.

### Simple use cases should be simple, advanced use cases should be possible:

**Don’t increase the cognitive load of common use cases for the sake of niche use cases**, even minimally.
**Make sure that advanced users have a path to support their use case**, even if this path requires the users to roll out plugins or other API extensions (in particular via subclassing). **It is ok for advanced use cases not to be directly supported in the built-in API options.**


### Keep our APIs modular.

**Complex objects should be achievable by composing simple objects with few arguments, that do one thing reliably.** There is a balance to strike between having complex signatures on fewer objects, and having more objects with simpler signatures. A good API has a reasonable number of objects, with reasonably simple signatures (see also: avoiding signatures with more than 6-7 arguments).

**Things that create state or side-effects should be classes. Functions should be stateless.**
For instance, layers that create weights should not be cast as functions, since it makes the weights (and other elements of state) hard to access, impossible to update, and forces reliance on a global state capturing the side effects of layer-functions.
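
A small sketch of this distinction, written in plain Python rather than actual Keras code: `relu` is stateless, so a function is enough, while the hypothetical `Scale` object owns a weight and is therefore a class whose state stays accessible and updatable. Both names are assumptions made for this example.

```python
def relu(values):
    """Stateless: the output depends only on the input."""
    return [max(0.0, v) for v in values]


class Scale:
    """Stateful: owns a weight, so it is a class rather than a function."""

    def __init__(self, factor=1.0):
        self.factor = factor  # State lives on the object, not in a global.

    def __call__(self, values):
        return [self.factor * v for v in values]


scale = Scale(factor=2.0)
outputs = relu(scale([-1.0, 0.5]))
scale.factor = 3.0  # The weight can be inspected and updated directly.
```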


### APIs should be strictly compartmentalized.

For instance, the optimizer API or the layers API should not contain arguments for configuring distributed training. That should go into the distribution API.


---

## Don’t neglect error messages, docstrings, and documentation.

Documentation and error messages are an integral part of the API. Good docs and helpful error messages are key to a delightful user experience.

- **Catch user errors early and anticipate common mistakes.** Do user input validation as soon as possible. Actively keep track of common mistakes that people make (by screening GitHub and StackOverflow), and either solve them by simplifying our API, adding targeted error messages for these mistakes, or having a "solutions to common issues" page in our docs. Consider adding automated fallback behaviors (e.g. casting a wrongly-typed input) instead of raising errors, when applicable. Be nice to our users.
- **Provide detailed feedback messages upon user error.** Error messages should be contextual, informative, and actionable. Every error message that transparently provides the user with the solution to their problem means one less support ticket, multiplied by how many times users run into the same issue. A good error message should answer:
    - What happened, in what context?
    - What did the software expect?
    - How can the user fix it?
- **A docstring should answer the question: what is this about, and why & how should I use it?** It should assume as little context as possible, and it shouldn’t mention specialized terms without first introducing them (for example, “num_blocks: Number of blocks in the kernel” is not a good argument description if this is the first time you mention “blocks” in your docstring).
- **Show, don’t tell: your documentation should not talk about how the software works, it should show how to use it.** Show code examples for end-to-end workflows; show code examples for each and every common use case and key feature of your API. **All docstrings should include code examples.**
- **Deliberately design the user onboarding process for your feature.** How are complete newcomers going to find out the best way to solve their use case with your tool? Have an answer ready. Make sure your onboarding material closely maps to what your users care about: don't teach newcomers how your framework is implemented, teach them how they can use it to solve their own problems. After shipping a CL and writing good docstrings, make sure to create a Colab guide / tutorial showcasing the target workflow, and post it on the docs website or the TF blog.
- The feature is not ready until:
    - 1) Users know about it
    - 2) They know how to use it
    - 3) They're actually using it to solve the corresponding problem.


Note that Keras uses the following rules for writing docstrings:

- For class docstrings, document arguments in an `Arguments:` section in the class docstring, not in `__init__`.
    - When a user creates a class, they are not calling the `MyLayer.__init__()` method as if it were a regular method, they are calling `MyLayer`. We don't want to generate documentation for the `__init__()` method as a standalone method that needs to be called directly, that would be confusing. We also don't need `__init__()` docstrings that always start with "Initializes a MyLayer class.", which is useless information. Leaving `__init__()` without a docstring is the best practice.
    - If constructor arguments are documented in `__init__`, it forces us to programmatically copy the `__init__` docstring when generating docs and concatenate it to the class docstring. This means that the Arguments section becomes the last thing in the docstring, which is bad.
- The order of information in a class docstring should be:
    - One-line description of the class, that gives initial context to the user. e.g. `Applies Dropout to the input.` Make sure the one-line description is useful. No `Instantiates an ObscureName class instance.`
    - Paragraph(s) of more detailed information that tells the user what the object is for and when they need to use it. e.g. `The Dropout layer randomly sets input units to 0 with a frequency of "rate" at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by "1/(1 - rate)" such that the sum over all inputs is unchanged. [...]`
    - If there is a reference paper, cite it here.
    - `Arguments` section.
    - If it's a layer that has arguments in `call`, the `Call arguments` section.
    - If it's a `Layer`, `Input shape` and `Output shape` sections.
    - Example(s).
    - Lastly, addendum. Information that isn't very important and that most users don't need, but that should be documented somewhere.
        - e.g. the section "About the layer's `dtype` attribute" in the base Layer class.
        - e.g. warnings about edge cases or compatibility issues.
        - e.g. pointers to further guides and tutorials.
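
The ordering above can be sketched on a hypothetical `Clip` class. It is a plain Python stand-in rather than a real Keras `Layer`, so the `Input shape` and `Output shape` sections are omitted; the class name and its arguments are assumptions made for this example.

```python
class Clip:
    """Clips each input value to a fixed range.

    The Clip object replaces every value below `min_value` with `min_value`
    and every value above `max_value` with `max_value`. Use it when
    downstream computations require bounded inputs.

    Arguments:
        min_value: Float, lower bound of the range. Defaults to 0.0.
        max_value: Float, upper bound of the range. Defaults to 1.0.

    Call arguments:
        inputs: List of floats to clip.

    Example:

    >>> clip = Clip(min_value=0.0, max_value=1.0)
    >>> clip([-0.5, 0.3, 2.0])
    [0.0, 0.3, 1.0]

    Note: a `ValueError` is raised if `min_value` is greater than `max_value`.
    """

    def __init__(self, min_value=0.0, max_value=1.0):
        # Deliberately no docstring: arguments are documented on the class.
        if min_value > max_value:
            raise ValueError(
                f"`min_value` must not exceed `max_value`. "
                f"Received: min_value={min_value}, max_value={max_value}."
            )
        self.min_value = min_value
        self.max_value = max_value

    def __call__(self, inputs):
        return [min(max(v, self.min_value), self.max_value) for v in inputs]
```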


### Error messages: a case study


The following would be a very poor error message:

```
AssertionError: '1 != 3'
```

In general, to validate user input, always use `ValueError` and avoid `assert`.

Also bad:

```
ValueError: 'Invalid target shape (600, 1).'
```

The following is better, but still not sufficient, because it does not tell the user what they passed, and does not quite say how to fix it:

```
ValueError: 'categorical_crossentropy requires target.shape[1] == classes'
```

Now, here's a good example, that says **what was passed**, **what was expected**, and **how to fix the issue**:

```
ValueError: '''You are passing a target array of shape (600, 1) while using as loss `categorical_crossentropy`.
`categorical_crossentropy` expects targets to be binary matrices (1s and 0s) of shape (samples, classes).
If your targets are integer classes, you can convert them to the expected format via:

---
from keras.utils import to_categorical
y_binary = to_categorical(y_int)
---

Alternatively, you can use the loss function `sparse_categorical_crossentropy` instead, which does expect integer targets.'''
```
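
A validation helper that produces a message of this kind could look like the sketch below. The function name `check_categorical_targets` and its signature are hypothetical, and shapes are represented as plain tuples; this is not how Keras itself performs the check.

```python
def check_categorical_targets(target_shape, num_classes):
    """Raises a descriptive error unless targets have shape (samples, classes)."""
    if len(target_shape) != 2 or target_shape[1] != num_classes:
        raise ValueError(
            # What was passed, and in what context.
            f"You are passing a target array of shape {target_shape} while "
            "using as loss `categorical_crossentropy`. "
            # What was expected.
            "`categorical_crossentropy` expects targets to be binary matrices "
            f"(1s and 0s) of shape (samples, classes), here (samples, {num_classes}). "
            # How to fix it.
            "If your targets are integer classes, you can convert them via "
            "`keras.utils.to_categorical`, or use the loss "
            "`sparse_categorical_crossentropy` instead, which does expect "
            "integer targets."
        )


check_categorical_targets((600, 10), num_classes=10)  # Valid shape: no error.
```

Note that the check uses `ValueError` rather than `assert`, in line with the rule stated above.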










================================================
FILE: project_setup_best_practices.md
================================================
# Best Practices for Managing Keras Projects on GitHub

This document describes the best practices for managing the projects under
"keras-team" on GitHub which use GitHub as the source of truth, including
[keras-tuner](https://github.com/keras-team/keras-tuner),
[autokeras](https://github.com/keras-team/autokeras),
[keras-cv](https://github.com/keras-team/keras-cv),
[keras-nlp](https://github.com/keras-team/keras-nlp),
and maybe more in the future. It covers linting, formatting, testing, continuous
integration, issue and pull request tagging, and so on.

The goal of this document is to:
* Improve the overall quality of the projects. Having all projects follow the
  same standard for the dev process, which may evolve over time, helps ensure
  quality in all aspects.
* Unify the external contributing experience. The external open-source
  contributors may contribute to multiple Keras projects by submitting issues
  or pull requests. They don't need to learn from different contributing
  guides.
* Save time for the project leads. They save time by copying and pasting the
  same setup and by avoiding the listed caveats.

## Testing

### Testing framework

We use [pytest](https://docs.pytest.org/en/6.2.x/) for writing tests for the
projects, which is the most widely used testing framework for Python in the OSS
world. The configuration of pytest is
[here](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg#L4-L16).

### File locations for the tests

Unit tests should be contained in sibling files, relative to the class or
utility files they are testing. The name of a test file should follow the
pattern of `*_test.py`. For example, the tests for
`/keras_tuner/engine/hyperparameters.py` are in
`/keras_tuner/engine/hyperparameters_test.py`.

Integration tests may be contained in their own `/keras_tuner/integration_tests`
directory, as they may require extra files such as data.

While our unit test placement is not suggested in the
[good practices of pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html)
doc, we recommend this approach to improve the discoverability of the unit
tests for new contributors. This discoverability doubles up as a method of
documentation; when users want to see what `util.utility_function()` does, they
can simply open the conveniently located sibling file, `util_test.py`.
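
As a rough sketch of this layout, the block below shows a hypothetical `util.py` and its sibling `util_test.py` side by side. The behavior of `utility_function` is invented for the example, and the two parts are shown in one block only so that the snippet runs on its own.

```python
# --- util.py (hypothetical module) ---
def utility_function(values):
    """Returns the unique items of `values`, preserving first-seen order."""
    return list(dict.fromkeys(values))


# --- util_test.py (sibling file; in a real project it would start with
# `from util import utility_function`) ---
def test_utility_function_removes_duplicates():
    assert utility_function([3, 1, 3, 2, 1]) == [3, 1, 2]


def test_utility_function_handles_empty_input():
    assert utility_function([]) == []
```

Reading the two test functions is enough to learn what `utility_function` does, which is the documentation benefit described above.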

### Test Coverage

We use [CodeCov](https://about.codecov.io/) to track the test coverage. You may
also refer to
[these settings](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg#L24-L28)
in `setup.cfg`. We will see more about it in the continuous integration section.

Pytest CodeCov supports a wildcard exclude field, which should be set to
include `*_test.py`, so that tests are not included in the code coverage
count.

### Useful code snippets
Fix the random seed for all tests:
[Link1](https://github.com/keras-team/keras-tuner/blob/1.1.0/tests/conftest.py#L8-L17),
[Link2](https://github.com/keras-team/keras-tuner/blob/master/tests/unit_tests/randomness_test.py),
[Link3](https://www.tensorflow.org/api_docs/python/tf/keras/utils/set_random_seed).

Create a temporary path for testing: [Link](https://docs.pytest.org/en/6.2.x/tmpdir.html).

## Code styles

### Importing Keras modules

For projects based on Keras and TensorFlow, top-level imports are encouraged, as
shown in the following example.

```py
import tensorflow as tf
from tensorflow import keras
```

Exceptions may be acceptable when a module appears many times in the code,
like `keras.layers`.

### Linting and formatting

We use
[black](https://black.readthedocs.io/en/stable/),
[isort](https://pycqa.github.io/isort/), and
[flake8](https://flake8.pycqa.org/en/latest/)
to lint and format the code. black formats the code in general. isort sorts
the imports. flake8 performs some additional checks that black doesn't do,
such as flagging long lines that consist of a single string. You can see the relevant sections of
[setup.cfg](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg) for
the detailed configuration of these tools.

The user does not need to know how to use these tools to lint or format the
code. We provide them with two shell scripts:
[`/shell/lint.sh`](https://github.com/keras-team/keras-tuner/blob/master/shell/lint.sh)
and
[`/shell/format.sh`](https://github.com/keras-team/keras-tuner/blob/master/shell/format.sh).
In these scripts, we also check and add the Apache 2.0 license header to every
file.

## Releasing

### Release setups

The version number of the package is stored only in `/package_name/__init__.py`
with a single line of `__version__ = 'master'` on the master branch.
[example](https://github.com/keras-team/keras-tuner/blob/1e13aabe5b6659340a8ee81328805479a57b2105/keras_tuner/__init__.py#L35)

We also need the `setup.py` file for the PyPI release.
[example](https://github.com/keras-team/keras-tuner/blob/1e13aabe5b6659340a8ee81328805479a57b2105/setup.py)

For the `setup.py` file to grab the current version number from
`/package_name/__init__.py`, we need additional lines in `setup.cfg`.
[example](https://github.com/keras-team/keras-tuner/blob/1.1.0/setup.cfg#L1-L2)

### Draft a new release

For releasing a new version of the package, please follow these steps:
* Create a new branch from the master branch.
* Modify the `__version__` value in the new branch.
* Create a new release on GitHub.
  [Official tutorial](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository)

Note that the continuous integration will upload it to PyPI automatically.

### Excluding sibling tests

Unit tests are hosted in sibling files relative to the files containing the
code they are testing. `setuptools.find_packages()` supports an
[exclude field](https://github.com/pypa/setuptools/blob/f838bc6a170046c9fdfc2251e5466040a669ca12/setuptools/__init__.py#L52).
This field should contain `*_test.py` to ensure that tests are not packaged
with the release.

## Continuous integration

We use [GitHub Actions](https://github.com/features/actions) for continuous
integration. It automates running tests, checking the code style, uploading
test coverage to CodeCov, and uploading new releases to PyPI.

You can refer to
[this file](https://github.com/keras-team/keras-tuner/blob/master/.github/workflows/actions.yml)
for how to set it up. We use a single YAML file for all the GitHub Actions to
avoid installing the dependencies multiple times.

To use this setup, you also need to upload your CodeCov and PyPI credentials to
the project. Here is the
[official tutorial](https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository).

Make sure you follow the naming of the following secrets for the GitHub Actions YAML file to work.
Name the CodeCov token as `CODECOV_TOKEN`.
Name the PyPI username and password as `PYPI_USERNAME` and `PYPI_PASSWORD`.

We should also test against tf-nightly every day to discover bugs and
incompatibilities early, well before the stable release of TensorFlow.
The CI setup for it is
[here](https://github.com/keras-team/keras-tuner/blob/master/.github/workflows/nightly.yml).

## Contributing experience

We will have a common CONTRIBUTING.md in `keras-team/governance` to be
distributed to the other repos. This
[GitHub Action](https://github.com/marketplace/actions/file-sync) may be a good
way to sync a centralized contributing guide to different repos.
We should also have
[this directory](https://github.com/keras-team/keras-tuner/tree/master/.devcontainer)
to support GitHub Codespaces, which is a trend on GitHub. It provides a
web-based IDE to save the contributors from setting up their own dev
environment, which would attract more contributors.

## Issues and pull requests

We will have the same issue and pull request
[templates](https://github.com/keras-team/keras/tree/master/.github/ISSUE_TEMPLATE)
across projects in `keras-team`. They will also be stored in
`keras-team/governance` and be distributed to the other repos.

We also need to confirm whether there is a way to unify issue tagging across the repos.


================================================
FILE: rfcs/20190502-preprocessing-layers.md
================================================
# Keras Preprocessing Layers

| Status        | Accepted      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Mark Omernick (momernick@google.com), Stan Bileschi (bileschi@google.com), Kester Tong (kestert@google.com), Francois Chollet (fchollet@google.com) |
| **Updated**   | 2019-05-21                                           |


## Objective

We aim at providing additional Keras layers to handle [data preprocessing operations](https://en.wikipedia.org/wiki/Data_pre-processing)
such as text vectorization, data normalization, and data discretization (binning).
These operations are currently handled separately from a Keras model via utilities such
as those from `keras.preprocessing`.

These new layers will allow users to include data preprocessing directly in their Keras model, so as to create models that map raw data (such as uint8 tensors for images, or string tensors for text) to predictions.


## Key benefits

Including preprocessing layers in the Keras model means that the same preprocessing steps will be performed when that model is exported and used in serving.
It also means the steps will be part of the model when the model is saved and loaded as part of another model.

This presents the following advantages:

- Model portability (encapsulation for sharing models). With PreprocessingLayers, your Keras Model contains all the preprocessing it requires. If another user wishes to use your model in a different workflow, there is no risk of incorrect preprocessing. Models will be more end-to-end.
- Serving reliability. The Model object will contain everything you expect to be done at serving time.
- Simpler optimization using tf.data and tf.Transform. By providing simple, well defined building blocks for preprocessing, we simplify the process of using tf.data and tf.Transform to optimize preprocessing steps. Users can offload computation of vocabularies, quantiles and mean and variance, to tf.Transform.  They can also use tf.data to move data preprocessing in training off the critical path. The preprocessing layer API is designed to make both of these easy and simple.

In particular, we expect preprocessing layers to make it easier to serve models in TF.js or in mobile applications. It will also reduce the risk that benchmarks of Keras applications use incorrect preprocessing and subsequently publish invalid findings.


## Design overview

### End-to-end workflow overview

Case where a user has a single preprocessing layer to do image normalization.

```python
normalization = keras.layers.Normalization(axis=-1)
normalization.adapt(data_sample)

model = keras.Sequential([
    normalization,
    keras.applications.ResNet50(weights=None),
])
model.fit(data, targets, epochs=10)
```

Case where a user has a single preprocessing layer to do text vectorization where each input sample is encoded as a sequence of word indices.

```python
vectorization = keras.layers.TextVectorization(mode='int')
vectorization.adapt(data_sample)

model = keras.Sequential([
    vectorization,
    keras.layers.Embedding(128),  # The number of int indices is not specified since it is inferred.
    keras.layers.LSTM(32),
    keras.layers.Dense(10, activation='softmax'),
])
model.fit(data, targets, epochs=10)
```

Case where a user has a single preprocessing layer to do text vectorization where each input sample is encoded as a dense vector of TF-IDF scores.

```python
vectorization = keras.layers.TextVectorization(mode='tfidf')
vectorization.adapt(data_sample)

model = keras.Sequential([
    vectorization,
    keras.layers.Dense(10, activation='softmax'),
])
model.fit(data, targets, epochs=10)
```

Case where a user chains a normalization step with a discretization step.

```python
normalization = keras.layers.Normalization()
discretization = keras.layers.Discretization()
preprocessing_stage = keras.layers.PreprocessingStage([normalization,
                                                       discretization])
preprocessing_stage.adapt(data_sample)

model = keras.Sequential([
    preprocessing_stage,
    keras.layers.Dense(10, activation='softmax'),
])
model.fit(data, targets, epochs=10)
```


### Base class: `PreprocessingLayer`

All preprocessing layers inherit from a base class: `PreprocessingLayer`, which itself inherits from `Layer`.

This class presents a few key differences compared to regular layers:

**Separate training mechanism**

The internal state of a `PreprocessingLayer` is not affected by backpropagation: all of its weights are non-trainable. A `PreprocessingLayer` has to be trained in a separate step, as follows:

```python
preprocessing_layer.adapt(data_sample)
```

**Possible non-differentiability**

Preprocessing layers extend Keras by allowing preprocessing to be part of the model. Unlike existing layers, these computations are not always differentiable, e.g. both `Discretization` and `TextVectorization` are non-differentiable.

As a result, all preprocessing layers are treated as frozen when used as part of a model. In addition, if a non-differentiable layer is used in the middle of a model (rather than at the start), the model will raise an exception related to differentiability when trying to compute gradients (e.g. as part of `fit`).


### New layers

- `PreprocessingLayer` base class: implements shared logic, in particular the `adapt` method for setting the state of the layer.
- `PreprocessingStage` class: makes it possible to chain multiple preprocessing layers together while training them in one single `adapt` call (by doing cascading training of the underlying layers).
- `Normalization`: normalizes data feature-wise by subtracting the mean of some sample dataset and dividing by the standard deviation (the square root of the stored variance).
- `Discretization`: transforms continuous data into one-hot encoded binary vectors representing the different "bins" that the continuous data belongs to.
- `TextVectorization`: transforms string data into either dense vectors (e.g. TF-IDF transform) or sequences of token indices (e.g. to be passed to an `Embedding` layer).


## Design details

### Detailed layer signatures

#### PreprocessingLayer

```python
def adapt(self, data, reset_state=True):
    """Fits the state of the preprocessing layer to the data being passed.

    Arguments:
        data: The data to train on. It can be passed either as a tf.data Dataset,
            or as a numpy array (or a dict or list of arrays in case of multi-input
            preprocessing stages).
        reset_state: Optional argument specifying whether to clear the state of the
            layer at the start of the call to `adapt`, or whether to start from
            the existing state. This argument may not be relevant to all
            preprocessing layers: a subclass of PreprocessingLayer may choose to
            only implement `adapt(self, data)`.
    """
```

#### PreprocessingStage

There are two ways to instantiate a `PreprocessingStage` layer: either `Sequential` style (pass a list of preprocessing layer instances) or Functional style (pass the inputs and outputs of a DAG of preprocessing layers).

If any layer other than `PreprocessingLayer` instances is included in a `PreprocessingStage`, these layers will be treated as frozen both during `adapt` and later during `fit`.
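
The cascading training performed by a single `adapt` call on a stage can be sketched in plain NumPy. This is only an illustration of the idea, not the proposed implementation: `Center`, `Scale`, and `ToyStage` are hypothetical stand-ins, and the sketch assumes each layer is adapted on the output of the layers before it.

```python
import numpy as np


class Center:
    """Toy stand-in for a preprocessing layer: subtracts the learned mean."""

    def adapt(self, data):
        self.mean = np.mean(data, axis=0)

    def __call__(self, data):
        return data - self.mean


class Scale:
    """Toy stand-in: divides by the learned maximum absolute value."""

    def adapt(self, data):
        self.max_abs = np.max(np.abs(data), axis=0)

    def __call__(self, data):
        return data / self.max_abs


class ToyStage:
    """Chains layers; `adapt` trains each one on the output of the previous ones."""

    def __init__(self, layers):
        self.layers = layers

    def adapt(self, data):
        for layer in self.layers:
            layer.adapt(data)   # fit this layer on the data it will actually see
            data = layer(data)  # then pass its output on to the next layer

    def __call__(self, data):
        for layer in self.layers:
            data = layer(data)
        return data


sample = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
stage = ToyStage([Center(), Scale()])
stage.adapt(sample)
result = stage(sample)  # each column becomes [-1, 0, 1]
```

Adapting `Scale` on the centered data rather than on the raw data is what makes the training "cascading": the second layer learns statistics of its real input distribution.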


#### Normalization

```python
def __init__(self, axis=-1, **kwargs):
    """Feature-wise normalization of the data.

    Arguments:
        axis: Integer or tuple of integers, the axis or axes
            that should be normalized (typically the features axis).

    Input shape and type:
        dtype: floating point.
        shape: any shape with rank >= 2 is accepted.

    Output shape and type:
        dtype: same as input.
        shape: same as input.

    What happens in `adapt`:
        Compute mean and variance of the data
        and store them as the layer's weights.
    """
```
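
To make the `adapt` step concrete, here is a minimal NumPy sketch of the statistics it would compute and how they might be applied. The function names are hypothetical, only an integer `axis` is handled, and the sketch assumes the usual standardization (dividing by the square root of the stored variance).

```python
import numpy as np


def adapt_normalization(data, axis=-1):
    """Return per-feature mean and variance, reducing over every axis except `axis`."""
    keep = axis % data.ndim
    reduce_axes = tuple(i for i in range(data.ndim) if i != keep)
    mean = data.mean(axis=reduce_axes, keepdims=True)
    variance = data.var(axis=reduce_axes, keepdims=True)
    return mean, variance


def apply_normalization(inputs, mean, variance):
    """Standardize inputs with the stored statistics (assumes non-zero variance)."""
    return (inputs - mean) / np.sqrt(variance)


data_sample = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mean, variance = adapt_normalization(data_sample)
normalized = apply_normalization(data_sample, mean, variance)
```

After this step each feature column of `normalized` has zero mean and unit standard deviation, regardless of the original scale of the feature.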

#### Discretization

```python
def __init__(self, bins=None, strategy='quantiles', sparse=False, **kwargs):
    """Maps continuous data into one-hot binary vectors of bin indicators.

    Each non-overlapping bin covers
    a contiguous portion of the dimension considered.
    Bin boundaries can be provided by the user or learned as quantiles.

    Arguments:
        bins: int | List<float>
            If bins is an int, then bin boundaries are to be learned,
            and the width of the output will be exactly bins.
            For instance, setting bins to 4 implies that
            inputs are to be sorted into quantiles,
            and three boundaries are to be learned,
            corresponding to the 25th, 50th, and 75th percentile value.
            If, instead, bins is a list of floats, then those are
            the bin boundary values and nothing is to be learned.
            The width of the output will in that case be the len(bins) + 1.
        strategy: callable | 'quantiles'
            If strategy is the string 'quantiles' (default),
            then bin boundaries will be learned such that each bin
            receives an approximately equal number of sample input values.
            `strategy` may also be a callable that takes
            (float value, list[float] boundaries) and returns
            an int bucket_index which represents
            which bucket to map `value` to.
        sparse: If True, the layer will output a SparseTensor.
            Otherwise it will be dense.
            This does not change the shape or structure of the output.
            Specifically tf.sparse.to_dense(output) will be the same for both.

    Input shape and type:
        dtype: floating point.
        shape: [batch_size, ..., features]

    Output shape and type:
        dtype: int
        shape: [batch_size, ..., features, num_bins]
            i.e., the same as the input shape,
            with an additional dimension corresponding to
            the number of bins, which is equal to either
            the bins constructor argument (if it is an integer),
            or the length of the bins constructor argument plus 1,
            if it is a list.

    What happens in `adapt`:
        We use a streaming quantile estimator to update the bin boundaries
        so that statistically an element is about equally likely
        to fall into any bin.
        Multiple calls to update continue to mutate
        the layer based on all data seen so far.
    """
```
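
The relationship between `bins`, the learned boundaries, and the output width can be illustrated with a short NumPy sketch. It uses exact quantiles from `np.quantile` in place of the streaming estimator mentioned above, handles 1D data only, and the function names are hypothetical.

```python
import numpy as np


def learn_boundaries(data, bins):
    """Return the `bins - 1` interior quantile boundaries of 1D `data`."""
    quantiles = np.linspace(0.0, 1.0, bins + 1)[1:-1]
    return np.quantile(data, quantiles)


def discretize(values, boundaries):
    """One-hot encode each value's bin index; the width is len(boundaries) + 1."""
    indices = np.digitize(values, boundaries)
    return np.eye(len(boundaries) + 1, dtype=int)[indices]


sample = np.arange(1.0, 101.0)  # the values 1, 2, ..., 100
boundaries = learn_boundaries(sample, bins=4)  # 25th, 50th, 75th percentiles
one_hot = discretize(np.array([5.0, 30.0, 60.0, 99.0]), boundaries)
```

With `bins=4`, three boundaries are learned and every input value maps to a one-hot vector of width 4; the four test values above fall into bins 0, 1, 2, and 3 respectively.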

#### TextVectorization

This layer has basic options for managing text in the Keras model.
It is expected that more advanced users needing custom control will use Keras-compatible layers provided by tf.text.

Transform a batch of strings (one sample = one string) into either a list of token indices
(one sample = 1D int tensor), or a dense representation (1 sample = 1D float vector).

The processing of each sample unfolds as:
- Standardize each sample (usually lowercasing + punctuation stripping)
- Split each sample into substrings (usually words)
- Recombine substrings into tokens (usually ngrams)
- Index tokens (associate a unique int value with each token)
- Transform each sample using this index, either into a vector of ints or a dense float vector.


```python
def __init__(self,
             tokens=None,
             standardize='lower_and_strip_punctuation',
             split='whitespace',
             ngrams=1,
             mode='int',
             max_length=None):
    """Transforms text into dense vectors or sequences of word indices.

    Arguments:
        tokens: None (default) | int | list<string>
            If tokens is an int, then this layer will learn
            an internal vocabulary of size (tokens - 2),
            such that each of the most frequent (tokens - 2) words
            is assigned to one of the values in [0, tokens).
            The output will have a total of `tokens` possible values,
            once the out-of-vocabulary value (1)
            and the reserved masking value (0) are taken into account.
            If tokens is None, the number of tokens is automatically inferred
            from the training data (the output will have a number
            of possible values equal to the total number of unique tokens
            seen in the data, plus 2).
            If, instead, tokens is a list of strings, then it constitutes
            exactly a map from string to integer,
            and there is nothing to be learned.
            The vocabulary output width will be len(tokens) + 2,
            accounting for the out-of-vocabulary value (1)
            and the reserved masking value (0).
        standardize: 'lower_and_strip_punctuation' (default) | None | callable string -> string
            if standardize is the string "lower_and_strip_punctuation",
            each sample is converted to lowercase
            and the following characters are stripped from each sample
            before splitting: '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
            if it is a callable, that callable is used
            to preprocess each input string before splitting.
        split: 'whitespace' (default) | None | Callable string -> list<string>
            if split is 'whitespace', then the string
            will be split on whitespace characters.
            if split is None, then each string is treated as a single token.
            if, instead, split is a function from strings to lists of strings,
            then that function will be applied to each string in the input.
        ngrams: 1 (default) | 2 | 3
            Controls the ngram functionality of this layer.
            This layer performs ngrams by concatenating strings
            with no separator and no begin or end tokens;
            the ngramming algorithm is not configurable.
            if ngrams is an int N = 2 or 3,
            the substrings returned by the split function
            are combined into N-grams before being indexed.
        mode: 'int' (default) | 'count' | 'binary' | 'tfidf'
            controls how the integerized words are
            reduced and packed into an output vector.
            if mode is 'count', then the output vector will be
            of length tokens, and the element at position i will
            summarize how many times the string mapping to
            integer i occurred in the split input.
            If, instead, mode is 'binary',
            then the output vector will be the same as for 'count'
            but will contain a 1 if the count is greater than 0.
            if, instead, mode is 'tfidf',
            then the output vector will be the same as for 'count',
            but instead of counts of tokens, will contain
            the weighted count where weights are determined
            by the 'tfidf' algorithm.
            if, instead, mode is 'int', then the output vector is
            an int tensor where each int is the index of one token
            in the input string.
        max_length:  None (default) | int.
            Only used if mode=int. If set to an int,
            the output int tensors are of shape [..., max_length],
            with longer sequences being truncated at the end and
            shorter sequences being right-padded.
            If set to None, output sequences are
            of shape [..., max_length_in_batch],
            where max_length_in_batch is the length
            of the longest sequence in the current batch:
            shorter sequences get right-padded.

    Input shape and type:
        dtype: string.
        shape: (batch_size, ..., 1)

    Output shape and type:
        if `mode='int'`:
            dtype: int
            shape: (batch_size, ..., max_length), where max_length
                is the length of the longest token sequence in the current batch, or
                the value of the argument `max_length` if it was passed.
        else:
            dtype: floating point.
            shape: (batch_size, ..., num_tokens)

    What happens in `adapt`:
        We build an index mapping tokens to token indices,
        and in the case of `mode='count'` and `mode='tfidf'`,
        we keep track of how many times each token has appeared.
    """
```
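
The processing steps listed above can be sketched in pure Python for the `'int'` and `'count'` modes. This is a simplified illustration under stated assumptions: all function names are hypothetical, `string.punctuation` approximates the stripped character set, the n-gram step is omitted (equivalent to `ngrams=1`), and the vocabulary is unbounded (`tokens=None`).

```python
import string
from collections import Counter


def tokenize(text):
    """Standardize (lowercase, strip punctuation), then split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def build_index(samples):
    """Map tokens to ints, most frequent first; 0 = mask, 1 = out-of-vocabulary."""
    counts = Counter(tok for sample in samples for tok in tokenize(sample))
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common())}


def vectorize_int(samples, index, max_length=None):
    """mode='int': sequences of token indices, truncated or right-padded with 0."""
    seqs = [[index.get(tok, 1) for tok in tokenize(s)] for s in samples]
    length = max_length or max(len(seq) for seq in seqs)
    return [seq[:length] + [0] * (length - len(seq)) for seq in seqs]


def vectorize_count(samples, index):
    """mode='count': one vector per sample, counting occurrences of each index."""
    width = len(index) + 2  # vocabulary plus the mask and OOV slots
    rows = []
    for sample in samples:
        row = [0] * width
        for tok in tokenize(sample):
            row[index.get(tok, 1)] += 1
        rows.append(row)
    return rows


corpus = ["The cat sat.", "The dog sat on the mat!"]
index = build_index(corpus)  # what `adapt` would learn
ints = vectorize_int(["the cat barked", "the mat"], index)
counts = vectorize_count(["the the cat"], index)
```

Here `barked` was never seen during `build_index`, so it maps to the out-of-vocabulary value 1, while the shorter second sample is right-padded with the masking value 0.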

### Writing a subclass of `PreprocessingLayer`

The following 4 methods should be overridden:

- `__init__`: constructor of the layer, used to configure its behavior.
- `build(self, inputs_shape)`: creates the state variables of the layer.
- `call(self, inputs)`: transforms the inputs (should only be called after `adapt` has been called).
- `adapt(self, data, [reset_state=True])`: sets the state of the layer given the data provided (either as a tf.data dataset or numpy array(s)). The `reset_state` argument is optional and may be ignored.
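
As a rough sketch of how these methods fit together, the example below implements a mean-centering layer on top of a hypothetical stand-in base class. `ToyPreprocessingLayer` and `MeanCentering` are illustrative names only and do not use the real Keras `Layer` machinery; the point is the division of labor between `build`, `adapt`, and `call`, and the effect of `reset_state`.

```python
import numpy as np


class ToyPreprocessingLayer:
    """Hypothetical stand-in for the proposed base class (not the Keras one)."""

    built = False

    def __call__(self, inputs):
        inputs = np.asarray(inputs, dtype=float)
        if not self.built:
            self.build(inputs.shape)
            self.built = True
        return self.call(inputs)


class MeanCentering(ToyPreprocessingLayer):
    """Subtracts the per-feature mean (last axis) learned in `adapt`."""

    def __init__(self, name="mean_centering"):
        self.name = name

    def build(self, inputs_shape):
        # State: running sum and sample count for each feature.
        self.total = np.zeros(inputs_shape[-1])
        self.count = 0

    def adapt(self, data, reset_state=True):
        data = np.asarray(data, dtype=float)
        if reset_state or not self.built:
            self.build(data.shape)
            self.built = True
        flat = data.reshape(-1, data.shape[-1])
        self.total += flat.sum(axis=0)
        self.count += flat.shape[0]

    def call(self, inputs):
        if self.count == 0:
            raise RuntimeError("Call `adapt` before using the layer.")
        return inputs - self.total / self.count


layer = MeanCentering()
layer.adapt(np.array([[1.0, 2.0], [3.0, 4.0]]))         # mean is [2, 3]
layer.adapt(np.array([[5.0, 6.0]]), reset_state=False)  # mean becomes [3, 4]
centered = layer(np.array([[3.0, 4.0]]))
```

Passing `reset_state=False` on the second call continues from the existing sums, so the learned mean reflects all three samples rather than only the last one.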


### Handling of async prefetching

Some preprocessing ops are CPU-only and benefit from being executed asynchronously on the accelerator host (as opposed to the accelerator itself, e.g. GPU or TPU),
with a batch of data being preprocessed on the host while the previous batch is being processed by the accelerator. This pattern is known as "async prefetching".

This is normally done as part of a tf.data pipeline. The current proposal implies moving some of that preprocessing to inside the model itself, which is normally
executed end-to-end on an accelerator.

This means that we need a way to lift the preprocessing part of the model into a tf.data pipeline during model training. In `fit`, we can do this automatically.
In custom training loops, we will expect the user to do it manually (see subsection "Custom training loops").
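
The overlap that async prefetching provides can be illustrated without TensorFlow. In the sketch below, a single worker thread plays the role of the host and the main thread plays the role of the accelerator; `preprocess`, `train_step`, and `run_with_prefetch` are hypothetical stand-ins, not part of the proposed API.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(batch):
    """Stand-in for host-side (CPU) preprocessing."""
    return [x / 255.0 for x in batch]


def train_step(batch):
    """Stand-in for the accelerator-side computation."""
    return sum(batch)


def run_with_prefetch(batches):
    """Preprocess batch N+1 on the host while batch N is being processed."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as host:
        pending = host.submit(preprocess, batches[0])
        for next_batch in batches[1:]:
            ready = pending.result()
            # Start preprocessing the next batch before processing this one.
            pending = host.submit(preprocess, next_batch)
            results.append(train_step(ready))
        results.append(train_step(pending.result()))
    return results


outputs = run_with_prefetch([[255, 0], [51, 51], [0, 0]])
```

In a tf.data pipeline the same overlap is typically obtained by mapping the preprocessing function over the dataset and then prefetching, which is why the preprocessing stage needs to be separable from the rest of the model.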

We propose the addition of two new methods on the `Model` class:

```python
def get_preprocessing_stage(self):
    """Retrieves the preprocessing part of the model.

    This is the part of the model that should be executed asynchronously
    on the device host during training.

    Returns:
        Instance of `PreprocessingLayer` or `PreprocessingStage`.
        May be None if the model does not start with preprocessing layers.
    """
    pass

def get_main_stage(self):
    """Retrieves the main processing part of the model.

    This is the part of the model that should be executed
    on the accelerator device.

    Returns:
        Model instance.
    """
```

Thus, for any model that starts with preprocessing layers, the following:

```python
outputs = model(inputs)
```

is functionally equivalent to:

```python
preprocessed_inputs = model.get_preprocessing_stage()(inputs)
outputs = model.get_main_stage()(preprocessed_inputs)
```


#### Examples:

Sequential model with a preprocessing layer:

```python
vectorization = keras.layers.TextVectorization()
vectorization.adapt(data_sample)

model = keras.Sequential([
    vectorization,
    keras.layers.Dense(10, activation='softmax'),
])

# This is the `vectorization` layer.
preproc_stage = model.get_preprocessing_stage()
# model containing the `Dense` layer only.
main_stage = model.get_main_stage()
```

Functional model with 2 branches, each with a preprocessing layer:

```python
normalization_a = layers.Normalization()
normalization_b = layers.Normalization()
normalization_a.adapt(data_a)
normalization_b.adapt(data_b)

input_a = Input(shape_a)
input_b = Input(shape_b)
normed_a = normalization_a(input_a)
normed_b = normalization_b(input_b)
a = layers.Dense(32)(normed_a)
b = layers.Dense(32)(normed_b)
c = layers.concatenate([a, b])
outputs = layers.Dense(1, activation='sigmoid')(c)

model = Model([input_a, input_b], outputs)

# `PreprocessingStage` instance
# mapping `[input_a, input_b]` to `[normed_a, normed_b]`
preproc_stage = model.get_preprocessing_stage()

# Model instance mapping `[normed_a, normed_b]` to `outputs`.
main_stage = model.get_main_stage()
```

Subclassed model with a preprocessing layer:

```python
class MyModel(Model):

    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.preproc_layer = layers.Normalization()
        self.submodel = MySubmodel()

    def call(self, inputs):
        return self.submodel(self.preproc_layer(inputs))

    def get_preprocessing_stage(self):
        return self.preproc_layer

    def get_main_stage(self):
        return self.submodel
```


#### The case of the built-in `fit` loop


When calling `fit` or `evaluate` with a Dataset on a model that contains preprocessing layers,
the lifting happens automatically and the user-facing workflow doesn't change.

```python
model.fit(dataset, epochs=10)
```

#### Custom training loops

When writing custom training loops, the user must manually do the lifting of the preprocessing stage
into the data pipeline:

```python
model = Model(...)
preproc_stage = model.get_preprocessing_stage()
main_model = model.get_main_stage()

preproc_dataset = Dataset(...)
preproc_stage.adapt(preproc_dataset)

# Map the preprocessing stage on the dataset.
dataset = Dataset(...)
dataset = dataset.map(preproc_stage)

# Regular training loop (using `main_model`).
for x, y in dataset:
    with GradientTape() as tape:
        y_pred = main_model(x)
        loss = loss_fn(y, y_pred)
        ...
```

In general, you won't have to refer to `get_preprocessing_stage` and `get_main_stage` directly, because you will
probably already have direct handles on your preprocessing layer and the rest of the model:

```python
normalization = layers.Normalization()
normalization.adapt(preproc_dataset)
dataset = dataset.map(normalization)

for x, y in dataset:
    with GradientTape() as tape:
        y_pred = model(x)
        loss = loss_fn(y, y_pred)
        ...
```


## Questions and Discussion Topics

### Naming Discussion

#### Naming conventions to follow for preprocessing layers

[RESOLUTION: we will use option A]

We have two possible sets of names for the layers:

##### Option A: Normalization, Discretization, TextVectorization

Pros: consistent with most existing layers, in particular BatchNormalization.
Cons: It's longer.

##### Option B: Normalize, Discretize, VectorizeText

Pros: It's shorter.
Cons: Normalize vs BatchNormalization is jarring.


#### Using the name "preprocessing" or "processing"

[RESOLUTION: we will use option A, "preprocessing"]

It has been proposed that we use the name "processing" throughout the API instead of "preprocessing".

##### Option A: "preprocessing".

Pros:
1) The meaning of "preprocessing" is clear to all users ("data normalization and stuff").
2) We need a clear semantic boundary between the main data processing flow of a model and what goes before it (the preprocessing stage).
3) It replaces the functionality of the `keras.preprocessing` module, and should be consistent with this naming convention.

Cons:
The `Normalization` layer, being differentiable, can be used in the middle of a model, rather than at the start.
However, there's nothing weird about keeping the name "preprocessing" in this specific case: it is widely understood that a `Normalization` layer is doing "data preprocessing", independently of where you use it -- in fact, normalization is the first example that shows up in most definitions of "data preprocessing". 


##### Option B: "processing".

Pros: The Normalization layer can be used elsewhere in a model than at the start (although it would have to be trained separately).
Cons: It's very generic, and does not clearly convey the difference between "preprocessing stage" and "main processing stage" required by the async prefetching API.


#### Name to use for `adapt` method

[RESOLUTION: decision delayed until implementation]

We may want to use the name `fit` instead (other suggestions welcome).

Pros of using `fit`: consistency with `model.fit()`, and the `fit` method on `ImageDataGenerator` and `Tokenizer` from the `keras.preprocessing` module.
Cons of using `fit`: It may confuse users, since `preprocessing_layer.fit()` would have a different signature.

---

[OTHER ADDITIONS FROM DESIGN REVIEW]

- We should decouple the user-facing `adapt(data)` method (or `fit(data)`), and the implementer-facing method, so as to make it easier to implement support for different data formats.
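
One possible shape for this decoupling is sketched below, purely as an illustration. The user-facing `adapt` normalizes the supported data formats into a stream of batches, while implementers only override two hooks. The names `ToyAdaptable`, `_reset`, and `_update` are hypothetical and not part of the proposal.

```python
import numpy as np


class ToyAdaptable:
    """Hypothetical base class: handles data formats, delegates the statistics."""

    def adapt(self, data):
        """User-facing entry point: accepts an array or an iterable of batches."""
        batches = [data] if isinstance(data, np.ndarray) else data
        self._reset()
        for batch in batches:
            self._update(np.asarray(batch, dtype=float))


class ToyMax(ToyAdaptable):
    """Implementer-facing side: only the state update logic lives here."""

    def _reset(self):
        self.maximum = -np.inf

    def _update(self, batch):
        self.maximum = max(self.maximum, float(batch.max()))


from_array = ToyMax()
from_array.adapt(np.array([1.0, 7.0, 3.0]))

from_batches = ToyMax()
from_batches.adapt(iter([[1.0, 7.0], [3.0]]))
```

Supporting a new input format then only requires a change in the base class `adapt`, leaving every subclass untouched.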





================================================
FILE: rfcs/20190729-keras-preprocessing-redesign.md
================================================
# Keras Preprocessing API

| Status        | Proposed      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Francois Chollet (fchollet@google.com), Frederic Branchaud-Charron (Frederic.Branchaud-Charron@usherbrooke.ca)|
| **Updated**   | 2019-08-21                                           |


## Context

`tf.data.Dataset` is the main API for data loading and preprocessing in TensorFlow. It has two advantages:

- It supports GPU prefetching
- It supports distribution via the Distribution Strategies API

Meanwhile, `keras.preprocessing` is a major API for data loading and preprocessing in Keras. It is based
on Numpy and Scipy, and it produces instances of the `keras.utils.Sequence` class, which are finite-length,
resettable Python generators that yield batches of data.

Some features of `keras.preprocessing` are highly useful and don't have straightforward equivalents in `tf.data`
(in particular image data augmentation and dynamic time series iteration).

Ideally, the utilities in `keras.preprocessing` should be made compatible with `tf.data`.
This presents the opportunity to improve on the existing API. In particular we don't have good support
for image segmentation use cases today.

Some features are also being supplanted by [preprocessing layers](https://github.com/keras-team/governance/blob/master/rfcs/20190502-preprocessing-layers.md), in particular text processing. 
As a result, we may want to move the current API to an API similar to Layers.


## Goals

- Unify "keras.preprocessing" and the recently-introduced [Preprocessing Layers API](https://github.com/keras-team/governance/blob/master/rfcs/20190502-preprocessing-layers.md).
- Make all features of `keras.preprocessing` compatible with `tf.data`.
- As a by-product, add required ops to TensorFlow (`tf.image`).


## Proposed changes at a high-level


- Deprecate `ImageDataGenerator` in favor of a new `ImagePipeline` class similar to a `Sequential` model.
- Inherit from `keras.layers.PreprocessingLayer` for all image transformations.
- Deprecate `Tokenizer` class in favor of `TextVectorization` preprocessing layer.
- Replace `TimeseriesGenerator` with a function-based API.


## Detailed API changes


### ImagePipeline

#### Constructor

`ImagePipeline` inherits from `PreprocessingLayer` (or alternatively `keras.models.Sequential`, whose behavior is similar) and takes a list of layers as input. In the future it will inherit from `PreprocessingStage`.

`ImagePipeline` is a preprocessing layer that encapsulates a series of image transformations. Since some of these transformations may be trained (featurewise normalization), it exposes the method `adapt`, like all other preprocessing layers.


```python

class ImagePipeline(Sequential):

    def __init__(self, layers:List[Layer]):
        ...
```

#### Example usage

```python
preprocessor = ImagePipeline([
    RandomFlip(horizontal=True),
    RandomRotation(0.2, fill_mode='constant'),
    RandomZoom(0.2, fill_mode='constant'),
    RandomTranslation(0.2, fill_mode='constant'),
    Normalization(),  # This is the same Normalization introduced in preprocessing layers
])
preprocessor.adapt(sample_data)  # optional step in case the object needs to be trained

dataset = preprocessor.from_directory(dir_name, image_size=(512, 512))
model.fit(dataset, epochs=10)
```

#### Methods

```python
def from_directory(
    self,
    directory,
    targets='inferred',
    target_mode='categorical',
    class_names='inferred',
    color_mode='rgb',
    batch_size=32,
    image_size=(255, 255),
    shuffle=True,
    seed=None,
    follow_links=False,
    validation_split=None,
    subset=None):
    """Generates a Dataset from files in a directory.

    # Arguments:
        directory: Directory where the data is located.
            If `targets` is "inferred", it should contain
            subdirectories, each containing images for a class.
            Otherwise, the directory structure is ignored.
        targets: Either
            "inferred" (targets are generated from the directory structure),
            None (no targets),
            or a list of integer labels of the same size as the number of image
            files found in the directory.
        target_mode:
            - 'categorical' means that the inferred labels are
                encoded as a categorical vector (e.g. for categorical_crossentropy).
            - 'binary' means that the inferred labels (there can be only 2)
                are encoded as binary scalars (e.g. for binary_crossentropy).
        class_names: Only valid if "targets" is "inferred". This is the explicit
            list of class names (must match names of subdirectories). Used
            to control the order of the classes (otherwise alphanumerical order is used).
        color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
            Whether the images will be converted to
            have 1, 3, or 4 channels.
        batch_size: Size of the batches of data (default: 32).
        image_size: Size to resize images to after they are read from disk.
          Since the pipeline processes batches of images that must all have the same size,
          this must be provided.
        shuffle: Whether to shuffle the data (default: True)
            If set to False, sorts the data in alphanumeric order.
        seed: Optional random seed for shuffling and transformations.
        follow_links: Whether to follow links inside
            subdirectories (default: False).
        validation_split: Optional float between 0 and 1,
            fraction of data to reserve for validation.
        subset: One of "training" or "validation". Only used if `validation_split` is set.
    """

def from_dataframe(
    self,
    dataframe,
    directory=None,
    data_column='filename',
    target_column='class',
    target_mode='categorical',
    weight_column=None,
    color_mode='rgb',
    batch_size=32,
    image_size=(255, 255),
    shuffle=True,
    seed=None,
    validation_split=None,
    subset=None):
    """Generates a Dataset from a Pandas dataframe.

    # Arguments:
        dataframe: Pandas dataframe instance.
        directory: The directory that image paths refer to.
        data_column: Name of column with the paths for the input images.
        target_column: Name of column with the class information.
        target_mode:
            - 'categorical' means that the inferred labels are
                encoded as a categorical vector (e.g. for categorical_crossentropy).
            - 'binary' means that the inferred labels (there can be only 2)
                are encoded as binary scalars (e.g. for binary_crossentropy).
        weight_column: Name of column with sample weight information.
        color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb".
            Whether the images will be converted to
            have 1, 3, or 4 channels.
        batch_size: Size of the batches of data (default: 32).
        image_size: Size to resize images to after they are read from disk.
          Since the pipeline processes batches of images that must all have the same size,
          this must be provided.
        shuffle: Whether to shuffle the data (default: True)
            If set to False, sorts the data in alphanumeric order.
        seed: Optional random seed for shuffling and transformations.
        validation_split: Optional float between 0 and 1,
            fraction of data to reserve for validation.
        subset: One of "training" or "validation". Only used if `validation_split` is set.
    """

def preview(self, data, save_to_directory=None, save_prefix=None, save_format='png'):
    """Enables users to preview the image augmentation configuration.

    # Arguments
        data: Image data. Could be strings (a list of image paths), a list of PIL image instances,
            a list of arrays, or a list of eager tensors.
        save_to_directory: Directory to save transformed images. Mandatory if not in a notebook.
            If in a notebook and this is not specified, images are displayed in-line.
        save_prefix: String, filename prefix for saved images.
        save_format: String, extension for saved images.
    """
```

**Note:** `from_arrays` is not included since it is possible to transform Numpy data simply by calling the `ImagePipeline` object (like a layer).


### Layers

The new data augmentation layers will inherit `keras.layers.Layer` and work in a similar way.

```python
Resizing(height, width)  # Resize while distorting aspect ratio
CenterCrop(height, width)  # Resize without distorting aspect ratio
RandomCrop(height, width, seed=None)  # Return a (height, width) crop from a random location
Rescaling(value)  # Divide by `value`
RandomFlip(horizontal=False, vertical=False, seed=None)
RandomTranslation(amplitude=0., fill_mode='constant', fill_value=0., seed=None)
RandomRotation(amplitude=0., fill_mode='constant', fill_value=0., seed=None)
RandomZoom(amplitude=0., fill_mode='constant', fill_value=0., seed=None)
RandomBrightness(amplitude=0., seed=None)
RandomContrast(amplitude=0., seed=None)
RandomSaturation(amplitude=0., seed=None)
RandomWidth(amplitude=0., seed=None)  # Expand / shrink width while distorting aspect ratio
RandomHeight(amplitude=0., seed=None)  # Expand / shrink height while distorting aspect ratio
```

The `amplitude` argument may be:
- a positive float: it is understood as "fraction of total" (total is the current width, or height, or 180 degrees in the case of `RandomRotation`). E.g. `0.2` results in variations in the [-20%, +20%] range. If larger than 1, it is clamped to 1 for the lower boundary (but not for the upper boundary).
- a tuple of 2 positive floats: understood as a fractional range, e.g. `(0.2, 0.4)` is interpreted as the [-20%, +40%] range. The first float may not be larger than 1.
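
The two accepted forms of `amplitude` can be normalized into a single signed range. The following is a minimal sketch of that interpretation; the helper name `amplitude_to_range` is hypothetical and not part of the proposed API, and the choice to raise `ValueError` on invalid input is an assumption.

```python
def amplitude_to_range(amplitude):
    """Normalize `amplitude` to a (lower, upper) pair of signed fractions.

    Hypothetical helper, for illustration only.
    """
    if isinstance(amplitude, (tuple, list)):
        if len(amplitude) != 2:
            raise ValueError("Expected a tuple of 2 floats.")
        lower, upper = float(amplitude[0]), float(amplitude[1])
        if lower < 0 or upper < 0:
            raise ValueError("Both floats must be non-negative.")
        if lower > 1:
            raise ValueError("The first float may not be larger than 1.")
    else:
        upper = float(amplitude)
        if upper < 0:
            raise ValueError("`amplitude` must be non-negative.")
        # A single float above 1 is clamped for the lower boundary only.
        lower = min(upper, 1.0)
    return (-lower, upper)


print(amplitude_to_range(0.2))         # symmetric range
print(amplitude_to_range(1.5))         # lower boundary clamped to -1.0
print(amplitude_to_range((0.2, 0.4)))  # asymmetric range
```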

To do a random center crop that zooms in and discards part of the image, you would do:

```python
preprocessor = ImagePipeline([
  RandomZoom([0., 0.2]),
  CenterCrop(height, width),
])
```


#### Notes

- We are dropping support for ZCA whitening as it is no longer popular in the computer vision community.
- We don't have immediate support for random translations along only one axis.
- We only plan on implementing support for `data_format='channels_last'`. As such this argument does not appear in the API.


#### Example implementation

```python
class RandomFlip(PreprocessingLayer):

  def __init__(self, horizontal=False, vertical=False, seed=None):
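    super().__init__()
    self.horizontal = horizontal
    self.vertical = vertical
    # Fall back to a random seed when none is given, then seed the RNG with it.
    self.seed = seed or random_int()
    self._rng = rng_from_seed(self.seed)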

  def call(self, inputs, training=None, seed=None):
    seed = seed or self._rng.sample()
    if training:
      if self.horizontal:
        inputs = tf.image.random_flip_left_right(inputs, seed=seed)
      if self.vertical:
        inputs = tf.image.random_flip_up_down(inputs, seed=seed)
    return inputs
```



#### Question: how to support image segmentation in a simple way?

**Requirements:**
- Image loading and image augmentation should be synced across inputs and targets
- It should be possible to use different standardization preprocessing (outside of augmentation) across inputs and targets

**Proposal:**

```python
# Shared spatial transformations for inputs and targets
augmenter = ImagePipeline([
    RandomRotation(0.5),
    RandomFlip(vertical=True)
])

input_pipeline = ImagePipeline([
    augmenter,
    RandomBrightness(0.2),
    RandomContrast(0.2),
    RandomSaturation(0.2),
])
target_pipeline = ImagePipeline([
    augmenter,
    OneHot(num_classes)
])

input_ds = input_pipeline.from_directory(
    input_dir, targets=None, image_size=(150, 150), batch_size=32,
    seed=123)  # This seed supersedes the per-layer seed in all transformations
target_ds = target_pipeline.from_directory(
    target_dir,  # target_dir should have same structure as input_dir.
    targets=None, image_size=(150, 150), batch_size=32, seed=123)

ds = tf.data.Dataset.zip((input_ds, target_ds))
model.fit(ds)
```

Note that the `seed` argument in `from_directory` supersedes the per-layer `seed` argument. This is achieved by using the seed
to sample new random ints (scalar tensors from `tf.random.experimental.Generator`) that are passed as the `seed` argument of `call` to each underlying layer.
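
The seed derivation described above can be sketched in plain Python. This is only an illustration: `random.Random` stands in for `tf.random.experimental.Generator`, and the helper name `derive_layer_seeds` is hypothetical.

```python
import random


def derive_layer_seeds(pipeline_seed, num_layers):
    """Derive one integer seed per layer from a single pipeline-level seed."""
    # Stand-in for tf.random.experimental.Generator (assumption for this sketch).
    rng = random.Random(pipeline_seed)
    return [rng.randrange(2**31 - 1) for _ in range(num_layers)]


# The shared augmenter layers come first in both pipelines, so they receive
# the same derived seeds when both pipelines are given the same seed.
input_seeds = derive_layer_seeds(123, num_layers=5)   # 2 shared layers + 3 color layers
target_seeds = derive_layer_seeds(123, num_layers=3)  # 2 shared layers + OneHot
print(input_seeds[:2] == target_seeds[:2])
```

Because the derived seeds depend only on the pipeline-level seed and the layer position, the spatial transformations applied to inputs and targets stay in sync.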


### TimeseriesGenerator

- Deprecate existing `TimeseriesGenerator` class
- Introduce functional replacement `timeseries_dataset`:

```python
def timeseries_dataset(
      data, targets, length,
      sampling_rate=1,
      stride=1,
      start_index=0,
      end_index=None,
      shuffle=False,
      reverse=False,
      batch_size=128):
      """Utility function for generating batches of temporal data.

      This function takes in a sequence of data-points gathered at
      equal intervals, along with time series parameters such as
      stride, length of history, etc., to produce batches for
      training/validation.

      # Arguments
          data: Indexable generator (such as list or Numpy array)
              containing consecutive data points (timesteps).
              The data should be at 2D, and axis 0 is expected
              to be the time dimension.
          targets: Targets corresponding to timesteps in `data`.
              It should have same length as `data`.
          length: Length of the output sequences (in number of timesteps).
          sampling_rate: Period between successive individual timesteps
              within sequences. For rate `r`, timesteps
              `data[i]`, `data[i-r]`, ... `data[i - length]`
              are used to create a sample sequence.
          stride: Period between successive output sequences.
              For stride `s`, consecutive output samples would
              be centered around `data[i]`, `data[i+s]`, `data[i+2*s]`, etc.
          start_index: Data points earlier than `start_index` will not be used
              in the output sequences. This is useful to reserve part of the
              data for test or validation.
          end_index: Data points later than `end_index` will not be used
              in the output sequences. This is useful to reserve part of the
              data for test or validation.
          shuffle: Whether to shuffle output samples,
              or instead draw them in chronological order.
          reverse: Boolean: if `True`, timesteps in each output sample will be
              in reverse chronological order.
          batch_size: Number of timeseries samples in each batch
              (except maybe the last one).

      # Returns
          A Dataset instance.
      """
```
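
To make the roles of `length`, `sampling_rate`, and `stride` concrete, here is a minimal pure-Python sketch of the windowing logic. It assumes the new function keeps the indexing semantics of the existing `TimeseriesGenerator` (each sample ends just before index `i`, and its target is `targets[i]`); batching and shuffling are omitted, and the helper name `timeseries_windows` is hypothetical.

```python
def timeseries_windows(data, targets, length, sampling_rate=1, stride=1,
                       start_index=0, end_index=None, reverse=False):
    """Return a list of (window, target) pairs. Illustrative only."""
    if end_index is None:
        end_index = len(data) - 1
    samples = []
    # The first usable position is `start_index + length`, so that a full
    # history of `length` timesteps is available.
    for i in range(start_index + length, end_index + 1, stride):
        window = list(data[i - length:i:sampling_rate])
        if reverse:
            window.reverse()
        samples.append((window, targets[i]))
    return samples


data = list(range(10))
for window, target in timeseries_windows(data, data, length=4,
                                         sampling_rate=2, stride=3):
    print(window, target)
```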



================================================
FILE: rfcs/20191212-keras-categorical-inputs.md
================================================
# Keras categorical inputs

| Status        | Implemented (https://github.com/tensorflow/community/pull/209) |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com)|
| **Sponsor**   | Karmel Allison (karmel@google.com), Martin Wicke (wicke@google.com) |
| **Updated**   | 2019-02-22                                           |

## Objective

This document proposes 5 new Keras preprocessing layers (KPL) (`StringLookup`, `CategoryCrossing`, `CategoryEncoding`, `Hashing`, `IntegerLookup`) that allow users to:
* Perform basic feature engineering for categorical inputs
* Replace feature columns and `tf.keras.layers.DenseFeatures` with proposed layers
* Introduce sparse inputs that work with Keras linear models and other layers that support sparsity

Other proposed layers for replacement of feature columns such as `tf.feature_column.bucketized_column` and `tf.feature_column.numeric_column` have been discussed [here](https://github.com/keras-team/governance/blob/master/rfcs/20190502-preprocessing-layers.md).

The proposed layers should support ragged tensors.

## Motivation

Specifically, by introducing the 5 layers, we aim to address these pain points:
* Users have to define both feature columns and Keras Inputs for the model, resulting in code duplication and deviation from the DRY (Don't Repeat Yourself) principle. See this [GitHub issue](https://github.com/tensorflow/tensorflow/issues/27416).
* Users with high-dimensional categorical inputs will incur a large memory footprint and computation cost if the inputs are wrapped with an indicator column through `tf.keras.layers.DenseFeatures`.
* Currently there is no way to correctly feed a Keras linear model or dense layer with multivalent categorical inputs, weighted categorical inputs, or shared embedding inputs.
* Feature columns offer black-box implementations, mix feature engineering with trainable objects, and lead to
  unintended coding patterns.

## User Benefit

We expect these user pain points to go away once users migrate off feature columns.

## Example Workflows

Two example workflows are presented below. These workflows can be found at this [colab](https://colab.sandbox.google.com/drive/1cEJhSYLcc2MKH7itwcDvue4PfvrLN-OR).

### Workflow 1 -- Official guide on how to replace feature columns with KPL

Refer to [tf.feature_column](https://www.tensorflow.org/api_docs/python/tf/feature_column) for a complete list of feature columns.

1. Replacing `tf.feature_column.categorical_column_with_hash_bucket` with `Hashing`
from
```python
tf.feature_column.categorical_column_with_hash_bucket(key, hash_bucket_size)
```
to
```python
keras_input = tf.keras.Input(shape=(1,), name=key, dtype=dtype)
hashed_input = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=hash_bucket_size)(keras_input)
```

Note that the hashed output from KPL will differ from the hashed output of the feature column, given how the seed is chosen. `Hashing` also supports a customized `salt`.

2. `tf.feature_column.categorical_column_with_identity`
This feature column merely passes inputs through unchanged, except that out-of-range values are mapped to `default_value`. This can easily be done at the data cleaning stage
rather than as part of feature engineering, and is hence dropped in this proposal.

3. Replacing `tf.feature_column.categorical_column_with_vocabulary_file` and `tf.feature_column.categorical_column_with_vocabulary_list` with `StringLookup` or `IntegerLookup`.
For string inputs,
from
```python
tf.feature_column.categorical_column_with_vocabulary_file(key, vocabulary_file, vocabulary_size, tf.dtypes.string, default_value, num_oov_buckets)
```
to
```python
keras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string)
id_input = tf.keras.layers.experimental.preprocessing.StringLookup(max_tokens=vocabulary_size + num_oov_buckets,
  num_oov_indices=num_oov_buckets, mask_token=None, vocabulary=vocabulary_file)(keras_input)
```

Similarly, from
```python
tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list, tf.dtypes.string, default_value, num_oov_buckets)
```
to
```python
keras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string)
id_input = tf.keras.layers.experimental.preprocessing.StringLookup(max_tokens=len(vocabulary_list) + num_oov_buckets, num_oov_indices=num_oov_buckets,
  mask_token=None, vocabulary=vocabulary_list)(keras_input)
```


Note that `default_value` is mutually exclusive with `num_oov_buckets`. In the case of `num_oov_buckets=0` and `default_value=-1`, simply set `num_oov_indices=0`. We do not support
any value other than `default_value=-1`.

Note that the indices for out-of-range values in `StringLookup` are prepended, i.e., [0,..., num_oov_tokens) for out-of-range values, whereas for `categorical_column_with_vocabulary_file` they are
appended, i.e., [vocabulary_size, vocabulary_size + num_oov_tokens) for out-of-range values. The former gives you more flexibility when reloading and adding vocab.

For integer inputs,
from
```python
tf.feature_column.categorical_column_with_vocabulary_file(key, vocabulary_file, vocabulary_size, tf.dtypes.int64, default_value, num_oov_buckets)
```
to
```python
keras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.int64)
id_input = tf.keras.layers.experimental.preprocessing.IntegerLookup(max_values=vocabulary_size + num_oov_buckets, num_oov_indices=num_oov_buckets, mask_value=None, vocabulary=vocabulary_file)(keras_input)
```

Similarly, from
```python
tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list, tf.dtypes.int64, default_value, num_oov_buckets)
```
to
```python
keras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.int64)
id_input = tf.keras.layers.experimental.preprocessing.IntegerLookup(max_values=len(vocabulary_list) + num_oov_buckets, num_oov_indices=num_oov_buckets, mask_value=None, vocabulary=vocabulary_list)(keras_input)
```
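
The difference between prepended and appended out-of-range indices, noted for the lookup layers above, can be sketched in plain Python. This is an illustration only: it assumes no mask token, uses CRC32 as a stand-in hash (the actual layers use a different hash function), and the helper names are hypothetical.

```python
import zlib


def _stable_hash(token):
    # Deterministic stand-in hash; not the hash used by the real layers.
    return zlib.crc32(str(token).encode("utf-8"))


def lookup_prepended(token, vocabulary, num_oov):
    """Lookup-layer-style layout: OOV indices occupy [0, num_oov)."""
    if token in vocabulary:
        return num_oov + vocabulary.index(token)
    return _stable_hash(token) % num_oov


def lookup_appended(token, vocabulary, num_oov):
    """Feature-column-style layout: OOV indices follow the vocabulary."""
    if token in vocabulary:
        return vocabulary.index(token)
    return len(vocabulary) + _stable_hash(token) % num_oov


vocab = ["a", "b", "c"]
extended = vocab + ["d"]  # vocabulary grows after reloading

# Prepended: OOV tokens stay in [0, 2) regardless of vocabulary size.
print(lookup_prepended("zzz", vocab, 2), lookup_prepended("zzz", extended, 2))
# Appended: the OOV range moves from [3, 5) to [4, 6) when the vocabulary grows.
print(lookup_appended("zzz", vocab, 2), lookup_appended("zzz", extended, 2))
```

One way to read the flexibility claim: with prepended indices, both the known tokens and the OOV range keep their positions when new vocabulary entries are added at the end.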


4. Replacing `tf.feature_column.crossed_column` with `CategoryCrossing` or `Hashing`
from
```python
tf.feature_column.crossed_column(keys, hash_bucket_size, hash_key)
```
to
```python
keras_inputs = []
for key in keys:
  keras_inputs.append(tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string))
hashed_input = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=hash_bucket_size)(keras_inputs)
```

Note that when `hash_bucket_size=0`, no hashing is performed. In this case it should be replaced with:
```python
keras_inputs = []
for key in keys:
  keras_inputs.append(tf.keras.Input(shape=(1,), name=key, dtype=tf.dtypes.string))
crossed_input = tf.keras.layers.experimental.preprocessing.CategoryCrossing()(keras_inputs)
```
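
Conceptually, a hashed cross joins the values of each row into one string and maps that string to a bucket. The sketch below illustrates this in plain Python under stated assumptions: the `'_X_'` separator is the default documented for `CategoryCrossing` in this proposal, CRC32 is a stand-in for the real hash function, and the helper names are hypothetical.

```python
import zlib


def cross_row(values, separator="_X_"):
    """Join one value per input into a single crossed string."""
    return separator.join(str(v) for v in values)


def hash_to_bin(value, num_bins):
    """Map a crossed string to a bucket in [0, num_bins). Stand-in hash."""
    return zlib.crc32(str(value).encode("utf-8")) % num_bins


rows = [(1, 2, 3), (4, 5, 6)]  # one value from each of three inputs, per example
crossed = [cross_row(r) for r in rows]
hashed = [hash_to_bin(c, num_bins=100) for c in crossed]
print(crossed)
print(hashed)
```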

5. Replacing `tf.feature_column.embedding_column` with `tf.keras.layers.Embedding`
Note that `combiner=sum` can be replaced with `tf.reduce_sum` and `combiner=mean` with `tf.reduce_mean` after
the embedding output. `sqrtn` can also be implemented using tf operations. For example:
```python
categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)
tf.feature_column.embedding_column(categorical_column, dimension=dimension, combiner="sum", initializer=initializer,
  max_norm=max_norm)
```
can be replaced with:
```python
categorical_input = tf.keras.Input(name=key, dtype=tf.string)
id_input = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary_list)(categorical_input)
embedding_input = tf.keras.layers.Embedding(input_dim=len(vocabulary_list), output_dim=dimension,
  embeddings_initializer=initializer, embeddings_constraint=tf.keras.constraints.MaxNorm(max_norm))(id_input)
embedding_input = tf.reduce_sum(embedding_input, axis=-2)
```
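
For reference, the three combiners differ only in how the summed embedding is scaled. The following plain-Python sketch shows the unweighted case on lists of floats, where `sqrtn` divides the sum by the square root of the number of embeddings; the helper name `combine` is hypothetical, and in a model the same arithmetic would be expressed with TensorFlow ops.

```python
import math


def combine(embeddings, combiner="sum"):
    """Reduce a list of equally sized embedding vectors to one vector."""
    n = len(embeddings)
    summed = [sum(column) for column in zip(*embeddings)]
    if combiner == "sum":
        return summed
    if combiner == "mean":
        return [v / n for v in summed]
    if combiner == "sqrtn":
        return [v / math.sqrt(n) for v in summed]
    raise ValueError("Unknown combiner: %r" % (combiner,))


vectors = [[1.0, 2.0], [3.0, 4.0]]
for name in ("sum", "mean", "sqrtn"):
    print(name, combine(vectors, name))
```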

6. Replacing `tf.feature_column.indicator_column` with `CategoryEncoding`
from
```python
categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)
tf.feature_column.indicator_column(categorical_column)
```
to
```python
categorical_input = tf.keras.Input(name=key, dtype=tf.string)
id_input = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary_list)(categorical_input)
encoded_input = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
  max_tokens=categorical_column.num_buckets, output_mode="count", sparse=True)(id_input)
```

Note that `CategoryEncoding` supports one-hot through `output_mode="binary"` as well. This is a much more
efficient approach than `tf.one_hot` + `tf.reduce_sum(axis=-2)` to reduce the multivalent categorical inputs.

Note that by specifying the `sparse` flag, the output can be either a `tf.Tensor` or a `tf.SparseTensor`.
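
The difference between the `"binary"` and `"count"` output modes can be illustrated on a single example with a dense plain-Python sketch. The helper name `category_encode` is hypothetical, and sparse output is not modeled here.

```python
def category_encode(indices, max_tokens, output_mode="binary"):
    """Encode a list of category indices as a dense vector of size `max_tokens`."""
    encoded = [0.0] * max_tokens
    for index in indices:
        if output_mode == "count":
            encoded[index] += 1.0   # number of occurrences
        else:
            encoded[index] = 1.0    # presence only
    return encoded


print(category_encode([0, 0], max_tokens=4, output_mode="count"))
print(category_encode([0, 0], max_tokens=4, output_mode="binary"))
```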

7. Replacing `tf.feature_column.weighted_categorical_column` with `CategoryEncoding`
from
```python
categorical_column = tf.feature_column.categorical_column_with_vocabulary_list(key, vocabulary_list)
tf.feature_column.weighted_categorical_column(categorical_column, weight_feature_key)
```
to
```python
categorical_input = tf.keras.Input(name=key, dtype=tf.string)
lookup_output = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocabulary_list)(categorical_input)
weight_input = tf.keras.Input(shape=(1,), dtype=tf.float32, name=weight_feature_key)
weighted_output = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
  max_tokens=categorical_column.num_buckets)(lookup_output, weight_input)
```

8. Replacing `tf.feature_column.shared_embeddings` with a single `tf.keras.layers.Embedding`.
Similar to 5, but with multiple categorical inputs:
from
```python
watched_video_id = tf.feature_column.categorical_column_with_vocabulary_list('watched_video_id', video_vocab_list)
impression_video_id = tf.feature_column.categorical_column_with_vocabulary_list('impression_video_id', video_vocab_list)
tf.feature_column.shared_embeddings([watched_video_id, impression_video_id], dimension)
```
to
```python
watched_video_input = tf.keras.Input(shape=(1,), name='watched_video_id', dtype=tf.int64)
impression_video_input = tf.keras.Input(shape=(1,), name='impression_video_id', dtype=tf.int64)
embed_layer = tf.keras.layers.Embedding(input_dim=len(video_vocab_list), output_dim=dimension)
embedded_watched_video_input = embed_layer(watched_video_input)
embedded_impression_video_input = embed_layer(impression_video_input)
```

9. Replacing `tf.estimator.LinearXXX` with `CategoryEncoding` and `tf.keras.experimental.LinearModel`.
`LinearClassifier` and `LinearRegressor` treat categorical columns as multi-hot. This can be replaced by an encoding layer and a Keras linear model; see Workflow 2 for details.

10. Replacing `tf.feature_column.numeric_column` and `tf.feature_column.sequence_numeric_column` with `tf.keras.Input` and `Normalization`.
Use `tf.keras.layers.experimental.preprocessing.Normalization`, with `set_weights` to provide the mean and standard deviation.

11. Replacing `tf.feature_column.sequence_categorical_xxx`.
Replacing `tf.feature_column.sequence_categorical_xxx` is similar to replacing `tf.feature_column.categorical_xxx`, except that `tf.keras.Input` should include the time dimension in
`input_shape` as well.

12. Replacing `tf.feature_column.bucketized_column` with `Discretization`.
from
```python
source_column = tf.feature_column.numeric_column(key)
tf.feature_column.bucketized_column(source_column, boundaries)
```
to
```python
keras_input = tf.keras.Input(shape=(1,), name=key, dtype=tf.float32)
bucketized_input = tf.keras.layers.experimental.preprocessing.Discretization(bins=boundaries)(keras_input)
```
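
Bucketizing by boundaries amounts to finding which interval a value falls into. A minimal plain-Python sketch follows, assuming the convention that a value equal to a boundary belongs to the upper bucket (as documented for `tf.feature_column.bucketized_column`); the helper name `bucketize` is hypothetical.

```python
import bisect


def bucketize(value, boundaries):
    """Return the bucket index of `value` for sorted `boundaries`.

    With boundaries [0., 1., 2.] the buckets are
    (-inf, 0.), [0., 1.), [1., 2.), [2., +inf).
    """
    return bisect.bisect_right(boundaries, value)


boundaries = [0.0, 1.0, 2.0]
for x in (-1.5, 0.0, 0.5, 1.0, 3.4):
    print(x, bucketize(x, boundaries))
```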


### Workflow 2 -- Complete Example

This example gives code equivalent to the canned `LinearEstimator` [tutorial](https://www.tensorflow.org/tutorials/estimator/linear) on the Titanic dataset:

Refer to this [colab](https://colab.sandbox.google.com/drive/1cEJhSYLcc2MKH7itwcDvue4PfvrLN-OR) to reproduce.

```python
import pandas as pd
import tensorflow as tf

dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
y_train = dftrain.pop('survived')

STRING_CATEGORICAL_COLUMNS = ['sex', 'class', 'deck', 'embark_town', 'alone']
INT_CATEGORICAL_COLUMNS = ['n_siblings_spouses', 'parch']
NUMERIC_COLUMNS = ['age', 'fare']

keras_inputs = {}
keras_preproc_inputs = []
for key in STRING_CATEGORICAL_COLUMNS:
  keras_input = tf.keras.Input(shape=(1,), dtype=tf.string, name=key)
  keras_inputs[key] = keras_input
  vocab = dftrain[key].unique()
  keras_preproc_input = tf.keras.layers.experimental.preprocessing.StringLookup(vocabulary=vocab, num_oov_indices=0, mask_token=None, name='lookup' + key)(keras_input)
  keras_preproc_input = tf.keras.layers.experimental.preprocessing.CategoryEncoding(max_tokens=len(vocab), output_mode='count', sparse=True, name='encode' + key)(keras_preproc_input)
  keras_preproc_inputs.append(keras_preproc_input)

for key in INT_CATEGORICAL_COLUMNS:
  keras_input = tf.keras.Input(shape=(1,), dtype=tf.int64, name=key)
  keras_inputs[key] = keras_input
  vocab = dftrain[key].unique()
  keras_preproc_input = tf.keras.layers.experimental.preprocessing.IntegerLookup(vocabulary=vocab, num_oov_indices=0, mask_value=None, name='lookup' + key)(keras_input)
  keras_preproc_input = tf.keras.layers.experimental.preprocessing.CategoryEncoding(max_tokens=len(vocab), output_mode='count', sparse=True, name='encode' + key)(keras_preproc_input)
  keras_preproc_inputs.append(keras_preproc_input)

for key in NUMERIC_COLUMNS:
  keras_input = tf.keras.Input(shape=(1,), dtype=tf.float32, name=key)
  keras_inputs[key] = keras_input
  keras_preproc_inputs.append(keras_input)  # Numeric inputs are passed through unchanged

age_x_sex = tf.keras.layers.experimental.preprocessing.CategoryCrossing(name='age_x_sex_crossing')([keras_inputs['age'], keras_inputs['sex']])
age_x_sex = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=100, name='age_x_sex_hashing')(age_x_sex)
keras_output_age_x_sex = tf.keras.layers.experimental.preprocessing.CategoryEncoding(max_tokens=100, output_mode='count', sparse=True, name='age_x_sex_encoding')(age_x_sex)
keras_preproc_inputs.append(keras_output_age_x_sex)


linear_model = tf.keras.experimental.LinearModel(units=1, kernel_initializer='zeros', activation='sigmoid')
linear_logits = linear_model(keras_preproc_inputs)
sorted_keras_inputs = tuple(keras_inputs[key] for key in sorted(keras_inputs.keys()))
model = tf.keras.Model(sorted_keras_inputs, linear_logits)

model.compile('ftrl', 'binary_crossentropy', metrics=['accuracy'])

df_dataset = tf.data.Dataset.from_tensor_slices((dict(dftrain), y_train))
def encode_map(features, labels):
  encoded_features = tuple(tf.expand_dims(features[key], axis=1) for key in sorted(features.keys()))
  return (encoded_features, labels)
encoded_dataset = df_dataset.batch(32).map(encode_map)

model.fit(encoded_dataset)
```

## Design Proposal

```python
# `tf.keras.layers.StringLookup`
class StringLookup(PreprocessingLayer):
  """This layer transforms categorical inputs to index space.
     If input is dense/sparse, then output is dense/sparse."""

  def __init__(self, max_tokens=None, num_oov_indices=1, mask_token="",
               oov_token="[UNK]", vocabulary=None, encoding=None,
               invert=False, name=None, **kwargs):
    """Constructs a StringLookup layer.

    Args:
      max_tokens: The maximum size of the vocabulary for this layer. If None,
              there is no cap on the size of the vocabulary. Note that this vocabulary
              includes the OOV and mask tokens, so the effective number of tokens is
              (max_tokens - num_oov_indices - (1 if mask_token else 0))
      num_oov_indices: The number of out-of-vocabulary tokens to use; defaults to
              1. If this value is more than 1, OOV inputs are hashed to determine their
              OOV value; if this value is 0, passing an OOV input will result in a '-1'
              being returned for that value in the output tensor. (Note that, because
              the value is -1 and not 0, this will allow you to effectively drop OOV
              values from categorical encodings.)
      mask_token: A token that represents masked values, and which is mapped to
              index 0. Defaults to the empty string "". If set to None, no mask term
              will be added and the OOV tokens, if any, will be indexed from
              (0...num_oov_indices) instead of (1...num_oov_indices+1).
      oov_token: The token representing an out-of-vocabulary value. Defaults to
              "[UNK]".
      vocabulary: An optional list of vocabulary terms, or a path to a text file
              containing a vocabulary to load into this layer. The file should contain
              one token per line. If the list or file contains the same token multiple
              times, an error will be thrown.
      encoding: The Python string encoding to use. Defaults to `'utf-8'`.
      invert: If true, this layer will map indices to vocabulary items instead
              of mapping vocabulary items to indices.
      name: Name of the layer.
      **kwargs: Keyword arguments to construct a layer.

    Input shape:
            a string or int tensor of shape `[batch_size, d1, ..., dm]`
    Output shape:
            an int tensor of shape `[batch_size, d1, ..., dm]`

    Example:
      >>> vocab = ["a", "b", "c", "d"]
      >>> data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
      >>> layer = StringLookup(vocabulary=vocab)
      >>> layer(data)
      <tf.Tensor: shape=(2, 3), dtype=int64, numpy=
      array([[2, 4, 5],
             [5, 1, 3]])>
    """
    pass


# `tf.keras.layers.IntegerLookup`
class IntegerLookup(PreprocessingLayer):
  """This layer transforms categorical inputs to index space.
     If input is dense/sparse, then output is dense/sparse."""

  def __init__(self, max_values=None, num_oov_indices=1, mask_value=0,
               oov_value=-1, vocabulary=None, invert=False, name=None, **kwargs):
    """Constructs an IntegerLookup layer.

    Args:
      max_values: The maximum size of the vocabulary for this layer. If None,
              there is no cap on the size of the vocabulary. Note that this vocabulary
              includes the OOV and mask values, so the effective number of values is
              (max_values - num_oov_indices - (1 if mask_value is not None else 0))
      num_oov_indices: The number of out-of-vocabulary values to use; defaults to
              1. If this value is more than 1, OOV inputs are modulated to determine
              their OOV value; if this value is 0, passing an OOV input will result in
              a '-1' being returned for that value in the output tensor. (Note that,
              because the value is -1 and not 0, this will allow you to effectively drop
              OOV values from categorical encodings.)
      mask_value: A value that represents masked inputs, and which is mapped to
              index 0. Defaults to 0. If set to None, no mask term will be added and the
              OOV values, if any, will be indexed from (0...num_oov_indices) instead of
              (1...num_oov_indices+1).
      oov_value: The value representing an out-of-vocabulary value. Defaults to -1.
      vocabulary: An optional list of values, or a path to a text file containing
              a vocabulary to load into this layer. The file should contain one value
              per line. If the list or file contains the same value multiple times, an
              error will be thrown.
      invert: If true, this layer will map indices to vocabulary items instead
              of mapping vocabulary items to indices.
      name: Name of the layer.
      **kwargs: Keyword arguments to construct a layer.

    Input shape:
            an int tensor of shape `[batch_size, d1, ..., dm]`
    Output shape:
            an int tensor of shape `[batch_size, d1, ..., dm]`

    Example:
      >>> vocab = [12, 36, 1138, 42]
      >>> data = tf.constant([[12, 1138, 42], [42, 1000, 36]])
      >>> layer = IntegerLookup(vocabulary=vocab)
      >>> layer(data)
      <tf.Tensor: shape=(2, 3), dtype=int64, numpy=
      array([[2, 4, 5],
             [5, 1, 3]])>
    """
    pass


# `tf.keras.layers.CategoryCrossing`
class CategoryCrossing(PreprocessingLayer):
  """This layer transforms multiple categorical inputs to categorical outputs
     by Cartesian product, and hashes the output if necessary.
     If any of the inputs is sparse, then all outputs will be sparse. Otherwise, all outputs will be dense."""

  def __init__(self, depth=None, separator=None, name=None, **kwargs):
    """Constructs a CategoryCrossing layer.
    Args:
      depth: depth of input crossing. By default None, all inputs are crossed into
            one output. It can also be an int or tuple/list of ints. Passing an
            integer will create combinations of crossed outputs with depth up to that
            integer, i.e., [1, 2, ..., `depth`], and passing a tuple of integers will
            create crossed outputs with depth for the specified values in the tuple,
            i.e., `depth`=(N1, N2) will create all possible crossed outputs with depth
            equal to N1 or N2. Passing `None` means a single crossed output with all
            inputs. For example, with inputs `a`, `b` and `c`, `depth=2` means the
            output will be [a;b;c;cross(a, b);cross(b, c);cross(c, a)].
      separator: A string added between each input being joined. Defaults to '_X_'.
      name: Name to give to the layer.
      **kwargs: Keyword arguments to construct a layer.

    Input shape: a list of string or int tensors or sparse tensors of shape
            `[batch_size, d1, ..., dm]`

    Output shape: a single string or int tensor or sparse tensor of shape
            `[batch_size, d1, ..., dm]`

    Example: (`depth`=None)
      If the layer receives three inputs:
      `a=[[1], [4]]`, `b=[[2], [5]]`, `c=[[3], [6]]`
      the output will be a string tensor:
      `[[b'1_X_2_X_3'], [b'4_X_5_X_6']]`
    """
    pass

# `tf.keras.layers.CategoryEncoding`
class CategoryEncoding(PreprocessingLayer):
  """This layer transforms categorical inputs from index space to category space.
     If input is dense/sparse, then output is dense/sparse."""

  def __init__(self, max_tokens=None, output_mode="binary", sparse=False, name=None, **kwargs):
    """Constructs a CategoryEncoding layer.
    Args:
      max_tokens: The maximum size of the vocabulary for this layer. If None,
              there is no cap on the size of the vocabulary.
      output_mode: Specification for the output of the layer.
              Defaults to "binary". Values can be "binary", "count" or "tf-idf",
              configuring the layer as follows:
              "binary": Outputs a single int array per batch, of either vocab_size or
                max_tokens size, containing 1s in all elements where the token mapped
                to that index exists at least once in the batch item.
              "count": As "binary", but the int array contains a count of the number
                of times the token at that index appeared in the batch item.
              "tf-idf": As "binary", but the TF-IDF algorithm is applied to find the
                value in each token slot.
      sparse: Boolean. If true, returns a `SparseTensor` instead of a dense
              `Tensor`. Defaults to `False`.
      name: Name to give to the layer.
     **kwargs: Keyword arguments to construct a layer.

    Input shape: An int tensor of shape `[batch_size, d1, ..., dm-1, dm]`
    Output shape: a float tensor of shape `[batch_size, d1, ..., dm-1, num_categories]`

    Example:
      >>> layer = tf.keras.layers.experimental.preprocessing.CategoryEncoding(
      ...           max_tokens=4, output_mode="count")
      >>> layer([[0, 1], [0, 0], [1, 2], [3, 1]])
      <tf.Tensor: shape=(4, 4), dtype=float32, numpy=
        array([[1., 1., 0., 0.],
               [2., 0., 0., 0.],
               [0., 1., 1., 0.],
               [0., 1., 0., 1.]], dtype=float32)>
    """
    pass

`tf.keras.layers.Hashing`
Hashing(PreprocessingLayer):
"""This layer transforms categorical inputs to hashed output.
   If input is dense/sparse, then output is dense/sparse."""
  def __init__(self, num_bins, salt=None, name=None, **kwargs):
    """Constructs a Hashing layer.

    Args:
      num_bins: Number of hash bins.
      salt: A single unsigned integer or None.
              If passed, the hash function used will be SipHash64, with these values
              used as an additional input (known as a "salt" in cryptography).
              These should be non-zero. Defaults to `None` (in that
              case, the FarmHash64 hash function is used). It also supports
              tuple/list of 2 unsigned integer numbers, see reference paper for details.
      name: Name to give to the layer.
      **kwargs: Keyword arguments to construct a layer.

    Input shape: A single or list of string, int32 or int64 `Tensor`,
            `SparseTensor` or `RaggedTensor` of shape `[batch_size, ...,]`

    Output shape: An int64 `Tensor`, `SparseTensor` or `RaggedTensor` of shape
            `[batch_size, ...]`. If any input is `RaggedTensor` then output is
            `RaggedTensor`, otherwise if any input is `SparseTensor` then output is
            `SparseTensor`, otherwise the output is `Tensor`.

    Example:
      >>> layer = tf.keras.layers.experimental.preprocessing.Hashing(num_bins=3)
      >>> inp = [['A'], ['B'], ['C'], ['D'], ['E']]
      >>> layer(inp)
      <tf.Tensor: shape=(5, 1), dtype=int64, numpy=
        array([[1],
               [0],
               [1],
               [1],
               [2]])>
    """
    pass

```
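As a rough illustration of the docstrings above, the following plain-Python sketch mimics the documented behaviour of `CategoryCrossing` and of `CategoryEncoding` in `"binary"` and `"count"` modes. The helper names `cross_categories` and `encode_categories` are hypothetical and not part of the proposed API. The real layers operate on tensors, and the ordering of crossed outputs used here is an assumption. An integer `depth` is treated as inclusive, following the `depth=2` example in the docstring.

```python
from itertools import combinations


def cross_categories(columns, depth=None, separator="_X_"):
    """Cross per-example values from several input columns.

    `columns` is a list of equal-length lists, one list per input feature.
    Returns one output column per crossed combination.
    """
    num_inputs = len(columns)
    if depth is None:
        depths = [num_inputs]           # a single cross of all inputs
    elif isinstance(depth, int):
        depths = range(1, depth + 1)    # every depth from 1 up to `depth`
    else:
        depths = depth                  # explicit tuple/list of depths
    batch_size = len(columns[0])
    outputs = []
    for d in depths:
        for combo in combinations(range(num_inputs), d):
            outputs.append([
                separator.join(str(columns[i][row]) for i in combo)
                for row in range(batch_size)
            ])
    return outputs


def encode_categories(batch, max_tokens, output_mode="binary"):
    """Turn rows of integer indices into 'binary' or 'count' vectors."""
    encoded = []
    for row in batch:
        vector = [0.0] * max_tokens
        for index in row:
            if output_mode == "count":
                vector[index] += 1.0    # number of occurrences
            else:
                vector[index] = 1.0     # presence only
        encoded.append(vector)
    return encoded


a, b, c = [1, 4], [2, 5], [3, 6]
print(cross_categories([a, b, c]))               # one column crossing all inputs
print(len(cross_categories([a, b, c], depth=2)))  # 3 single inputs + 3 pairs
print(encode_categories([[0, 1], [0, 0], [1, 2], [3, 1]],
                        max_tokens=4, output_mode="count"))
```

The `"count"` call reproduces the rows shown in the `CategoryEncoding` docstring example, which is a quick way to sanity-check the intended semantics.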

### Alternatives Considered
An alternative is to provide solutions on top of feature columns. This would make user code slightly cleaner but far less flexible.

### Performance Implications
End-to-end benchmarks should be the same as or faster than feature column implementations.

### Dependencies
This proposal does not add any new dependencies.

### Engineering Impact
These changes add more layers and thus increase binary size and build time. They will not impact startup time.
This code can be tested on its own and maintained in its own buildable unit.

### Platforms and Environments
This proposal should work in all platforms and environments.

### Best Practices, Tutorials and Examples
This proposal does not change the best engineering practices.

### Compatibility
No backward compatibility issues.

### User Impact
User-facing changes are needed to migrate feature-column-based Keras modeling to preprocessing-layer-based Keras modeling, as the example workflow suggests.

## Questions and Meeting Notes
We'd like to gather feedback on `IndexLookup`. Specifically, we propose migrating away from the mutually exclusive `num_oov_buckets` and `default_value` arguments and replacing them with `num_oov_tokens`.
1. Naming for encoding vs. vectorize: "encoding" can mean many things, and "vectorize" seems too general. We will go with "CategoryEncoding".
2. "mode" should be "count" or "avg_count", instead of "sum" and "mean".
3. Rename "sparse_combiner" to "mode", which aligns with scikit-learn.
4. Have a 'sparse_out' flag for "CategoryEncoding" layer.
5. Hashing -- we refer to hashing when we mean fingerprinting. Keep using "Hashing" for layer name, but document how it relies on tf.fingerprint, and also provides option for salt.
6. Rename "CategoryLookup" to "IndexLookup"

## Updates on 07/14/20
Mark the RFC as completed, update the layer naming and arguments.


================================================
FILE: rfcs/20200826-keras-nlp-scoping-design.md
================================================
# Keras NLP

| Status        | Proposed      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Mark Omernick (momernick@google.com), Francois Chollet (fchollet@google.com), Hongkun Yu (hongkuny@google.com)|
| **Updated**   | 2020-09-11                                           |


## Objective

We aim at describing the scope of [keras-nlp](https://github.com/keras-team/keras-nlp), especially:

- What use cases `keras-nlp` should cover
- Boundaries between `keras-nlp` and [tensorflow addons](https://github.com/tensorflow/addons)
- Boundaries between `keras-nlp` and [tensorflow model garden](https://github.com/tensorflow/models)
- Boundaries between `keras-nlp` and [tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras).
- Boundaries between `keras-nlp` and [tf.text](https://www.tensorflow.org/tutorials/tensorflow_text/intro).

## Motivation

Natural Language Processing (NLP) is a major application area for our users.
In recent years, Transformer-based models have become the foundation of many NLP workflows.
These workflows tend to reuse similar components, for which in some cases third-party packages
have been developed by the open-source community.

These third-party solutions are not always kept up to date or up to the same quality standards as core Keras.
They also raise the issue of API standardization.

To fix this, we want machine learning engineers to have access to a standard Keras-native,
optimized, and well-tested set of components to build their Transformer-based (and beyond) NLP workflows.

This provides key user benefits:

- The package would be first-party and thus always up to date with modern best practices.
- High code quality and testing standards and strict quality control: same level of trust as core Keras
- A shared API standard across the community
- Ability for the open-source community to build more advanced solutions *on top* of this package instead of reinventing it
- Ability for research scientists to benefit from subclassing and customizing base components to quickly test new research ideas

## Design Proposal

`keras-nlp` will include most standard Transformer-based modules, specifically:

- Keras layer components such as Transformer encoder and decoder blocks.
- Keras task components such as masked language modeling, span labeling and named entity recognition.
- TensorFlow operations such as beam search.
- Keras optimizer utilities such as widely used learning rate schedules.
- Data loaders and preprocessing for different datasets, such as SQuAD and GLUE.

### Success criteria for keras-nlp

- Reusable and standardized components that cover the above
- Easy-to-use API
- Models run on CPU/GPU/TPU seamlessly
- State of the art performance
- Models can be readily deployed to production

### Boundaries between keras-nlp and tf.text

- `tf.text` will contain all pre-processing operations that handle strings, such as WordPiece Tokenizer and n-grams.
- `keras-nlp` will contain modeling components that cover workflows past the tokenization stage.

### Boundaries between keras-nlp and TensorFlow Addons

- Highly experimental modeling, layers, losses, etc, live in Addons (e.g. newly published research code).
- Components from Addons will graduate to Model Garden, given they get sufficient usage,
and given that they work on CPU/GPU/TPU. The API interface will remain experimental for a short time after graduation,
so as to leave us the option to make changes based on user feedback.

### Boundaries between keras-nlp and Model Garden

- End to end modeling workflow and model specific details live in Model Garden
- Model garden will re-use most of the building blocks from keras-nlp
- Components from Model Garden can graduate to keras-nlp, given they get sufficient usage,
and given that they work on CPU/GPU/TPU. The API interface should remain stable after graduation.

### Boundaries between keras-nlp and core Keras

- `keras-nlp` will contain NLP-specific components
(e.g. the `MultiHeadAttention` layer may be used outside of NLP, and thus is shipping in core Keras).
- Components from keras-nlp can graduate to Keras core, given their usage expands beyond
 natural language processing.

## Dependencies

- TensorFlow version >= 2.4
- TensorFlow Datasets

## Backwards compatibility

We propose to guarantee major release backwards compatibility.

## Maintenance

The `keras-nlp` codebase will be primarily maintained by the Keras team at Google,
with help and contributions from the community. The codebase will be developed
on GitHub as part of the `keras-team` organization. The same process for tracking
issues and reviewing PRs will be used as for the core Keras repository.

## Performance Benchmark

We will set up Keras benchmark utilities to help users contribute to this repository.

Detailed design will be shared in a separate document (this document only focuses on scope).

## Questions and Discussion Topics

Please share any questions or suggestions.


================================================
FILE: rfcs/20200827-keras-cv-scoping-design.md
================================================
# Keras CV

| Status        | Proposed      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com) |
| **Updated**   | 2020-08-27                                           |


## Objective

This document describes the scope of the [keras-cv](https://github.com/keras-team/keras-cv) package, especially:
- What use cases `keras-cv` should cover
- Boundaries between `keras-cv` and [TensorFlow Addons](https://github.com/tensorflow/addons)
- Boundaries between `keras-cv` and [TensorFlow model garden](https://github.com/tensorflow/models)
- Boundaries between `keras-cv` and [tf.keras.applications](https://keras.io/api/applications/)

## Motivation

Computer vision (CV) is a major application area for our users.
Keras on its own provides good support for image classification tasks, in particular via `tf.keras.applications`.
However, Keras-native modeling solutions for more advanced tasks,
such as object detection, instance segmentation, etc., are still lacking.

As a result, the open-source community has rolled out many different solutions for these use cases,
made available via PyPI and GitHub. These third-party solutions are not always kept up to date, and
many still rely on the legacy multi-backend Keras. They also raise the issue of API standardization.

To fix this, we want machine learning engineers to have access to a standard Keras-native,
optimized, and well-tested set of components to build their advanced computer vision models.

This provides key user benefits:

- The package would be first-party and thus always up to date with modern best practices.
- High code quality and testing standards and strict quality control: same level of trust as core Keras
- A shared API standard across the community
- Ability for the open-source community to build more advanced solutions *on top* of this package instead of reinventing it

## Design Proposal

`keras-cv` will provide components that cover the following areas:

- Object Detection tasks.
- Instance Segmentation tasks.
- Semantic Segmentation tasks.
- Keypoint Detection tasks.
- Video Classification tasks.
- Object Tracking tasks.

Specifically, for Object Detection tasks, `keras-cv` will include most anchor-based modules:

- Common objects such as anchor generator, box matcher.
- Keras layer components such as ROI generator, NMS postprocessor.
- Keras backbone components that fill the gap from keras-applications.
- Keras losses and metrics, such as Focal loss and COCO metrics.
- Data loaders and preprocessing for different datasets, such as COCO.

For Semantic Segmentation tasks, `keras-cv` will include:

- Keras head components such as Atrous Spatial Pyramid Pooling (ASPP).

### Success criteria for `keras-cv`

- Cover all modeling tasks listed above
- Easy-to-use API
- Models run on CPU/GPU/TPU seamlessly
- State of the art performance
- Models can be readily deployed to production

### Boundaries between keras-cv and keras-applications

- keras-applications will be improved to include basic building blocks such as mobilenet bottleneck, that
 include feature maps
- keras-cv will depend on keras-applications for importing backbones.

### Boundaries between keras-cv and TensorFlow Addons

- Highly experimental modeling, layers, losses, etc, live in addons.
- Components from addons will graduate to keras-cv, given they get sufficient usage
 and work on CPU/GPU/TPU. The API interface will remain experimental after graduation.

### Boundaries between keras-cv and Model Garden

- End to end modeling workflow and model specific details live in Model Garden
- Model garden will re-use most of the building blocks from keras-cv and TensorFlow Addons.
- Components from Model Garden can graduate to keras-cv, given they are widely accepted
 and perform well on CPU/GPU/TPU. The API interface should remain stable after graduation.

## Dependencies

- TensorFlow version >= 2.4
- TensorFlow Datasets
- Keras-applications

## Backwards compatibility

We propose to guarantee major release backwards compatibility.

## Maintenance & development process

The `keras-cv` codebase will be primarily maintained by the Keras team at Google,
with help and contributions from the community. The codebase will be developed
on GitHub as part of the `keras-team` organization. The same process for tracking
issues and reviewing PRs will be used as for the core Keras repository.

## Performance benchmark

We will set up Keras benchmark utilities to help users contribute to this repository.

## Detailed Design

Detailed design will be shared in a separate document (this document only focuses on scope).

## Questions and Discussion Topics

Please share any questions or suggestions.


================================================
FILE: rfcs/20200920-keras-nlp-bert.md
================================================
# keras-nlp Transformer Encoder API

| Status        | Proposed      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com), Hongkun Yu (hongkuny@google.com)|
| **Sponsor(s)** | Mark Omernick (momernick@google.com)|
| **Updated**   | 2020-09-21                                           |


## Objective

We aim at providing a set of Keras layers to handle Transformer-Encoder BERT-style models.

## Key Benefits

BERT-style Transformer-Encoders are a state-of-the-art technique that powers many NLP tasks:

- Single sentence classification task, e.g., sentiment analysis
- Sentence pair classification task, e.g., next sentence prediction
- Question answering task, e.g., SQuAD
- Single sentence tagging task, e.g., named entity recognition

With this proposal, Keras users will be able to handle the tasks above with a simple API. 

## Design overview

This proposal builds on the assumption that inputs are lookup indices, i.e., `tf.int64` sequences.
Tokenization is not part of this proposal but will be our immediate next step.

### Classification task

Case where a user wants to use a pretrained BERT encoder for sentiment analysis:

```python
# Considering the IMDB review dataset
import tensorflow as tf
import tensorflow_datasets as tfds
import keras_nlp
import tensorflow_text as tftext

imdb_reviews = tfds.load('imdb_reviews')
train_ds = imdb_reviews['train'].batch(32)
test_ds = imdb_reviews['test'].batch(32)

# Tokenization with BertTokenizer
vocab_path = "gs://<bucket_name>/<file_path>/vocab.txt"
tokenizer = tftext.BertTokenizer(vocab_path, token_out_type=tf.int64, lower_case=False)
SEQUENCE_LENGTH = 128
def preprocess(input_text):
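  # tokenize() returns only the token ids; tokenize_with_offsets() would return a tuple.
  token_ids = tokenizer.tokenize(input_text)
  # `cls` and `sep` are assumed to hold the [CLS] and [SEP] token ids (not defined in this sketch).
  segment_ids = tf.concat([tf.zeros_like(cls), tf.ones_like(token_ids), tf.ones_like(sep)], axis=1)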
  output_shape = [None, SEQUENCE_LENGTH]
  token_ids = token_ids.merge_dims(-2, -1)
  segment_ids = segment_ids.merge_dims(-2, -1).to_tensor(shape=output_shape)
  input_mask = tf.ones_like(token_ids).to_tensor(shape=output_shape)
  token_ids = token_ids.to_tensor(shape=output_shape)
  return {
      'input_ids': token_ids,
      'input_mask': input_mask,
      'input_type_ids': segment_ids
  }

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
  encoder = keras_nlp.encoders.BertEncoder(vocab_size=30522, max_sequence_length=512, type_vocab_size=2)
  encoder.load_weights("gs://<bucket_name>/<file_path>")
  token_ids = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_ids', dtype=tf.int32)
  input_mask = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_mask', dtype=tf.int32)
  type_ids = tf.keras.layers.Input(shape=(128,), name='input_type_ids', dtype=tf.int32)
  x = encoder([token_ids, input_mask, type_ids])['pooled_output']
  x = tf.keras.layers.Dropout(rate=0.1)(x)
  output = tf.keras.layers.Dense(1, activation='sigmoid')(x)
  model = tf.keras.Model(inputs=[token_ids, input_mask, type_ids], outputs=output)

model.compile('adam', 'binary_crossentropy', ['accuracy'])
model.fit(train_ds, epochs=5, validation_data=test_ds)
```
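The `preprocess` function above is expected to yield three fixed-length features per example. A minimal pure-Python sketch of that packing step is shown below. It assumes a single-segment input and a padding id of 0; both are assumptions, as are the helper name `pack_single_segment` and the token ids used in the example, which are arbitrary placeholders.

```python
def pack_single_segment(token_ids, sequence_length, pad_id=0):
    """Truncate or pad one tokenized sentence to `sequence_length`."""
    token_ids = list(token_ids)[:sequence_length]
    num_padding = sequence_length - len(token_ids)
    return {
        "input_ids": token_ids + [pad_id] * num_padding,
        # 1 marks a real token, 0 marks padding.
        "input_mask": [1] * len(token_ids) + [0] * num_padding,
        # A single-segment input uses type id 0 everywhere (assumption).
        "input_type_ids": [0] * sequence_length,
    }


features = pack_single_segment([101, 2023, 3185, 102], sequence_length=6)
print(features["input_ids"])   # [101, 2023, 3185, 102, 0, 0]
print(features["input_mask"])  # [1, 1, 1, 1, 0, 0]
```

All three features share the same length, which is what lets them feed the fixed-shape `Input` layers in the model definition.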

### Pretraining task

We aim to provide pretrained checkpoints for `BertEncoder` with different datasets and different sizes through TF Hub.
However, users can also choose to pretrain a new `BertEncoder` on their own dataset.

```python
with strategy.scope():
  encoder = keras_nlp.encoders.BertEncoder(vocab_size, max_sequence_length, type_vocab_size)
  token_ids = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='word_token_ids', dtype=tf.int32)
  input_mask = tf.keras.layers.Input(shape=(SEQUENCE_LENGTH,), name='input_mask', dtype=tf.int32)
  type_ids = tf.keras.layers.Input(shape=(128,), name='input_type_ids', dtype=tf.int32)
  masked_lm_positions = tf.keras.layers.Input(shape=(None,), name='masked_lm_positions', dtype=tf.int32)
  outputs = encoder([token_ids, input_mask, type_ids])
  cls_output, sequence_output = outputs['pooled_output'], outputs['sequence_output']
  masked_lm = keras_nlp.layers.MaskedLM(embedding_table=encoder.get_embedding_table())
  lm_output = masked_lm(sequence_output, masked_positions=masked_lm_positions)
  cls_output = tf.keras.layers.Dense(units=num_classes, activation='softmax')(cls_output)
  model = tf.keras.Model(inputs=[token_ids, input_mask, type_ids, masked_lm_positions],
                         outputs={'lm_output': lm_output, 'cls_output': cls_output})

model.compile('adam', {'lm_output': 'sparse_categorical_crossentropy', 'cls_output': 'sparse_categorical_crossentropy'})
model.fit(train_ds, epochs=100)
```
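Conceptually, the `MaskedLM` head gathers the encoder's sequence output at `masked_lm_positions` and scores each gathered vector against the shared embedding table. The sketch below shows only that gather-and-project step in plain Python for a single example. It assumes no extra dense transform, bias, or normalization, and `masked_lm_logits` is a hypothetical helper, not the proposed layer.

```python
def masked_lm_logits(sequence_output, masked_positions, embedding_table):
    """Score masked positions against every vocabulary embedding.

    sequence_output: [seq_length][hidden] nested list for one example.
    masked_positions: list of int positions to predict.
    embedding_table: [vocab_size][hidden] nested list.
    Returns a [num_masked][vocab_size] nested list of logits.
    """
    logits = []
    for position in masked_positions:
        hidden = sequence_output[position]  # gather step
        logits.append([
            sum(h * e for h, e in zip(hidden, embedding))  # dot product
            for embedding in embedding_table
        ])
    return logits


sequence_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
embedding_table = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
print(masked_lm_logits(sequence_output, [1, 2], embedding_table))
# [[0.0, 2.0, 1.0], [1.0, 2.0, 2.0]]
```

Reusing the embedding table for the output projection is why the layer takes `embedding_table=encoder.get_embedding_table()` rather than creating its own vocabulary-sized kernel.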

### Other encoder-based networks

`BertEncoder` is the first encoder network we propose in this doc. However, other encoder networks can be easily
built on top of the `TransformerEncoder` layer. For example, the transformer encoder sharing mechanism
of [ALBERT](https://arxiv.org/pdf/1909.11942.pdf) can be achieved by:

```python
token_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_word_ids')
mask = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_mask')
type_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name='input_type_ids')
word_embeddings = keras_nlp.layers.OnDeviceEmbedding(vocab_size, embedding_width)(token_ids)
position_embeddings = keras_nlp.layers.PositionEmbedding(max_sequence_length)(word_embeddings)
type_embeddings = keras_nlp.layers.OnDeviceEmbedding(
  vocab_size=type_vocab_size, embedding_width=embedding_width, use_one_hot=True)(type_ids)
embeddings = tf.keras.layers.Add()([word_embeddings, position_embeddings, type_embeddings])
embeddings = tf.keras.layers.LayerNormalization(axis=-1)(embeddings)
embeddings = tf.keras.layers.Dropout(rate=dropout_rate)(embeddings)
embeddings = tf.keras.layers.experimental.EinsumDense(
  '...x,xy->...y', output_shape=hidden_size, bias_axes='y')(embeddings)
data = embeddings
attention_mask = keras_nlp.layers.SelfAttentionMask()([data, mask])
shared_layer = keras_nlp.layers.TransformerEncoder(num_attention_heads, inner_dim, inner_activation)
for _ in range(num_layers):
  data = shared_layer([data, attention_mask])
first_token_tensor = tf.keras.layers.Lambda(lambda x: tf.squeeze(x[:, 0:1, :], axis=1))(data)
cls_output = tf.keras.layers.Dense(units=hidden_size, activation='tanh')(first_token_tensor)
outputs = dict(sequence_output=data, pooled_output=cls_output)
model = tf.keras.Model(inputs=[token_ids, mask, type_ids], outputs=outputs)
```

## Detailed Design

### Layers -- TransformerEncoder

This layer encapsulates a single layer of Transformer Encoder.

```python
class TransformerEncoder(tf.keras.layers.Layer):
  """TransformerEncoder layer.

  This layer implements the Transformer Encoder from
  "Attention Is All You Need" (https://arxiv.org/abs/1706.03762),
  which combines a `tf.keras.layers.MultiHeadAttention` layer with a
  two-layer feedforward network.

  References:
    [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
    [BERT: Pre-training of Deep Bidirectional Transformers for Language
     Understanding](https://arxiv.org/abs/1810.04805)
  """

  def __init__(self,
               num_attention_heads,
               inner_dim,
               inner_activation,
               output_range=None,
               kernel_initializer="glorot_uniform",
               bias_initializer="zeros",
               kernel_regularizer=None,
               bias_regularizer=None,
               activity_regularizer=None,
               kernel_constraint=None,
               bias_constraint=None,
               use_bias=True,
               norm_first=False,
               norm_epsilon=1e-12,
               output_dropout=0.0,
               attention_dropout=0.0,
               inner_dropout=0.0,
               attention_initializer=None,
               **kwargs):
    """Initializes `TransformerEncoder`.

    Arguments:
      num_attention_heads: Number of attention heads.
      inner_dim: The output dimension of the first Dense layer in a two-layer
        feedforward network.
      inner_activation: The activation for the first Dense layer in a two-layer
        feedforward network.
      output_range: the sequence output range, [0, output_range) for slicing the
        target sequence. `None` means the target sequence is not sliced.
      kernel_initializer: Initializer for dense layer kernels.
      bias_initializer: Initializer for dense layer biases.
      kernel_regularizer: Regularizer for dense layer kernels.
      bias_regularizer: Regularizer for dense layer biases.
      activity_regularizer: Regularizer for dense layer activity.
      kernel_constraint: Constraint for dense layer kernels.
      bias_constraint: Constraint for dense layer biases.
      use_bias: Whether to enable use_bias in attention layer. If set False,
        use_bias in attention layer is disabled.
      norm_first: Whether to normalize inputs to attention and intermediate
        dense layers. If set False, output of attention and intermediate dense
        layers is normalized.
      norm_epsilon: Epsilon value to initialize normalization layers.
      output_dropout: Dropout probability for the post-attention and output
        dropout.
      attention_dropout: Dropout probability for within the attention layer.
      inner_dropout: Dropout probability for the first Dense layer in a
        two-layer feedforward network.
      attention_initializer: Initializer for kernels of attention layers. If set
        `None`, attention layers use kernel_initializer as initializer for
        kernel.
      **kwargs: Keyword arguments.
    """
```

### Layers -- SelfAttentionMask

```python
class SelfAttentionMask(tf.keras.layers.Layer):
  """Create 3D attention mask from a 2D tensor mask."""

  def call(self, inputs):
    """
    Args:
      inputs[0]: from_tensor: 2D or 3D Tensor of shape
        [batch_size, from_seq_length, ...].
      inputs[1]: to_mask: int32 Tensor of shape [batch_size, to_seq_length].

    Returns:
      float Tensor of shape [batch_size, from_seq_length, to_seq_length].
    """
```
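The shape contract above amounts to repeating each example's `to_mask` row once per query position. A minimal plain-Python sketch of that broadcast follows. The function name `self_attention_mask` and the use of nested lists instead of tensors are assumptions made for illustration only.

```python
def self_attention_mask(from_seq_length, to_mask):
    """Broadcast a [batch, to_seq] mask to [batch, from_seq, to_seq].

    Every query position in an example sees the same set of valid
    key positions, so the 2D mask row is repeated `from_seq_length` times.
    """
    return [
        [[float(m) for m in example_mask] for _ in range(from_seq_length)]
        for example_mask in to_mask
    ]


mask = self_attention_mask(from_seq_length=2, to_mask=[[1, 1, 0]])
print(mask)  # [[[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]]
```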

### Layers -- OnDeviceEmbedding
This experimental layer supports either the one-hot `tf.matmul` approach or the `tf.gather` approach.

```python
class OnDeviceEmbedding(tf.keras.layers.Layer):
  """Performs an embedding lookup suitable for accelerator devices.

  This layer uses either tf.gather or tf.one_hot to translate integer indices to
  float embeddings.

  Arguments:
    vocab_size: Number of elements in the vocabulary.
    embedding_width: Output size of the embedding layer.
    initializer: The initializer to use for the embedding weights. Defaults to
      "glorot_uniform".
    use_one_hot: Whether to use tf.one_hot over tf.gather for the embedding
      lookup. Defaults to False (that is, using tf.gather). Setting this option
      to True may improve performance, especially on small vocabulary sizes, but
      will generally require more memory.
  """

  def __init__(self,
               vocab_size,
               embedding_width,
               initializer="glorot_uniform",
               use_one_hot=False,
               **kwargs):
```
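The two lookup strategies produce the same embeddings and differ only in how they get there. The plain-Python sketch below illustrates that equivalence on a tiny table. The helper names are hypothetical, and nested lists stand in for tensors.

```python
def embed_with_gather(ids, table):
    """Index rows of the embedding table directly."""
    return [table[i] for i in ids]


def embed_with_one_hot(ids, table):
    """Multiply a one-hot encoding of each id by the embedding table."""
    vocab_size = len(table)
    width = len(table[0])
    outputs = []
    for i in ids:
        one_hot = [1.0 if j == i else 0.0 for j in range(vocab_size)]
        outputs.append([
            sum(one_hot[j] * table[j][k] for j in range(vocab_size))
            for k in range(width)
        ])
    return outputs


table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(embed_with_gather([2, 0], table))
print(embed_with_one_hot([2, 0], table) == embed_with_gather([2, 0], table))
```

The one-hot path materializes a `[num_ids, vocab_size]` intermediate, which is consistent with the docstring's note that it generally requires more memory.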

### Layers -- PositionEmbedding

```python
class PositionEmbedding(tf.keras.layers.Layer):
  """Creates a positional embedding.

  Arguments:
    max_length: The maximum size of the dynamic sequence.
    initializer: The initializer to use for the embedding weights. Defaults to
      "glorot_uniform".

  Reference: This layer creates a positional embedding as described in
  [BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding](https://arxiv.org/abs/1810.04805).
  """
```
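One way to read the `max_length` argument is that the layer owns a `[max_length, width]` table and returns its first `seq_length` rows for every example in the batch. The sketch below illustrates that reading in plain Python. The function name `position_embeddings` and the explicit `batch_size` and `seq_length` arguments are assumptions, since the proposed layer would infer them from its input.

```python
def position_embeddings(batch_size, seq_length, table):
    """Return the first `seq_length` rows of `table` for each example.

    table: [max_length][width] nested list of learned position vectors.
    Returns a [batch_size][seq_length][width] nested list.
    """
    if seq_length > len(table):
        raise ValueError("seq_length exceeds max_length")
    return [
        [list(row) for row in table[:seq_length]]
        for _ in range(batch_size)
    ]


table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]  # max_length=4, width=2
out = position_embeddings(batch_size=2, seq_length=3, table=table)
print(len(out), len(out[0]), len(out[0][0]))  # 2 3 2
```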

### Layers -- MaskedLM

```python
class MaskedLM(tf.keras.layers.Layer):
  """Masked language model network head for BERT modeling.

  This layer implements a masked language model based on the provided
  transformer based encoder. It assumes that the encoder network being passed
  has a "get_embedding_table()" method.

  Arguments:
    embedding_table: The embedding table from encoder network.
    activation: The activation, if any, for the dense layer.
    initializer: The initializer for the dense layer. Defaults to a Glorot
      uniform initializer.
    output: The output style for this layer. Can be either 'logits' or
      'predictions'.
  """

  def __init__(self,
               embedding_table,
               activation=None,
               initializer='glorot_uniform',
               output='logits',
               name=None,
               **kwargs):
```

### Encoders -- BertEncoder

```python
class BertEncoder(tf.keras.Model):
  """Bi-directional Transformer-based encoder network.

  This network implements a bi-directional Transformer-based encoder as
  described in "BERT: Pre-training of Deep Bidirectional Transformers for
  Language Understanding" (https://arxiv.org/abs/1810.04805). It includes the
  embedding lookups and transformer layers, but not the masked language model
  or classification task networks.

  The default values for this object are taken from the BERT-Base implementation
  in "BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding".

  *Note* that the network is constructed by
  [Keras Functional API](https://keras.io/guides/functional_api/).

  Arguments:
    vocab_size: The size of the token vocabulary.
    hidden_size: The size of the transformer hidden layers.
    num_layers: The number of transformer layers.
    num_attention_heads: The number of attention heads for each transformer. The
      hidden size must be divisible by the number of attention heads.
    max_sequence_length: The maximum sequence length that this encoder can
      consume. If None, max_sequence_length uses the value from sequence length.
      This determines the variable shape for positional embeddings.
    type_vocab_size: The number of types that the 'type_ids' input can take.
    inner_dim: The output dimension of the first Dense layer in a two-layer
        feedforward network for each transformer.
    inner_activation: The activation for the first Dense layer in a two-layer
        feedforward network for each transformer.
    output_dropout: Dropout probability for the post-attention and output
        dropout.
    attention_dropout: The dropout rate to use for the attention layers
      within the transformer layers.
    initializer: The initializer to use for all weights in this encoder.
    output_range: The sequence output range, [0, output_range), by slicing the
      target sequence of the last transformer layer. `None` means the entire
      target sequence will attend to the source sequence, which yields the full
      output.
    embedding_width: The width of the word embeddings. If the embedding width is
      not equal to hidden size, embedding parameters will be factorized into two
      matrices in the shape of ['vocab_size', 'embedding_width'] and
      ['embedding_width', 'hidden_size'] ('embedding_width' is usually much
      smaller than 'hidden_size').
  """

  def __init__(
      self,
      vocab_size,
      hidden_size=768,
      num_layers=12,
      num_attention_heads=12,
      max_sequence_length=512,
      type_vocab_size=16,
      inner_dim=3072,
      inner_activation='gelu',
      output_dropout=0.1,
      attention_dropout=0.1,
      initializer='truncated_normal',
      output_range=None,
      embedding_width=None,
      **kwargs):
```
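To make the effect of `embedding_width` concrete, the following sketch counts word-embedding parameters with and without the factorization described in the docstring. The vocabulary size 30522 and hidden size 768 come from the examples and defaults in this document; the embedding width of 128 is an illustrative assumption, and `word_embedding_params` is a hypothetical helper.

```python
def word_embedding_params(vocab_size, hidden_size, embedding_width=None):
    """Count word-embedding parameters, optionally factorized."""
    if embedding_width is None or embedding_width == hidden_size:
        # A single [vocab_size, hidden_size] matrix.
        return vocab_size * hidden_size
    # [vocab_size, embedding_width] followed by [embedding_width, hidden_size].
    return vocab_size * embedding_width + embedding_width * hidden_size


print(word_embedding_params(30522, 768))       # 23440896
print(word_embedding_params(30522, 768, 128))  # 4005120
```

With these numbers the factorized form uses roughly one sixth of the parameters, which is why `embedding_width` is usually chosen much smaller than `hidden_size`.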

## Questions and Discussion Topics

Gathering feedback on arguments & naming conventions.


================================================
FILE: rfcs/20200928-keras-cv-single-stage-2d-object-detection.md
================================================
# keras-cv Single Stage Two-Dimensional Object Detection API

| Status        | Proposed      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Zhenyu Tan (tanzheny@google.com), Francois Chollet (fchollet@google.com)|
| **Contributor(s)** | Pengchong Jin (pengchong@google.com)|
| **Updated**   | 2020-09-28                                           |

## Objective

We aim at providing the core primitive components for training and serving single-stage two-dimensional object
detection models, such as Single-Shot MultiBox Detector (SSD), RetinaNet, and You-Only-Look-Once (YOLO).
Pretrained models will also be provided, similar to keras-applications.

## Key Benefits

Single-stage object detection models are a state-of-the-art technique that powers many computer vision tasks. They provide
faster detection than two-stage models (such as Faster R-CNN), while maintaining comparable performance.

With this proposal, Keras users will be able to build end-to-end models with a simple API.

## Design overview

This proposal includes the specific core components for building single-stage object detection models. It does not, however, include:

1. Data augmentation, such as image and groundtruth box preprocessing
2. Model backbone, such as DarkNet, or functions to generate feature maps
3. Detection heads, such as Feature Pyramid
4. Metrics utilities, such as COCO Evaluator, or visualization utilities.

Data augmentation will be included as a separate RFC that handles a
broader context than object detection.

Model backbones and detection heads are model-specific. We anticipate that heavily used patterns will be analyzed and proposed in
`keras.applications`; however, users can build them easily using Keras.

#### Training

Case where a user wants to train from scratch:

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import keras_cv

# Considering a COCO dataset
coco_dataset = tfds.load('coco/2017')
train_ds, eval_ds = coco_dataset['train'], coco_dataset['validation']

def preprocess(features):
  image, gt_boxes, gt_labels = features['image'], features['objects']['bbox'], features['objects']['label']
  # preprocess image, gt_boxes, gt_labels, such as flip, resize, and padding, and reserve 0 for background label.
  return image, gt_boxes, gt_labels

anchor_generator = keras_cv.ops.AnchorGenerator(anchor_sizes, scales, aspect_ratios, strides)
similarity_calculator = keras_cv.layers.IouSimilarity()
box_matcher = keras_cv.ops.BoxMatcher(positive_threshold, negative_threshold)
target_gather = keras_cv.ops.TargetGather()
box_coder = keras_cv.ops.BoxCoder(offset='sigmoid')

def encode_label(image, gt_boxes, gt_labels):
  anchor_boxes = anchor_generator(image_size)
  iou = similarity_calculator(gt_boxes, anchor_boxes)
  match_indices, match_indicators = box_matcher(iou)

  mask = tf.less_equal(match_indicators, 0)
  class_mask = tf.expand_dims(mask, -1)
  box_mask = tf.tile(class_mask, [1, 4])

  class_targets = target_gather(gt_labels, match_indices, class_mask, -1)
  box_targets = target_gather(gt_boxes, match_indices, box_mask, 0.0)
  box_targets = box_coder.encode(box_targets, anchor_boxes)

  weights = tf.squeeze(tf.ones_like(gt_labels), axis=-1)
  ignore_mask = tf.equal(match_indicators, -2)
  class_weights = target_gather(weights, match_indices, ignore_mask, 0.0)
  box_weights = target_gather(weights, match_indices, mask, 0.0)

  return (image, {'classification': class_targets, 'regression': box_targets},
          {'classification': class_weights, 'regression': box_weights})

class RetinaNet(tf.keras.Model):
  # includes backbone and feature pyramid head.
  def __init__(self):
    super().__init__()
    # self.backbone = Model Backbone that returns dict of feature map
    # self.fpn = Feature Pyramid Heads that
    # self.head = classification and regression heads
  
  def call(self, image, training=None):
    feature_map = self.backbone(image, training)
    feature_map = self.fpn(feature_map, training)
    class_scores, boxes = self.head(feature_map, training)
    return {'classification': class_scores, 'regression': boxes}

transformed_train_ds = train_ds.map(preprocess).map(encode_label).batch(128).shuffle(1024)
transformed_eval_ds = eval_ds.map(preprocess).map(encode_label).batch(128)

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
    optimizer = tf.keras.optimizers.SGD(lr_scheduler)
    model = RetinaNet()
    model.compile(optimizer=optimizer,
                  loss={'classification': keras_cv.losses.Focal(), 'regression': tf.keras.losses.Huber()},
                  metrics=[])

model.fit(transformed_train_ds, epochs=120, validation_data=transformed_eval_ds)
model.save(file_path)
``` 

#### Serving

Case where a user wants to serve the trained model for a single image:

```python
loaded_model = tf.keras.models.load_model(file_path)
box_coder = keras_cv.ops.BoxCoder(offset='sigmoid')
anchor_generator = keras_cv.ops.AnchorGenerator()
anchor_boxes = anchor_generator(image_size)
detection_generator = keras_cv.layers.NMSDetectionDecoder()

@tf.function
def serving_fn(image):
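  # Add a leading batch dimension; the model returns a dict of outputs.
  batched_image = tf.expand_dims(image, axis=0)
  outputs = loaded_model(batched_image, training=False)
  raw_boxes, scores = outputs['regression'], outputs['classification']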
  decoded_boxes = box_coder.decode(raw_boxes, anchor_boxes)
  classes, scores, boxes, _ = detection_generator(scores, decoded_boxes)
  return {'classes': classes, 'scores': scores, 'boxes': boxes}
```

## Detailed Design

For the rest of the design, we denote `B` as batch size, `N` as the number of ground truth boxes, and `M` as the number
of anchor boxes.

We propose 2 layers, 1 loss and 4 ops in this RFC.

#### Layers -- IouSimilarity
We propose an `IouSimilarity` layer that supports ragged tensors directly. Users can also pad the ground truth
boxes or anchor boxes and pass a mask.
 
```python
class IouSimilarity(tf.keras.layers.Layer):
  """Class to compute similarity based on Intersection over Union (IOU) metric."""
 
  def __init__(self, mask_value):
    """Initializes IouSimilarity layer.
    Args:
      mask_value: A float mask value to fill where `mask` is True. 
    """
 
  def call(self, groundtruth_boxes, anchors, mask=None):
    """Compute pairwise IOU similarity between ground truth boxes and anchors.
 
    Args:
      groundtruth_boxes: A float Tensor of shape [N, 4] or [B, N, 4] representing box coordinates.
      anchors: A float Tensor of shape [M, 4] or [B, M, 4] representing box coordinates.
      mask: A boolean tensor with [N, M] or [B, N, M].
 
    Returns:
      A float tensor with shape [M, N] or [B, M, N] representing pairwise
        iou scores, anchor per row and groundtruth_box per column.
 
    Input shape:
      groundtruth_boxes: [N, 4], or [B, N, 4]
      anchors: [M, 4], or [B, M, 4]
 
    Output shape:
      [M, N], or [B, M, N]
    """
```
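
The pairwise computation described in the docstring can be sketched in plain Python. This is a minimal illustration rather than the proposed layer: the helper names `iou` and `pairwise_iou` are hypothetical, boxes are assumed to be `[y1, x1, y2, x2]` (the format named in the `NMSDetectionDecoder` docstring), and ragged inputs and masking are left out.

```python
def box_area(box):
  """Area of a box given as [y1, x1, y2, x2]; degenerate boxes have area 0."""
  y1, x1, y2, x2 = box
  return max(0.0, y2 - y1) * max(0.0, x2 - x1)


def iou(box_a, box_b):
  """Intersection over Union of two boxes in [y1, x1, y2, x2] format."""
  inter_h = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
  inter_w = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
  intersection = max(0.0, inter_h) * max(0.0, inter_w)
  union = box_area(box_a) + box_area(box_b) - intersection
  return intersection / union if union > 0.0 else 0.0


def pairwise_iou(groundtruth_boxes, anchors):
  """Returns an [M, N] nested list: one row per anchor, one column per box."""
  return [[iou(anchor, gt) for gt in groundtruth_boxes] for anchor in anchors]


groundtruth_boxes = [[0.0, 0.0, 2.0, 2.0]]  # N = 1
anchors = [
    [0.0, 0.0, 2.0, 2.0],  # identical to the ground truth box
    [1.0, 1.0, 3.0, 3.0],  # partial overlap
    [5.0, 5.0, 6.0, 6.0],  # no overlap
]  # M = 3
similarity = pairwise_iou(groundtruth_boxes, anchors)
```

The result has one row per anchor and one column per ground truth box, matching the `[M, N]` output shape stated above.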

#### Layers -- NMSDetectionDecoder

```python
class NMSDetectionDecoder(tf.keras.layers.Layer):
  """Generates detected boxes with scores and classes for one-stage detector."""

  def __init__(self,
               pre_nms_top_k=5000,
               pre_nms_score_threshold=0.05,
               nms_iou_threshold=0.5,
               max_num_detections=100,
               use_batched_nms=False,
               **kwargs):
    """Initializes a detection generator.

    Args:
      pre_nms_top_k: int, the number of top scores proposals to be kept before
        applying NMS.
      pre_nms_score_threshold: float, the score threshold to apply before
        applying NMS. Proposals whose scores are below this threshold are
        thrown away.
      nms_iou_threshold: float in [0, 1], the NMS IoU threshold.
      max_num_detections: int, the final number of total detections to generate.
      use_batched_nms: bool, whether or not to use
        `tf.image.combined_non_max_suppression`.
      **kwargs: other key word arguments passed to Layer.
    """

  def call(self, raw_boxes, raw_scores, anchor_boxes, image_shape):
    """Generate final detections.

    Args:
      raw_boxes: a single Tensor or dict with keys representing FPN levels and values
        representing box tensors of shape
        [batch, feature_h, feature_w, num_anchors * 4].
      raw_scores: a single Tensor or dict with keys representing FPN levels and values
        representing logit tensors of shape
        [batch, feature_h, feature_w, num_anchors].
      anchor_boxes: a tensor of shape [batch_size, K, 4] representing the
        corresponding anchor boxes w.r.t. `raw_boxes`.
      image_shape: a tensor of shape [batch_size, 2] storing the image height
        and width w.r.t. the scaled image, i.e. the same image space as
        `raw_boxes` and `anchor_boxes`.

    Returns:
    `detection_boxes`: float Tensor of shape [B, max_num_detections, 4]
      representing top detected boxes in [y1, x1, y2, x2].
    `detection_scores`: float Tensor of shape [B, max_num_detections]
      representing sorted confidence scores for detected boxes. The values
      are between [0, 1].
    `detection_classes`: int Tensor of shape [B, max_num_detections]
      representing classes for detected boxes.
    `num_detections`: int Tensor of shape [B]. Only the first
      `num_detections` boxes are valid detections.
    """
```
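
To make the role of `pre_nms_score_threshold`, `nms_iou_threshold`, and `max_num_detections` concrete, here is a minimal single-class greedy NMS sketch in plain Python. It is an illustration only: the function name `greedy_nms` is hypothetical, the sketch ignores `pre_nms_top_k`, batching, and per-class handling, and it is not the implementation proposed for the layer.

```python
def iou(box_a, box_b):
  """IoU of two boxes given as [y1, x1, y2, x2]."""
  inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
  inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
  intersection = inter_h * inter_w
  area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
  area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
  union = area_a + area_b - intersection
  return intersection / union if union > 0.0 else 0.0


def greedy_nms(boxes, scores, pre_nms_score_threshold=0.05,
               nms_iou_threshold=0.5, max_num_detections=100):
  """Returns the indices of the boxes kept, highest score first."""
  # Drop low-scoring candidates, then visit the rest in descending score order.
  candidates = sorted(
      (i for i, score in enumerate(scores) if score >= pre_nms_score_threshold),
      key=lambda i: scores[i],
      reverse=True)
  kept = []
  for i in candidates:
    if len(kept) == max_num_detections:
      break
    # Keep a box only if it does not overlap too much with any kept box.
    if all(iou(boxes[i], boxes[j]) <= nms_iou_threshold for j in kept):
      kept.append(i)
  return kept


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 10, 10]]
scores = [0.9, 0.8, 0.7, 0.01]
kept = greedy_nms(boxes, scores)
```

Here box 1 is suppressed because it overlaps box 0 with an IoU of about 0.68, and box 3 is discarded by the score threshold before NMS runs.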

#### Losses -- Focal

```python
class FocalLoss(tf.keras.losses.Loss):
  """Implements a Focal loss for classification problems.

  Reference:
    [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002).
  """

  def __init__(self,
               alpha=0.25,
               gamma=2.0,
               reduction=tf.keras.losses.Reduction.AUTO,
               name=None):
    """Initializes `FocalLoss`.

    Arguments:
      alpha: The `alpha` weight factor for binary class imbalance.
      gamma: The `gamma` focusing parameter to re-weight loss.
      reduction: (Optional) Type of `tf.keras.losses.Reduction` to apply to
        loss. Default value is `AUTO`. `AUTO` indicates that the reduction
        option will be determined by the usage context. For almost all cases
        this defaults to `SUM_OVER_BATCH_SIZE`. When used with
        `tf.distribute.Strategy`, outside of built-in training loops such as
        `tf.keras` `compile` and `fit`, using `AUTO` or `SUM_OVER_BATCH_SIZE`
        will raise an error. Please see this custom training [tutorial](
          https://www.tensorflow.org/tutorials/distribute/custom_training) for
            more details.
      name: Optional name for the op. Defaults to 'retinanet_class_loss'.
    """

  def call(self, y_true, y_pred):
    """Invokes the `FocalLoss`.

    Arguments:
      y_true: A tensor of size [batch, num_anchors, num_classes]
      y_pred: A tensor of size [batch, num_anchors, num_classes]

    Returns:
      Summed loss float `Tensor`.
    """
```
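
For reference, the per-element focal loss from the cited paper is `FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)`. The sketch below evaluates it for a single binary prediction. It assumes `p` is already a probability rather than a logit, which this RFC does not specify, and the function name `binary_focal_loss` and the `eps` clipping are illustrative choices.

```python
import math


def binary_focal_loss(y_true, p, alpha=0.25, gamma=2.0, eps=1e-7):
  """Focal loss for one binary label `y_true` and predicted probability `p`."""
  p = min(max(p, eps), 1.0 - eps)  # Clip to avoid log(0).
  # p_t is the probability assigned to the true class.
  p_t = p if y_true == 1 else 1.0 - p
  alpha_t = alpha if y_true == 1 else 1.0 - alpha
  # The (1 - p_t)^gamma factor down-weights well-classified examples.
  return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_example = binary_focal_loss(1, 0.9)  # confident and correct
hard_example = binary_focal_loss(1, 0.1)  # confident and wrong
```

With `gamma=0.0` the modulating factor is 1 and the expression reduces to alpha-weighted cross-entropy, which is why `gamma` is described as the focusing parameter.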

#### Ops -- AnchorGenerator

```python
class AnchorGenerator:
  """Utility to generate anchors for multiple feature maps."""

  def __init__(self,
               anchor_sizes,
               scales,
               aspect_ratios,
               strides,
               clip_boxes=False):
    """Constructs multiscale anchors.

    Args:
      anchor_sizes: A list/dict of ints representing the anchor size for each scale. The
        anchor height will be `anchor_size / sqrt(aspect_ratio)`, anchor width
        will be `anchor_size * sqrt(aspect_ratio)` for each scale.
      scales: A list/tuple/dict, or a list/tuple/dict of a list/tuple of positive
        floats representing the ratio of the actual anchor size to the base `anchor_size`.
      aspect_ratios: A list/tuple/dict, or a list/tuple/dict of a list/tuple of positive
        floats representing the ratio of anchor width to anchor height.
      strides: A list/tuple of ints representing the anchor stride size between
        centers of anchors at each scale.
      clip_boxes: Boolean representing whether the anchor coordinates should be
        clipped to the image size. Defaults to `False`.

    Input shape: the size of the image, `[H, W, C]`
    Output shape: the size of anchors concat on each level, `[(H /
      strides) * (W / strides), K * 4]`
    """
  def __call__(self, image_size):
    """
    Args:
      image_size: a tuple of 2 for image_height and image_width.
    Returns:
      anchors: a dict or single Tensor.
    """
```
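
The height and width formulas in the docstring can be illustrated for a single feature level. This is a sketch under stated assumptions, not the proposed implementation: the function name `generate_anchors` is hypothetical, anchor centers are assumed to sit in the middle of each stride-sized cell, and boxes are emitted as `[y1, x1, y2, x2]` without clipping.

```python
import math


def generate_anchors(image_size, anchor_size, scales, aspect_ratios, stride):
  """Anchors for one feature level as [y1, x1, y2, x2] in image coordinates."""
  image_height, image_width = image_size
  anchors = []
  # Assumption: one anchor center per stride x stride cell, at the cell center.
  for center_y in range(stride // 2, image_height, stride):
    for center_x in range(stride // 2, image_width, stride):
      for scale in scales:
        for ratio in aspect_ratios:
          # aspect_ratio is width / height, as stated in the docstring.
          height = anchor_size * scale / math.sqrt(ratio)
          width = anchor_size * scale * math.sqrt(ratio)
          anchors.append([center_y - height / 2, center_x - width / 2,
                          center_y + height / 2, center_x + width / 2])
  return anchors


anchors = generate_anchors(
    image_size=(64, 64), anchor_size=32, scales=[1.0],
    aspect_ratios=[0.5, 1.0, 2.0], stride=32)
```

A 64x64 image with stride 32 has 2x2 anchor locations, and each location gets one anchor per scale and aspect ratio combination, so this call produces 12 anchors.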

#### Ops -- BoxMatcher

```python
class BoxMatcher:
  """Matcher based on highest value.

  This class computes matches from a similarity matrix. Each column is matched
  to a single row.

  To support object detection target assignment this class enables setting both
  positive_threshold (upper threshold) and negative_threshold (lower thresholds)
  defining three categories of similarity which define whether examples are
  positive, negative, or ignored:
  (1) similarity >= positive_threshold: Highest similarity. Matched/Positive!
  (2) positive_threshold > similarity >= negative_threshold: Medium similarity.
        This is Ignored.
  (3) negative_threshold > similarity: Lowest similarity for Negative Match.
  For ignored matches this class sets the values in the Match object to -2.
  """

  def __init__(
      self,
      positive_threshold,
      negative_threshold=None,
      force_match_for_each_col=False,
      positive_value=1,
      negative_value=-1,
      ignore_value=-2):
    """Construct BoxMatcher.

    Args:
      positive_threshold: Threshold for positive matches. Positive if
        sim >= positive_threshold, where sim is the maximum value of the
        similarity matrix for a given column. Set to None for no threshold.
      negative_threshold: Threshold for negative matches. Negative if
        sim < negative_threshold. Defaults to positive_threshold when set to None.
      force_match_for_each_col: If True, ensures that each column is matched to
        at least one row (which is not guaranteed otherwise if the
        positive_threshold is high). Defaults to False.
      positive_value: An integer to fill for positive match indicators.
      negative_value: An integer to fill for negative match indicators.
      ignore_value: An integer to fill for ignored match indicators.

    Raises:
      ValueError: If negative_threshold > positive_threshold.
    """

  def __call__(self, similarity_matrix):
    """Tries to match each column of the similarity matrix to a row.

    Args:
      similarity_matrix: A float tensor of shape [N, M], or [Batch_size, N, M]
        representing any similarity metric.

    Returns:
      matched_indices: An integer tensor of shape [N] with corresponding match indices for each
        of M columns, the value represents the column index of the argmax match in the matrix.
      matched_indicators: An integer tensor of shape [N] or [B, N]. For a positive match, the match
        result will be `positive_value`; for a negative match, it will be
        `negative_value`; for an ignored match, it will be
        `ignore_value`.
    """
```
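
The three similarity categories can be sketched as follows. The function name `match_boxes` is hypothetical, and the sketch assumes that `similarity[i][j]` holds the similarity between anchor `i` and ground truth box `j`, with at least one ground truth box present. It omits `force_match_for_each_col` and batching.

```python
def match_boxes(similarity, positive_threshold, negative_threshold,
                positive_value=1, negative_value=-1, ignore_value=-2):
  """Returns (matched_indices, matched_indicators), one entry per anchor."""
  if negative_threshold > positive_threshold:
    raise ValueError('negative_threshold must not exceed positive_threshold.')
  matched_indices, matched_indicators = [], []
  for row in similarity:
    # Index of the ground truth box with the highest similarity to this anchor.
    best = max(range(len(row)), key=lambda j: row[j])
    best_similarity = row[best]
    if best_similarity >= positive_threshold:
      indicator = positive_value
    elif best_similarity >= negative_threshold:
      indicator = ignore_value
    else:
      indicator = negative_value
    matched_indices.append(best)
    matched_indicators.append(indicator)
  return matched_indices, matched_indicators


similarity = [[0.8, 0.1], [0.45, 0.2], [0.05, 0.3]]
matched_indices, matched_indicators = match_boxes(
    similarity, positive_threshold=0.5, negative_threshold=0.4)
# matched_indices == [0, 0, 1]; matched_indicators == [1, -2, -1]
```

The second anchor falls between the two thresholds and is therefore ignored, while the third anchor's best similarity is below `negative_threshold` and is marked negative.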

#### Ops -- TargetGather

```python
class TargetGather:
  """Labeler for dense object detector."""

  def __init__(self):
    """Constructs Anchor Labeler."""

  def __call__(self, labels, match_indices, mask, mask_val=0.0):
    """Labels anchors with ground truth inputs.

    Args:
      labels: An integer tensor with shape [N, dim], or [B, N, dim] representing
        groundtruth classes.
      match_indices: An integer tensor with shape [N] or [B, N] representing match
        ground truth box index.
      mask: An integer tensor with shape [N] representing match
        labels, e.g., 1 for positive, -1 for negative, -2 for ignore.
      mask_val: A Python primitive to fill in places where mask is True.
    Returns:
      targets: A tensor with [M, dim] or [B, M, dim] selected from the `match_indices`.
    """
```
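
The gather-then-mask behavior can be sketched in a few lines. The function name `gather_targets` is hypothetical and the sketch works on flat Python lists rather than tensors. The mask follows the training example above, where anchors whose match indicator is less than or equal to 0 are masked.

```python
def gather_targets(labels, match_indices, mask, mask_val=0.0):
  """Picks labels[match_indices[i]] per anchor, or mask_val where mask is True."""
  return [mask_val if masked else labels[index]
          for index, masked in zip(match_indices, mask)]


gt_labels = [3, 7]                # class ids of the N = 2 ground truth boxes
match_indices = [0, 0, 1]         # best ground truth box for each of M = 3 anchors
match_indicators = [1, -2, -1]    # positive, ignored, negative
mask = [indicator <= 0 for indicator in match_indicators]
class_targets = gather_targets(gt_labels, match_indices, mask, mask_val=-1)
# class_targets == [3, -1, -1]
```

Only the positive anchor receives the label of its matched ground truth box. The other anchors receive `mask_val`.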

#### Ops -- BoxCoder

```python
class BoxCoder:
  """Box coder for RetinaNet, FasterRcnn, SSD, and YOLO."""

  def __init__(self, scale_factors=None, offset='sigmoid'):
    """Constructor for BoxCoder.

    Args:
      scale_factors: List of 4 positive scalars to scale ty, tx, th and tw. If
        set to None, does not perform scaling. For Faster RCNN, the open-source
        implementation recommends using [10.0, 10.0, 5.0, 5.0].
      offset: The offset used to code the box coordinates. It can be 'sigmoid',
        i.e., coded_coord = coord + sigmoid(tx), which
        is used for RetinaNet, FasterRcnn, and SSD, or it can be 'linear',
        i.e., coded_coord = coord + width * tx, which is used for YOLO.
    """
  def encode(self, boxes, anchors):
    """Compute coded_coord from coord."""
  def decode(self, boxes, anchors):
    """Compute coord from coded_coord."""
```
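
Since `encode` and `decode` are left unspecified here, the following sketch shows one common choice: the center-size encoding with scale factors, which corresponds to the offset form written as `coord + width * tx` in the docstring. It is an assumption about what a `BoxCoder` could compute, not the RFC's definition. The function names are hypothetical and boxes are assumed to be `[y1, x1, y2, x2]`.

```python
import math


def _center_size(box):
  """Converts [y1, x1, y2, x2] to (center_y, center_x, height, width)."""
  y1, x1, y2, x2 = box
  return (y1 + y2) / 2.0, (x1 + x2) / 2.0, y2 - y1, x2 - x1


def encode_box(box, anchor, scale_factors=None):
  """Codes `box` relative to `anchor` as [ty, tx, th, tw]."""
  sy, sx, sh, sw = scale_factors or (1.0, 1.0, 1.0, 1.0)
  ya, xa, ha, wa = _center_size(anchor)
  y, x, h, w = _center_size(box)
  return [sy * (y - ya) / ha, sx * (x - xa) / wa,
          sh * math.log(h / ha), sw * math.log(w / wa)]


def decode_box(code, anchor, scale_factors=None):
  """Inverse of `encode_box`: recovers [y1, x1, y2, x2] from the code."""
  sy, sx, sh, sw = scale_factors or (1.0, 1.0, 1.0, 1.0)
  ya, xa, ha, wa = _center_size(anchor)
  ty, tx, th, tw = code
  y, x = ya + ha * ty / sy, xa + wa * tx / sx
  h, w = ha * math.exp(th / sh), wa * math.exp(tw / sw)
  return [y - h / 2.0, x - w / 2.0, y + h / 2.0, x + w / 2.0]


anchor = [0.0, 0.0, 64.0, 64.0]
box = [10.0, 20.0, 50.0, 80.0]
scale_factors = (10.0, 10.0, 5.0, 5.0)
code = encode_box(box, anchor, scale_factors)
restored = decode_box(code, anchor, scale_factors)
```

Encoding and then decoding against the same anchor recovers the original box up to floating-point error, which is the property either API shape discussed below (`encode`/`decode` methods or an `inverse` flag) would need to preserve.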

## Questions and Discussion Topics

* Whether `BoxMatcher` should take a list of thresholds (e.g., size 2) and a list of values (e.g., size 3).
* Gathering feedback on arguments and naming conventions.
* How to better generalize box coding, to differentiate RCNN-family encoding and YOLO-family encoding.
* Whether to have BoxCoder(inverse=False) and a single call method, or BoxCoder with `encode` and `decode` methods.

================================================
FILE: rfcs/20210920-tune-end-to-end-ml-workflows-in-keras-tuner.md
================================================
# Tune end-to-end ML workflows in KerasTuner

| Status        | Proposed                                             |
:-------------- |:---------------------------------------------------- |
| **Author**    | Haifeng Jin (haifengj@google.com)                    |
| **Updated**   | 2021-09-20                                           |

## Objective

Improve the user experience of tuning end-to-end workflows with KerasTuner.
Reduce the learning curve and code hacks for workflows that involve hyperparameters
in data preprocessing and model fitting.

## Motivation

Different users prefer different workflows for their tuning process, just as
Keras has different getting-started tutorials for engineers and researchers.
Some users prefer to learn more about the framework and to implement
everything by overriding class methods, while others prefer to write
everything from scratch to have a shorter learning curve and better
configurability for the details. For example, some users would like to
override `Model.train_step()` to make the code cleaner, while others like to write
the training loop from scratch.


Currently, KerasTuner has good support for the users who would like to
restructure their code by learning the KerasTuner framework, and for users who
only need to do some light customization of the model building process.
However, the support for users who need to write their model building and
training process from scratch is not adequate.


Moreover, many users use the hyperparameter tuning library as an intermediate
step in their ML process rather than their main API. In their workflow,
implementing and training a model with Keras are usually a separate process
from hyperparameter tuning. They would first write the code using Keras, then
try to put it into KerasTuner to tune, and put the hyperparameter values back
into their Keras model. Therefore, we should maximize the code and model
portability in KerasTuner for these users, and minimize the code changes
required for them to adopt and remove KerasTuner.

### The old workflow

The current workflow for writing a model training process with KerasTuner
is as follows. The user defines the model in the `HyperModel.build()` function
and defines the data preprocessing and model training by overriding
`Tuner.run_trial()`. The arguments, like the dataset, are passed through the
`Tuner.search()` function, and finally received by `Tuner.run_trial()`.


```py
import keras_tuner as kt
from tensorflow import keras

class MyHyperModel(kt.HyperModel):
  def build(self, hp):
    # Model building
    model = keras.Sequential()
    model.add(keras.layers.Dense(
        hp.Choice('units', [8, 16, 32]),
        activation='relu'))
    model.add(keras.layers.Dense(1, activation='relu'))
    model.compile(loss='mse')
    return model

class MyTuner(kt.Tuner):
  def run_trial(self, trial, *fit_args, **fit_kwargs):
    hp = trial.hyperparameters
  
    # data preprocessing       
    training_data, validation_data = data_preprocessing(
        hp, *fit_args, **fit_kwargs)
    model = self.hypermodel.build(hp)
   
    # model training
    model.fit(
        training_data,
        epochs=hp.Int(...),
        validation_data=validation_data,
        ...)
       
    # evaluation and reporting
    score = model.evaluate(validation_data, ...)
    self.oracle.update_trial(trial.trial_id, {'score': score})
    self.save_model(trial.trial_id, model)

tuner = MyTuner(
    hypermodel=MyHyperModel(),
    objective=kt.Objective('score', 'min'),
    ...)

# Passing in the args
tuner.search(*fit_args, **fit_kwargs)
```

### Problems

The key problem of this workflow is that the code is split in two classes. Any
control flow and data flow between data preprocessing, model building, and
model training would all have to pass through the framework and function calls.
To use the framework, the user would have to understand how these different
functions are called, and wire their data and information properly between
these functions.

### Use cases to improve

The following use cases are not well supported because of the problem above.

#### Configure and jointly tune data preprocessing and model training

For example, to write a custom training loop, tune the data preprocessing
steps, or tune anything in the training loop, such as whether to shuffle the training
data, users need to override the `Tuner.run_trial()` function, which adds more
to the learning curve.

As another example, in natural language processing, tokenization and vectorization may
affect the later model type. Users will need to find a way to pass this
information from `Tuner.run_trial()` to `HyperModel.build()`.

#### Tune existing Keras code

If users already have model building and training code written using
Keras and want to tune some of the hyperparameters, they would have to
change the code substantially to split it apart and wire the data flow and
control flow between the overridden functions.

#### Retrain the model after tuning

If the user wants to retrain the model using the best hyperparameter values
found, there is no straightforward way to do it if they used the
hyperparameters in `Tuner.run_trial()` for data preprocessing and model
training.

## User Benefit

The use cases described above would all have smooth workflows, without much
extra code or learning of the framework.

## Design Proposal

We propose two workflows: the `Tuner` workflow and the `HyperModel` workflow to
solve the problems above.

The `Tuner` workflow is to override `Tuner.run_trial()`. The user can put all the
code for data preprocessing, model building, and model training in one place in
the `Tuner.run_trial()` function. No `HyperModel` is needed. It supports all the
use cases mentioned above by providing the maximum freedom to the user.

The `HyperModel` workflow follows the original `HyperModel` style. It is easier
to learn and needs less code compared to the first workflow, but covers all the
use cases as long as the code for building and training the model is separate.
The user only needs to override `HyperModel.fit()` for any tuning of the
data preprocessing and model fitting process.

## Detailed Design

### The `Tuner` workflow

Here is an end-to-end code example of the new workflow.

The user only needs to override `Tuner.run_trial()` to put everything together,
including data preprocessing, model building, and model training. It returns
the evaluation results back to the tuner. 

```py
class MyTuner(kt.Tuner):
  def run_trial(self, trial, x, y, callbacks=None, **kwargs):
    hp = trial.hyperparameters
    # Data preprocessing
    num_features = hp.Int("num_features", 10, 15)
    x, y = feature_selection(x, y, num_features=num_features)
    # Model building
    # Input shape depending on data preprocessing.
    inputs = keras.Input(shape=(num_features,))
    outputs = keras.layers.Dense(
        hp.Choice('units', [8, 16, 32]),
        activation='relu')(inputs)
    outputs = keras.layers.Dense(1, activation='relu')(outputs)
    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(loss='mse',
                  metrics=['mae'])
    # Model training
    history = model.fit(
        x,
        y,
        epochs=100,
        validation_data=validation_data,
        # Tune whether to use shuffle.
        shuffle=hp.Boolean("shuffle"),
        # Tune whether to use sample_weights.
        sample_weight=sample_weight if hp.Boolean("sample_weight") else None,
        # The provided callbacks list contains checkpointing and tensorboard.
        callbacks=callbacks)
    # Save the model to a unique path with `trial_id`.
    model.save(os.path.join(trial.trial_id, 'model'))
    # Returning the evaluation results
    return np.min(history.history["val_mae"])

# When Tuner.run_trial is overridden,
# `hypermodel` and `objective` are optional.
tuner = MyTuner(
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="my_dir",
    project_name="helloworld",
)

# Anything passed to `search()` will
# go to `**kwargs` for `Tuner.run_trial()`.
tuner.search(x, y)
# Get the best model.
best_model = tuner.get_best_models()[0]
```

There are several important features in this workflow:

* Tune the arguments in `model.fit()`, like `shuffle` and `sample_weight`.

* Share local variables across the workflow. For example, the model building
  process can access the `num_features`, which is a variable in data
  preprocessing. It solves the problem of joint tuning.

* Use built-in callbacks for convenience. The callbacks argument contains
  callback functions for checkpointing and TensorBoard setup.

* The return value is flexible. It can be a single value, or a list of values,
  or a dictionary of metrics, or even a `History` object returned by
  `model.fit()`.

* The `hypermodel` and `objective` can be optional. The user doesn't need to
  define a `HyperModel`. If the return value is a single value, it will be
  minimized by default. Therefore, objective is also optional.

* The user can build a unique path to save each model with `trial.trial_id`.
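
The "single return value is minimized by default" behavior can be pictured with a framework-free stand-in. Everything in this sketch is hypothetical: `SEARCH_SPACE`, `run_trial`, and `search` are illustrative names, the score is made up so the code runs without data or a model, and the loop is a plain random search rather than KerasTuner's actual implementation.

```python
import random

# Hypothetical search space mirroring the hyperparameters in the example above.
SEARCH_SPACE = {
    'num_features': [10, 11, 12, 13, 14, 15],
    'units': [8, 16, 32],
}


def run_trial(hp):
  """Stand-in for `Tuner.run_trial()`: returns one score, lower is better."""
  # Made-up score; a real trial would preprocess data, build, and fit a model.
  return abs(hp['num_features'] - 12) + abs(hp['units'] - 16) / 8


def search(max_trials, seed=0):
  """Runs `max_trials` random trials and returns the best hyperparameters."""
  rng = random.Random(seed)
  trials = []
  for trial_id in range(max_trials):
    hp = {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}
    trials.append((run_trial(hp), trial_id, hp))
  # With a single returned value, the trial with the smallest score wins.
  # Tuples compare by score first, then by the unique trial_id.
  return min(trials)[2]


best_hp = search(max_trials=20)
```

Because all preprocessing, building, and training would live inside `run_trial`, local values such as `num_features` are naturally shared between the steps.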

For the use case of reusing existing Keras code, the user can use the following
workflow, which calls a function using all the hyperparameters. The user only
needs to write a function to call the existing Keras code and return the
evaluation results.

```py
class MyTuner(kt.Tuner):
  def run_trial(self, trial, **kwargs):
    hp = trial.hyperparameters
    # Saving the model can be handled by the user.
    # `trial_id` is unique for each trial.
    return build_and_evaluate_model(
        hp.Int("num_features", 10, 15),
        hp.Choice('units', [8, 16, 32]),
        ...
        trial.trial_id,
    )

tuner = MyTuner(...)
tuner.search()
# Retraining the model
build_and_evaluate_model(**tuner.get_best_hyperparameters()[0].values)
```

In this workflow, the user can easily retrain the model by calling the function again with the best hyperparameters.

### The HyperModel workflow

Users who prefer to follow the old workflow can also implement a `HyperModel` by overriding the `build()` and `fit()` functions. The `build()` function builds and returns the model. The `fit()` function does the data preprocessing and model training.

The following code example implements the same functionality as the code example above.

```py
import numpy as np
import keras_tuner as kt
from tensorflow import keras

class MyHyperModel(kt.HyperModel):

  def build(self, hp):
    # Model building
    # Input shape depends on a hyperparameter used by data preprocessing.
    inputs = keras.Input(shape=(hp.Int("num_features", 10, 15),))
    x = keras.layers.Dense(
        hp.Choice('units', [8, 16, 32]),
        activation='relu')(inputs)
    outputs = keras.layers.Dense(1, activation='relu')(x)
    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(loss='mse',
                  metrics=['mae'])
    return model
  
  def fit(self, hp, model, x, y, validation_data, callbacks=None, **kwargs):
    # Data preprocessing
    # Get the hyperparameter value used in `build()`.
    x, y = feature_selection(x, y, num_features=hp.get("num_features"))
    # Model training
    # Returning the training history
    # or a similar dictionary if using custom training loop.
    return model.fit(
        x,
        y,
        epochs=100,
        validation_data=validation_data,
        # Tune whether to use shuffle.
        shuffle=hp.Boolean("shuffle"),
        # Tune whether to use sample_weights.
        sample_weight=sample_weight if hp.Boolean("sample_weight") else None,
        # The provided callbacks list contains checkpointing and tensorboard.
        callbacks=callbacks)

tuner = kt.RandomSearch(
    hypermodel=MyHyperModel(),
    objective=kt.Objective('val_mae', 'min'),
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="my_dir",
    project_name="helloworld",
)

# Any arg passed to `search()` would be passed to `fit()`.
tuner.search(x, y)

# Exporting the best models.
models = tuner.get_best_models(num_models=2)

# Retraining the model with the second best hyperparameters.
second_best_hp = tuner.get_best_hyperparameters(num_trials=2)[1]
hypermodel = MyHyperModel()
model = hypermodel.build(second_best_hp)
hypermodel.fit(
    hp=second_best_hp, 
    model=model,
    x=new_x,
    y=new_y,
    validation_data=new_validation_data,
    # Save the model at its best epoch to a custom path
    callbacks=[keras.callbacks.ModelCheckpoint(
        filepath="path_to_checkpoint",
        monitor='val_loss',
        save_best_only=True)])
# Save the final model.
model.save("path_to_saved_model")
```

Please take note of the following four points:

* Similar to `Tuner.run_trial()`, the return value of the fit function supports
  all different formats.

* The user can use built-in callbacks just like in `Tuner.run_trial()`.

* `build()` and `fit()` can share hyperparameters. In this example,
  `num_features` is shared between the two functions. In `fit()`, we can use
  `hp.get()` to obtain the value of a hyperparameter used in `build()`.

* We can easily retrain the model with any hyperparameter value set with
  `hypermodel.build()` and `hypermodel.fit()`.

With these proposed workflows, the user now has the maximum flexibility. Any
step in an end-to-end machine learning workflow can be tuned. Moreover, the
changes needed to tune existing Keras code are minimized.

Here we present HyperModel code examples of three important use cases:

* Text tokenization.

* Custom training loop.

* Fine tuning with pretrained weights.

#### Text tokenization

```py
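import json

import keras_tuner as kt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers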

# Save the vocabulary to disk before search.
text_vectorizer = layers.TextVectorization()
text_vectorizer.adapt(dataset.map(lambda x, y: x))
with open('vocab.json', 'w') as f:
  json.dump(text_vectorizer.get_vocabulary(), f)

class MyHyperModel(kt.HyperModel):
  def build(self, hp):
    inputs = keras.Input(shape=(10,))
    outputs = layers.Embedding(
        # max_tokens is a hyperparameter also used in text vectorization.
        input_dim=hp.Int("max_tokens", 100, 500, step=100),
        output_dim=64)(inputs)
    outputs = layers.LSTM(hp.Int("units", 32, 128, step=32))(outputs)
    outputs = layers.Dense(1, activation='sigmoid')(outputs)
    model = keras.Model(inputs, outputs)
    model.compile(loss='mse')
    return model
  
  def fit(self, hp, model, dataset, validation_data, callbacks, **kwargs):
    # Load the vocabulary from file.
    with open('vocab.json', 'r') as f:
      vocab = json.load(f)

    # Create the text vectorizer from the saved vocabulary.
    text_vectorizer = layers.TextVectorization(
        # The max_tokens is a hyperparameter created in build().
        vocabulary=vocab[:hp.get("max_tokens")],
        output_mode="int",
        output_sequence_length=10)
  
    return model.fit(
        # Convert x from strings to integer vectors.
        dataset.map(
            lambda x, y: (text_vectorizer(x), y),
            num_parallel_calls=tf.data.AUTOTUNE),
        validation_data=validation_data,
        callbacks=callbacks,
    )
```

#### Custom training loop

```py
class MyHyperModel(kt.HyperModel):
  def build(self, hp):
    inputs = keras.Input(shape=(10,))
    outputs = layers.Dense(hp.Int("units", 16, 128), activation='relu')(inputs)
    outputs = layers.Dense(1, activation='sigmoid')(outputs)
    model = keras.Model(inputs, outputs)
    return model
  
  def fit(self, hp, model, dataset, validation_data, **kwargs):
    lr = hp.Float("learning_rate", 1e-4, 1e-2, sampling="log", default=1e-3)
    optimizer = tf.keras.optimizers.Adam(lr)
    loss_tracker = tf.keras.metrics.Mean()
    # Track the validation loss
    val_loss_tracker = tf.keras.metrics.Mean()
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    # Record the minimum validation loss during fit.
    min_val_loss = float("inf")
  
    @tf.function
    def run_train_step(data):
      images = tf.dtypes.cast(data[0], "float32") / 255.0
      labels = data[1]
      with tf.GradientTape() as tape:
        logits = model(images)
        loss = loss_fn(labels, logits)
      gradients = tape.gradient(loss, model.trainable_variables)
      optimizer.apply_gradients(zip(gradients, model.trainable_variables))
      loss_tracker.update_state(loss)
  
    @tf.function
    def run_val_step(data):
      images = tf.dtypes.cast(data[0], "float32") / 255.0
      labels = data[1]
      logits = model(images)
      loss = loss_fn(labels, logits)
      val_loss_tracker.update_state(loss)
  
    for epoch in range(2):
      for batch, data in enumerate(dataset):
        run_train_step(data)
      print(f"Epoch loss: {loss_tracker.result().numpy()}")
      loss_tracker.reset_states()
      for batch, data in enumerate(validation_data):
        run_val_step(data)
      val_loss = val_loss_tracker.result().numpy()
      min_val_loss = min(min_val_loss, val_loss)
      print(f"Epoch val_loss: {val_loss}")
      val_loss_tracker.reset_states()
  
    return min_val_loss
```

You may also subclass `keras.Model` to override `train_step()`.

#### Fine tuning with pretrained weights

```py
class MyHyperModel(kt.HyperModel):

  def build(self, hp):
    return keras.Sequential([
        keras.applications.ResNet50(
            weights="imagenet",
            input_shape=(32, 32, 3),
            include_top=False,
        ),
        layers.GlobalAveragePooling2D(),
        layers.Dense(hp.Int("units", 32, 128)),
        layers.Dense(1),
    ])
  
  def fit(self, hp, model, dataset, validation_data, callbacks, **kwargs):
    # Fit the model with the `base_model` frozen.
    model.layers[0].trainable = False
    model.compile(
        optimizer="adam",
        loss=keras.losses.BinaryCrossentropy(from_logits=True),
    )
    model.fit(dataset, epochs=20)
    # Fit the model again with some layers in the `base_model` frozen.
    model.layers[0].trainable = True
    for layer in model.layers[0].layers[:hp.Int("freeze", 0, 20)]:
      layer.trainable = False
    model.compile(
        # Use a smaller learning rate.
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),
        loss=keras.losses.BinaryCrossentropy(from_logits=True),
    )
    return model.fit(
        dataset,
        epochs=20,
        callbacks=callbacks,
        validation_data=validation_data)
```

### API documentation

The APIs in the new `HyperModel` class are as follows.

```py
class HyperModel():
  def fit(self, hp, model, callbacks, **kwargs):
    """Train the model.
   
    Args:
        hp: HyperParameters.
        model: `keras.Model` built in the `build()` function.
        callbacks: A list of prebuilt Keras callbacks for model checkpointing
          and TensorBoard configuration.
        **kwargs: Anything the user defines. They are passed from
            `Tuner.search()`.
   
    Returns:
        A `History` object, a similar dictionary, or a single value.
    """
    pass

class Tuner():
  def run_trial(self, trial, callbacks, **kwargs):
    """Train the model.
   
    Args:
        trial: Trial. The current Trial object.
        callbacks: A list of prebuilt Keras callbacks for model checkpointing
          and TensorBoard configuration.
        **kwargs: Anything the user defines. They are passed from Tuner.search().

    Returns:
        A `History` object, a similar dictionary, or a single value.
    """
```
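
The three return types accepted above (a `History` object, a similar dictionary, or a single value) could be reduced to one metrics dictionary roughly as follows. This is a minimal sketch, not KerasTuner's implementation: the helper name `normalize_fit_result`, the `default_objective` name, and the choice of reducing per-epoch lists with `min` are all assumptions made for illustration.

```python
from numbers import Number


def normalize_fit_result(result, default_objective="val_loss", reduce=min):
    """Convert a `fit()` return value into a flat `{metric: float}` dict.

    Hypothetical helper: accepts a History-like object (anything with a
    `.history` dict), a plain dict, or a single number.
    """
    # History-like object: unwrap its per-epoch metrics dictionary.
    if hasattr(result, "history"):
        result = result.history
    if isinstance(result, dict):
        metrics = {}
        for name, value in result.items():
            # Per-epoch lists are reduced to one number (minimum by default).
            if isinstance(value, (list, tuple)):
                value = reduce(value)
            metrics[name] = float(value)
        return metrics
    if isinstance(result, Number):
        # A single value is treated as the objective being optimized.
        return {default_objective: float(result)}
    raise TypeError(f"Unsupported fit() return type: {type(result).__name__}")
```

A metric that should be maximized (for example accuracy) would need `reduce=max` in this sketch.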

## Questions and Discussion Topics

Does the fit function need `trial_id` in the args to do model saving? The user
may need this arg to build unique saving paths for the models.
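
To make the question concrete: if `fit()` did receive a trial identifier, a unique save path per trial could be derived as sketched below. The function name `trial_save_path`, the `trial_<id>` directory naming, and the default filename are hypothetical and not part of the proposed API.

```python
import os


def trial_save_path(base_dir, trial_id, filename="model.keras"):
    """Build a save path that is unique per trial (hypothetical layout)."""
    # Each trial gets its own sub-directory, so models never overwrite each other.
    return os.path.join(base_dir, f"trial_{trial_id}", filename)
```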


================================================
FILE: rfcs/20220804-keras-cv-two-stage-2d-object-detection.md
================================================
# keras-cv Two Stage Two-Dimensional Object Detection API

| Status        | Proposed      |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | Zhenyu Tan (tanzheny@google.com)|
| **Contributor(s)** | Francois Chollet (fchollet@google.com)|
| **Updated**   | 2022-08-04                                           |

## Objective

We aim to provide the core primitive components for training and serving two-stage two-dimensional object
detection models, specifically Faster RCNN.
Pretrained models will also be provided, similar to keras-applications.

## Key Benefits

Two-stage object detection models are a state-of-the-art technique that powers many computer vision tasks. They provide
more accurate detection than single-stage models (such as SSD), at the cost of lower inference speed.

With this proposal, Keras users will be able to build end-to-end models with a simple API.

## Design overview

This proposal includes the specific core components for building Faster RCNN models. It does not, however, include:

1. Model backbones, such as ResNet, or functions to generate feature maps.
2. Detection heads, such as Feature Pyramid.
3. Metrics utilities such as COCO Evaluator, or visualization utils.
4. Primitive components from the [single-stage detector](https://github.com/keras-team/governance/blob/master/rfcs/20200928-keras-cv-single-stage-2d-object-detection.md); we will re-use those components in this design.

Data augmentation with ground truth box processing is currently being developed in KerasCV.

In this document, region of interest (roi) is used interchangeably with region proposal, or simply proposal.

#### Training

Case where a user wants to train from scratch:

```python
import tensorflow as tf
import tensorflow_datasets as tfds
import keras_cv

# Considering a COCO dataset
coco_dataset = tfds.load('coco/2017')
train_ds, eval_ds = coco_dataset['train'], coco_dataset['validation']

def preprocess(features):
  image, gt_boxes, gt_labels = features['image'], features['objects']['bbox'], features['objects']['label']
  # preprocess image, gt_boxes, gt_labels, such as flip, resize, and padding, and reserve 0 for background label.
  # but a batch of images (typically 2 per GPU) should have same size.
  return image, gt_boxes, gt_labels

anchor_generator = keras_cv.ops.AnchorGenerator(anchor_sizes, scales, aspect_ratios, strides)
similarity_calculator = keras_cv.layers.IOUSimilarity()
# positive anchor with IOU > 0.7, negative anchor with IOU <= 0.3
rpn_box_matcher = keras_cv.ops.BoxMatcher([0.7, 0.3])
# positive ROI with IOU > 0.5, negative ROI with IOU <= 0.5
rcnn_box_matcher = keras_cv.ops.BoxMatcher(0.5)
target_gather = keras_cv.ops.TargetGather()
box_coder = keras_cv.ops.BoxCoder(offset='sigmoid')
rpn_sampler = keras_cv.layers.ProposalSampler(positive_fraction=0.5, batch_size=256)
rcnn_sampler = keras_cv.layers.ProposalSampler(positive_fraction=0.25, batch_size=128)
rpn_labeler = keras_cv.ops.AnchorLabeler(rpn_sampler, rpn_box_matcher, similarity_calculator, box_coder)
rcnn_labeler = keras_cv.ops.AnchorLabeler(rcnn_sampler, rcnn_box_matcher, similarity_calculator, box_coder)
roi_filter = keras_cv.layers.ROIFilter(pre_nms_top_k=2000, nms_iou_threshold=0.7, test_pre_nms_top_k=1000)
roi_pooler = keras_cv.layers.ROIPooler(output_size=[7, 7])
# Build RPN and ROI Heads, use Keras backbone
backbone = tf.keras.applications.ResNet50()

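def encode_rpn_label(image, gt_boxes, gt_labels):
  # Anchors are generated from the (height, width) of the preprocessed image.
  image_size = tf.shape(image)[:2]
  anchor_boxes = anchor_generator(image_size)
  cls_targets, box_targets, cls_weights, box_weights = rpn_labeler(anchor_boxes, gt_boxes, gt_labels)
  # Keep the image as the first element so that train_step() can unpack it.
  return image, (gt_boxes, gt_labels, cls_targets, box_targets), (cls_weights, box_weights)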

class FasterRCNN(tf.keras.Model):
  # includes backbone and feature pyramid head.
  def __init__(self, rpn_head, roi_head, roi_filter, roi_pooler, backbone='resnet50_fpn'):
    super().__init__()
    # self.backbone = Model Backbone that returns dict of feature map, or Feature Pyramid Network that wraps it
    # self.rpn_head = Region Proposal Network that provides objectness scores and bbox offset against anchor boxes
    # self.roi_filter = A filter layer that shrinks from a dense predictions to topk sparse predictions based on scores
    # self.roi_head = RCNN detection network that provides softmaxed classification score and bbox offset against rois
    # self.rpn_cls_loss_fn = a Binary CrossEntropy Keras loss 
    # self.rpn_reg_loss_fn = a Regression Keras loss, e.g., Huber loss
    # self.rcnn_cls_loss_fn = a Binary CrossEntropy Keras loss
    # self.rcnn_reg_loss_fn = a Regression Keras loss, e.g., Huber loss
  
  def call(self, image, training=None):
    # returns a single or multi level feature maps
    feature_map = self.backbone(image, training)
    # from the region proposal network, returns the predicted objectness scores
    # and class-agnostic offsets relative to anchor boxes
    rpn_cls_pred, rpn_bbox_pred = self.rpn_head(feature_map)
    # apply offset to anchors and recover proposal in (x1, y1, x2, y2) format
    rpn_rois = box_coder.decode_offset(anchors, rpn_bbox_pred)
    # select top-k proposals according to objectness scores
    rois, cls_pred = self.roi_filter(rpn_rois, rpn_cls_pred)
    # pooling feature map with variable sized rois to fixed size feature map
    feature_map = self.roi_pooler(feature_map, rois)
    # get class independent scores and bounding boxes offsets relative to proposals
    rcnn_cls_pred, rcnn_bbox_pred = self.roi_head(feature_map)
    if not training:
      rcnn_cls_pred, rcnn_bbox_pred = self.nms_detection_decoder(rois, rcnn_cls_pred, rcnn_bbox_pred, image_shape)
      return rcnn_cls_pred, rcnn_bbox_pred
    return {"rpn_cls_pred": rpn_cls_pred, "rpn_bbox_pred": rpn_bbox_pred, "rois": rois,
            "rcnn_cls_pred": rcnn_cls_pred, "rcnn_bbox_pred": rcnn_bbox_pred}
  
  def train_step(self, data):
    image, (gt_boxes, gt_labels, rpn_cls_targets, rpn_box_targets), (rpn_cls_weights, rpn_box_weights) = data
    # Using approximate joint training instead of alternating training
    with tf.GradientTape() as tape:
      outputs = self(image, training=True)
      # Compute RPN losses using targets from input pipeline, this will normalize by N_cls and N_reg as well
      rpn_cls_loss = self.rpn_cls_loss_fn(rpn_cls_targets, outputs["rpn_cls_pred"], rpn_cls_weights)
      rpn_box_loss = self.rpn_reg_loss_fn(rpn_box_targets, outputs["rpn_bbox_pred"], rpn_box_weights)
      # Compute RCNN losses which only picks k-th bbox prediction where k is the predicted class
      rois = outputs["rois"]
      rcnn_cls_true, rcnn_box_true, rcnn_cls_weights, rcnn_box_weights = rcnn_labeler(rois, gt_boxes, gt_labels)
      rcnn_cls_loss = self.rcnn_cls_loss_fn(rcnn_cls_true, outputs["rcnn_cls_pred"], rcnn_cls_weights)
      rcnn_box_loss = self.rcnn_reg_loss_fn(rcnn_box_true, outputs["rcnn_bbox_pred"], rcnn_box_weights)
      total_loss = rpn_cls_loss + rpn_box_loss + rcnn_cls_loss + rcnn_box_loss
    self.optimizer.minimize(total_loss, self.trainable_variables, tape=tape)
    return self.compute_metrics(...)
      

# Shuffle individual examples before batching.
transformed_train_ds = train_ds.map(preprocess).map(encode_rpn_label).shuffle(1024).batch(128)
transformed_eval_ds = eval_ds.map(preprocess).map(encode_rpn_label).batch(128)

strategy = tf.distribute.TPUStrategy(...)
with strategy.scope():
    optimizer = tf.keras.optimizers.SGD(lr_scheduler)
    # rpn_head and roi_head are user-provided heads; they are not part of this proposal.
    model = FasterRCNN(rpn_head=rpn_head, roi_head=roi_head, roi_filter=roi_filter, roi_pooler=roi_pooler)
    # The losses are computed inside train_step(), so only the optimizer is passed to compile().
    model.compile(optimizer=optimizer, metrics=[])

model.fit(transformed_train_ds, epochs=100, validation_data=transformed_eval_ds)
model.save(file_path)
``` 
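
The step in `call()` that applies predicted offsets to anchors and recovers proposals in `(x1, y1, x2, y2)` format can be illustrated with the standard Faster R-CNN box parameterization. This is a NumPy sketch under that assumption; the proposed `BoxCoder` (configured above with `offset='sigmoid'`) may use a different parameterization, and the function name `decode_offsets` is hypothetical.

```python
import numpy as np


def decode_offsets(anchors, offsets):
    """Apply (dx, dy, dw, dh) offsets to [N, 4] anchors in (x1, y1, x2, y2) format."""
    widths = anchors[:, 2] - anchors[:, 0]
    heights = anchors[:, 3] - anchors[:, 1]
    center_x = anchors[:, 0] + 0.5 * widths
    center_y = anchors[:, 1] + 0.5 * heights

    # Centers shift proportionally to the anchor size; sizes scale exponentially.
    new_cx = center_x + offsets[:, 0] * widths
    new_cy = center_y + offsets[:, 1] * heights
    new_w = widths * np.exp(offsets[:, 2])
    new_h = heights * np.exp(offsets[:, 3])

    # Convert (center, size) back to corner coordinates.
    return np.stack(
        [new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
         new_cx + 0.5 * new_w, new_cy + 0.5 * new_h],
        axis=1,
    )
```

With all offsets equal to zero, the decoded boxes are the anchors themselves.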

#### Serving

Case where a user wants to serve the trained model for a single image: this is identical to the single-stage object detector.

## Detailed Design

For the rest of the design, we denote `B` as batch size, `N` as the number of ground truth boxes, and `M` as the number
of anchor boxes.

We propose 3 layers and 1 op in this RFC.

#### Layers -- ProposalSampler
Given a dense anchor/proposal set, we propose a ProposalSampler layer for selecting positive and negative proposals according
to the required batch size and positive : negative ratio. The layer can be applied to either proposal boxes or anchor boxes.
 
```python
class ProposalSampler(tf.keras.layers.Layer):
  """Class to select positive and negative proposals."""
 
  def __init__(self, positive_fraction, batch_size, positive_indicator=1, negative_indicator=-1):
    """Initializes ProposalSampler layer.
    Args:
      positive_fraction: A float number between [0, 1], 0.5 means positive:negative ratio is 1:1
      batch_size: the number of samples to generate
      positive_indicator: for the inputs to the layer, value for positive proposal, default to 1
      negative_indicator: for the inputs to the layer, value for negative proposal, default to -1
    """
 
  def call(self, matched_indicators):
    """Get balanced positive and negative samples.
 
    Args:
      matched_indicators: An int Tensor [N], or [B, N] representing positive or negative values
 
    Returns:
      Int tensors with shape [sample_size] or [B, sample_size] representing the selected indices for proposals.
 
    """
```
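
The sampling behavior described by this interface can be sketched in NumPy for the unbatched `[N]` case. This is an illustration only, not the proposed layer: the function name `sample_proposals` and the `seed` argument are hypothetical, and unlike a fixed-size layer output, this sketch returns fewer indices when there are not enough positives and negatives to fill the batch.

```python
import numpy as np


def sample_proposals(matched_indicators, batch_size=256, positive_fraction=0.5,
                     positive_indicator=1, negative_indicator=-1, seed=None):
    """Return indices of a random subset with a capped share of positives."""
    rng = np.random.default_rng(seed)
    matched_indicators = np.asarray(matched_indicators)
    positive_idx = np.flatnonzero(matched_indicators == positive_indicator)
    negative_idx = np.flatnonzero(matched_indicators == negative_indicator)

    # Cap positives at the requested fraction of the batch.
    num_pos = min(int(batch_size * positive_fraction), positive_idx.size)
    # Fill the remainder of the batch with negatives, as far as they exist.
    num_neg = min(batch_size - num_pos, negative_idx.size)

    # Shuffle each group and take the first num_pos / num_neg entries.
    chosen_pos = rng.permutation(positive_idx)[:num_pos]
    chosen_neg = rng.permutation(negative_idx)[:num_neg]
    return np.concatenate([chosen_pos, chosen_neg])
```

Entries that are neither positive nor negative (for example an "ignore" value of 0) are never selected.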

#### Layers -- ROIPooler
We propose a ROIPooler layer to crop feature maps from proposals.
 
```python
class ROIPooler(tf.keras.layers.Layer):
  """Class to extract feature maps from region proposals by quantization."""
 
  def __init__(self, output_size=(7, 7)):
    """Initializes ROIPooler layer.
    Args:
      output_size: A tuple representing the output height and width. 
    """
 
  def call(self, feature_maps, rois):
    """Crop and pool feature maps for each region proposal.
 
    Args:
      feature_maps: A float Tensor [H, W, C] or [B, H, W, C] or dict of multiple levels
      rois: A float or int Tensor [M], or [B, M] representing coordinates within [H, W].
 
    Returns:
      A float tensor with shape [output_size] or [B, output_size] representing cropped feature maps.
    """
```
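
As an illustration of pooling "by quantization", the following NumPy sketch max-pools a single ROI from a single-level `[H, W, C]` feature map into a fixed-size grid. It is not the proposed layer: the name `roi_max_pool`, the `(x1, y1, x2, y2)` box layout in feature-map coordinates, and the use of max pooling are assumptions.

```python
import numpy as np


def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool one ROI of an [H, W, C] feature map into [out_h, out_w, C]."""
    height, width, channels = feature_map.shape
    out_h, out_w = output_size

    # Quantize the ROI to integer cell boundaries and clip it to the map.
    x1 = int(np.clip(np.floor(roi[0]), 0, width - 1))
    y1 = int(np.clip(np.floor(roi[1]), 0, height - 1))
    x2 = int(np.clip(np.ceil(roi[2]), x1 + 1, width))
    y2 = int(np.clip(np.ceil(roi[3]), y1 + 1, height))

    # Split the quantized ROI into out_h x out_w bins.
    y_edges = np.linspace(y1, y2, out_h + 1)
    x_edges = np.linspace(x1, x2, out_w + 1)

    pooled = np.zeros((out_h, out_w, channels), dtype=feature_map.dtype)
    for i in range(out_h):
        ys = int(np.floor(y_edges[i]))
        ye = max(int(np.ceil(y_edges[i + 1])), ys + 1)  # at least one row
        for j in range(out_w):
            xs = int(np.floor(x_edges[j]))
            xe = max(int(np.ceil(x_edges[j + 1])), xs + 1)  # at least one column
            pooled[i, j] = feature_map[ys:ye, xs:xe].max(axis=(0, 1))
    return pooled
```

When the ROI is smaller than the output grid, neighboring bins overlap, so the output shape stays fixed regardless of ROI size.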

#### Layers -- ROIFilter
We propose a ROIFilter layer to select top-k proposals based on some score.
 
```python
class ROIFilter(tf.keras.layers.Layer):
  """Class to select top-k proposals based on some score."""
 
  def __init__(self, 
               pre_nms_top_k: int = 2000,
               pre_nms_score_threshold: float = 0.0,
               pre_nms_min_size_threshold: float = 0.0,
               nms_iou_threshold: float = 0.7,
               num_proposals: int = 1000,
               test_pre_nms_top_k: int = 1000,
               test_pre_nms_score_threshold: float = 0.0,
               test_pre_nms_min_size_threshold: float = 0.0,
               test_nms_iou_threshold: float = 0.7,
               test_num_proposals: int = 1000,
               use_batched_nms: bool = False,):
    """Initializes ROIFilter layer.
    Args:
      pre_nms_top_k: An `int` of the number of top scores proposals to be kept
        before applying NMS.
      pre_nms_score_threshold: A `float` of the score threshold to apply before
        applying NMS. Proposals whose scores are below this threshold are
        thrown away.
      pre_nms_min_size_threshold: A `float` of the threshold of each side of the
        box (w.r.t. the scaled image). Proposals whose sides are below this
        threshold are thrown away.
      nms_iou_threshold: A `float` in [0, 1], the NMS IoU threshold.
      num_proposals: An `int` of the final number of proposals to generate.
      test_pre_nms_top_k: An `int` of the number of top scores proposals to be
        kept before applying NMS in testing.
      test_pre_nms_score_threshold: A `float` of the score threshold to apply
        before applying NMS in testing. Proposals whose scores are below this
        threshold are thrown away.
      test_pre_nms_min_size_threshold: A `float` of the threshold of each side
        of the box (w.r.t. the scaled image) in testing. Proposals whose sides
        are below this threshold are thrown away.
      test_nms_iou_threshold: A `float` in [0, 1] of the NMS IoU threshold in
        testing.
      test_num_proposals: An `int` of the final number of proposals to generate
        in testing.
      use_batched_nms: A `bool` of whether or not use
        `tf.image.combined_non_max_suppression`.
    """
 
  def call(self,
           rois: Mapping[str, tf.Tensor],
           raw_scores: Mapping[str, tf.Tensor],
           image_shape: tf.Tensor):
    """Selects the top-k proposals by score and applies NMS.
 
    Args:
      rois: A float Tensor [N], or [B, N] representing region proposals.
      raw_scores: A float Tensor [N], or [B, N] representing scores for each region.
      image_shape: An int tensor [2] or [B, 2] representing image size.
 
    Returns:
      roi: A `tf.Tensor` of shape [B, num_proposals, 4], the proposed
        ROIs in the scaled image coordinate.
      roi_scores: A `tf.Tensor` of shape [B, num_proposals], scores of the
        proposed ROIs.

    """
```
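
The core of this filtering (pre-NMS top-k selection, greedy NMS, and a cap on the number of proposals) can be sketched in NumPy for a single image. This is a simplified illustration rather than the proposed layer: `filter_rois` and `box_iou` are hypothetical names, boxes are assumed to be `(x1, y1, x2, y2)` with positive area, and the score and minimum-size thresholds are omitted.

```python
import numpy as np


def box_iou(box, boxes):
    """IoU between one box [4] and many boxes [N, 4] in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def filter_rois(rois, scores, pre_nms_top_k=2000, nms_iou_threshold=0.7,
                num_proposals=1000):
    """Keep the top-k boxes by score, then apply greedy NMS."""
    # Indices of the highest-scoring boxes, best first.
    order = np.argsort(-scores)[:pre_nms_top_k]
    keep = []
    while order.size > 0 and len(keep) < num_proposals:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        ious = box_iou(rois[best], rois[rest])
        order = rest[ious <= nms_iou_threshold]
    keep = np.array(keep, dtype=np.int64)
    return rois[keep], scores[keep]
```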

#### Ops -- AnchorLabeler

```python
class AnchorLabeler:
  """Labeler that matches ground truth with anchors and proposals."""

  def __init__(self,
               proposal_sampler,
               proposal_matcher,
               similarity_calculator,
               box_coder):
    """Initializes AnchorLabeler.

    Args:
      proposal_sampler: A ProposalSampler
      proposal_matcher: A BoxMatcher
      similarity_calculator: A similarity layer, such as an IOU layer
      box_coder: A BoxCoder that transforms between different formats

    """
  def __call__(self, proposals, gt_boxes, gt_labels):
    """Computes classification and box targets for the given proposals.

    Args:
      proposals: a float [N, 4] Tensor representing the proposals.
      gt_boxes: a float [M, 4] Tensor representing ground truth boxes.
      gt_labels: an int [M] Tensor representing ground truth labels.
    Returns:
      cls_targets: an int [K] Tensor representing proposal labels mapped from ground truth labels.
      box_targets: a float [K, 4] Tensor representing proposal boxes mapped from ground truth boxes.
      cls_weights: a float [K] Tensor representing weights for each of the cls_targets
      box_weights: a float [K] or [K, 4] Tensor representing weights for each of the box_targets
    """
```
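
The matching part of this labeler can be sketched in NumPy as follows. The sketch assumes IoU as the similarity measure and label 0 for background (as reserved in the preprocessing step earlier), and it deliberately leaves out sampling and box encoding, so it returns one target per proposal rather than `K` sampled ones. The names `pairwise_iou` and `label_proposals` are hypothetical.

```python
import numpy as np


def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix [N, M] for boxes in (x1, y1, x2, y2) format."""
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = np.prod(boxes_a[:, 2:] - boxes_a[:, :2], axis=1)
    area_b = np.prod(boxes_b[:, 2:] - boxes_b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def label_proposals(proposals, gt_boxes, gt_labels,
                    positive_threshold=0.5, negative_threshold=0.5):
    """Assign each proposal the label and box of its best-matching ground truth."""
    iou = pairwise_iou(proposals, gt_boxes)
    best_gt = iou.argmax(axis=1)   # index of the best ground truth per proposal
    best_iou = iou.max(axis=1)
    positive = best_iou > positive_threshold
    negative = best_iou <= negative_threshold

    # Positives take the ground truth label; everything else is background (0).
    cls_targets = np.where(positive, gt_labels[best_gt], 0)
    box_targets = gt_boxes[best_gt]
    # Proposals between the two thresholds are ignored in the classification loss.
    cls_weights = (positive | negative).astype(np.float32)
    # Only positives contribute to the box regression loss.
    box_weights = positive.astype(np.float32)
    return cls_targets, box_targets, cls_weights, box_weights
```

Passing `positive_threshold=0.7` and `negative_threshold=0.3` reproduces the RPN thresholds used in the training example; the defaults correspond to the RCNN thresholds.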

## Questions and Discussion Topics
* Should we provide a meta arch for FasterRCNN?
* Should we provide some default out-of-box RPN Head and ROI Head?


================================================
FILE: rfcs/README.md
================================================
# Keras API proposal "Request For Comment" (RFC) docs

This folder contains approved API proposals. To propose a new API to be considered for review, you can open a Pull Request in this repository to add a new RFC `.md` doc.

## Process

The process for writing and submitting design proposals is the same as the [TensorFlow RFC process](https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md).

- Start from [this template](https://github.com/keras-team/governance/blob/master/rfcs/yyyymmdd-rfc-template.md).
- Fill in the content. Note that you will need to insert code examples.
    - Provide enough context information for anyone to understand what's going on.
    - Provide a solid argument as to why the feature is needed.
    - Include a code example of the **end-to-end workflow** you have in mind.
- Open a Pull Request in the [Keras API proposals folder in this repository](https://github.com/keras-team/governance/tree/master/rfcs).
- Send the Pull Request link to `keras-users@googlegroups.com` with a subject that starts with `[API DESIGN REVIEW]` (all caps) so that we notice it.
- Wait for comments, and answer them as they come. Edit the proposal as necessary.
- The proposal will finally be approved or rejected during a meeting of the Keras SIG chairs. Once approved, you can send out Pull Requests to implement the API changes or ask others to write Pull Requests (targeting `tf.keras` and `keras-team/keras`).

Note that:
- Anyone is free to send out API proposals.
- Anyone is free to comment on API proposals or ask questions.
- Anyone is free to attend design review meetings as an observer.
- Participation in design review meetings is restricted to Keras SIG chairs.
- Design review meeting notes will be posted publicly after each meeting.

## Template

Use [this template](https://github.com/keras-team/governance/blob/master/rfcs/yyyymmdd-rfc-template.md) to draft an RFC.


================================================
FILE: rfcs/yyyymmdd-rfc-template.md
================================================
# Title of RFC

| Status        | (Proposed / Accepted / Implemented / Obsolete)       |
:-------------- |:---------------------------------------------------- |
| **Author(s)** | My Name (me@example.org), AN Other (you@example.org) |
| **Sponsor**   | A N Expert (expert@example.org)                      |
| **Updated**   | YYYY-MM-DD                                           |
| **Obsoletes** | RFC it replaces, else remove this header             |

## Objective

What are we doing and why? What problem will this solve? What are the goals and
non-goals? This is your executive summary; keep it short, elaborate below.

## Motivation

Why is this a valuable problem to solve? What background information is needed
to show how this design addresses the problem?

Which users are affected by the problem? Why is it a problem? What data supports
this? What related work exists?

## User Benefit

How will users (or other contributors) benefit from this work? What would be the
headline in the release notes or blog post?

## Design Proposal

This is the meat of the document, where you explain your proposal. If you have
multiple alternatives, be sure to use sub-sections for better separation of the
idea, and list pros/cons to each approach. If there are alternatives that you
have eliminated, you should also list those here, and explain why you believe
your chosen approach is superior.

Factors to consider include:

* UX and usability
* How will this change impact users, and how will that be managed?
* Performance implications
* Dependencies
* Maintenance
* Backwards compatibility

## Detailed Design

This section is optional. Elaborate on details if they’re important to
understanding the design, but would make it hard to read the proposal section
above.

## Questions and Discussion Topics

Seed this with open questions you require feedback on from the RFC process.
