Repository: tensorflow/fairness-indicators
Branch: master
Commit: 6c970e0ec6c5
Files: 64
Total size: 442.6 KB
Directory structure:
gitextract_8nbht4qq/
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ ├── 00-bug-issue.md
│ │ ├── 10-build-installation-issue.md
│ │ ├── 20-documentation-issue.md
│ │ ├── 30-feature-request.md
│ │ ├── 40-performance-issue.md
│ │ └── 50-other-issues.md
│ ├── actions/
│ │ └── setup-env/
│ │ └── action.yml
│ └── workflows/
│ ├── build.yml
│ ├── ci-lint.yml
│ ├── docs.yml
│ └── test.yml
├── .pre-commit-config.yaml
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── RELEASE.md
├── docs/
│ ├── __init__.py
│ ├── guide/
│ │ ├── _index.yaml
│ │ ├── _toc.yaml
│ │ └── guidance.md
│ ├── index.md
│ ├── javascripts/
│ │ └── mathjax.js
│ ├── stylesheets/
│ │ └── extra.css
│ └── tutorials/
│ ├── Facessd_Fairness_Indicators_Example_Colab.ipynb
│ ├── Fairness_Indicators_Example_Colab.ipynb
│ ├── Fairness_Indicators_Pandas_Case_Study.ipynb
│ ├── Fairness_Indicators_TFCO_CelebA_Case_Study.ipynb
│ ├── Fairness_Indicators_TFCO_Wiki_Case_Study.ipynb
│ ├── Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb
│ ├── Fairness_Indicators_on_TF_Hub_Text_Embeddings.ipynb
│ ├── README.md
│ ├── _Deprecated_Fairness_Indicators_Lineage_Case_Study.ipynb
│ └── _toc.yaml
├── fairness_indicators/
│ ├── __init__.py
│ ├── example_model.py
│ ├── example_model_test.py
│ ├── fairness_indicators_metrics.py
│ ├── remediation/
│ │ ├── __init__.py
│ │ ├── weight_utils.py
│ │ └── weight_utils_test.py
│ ├── test_cases/
│ │ └── dlvm/
│ │ ├── fairness_indicators_dlvm_test_case.ipynb
│ │ └── fi_test_installed.sh
│ ├── tutorial_utils/
│ │ ├── __init__.py
│ │ ├── util.py
│ │ └── util_test.py
│ └── version.py
├── mkdocs.yml
├── pyproject.toml
├── requirements-docs.txt
├── setup.py
└── tensorboard_plugin/
├── README.md
├── pytest.ini
├── setup.py
└── tensorboard_plugin_fairness_indicators/
├── RELEASE.md
├── __init__.py
├── demo.py
├── metadata.py
├── metadata_test.py
├── plugin.py
├── plugin_test.py
├── static/
│ └── index.js
├── summary_v2.py
├── summary_v2_test.py
└── version.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/ISSUE_TEMPLATE/00-bug-issue.md
================================================
---
name: Bug Issue
about: Use this template for reporting a bug
labels: 'type:bug'
---
**System information**
- Have I written custom code (as opposed to using stock example code provided):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Fairness Indicators version:
- TensorFlow version:
- Python version:
**Describe the current behavior**
**Describe the expected behavior**
**Standalone code to reproduce the issue**
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.
**Other info / logs** Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.
================================================
FILE: .github/ISSUE_TEMPLATE/10-build-installation-issue.md
================================================
---
name: Build/Installation Issue
about: Use this template for build/installation issues
labels: 'type:build/install'
---
**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Fairness Indicators version:
- Python version:
- Pip version:
**Describe the problem**
**Provide the exact sequence of commands / steps that you executed before running into the problem**
**Any other info / logs**
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
================================================
FILE: .github/ISSUE_TEMPLATE/20-documentation-issue.md
================================================
---
name: Documentation Issue
about: Use this template for documentation related issues
labels: 'type:docs'
---
The Fairness Indicators docs are open source! To get involved, read the
documentation contributor guide:
https://github.com/tensorflow/fairness-indicators/blob/master/CONTRIBUTING.md
## URL(s) with the issue:
Please provide a link to the documentation entry.
## Description of issue (what needs changing):
### Clear description
For example, why should someone use this method? How is it useful?
### Correct links
Is the link to the source code correct?
### Parameters defined
Are all parameters defined and formatted correctly?
### Returns defined
Are return values defined?
### Raises listed and defined
Are the errors defined?
### Usage example
Is there currently a usage example for this method?
### Request visuals, if applicable
Are there currently visuals? If not, will it clarify the content?
### Submit a pull request?
Are you planning to also submit a pull request to fix the issue?
================================================
FILE: .github/ISSUE_TEMPLATE/30-feature-request.md
================================================
---
name: Feature Request
about: Use this template for raising a feature request
labels: 'type:feature'
---
**Describe the feature and the current behavior/state.**
**Will this change the current API? How?**
**Who will benefit from this feature?**
**Are you willing to contribute it (Yes/No).**
**Any Other info.**
================================================
FILE: .github/ISSUE_TEMPLATE/40-performance-issue.md
================================================
---
name: Performance Issue
about: Use this template for reporting a performance issue
labels: 'type:performance'
---
**System information**
- Have I written custom code (as opposed to using stock example code provided):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Fairness Indicators version:
- TensorFlow version:
- Python version:
**Describe the current behavior**
**Describe the expected behavior**
**Standalone code to reproduce the issue**
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to Colab/Jupyter/any notebook.
**Other info / logs** Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.
================================================
FILE: .github/ISSUE_TEMPLATE/50-other-issues.md
================================================
---
name: Other Issues
about: Use this template for any other non-support related issues
labels: 'type:others'
---
This template is for miscellaneous issues not covered by the other categories.
================================================
FILE: .github/actions/setup-env/action.yml
================================================
name: Set up environment
description: Set up environment and install package
inputs:
  python-version:
    default: "3.10"
    required: true
  package-root-dir:
    default: "./"
    required: true
runs:
  using: composite
  steps:
    - name: Set up Python ${{ inputs.python-version }}
      uses: actions/setup-python@v5
      with:
        python-version: ${{ inputs.python-version }}
        cache-dependency-path: |
          ${{ inputs.package-root-dir }}/setup.py
    - name: Install dependencies
      shell: bash
      run: |
        python -m pip install --upgrade pip
        pip install ${{ inputs.package-root-dir }}[test]
================================================
FILE: .github/workflows/build.yml
================================================
name: Build
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master
  release:
    types: [published]
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10"]
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install python build dependencies
        run: |
          python -m pip install --upgrade pip build
      - name: Build wheels
        run: |
          python -m build --wheel --sdist
          mkdir wheelhouse
          mv dist/* wheelhouse/
      - name: List and check wheels
        run: |
          # Quote the requirement so the shell does not treat ">=" as a redirect.
          pip install twine "pkginfo>=1.11.0"
          ${{ matrix.ls || 'ls -lh' }} wheelhouse/
          twine check wheelhouse/*
      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.python-version }}
          path: ./wheelhouse/*
  upload_to_pypi:
    name: Upload to PyPI
    runs-on: ubuntu-latest
    if: (github.event_name == 'release' && startsWith(github.ref, 'refs/tags')) || (github.event_name == 'workflow_dispatch')
    needs: [build]
    environment:
      name: pypi
      url: https://pypi.org/p/fairness-indicators
    permissions:
      id-token: write
    steps:
      - name: Retrieve wheels
        uses: actions/download-artifact@v4.1.8
        with:
          merge-multiple: true
          path: wheels
      - name: List the build artifacts
        run: |
          ls -lAs wheels/
      - name: Upload to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1.12
        with:
          packages_dir: wheels/
          repository_url: https://pypi.org/legacy/
          verify_metadata: false
          verbose: true
================================================
FILE: .github/workflows/ci-lint.yml
================================================
name: pre-commit
on:
  pull_request:
  push:
    branches: [master]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4.1.7
        with:
          # Ensure the full history is fetched
          # This is required to run pre-commit on a specific set of commits
          # TODO: Remove this when all the pre-commit issues are fixed
          fetch-depth: 0
      - uses: actions/setup-python@v5.1.1
        with:
          python-version: 3.13
      - uses: pre-commit/action@v3.0.1
================================================
FILE: .github/workflows/docs.yml
================================================
name: Deploy docs
on:
  workflow_dispatch:
  push:
    branches:
      - 'master'
  pull_request:
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
        if: (github.event_name != 'pull_request')
      - name: Set up Python 3.9
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
          cache: 'pip'
          cache-dependency-path: |
            setup.py
            requirements-docs.txt
      - name: Save time for cache for mkdocs
        run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - name: Caching
        uses: actions/cache@v4
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - name: Install Dependencies
        run: pip install -r requirements-docs.txt
      - name: Deploy to GitHub Pages
        run: mkdocs gh-deploy --force
        if: (github.event_name != 'pull_request')
      - name: Build docs to check for errors
        run: mkdocs build
        if: (github.event_name == 'pull_request')
================================================
FILE: .github/workflows/test.yml
================================================
name: Tests
on:
  push:
    paths-ignore:
      - '**.md'
      - 'docs/**'
  pull_request:
    branches: [ master ]
    paths-ignore:
      - '**.md'
      - 'docs/**'
  workflow_dispatch:
jobs:
  tests:
    if: github.actor != 'copybara-service[bot]'
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.9', '3.10']
        package-root-dir: ['./', './tensorboard_plugin']
    steps:
      - name: Checkout repo
        uses: actions/checkout@v4
      - name: Set up environment
        uses: ./.github/actions/setup-env
        with:
          python-version: ${{ matrix.python-version }}
          package-root-dir: ${{ matrix.package-root-dir }}
      - name: Run tests
        shell: bash
        run: |
          cd ${{ matrix.package-root-dir }}
          pytest
================================================
FILE: .pre-commit-config.yaml
================================================
# pre-commit is a tool to perform a predefined set of tasks manually and/or
# automatically before git commits are made.
#
# Config reference: https://pre-commit.com/#pre-commit-configyaml---top-level
#
# Common tasks
#
# - Register git hooks: pre-commit install --install-hooks
# - Run on all files: pre-commit run --all-files
#
# These pre-commit hooks are run as CI.
#
# NOTE: if it can be avoided, add configs/args in pyproject.toml or below instead of creating a new `.config.file`.
# https://pre-commit.ci/#configuration
ci:
  autoupdate_schedule: monthly
  autofix_commit_msg: |
    [pre-commit.ci] Apply automatic pre-commit fixes
repos:
  # general
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: end-of-file-fixer
        exclude: '\.svg$'
      - id: trailing-whitespace
        exclude: '\.svg$'
      - id: check-json
      - id: check-yaml
        args: [--allow-multiple-documents, --unsafe]
      - id: check-toml
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.6
    hooks:
      - id: ruff
        args: ["--fix"]
      - id: ruff-format
================================================
FILE: CONTRIBUTING.md
================================================
# How to Contribute
We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.
## Contributor License Agreement
Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to to see
your current agreements on file or to sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.
## Code reviews
All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2017, The TensorFlow Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--------------------------------------------------------------------------------
MIT
The MIT License (MIT)
Copyright (c) 2014-2015, Jon Schlinkert.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
--------------------------------------------------------------------------------
BSD-3-Clause
Copyright (c) 2016, Daniel Wirtz All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of its author, nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: README.md
================================================
# Fairness Indicators

Fairness Indicators is designed to support teams in evaluating, improving, and comparing models for fairness concerns in partnership with the broader TensorFlow toolkit.
The tool is currently actively used internally by many of our products. We would love to partner with you to understand where Fairness Indicators is most useful, and where added functionality would be valuable. Please reach out at tfx@tensorflow.org. You can provide feedback and feature requests [here](https://github.com/tensorflow/fairness-indicators/issues/new/choose).
## Key links
* [Introductory Video](https://www.youtube.com/watch?v=pHT-ImFXPQo)
* [Fairness Indicators Case Study](https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=github&utm_medium=github&utm_campaign=fi-practicum&utm_term=&utm_content=repo-body)
* [Fairness Indicators Example Colab](https://colab.research.google.com/github/tensorflow/fairness-indicators/blob/master/docs/tutorials/Fairness_Indicators_Example_Colab.ipynb)
* [Pandas DataFrame to Fairness Indicators Case Study](https://colab.research.google.com/github/tensorflow/fairness-indicators/blob/master/docs/tutorials/Fairness_Indicators_Pandas_Case_Study.ipynb)
* [Fairness Indicators: Thinking about Fairness Evaluation](https://github.com/tensorflow/fairness-indicators/blob/master/docs/guide/guidance.md)
## What is Fairness Indicators?
Fairness Indicators enables easy computation of commonly-identified fairness metrics for **binary** and **multiclass** classifiers.
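
Binary metrics such as the false positive rate can be applied to a multiclass classifier by treating each class in turn as the positive class (one-vs-rest). The plain-Python sketch below only illustrates that idea; `one_vs_rest` is a hypothetical helper, and this is not a description of how Fairness Indicators or TFMA handle multiclass models internally.

```python
from typing import List, Sequence, Tuple


def one_vs_rest(
    labels: Sequence[int],
    class_probs: Sequence[Sequence[float]],
    class_id: int,
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: reduce a multiclass task to a binary one.

    Examples whose label equals `class_id` become positives (1), all other
    examples become negatives (0), and each example's score is the model's
    probability for `class_id`.
    """
    binary_labels = [1 if label == class_id else 0 for label in labels]
    binary_scores = [probs[class_id] for probs in class_probs]
    return binary_labels, binary_scores


# Three examples from a three-class model (classes 0, 1 and 2).
labels = [0, 2, 1]
class_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
binary_labels, binary_scores = one_vs_rest(labels, class_probs, class_id=2)
print(binary_labels, binary_scores)  # [0, 1, 0] [0.1, 0.6, 0.3]
```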
Many existing tools for evaluating fairness concerns don’t work well on large-scale datasets and models. At Google, it is important for us to have tools that can work on billion-user systems. Fairness Indicators lets you evaluate fairness metrics for use cases of any size.

In particular, Fairness Indicators includes the ability to:
* Evaluate the distribution of datasets
* Evaluate model performance, sliced across defined groups of users
* Feel confident about your results with confidence intervals and evaluations at multiple thresholds
* Dive deep into individual slices to explore root causes and opportunities for improvement
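
To make "sliced across defined groups" and "multiple thresholds" concrete, the following sketch computes false positive and false negative rates for every (group, threshold) pair from plain lists of labels, scores, and group names. It is a hypothetical, dependency-free illustration: the function `sliced_error_rates` and its output layout are invented here and are not the Fairness Indicators or TFMA API.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def sliced_error_rates(
    labels: Sequence[int],
    scores: Sequence[float],
    groups: Sequence[str],
    thresholds: Sequence[float],
) -> Dict[Tuple[str, float], Dict[str, float]]:
    """Hypothetical helper: FPR and FNR for each (group, threshold) slice.

    An example counts as predicted positive when score >= threshold.
    A rate is NaN when its slice has no negatives (FPR) or no positives (FNR).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for label, score, group in zip(labels, scores, groups):
        for threshold in thresholds:
            predicted_positive = score >= threshold
            c = counts[(group, threshold)]
            if predicted_positive and label == 1:
                c["tp"] += 1
            elif predicted_positive:
                c["fp"] += 1
            elif label == 1:
                c["fn"] += 1
            else:
                c["tn"] += 1

    def rate(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else float("nan")

    return {
        key: {
            "false_positive_rate": rate(c["fp"], c["fp"] + c["tn"]),
            "false_negative_rate": rate(c["fn"], c["fn"] + c["tp"]),
        }
        for key, c in counts.items()
    }


labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.8, 0.7]
groups = ["a", "a", "a", "b", "b", "b"]
results = sliced_error_rates(labels, scores, groups, thresholds=[0.5, 0.75])
for (group, threshold), rates in sorted(results.items()):
    print(group, threshold, rates)
```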
This [case study](https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=github&utm_medium=github&utm_campaign=fi-practicum&utm_term=&utm_content=repo-body), complete with [videos](https://www.youtube.com/watch?v=pHT-ImFXPQo) and programming exercises, demonstrates how Fairness Indicators can be used on one of your own products to evaluate fairness concerns over time.
## [Installation](https://pypi.org/project/fairness-indicators/)
```bash
pip install fairness-indicators
```
The pip package includes:
* [**TensorFlow Data Validation (TFDV)**](https://github.com/tensorflow/data-validation) - analyze the distribution of your dataset
* [**TensorFlow Model Analysis (TFMA)**](https://github.com/tensorflow/model-analysis) - analyze model performance
* **Fairness Indicators** - an addition to TFMA that adds fairness metrics and easy performance comparison across slices
* [**The What-If Tool (WIT)**](https://github.com/PAIR-code/what-if-tool) - an interactive visual interface designed to probe your models better
### Nightly Packages
Fairness Indicators also hosts nightly packages at
https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest
nightly package, please use the following command:
```bash
pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple fairness-indicators
```
This will install the nightly packages for the major dependencies of Fairness
Indicators, such as TensorFlow Data Validation (TFDV) and TensorFlow Model
Analysis (TFMA).
## How can I use Fairness Indicators?
TensorFlow Models
* Access Fairness Indicators as part of the Evaluator component in TensorFlow Extended \[[docs](https://www.tensorflow.org/tfx/guide/evaluator)]
* Access Fairness Indicators in TensorBoard when evaluating other real-time metrics \[[docs](https://github.com/tensorflow/tensorboard/blob/master/docs/fairness-indicators.md)]

Not using existing TensorFlow tools? No worries!
* Download the Fairness Indicators pip package, and use TensorFlow Model Analysis as a standalone tool \[[docs](https://www.tensorflow.org/tfx/guide/fairness_indicators)]
* Model Agnostic TFMA enables you to compute Fairness Indicators based on the output of any model \[[docs](https://www.tensorflow.org/tfx/guide/fairness_indicators)]
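
Whichever path you choose, the per-slice computation only needs each example's label, model score, and slice membership, which is what makes a model-agnostic workflow possible. As an illustration of the confidence intervals mentioned earlier, the sketch below applies a plain percentile bootstrap to one slice's false positive rate. It is hypothetical: `bootstrap_fpr_interval` is invented here, and TFMA may use a different resampling scheme.

```python
import random
from typing import List, Optional, Sequence, Tuple


def false_positive_rate(labels: Sequence[int], predictions: Sequence[int]) -> Optional[float]:
    """FP / (FP + TN); returns None when there are no negative labels."""
    on_negatives = [pred for label, pred in zip(labels, predictions) if label == 0]
    if not on_negatives:
        return None
    return sum(on_negatives) / len(on_negatives)


def bootstrap_fpr_interval(
    labels: Sequence[int],
    scores: Sequence[float],
    threshold: float = 0.5,
    num_resamples: int = 1000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical helper: percentile-bootstrap interval for one slice's FPR."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    predictions = [1 if score >= threshold else 0 for score in scores]
    n = len(labels)
    estimates: List[float] = []
    for _ in range(num_resamples):
        # Resample examples with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        fpr = false_positive_rate([labels[i] for i in idx], [predictions[i] for i in idx])
        if fpr is not None:  # skip resamples that contain no negatives
            estimates.append(fpr)
    if not estimates:
        raise ValueError("slice has no negative examples; FPR is undefined")
    estimates.sort()
    tail = (1.0 - confidence) / 2.0
    last = len(estimates) - 1
    return estimates[int(tail * last)], estimates[int((1.0 - tail) * last)]


labels = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
scores = [0.2, 0.7, 0.1, 0.6, 0.9, 0.4, 0.8, 0.7, 0.3, 0.95]
print(bootstrap_fpr_interval(labels, scores))
```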
## [Examples](https://github.com/tensorflow/fairness-indicators/tree/master/g3doc/tutorials) directory contains several examples.
* [Fairness_Indicators_Example_Colab.ipynb](https://github.com/tensorflow/fairness-indicators/blob/master/g3doc/tutorials/Fairness_Indicators_Example_Colab.ipynb) gives an overview of Fairness Indicators in [TensorFlow Model Analysis](https://www.tensorflow.org/tfx/guide/tfma) and how to use it with a real dataset. This notebook also goes over [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started) and [What-If Tool](https://pair-code.github.io/what-if-tool/), two tools for analyzing TensorFlow models that are packaged with Fairness Indicators.
* [Fairness_Indicators_on_TF_Hub_Text_Embeddings.ipynb](https://github.com/tensorflow/fairness-indicators/blob/master/g3doc/tutorials/Fairness_Indicators_on_TF_Hub_Text_Embeddings.ipynb) demonstrates how to use Fairness Indicators to compare models trained on different [text embeddings](https://en.wikipedia.org/wiki/Word_embedding). This notebook uses text embeddings from [TensorFlow Hub](https://www.tensorflow.org/hub), TensorFlow's library to publish, discover, and reuse model components.
* [Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb](https://github.com/tensorflow/fairness-indicators/blob/master/g3doc/tutorials/Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb)
demonstrates how to visualize Fairness Indicators in TensorBoard.
## More questions?
For guidance on how to think about fairness evaluation in the context of your use case, see [Fairness Indicators: Thinking about Fairness Evaluation](https://github.com/tensorflow/fairness-indicators/blob/master/g3doc/guide/guidance.md).
If you have found a bug in Fairness Indicators, please file a [GitHub issue](https://github.com/tensorflow/fairness-indicators/issues/new/choose) with as much supporting information as you can provide.
## Compatible versions
The following table shows the package versions that are
compatible with each other. This is determined by our testing framework, but
other *untested* combinations may also work.
|fairness-indicators | tensorflow | tensorflow-data-validation | tensorflow-model-analysis |
|-------------------------------------------------------------------------------------------|--------------------|----------------------------|---------------------------|
|[GitHub master](https://github.com/tensorflow/fairness-indicators/blob/master/RELEASE.md) | nightly (1.x/2.x) | 1.17.0 | 0.48.0 |
|[v0.48.0](https://github.com/tensorflow/fairness-indicators/blob/v0.48.0/RELEASE.md) | 2.17 | 1.17.0 | 0.48.0 |
|[v0.47.0](https://github.com/tensorflow/fairness-indicators/blob/v0.47.0/RELEASE.md) | 2.16 | 1.16.1 | 0.47.1 |
|[v0.46.0](https://github.com/tensorflow/fairness-indicators/blob/v0.46.0/RELEASE.md) | 2.15 | 1.15.1 | 0.46.0 |
|[v0.44.0](https://github.com/tensorflow/fairness-indicators/blob/v0.44.0/RELEASE.md) | 2.12 | 1.13.0 | 0.44.0 |
|[v0.43.0](https://github.com/tensorflow/fairness-indicators/blob/v0.43.0/RELEASE.md) | 2.11 | 1.12.0 | 0.43.0 |
|[v0.42.0](https://github.com/tensorflow/fairness-indicators/blob/v0.42.0/RELEASE.md) | 1.15.5 / 2.10 | 1.11.0 | 0.42.0 |
|[v0.41.0](https://github.com/tensorflow/fairness-indicators/blob/v0.41.0/RELEASE.md) | 1.15.5 / 2.9 | 1.10.0 | 0.41.0 |
|[v0.40.0](https://github.com/tensorflow/fairness-indicators/blob/v0.40.0/RELEASE.md) | 1.15.5 / 2.9 | 1.9.0 | 0.40.0 |
|[v0.39.0](https://github.com/tensorflow/fairness-indicators/blob/v0.39.0/RELEASE.md) | 1.15.5 / 2.8 | 1.8.0 | 0.39.0 |
|[v0.38.0](https://github.com/tensorflow/fairness-indicators/blob/v0.38.0/RELEASE.md) | 1.15.5 / 2.8 | 1.7.0 | 0.38.0 |
|[v0.37.0](https://github.com/tensorflow/fairness-indicators/blob/v0.37.0/RELEASE.md) | 1.15.5 / 2.7 | 1.6.0 | 0.37.0 |
|[v0.36.0](https://github.com/tensorflow/fairness-indicators/blob/v0.36.0/RELEASE.md) | 1.15.2 / 2.7 | 1.5.0 | 0.36.0 |
|[v0.35.0](https://github.com/tensorflow/fairness-indicators/blob/v0.35.0/RELEASE.md) | 1.15.2 / 2.6 | 1.4.0 | 0.35.0 |
|[v0.34.0](https://github.com/tensorflow/fairness-indicators/blob/v0.34.0/RELEASE.md) | 1.15.2 / 2.6 | 1.3.0 | 0.34.0 |
|[v0.33.0](https://github.com/tensorflow/fairness-indicators/blob/v0.33.0/RELEASE.md) | 1.15.2 / 2.5 | 1.2.0 | 0.33.0 |
|[v0.30.0](https://github.com/tensorflow/fairness-indicators/blob/v0.30.0/RELEASE.md) | 1.15.2 / 2.4 | 0.30.0 | 0.30.0 |
|[v0.29.0](https://github.com/tensorflow/fairness-indicators/blob/v0.29.0/RELEASE.md) | 1.15.2 / 2.4 | 0.29.0 | 0.29.0 |
|[v0.28.0](https://github.com/tensorflow/fairness-indicators/blob/v0.28.0/RELEASE.md) | 1.15.2 / 2.4 | 0.28.0 | 0.28.0 |
|[v0.27.0](https://github.com/tensorflow/fairness-indicators/blob/v0.27.0/RELEASE.md) | 1.15.2 / 2.4 | 0.27.0 | 0.27.0 |
|[v0.26.0](https://github.com/tensorflow/fairness-indicators/blob/v0.26.0/RELEASE.md) | 1.15.2 / 2.3 | 0.26.0 | 0.26.0 |
|[v0.25.0](https://github.com/tensorflow/fairness-indicators/blob/v0.25.0/RELEASE.md) | 1.15.2 / 2.3 | 0.25.0 | 0.25.0 |
|[v0.24.0](https://github.com/tensorflow/fairness-indicators/blob/v0.24.0/RELEASE.md) | 1.15.2 / 2.3 | 0.24.0 | 0.24.0 |
|[v0.23.0](https://github.com/tensorflow/fairness-indicators/blob/v0.23.0/RELEASE.md) | 1.15.2 / 2.3 | 0.23.0 | 0.23.0 |
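
If you prefer to pin one of the tested combinations explicitly, the table can be turned into pip requirement strings. The sketch below is a hypothetical helper (`COMPATIBLE` and `pip_requirements` are not part of the fairness-indicators package) that copies three rows from the table by hand. It assumes the TensorFlow column names a minor series, so it pins `tensorflow==2.17.*` rather than an exact patch release.

```python
# Hypothetical helper: a hand-copied subset of the compatibility table above.
COMPATIBLE = {
    "0.48.0": {"tensorflow": "2.17", "tensorflow-data-validation": "1.17.0",
               "tensorflow-model-analysis": "0.48.0"},
    "0.47.0": {"tensorflow": "2.16", "tensorflow-data-validation": "1.16.1",
               "tensorflow-model-analysis": "0.47.1"},
    "0.46.0": {"tensorflow": "2.15", "tensorflow-data-validation": "1.15.1",
               "tensorflow-model-analysis": "0.46.0"},
}


def pip_requirements(fi_version):
    """Return pip requirement strings for a tested fairness-indicators combination."""
    if fi_version not in COMPATIBLE:
        raise ValueError(f"no tested combination recorded for {fi_version}")
    reqs = [f"fairness-indicators=={fi_version}"]
    for package, version in COMPATIBLE[fi_version].items():
        # The table gives TensorFlow as a minor series (e.g. "2.17"), so allow
        # any patch release of it; the other packages are pinned exactly.
        suffix = ".*" if package == "tensorflow" else ""
        reqs.append(f"{package}=={version}{suffix}")
    return reqs


print(" ".join(pip_requirements("0.48.0")))
# fairness-indicators==0.48.0 tensorflow==2.17.* tensorflow-data-validation==1.17.0 tensorflow-model-analysis==0.48.0
```

The printed string can be passed to `pip install`; the table itself remains the source of truth for which combinations were tested.
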
================================================
FILE: RELEASE.md
================================================
# Current Version (Still in Development)
## Major Features and Improvements
## Bug Fixes and Other Changes
## Breaking Changes
## Deprecations
# Version 0.48.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.17,<2.18`.
* Depends on `tensorflow-data-validation>=1.17.0,<1.18.0`.
* Depends on `tensorflow-model-analysis>=0.48,<0.49`.
* Depends on `protobuf>=4.21.6,<6.0.0`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.47.0
## Major Features and Improvements
* Add fairness indicator metrics in the third_party library.
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.16,<2.17`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.46.0
## Major Features and Improvements
* Update example model to use Keras models instead of estimators.
## Bug Fixes and Other Changes
* N/A
## Breaking Changes
* N/A
## Deprecations
* Deprecated Python 3.8 support.
# Version 0.44.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.12.0,<2.13`.
* Depends on `tensorflow-data-validation>=1.13.0,<1.14.0`.
* Depends on `tensorflow-model-analysis>=0.44,<0.45`.
* Depends on `protobuf>=3.20.3,<5`.
## Breaking Changes
* N/A
## Deprecations
* Deprecating Python 3.7 support.
# Version 0.43.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.11,<2.12`.
* Depends on `tensorflow-data-validation>=1.11.0,<1.12.0`.
* Depends on `tensorflow-model-analysis>=0.42,<0.43`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.42.0
## Major Features and Improvements
* This is the last version that supports TensorFlow 1.15.x. TF 1.15.x support
will be removed in the next version. Please check the
[TF2 migration guide](https://www.tensorflow.org/guide/migrate) to migrate
to TF2.
## Bug Fixes and Other Changes
* N/A
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.41.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=1.10.0,<1.11.0`.
* Depends on `tensorflow-model-analysis>=0.41,<0.42`.
* Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,!=2.9.*,<3`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.40.0
## Major Features and Improvements
* Allow counterfactual metrics to be calculated from predictions instead of
only features.
* Add precision and recall to the set of fairness indicators metrics.
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=1.9.0,<1.10.0`.
* Depends on `tensorflow-model-analysis>=0.40,<0.41`.
* Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,<3`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.39.0
## Major Features and Improvements
* Allow counterfactual metrics to be calculated from predictions instead of
only features.
* Add precision and recall to the set of fairness indicators metrics.
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=1.8.0,<1.9.0`.
* Depends on `tensorflow-model-analysis>=0.39,<0.40`.
* Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.38.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=1.7.0,<1.8.0`.
* Depends on `tensorflow-model-analysis>=0.38,<0.39`.
* Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.37.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Fix Fairness Indicators UI bug with overlapping charts when comparing EvalResults
* Depends on `tensorflow-data-validation>=1.6.0,<1.7.0`.
* Depends on `tensorflow-model-analysis>=0.37,<0.38`.
* Depends on `tensorflow>=1.15.5,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.36.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=1.5.0,<1.6.0`.
* Depends on `tensorflow-model-analysis>=0.36,<0.37`.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,<3`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.35.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=1.4.0,<1.5.0`.
* Depends on `tensorflow-model-analysis>=0.35,<0.36`.
## Breaking Changes
* N/A
## Deprecations
* Deprecating Python 3.6 support.
# Version 0.34.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,<3`.
* Depends on `tensorflow-data-validation>=1.3.0,<1.4.0`.
* Depends on `tensorflow-model-analysis>=0.34,<0.35`.
## Breaking Changes
* Drop Py2 support.
## Deprecations
* N/A
# Version 0.33.0
## Major Features and Improvements
* Port Counterfactual Fairness metrics into the FI UI.
## Bug Fixes and Other Changes
* Improve rendering of HTML stubs for the Fairness Indicators UI.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<3`.
* Depends on `protobuf>=3.13,<4`.
* Depends on `tensorflow-data-validation>=1.2.0,<1.3.0`.
* Depends on `tensorflow-model-analysis>=0.33,<0.34`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.30.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.5.*,<3`.
* Depends on `tensorflow-data-validation>=0.30,<0.31`.
* Depends on `tensorflow-model-analysis>=0.30,<0.31`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.29.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=0.29,<0.30`.
* Depends on `tensorflow-model-analysis>=0.29,<0.30`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.28.0
## Major Features and Improvements
* In the Fairness Indicators UI, sort the metrics list to show common metrics
first.
* For lift, support negative values in the bar chart.
* Add two new metrics, Flip Count and Flip Rate, to evaluate Counterfactual
Fairness.
* Add Lift metrics under addons/fairness.
* Port Lift metrics into the FI UI.
## Bug Fixes and Other Changes
* Depends on `tensorflow-data-validation>=0.28,<0.29`.
* Depends on `tensorflow-model-analysis>=0.28,<0.29`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.27.0
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Added test cases for DLVM testing.
* Move the util files to a separate folder.
* Add `tensorflow-hub` as a dependency because it is used in
example_model.py.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<3`.
* Depends on `tensorflow-data-validation>=0.27,<0.28`.
* Depends on `tensorflow-model-analysis>=0.27,<0.28`.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.26.0
## Major Features and Improvements
* Sorting fairness metrics table rows to keep slices in order with slice drop
down in the UI.
## Bug fixes and other changes
* Update fairness_indicators.documentation.examples.util to TensorFlow 2.0.
* Table now displays 3 decimal places instead of 2.
* Fix a bug where the metric list would not refresh when the input eval result changed.
* Remove d3-tip dependency.
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.4.*,<3`.
* Depends on `tensorflow-data-validation>=0.26,<0.27`.
* Depends on `tensorflow-model-analysis>=0.26,<0.27`.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.25.0
## Major Features and Improvements
* Add workflow buttons to Fairness Indicators UI, providing tutorial on how to
configure metrics and parameters, and how to interpret the results.
* Add metric definitions as tooltips in the metric selector UI.
* Remove the prefix from metric names in graph titles in the UI.
* From this release Fairness Indicators will also be hosting nightly packages
on https://pypi-nightly.tensorflow.org. To install the nightly package use
the following command:
```
pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple fairness-indicators
```
Note: These nightly packages are unstable and breakages are likely to
happen. Depending on the complexity involved, it can often take a week or
more for fixed wheels to become available on the PyPI cloud service. You can
always use the stable version of Fairness Indicators available on PyPI by
running the command `pip install fairness-indicators`.
## Bug fixes and other changes
* Update table colors.
* Modify privacy note in Fairness Indicators UI.
* Depends on `tensorflow-data-validation>=0.25,<0.26`.
* Depends on `tensorflow-model-analysis>=0.25,<0.26`.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.24.0
## Major Features and Improvements
* Sorted the thresholds drop-down list in the Fairness Indicators UI.
## Bug fixes and other changes
* Fix an issue where the Sort menu was not hidden when there was no model
comparison.
* Depends on `tensorflow-data-validation>=0.24,<0.25`.
* Depends on `tensorflow-model-analysis>=0.24,<0.25`.
## Breaking changes
* N/A
## Deprecations
* Deprecated Py3.5 support.
# Version 0.23.1
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Fix broken import path in Fairness_Indicators_Example_Colab and Fairness_Indicators_on_TF_Hub_Text_Embeddings.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.23.0
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<3`.
* Depends on `tensorflow-data-validation>=0.23,<0.24`.
* Depends on `tensorflow-model-analysis>=0.23,<0.24`.
## Breaking changes
* N/A
## Deprecations
* Deprecating Py2 support.
* Note: We plan to drop py3.5 support in the next release.
================================================
FILE: docs/__init__.py
================================================
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
================================================
FILE: docs/guide/_index.yaml
================================================
book_path: /responsible_ai/_book.yaml
project_path: /responsible_ai/_project.yaml
title: Fairness Indicators
landing_page:
  custom_css_path: /site-assets/css/style.css
  nav: left
  meta_tags:
  - name: description
    content: >
      Fairness Indicators tool suite for TensorFlow.
  rows:
  - classname: devsite-landing-row-100
  - heading: Fairness Indicators
    options:
    - description-50
    items:
    - description: >
        Fairness Indicators is a library that enables easy computation of commonly-identified
        fairness metrics for binary and multiclass classifiers. With the Fairness Indicators tool
        suite, you can:
        Compute commonly-identified fairness metrics for classification models
        Compare model performance across subgroups to a baseline, or to other models
        Use confidence intervals to surface statistically significant disparities
  - classname: devsite-landing-row-cards
    items:
    - heading: "ML Practicum: Fairness in Perspective API using Fairness Indicators"
      image_path: /responsible_ai/fairness_indicators/images/mlpracticum.png
      path: "https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=github&utm_medium=github&utm_campaign=fi-practicum&utm_term=&utm_content=repo-body"
      buttons:
      - label: "Try the Case Study"
        path: "https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=github&utm_medium=github&utm_campaign=fi-practicum&utm_term=&utm_content=repo-body"
    - heading: "Fairness Indicators on the TensorFlow blog"
      image_path: /resources/images/tf-logo-card-16x9.png
      path: https://blog.tensorflow.org/2019/12/fairness-indicators-fair-ML-systems.html
      buttons:
      - label: "Read on the TensorFlow blog"
        path: https://blog.tensorflow.org/2019/12/fairness-indicators-fair-ML-systems.html
    - heading: "Fairness Indicators on GitHub"
      image_path: /resources/images/github-card-16x9.png
      path: https://github.com/tensorflow/fairness-indicators
      buttons:
      - label: "View on GitHub"
        path: https://github.com/tensorflow/fairness-indicators
  - classname: devsite-landing-row-cards
    items:
    - heading: "Fairness Indicators on the Google AI Blog"
      image_path: /responsible_ai/fairness_indicators/images/googleai.png
      path: https://ai.googleblog.com/2019/12/fairness-indicators-scalable.html
      buttons:
      - label: "Read on Google AI blog"
        path: https://ai.googleblog.com/2019/12/fairness-indicators-scalable.html
    - heading: "Fairness Indicators at Google I/O"
      path: https://www.youtube.com/watch?v=6CwzDoE8J4M
      youtube_id: 6CwzDoE8J4M?rel=0&show_info=0
      buttons:
      - label: "Watch the video"
        path: https://www.youtube.com/watch?v=6CwzDoE8J4M
================================================
FILE: docs/guide/_toc.yaml
================================================
toc:
- title: Overview
  path: /responsible_ai/fairness_indicators/guide/
- title: Thinking about fairness evaluation
  path: /responsible_ai/fairness_indicators/guide/guidance
================================================
FILE: docs/guide/guidance.md
================================================
# Fairness Indicators: Thinking about Fairness Evaluation
Fairness Indicators is a useful tool for evaluating _binary_ and _multi-class_
classifiers for fairness. Eventually, we hope to expand this tool, in
partnership with all of you, to evaluate even more considerations.
Keep in mind that quantitative evaluation is only one part of evaluating a
broader user experience. Start by thinking about the different _contexts_
through which a user may experience your product. Who are the different types of
users your product is expected to serve? Who else may be affected by the
experience?
When considering AI's impact on people, it is important to always remember that
human societies are extremely complex! Understanding people, their social
identities, social structures, and cultural systems: each of these is a huge
field of open research in its own right. Throw in the complexities of
cross-cultural differences around the globe, and getting even a foothold on
understanding societal impact can be challenging. Whenever possible, we
recommend that you consult with appropriate domain experts, which may include
social scientists, sociolinguists, and cultural anthropologists, as well as with
members of the populations on which the technology will be deployed.
A single model, such as the toxicity model that we leverage in the
[example colab](../../tutorials/Fairness_Indicators_Example_Colab),
can be used in many different contexts. A toxicity model deployed on a website
to filter offensive comments, for example, is a very different use case from the
same model deployed in an example web UI where users can type in a sentence and
see what score the model gives. Depending on the use case and how users
experience the model's predictions, your product will have different risks,
effects, and opportunities, and you may want to evaluate for different fairness
concerns.
The questions above are the foundation of what ethical considerations, including
fairness, you may want to take into account when designing and developing your
ML-based product. These questions also motivate which metrics and which groups
of users you should use the tool to evaluate.
Before diving in further, here are three recommended resources for getting
started:
* **[The People + AI Guidebook](https://pair.withgoogle.com/) for
Human-centered AI design:** This guidebook is a great resource for the
questions and aspects to keep in mind when designing a machine-learning
based product. While we created this guidebook with designers in mind, many
of the principles will help answer questions like the one posed above.
* **[Our Fairness Lessons Learned](https://www.youtube.com/watch?v=6CwzDoE8J4M):**
This talk at Google I/O discusses lessons we have learned in our goal to
build and design inclusive products.
* **[ML Crash Course: Fairness](https://developers.google.com/machine-learning/crash-course/fairness/video-lecture):**
The ML Crash Course has a 70-minute section dedicated to identifying and
evaluating fairness concerns.
So, why look at individual slices? Evaluation over individual slices is
important as strong overall metrics can obscure poor performance for certain
groups. Similarly, performing well for a certain metric (accuracy, AUC) doesn’t
always translate to acceptable performance for other metrics (false positive
rate, false negative rate) that are equally important in assessing opportunity
and harm for users.
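
To make this concrete, here is a small pure-Python sketch with made-up data in which overall accuracy looks acceptable while one small slice is always wrong. `accuracy_by_slice` is a hypothetical helper for illustration only; in practice Fairness Indicators computes sliced metrics for you through TensorFlow Model Analysis.

```python
from collections import defaultdict


def accuracy_by_slice(labels, preds, groups):
    """Return overall accuracy and accuracy for each slice value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    overall = sum(correct.values()) / sum(total.values())
    per_slice = {g: correct[g] / total[g] for g in total}
    return overall, per_slice


# Made-up data: group "b" is a small minority that the model always gets wrong.
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
groups = ["a"] * 8 + ["b"] * 2
overall, per_slice = accuracy_by_slice(labels, preds, groups)
print(overall, per_slice)  # 0.8 {'a': 1.0, 'b': 0.0}
```

An 80% overall accuracy hides the fact that every example in slice `b` is misclassified.
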
The below sections will walk through some of the aspects to consider.
## Which groups should I slice by?
In general, a good practice is to slice by as many groups as may be affected by
your product, since you never know when performance might differ for one group
compared with the others. However, if you aren’t sure, think about the different users who may be
engaging with your product, and how they might be affected. Consider,
especially, slices related to sensitive characteristics such as race, ethnicity,
gender, nationality, income, sexual orientation, and disability status.
**What if I don’t have data labeled for the slices I want to investigate?**
Good question. We know that many datasets don’t have ground-truth labels for
individual identity attributes.
If you find yourself in this position, we recommend a few approaches:
1. Identify whether there _are_ attributes you have that may give you some
insight into the performance across groups. For example, _geography_, while
not equivalent to ethnicity & race, may help you uncover disparate
patterns in performance.
1. Identify if there are representative public datasets that might map well to
your problem. You can find a range of diverse and inclusive datasets on the
[Google AI site](https://ai.google/responsibilities/responsible-ai-practices/?category=fairness),
which include
[Project Respect](https://www.blog.google/technology/ai/fairness-matters-promoting-pride-and-respect-ai/),
[Inclusive Images](https://www.kaggle.com/c/inclusive-images-challenge), and
[Open Images Extended](https://ai.google/tools/datasets/open-images-extended-crowdsourced/),
among others.
1. Leverage rules or classifiers, when relevant, to label your data with
objective surface-level attributes. For example, you can label text according
to whether an identity term appears _in_ the sentence. Keep in mind
that classifiers have their own challenges and, if you’re not careful, may
introduce another layer of bias as well. Be clear about what your classifier
is actually classifying. For
example, an age classifier on images is in fact classifying _perceived age_.
Additionally, when possible, leverage surface-level attributes that _can_ be
objectively identified in the data. For example, it is ill-advised to build
an image classifier for race or ethnicity, because these are not visual
traits that can be defined in an image. A classifier would likely pick up on
proxies or stereotypes. Instead, building a classifier for skin tone may be
a more appropriate way to label and evaluate an image. Lastly, ensure high
accuracy for classifiers labeling such attributes.
1. Find more representative data that is labeled.
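
As an illustration of the rule-based labeling approach above, the sketch below tags text with identity categories by whole-word matching against a term list. `IDENTITY_TERMS` and `identity_labels` are hypothetical and deliberately tiny; a real term list needs review by domain experts and affected communities, and a labeler like this only records whether a term appears in the text, not anything about its author.

```python
import re

# Hypothetical, deliberately tiny term lists for illustration only.
IDENTITY_TERMS = {
    "religion": {"christian", "muslim", "jewish"},
    "gender": {"woman", "man", "nonbinary"},
}


def identity_labels(text):
    """Return, sorted, the categories with at least one term appearing as a whole word."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, terms in IDENTITY_TERMS.items() if tokens & terms)


print(identity_labels("I am a proud Muslim woman."))  # ['gender', 'religion']
print(identity_labels("The German team won."))        # [] ("man" is not a whole word here)
```

Matching whole tokens rather than substrings avoids false hits such as "man" inside "German", one small example of the care such rules need.
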
**Always make sure to evaluate on multiple, diverse datasets.**
If your evaluation data is not adequately representative of your user base, or
the types of data likely to be encountered, you may end up with deceptively good
fairness metrics. Similarly, high model performance on one dataset doesn’t
guarantee high performance on others.
**Keep in mind subgroups aren’t always the best way to classify individuals.**
People are multidimensional and belong to more than one group, even within a
single dimension: consider someone who is multiracial, or belongs to multiple
racial groups. Also, while overall metrics for a given racial group may look
equitable, particular interactions, such as race and gender together, may show
unintended bias. Moreover, many subgroups have fuzzy boundaries which are
constantly being redrawn.
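
The sketch below uses made-up data and two hypothetical attributes (`attr1`, `attr2`) to show how every single-attribute slice can look identical while the intersectional slices differ sharply. It is an illustration only, not how Fairness Indicators stores data; in TensorFlow Model Analysis, listing several feature keys in one slicing spec produces such crossed slices.

```python
from collections import defaultdict
from itertools import product


def accuracy_by_keys(rows, keys):
    """Accuracy for each combination of values of the given slice keys."""
    hits, counts = defaultdict(int), defaultdict(int)
    for row in rows:
        slice_value = tuple(row[k] for k in keys)
        counts[slice_value] += 1
        hits[slice_value] += int(row["label"] == row["pred"])
    return {s: hits[s] / counts[s] for s in counts}


# Made-up rows: the model is correct only for ("x", "p") and ("y", "q").
rows = [
    {"attr1": a, "attr2": b, "label": 1,
     "pred": int((a, b) in {("x", "p"), ("y", "q")})}
    for a, b in product("xy", "pq")
    for _ in range(5)
]
print(accuracy_by_keys(rows, ["attr1"]))           # {('x',): 0.5, ('y',): 0.5}
print(accuracy_by_keys(rows, ["attr2"]))           # {('p',): 0.5, ('q',): 0.5}
print(accuracy_by_keys(rows, ["attr1", "attr2"]))  # {('x', 'p'): 1.0, ('x', 'q'): 0.0, ('y', 'p'): 0.0, ('y', 'q'): 1.0}
```
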
**When have I tested enough slices, and how do I know which slices to test?**
We acknowledge that there are a vast number of groups or slices that may be
relevant to test, and when possible, we recommend slicing and evaluating a
diverse and wide range of slices and then deep-diving where you spot
opportunities for improvement. It is also important to acknowledge that even
though you may not see concerns on slices you have tested, that doesn’t imply
that your product works for _all_ users, and getting diverse user feedback and
testing is important to ensure that you are continually identifying new
opportunities.
To get started, we recommend thinking through your particular use case and the
different ways users may engage with your product. How might different users
have different experiences? What does that mean for slices you should evaluate?
Collecting feedback from diverse users may also highlight potential slices to
prioritize.
## Which metrics should I choose?
When selecting which metrics to evaluate for your system, consider who will be
experiencing your model, how it will be experienced, and the effects of that
experience.
For example, how does your model give people more dignity or autonomy, or
positively impact their emotional, physical or financial wellbeing? In contrast,
how could your model’s predictions reduce people's dignity or autonomy, or
negatively impact their emotional, physical or financial wellbeing?
**In general, we recommend slicing _all your existing performance metrics_ as
good practice. We also recommend evaluating your metrics across
_multiple thresholds_** in order
to understand how the threshold can affect the performance for different groups.
In addition, if there is a predicted label which is uniformly “good” or “bad”,
then consider reporting (for each subgroup) the rate at which that label is
predicted. For example, a “good” label would be a label whose prediction grants
a person access to some resource, or enables them to perform some action.
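
A minimal sketch of evaluating at multiple thresholds: for each threshold, compute, per group, the rate at which the “good” (positive) label is predicted. The data are made up, the function name is hypothetical, and the sketch assumes a score at or above the threshold counts as a positive prediction. With these numbers the two groups look identical at 0.25 but diverge at 0.5.

```python
def positive_rate_by_threshold(scores, groups, thresholds):
    """For each threshold, the fraction of each group's examples scored at or above it."""
    result = {}
    for t in thresholds:
        per_group = {}
        for g in dict.fromkeys(groups):  # unique groups in first-seen order
            members = [s for s, gg in zip(scores, groups) if gg == g]
            per_group[g] = sum(s >= t for s in members) / len(members)
        result[t] = per_group
    return result


scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.45, 0.3, 0.1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
for t, rates in positive_rate_by_threshold(scores, groups, [0.25, 0.5, 0.75]).items():
    print(t, rates)
# 0.25 {'a': 0.75, 'b': 0.75}
# 0.5 {'a': 0.5, 'b': 0.25}
# 0.75 {'a': 0.25, 'b': 0.0}
```
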
## Critical fairness metrics for classification
When thinking about a classification model, think about the effects of _errors_
(the differences between the actual “ground truth” label, and the label from the
model). If some errors may pose more opportunity or harm to your users, make
sure you evaluate the rates of these errors across groups of users. These error
rates are defined below, in the metrics currently supported by the Fairness
Indicators beta.
**Over the course of the next year, we hope to release case studies of different
use cases and the metrics associated with these so that we can better highlight
when different metrics might be most appropriate.**
**Metrics available today in Fairness Indicators**
Note: There are many valuable fairness metrics that are not currently supported
in the Fairness Indicators beta. As we add more metrics, we will continue to
add guidance for them here. Below, you can access
instructions to add your own metrics to Fairness Indicators. Additionally,
please reach out to [tfx@tensorflow.org](mailto:tfx@tensorflow.org) if there are
metrics that you would like to see. We hope to partner with you to build this
out further.
**Positive Rate / Negative Rate**
* _Definition:_ The percentage
of data points that are classified as positive or negative, independent of
ground truth
* _Relates to:_ Demographic
Parity and Equality of Outcomes, when equal across subgroups
* _When to use this metric:_
Fairness use cases where having equal final percentages of groups is
important
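
In code, positive rate and negative rate need only the predicted labels and the slice of each example; ground truth is not used. A minimal pure-Python sketch with made-up data (the function name is hypothetical):

```python
def positive_negative_rate(preds, groups):
    """Per-group share of predictions that are positive (1) and negative (0)."""
    out = {}
    for g in dict.fromkeys(groups):  # unique groups in first-seen order
        p = [pred for pred, gg in zip(preds, groups) if gg == g]
        pr = sum(p) / len(p)
        out[g] = {"positive_rate": pr, "negative_rate": 1 - pr}
    return out


preds = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_negative_rate(preds, groups))
# {'a': {'positive_rate': 0.5, 'negative_rate': 0.5}, 'b': {'positive_rate': 0.25, 'negative_rate': 0.75}}
```

Demographic parity would require the positive rates of `a` and `b` to be equal, which they are not here.
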
**True Positive Rate / False Negative Rate**
* _Definition:_ The percentage
of positive data points (as labeled in the ground truth) that are
_correctly_ classified as positive, or the percentage of positive data
points that are _incorrectly_ classified as negative
* _Relates to:_ Equality of
Opportunity (for the positive class), when equal across subgroups
* _When to use this metric:_
Fairness use cases where it is important that the same % of qualified
candidates are rated positive in each group. This is most commonly
recommended in cases of classifying positive outcomes, such as loan
applications, school admissions, or whether content is kid-friendly
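
TPR and FNR are computed only over ground-truth positives, so they sum to 1 within each slice. A minimal sketch with made-up data (the function name is hypothetical); it returns `None` for a slice with no positives, where the rates are undefined:

```python
def tpr_fnr(labels, preds, groups):
    """Per-group TPR = TP / (TP + FN) and FNR = FN / (TP + FN)."""
    out = {}
    for g in dict.fromkeys(groups):
        # Predictions for the ground-truth positives of this group only.
        pos_preds = [p for y, p, gg in zip(labels, preds, groups) if gg == g and y == 1]
        if not pos_preds:
            out[g] = None  # undefined: no ground-truth positives in this slice
            continue
        tpr = sum(pos_preds) / len(pos_preds)
        out[g] = {"tpr": tpr, "fnr": 1 - tpr}
    return out


labels = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
preds = [1, 1, 1, 0, 0, 1, 0, 0, 0, 1]
groups = ["a"] * 5 + ["b"] * 5
print(tpr_fnr(labels, preds, groups))
# {'a': {'tpr': 0.75, 'fnr': 0.25}, 'b': {'tpr': 0.25, 'fnr': 0.75}}
```
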
**True Negative Rate / False Positive Rate**
* _Definition:_ The percentage
of negative data points (as labeled in the ground truth) that are correctly
classified as negative, or the percentage of negative data points that are
incorrectly classified as positive
* _Relates to:_ Equality of
Opportunity (for the negative class), when equal across subgroups
* _When to use this metric:_
Fairness use cases where error rates (that is, misclassifying something as
positive) are more concerning than correctly classifying the positives. This is
most common in abuse cases, where _positives_ often lead to negative actions.
These are also important for Facial Analysis Technologies such as face
detection or face attributes.
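
Symmetrically, TNR and FPR are computed only over ground-truth negatives. A minimal sketch with made-up data, framed as an assumed abuse classifier where a positive prediction triggers an action against the content (the function name is hypothetical):

```python
def tnr_fpr(labels, preds, groups):
    """Per-group TNR = TN / (TN + FP) and FPR = FP / (TN + FP)."""
    out = {}
    for g in dict.fromkeys(groups):
        # Predictions for the ground-truth negatives of this group only.
        neg_preds = [p for y, p, gg in zip(labels, preds, groups) if gg == g and y == 0]
        if not neg_preds:
            out[g] = None  # undefined: no ground-truth negatives in this slice
            continue
        fpr = sum(neg_preds) / len(neg_preds)
        out[g] = {"tnr": 1 - fpr, "fpr": fpr}
    return out


# Made-up data: benign content from group "b" is wrongly flagged twice as often.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
preds = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ["a"] * 5 + ["b"] * 5
print(tnr_fpr(labels, preds, groups))
# {'a': {'tnr': 0.75, 'fpr': 0.25}, 'b': {'tnr': 0.5, 'fpr': 0.5}}
```
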
Note: When both “positive” and “negative” mistakes are equally important, the
relevant fairness criterion is called “equality of
odds”. This can be measured by
evaluating and aiming for equality across both the TNR & FNR, or both the TPR &
FPR. For example, for an app that counts how many cars go past a stop sign,
accidentally including an extra car (a false positive) is roughly as bad as
accidentally excluding a car (a false negative).
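
One simple way to summarize equality of odds is to report the largest between-group difference in TPR and in FPR; both gaps are 0 when the criterion holds exactly. This gap summary is one common convention, not a metric Fairness Indicators reports under this name, and the sketch assumes every group contains both positive and negative examples. In the made-up data, FPR is equal across groups but TPR is not, so equality of odds does not hold.

```python
def equalized_odds_gaps(labels, preds, groups):
    """Largest between-group difference in TPR and in FPR (0.0 means equal)."""
    tpr, fpr = {}, {}
    for g in dict.fromkeys(groups):
        rows = [(y, p) for y, p, gg in zip(labels, preds, groups) if gg == g]
        pos = [p for y, p in rows if y == 1]  # assumes at least one positive
        neg = [p for y, p in rows if y == 0]  # assumes at least one negative
        tpr[g] = sum(pos) / len(pos)
        fpr[g] = sum(neg) / len(neg)
    return {
        "tpr_gap": max(tpr.values()) - min(tpr.values()),
        "fpr_gap": max(fpr.values()) - min(fpr.values()),
    }


labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
print(equalized_odds_gaps(labels, preds, groups))  # {'tpr_gap': 0.5, 'fpr_gap': 0.0}
```
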
**Accuracy & AUC**
* _Relates to:_ Predictive
Parity, when equal across subgroups
* _When to use these metrics:_
Cases where precision of the task is most critical (not necessarily in a
given direction), such as face identification or face clustering
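
A sketch computing per-group accuracy (at an assumed 0.5 threshold) and AUC. Here AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, counting ties as one half, which equals the area under the ROC curve. TensorFlow Model Analysis may approximate AUC from a finite set of thresholds instead, so numbers can differ slightly. The function name is hypothetical.

```python
def accuracy_and_auc(labels, scores, groups, threshold=0.5):
    """Per-group accuracy at `threshold` and pairwise (Mann-Whitney) AUC."""
    out = {}
    for g in dict.fromkeys(groups):
        rows = [(y, s) for y, s, gg in zip(labels, scores, groups) if gg == g]
        acc = sum((s >= threshold) == bool(y) for y, s in rows) / len(rows)
        pos = [s for y, s in rows if y == 1]
        neg = [s for y, s in rows if y == 0]
        # 1 if the positive outranks the negative, 0.5 for a tie, else 0.
        pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
        out[g] = {"accuracy": acc, "auc": sum(pairs) / len(pairs)}
    return out


labels = [1, 1, 0, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1, 0.6, 0.4, 0.7, 0.2]
groups = ["a"] * 4 + ["b"] * 4
print(accuracy_and_auc(labels, scores, groups))
# {'a': {'accuracy': 1.0, 'auc': 1.0}, 'b': {'accuracy': 0.5, 'auc': 0.5}}
```
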
**False Discovery Rate**
* _Definition:_ The percentage
of negative data points (as labeled in the ground truth) that are
incorrectly classified as positive out of all data points classified as
positive. It is the complement of PPV (precision): `FDR = 1 - PPV`
* _Relates to:_ Predictive
Parity (also known as Calibration), when equal across subgroups
* _When to use this metric:_
Cases where the fraction of correct positive predictions should be equal
across subgroups
**False Omission Rate**
* _Definition:_ The percentage
of positive data points (as labeled in the ground truth) that are
incorrectly classified as negative out of all data points classified as
negative. It is the complement of NPV: `FOR = 1 - NPV`
* _Relates to:_ Predictive
Parity (also known as Calibration), when equal across subgroups
* _When to use this metric:_
Cases where the fraction of correct negative predictions should be equal
across subgroups
Note: When used together, False Discovery Rate and False Omission Rate relate to
Conditional Use Accuracy Equality, which holds when FDR and FOR are both equal
across subgroups. FDR and FOR are also similar to FPR and FNR. FDR/FOR compare
FP/FN to predicted positive/negative data points, whereas FPR/FNR compare FP/FN
to ground truth negative/positive data points. FDR/FOR can be used instead of
FPR/FNR when predictive parity is more critical than equality of opportunity.
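The difference comes down to the denominators. The small sketch below uses a hypothetical helper in plain Python to derive all four rates from one confusion matrix. FDR and FOR divide by the number of *predicted* positives and negatives. FPR and FNR divide by the number of *ground-truth* negatives and positives.

```python
import math


def error_rates(tp, fp, tn, fn):
    """FDR/FOR and FPR/FNR from one confusion matrix."""
    def ratio(num, den):
        return num / den if den else math.nan

    return {
        # Denominators are predicted positives / predicted negatives.
        "fdr": ratio(fp, tp + fp),  # equals 1 - PPV (precision)
        "for": ratio(fn, fn + tn),  # equals 1 - NPV
        # Denominators are ground-truth negatives / ground-truth positives.
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
    }


print(error_rates(tp=40, fp=10, tn=30, fn=20))
# {'fdr': 0.2, 'for': 0.4, 'fpr': 0.25, 'fnr': 0.3333333333333333}
```

The same 10 false positives give an FDR of 0.2 but an FPR of 0.25. Two subgroups can therefore be equal on one pair of rates and unequal on the other.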
**Overall Flip Rate / Positive to Negative Prediction Flip Rate / Negative to
Positive Prediction Flip Rate**
* *Definition:* The
probability that the classifier gives a different prediction if the identity
attribute in a given example were changed.
* *Relates to:* Counterfactual
fairness
* *When to use this metric:*
When determining whether the model’s prediction changes when the sensitive
attributes referenced in the example are removed or replaced. If it does,
consider using the Counterfactual Logit Pairing technique within the
TensorFlow Model Remediation library.
**Flip Count / Positive to Negative Prediction Flip Count / Negative to Positive
Prediction Flip Count**
* *Definition:* The number of
times the classifier gives a different prediction if the identity term in a
given example were changed.
* *Relates to:* Counterfactual
fairness
* *When to use this metric:*
When determining whether the model’s prediction changes when the sensitive
attributes referenced in the example are removed or replaced. If it does,
consider using the Counterfactual Logit Pairing technique within the
TensorFlow Model Remediation library.
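The following is a minimal sketch of the flip metrics. It assumes you already have a model score for each original example and for its counterfactual copy, for example the same comment with one identity term swapped for another. How you generate those counterfactuals is outside this sketch. The function name, the 0.5 threshold, and the choice to divide every flip rate by the total number of examples are assumptions. They are not the definitions used by any particular library.

```python
def flip_metrics(original_scores, counterfactual_scores, threshold=0.5):
    """Count and rate prediction flips between paired original/counterfactual scores."""
    if len(original_scores) != len(counterfactual_scores):
        raise ValueError("each original example needs exactly one counterfactual")
    pos_to_neg = neg_to_pos = 0
    for orig, cf in zip(original_scores, counterfactual_scores):
        orig_pos, cf_pos = orig >= threshold, cf >= threshold
        if orig_pos and not cf_pos:
            pos_to_neg += 1
        elif cf_pos and not orig_pos:
            neg_to_pos += 1
    n = len(original_scores)
    flips = pos_to_neg + neg_to_pos

    def rate(k):
        # Assumption: every rate uses the total number of pairs as its denominator.
        return k / n if n else float("nan")

    return {
        "flip_count": flips,
        "pos_to_neg_flip_count": pos_to_neg,
        "neg_to_pos_flip_count": neg_to_pos,
        "flip_rate": rate(flips),
        "pos_to_neg_flip_rate": rate(pos_to_neg),
        "neg_to_pos_flip_rate": rate(neg_to_pos),
    }


original = [0.9, 0.2, 0.7, 0.4]
counterfactual = [0.3, 0.6, 0.8, 0.1]
print(flip_metrics(original, counterfactual))
# Two flips (one in each direction) out of four pairs, so flip_rate is 0.5.
```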
**Examples of which metrics to select**
* _Systematically failing to detect faces in a camera app can lead to a
negative user experience for certain user groups._ In this case, false
negatives in a face detection system may lead to product failure, while a
false positive (detecting a face when there isn’t one) may pose a slight
annoyance to the user. Thus, evaluating and minimizing the false negative
rate is important for this use case.
* _Unfairly marking text comments from certain people as “spam” or “high
toxicity” in a moderation system leads to certain voices being silenced._ On
one hand, a high false positive rate leads to unfair censorship. On the
other, a high false negative rate could lead to a proliferation of toxic
content from certain groups, which may both harm the user and constitute a
representational harm for those groups. Thus, both metrics are important to
consider, in addition to metrics which take into account all types of errors
such as accuracy or AUC.
**Don’t see the metrics you’re looking for?**
Follow the
[TFMA post-export metrics documentation](https://tensorflow.github.io/model-analysis/post_export_metrics/)
to add your own custom metric.
## Final notes
**A gap in metric between two groups can be a sign that your model may have
unfair skews**. You should interpret your results according to your use case.
However, the first sign that you may be treating one set of users _unfairly_ is
when the metrics for that set of users differ significantly from the metrics for
your overall population. Make sure to account for confidence intervals when
looking at these differences. When a slice has too few samples, the difference
between metrics may not be reliable.
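A binomial confidence interval is one simple way to see how sample size affects the reliability of a slice metric. The sketch below computes an approximate 95% Wilson score interval in plain Python. Fairness Indicators computes its own confidence intervals, so this is only an illustration of why small slices deserve caution. The helper name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a rate such as FPR or TPR (z=1.96 is ~95%)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


print(wilson_interval(2, 10))      # 20% rate, wide interval (about 0.06 to 0.51)
print(wilson_interval(200, 1000))  # same 20% rate, much narrower (about 0.18 to 0.23)
```

Both slices report the same 20% rate. Only the larger slice pins it down tightly enough for a comparison against the overall population to mean much.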
**Achieving equality across groups on Fairness Indicators doesn’t mean the model
is fair.** Systems are highly complex, and achieving equality on one (or even
all) of the provided metrics can’t guarantee fairness.
**Fairness evaluations should be run throughout the development process and
post-launch (not the day before launch).** Just like improving your product is
an ongoing process and subject to adjustment based on user and market feedback,
making your product fair and equitable requires ongoing attention. As different
aspects of the model change, such as training data, inputs from other models,
or the design itself, fairness metrics are likely to change. “Clearing the bar”
once isn’t enough to ensure that all of the interacting components have remained
intact over time.
**Adversarial testing should be performed for rare, malicious examples.**
Fairness evaluations aren’t meant to replace adversarial testing. Additional
defense against rare, targeted examples is crucial as these examples probably
will not manifest in training or evaluation data.
================================================
FILE: docs/index.md
================================================
# Fairness Indicators
/// html | div[style='float: left; width: 50%;']
Fairness Indicators is a library that enables easy computation of commonly-identified fairness metrics for binary and multiclass classifiers. With the Fairness Indicators tool suite, you can:
- Compute commonly-identified fairness metrics for classification models
- Compare model performance across subgroups to a baseline, or to other models
- Use confidence intervals to surface statistically significant disparities
- Perform evaluation over multiple thresholds
Use Fairness Indicators via the:
- [Evaluator component](https://tensorflow.github.io/tfx/guide/evaluator/) in a [TFX pipeline](https://tensorflow.github.io/tfx/)
- [TensorBoard plugin](https://github.com/tensorflow/tensorboard/blob/master/docs/fairness-indicators.md)
- [TensorFlow Model Analysis library](https://tensorflow.github.io/tfx/guide/fairness_indicators/)
- [Model Agnostic TFMA library](https://tensorflow.github.io/tfx/guide/fairness_indicators/#using-fairness-indicators-with-non-tensorflow-models)
///
/// html | div[style='float: right;width: 50%;']
```python
eval_config_pbtxt = """
model_specs {
label_key: "%s"
}
metrics_specs {
metrics {
class_name: "FairnessIndicators"
config: '{ "thresholds": [0.25, 0.5, 0.75] }'
}
metrics {
class_name: "ExampleCount"
}
}
slicing_specs {}
slicing_specs {
feature_keys: "%s"
}
options {
compute_confidence_intervals { value: False }
disabled_outputs{values: "analysis"}
}
""" % (LABEL_KEY, GROUP_KEY)
```
///
/// html | div[style='clear: both;']
///
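Conceptually, the `thresholds` list in the config above asks for each metric to be computed once per threshold and per slice, where the empty `slicing_specs {}` corresponds to the overall dataset. The plain-Python sketch below illustrates that idea for FPR and FNR only. It is not how TFMA computes the metrics. The function name, the `"Overall"` slice label, and the "score >= threshold means positive" rule are assumptions.

```python
from collections import defaultdict


def rates_at_thresholds(labels, scores, slices, thresholds=(0.25, 0.5, 0.75)):
    """Return {(slice, threshold): {"fpr": ..., "fnr": ...}} for each slice and threshold."""
    grouped = defaultdict(list)
    for y, s, sl in zip(labels, scores, slices):
        grouped[sl].append((y, s))
    # Assumption: "Overall" stands in for the unsliced dataset.
    grouped["Overall"] = list(zip(labels, scores))
    results = {}
    for sl, rows in grouped.items():
        pos = [s for y, s in rows if y == 1]
        neg = [s for y, s in rows if y == 0]
        for t in thresholds:
            # Assumption: an example is predicted positive when its score is >= t.
            fnr = sum(s < t for s in pos) / len(pos) if pos else float("nan")
            fpr = sum(s >= t for s in neg) / len(neg) if neg else float("nan")
            results[(sl, t)] = {"fpr": fpr, "fnr": fnr}
    return results


labels = [1, 0, 1, 0]
scores = [0.6, 0.3, 0.2, 0.8]
slices = ["a", "a", "b", "b"]
for key, value in rates_at_thresholds(labels, scores, slices).items():
    print(key, value)
```

Sweeping thresholds this way shows whether a gap between slices persists across operating points or appears only at one particular cutoff.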
- 
### [ML Practicum: Fairness in Perspective API using Fairness Indicators](https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=github&utm_medium=github&utm_campaign=fi-practicum&utm_term=&utm_content=repo-body)
---
[Try the Case Study](https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=github&utm_medium=github&utm_campaign=fi-practicum&utm_term=&utm_content=repo-body)
- 
### [Fairness Indicators on the TensorFlow blog](https://blog.tensorflow.org/2019/12/fairness-indicators-fair-ML-systems.html)
---
[Read on the TensorFlow blog](https://blog.tensorflow.org/2019/12/fairness-indicators-fair-ML-systems.html)
- 
### [Fairness Indicators on GitHub](https://github.com/tensorflow/fairness-indicators)
---
[View on GitHub](https://github.com/tensorflow/fairness-indicators)
- 
### [Fairness Indicators on the Google AI Blog](https://ai.googleblog.com/2019/12/fairness-indicators-scalable.html)
---
[Read on Google AI blog](https://ai.googleblog.com/2019/12/fairness-indicators-scalable.html)
-
### [Fairness Indicators at Google I/O](https://www.youtube.com/watch?v=6CwzDoE8J4M)
---
[Watch the video](https://www.youtube.com/watch?v=6CwzDoE8J4M)
================================================
FILE: docs/javascripts/mathjax.js
================================================
window.MathJax = {
tex: {
inlineMath: [["\\(", "\\)"]],
displayMath: [["\\[", "\\]"]],
processEscapes: true,
processEnvironments: true
},
options: {
ignoreHtmlClass: ".*|",
processHtmlClass: "arithmatex"
}
};
document$.subscribe(() => {
MathJax.startup.output.clearCache()
MathJax.typesetClear()
MathJax.texReset()
MathJax.typesetPromise()
})
================================================
FILE: docs/stylesheets/extra.css
================================================
:root {
--md-primary-fg-color: #FFA800;
--md-primary-fg-color--light: #CCCCCC;
--md-primary-fg-color--dark: #425066;
}
.video-wrapper {
max-width: 240px;
display: flex;
flex-direction: row;
}
.video-wrapper > iframe {
width: 100%;
aspect-ratio: 16 / 9;
}
.buttons-wrapper {
flex-wrap: wrap;
gap: 1em;
display: flex;
/* flex-grow: 1; */
/* justify-content: center; */
/* align-content: center; */
}
.buttons-wrapper > a {
justify-content: center;
align-content: center;
flex-wrap: nowrap;
/* gap: 1em; */
align-items: center;
text-align: center;
flex: 1 1 30%;
display: flex;
}
.md-button > .buttons-content {
align-items: center;
justify-content: center;
display: flex;
gap: 1em;
}
================================================
FILE: docs/tutorials/Facessd_Fairness_Indicators_Example_Colab.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Sxt-9qpNgPxo"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Phnw6c3-gQ1f"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aalPefrUUplk"
},
"source": [
"# FaceSSD Fairness Indicators Example Colab"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KFRBcGOYgEAI"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UZ48WFLwbCL6"
},
"source": [
"## Overview\n",
"\n",
"In this activity, you'll use [Fairness Indicators](https://tensorflow.github.io/fairness-indicators) to explore the [FaceSSD predictions on Labeled Faces in the Wild dataset](https://modelcards.withgoogle.com/face-detection). Fairness Indicators is a suite of tools built on top of [TensorFlow Model Analysis](https://tensorflow.github.io/model-analysis/get_started) that enable regular evaluation of fairness metrics in product pipelines.\n",
"\n",
"## About the Dataset\n",
"\n",
"In this exercise, you'll work with the FaceSSD prediction dataset: approximately 200k image predictions and ground truths generated by the FaceSSD API.\n",
"\n",
"## About the Tools\n",
"\n",
"[TensorFlow Model Analysis](https://tensorflow.github.io/model-analysis/get_started) is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows users to evaluate their models on large amounts of data in a distributed manner, compute in-graph and other metrics over different slices of data, and visualize the results in notebooks.\n",
"\n",
"[TensorFlow Data Validation](https://tensorflow.github.io/data-validation/get_started) is one tool you can use to analyze your data. You can use it to find potential problems in your data, such as missing values and data imbalances, that can lead to fairness disparities.\n",
"\n",
"With [Fairness Indicators](https://tensorflow.github.io/fairness-indicators/), users will be able to: \n",
"\n",
"* Evaluate model performance, sliced across defined groups of users\n",
"* Feel confident about results with confidence intervals and evaluations at multiple thresholds"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u33JXdluZ2lG"
},
"source": [
"# Importing\n",
"\n",
"Run the following code to install the fairness_indicators library. This package contains the tools we'll be using in this exercise. Colab may prompt you to restart the runtime, but this is not necessary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EoRNffG599XP"
},
"outputs": [],
"source": [
"!pip install apache_beam\n",
"!pip install fairness-indicators\n",
"!pip install witwidget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B8dlyTyiTe-9"
},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import apache_beam as beam\n",
"import numpy as np\n",
"import pandas as pd\n",
"from datetime import datetime\n",
"\n",
"import tensorflow_hub as hub\n",
"import tensorflow as tf\n",
"import tensorflow_model_analysis as tfma\n",
"import tensorflow_data_validation as tfdv\n",
"from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators\n",
"from tensorflow_model_analysis.addons.fairness.view import widget_view\n",
"from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_predict as agnostic_predict\n",
"from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_evaluate_graph\n",
"from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_extractor\n",
"\n",
"from witwidget.notebook.visualization import WitConfigBuilder\n",
"from witwidget.notebook.visualization import WitWidget"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TsplOJGqWCf5"
},
"source": [
"# Download and Understand the Data"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vFOQ4AaIcAn2"
},
"source": [
"[Labeled Faces in the Wild](http://vis-www.cs.umass.edu/lfw/) is a public benchmark dataset for face verification, also known as pair matching. LFW contains more than 13,000 images of faces collected from the web.\n",
"\n",
"We ran FaceSSD predictions on this dataset to predict whether a face is present in a given image. In this Colab, we will slice data according to gender to observe if there are any significant differences between model performance for different gender groups.\n",
"\n",
"If there is more than one face in an image, gender is labeled as \"MISSING\".\n",
"\n",
"We've hosted the dataset on Google Cloud Platform for convenience. Run the following code to download the data from GCP. The data will take about a minute to download and analyze."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NdLBi6tN5i7I"
},
"outputs": [],
"source": [
"data_location = tf.keras.utils.get_file('lfw_dataset.tf', 'https://storage.googleapis.com/facessd_dataset/lfw_dataset.tfrecord')\n",
"\n",
"stats = tfdv.generate_statistics_from_tfrecord(data_location=data_location)\n",
"tfdv.visualize_statistics(stats)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cNODEwE5x7Uo"
},
"source": [
"# Defining Constants"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZF4NO87uFxdQ"
},
"outputs": [],
"source": [
"BASE_DIR = tempfile.gettempdir()\n",
"\n",
"tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')\n",
"\n",
"compute_confidence_intervals = True\n",
"\n",
"slice_key = 'object/groundtruth/Gender'\n",
"label_key = 'object/groundtruth/face'\n",
"prediction_key = 'object/prediction/face'\n",
"\n",
"feature_map = {\n",
" slice_key:\n",
" tf.io.FixedLenFeature([], tf.string, default_value=['none']),\n",
" label_key:\n",
" tf.io.FixedLenFeature([], tf.float32, default_value=[0.0]),\n",
" prediction_key:\n",
" tf.io.FixedLenFeature([], tf.float32, default_value=[0.0]),\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gVLHwuhEyI8R"
},
"source": [
"# Model Agnostic Config for TFMA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ej1nGCZSyJIK"
},
"outputs": [],
"source": [
"model_agnostic_config = agnostic_predict.ModelAgnosticConfig(\n",
" label_keys=[label_key],\n",
" prediction_keys=[prediction_key],\n",
" feature_spec=feature_map)\n",
"\n",
"model_agnostic_extractors = [\n",
" model_agnostic_extractor.ModelAgnosticExtractor(\n",
" model_agnostic_config=model_agnostic_config, desired_batch_size=3),\n",
" tfma.extractors.slice_key_extractor.SliceKeyExtractor(\n",
" [tfma.slicer.SingleSliceSpec(),\n",
" tfma.slicer.SingleSliceSpec(columns=[slice_key])])\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wqkk9SkvyVkR"
},
"source": [
"# Fairness Callbacks and Computing Fairness Metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "A0icrlliBCOb"
},
"outputs": [],
"source": [
"# Helper class for counting examples in beam PCollection\n",
"class CountExamples(beam.CombineFn):\n",
" def __init__(self, message):\n",
" self.message = message\n",
"\n",
" def create_accumulator(self):\n",
" return 0\n",
"\n",
" def add_input(self, current_sum, element):\n",
" return current_sum + 1\n",
"\n",
" def merge_accumulators(self, accumulators): \n",
" return sum(accumulators)\n",
"\n",
" def extract_output(self, final_sum):\n",
" if final_sum:\n",
" print(\"%s: %d\"%(self.message, final_sum))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mRQjdjp9yVv2"
},
"outputs": [],
"source": [
"metrics_callbacks = [\n",
" tfma.post_export_metrics.fairness_indicators(\n",
" thresholds=[0.1, 0.3, 0.5, 0.7, 0.9],\n",
" labels_key=label_key,\n",
" target_prediction_keys=[prediction_key]),\n",
" tfma.post_export_metrics.auc(\n",
" curve='PR',\n",
" labels_key=label_key,\n",
" target_prediction_keys=[prediction_key]),\n",
"]\n",
"\n",
"eval_shared_model = tfma.types.EvalSharedModel(\n",
" add_metrics_callbacks=metrics_callbacks,\n",
" construct_fn=model_agnostic_evaluate_graph.make_construct_fn(\n",
" add_metrics_callbacks=metrics_callbacks,\n",
" config=model_agnostic_config))\n",
"\n",
"with beam.Pipeline() as pipeline:\n",
" # Read data.\n",
" data = (\n",
" pipeline\n",
" | 'ReadData' >> beam.io.ReadFromTFRecord(data_location))\n",
"\n",
" # Count all examples.\n",
" data_count = (\n",
" data | 'Count number of examples' >> beam.CombineGlobally(\n",
" CountExamples('Before filtering \"Gender:MISSING\"')))\n",
"\n",
" # If there is more than one face in an image, the gender feature is 'MISSING'\n",
" # and we filter that image out.\n",
" def filter_missing_gender(element):\n",
" example = tf.train.Example.FromString(element)\n",
" if example.features.feature[slice_key].bytes_list.value[0] != b'MISSING':\n",
" yield element\n",
"\n",
" filtered_data = (\n",
" data\n",
" | 'Filter Missing Gender' >> beam.ParDo(filter_missing_gender))\n",
"\n",
" # Count after filtering \"Gender:MISSING\".\n",
" filtered_data_count = (\n",
" filtered_data | 'Count number of examples after filtering'\n",
" >> beam.CombineGlobally(\n",
" CountExamples('After filtering \"Gender:MISSING\"')))\n",
"\n",
" # Every LFW image contains a face, so we set the label to 1.0\n",
" # for all images.\n",
" def add_face_groundtruth(element):\n",
" example = tf.train.Example.FromString(element)\n",
" example.features.feature[label_key].float_list.value[:] = [1.0]\n",
" yield example.SerializeToString()\n",
"\n",
" final_data = (\n",
" filtered_data\n",
" | 'Add Face Groundtruth' >> beam.ParDo(add_face_groundtruth))\n",
"\n",
" # Run TFMA.\n",
" _ = (\n",
" final_data\n",
" | 'ExtractEvaluateAndWriteResults' >>\n",
" tfma.ExtractEvaluateAndWriteResults(\n",
" eval_shared_model=eval_shared_model,\n",
" compute_confidence_intervals=compute_confidence_intervals,\n",
" output_path=tfma_eval_result_path,\n",
" extractors=model_agnostic_extractors))\n",
"\n",
"eval_result = tfma.load_eval_result(output_path=tfma_eval_result_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ktlASJQIzE3l"
},
"source": [
"# Render Fairness Indicators\n",
"\n",
"Render the Fairness Indicators widget with the exported evaluation results.\n",
"\n",
"Below you will see bar charts displaying performance of each slice of the data on selected metrics. You can adjust the baseline comparison slice as well as the displayed threshold(s) using the drop-down menus at the top of the visualization.\n",
"\n",
"A relevant metric for this use case is true positive rate, also known as recall. Use the selector on the left-hand side to choose the graph for true_positive_rate. These metric values match the values displayed on the [model card](https://modelcards.withgoogle.com/face-detection).\n",
"\n",
"For some photos, gender is labeled as \"young\" instead of male or female when the person in the photo is too young to be accurately annotated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JNaNhTCTAMHm"
},
"outputs": [],
"source": [
"widget_view.render_fairness_indicator(eval_result=eval_result,\n",
" slicing_column=slice_key)"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [
"Sxt-9qpNgPxo"
],
"name": "Facessd Fairness Indicators Example Colab.ipynb",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.22"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/tutorials/Fairness_Indicators_Example_Colab.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Tce3stUlHN0L"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tuOe1ymfHZPu"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aalPefrUUplk"
},
"source": [
"# Introduction to Fairness Indicators"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MfBg1C5NB3X0"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YWcPbUNg1yez"
},
"source": [
"## Overview\n",
"\n",
"Fairness Indicators is a suite of tools built on top of [TensorFlow Model Analysis (TFMA)](https://tensorflow.github.io/model-analysis/get_started) that enable regular evaluation of fairness metrics in product pipelines. TFMA is a library for evaluating both TensorFlow and non-TensorFlow machine learning models. It allows you to evaluate your models on large amounts of data in a distributed manner, compute in-graph and other metrics over different slices of data, and visualize them in notebooks. \n",
"\n",
"Fairness Indicators is packaged with [TensorFlow Data Validation (TFDV)](https://tensorflow.github.io/data-validation/get_started) and the [What-If Tool](https://pair-code.github.io/what-if-tool/). Using Fairness Indicators allows you to: \n",
"\n",
"* Evaluate model performance, sliced across defined groups of users\n",
"* Gain confidence about results with confidence intervals and evaluations at multiple thresholds\n",
"* Evaluate the distribution of datasets\n",
"* Dive deep into individual slices to explore root causes and opportunities for improvement\n",
"\n",
"In this notebook, you will use Fairness Indicators to fix fairness issues in a model you train using the [Civil Comments dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification). Watch this [video](https://www.youtube.com/watch?v=pHT-ImFXPQo) for more details and context on the real-world scenario this is based on, which is also one of the primary motivations for creating Fairness Indicators."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GjuCFktB2IJW"
},
"source": [
"## Dataset\n",
"\n",
"In this notebook, you will work with the [Civil Comments dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), approximately 2 million public comments released by the [Civil Comments platform](https://medium.com/@aja_15265/saying-goodbye-to-civil-comments-41859d3a2b1d) in 2017 for ongoing research. This effort was sponsored by [Jigsaw](https://jigsaw.google.com/), which has hosted competitions on Kaggle to help classify toxic comments as well as minimize unintended model bias.\n",
"\n",
"Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments are labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u33JXdluZ2lG"
},
"source": [
"## Setup\n",
"\n",
"Install `fairness-indicators` and `witwidget`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EoRNffG599XP"
},
"outputs": [],
"source": [
"!pip install -q -U pip==20.2\n",
"\n",
"!pip install -q fairness-indicators\n",
"!pip install -q witwidget"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "alYUSbyv59j5"
},
"source": [
"You must restart the Colab runtime after installing. Select **Runtime > Restart runtime** from the Colab menu.\n",
"\n",
"Do not proceed with the rest of this tutorial without first restarting the runtime."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RbRUqXDm6f1N"
},
"source": [
"Import all other required libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B8dlyTyiTe-9"
},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import apache_beam as beam\n",
"import numpy as np\n",
"import pandas as pd\n",
"from datetime import datetime\n",
"import pprint\n",
"\n",
"from google.protobuf import text_format\n",
"\n",
"import tensorflow_hub as hub\n",
"import tensorflow as tf\n",
"import tensorflow_model_analysis as tfma\n",
"import tensorflow_data_validation as tfdv\n",
"\n",
"from tfx_bsl.tfxio import tensor_adapter\n",
"from tfx_bsl.tfxio import tf_example_record\n",
"\n",
"from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators\n",
"from tensorflow_model_analysis.addons.fairness.view import widget_view\n",
"\n",
"from fairness_indicators.tutorial_utils import util\n",
"\n",
"from witwidget.notebook.visualization import WitConfigBuilder\n",
"from witwidget.notebook.visualization import WitWidget\n",
"\n",
"from tensorflow_metadata.proto.v0 import schema_pb2"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TsplOJGqWCf5"
},
"source": [
"## Download and analyze the data\n",
"\n",
"By default, this notebook downloads a preprocessed version of this dataset, but you may use the original dataset and re-run the processing steps if desired. In the original dataset, each comment is labeled with the percentage of raters who believed that a comment corresponds to a particular identity. For example, a comment might be labeled with the following: `{ male: 0.3, female: 1.0, transgender: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 }`. The processing step groups identities by category (gender, sexual_orientation, etc.) and removes identities with a score less than 0.5. So the example above would be converted to the following: `{ gender: [female], sexual_orientation: [heterosexual, homosexual_gay_or_lesbian] }`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qmt4gkBFRBD2"
},
"outputs": [],
"source": [
"download_original_data = False #@param {type:\"boolean\"}\n",
"\n",
"if download_original_data:\n",
" train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')\n",
" validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')\n",
"\n",
" # The identity terms list will be grouped together by their categories\n",
" # (see 'IDENTITY_COLUMNS') using a threshold of 0.5. Only the identity term column,\n",
" # text column and label column will be kept after processing.\n",
" train_tf_file = util.convert_comments_data(train_tf_file)\n",
" validate_tf_file = util.convert_comments_data(validate_tf_file)\n",
"\n",
"else:\n",
" train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')\n",
" validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vFOQ4AaIcAn2"
},
"source": [
"Use TFDV to analyze the data and find potential problems in it, such as missing values and data imbalances, that can lead to fairness disparities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NdLBi6tN5i7I"
},
"outputs": [],
"source": [
"stats = tfdv.generate_statistics_from_tfrecord(data_location=train_tf_file)\n",
"tfdv.visualize_statistics(stats)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AS9QiA96GXDE"
},
"source": [
"TFDV shows that there are some significant imbalances in the data which could lead to biased model outcomes. \n",
"\n",
"* The toxicity label (the value predicted by the model) is unbalanced. Only 8% of the examples in the training set are toxic, which means that a classifier could get 92% accuracy by predicting that all comments are non-toxic.\n",
"\n",
"* In the fields relating to identity terms, only 6.6k out of the 1.08 million (0.61%) training examples deal with homosexuality, and those related to bisexuality are even more rare. This indicates that performance on these slices may suffer due to lack of training data."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9ekzb7vVnPCc"
},
"source": [
"## Prepare the data\n",
"\n",
"Define a feature map to parse the data. Each example will have a label, comment text, and identity features `sexual_orientation`, `gender`, `religion`, `race`, and `disability` that are associated with the text."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "n4_nXQDykX6W"
},
"outputs": [],
"source": [
"BASE_DIR = tempfile.gettempdir()\n",
"\n",
"TEXT_FEATURE = 'comment_text'\n",
"LABEL = 'toxicity'\n",
"FEATURE_MAP = {\n",
" # Label:\n",
" LABEL: tf.io.FixedLenFeature([], tf.float32),\n",
" # Text:\n",
" TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),\n",
"\n",
" # Identities:\n",
" 'sexual_orientation':tf.io.VarLenFeature(tf.string),\n",
" 'gender':tf.io.VarLenFeature(tf.string),\n",
" 'religion':tf.io.VarLenFeature(tf.string),\n",
" 'race':tf.io.VarLenFeature(tf.string),\n",
" 'disability':tf.io.VarLenFeature(tf.string),\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1B1ROCM__y8C"
},
"source": [
"Next, set up an input function to feed data into the model. Add a weight column to each example and upweight the toxic examples to account for the class imbalance identified by TFDV. The identity features are used only during the evaluation phase; only the comment text is fed into the model during training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YwoC-dzEDid3"
},
"outputs": [],
"source": [
"def train_input_fn():\n",
" def parse_function(serialized):\n",
" parsed_example = tf.io.parse_single_example(\n",
" serialized=serialized, features=FEATURE_MAP)\n",
" # Adds a weight column to deal with unbalanced classes.\n",
" parsed_example['weight'] = tf.add(parsed_example[LABEL], 0.1)\n",
" return (parsed_example,\n",
" parsed_example[LABEL])\n",
" train_dataset = tf.data.TFRecordDataset(\n",
" filenames=[train_tf_file]).map(parse_function).batch(512)\n",
" return train_dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mfbgerCsEOmN"
},
"source": [
"## Train the model\n",
"\n",
"Create and train a deep learning model on the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JaGvNrVijfws"
},
"outputs": [],
"source": [
"model_dir = os.path.join(BASE_DIR, 'train', datetime.now().strftime(\n",
" \"%Y%m%d-%H%M%S\"))\n",
"\n",
"embedded_text_feature_column = hub.text_embedding_column(\n",
" key=TEXT_FEATURE,\n",
" module_spec='https://tfhub.dev/google/nnlm-en-dim128/1')\n",
"\n",
"classifier = tf.estimator.DNNClassifier(\n",
" hidden_units=[500, 100],\n",
" weight_column='weight',\n",
" feature_columns=[embedded_text_feature_column],\n",
" optimizer=tf.keras.optimizers.legacy.Adagrad(learning_rate=0.003),\n",
" loss_reduction=tf.losses.Reduction.SUM,\n",
" n_classes=2,\n",
" model_dir=model_dir)\n",
"\n",
"classifier.train(input_fn=train_input_fn, steps=1000)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jTPqije9Eg5b"
},
"source": [
"## Analyze the model\n",
"\n",
"After obtaining the trained model, analyze it to compute fairness metrics using TFMA and Fairness Indicators. Begin by exporting the model as a [SavedModel](https://www.tensorflow.org/guide/saved_model). "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-vRc-Jyp8dRm"
},
"source": [
"### Export SavedModel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QLjiy5VCzlRw"
},
"outputs": [],
"source": [
"def eval_input_receiver_fn():\n",
" serialized_tf_example = tf.compat.v1.placeholder(\n",
" dtype=tf.string, shape=[None], name='input_example_placeholder')\n",
"\n",
" # This *must* be a dictionary containing a single key 'examples', which\n",
" # points to the input placeholder.\n",
" receiver_tensors = {'examples': serialized_tf_example}\n",
"\n",
" features = tf.io.parse_example(serialized_tf_example, FEATURE_MAP)\n",
" features['weight'] = tf.ones_like(features[LABEL])\n",
"\n",
" return tfma.export.EvalInputReceiver(\n",
" features=features,\n",
" receiver_tensors=receiver_tensors,\n",
" labels=features[LABEL])\n",
"\n",
"tfma_export_dir = tfma.export.export_eval_savedmodel(\n",
" estimator=classifier,\n",
" export_dir_base=os.path.join(BASE_DIR, 'tfma_eval_model'),\n",
" eval_input_receiver_fn=eval_input_receiver_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3j8ODcee8rQ8"
},
"source": [
"### Compute Fairness Metrics\n",
"\n",
"Select the identity to compute metrics for and whether to run with confidence intervals using the dropdown in the panel on the right."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7shDmJbx9mqa"
},
"outputs": [],
"source": [
"#@title Fairness Indicators Computation Options\n",
"tfma_eval_result_path = os.path.join(BASE_DIR, 'tfma_eval_result')\n",
"\n",
"#@markdown Modify the slice_selection for experiments on other identities.\n",
"slice_selection = 'sexual_orientation' #@param [\"sexual_orientation\", \"gender\", \"religion\", \"race\", \"disability\"]\n",
"print(f'Slice selection: {slice_selection}')\n",
"#@markdown Confidence Intervals can help you make better decisions regarding your data, but as it requires computing multiple resamples, is slower particularly in the colab environment that cannot take advantage of parallelization.\n",
"compute_confidence_intervals = False #@param {type:\"boolean\"}\n",
"print(f'Compute confidence intervals: {compute_confidence_intervals}')\n",
"\n",
"# Define slices that you want the evaluation to run on.\n",
"eval_config_pbtxt = \"\"\"\n",
" model_specs {\n",
" label_key: \"%s\"\n",
" }\n",
" metrics_specs {\n",
" metrics {\n",
" class_name: \"FairnessIndicators\"\n",
" config: '{ \"thresholds\": [0.1, 0.3, 0.5, 0.7, 0.9] }'\n",
" }\n",
" }\n",
" slicing_specs {} # overall slice\n",
" slicing_specs {\n",
" feature_keys: [\"%s\"]\n",
" }\n",
" options {\n",
" compute_confidence_intervals { value: %s }\n",
" disabled_outputs { values: \"analysis\" }\n",
" }\n",
" \"\"\" % (LABEL, slice_selection, compute_confidence_intervals)\n",
"eval_config = text_format.Parse(eval_config_pbtxt, tfma.EvalConfig())\n",
"eval_shared_model = tfma.default_eval_shared_model(\n",
" eval_saved_model_path=tfma_export_dir)\n",
"\n",
"schema = text_format.Parse(\n",
" \"\"\"\n",
" tensor_representation_group {\n",
" key: \"\"\n",
" value {\n",
" tensor_representation {\n",
" key: \"comment_text\"\n",
" value {\n",
" dense_tensor {\n",
" column_name: \"comment_text\"\n",
" shape {}\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" feature {\n",
" name: \"comment_text\"\n",
" type: BYTES\n",
" }\n",
" feature {\n",
" name: \"toxicity\"\n",
" type: FLOAT\n",
" }\n",
" feature {\n",
" name: \"sexual_orientation\"\n",
" type: BYTES\n",
" }\n",
" feature {\n",
" name: \"gender\"\n",
" type: BYTES\n",
" }\n",
" feature {\n",
" name: \"religion\"\n",
" type: BYTES\n",
" }\n",
" feature {\n",
" name: \"race\"\n",
" type: BYTES\n",
" }\n",
" feature {\n",
" name: \"disability\"\n",
" type: BYTES\n",
" }\n",
" \"\"\", schema_pb2.Schema())\n",
"tfxio = tf_example_record.TFExampleRecord(\n",
" file_pattern=validate_tf_file,\n",
" schema=schema,\n",
" raw_record_column_name=tfma.ARROW_INPUT_COLUMN)\n",
"tensor_adapter_config = tensor_adapter.TensorAdapterConfig(\n",
" arrow_schema=tfxio.ArrowSchema(),\n",
" tensor_representations=tfxio.TensorRepresentations())\n",
"\n",
"with beam.Pipeline() as pipeline:\n",
" (pipeline\n",
" | 'ReadFromTFRecordToArrow' >> tfxio.BeamSource()\n",
" | 'ExtractEvaluateAndWriteResults' >> tfma.ExtractEvaluateAndWriteResults(\n",
" eval_config=eval_config,\n",
" eval_shared_model=eval_shared_model,\n",
" output_path=tfma_eval_result_path,\n",
" tensor_adapter_config=tensor_adapter_config))\n",
"\n",
"eval_result = tfma.load_eval_result(output_path=tfma_eval_result_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jtDpTBPeRw2d"
},
"source": [
"### Visualize data using the What-if Tool\n",
"\n",
"In this section, you'll use the What-If Tool's interactive visual interface to explore and manipulate data at a micro-level.\n",
"\n",
"Each point on the scatter plot on the right-hand panel represents one of the examples in the subset loaded into the tool. Click on one of the points to see details about this particular example in the left-hand panel. The comment text, ground truth toxicity, and applicable identities are shown. At the bottom of this left-hand panel, you see the inference results from the model you just trained.\n",
"\n",
"Modify the text of the example and then click the **Run inference** button to view how your changes caused the perceived toxicity prediction to change."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wtjZo4BDlV1m"
},
"outputs": [],
"source": [
"DEFAULT_MAX_EXAMPLES = 1000\n",
"\n",
"# Load 100000 examples in memory. When first rendered, \n",
"# What-If Tool should only display 1000 of these due to browser constraints.\n",
"def wit_dataset(file, num_examples=100000):\n",
" dataset = tf.data.TFRecordDataset(\n",
" filenames=[file]).take(num_examples)\n",
" return [tf.train.Example.FromString(d.numpy()) for d in dataset]\n",
"\n",
"wit_data = wit_dataset(train_tf_file)\n",
"config_builder = WitConfigBuilder(wit_data[:DEFAULT_MAX_EXAMPLES]).set_estimator_and_feature_spec(\n",
" classifier, FEATURE_MAP).set_label_vocab(['non-toxicity', LABEL]).set_target_feature(LABEL)\n",
"wit = WitWidget(config_builder)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ktlASJQIzE3l"
},
"source": [
"## Render Fairness Indicators\n",
"\n",
"Render the Fairness Indicators widget with the exported evaluation results.\n",
"\n",
"Below you will see bar charts displaying performance of each slice of the data on selected metrics. You can adjust the baseline comparison slice as well as the displayed threshold(s) using the dropdown menus at the top of the visualization. \n",
"\n",
"The Fairness Indicator widget is integrated with the What-If Tool rendered above. If you select one slice of the data in the bar chart, the What-If Tool will update to show you examples from the selected slice. When the data reloads in the What-If Tool above, try modifying **Color By** to **toxicity**. This can give you a visual understanding of the toxicity balance of examples by slice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JNaNhTCTAMHm"
},
"outputs": [],
"source": [
"event_handlers={'slice-selected':\n",
" wit.create_selection_callback(wit_data, DEFAULT_MAX_EXAMPLES)}\n",
"widget_view.render_fairness_indicator(eval_result=eval_result,\n",
" slicing_column=slice_selection,\n",
" event_handlers=event_handlers\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nRuZsLr6V_fY"
},
"source": [
"With this particular dataset and task, systematically higher false positive and false negative rates for certain identities can lead to negative consequences. For example, in a content moderation system, a higher-than-overall false positive rate for a certain group can lead to those voices being silenced. Thus, it is important to regularly evaluate these types of criteria as you develop and improve models, and utilize tools such as Fairness Indicators, TFDV, and WIT to help illuminate potential problems. Once you've identified fairness issues, you can experiment with new data sources, data balancing, or other techniques to improve performance on underperforming groups.\n",
"\n",
"See [here](../../guide/guidance) for more information and guidance on how to use Fairness Indicators.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wCMEMtGfx0Ti"
},
"source": [
"## Use fairness evaluation results\n",
"\n",
"The [`eval_result`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult) object, rendered above in `render_fairness_indicator()`, has its own API that you can leverage to read TFMA results into your programs."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "z6stkMLwyfza"
},
"source": [
"### Get evaluated slices and metrics\n",
"\n",
"Use [`get_slice_names()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_slice_names) and [`get_metric_names()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_metric_names) to get the evaluated slices and metrics, respectively."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eXrt7SdZyzWD"
},
"outputs": [],
"source": [
"pp = pprint.PrettyPrinter()\n",
"\n",
"print(\"Slices:\")\n",
"pp.pprint(eval_result.get_slice_names())\n",
"print(\"\\nMetrics:\")\n",
"pp.pprint(eval_result.get_metric_names())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ctAvudY2zUu4"
},
"source": [
"Use [`get_metrics_for_slice()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResultget_metrics_for_slice) to get the metrics for a particular slice as a dictionary mapping metric names to [metric values](https://github.com/tensorflow/model-analysis/blob/cdb6790dcd7a37c82afb493859b3ef4898963fee/tensorflow_model_analysis/proto/metrics_for_slice.proto#L194)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zjCxZGHmzF0R"
},
"outputs": [],
"source": [
"baseline_slice = ()\n",
"heterosexual_slice = (('sexual_orientation', 'heterosexual'),)\n",
"\n",
"print(\"Baseline metric values:\")\n",
"pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))\n",
"print(\"\\nHeterosexual metric values:\")\n",
"pp.pprint(eval_result.get_metrics_for_slice(heterosexual_slice))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UDo3LhoR0Rq1"
},
"source": [
"Use [`get_metrics_for_all_slices()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_metrics_for_all_slices) to get the metrics for all slices as a dictionary mapping each slice to the corresponding metrics dictionary you obtain from running `get_metrics_for_slice()` on it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "96N2l2xI0fZd"
},
"outputs": [],
"source": [
"pp.pprint(eval_result.get_metrics_for_all_slices())"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "Fairness Indicators Example Colab.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.22"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/tutorials/Fairness_Indicators_Pandas_Case_Study.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Bfrh3DUze0QN"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sx-jnufYfcJG"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s1bQihY6-Y4N"
},
"source": [
"# Pandas DataFrame to Fairness Indicators Case Study\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XHTjeiUMeolM"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ay80altXzvgZ"
},
"source": [
"## Case Study Overview\n",
"In this case study we will apply [TensorFlow Model Analysis](https://tensorflow.github.io/model-analysis/get_started) and [Fairness Indicators](https://tensorflow.github.io/fairness-indicators) to evaluate data stored as a Pandas DataFrame, where each row contains ground truth labels, various features, and a model prediction. We will show how this workflow can be used to spot potential fairness concerns, independent of the framework one used to construct and train the model. As in this case study, we can analyze the results from any machine learning framework (e.g. TensorFlow, JAX, etc) once they are converted to a Pandas DataFrame.\n",
" \n",
"For this exercise, we will leverage the Deep Neural Network (DNN) model that was developed in the [Shape Constraints for Ethics with Tensorflow Lattice](https://colab.research.google.com/github/tensorflow/lattice/blob/master/docs/tutorials/shape_constraints_for_ethics.ipynb#scrollTo=uc0VwsT5nvQi) case study using the Law School Admissions dataset from the Law School Admissions Council (LSAC). This classifier attempts to predict whether or not a student will pass the bar, based on their Law School Admission Test (LSAT) score and undergraduate GPA.\n",
"\n",
"## LSAC Dataset\n",
"The dataset used within this case study was originally collected for a study called '[LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series](https://eric.ed.gov/?id=ED469370)' by Linda Wightman in 1998. The dataset is currently hosted [here](http://www.seaphe.org/databases.php).\n",
"\n",
"* **dnn_bar_pass_prediction**: The LSAT prediction from the DNN model.\n",
"* **gender**: Gender of the student.\n",
"* **lsat**: LSAT score received by the student.\n",
"* **pass_bar**: Ground truth label indicating whether or not the student eventually passed the bar.\n",
"* **race**: Race of the student.\n",
"* **ugpa**: A student's undergraduate GPA.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ob01ASKqixfw"
},
"outputs": [],
"source": [
"!pip install -q -U pip==20.2\n",
"\n",
"!pip install -q -U \\\n",
" tensorflow-model-analysis==0.48.0 \\\n",
" tensorflow-data-validation==1.17.0 \\\n",
" tfx-bsl==1.17.1"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tnxSvgkaSEIj"
},
"source": [
"## Importing required packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0q8cTfpTkEMP"
},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import pandas as pd\n",
"import six.moves.urllib as urllib\n",
"import pprint\n",
"\n",
"import tensorflow_model_analysis as tfma\n",
"from google.protobuf import text_format\n",
"\n",
"import tensorflow as tf\n",
"tf.compat.v1.enable_v2_behavior()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b8kWW3t4-eS1"
},
"source": [
"## Download the data and explore the initial dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wMZJtgj0qJ0x"
},
"outputs": [],
"source": [
"# Download the LSAT dataset and setup the required filepaths.\n",
"_DATA_ROOT = tempfile.mkdtemp(prefix='lsat-data')\n",
"_DATA_PATH = 'https://storage.googleapis.com/lawschool_dataset/bar_pass_prediction.csv'\n",
"_DATA_FILEPATH = os.path.join(_DATA_ROOT, 'bar_pass_prediction.csv')\n",
"\n",
"data = urllib.request.urlopen(_DATA_PATH)\n",
"\n",
"_LSAT_DF = pd.read_csv(data)\n",
"\n",
"# To simpliy the case study, we will only use the columns that will be used for\n",
"# our model.\n",
"_COLUMN_NAMES = [\n",
" 'dnn_bar_pass_prediction',\n",
" 'gender',\n",
" 'lsat',\n",
" 'pass_bar',\n",
" 'race1',\n",
" 'ugpa',\n",
"]\n",
"\n",
"_LSAT_DF.dropna()\n",
"_LSAT_DF['gender'] = _LSAT_DF['gender'].astype(str)\n",
"_LSAT_DF['race1'] = _LSAT_DF['race1'].astype(str)\n",
"_LSAT_DF = _LSAT_DF[_COLUMN_NAMES]\n",
"\n",
"_LSAT_DF.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GyeVg2s7-wlB"
},
"source": [
"## Configure Fairness Indicators.\n",
"There are several parameters that you’ll need to take into account when using Fairness Indicators with a DataFrame \n",
"\n",
"* Your input DataFrame must contain a prediction column and label column from your model. By default Fairness Indicators will look for a prediction column called `prediction` and a label column called `label` within your DataFrame.\n",
" * If either of these values are not found a KeyError will be raised.\n",
"\n",
"* In addition to a DataFrame, you’ll also need to include an `eval_config` that should include the metrics to compute, slices to compute the metrics on, and the column names for example labels and predictions. \n",
" * `metrics_specs` will set the metrics to compute. The `FairnessIndicators` metric will be required to render the fairness metrics and you can see a list of additional optional metrics [here](https://tensorflow.github.io/model-analysis/metrics).\n",
"\n",
" * `slicing_specs` is an optional slicing parameter to specify what feature you’re interested in investigating. Within this case study race1 is used, however you can also set this value to another feature (for example gender in the context of this DataFrame). If `slicing_specs` is not provided all features will be included.\n",
" * If your DataFrame includes a label or prediction column that is different from the default `prediction` or `label`, you can configure the `label_key` and `prediction_key` to a new value.\n",
"\n",
"* If `output_path` is not specified a temporary directory will be created."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "53caFasB5V9p"
},
"outputs": [],
"source": [
"# Specify Fairness Indicators in eval_config.\n",
"eval_config = text_format.Parse(\"\"\"\n",
" model_specs {\n",
" prediction_key: 'dnn_bar_pass_prediction',\n",
" label_key: 'pass_bar'\n",
" }\n",
" metrics_specs {\n",
" metrics {class_name: \"AUC\"}\n",
" metrics {\n",
" class_name: \"FairnessIndicators\"\n",
" config: '{\"thresholds\": [0.50, 0.90]}'\n",
" }\n",
" }\n",
" slicing_specs {\n",
" feature_keys: 'race1'\n",
" }\n",
" slicing_specs {}\n",
" \"\"\", tfma.EvalConfig())\n",
"\n",
"# Run TensorFlow Model Analysis.\n",
"eval_result = tfma.analyze_raw_data(\n",
" data=_LSAT_DF,\n",
" eval_config=eval_config,\n",
" output_path=_DATA_ROOT)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KD96mw0e--DE"
},
"source": [
"## Explore model performance with Fairness Indicators.\n",
"\n",
"After running Fairness Indicators, we can visualize different metrics that we selected to analyze our models performance. Within this case study we’ve included Fairness Indicators and arbitrarily picked AUC.\n",
"\n",
"When we first look at the overall AUC for each race slice we can see a slight discrepancy in model performance, but nothing that is arguably alarming.\n",
"\n",
"* **Asian**: 0.58\n",
"* **Black**: 0.58\n",
"* **Hispanic**: 0.58\n",
"* **Other**: 0.64\n",
"* **White**: 0.6\n",
"\n",
"However, when we look at the false negative rates split by race, our model again incorrectly predicts the likelihood of a user passing the bar at different rates and, this time, does so by a lot. \n",
"\n",
"* **Asian**: 0.01\n",
"* **Black**: 0.05\n",
"* **Hispanic**: 0.02\n",
"* **Other**: 0.01\n",
"* **White**: 0.01\n",
"\n",
"Most notably the difference between Black and White students is about 380%, meaning that our model is nearly 4x more likely to incorrectly predict that a black student will not pass the bar, than a whilte student. If we were to continue with this effort, a practitioner could use these results as a signal that they should spend more time ensuring that their model works well for people from all backgrounds."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NIdchYPb-_ZV"
},
"outputs": [],
"source": [
"# Render Fairness Indicators.\n",
"tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_result)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NprhBTCbY1sF"
},
"source": [
"# tfma.EvalResult"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6f92-e98Y40r"
},
"source": [
"The [`eval_result`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult) object, rendered above in `render_fairness_indicator()`, has its own API that can be used to read TFMA results into your programs."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CDDUxdx-Y8e0"
},
"source": [
"## [`get_slice_names()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_slice_names) and [`get_metric_names()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_metric_names)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oG_mNUNbY98t"
},
"source": [
"To get the evaluated slices and metrics, you can use the respective functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kbA1sXhCY_G7"
},
"outputs": [],
"source": [
"pp = pprint.PrettyPrinter()\n",
"\n",
"print(\"Slices:\")\n",
"pp.pprint(eval_result.get_slice_names())\n",
"print(\"\\nMetrics:\")\n",
"pp.pprint(eval_result.get_metric_names())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rA1M8aBmZAk6"
},
"source": [
"## [`get_metrics_for_slice()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_metrics_for_slice) and [`get_metrics_for_all_slices()`](https://tensorflow.github.io/model-analysis/api_docs/python/tfma/#tensorflow_model_analysis.EvalResult.get_metrics_for_all_slices)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a3Ath5MsZCRX"
},
"source": [
"If you want to get the metrics for a particular slice, you can use `get_metrics_for_slice()`. It returns a dictionary mapping metric names to [metric values](https://github.com/tensorflow/model-analysis/blob/cdb6790dcd7a37c82afb493859b3ef4898963fee/tensorflow_model_analysis/proto/metrics_for_slice.proto#L194)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9BWg5HoyZDh-"
},
"outputs": [],
"source": [
"baseline_slice = ()\n",
"black_slice = (('race1', 'black'),)\n",
"\n",
"print(\"Baseline metric values:\")\n",
"pp.pprint(eval_result.get_metrics_for_slice(baseline_slice))\n",
"print(\"Black metric values:\")\n",
"pp.pprint(eval_result.get_metrics_for_slice(black_slice))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bDcOxvqBZEfg"
},
"source": [
"If you want to get the metrics for all slices, `get_metrics_for_all_slices()` returns a dictionary mapping each slice to the corresponding `get_metrics_for_slices(slice)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "p4NQCi52ZFrw"
},
"outputs": [],
"source": [
"pp.pprint(eval_result.get_metrics_for_all_slices())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "y-nbqnSTkmW3"
},
"source": [
"## Conclusion\n",
"Within this case study we imported a dataset into a Pandas DataFrame that we then analyzed with Fairness Indicators. Understanding the results of your model and underlying data is an important step in ensuring your model doesn't reflect harmful bias. In the context of this case study we examined the the LSAC dataset and how predictions from this data could be impacted by a students race. The concept of “what is unfair and what is fair have been introduced in multiple disciplines for well over 50 years, including in education, hiring, and machine learning.”1 Fairness Indicator is a tool to help mitigate fairness concerns in your machine learning model.\n",
"\n",
"For more information on using Fairness Indicators and resources to learn more about fairness concerns see [here](../../).\n",
"\n",
"---\n",
"\n",
"1. Hutchinson, B., Mitchell, M. (2018). 50 Years of Test (Un)fairness: Lessons for Machine Learning. https://arxiv.org/abs/1811.10104\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "REV1rBnoBAo1"
},
"source": [
"## Appendix\n",
"\n",
"Below are a few functions to help convert ML models to Pandas DataFrame.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "F4qv9GXiBsFA"
},
"outputs": [],
"source": [
"# TensorFlow Estimator to Pandas DataFrame:\n",
"\n",
"# _X_VALUE = # X value of binary estimator.\n",
"# _Y_VALUE = # Y value of binary estimator.\n",
"# _GROUND_TRUTH_LABEL = # Ground truth value of binary estimator.\n",
"\n",
"def _get_predicted_probabilities(estimator, input_df, get_input_fn):\n",
" predictions = estimator.predict(\n",
" input_fn=get_input_fn(input_df=input_df, num_epochs=1))\n",
" return [prediction['probabilities'][1] for prediction in predictions]\n",
"\n",
"def _get_input_fn_law(input_df, num_epochs, batch_size=None):\n",
" return tf.compat.v1.estimator.inputs.pandas_input_fn(\n",
" x=input_df[[_X_VALUE, _Y_VALUE]],\n",
" y=input_df[_GROUND_TRUTH_LABEL],\n",
" num_epochs=num_epochs,\n",
" batch_size=batch_size or len(input_df),\n",
" shuffle=False)\n",
"\n",
"def estimator_to_dataframe(estimator, input_df, num_keypoints=20):\n",
" x = np.linspace(min(input_df[_X_VALUE]), max(input_df[_X_VALUE]), num_keypoints)\n",
" y = np.linspace(min(input_df[_Y_VALUE]), max(input_df[_Y_VALUE]), num_keypoints)\n",
"\n",
" x_grid, y_grid = np.meshgrid(x, y)\n",
"\n",
" positions = np.vstack([x_grid.ravel(), y_grid.ravel()])\n",
" plot_df = pd.DataFrame(positions.T, columns=[_X_VALUE, _Y_VALUE])\n",
" plot_df[_GROUND_TRUTH_LABEL] = np.ones(len(plot_df))\n",
" predictions = _get_predicted_probabilities(\n",
" estimator=estimator, input_df=plot_df, get_input_fn=_get_input_fn_law)\n",
" return pd.DataFrame(\n",
" data=np.array(np.reshape(predictions, x_grid.shape)).flatten())"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"Bfrh3DUze0QN"
],
"name": "Pandas DataFrame to Fairness Indicators Case Study",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.22"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "JmvzTcYice-_"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zlvAS8a9cD_t"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b2VYQpTttmVN"
},
"source": [
"# TensorFlow Constrained Optimization Example Using CelebA Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3iFsS2WSeRwe"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-DQoReGDeN16"
},
"source": [
"This notebook demonstrates an easy way to create and optimize constrained problems using the TFCO library. This method can be useful in improving models when we find that they’re not performing equally well across different slices of our data, which we can identify using [Fairness Indicators](../../). The second of Google’s AI principles states that our technology should avoid creating or reinforcing unfair bias, and we believe this technique can help improve model fairness in some situations. In particular, this notebook will:\n",
"\n",
"\n",
"* Train a simple, *unconstrained* neural network model to detect a person's smile in images using [`tf.keras`](https://www.tensorflow.org/guide/keras) and the large-scale CelebFaces Attributes ([CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) dataset.\n",
"* Evaluate model performance against a commonly used fairness metric across age groups, using Fairness Indicators.\n",
"* Set up a simple constrained optimization problem to achieve fairer performance across age groups.\n",
"* Retrain the now *constrained* model and evaluate performance again, ensuring that our chosen fairness metric has improved.\n",
"\n",
"Last updated: 3/11 Feb 2020"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JyCbEWt5Zxe2"
},
"source": [
"# Installation\n",
"This notebook was created in [Colaboratory](https://research.google.com/colaboratory/faq.html), connected to the Python 3 Google Compute Engine backend. If you wish to host this notebook in a different environment, then you should not experience any major issues provided you include all the required packages in the cells below.\n",
"\n",
"Note that the very first time you run the pip installs, you may be asked to restart the runtime because of preinstalled out of date packages. Once you do so, the correct packages will be used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "T-Zm-KDdt0bn"
},
"outputs": [],
"source": [
"#@title Pip installs\n",
"!pip install -q -U pip==20.2\n",
"\n",
"!pip install git+https://github.com/google-research/tensorflow_constrained_optimization\n",
"!pip install -q tensorflow-datasets tensorflow\n",
"!pip install fairness-indicators \\\n",
" \"absl-py==0.12.0\" \\\n",
" \"apache-beam<3,>=2.47\" \\\n",
" \"avro-python3==1.9.1\" \\\n",
" \"pyzmq==17.0.0\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UXWXhBLvISOY"
},
"source": [
"Note that depending on when you run the cell below, you may receive a warning about the default version of TensorFlow in Colab switching to TensorFlow 2.X soon. You can safely ignore that warning as this notebook was designed to be compatible with TensorFlow 1.X and 2.X."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UTBBdSGaZ8aW"
},
"outputs": [],
"source": [
"#@title Import Modules\n",
"import os\n",
"import sys\n",
"import tempfile\n",
"import urllib.request\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"import tensorflow_datasets as tfds\n",
"tfds.disable_progress_bar()\n",
"\n",
"import numpy as np\n",
"\n",
"import tensorflow_constrained_optimization as tfco\n",
"\n",
"from tensorflow_metadata.proto.v0 import schema_pb2\n",
"from tfx_bsl.tfxio import tensor_adapter\n",
"from tfx_bsl.tfxio import tf_example_record"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "70tLum8uIZUm"
},
"source": [
"Additionally, we add a few imports that are specific to Fairness Indicators, which we will use to evaluate and visualize the model's performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "7Se0Z0Bo9K-5"
},
"outputs": [],
"source": [
"#@title Fairness Indicators related imports\n",
"import tensorflow_model_analysis as tfma\n",
"import fairness_indicators as fi\n",
"from google.protobuf import text_format\n",
"import apache_beam as beam"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xSG2HP7goGrj"
},
"source": [
"Although TFCO is compatible with both eager and graph execution, this notebook assumes that eager execution is enabled, as it is by default in TensorFlow 2.x. To ensure that nothing breaks, the cell below enables eager execution if needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "W0ZusW1-lBao"
},
"outputs": [],
"source": [
"#@title Enable Eager Execution and Print Versions\n",
"if tf.__version__ < \"2.0.0\":\n",
" tf.compat.v1.enable_eager_execution()\n",
" print(\"Eager execution enabled.\")\n",
"else:\n",
" print(\"Eager execution enabled by default.\")\n",
"\n",
"print(\"TensorFlow \" + tf.__version__)\n",
"print(\"TFMA \" + tfma.VERSION_STRING)\n",
"print(\"TFDS \" + tfds.version.__version__)\n",
"print(\"FI \" + fi.version.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "idY3Uuk3yvty"
},
"source": [
"# CelebA Dataset\n",
"[CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations (such as hair type, fashion accessories, and facial features) and 5 landmark locations (eyes, mouth and nose positions). For more details, see [the paper](https://liuziwei7.github.io/projects/FaceAttributes.html).\n",
"With the permission of the owners, we have stored this dataset on Google Cloud Storage and mostly access it via [TensorFlow Datasets (`tfds`)](https://www.tensorflow.org/datasets).\n",
"\n",
"In this notebook:\n",
"* Our model will attempt to classify whether the subject of the image is smiling, as represented by the \"Smiling\" attribute*.\n",
"* Images will be resized from 218x178 to 28x28 to reduce execution time and memory usage during training.\n",
"* Our model's performance will be evaluated across age groups, using the binary \"Young\" attribute. We will call this \"age group\" in this notebook.\n",
"\n",
"___\n",
"\n",
"* While there is little information available about the labeling methodology for this dataset, we will assume that the \"Smiling\" attribute was determined by a pleased, kind, or amused expression on the subject's face. For the purpose of this case study, we will take these labels as ground truth.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zCSemFST0b89"
},
"outputs": [],
"source": [
"gcs_base_dir = \"gs://celeb_a_dataset/\"\n",
"celeb_a_builder = tfds.builder(\"celeb_a\", data_dir=gcs_base_dir, version='2.0.0')\n",
"\n",
"celeb_a_builder.download_and_prepare()\n",
"\n",
"num_test_shards_dict = {'0.3.0': 4, '2.0.0': 2} # Used because we download the test dataset separately\n",
"version = str(celeb_a_builder.info.version)\n",
"print('Celeb_A dataset version: %s' % version)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "Ocqv3R06APfW"
},
"outputs": [],
"source": [
"#@title Test dataset helper functions\n",
"local_root = tempfile.mkdtemp(prefix='test-data')\n",
"def local_test_filename_base():\n",
" return local_root\n",
"\n",
"def local_test_file_full_prefix():\n",
" return os.path.join(local_test_filename_base(), \"celeb_a-test.tfrecord\")\n",
"\n",
"def copy_test_files_to_local():\n",
" filename_base = local_test_file_full_prefix()\n",
" num_test_shards = num_test_shards_dict[version]\n",
" for shard in range(num_test_shards):\n",
" url = \"https://storage.googleapis.com/celeb_a_dataset/celeb_a/%s/celeb_a-test.tfrecord-0000%s-of-0000%s\" % (version, shard, num_test_shards)\n",
" filename = \"%s-0000%s-of-0000%s\" % (filename_base, shard, num_test_shards)\n",
" res = urllib.request.urlretrieve(url, filename)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u5PDLXZb_uIj"
},
"source": [
"## Caveats\n",
"Before moving forward, there are several considerations to keep in mind in using CelebA:\n",
"* Although in principle this notebook could use any dataset of face images, CelebA was chosen because it contains public domain images of public figures.\n",
"* All of the attribute annotations in CelebA are operationalized as binary categories. For example, the \"Young\" attribute (as determined by the dataset labelers) is denoted as either present or absent in the image.\n",
"* CelebA's categorizations do not reflect real human diversity of attributes.\n",
"* For the purposes of this notebook, the feature containing the \"Young\" attribute is referred to as \"age group\": images where the \"Young\" attribute is present are labeled as members of the \"Young\" age group, and images where it is absent are labeled as members of the \"Not Young\" age group. These are our own assumptions, as this information is not given in the [original paper](http://openaccess.thecvf.com/content_iccv_2015/html/Liu_Deep_Learning_Face_ICCV_2015_paper.html).\n",
"* As such, the performance of the models trained in this notebook is tied to the ways the attributes have been operationalized and annotated by the authors of CelebA.\n",
"* This model should not be used for commercial purposes as that would violate [CelebA's non-commercial research agreement](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Elkiu92cY2bY"
},
"source": [
"# Setting Up Input Functions\n",
"The subsequent cells will help streamline the input pipeline as well as visualize performance.\n",
"\n",
"First, we define some data-related variables and the required preprocessing function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gDdarTZxk6y4"
},
"outputs": [],
"source": [
"#@title Define Variables\n",
"ATTR_KEY = \"attributes\"\n",
"IMAGE_KEY = \"image\"\n",
"LABEL_KEY = \"Smiling\"\n",
"GROUP_KEY = \"Young\"\n",
"IMAGE_SIZE = 28"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "SD-H70Je0cTp"
},
"outputs": [],
"source": [
"#@title Define Preprocessing Functions\n",
"def preprocess_input_dict(feat_dict):\n",
" # Separate out the image and target variable from the feature dictionary.\n",
" image = feat_dict[IMAGE_KEY]\n",
" label = feat_dict[ATTR_KEY][LABEL_KEY]\n",
" group = feat_dict[ATTR_KEY][GROUP_KEY]\n",
"\n",
" # Resize and normalize image.\n",
" image = tf.cast(image, tf.float32)\n",
" image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])\n",
" image /= 255.0\n",
"\n",
" # Cast label and group to float32.\n",
" label = tf.cast(label, tf.float32)\n",
" group = tf.cast(group, tf.float32)\n",
"\n",
" feat_dict[IMAGE_KEY] = image\n",
" feat_dict[ATTR_KEY][LABEL_KEY] = label\n",
" feat_dict[ATTR_KEY][GROUP_KEY] = group\n",
"\n",
" return feat_dict\n",
"\n",
"get_image_and_label = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY])\n",
"get_image_label_and_group = lambda feat_dict: (feat_dict[IMAGE_KEY], feat_dict[ATTR_KEY][LABEL_KEY], feat_dict[ATTR_KEY][GROUP_KEY])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iwg3sPmExciD"
},
"source": [
"Then, we build out the data functions we need in the rest of the colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KbR64r0VVG5h"
},
"outputs": [],
"source": [
"# Train data returning either 2 or 3 elements (the third element being the group)\n",
"def celeb_a_train_data_wo_group(batch_size):\n",
" celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)\n",
" return celeb_a_train_data.map(get_image_and_label)\n",
"def celeb_a_train_data_w_group(batch_size):\n",
" celeb_a_train_data = celeb_a_builder.as_dataset(split='train').shuffle(1024).repeat().batch(batch_size).map(preprocess_input_dict)\n",
" return celeb_a_train_data.map(get_image_label_and_group)\n",
"\n",
"# Test data for the overall evaluation\n",
"celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)\n",
"# Copy test data locally to be able to read it into tfma\n",
"copy_test_files_to_local()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NXO3woTxiCk0"
},
"source": [
"# Build a simple DNN Model\n",
"Because this notebook focuses on TFCO, we will assemble a simple, unconstrained `tf.keras.Sequential` model.\n",
"\n",
"We may be able to greatly improve model performance by adding some complexity (e.g., more densely-connected layers, different activation functions, or a larger image size), but that may distract from the goal of demonstrating how easy it is to apply the TFCO library when working with Keras. For that reason, the model is kept simple, but feel free to explore this space."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RNZhN_zU8DRD"
},
"outputs": [],
"source": [
"def create_model():\n",
" # For this notebook, accuracy will be used to evaluate performance.\n",
" METRICS = [\n",
" tf.keras.metrics.BinaryAccuracy(name='accuracy')\n",
" ]\n",
"\n",
" # The model consists of:\n",
" # 1. A Flatten layer that reshapes each 28x28x3 input image into a 2352-element vector.\n",
" # 2. A fully connected layer with 64 units and ReLU activation.\n",
" # 3. A single-unit readout layer that outputs real-valued scores instead of probabilities.\n",
" model = keras.Sequential([\n",
" keras.layers.Flatten(input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), name='image'),\n",
" keras.layers.Dense(64, activation='relu'),\n",
" keras.layers.Dense(1, activation=None)\n",
" ])\n",
"\n",
" # TFCO uses hinge loss by default, so the model is also trained with hinge loss.\n",
" model.compile(\n",
" optimizer=tf.keras.optimizers.Adam(0.001),\n",
" loss='hinge',\n",
" metrics=METRICS)\n",
" return model"
]
},
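{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about the `'hinge'` loss used above, here is a minimal NumPy sketch. It assumes the behavior documented for Keras' hinge loss: binary 0/1 labels are mapped to -1/+1, and the loss is the mean of `max(0, 1 - y * score)`. The helper name `hinge_loss` is hypothetical and is not part of Keras or TFCO.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def hinge_loss(labels, scores):\n",
"  # Hypothetical helper: map {0, 1} labels to {-1, +1}.\n",
"  signed_labels = 2.0 * np.asarray(labels, dtype=float) - 1.0\n",
"  # Penalize scores on the wrong side of the margin: max(0, 1 - y * score).\n",
"  margins = 1.0 - signed_labels * np.asarray(scores, dtype=float)\n",
"  return np.mean(np.maximum(0.0, margins))\n",
"\n",
"# A confident correct score (2.0) costs nothing; a wrong-side score (0.5 for a\n",
"# negative) and a correct but small score (0.2) are both penalized.\n",
"print(hinge_loss([1, 0, 1], [2.0, 0.5, 0.2]))  # approximately 0.767\n",
"```\n",
"\n",
"Because the readout layer has no activation, the scores are unbounded real values, and an example stops contributing to the loss only once `y * score >= 1`."
]
},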
{
"cell_type": "markdown",
"metadata": {
"id": "7A4uKPNVzPVO"
},
"source": [
"We also define a function to set seeds to ensure reproducible results. Note that this colab is meant as an educational tool and does not have the stability of a finely tuned production pipeline. Running without setting a seed may lead to varied results. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-IVw4EgKzqSF"
},
"outputs": [],
"source": [
"def set_seeds():\n",
" np.random.seed(121212)\n",
" tf.compat.v1.set_random_seed(212121)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xrbjmmeom8pA"
},
"source": [
"# Fairness Indicators Helper Functions\n",
"Before training our model, we define a number of helper functions that will allow us to evaluate the model's performance via Fairness Indicators.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1EPF_k620CRN"
},
"source": [
"First, we create a helper function to save our model once we train it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ejHbhLW5epar"
},
"outputs": [],
"source": [
"def save_model(model, subdir):\n",
" base_dir = tempfile.mkdtemp(prefix='saved_models')\n",
" model_location = os.path.join(base_dir, subdir)\n",
" model.save(model_location, save_format='tf')\n",
" return model_location"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "erhKEvqByCNj"
},
"source": [
"Next, we define functions used to preprocess the data in order to correctly pass it through to TFMA."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "D2qa8Okwj_U3"
},
"outputs": [],
"source": [
"#@title Data Preprocessing Functions\n",
"def tfds_filepattern_for_split(dataset_name, split):\n",
" return f\"{local_test_file_full_prefix()}*\"\n",
"\n",
"class PreprocessCelebA(object):\n",
" \"\"\"Class that deserializes, decodes and applies additional preprocessing for CelebA input.\"\"\"\n",
" def __init__(self, dataset_name):\n",
" builder = tfds.builder(dataset_name)\n",
" self.features = builder.info.features\n",
" example_specs = self.features.get_serialized_info()\n",
" self.parser = tfds.core.example_parser.ExampleParser(example_specs)\n",
"\n",
" def __call__(self, serialized_example):\n",
" # Deserialize\n",
" deserialized_example = self.parser.parse_example(serialized_example)\n",
" # Decode\n",
" decoded_example = self.features.decode_example(deserialized_example)\n",
" # Additional preprocessing\n",
" image = decoded_example[IMAGE_KEY]\n",
" label = decoded_example[ATTR_KEY][LABEL_KEY]\n",
" # Resize and scale image.\n",
" image = tf.cast(image, tf.float32)\n",
" image = tf.image.resize(image, [IMAGE_SIZE, IMAGE_SIZE])\n",
" image /= 255.0\n",
" image = tf.reshape(image, [-1])\n",
" # Cast label to float32.\n",
" label = tf.cast(label, tf.float32)\n",
"\n",
" group = decoded_example[ATTR_KEY][GROUP_KEY]\n",
" \n",
" output = tf.train.Example()\n",
" output.features.feature[IMAGE_KEY].float_list.value.extend(image.numpy().tolist())\n",
" output.features.feature[LABEL_KEY].float_list.value.append(label.numpy())\n",
" output.features.feature[GROUP_KEY].bytes_list.value.append(b\"Young\" if group.numpy() else b'Not Young')\n",
" return output.SerializeToString()\n",
"\n",
"def tfds_as_pcollection(beam_pipeline, dataset_name, split):\n",
" return (\n",
" beam_pipeline\n",
" | 'Read records' >> beam.io.ReadFromTFRecord(tfds_filepattern_for_split(dataset_name, split))\n",
" | 'Preprocess' >> beam.Map(PreprocessCelebA(dataset_name))\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fBKvxd2Tz3hK"
},
"source": [
"Finally, we define a function that evaluates the results in TFMA."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "30YduitftaNB"
},
"outputs": [],
"source": [
"def get_eval_results(model_location, eval_subdir):\n",
" base_dir = tempfile.mkdtemp(prefix='saved_eval_results')\n",
" tfma_eval_result_path = os.path.join(base_dir, eval_subdir)\n",
"\n",
" eval_config_pbtxt = \"\"\"\n",
" model_specs {\n",
" label_key: \"%s\"\n",
" }\n",
" metrics_specs {\n",
" metrics {\n",
" class_name: \"FairnessIndicators\"\n",
" config: '{ \"thresholds\": [0.22, 0.5, 0.75] }'\n",
" }\n",
" metrics {\n",
" class_name: \"ExampleCount\"\n",
" }\n",
" }\n",
" slicing_specs {}\n",
" slicing_specs { feature_keys: \"%s\" }\n",
" options {\n",
" compute_confidence_intervals { value: False }\n",
" disabled_outputs{values: \"analysis\"}\n",
" }\n",
" \"\"\" % (LABEL_KEY, GROUP_KEY)\n",
" \n",
" eval_config = text_format.Parse(eval_config_pbtxt, tfma.EvalConfig())\n",
"\n",
" eval_shared_model = tfma.default_eval_shared_model(\n",
" eval_saved_model_path=model_location, tags=[tf.saved_model.SERVING])\n",
"\n",
" schema_pbtxt = \"\"\"\n",
" tensor_representation_group {\n",
" key: \"\"\n",
" value {\n",
" tensor_representation {\n",
" key: \"%s\"\n",
" value {\n",
" dense_tensor {\n",
" column_name: \"%s\"\n",
" shape {\n",
" dim { size: 28 }\n",
" dim { size: 28 }\n",
" dim { size: 3 }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" }\n",
" feature {\n",
" name: \"%s\"\n",
" type: FLOAT\n",
" }\n",
" feature {\n",
" name: \"%s\"\n",
" type: FLOAT\n",
" }\n",
" feature {\n",
" name: \"%s\"\n",
" type: BYTES\n",
" }\n",
" \"\"\" % (IMAGE_KEY, IMAGE_KEY, IMAGE_KEY, LABEL_KEY, GROUP_KEY)\n",
" schema = text_format.Parse(schema_pbtxt, schema_pb2.Schema())\n",
" coder = tf_example_record.TFExampleBeamRecord(\n",
" physical_format='inmem', schema=schema,\n",
" raw_record_column_name=tfma.ARROW_INPUT_COLUMN)\n",
" tensor_adapter_config = tensor_adapter.TensorAdapterConfig(\n",
" arrow_schema=coder.ArrowSchema(),\n",
" tensor_representations=coder.TensorRepresentations())\n",
" # Run the fairness evaluation.\n",
" with beam.Pipeline() as pipeline:\n",
" _ = (\n",
" tfds_as_pcollection(pipeline, 'celeb_a', 'test')\n",
" | 'ExamplesToRecordBatch' >> coder.BeamSource()\n",
" | 'ExtractEvaluateAndWriteResults' >>\n",
" tfma.ExtractEvaluateAndWriteResults(\n",
" eval_config=eval_config,\n",
" eval_shared_model=eval_shared_model,\n",
" output_path=tfma_eval_result_path,\n",
" tensor_adapter_config=tensor_adapter_config)\n",
" )\n",
" return tfma.load_eval_result(output_path=tfma_eval_result_path)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "76tZ3vk-tyo9"
},
"source": [
"# Train & Evaluate Unconstrained Model\n",
"\n",
"With the model defined and the input pipeline in place, we’re ready to train the model. To reduce execution time and memory usage, we train on small batches of 32 examples for only 5 epochs of 1,000 steps each.\n",
"\n",
"Note that running this notebook with TensorFlow < 2.0.0 may produce a deprecation warning for `np.where`. You can safely ignore this warning; TensorFlow 2.x uses `tf.where` in place of `np.where`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3m9OOdU_8GWo"
},
"outputs": [],
"source": [
"BATCH_SIZE = 32\n",
"\n",
"# Set seeds to get reproducible results\n",
"set_seeds()\n",
"\n",
"model_unconstrained = create_model()\n",
"model_unconstrained.fit(celeb_a_train_data_wo_group(BATCH_SIZE), epochs=5, steps_per_epoch=1000)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nCtBH9DkvtUy"
},
"source": [
"Evaluating the model on the test data should result in a final accuracy score of just over 85%. Not bad for a simple model with no fine tuning."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mgsjbxpTIdZf"
},
"outputs": [],
"source": [
"print('Overall Results, Unconstrained')\n",
"celeb_a_test_data = celeb_a_builder.as_dataset(split='test').batch(1).map(preprocess_input_dict).map(get_image_label_and_group)\n",
"results = model_unconstrained.evaluate(celeb_a_test_data)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L5jslIrzwIKo"
},
"source": [
"However, performance evaluated across age groups may reveal some shortcomings.\n",
"\n",
"To explore this further, we evaluate the model with Fairness Indicators (via TFMA). In particular, we want to see whether there is a significant gap in the false positive rate between the \"Young\" and \"Not Young\" categories.\n",
"\n",
"A false positive error occurs when the model incorrectly predicts the positive class. In this context, a false positive occurs when the ground truth is an image of a celebrity 'Not Smiling' and the model predicts 'Smiling'. The false positive rate, which we examine in the visualization below, is the fraction of all actual negatives (here, 'Not Smiling' images) that the model incorrectly predicts as positive. While this is a relatively mundane error to make in this context, false positive errors can sometimes cause more problematic behaviors. For instance, a false positive error in a spam classifier could cause a user to miss an important email."
]
},
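{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the false positive rate concrete, here is a minimal NumPy sketch on toy data. The helpers `false_positive_rate` and `false_positive_rate_by_group` and the example arrays are hypothetical. They are not part of Fairness Indicators or TFMA, which compute this metric for you, and only illustrate how an FPR gap between age groups can appear.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def false_positive_rate(labels, scores, threshold=0.5):\n",
"  # FPR = FP / (FP + TN): the share of actual negatives predicted positive.\n",
"  labels = np.asarray(labels)\n",
"  predicted_positive = np.asarray(scores) >= threshold\n",
"  actual_negative = labels == 0\n",
"  return np.sum(predicted_positive & actual_negative) / np.sum(actual_negative)\n",
"\n",
"def false_positive_rate_by_group(labels, scores, groups, threshold=0.5):\n",
"  # One FPR per group; assumes every group contains at least one negative.\n",
"  labels, scores, groups = np.asarray(labels), np.asarray(scores), np.asarray(groups)\n",
"  return {\n",
"      str(g): false_positive_rate(labels[groups == g], scores[groups == g], threshold)\n",
"      for g in np.unique(groups)\n",
"  }\n",
"\n",
"# Toy data (not CelebA): label 0 = 'Not Smiling', label 1 = 'Smiling'.\n",
"labels = np.array([0, 0, 0, 0, 1, 1])\n",
"scores = np.array([0.9, 0.7, 0.6, 0.1, 0.8, 0.3])\n",
"groups = np.array(['Young', 'Young', 'Not Young', 'Not Young', 'Young', 'Not Young'])\n",
"\n",
"print(false_positive_rate(labels, scores))  # 0.75 overall\n",
"print(false_positive_rate_by_group(labels, scores, groups))\n",
"```\n",
"\n",
"In this toy data, both 'Not Smiling' images in the 'Young' group are predicted 'Smiling' (FPR 1.0), while only one of the two in the 'Not Young' group is (FPR 0.5). Fairness Indicators reports the same kind of per-slice comparison, at several thresholds, for the real model."
]
},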
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nFL91nZF1V8D"
},
"outputs": [],
"source": [
"model_location = save_model(model_unconstrained, 'model_export_unconstrained')\n",
"eval_results_unconstrained = get_eval_results(model_location, 'eval_results_unconstrained')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "34zHIMW0NHld"
},
"source": [
"As mentioned above, we are concentrating on the false positive rate. The version of Fairness Indicators this notebook was written against (0.1.2) selects the false negative rate by default. After running the cell below, deselect `false_negative_rate` and select `false_positive_rate` to view the metric we are interested in."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KXMVmUMi0ydk"
},
"outputs": [],
"source": [
"tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_results_unconstrained)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zYVpZ-DpBsfD"
},
"source": [
"As the results above show, there is a **disproportionate gap between \"Young\" and \"Not Young\" categories**.\n",
"\n",
"This is where TFCO can help, by constraining the false positive rate to stay within a more acceptable bound.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZNnI_Eu70gVp"
},
"source": [
"# Constrained Model Set Up\n",
"As documented in [TFCO's README](https://github.com/google-research/tensorflow_constrained_optimization/blob/master/README.md), there are several helpers that make it easier to set up a constrained problem:\n",
"\n",
"1. `tfco.rate_context()`: defines the examples (with their predictions and labels) over which a rate is computed. Its `subset()` method restricts the context to one age group, which is how the constraint on the \"Not Young\" group is constructed here.\n",
"2. `tfco.RateMinimizationProblem()`: combines the rate to be minimized with a list of constraints. Here, the objective is the overall error rate, and the single constraint requires the false positive rate on the \"Not Young\" age group to be less than or equal to 5%.\n",
"3. `tfco.ProxyLagrangianOptimizerV2()`: the optimizer that actually solves the rate-constrained problem.\n",
"\n",
"The cell below will call on these helpers to set up model training with the fairness constraint.\n",
"\n",
"\n"
]
},
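{
"cell_type": "markdown",
"metadata": {},
"source": [
"In equation form, the problem set up in the next cell is roughly the following, where the notation is ours rather than TFCO's and $\\theta$ denotes the model weights:\n",
"\n",
"$$\n",
"\\min_{\\theta} \\; \\mathrm{ErrorRate}(\\theta) \\quad \\text{subject to} \\quad \\mathrm{FPR}_{\\text{Not Young}}(\\theta) \\le 0.05\n",
"$$\n",
"\n",
"$$\n",
"\\mathrm{FPR}_{\\text{Not Young}}(\\theta) = \\frac{\\mathrm{FP}_{\\text{Not Young}}(\\theta)}{\\mathrm{FP}_{\\text{Not Young}}(\\theta) + \\mathrm{TN}_{\\text{Not Young}}(\\theta)}\n",
"$$\n",
"\n",
"Here $\\mathrm{FP}_{\\text{Not Young}}$ and $\\mathrm{TN}_{\\text{Not Young}}$ count the \"Not Young\", 'Not Smiling' images that the model predicts as 'Smiling' and 'Not Smiling', respectively. In the code, the objective corresponds to `tfco.error_rate(context)` and the constraint to `tfco.false_positive_rate(context_subset) <= 0.05`. During training, TFCO by default optimizes hinge-based surrogates of these rates rather than the rates themselves."
]
},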
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BTukzvfD6iWr"
},
"outputs": [],
"source": [
"# The batch size is needed to create the input, labels and groups tensors.\n",
"# These tensors are initialized with all 0's and are assigned the contents of\n",
"# each batch during training. The batch size should be large enough that each\n",
"# batch contains enough \"Young\" and \"Not Young\" examples.\n",
"set_seeds()\n",
"model_constrained = create_model()\n",
"BATCH_SIZE = 32\n",
"\n",
"# Create input tensor.\n",
"input_tensor = tf.Variable(\n",
" np.zeros((BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE, 3), dtype=\"float32\"),\n",
" name=\"input\")\n",
"\n",
"# Create labels and group tensors (assuming both labels and groups are binary).\n",
"labels_tensor = tf.Variable(\n",
" np.zeros(BATCH_SIZE, dtype=\"float32\"), name=\"labels\")\n",
"groups_tensor = tf.Variable(\n",
" np.zeros(BATCH_SIZE, dtype=\"float32\"), name=\"groups\")\n",
"\n",
"# Create a function that returns the applied 'model' to the input tensor\n",
"# and generates constrained predictions.\n",
"def predictions():\n",
" return model_constrained(input_tensor)\n",
"\n",
"# Create overall context and subsetted context.\n",
"# The subsetted context contains the subset of examples where group attribute < 1\n",
"# (i.e. the subset of \"Not Young\" celebrity images).\n",
"# \"groups_tensor < 1\" is used instead of \"groups_tensor == 0\" as the former\n",
"# would be a comparison on the tensor value, while the latter would be a\n",
"# comparison on the Tensor object.\n",
"context = tfco.rate_context(predictions, labels=lambda:labels_tensor)\n",
"context_subset = context.subset(lambda:groups_tensor < 1)\n",
"\n",
"# Setup list of constraints.\n",
"# In this notebook, the only constraint is: FPR on the \"Not Young\" subset <= 5%.\n",
"constraints = [tfco.false_positive_rate(context_subset) <= 0.05]\n",
"\n",
"# Setup rate minimization problem: minimize overall error rate s.t. constraints.\n",
"problem = tfco.RateMinimizationProblem(tfco.error_rate(context), constraints)\n",
"\n",
"# Create the constrained optimizer.\n",
"# Separate optimizers are specified for the objective and constraints\n",
"optimizer = tfco.ProxyLagrangianOptimizerV2(\n",
" optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001),\n",
" constraint_optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.001),\n",
" num_constraints=problem.num_constraints)\n",
"\n",
"# A list of all trainable variables is also needed to use TFCO.\n",
"var_list = (model_constrained.trainable_weights + list(problem.trainable_variables) +\n",
" optimizer.trainable_variables())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "thEe8A8UYbrO"
},
"source": [
"The model is now set up and ready to be trained with the false positive rate constraint across age groups.\n",
"\n",
"Because the last iterate of the constrained model is not necessarily the best-performing model with respect to the defined constraint, the TFCO library provides `tfco.find_best_candidate_index()` to help choose the best iterate among the snapshots recorded every `SKIP_ITERATIONS` iterations during training. Think of `tfco.find_best_candidate_index()` as an added heuristic that ranks the candidates separately by objective value and by fairness constraint violation (in this case, the false positive rate on the \"Not Young\" age group), both measured on the training data. That way, it can search for a better trade-off between overall accuracy and the fairness constraint.\n",
"\n",
"The following cell trains the model with the constraint and then loads the best-performing snapshot."
]
},
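{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the rank-based idea described above, here is a simplified NumPy sketch. The function `pick_candidate_by_ranks` is hypothetical and is not TFCO's implementation; `tfco.find_best_candidate_index()` may differ in details such as how it handles ties, feasible candidates, and multiple constraints. It takes inputs shaped like the ones the training loop below records: one objective value per snapshot, and one row of constraint violations per snapshot.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def pick_candidate_by_ranks(objectives, violations):\n",
"  # Simplified illustration only; not TFCO's actual heuristic.\n",
"  objectives = np.asarray(objectives, dtype=float)\n",
"  # Summarize each snapshot by its largest constraint violation.\n",
"  max_violations = np.max(np.asarray(violations, dtype=float), axis=1)\n",
"  # argsort of argsort turns values into ranks (0 = best, i.e. smallest).\n",
"  objective_ranks = np.argsort(np.argsort(objectives))\n",
"  violation_ranks = np.argsort(np.argsort(max_violations))\n",
"  # Choose the snapshot whose worse rank of the two is as good as possible.\n",
"  return int(np.argmin(np.maximum(objective_ranks, violation_ranks)))\n",
"\n",
"# Four hypothetical snapshots with one constraint each.\n",
"objectives = [0.30, 0.25, 0.28, 0.20]\n",
"violations = [[0.10], [0.08], [0.01], [0.15]]\n",
"print(pick_candidate_by_ranks(objectives, violations))  # 1\n",
"```\n",
"\n",
"Snapshot 1 is neither the best on objective (snapshot 3) nor the best on constraint violation (snapshot 2), but it is the only one in the top two on both, so it gives the best trade-off under this simplified rule."
]
},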
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "73doG4HL6nPS"
},
"outputs": [],
"source": [
"# Obtain train set batches.\n",
"\n",
"NUM_ITERATIONS = 100 # Number of training iterations.\n",
"SKIP_ITERATIONS = 10 # Print training stats once in this many iterations.\n",
"\n",
"# Create temp directory for saving snapshots of models.\n",
"temp_directory = tempfile.mkdtemp()\n",
"\n",
"# List of objective and constraints across iterations.\n",
"objective_list = []\n",
"violations_list = []\n",
"\n",
"# Training iterations.\n",
"iteration_count = 0\n",
"for (image, label, group) in celeb_a_train_data_w_group(BATCH_SIZE):\n",
" # Assign current batch to input, labels and groups tensors.\n",
" input_tensor.assign(image)\n",
" labels_tensor.assign(label)\n",
" groups_tensor.assign(group)\n",
"\n",
" # Run gradient update.\n",
" optimizer.minimize(problem, var_list=var_list)\n",
"\n",
" # Record objective and violations.\n",
" objective = problem.objective()\n",
" violations = problem.constraints()\n",
"\n",
" sys.stdout.write(\n",
" \"\\r Iteration %d: Hinge Loss = %.3f, Max. Constraint Violation = %.3f\"\n",
" % (iteration_count + 1, objective, max(violations)))\n",
"\n",
" # Snapshot model once in SKIP_ITERATIONS iterations.\n",
" if iteration_count % SKIP_ITERATIONS == 0:\n",
" objective_list.append(objective)\n",
" violations_list.append(violations)\n",
"\n",
" # Save snapshot of model weights.\n",
" model_constrained.save_weights(\n",
" temp_directory + \"/celeb_a_constrained_\" +\n",
" str(iteration_count / SKIP_ITERATIONS) + \".h5\")\n",
"\n",
" iteration_count += 1\n",
" if iteration_count >= NUM_ITERATIONS:\n",
" break\n",
"\n",
"# Choose best model from recorded iterates and load that model.\n",
"best_index = tfco.find_best_candidate_index(\n",
" np.array(objective_list), np.array(violations_list))\n",
"\n",
"model_constrained.load_weights(\n",
" temp_directory + \"/celeb_a_constrained_\" + str(best_index) + \".0.h5\")\n",
"\n",
"# Remove temp directory.\n",
"os.system(\"rm -r \" + temp_directory)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6r-6_R_gSrsT"
},
"source": [
"After having applied the constraint, we evaluate the results once again using Fairness Indicators."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5G6B3OR9CUmo"
},
"outputs": [],
"source": [
"model_location = save_model(model_constrained, 'model_export_constrained')\n",
"eval_result_constrained = get_eval_results(model_location, 'eval_results_constrained')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sVteOnE80ATS"
},
"source": [
"As before, deselect `false_negative_rate` and select `false_positive_rate` to view the metric we are interested in.\n",
"\n",
"To compare the two models fairly, it is important to use thresholds at which their overall false positive rates are roughly equal. This ensures that we are looking at an actual change in the model rather than a shift equivalent to simply moving the decision threshold. In our case, comparing the unconstrained model at a threshold of 0.5 with the constrained model at 0.22 provides a fair comparison."
]
},
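{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to pick such comparable thresholds is to choose, for each model, a threshold that brings its overall false positive rate down to a common target. Below is a minimal NumPy sketch of that idea. The helper `threshold_for_target_fpr` and the toy data are hypothetical, and this is not necessarily how the 0.22 and 0.5 thresholds in this notebook were chosen.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def threshold_for_target_fpr(labels, scores, target_fpr):\n",
"  # Hypothetical helper: return the smallest observed score that, used as a\n",
"  # threshold, gives an overall FPR of at most target_fpr.\n",
"  labels, scores = np.asarray(labels), np.asarray(scores)\n",
"  negative_scores = scores[labels == 0]\n",
"  for threshold in np.unique(scores):  # np.unique returns sorted values.\n",
"    # Fraction of actual negatives that would be predicted positive.\n",
"    fpr = np.mean(negative_scores >= threshold)\n",
"    if fpr <= target_fpr:\n",
"      return threshold\n",
"  # No observed score is high enough; predicting nothing positive gives FPR 0.\n",
"  return np.inf\n",
"\n",
"labels = np.array([0, 0, 0, 0, 1, 1])\n",
"scores = np.array([0.9, 0.7, 0.6, 0.1, 0.8, 0.3])\n",
"print(threshold_for_target_fpr(labels, scores, target_fpr=0.5))  # 0.7\n",
"```\n",
"\n",
"Running the same helper on each model's test-set scores with the same `target_fpr` yields one threshold per model with approximately matched overall false positive rates, so any remaining gap between age groups reflects the models rather than the choice of threshold."
]
},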
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GRIjYftvuc7b"
},
"outputs": [],
"source": [
"eval_results_dict = {\n",
" 'constrained': eval_result_constrained,\n",
" 'unconstrained': eval_results_unconstrained,\n",
"}\n",
"tfma.addons.fairness.view.widget_view.render_fairness_indicator(multi_eval_results=eval_results_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lrT-7EBrcBvV"
},
"source": [
"With TFCO's ability to express a more complex requirement as a rate constraint, we helped this model achieve a more desirable outcome with little impact on overall performance. There is still room for improvement, but TFCO was able to find a model that comes close to satisfying the constraint and reduces the disparity between the groups."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Fairness Indicators TFCO CelebA Case Study.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.22"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/tutorials/Fairness_Indicators_TFCO_Wiki_Case_Study.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "jMqk3Z8EciF8"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XbpNOB-vJVKu"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bqdaOVRxWs8v"
},
"source": [
"# Wiki Talk Comments Toxicity Prediction"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "y6T5tlXcdW7J"
},
"source": [
"In this example, we consider the task of predicting whether a discussion comment posted on a Wiki talk page contains toxic content (i.e., content that is “rude, disrespectful or unreasonable”). We use a public dataset released by the Conversation AI project, which contains over 100k comments from the English Wikipedia that are annotated by crowd workers (see [paper](https://arxiv.org/pdf/1610.08914.pdf) for labeling methodology).\n",
"\n",
"One of the challenges with this dataset is that only a very small proportion of the comments cover sensitive topics such as sexuality or religion. As such, training a neural network model on this dataset leads to disparate performance on the smaller sensitive topics. This can mean that innocuous statements about those topics are incorrectly flagged as ‘toxic’ at higher rates, causing speech to be unfairly censored.\n",
"\n",
"By imposing constraints during training, we can train a *fairer* model that performs more equitably across the different topic groups. \n",
"\n",
"We will use the TFCO library to optimize for our fairness goal during training."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DG_C2gsAKV7x"
},
"source": [
"## Installation\n",
"\n",
"Let's first install and import the relevant libraries. Note that you may have to restart the Colab runtime once after running the first cell because of outdated packages in the runtime. After doing so, there should be no further issues with imports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0XOLn8Pyrc_s"
},
"outputs": [],
"source": [
"#@title pip installs\n",
"!pip install git+https://github.com/google-research/tensorflow_constrained_optimization\n",
"!pip install git+https://github.com/tensorflow/fairness-indicators"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2ZkQDo2xcDXU"
},
"source": [
"Note that depending on when you run the cell below, you may receive a warning about the default version of TensorFlow in Colab switching to TensorFlow 2.X soon. You can safely ignore that warning as this notebook was designed to be compatible with TensorFlow 1.X and 2.X."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "nd_Y6CTnWs8w"
},
"outputs": [],
"source": [
"#@title Import Modules\n",
"import io\n",
"import os\n",
"import shutil\n",
"import sys\n",
"import tempfile\n",
"import time\n",
"import urllib.request\n",
"import zipfile\n",
"\n",
"import apache_beam as beam\n",
"from IPython.display import display\n",
"from IPython.display import HTML\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import tensorflow as tf\n",
"import tensorflow.keras as keras\n",
"from tensorflow.keras import layers\n",
"from tensorflow.keras.preprocessing import sequence\n",
"from tensorflow.keras.preprocessing import text\n",
"import tensorflow_constrained_optimization as tfco\n",
"import tensorflow_model_analysis as tfma\n",
"import fairness_indicators as fi\n",
"from tensorflow_model_analysis.addons.fairness.view import widget_view\n",
"from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_evaluate_graph\n",
"from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_extractor\n",
"from tensorflow_model_analysis.model_agnostic_eval import model_agnostic_predict as agnostic_predict"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GvqR564dLEVa"
},
"source": [
"Though TFCO is compatible with eager and graph execution, this notebook assumes that eager execution is enabled by default. To ensure that nothing breaks, eager execution will be enabled in the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "avMBqzjWct4Z"
},
"outputs": [],
"source": [
"#@title Enable Eager Execution and Print Versions\n",
"if tf.__version__ < \"2.0.0\":\n",
" tf.enable_eager_execution()\n",
" print(\"Eager execution enabled.\")\n",
"else:\n",
" print(\"Eager execution enabled by default.\")\n",
"\n",
"print(\"TensorFlow \" + tf.__version__)\n",
"print(\"TFMA \" + tfma.__version__)\n",
"print(\"FI \" + fi.version.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YUJyWaAwWs83"
},
"source": [
"## Hyper-parameters\n",
"\n",
"First, we set some hyper-parameters needed for the data preprocessing and model training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1aXlwlqTWs84"
},
"outputs": [],
"source": [
"hparams = {\n",
" \"batch_size\": 128,\n",
" \"cnn_filter_sizes\": [128, 128, 128],\n",
" \"cnn_kernel_sizes\": [5, 5, 5],\n",
" \"cnn_pooling_sizes\": [5, 5, 40],\n",
" \"constraint_learning_rate\": 0.01,\n",
" \"embedding_dim\": 100,\n",
" \"embedding_trainable\": False,\n",
" \"learning_rate\": 0.005,\n",
" \"max_num_words\": 10000,\n",
" \"max_sequence_length\": 250\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0PMs8Iwxq98C"
},
"source": [
"## Load and pre-process dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DIe2JRDeWs87"
},
"source": [
"Next, we download the dataset and preprocess it. The train, test and validation sets are provided as separate CSV files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rcd2CV7pWs88"
},
"outputs": [],
"source": [
"toxicity_data_url = (\"https://github.com/conversationai/unintended-ml-bias-analysis/\"\n",
" \"raw/e02b9f12b63a39235e57ba6d3d62d8139ca5572c/data/\")\n",
"\n",
"data_train = pd.read_csv(toxicity_data_url + \"wiki_train.csv\")\n",
"data_test = pd.read_csv(toxicity_data_url + \"wiki_test.csv\")\n",
"data_vali = pd.read_csv(toxicity_data_url + \"wiki_dev.csv\")\n",
"\n",
"data_train.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ojo617RIWs8_"
},
"source": [
"The `comment` column contains the discussion comments, and the `is_toxic` column indicates whether or not a comment is annotated as toxic.\n",
"\n",
"In the following, we:\n",
"1. Separate out the labels\n",
"2. Tokenize the text comments\n",
"3. Identify comments that contain sensitive topic terms \n",
"\n",
"First, we separate the labels from the train, test and validation sets. The labels are all binary (0 or 1)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mxo7ny90Ws9A"
},
"outputs": [],
"source": [
"labels_train = data_train[\"is_toxic\"].values.reshape(-1, 1) * 1.0\n",
"labels_test = data_test[\"is_toxic\"].values.reshape(-1, 1) * 1.0\n",
"labels_vali = data_vali[\"is_toxic\"].values.reshape(-1, 1) * 1.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "alrWi6jUWs9C"
},
"source": [
"Next, we tokenize the text comments using the `Tokenizer` provided by `Keras`. We build the vocabulary of tokens from the training-set comments alone, and use it to convert every comment into a padded sequence of tokens of the same length."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yvOTBsrHWs9D"
},
"outputs": [],
"source": [
"tokenizer = text.Tokenizer(num_words=hparams[\"max_num_words\"])\n",
"tokenizer.fit_on_texts(data_train[\"comment\"])\n",
"\n",
"def prep_text(texts, tokenizer, max_sequence_length):\n",
" # Turns text into padded sequences.\n",
" text_sequences = tokenizer.texts_to_sequences(texts)\n",
" return sequence.pad_sequences(text_sequences, maxlen=max_sequence_length)\n",
"\n",
"text_train = prep_text(data_train[\"comment\"], tokenizer, hparams[\"max_sequence_length\"])\n",
"text_test = prep_text(data_test[\"comment\"], tokenizer, hparams[\"max_sequence_length\"])\n",
"text_vali = prep_text(data_vali[\"comment\"], tokenizer, hparams[\"max_sequence_length\"])"
]
},
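{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the preprocessing step more concrete, here is a simplified, pure-Python sketch of what `Tokenizer` and `pad_sequences` do with their default settings: rank words by frequency in the training texts, keep only ids below `num_words` (id 0 is reserved for padding), and left-pad or left-truncate every sequence to a fixed length. The helpers `build_vocab` and `to_padded_ids` are hypothetical, and the plain whitespace split ignores the punctuation filtering that the real `Tokenizer` performs.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"def build_vocab(texts, num_words):\n",
"  # Hypothetical helper: rank words by frequency (ties keep first-seen order).\n",
"  counts = Counter(w for t in texts for w in t.lower().split())\n",
"  ranked = [w for w, _ in counts.most_common()]\n",
"  # Id 0 is reserved for padding, so word ids start at 1.\n",
"  word_index = {w: i + 1 for i, w in enumerate(ranked)}\n",
"  # As with Tokenizer(num_words=...), only ids below num_words are kept.\n",
"  return {w: i for w, i in word_index.items() if i < num_words}\n",
"\n",
"def to_padded_ids(texts, vocab, maxlen):\n",
"  # Hypothetical helper: map words to ids, dropping out-of-vocabulary words.\n",
"  padded = []\n",
"  for t in texts:\n",
"    ids = [vocab[w] for w in t.lower().split() if w in vocab]\n",
"    ids = ids[-maxlen:]  # truncating='pre' drops tokens from the front.\n",
"    padded.append([0] * (maxlen - len(ids)) + ids)  # padding='pre' pads on the left.\n",
"  return padded\n",
"\n",
"vocab = build_vocab(['the cat sat', 'the dog sat', 'the end'], num_words=3)\n",
"print(vocab)  # {'the': 1, 'sat': 2}\n",
"print(to_padded_ids(['the cat ran', 'sat the sat the'], vocab, maxlen=3))  # [[0, 0, 1], [1, 2, 1]]\n",
"```"
]
},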
{
"cell_type": "markdown",
"metadata": {
"id": "Cn5zbgp-Ws9F"
},
"source": [
"Finally, we identify comments related to certain sensitive topic groups. We consider a subset of the identity terms provided with the dataset and group them into\n",
"four broad topic groups: *sexuality*, *gender identity*, *religion*, and *race*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EnFfV2gEWs9G"
},
"outputs": [],
"source": [
"terms = {\n",
" 'sexuality': ['gay', 'lesbian', 'bisexual', 'homosexual', 'straight', 'heterosexual'], \n",
" 'gender identity': ['trans', 'transgender', 'cis', 'nonbinary'],\n",
" 'religion': ['christian', 'muslim', 'jewish', 'buddhist', 'catholic', 'protestant', 'sikh', 'taoist'],\n",
" 'race': ['african', 'african american', 'black', 'white', 'european', 'hispanic', 'latino', 'latina', \n",
" 'latinx', 'mexican', 'canadian', 'american', 'asian', 'indian', 'middle eastern', 'chinese', \n",
" 'japanese']}\n",
"\n",
"group_names = list(terms.keys())\n",
"num_groups = len(group_names)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ooI3F5M4Ws9I"
},
"source": [
"We then create separate group membership matrices for the train, test and validation sets, where the rows correspond to comments, the columns correspond to the four sensitive groups, and each entry is a boolean indicating whether the comment contains a term from the topic group."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zO7PyNckWs9J"
},
"outputs": [],
"source": [
"def get_groups(comments):\n",
"  # Returns a NumPy array of shape (n, k) with 0/1 entries, where n is the number of\n",
"  # comments and k is the number of groups. Entry (i, j) is 1 if the i-th comment\n",
"  # contains a term from the j-th group (case-insensitive substring match).\n",
"  groups = np.zeros((comments.shape[0], num_groups))\n",
"  for ii in range(num_groups):\n",
"    groups[:, ii] = comments.str.contains('|'.join(terms[group_names[ii]]), case=False)\n",
"  return groups\n",
"\n",
"groups_train = get_groups(data_train[\"comment\"])\n",
"groups_test = get_groups(data_test[\"comment\"])\n",
"groups_vali = get_groups(data_vali[\"comment\"])"
]
},
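{
"cell_type": "markdown",
"metadata": {},
"source": [
"One caveat about `get_groups`: joining the terms with `'|'` makes `str.contains` match substrings, so `'trans'` also matches `translation` and `'cis'` matches `decision`. The cell below is a hedged sketch, not part of the original tutorial, that compares this behavior with a whole-word variant that wraps the alternation in `\\b` word boundaries. The helper `contains_any` is hypothetical; adopting it would likely change the group statistics reported below.\n",
"\n",
"```python\n",
"import re\n",
"import pandas as pd\n",
"\n",
"def contains_any(comments, group_terms, whole_word):\n",
"  # Escape each term so that regex metacharacters are matched literally.\n",
"  alternation = '|'.join(re.escape(t) for t in group_terms)\n",
"  # With whole_word=True, each term must start and end at a word boundary.\n",
"  pattern = r'\\b(?:' + alternation + r')\\b' if whole_word else alternation\n",
"  return comments.str.contains(pattern, case=False, regex=True)\n",
"\n",
"comments = pd.Series(['A fine translation.', 'I am trans.', 'A tough decision'])\n",
"gender_terms = ['trans', 'transgender', 'cis', 'nonbinary']\n",
"print(contains_any(comments, gender_terms, whole_word=False).tolist())  # [True, True, True]\n",
"print(contains_any(comments, gender_terms, whole_word=True).tolist())   # [False, True, False]\n",
"```"
]
},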
{
"cell_type": "markdown",
"metadata": {
"id": "GFAI6AB9Ws9L"
},
"source": [
"As shown below, all four topic groups constitute only a small fraction of the overall dataset, and have varying proportions of toxic comments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8Ug4u_P9Ws9M"
},
"outputs": [],
"source": [
"print(\"Overall label proportion = %.1f%%\" % (labels_train.mean() * 100))\n",
"\n",
"group_stats = []\n",
"for ii in range(num_groups):\n",
" group_proportion = groups_train[:, ii].mean()\n",
" group_pos_proportion = labels_train[groups_train[:, ii] == 1].mean()\n",
" group_stats.append([group_names[ii],\n",
" \"%.2f%%\" % (group_proportion * 100), \n",
" \"%.1f%%\" % (group_pos_proportion * 100)])\n",
"group_stats = pd.DataFrame(group_stats, \n",
" columns=[\"Topic group\", \"Group proportion\", \"Label proportion\"])\n",
"group_stats"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aG5ZKKrVWs9O"
},
"source": [
"We see that only 1.3% of the dataset contains comments related to sexuality. Among them, 37% of the comments have been annotated as toxic. Note that this is significantly larger than the overall proportion of comments annotated as toxic. This could be because the few comments that used those identity terms did so in pejorative contexts. As mentioned above, this could cause our model to disproportionately misclassify comments as toxic when they include those terms. Since this is our concern, we'll make sure to look at the **False Positive Rate** when we evaluate the model's performance."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5DkJpKaLWs9P"
},
"source": [
"## Build CNN toxicity prediction model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "niJ4KIJgWs9Q"
},
"source": [
"Having prepared the dataset, we now build a `Keras` model for predicting toxicity. The model we use is a convolutional neural network (CNN) with the same architecture used by the Conversation AI project for their debiasing analysis. We adapt code provided by them to construct the model layers.\n",
"\n",
"The model uses an embedding layer to convert each text token into a fixed-length vector. The resulting sequence of vectors is then passed through several layers of convolution and pooling operations, followed by a fully-connected hidden layer and a single-unit output layer.\n",
"\n",
"We make use of pre-trained GloVe word vector embeddings, which we download below. This may take a few minutes to complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yevbBL2oWs9Q"
},
"outputs": [],
"source": [
"zip_file_url = \"http://nlp.stanford.edu/data/glove.6B.zip\"\n",
"zip_file = urllib.request.urlopen(zip_file_url)\n",
"archive = zipfile.ZipFile(io.BytesIO(zip_file.read()))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a11-YWDnWs9S"
},
"source": [
"We use the downloaded GloVe embeddings to create an embedding matrix, where the rows contain the word embeddings for the tokens in the `Tokenizer`'s vocabulary. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bBS74MMYWs9T"
},
"outputs": [],
"source": [
"embeddings_index = {}\n",
"glove_file = \"glove.6B.100d.txt\"\n",
"\n",
"with archive.open(glove_file) as f:\n",
" for line in f:\n",
" values = line.split()\n",
" word = values[0].decode(\"utf-8\") \n",
" coefs = np.asarray(values[1:], dtype=\"float32\")\n",
" embeddings_index[word] = coefs\n",
"\n",
"embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, hparams[\"embedding_dim\"]))\n",
"num_words_in_embedding = 0\n",
"for word, i in tokenizer.word_index.items():\n",
" embedding_vector = embeddings_index.get(word)\n",
" if embedding_vector is not None:\n",
" num_words_in_embedding += 1\n",
" embedding_matrix[i] = embedding_vector"
]
},
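{
"cell_type": "markdown",
"metadata": {},
"source": [
"The embedding-matrix construction above can be illustrated on a toy, in-memory file in the same `word v1 v2 ...` text format as `glove.6B.100d.txt`. This minimal sketch uses made-up 3-dimensional vectors and a hypothetical `word_index` in place of the tokenizer's; it shows that row 0 (the padding id) stays all zeros, and so does the row of any word without a GloVe vector.\n",
"\n",
"```python\n",
"import io\n",
"import numpy as np\n",
"\n",
"# Made-up vectors in GloVe's text format: a word followed by its coefficients.\n",
"fake_glove = io.BytesIO(b'the 0.1 0.2 0.3\\ncat 0.4 0.5 0.6\\n')\n",
"embeddings_index = {}\n",
"for line in fake_glove:\n",
"  values = line.split()\n",
"  embeddings_index[values[0].decode('utf-8')] = np.asarray(values[1:], dtype='float32')\n",
"\n",
"# Hypothetical vocabulary; 'zyzzyva' has no vector in the toy file.\n",
"word_index = {'the': 1, 'cat': 2, 'zyzzyva': 3}\n",
"embedding_matrix = np.zeros((len(word_index) + 1, 3))\n",
"for word, i in word_index.items():\n",
"  vector = embeddings_index.get(word)\n",
"  if vector is not None:\n",
"    embedding_matrix[i] = vector\n",
"\n",
"print(embedding_matrix)\n",
"```"
]
},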
{
"cell_type": "markdown",
"metadata": {
"id": "t9NVp-_eWs9V"
},
"source": [
"We are now ready to specify the `Keras` layers. We write a function to create a new model, which we will invoke whenever we wish to train a new model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_f_DhA6OWs9W"
},
"outputs": [],
"source": [
"def create_model():\n",
" model = keras.Sequential()\n",
"\n",
" # Embedding layer.\n",
" embedding_layer = layers.Embedding(\n",
" embedding_matrix.shape[0],\n",
" embedding_matrix.shape[1],\n",
" weights=[embedding_matrix],\n",
" input_length=hparams[\"max_sequence_length\"],\n",
" trainable=hparams['embedding_trainable'])\n",
" model.add(embedding_layer)\n",
"\n",
" # Convolution layers.\n",
" for filter_size, kernel_size, pool_size in zip(\n",
" hparams['cnn_filter_sizes'], hparams['cnn_kernel_sizes'],\n",
" hparams['cnn_pooling_sizes']):\n",
"\n",
" conv_layer = layers.Conv1D(\n",
" filter_size, kernel_size, activation='relu', padding='same')\n",
" model.add(conv_layer)\n",
"\n",
" pooled_layer = layers.MaxPooling1D(pool_size, padding='same')\n",
" model.add(pooled_layer)\n",
"\n",
" # Add a flatten layer, a fully-connected layer and an output layer.\n",
" model.add(layers.Flatten())\n",
" model.add(layers.Dense(128, activation='relu'))\n",
" model.add(layers.Dense(1))\n",
" \n",
" return model"
]
},
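{
"cell_type": "markdown",
"metadata": {},
"source": [
"It can help to trace the sequence length through the convolution and pooling blocks. The sketch below assumes the standard Keras behavior for `padding='same'`: a stride-1 `Conv1D` preserves the length, and `MaxPooling1D`, whose stride defaults to its pool size, outputs `ceil(length / pool_size)` steps. With `max_sequence_length = 250` and pooling sizes `[5, 5, 40]`, the `Flatten` layer then receives a 1 × 128 feature map. The function name `pooled_lengths` is hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def pooled_lengths(sequence_length, pooling_sizes):\n",
"  # Sequence length after each Conv1D ('same', stride 1) + MaxPooling1D ('same') block.\n",
"  lengths = []\n",
"  length = sequence_length\n",
"  for pool_size in pooling_sizes:\n",
"    length = math.ceil(length / pool_size)\n",
"    lengths.append(length)\n",
"  return lengths\n",
"\n",
"lengths = pooled_lengths(250, [5, 5, 40])\n",
"flattened_size = lengths[-1] * 128  # The last Conv1D layer has 128 filters.\n",
"print(lengths, flattened_size)  # [50, 10, 1] 128\n",
"```"
]
},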
{
"cell_type": "markdown",
"metadata": {
"id": "CwcqYITBN7bW"
},
"source": [
"We also define a method to set random seeds. This is done to ensure reproducible results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C_1nsXntN98C"
},
"outputs": [],
"source": [
"def set_seeds():\n",
" np.random.seed(121212)\n",
" tf.compat.v1.set_random_seed(212121)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "X-_fKjDtWs9Y"
},
"source": [
"## Fairness indicators"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "k009haGaWs9Z"
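},
"source": [
"We also write helper functions that convert predictions into `tf.train.Example` protos, compute Fairness Indicators metrics with TensorFlow Model Analysis (TFMA), and plot the results."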
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B9ZgGCAs8V-I"
},
"outputs": [],
"source": [
"def create_examples(labels, predictions, groups, group_names):\n",
" # Returns tf.examples with given labels, predictions, and group information. \n",
" examples = []\n",
" sigmoid = lambda x: 1/(1 + np.exp(-x)) \n",
" for ii in range(labels.shape[0]):\n",
" example = tf.train.Example()\n",
" example.features.feature['toxicity'].float_list.value.append(\n",
" labels[ii][0])\n",
" example.features.feature['prediction'].float_list.value.append(\n",
" sigmoid(predictions[ii][0])) # predictions need to be in [0, 1].\n",
" for jj in range(groups.shape[1]):\n",
" example.features.feature[group_names[jj]].bytes_list.value.append(\n",
" b'Yes' if groups[ii, jj] else b'No')\n",
" examples.append(example)\n",
" return examples"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vESL-3dU9iiG"
},
"outputs": [],
"source": [
"def evaluate_results(labels, predictions, groups, group_names):\n",
" # Evaluates fairness indicators for given labels, predictions and group\n",
" # membership info.\n",
" examples = create_examples(labels, predictions, groups, group_names)\n",
"\n",
" # Create feature map for labels, predictions and each group.\n",
" feature_map = {\n",
" 'prediction': tf.io.FixedLenFeature([], tf.float32),\n",
" 'toxicity': tf.io.FixedLenFeature([], tf.float32),\n",
" }\n",
" for group in group_names:\n",
" feature_map[group] = tf.io.FixedLenFeature([], tf.string)\n",
"\n",
" # Serialize the examples.\n",
" serialized_examples = [e.SerializeToString() for e in examples]\n",
"\n",
" BASE_DIR = tempfile.gettempdir()\n",
" OUTPUT_DIR = os.path.join(BASE_DIR, 'output')\n",
"\n",
" with beam.Pipeline() as pipeline:\n",
" model_agnostic_config = agnostic_predict.ModelAgnosticConfig(\n",
" label_keys=['toxicity'],\n",
" prediction_keys=['prediction'],\n",
" feature_spec=feature_map)\n",
" \n",
" slices = [tfma.slicer.SingleSliceSpec()]\n",
" for group in group_names:\n",
" slices.append(\n",
" tfma.slicer.SingleSliceSpec(columns=[group]))\n",
"\n",
" extractors = [\n",
" model_agnostic_extractor.ModelAgnosticExtractor(\n",
" model_agnostic_config=model_agnostic_config),\n",
" tfma.extractors.slice_key_extractor.SliceKeyExtractor(slices)\n",
" ]\n",
"\n",
" metrics_callbacks = [\n",
" tfma.post_export_metrics.fairness_indicators(\n",
" thresholds=[0.5],\n",
" target_prediction_keys=['prediction'],\n",
" labels_key='toxicity'),\n",
" tfma.post_export_metrics.example_count()]\n",
"\n",
" # Create a model agnostic aggregator.\n",
" eval_shared_model = tfma.types.EvalSharedModel(\n",
" add_metrics_callbacks=metrics_callbacks,\n",
" construct_fn=model_agnostic_evaluate_graph.make_construct_fn(\n",
" add_metrics_callbacks=metrics_callbacks,\n",
" config=model_agnostic_config))\n",
"\n",
" # Run Model Agnostic Eval.\n",
" _ = (\n",
" pipeline\n",
" | beam.Create(serialized_examples)\n",
" | 'ExtractEvaluateAndWriteResults' >>\n",
" tfma.ExtractEvaluateAndWriteResults(\n",
" eval_shared_model=eval_shared_model,\n",
" output_path=OUTPUT_DIR,\n",
" extractors=extractors,\n",
" compute_confidence_intervals=True\n",
" )\n",
" )\n",
"\n",
" fairness_ind_result = tfma.load_eval_result(output_path=OUTPUT_DIR)\n",
"\n",
" # Also evaluate accuracy of the model.\n",
" accuracy = np.mean(labels == (predictions > 0.0))\n",
"\n",
" return fairness_ind_result, accuracy"
]
},
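{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the final `Dense(1)` layer has no activation, the model outputs logits. This is why `create_examples` applies a sigmoid before writing the `prediction` feature, and why `evaluate_results` computes accuracy with `predictions > 0.0`: the sigmoid is increasing and equals 0.5 at zero, so `logit > 0` gives the same decision as `probability > 0.5`, the threshold passed to Fairness Indicators. The small NumPy check below uses made-up values.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigmoid(x):\n",
"  return 1.0 / (1.0 + np.exp(-x))\n",
"\n",
"# Made-up labels and logits with the same (n, 1) shape as in the notebook.\n",
"labels = np.array([[1.0], [0.0], [1.0], [0.0]])\n",
"logits = np.array([[2.3], [-0.7], [-0.1], [0.4]])\n",
"\n",
"accuracy_from_logits = np.mean(labels == (logits > 0.0))\n",
"accuracy_from_probs = np.mean(labels == (sigmoid(logits) > 0.5))\n",
"print(accuracy_from_logits, accuracy_from_probs)  # 0.5 0.5\n",
"```"
]
},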
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "W3Sp7mpsWs9f"
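},
"outputs": [],
"source": [
"def plot_fairness_indicators(eval_result, title):\n",
"  fairness_ind_result, accuracy = eval_result\n",
"  display(HTML(\"<h2>\" + title + \" (Accuracy = %.2f%%)\" % (accuracy * 100) + \"</h2>\"))\n",
"  widget_view.render_fairness_indicator(fairness_ind_result)\n",
"\n",
"def plot_multi_fairness_indicators(multi_eval_results):\n",
"  # Split each (fairness result, accuracy) pair into two dicts keyed by title.\n",
"  multi_results = {}\n",
"  multi_accuracy = {}\n",
"  for title, (fairness_ind_result, accuracy) in multi_eval_results.items():\n",
"    multi_results[title] = fairness_ind_result\n",
"    multi_accuracy[title] = accuracy\n",
"\n",
"  title_str = \"<h2>\"\n",
"  for title in multi_eval_results.keys():\n",
"    title_str += title + \" (Accuracy = %.2f%%)\" % (multi_accuracy[title] * 100) + \"; \"\n",
"  title_str = title_str[:-2]  # Drop the trailing separator.\n",
"  title_str += \"</h2>\"\n",
"  display(HTML(title_str))\n",
"  widget_view.render_fairness_indicator(multi_eval_results=multi_results)"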
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8aWNc4CdWs9h"
},
"source": [
"## Train unconstrained model"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DuSA8qL7Ws9i"
},
"source": [
"For the first model we train, we optimize a simple cross-entropy loss *without* any constraints."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0g50bauHWs9j"
},
"outputs": [],
"source": [
"# Set random seed for reproducible results.\n",
"set_seeds()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YsCoHMG_iIzc"
},
"source": [
"**Note**: The following code cell can take ~8 minutes to run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tamJiG3FiDYW"
},
"outputs": [],
"source": [
"# Optimizer and loss.\n",
"optimizer = tf.keras.optimizers.Adam(learning_rate=hparams[\"learning_rate\"])\n",
"loss = lambda y_true, y_pred: tf.keras.losses.binary_crossentropy(\n",
" y_true, y_pred, from_logits=True)\n",
"\n",
"# Create, compile and fit model.\n",
"model_unconstrained = create_model()\n",
"model_unconstrained.compile(optimizer=optimizer, loss=loss)\n",
"\n",
"model_unconstrained.fit(\n",
" x=text_train, y=labels_train, batch_size=hparams[\"batch_size\"], epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "p7AvIdktWs9t"
},
"source": [
"Having trained the unconstrained model, we plot various evaluation metrics for the model on the test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tHV40_21lRL6"
},
"outputs": [],
"source": [
"scores_unconstrained_test = model_unconstrained.predict(text_test)\n",
"eval_result_unconstrained = evaluate_results(\n",
" labels_test, scores_unconstrained_test, groups_test, group_names)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AJpRuN0EOeyG"
},
"source": [
"As explained above, we are concentrating on the false positive rate. In version 0.1.2 (current when this tutorial was written), Fairness Indicators selects the false negative rate by default. After running the cell below, deselect false_negative_rate and select false_positive_rate to view the metric we are interested in."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2fwNpfou4yvP"
},
"outputs": [],
"source": [
"plot_fairness_indicators(eval_result_unconstrained, \"Unconstrained\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "J3TbAenkGM7P"
},
"source": [
"While the overall false positive rate is less than 2%, the false positive rate on the sexuality-related comments is significantly higher. This is because the sexuality group is very small and has a disproportionately high fraction of comments annotated as toxic. As a result, a model trained without constraints learns to treat sexuality-related terms as a strong indicator of toxicity."
]
},
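{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the false positive rate reported for a slice is FP / (FP + TN): among the comments in the slice that are labeled non-toxic, the fraction predicted toxic at the chosen threshold. The NumPy sketch below computes it overall and for one group from 1-D labels, predicted probabilities and a group-membership matrix laid out like `groups_test`. The function `false_positive_rate` (not the TFMA metric itself) and the data are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def false_positive_rate(labels, probs, mask, threshold=0.5):\n",
"  # Restrict to negatives (label 0) inside the slice, then count predicted positives.\n",
"  negatives = (labels == 0) & mask\n",
"  if negatives.sum() == 0:\n",
"    return float('nan')\n",
"  return float(np.mean(probs[negatives] > threshold))\n",
"\n",
"labels = np.array([0, 0, 0, 0, 1, 0])\n",
"probs = np.array([0.1, 0.7, 0.2, 0.9, 0.8, 0.3])\n",
"groups = np.array([[0], [1], [0], [1], [1], [0]])  # One hypothetical group column.\n",
"\n",
"overall = false_positive_rate(labels, probs, np.ones_like(labels, dtype=bool))\n",
"in_group = false_positive_rate(labels, probs, groups[:, 0] == 1)\n",
"print(overall, in_group)  # 0.4 1.0\n",
"```"
]
},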
{
"cell_type": "markdown",
"metadata": {
"id": "KmxyAo9hWs9w"
},
"source": [
"## Train with constraints on false positive rates"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l3dYUchIWs9w"
},
"source": [
"To avoid large differences in false positive rates across groups, we next train a model with a constraint that keeps the false positive rate of a sensitive group within a desired limit. Specifically, we optimize the error rate of the model subject to the *false positive rate on the sexuality group being less than or equal to 2%*; constraints on the other groups can be added in the same way.\n",
"\n",
"Training on minibatches with per-group constraints can be challenging for this dataset, however, because the groups we wish to constrain are all small, so individual minibatches are likely to contain very few examples from each group. The gradients we compute during training will therefore be noisy, and the model will converge very slowly.\n",
"\n",
"To mitigate this problem, we recommend using two streams of minibatches, with the first stream formed as before from the entire training set, and the second stream formed solely from the sensitive group examples. We will compute the objective using minibatches from the first stream and the per-group constraints using minibatches from the second stream. Because the batches from the second stream are likely to contain a larger number of examples from each group, we expect our updates to be less noisy.\n",
"\n",
"We create separate features, labels and groups tensors to hold the minibatches from the two streams."
]
},
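{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick calculation shows why the second stream helps. If a group makes up about 1.3% of the training data (the sexuality share reported above), a typical minibatch of 128 comments from the full training set contains only about 128 × 0.013 ≈ 1.7 of its examples on average. The sketch below mirrors, on toy data, the index arithmetic of the training loop further down: the first stream walks through all examples, the second cycles through the sensitive-group indices only, and each wraps around with a modulo. The function `stream_batches` and the toy group matrix are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Expected number of group examples in a minibatch of 128 at a 1.3% share.\n",
"print(round(128 * 0.013, 2))  # 1.66\n",
"\n",
"def stream_batches(groups, batch_size, num_steps):\n",
"  # Indices of examples that belong to at least one sensitive group.\n",
"  protected = np.nonzero(groups.sum(axis=1))[0]\n",
"  num_examples, num_protected = groups.shape[0], protected.shape[0]\n",
"  for step in range(num_steps):\n",
"    positions = np.arange(step * batch_size, (step + 1) * batch_size)\n",
"    # First stream: all examples. Second stream: sensitive-group examples only.\n",
"    yield positions % num_examples, protected[positions % num_protected]\n",
"\n",
"toy_groups = np.array([[0], [1], [0], [0], [1], [0]])\n",
"for first, second in stream_batches(toy_groups, batch_size=4, num_steps=2):\n",
"  print(first.tolist(), second.tolist())\n",
"# [0, 1, 2, 3] [1, 4, 1, 4]\n",
"# [4, 5, 0, 1] [1, 4, 1, 4]\n",
"```"
]
},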
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vMuuTOEOWs9x"
},
"outputs": [],
"source": [
"# Set random seed.\n",
"set_seeds()\n",
"\n",
"# Features tensors.\n",
"batch_shape = (hparams[\"batch_size\"], hparams['max_sequence_length'])\n",
"features_tensor = tf.Variable(np.zeros(batch_shape, dtype='int32'), name='x')\n",
"features_tensor_sen = tf.Variable(np.zeros(batch_shape, dtype='int32'), name='x_sen')\n",
"\n",
"# Labels tensors.\n",
"batch_shape = (hparams[\"batch_size\"], 1)\n",
"labels_tensor = tf.Variable(np.zeros(batch_shape, dtype='float32'), name='labels')\n",
"labels_tensor_sen = tf.Variable(np.zeros(batch_shape, dtype='float32'), name='labels_sen')\n",
"\n",
"# Groups tensors.\n",
"batch_shape = (hparams[\"batch_size\"], num_groups)\n",
"groups_tensor_sen = tf.Variable(np.zeros(batch_shape, dtype='float32'), name='groups_sen')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-wh26V7nWs9z"
},
"source": [
"We instantiate a new model, and compute predictions for minibatches from the two streams."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kawyrkQIWs9z"
},
"outputs": [],
"source": [
"# Create model, and separate prediction functions for the two streams. \n",
"# For the predictions, we use a nullary function returning a Tensor to support eager mode.\n",
"model_constrained = create_model()\n",
"\n",
"def predictions():\n",
" return model_constrained(features_tensor)\n",
"\n",
"def predictions_sen():\n",
" return model_constrained(features_tensor_sen)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UG9t7dw1Ws91"
},
"source": [
"We then set up a constrained optimization problem with the error rate as the objective and a constraint on the false positive rate of the sexuality group."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EhKAMGSJWs93"
},
"outputs": [],
"source": [
"epsilon = 0.02 # Desired false-positive rate threshold.\n",
"\n",
"# Set up separate contexts for the two minibatch streams.\n",
"context = tfco.rate_context(predictions, lambda:labels_tensor)\n",
"context_sen = tfco.rate_context(predictions_sen, lambda:labels_tensor_sen)\n",
"\n",
"# Compute the objective using the first stream.\n",
"objective = tfco.error_rate(context)\n",
"\n",
"# Compute the constraint using the second stream.\n",
"# Subset the examples belonging to the \"sexuality\" group from the second stream \n",
"# and add a constraint on the group's false positive rate.\n",
"context_sen_subset = context_sen.subset(lambda: groups_tensor_sen[:, 0] > 0)\n",
"constraint = [tfco.false_positive_rate(context_sen_subset) <= epsilon]\n",
"\n",
"# Create a rate minimization problem.\n",
"problem = tfco.RateMinimizationProblem(objective, constraint)\n",
"\n",
"# Set up a constrained optimizer.\n",
"optimizer = tfco.ProxyLagrangianOptimizerV2(\n",
" optimizer=tf.keras.optimizers.Adam(learning_rate=hparams[\"learning_rate\"]),\n",
" num_constraints=problem.num_constraints)\n",
"\n",
"# List of variables to optimize include the model weights, \n",
"# and the trainable variables from the rate minimization problem and \n",
"# the constrained optimizer.\n",
"var_list = (model_constrained.trainable_weights + list(problem.trainable_variables) +\n",
" optimizer.trainable_variables())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CoFWd8wMWs94"
},
"source": [
"We are ready to train the model. The two minibatch streams are indexed in parallel by the same step counter, and each stream wraps around independently when it runs out of examples. Every time we perform a gradient update, we copy the minibatch contents from the first stream to the tensors `features_tensor` and `labels_tensor`, and the minibatch contents from the second stream to the tensors `features_tensor_sen`, `labels_tensor_sen` and `groups_tensor_sen`.\n",
"\n",
"**Note**: The following code cell may take ~12 minutes to run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zbXohC6vWs95"
},
"outputs": [],
"source": [
"# Indices of sensitive group members.\n",
"protected_group_indices = np.nonzero(groups_train.sum(axis=1))[0]\n",
"\n",
"num_examples = text_train.shape[0]\n",
"num_examples_sen = protected_group_indices.shape[0]\n",
"batch_size = hparams[\"batch_size\"]\n",
"\n",
"# Number of steps needed for one epoch over the training sample.\n",
"num_steps = int(num_examples / batch_size)\n",
"\n",
"start_time = time.time()\n",
"\n",
"# Loop over minibatches.\n",
"for batch_index in range(num_steps):\n",
" # Indices for current minibatch in the first stream.\n",
" batch_indices = np.arange(\n",
" batch_index * batch_size, (batch_index + 1) * batch_size)\n",
" batch_indices = [ind % num_examples for ind in batch_indices]\n",
"\n",
" # Indices for current minibatch in the second stream.\n",
" batch_indices_sen = np.arange(\n",
" batch_index * batch_size, (batch_index + 1) * batch_size)\n",
" batch_indices_sen = [protected_group_indices[ind % num_examples_sen]\n",
" for ind in batch_indices_sen]\n",
"\n",
" # Assign features, labels, groups from the minibatches to the respective tensors.\n",
" features_tensor.assign(text_train[batch_indices, :])\n",
" labels_tensor.assign(labels_train[batch_indices])\n",
"\n",
" features_tensor_sen.assign(text_train[batch_indices_sen, :])\n",
" labels_tensor_sen.assign(labels_train[batch_indices_sen])\n",
" groups_tensor_sen.assign(groups_train[batch_indices_sen, :])\n",
"\n",
" # Gradient update.\n",
" optimizer.minimize(problem, var_list=var_list)\n",
" \n",
" # Record and print batch training stats every 10 steps.\n",
" if (batch_index + 1) % 10 == 0 or batch_index in (0, num_steps - 1):\n",
" hinge_loss = problem.objective()\n",
" max_violation = max(problem.constraints())\n",
"\n",
" elapsed_time = time.time() - start_time\n",
" sys.stdout.write(\n",
" \"\\rStep %d / %d: Elapsed time = %ds, Loss = %.3f, Violation = %.3f\" % \n",
" (batch_index + 1, num_steps, elapsed_time, hinge_loss, max_violation))\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DdJfplDpWs97"
},
"source": [
"Having trained the constrained model, we plot various evaluation metrics for the model on the test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jEerPEwLhfTN"
},
"outputs": [],
"source": [
"scores_constrained_test = model_constrained.predict(text_test)\n",
"eval_result_constrained = evaluate_results(\n",
" labels_test, scores_constrained_test, groups_test, group_names)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ustp5z7xQnHI"
},
"source": [
"As before, remember to select false_positive_rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ztK7iM4LjKmT"
},
"outputs": [],
"source": [
"plot_fairness_indicators(eval_result_constrained, \"Constrained\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6P6dxSg5_mTu"
},
"outputs": [],
"source": [
"multi_results = {\n",
" 'constrained':eval_result_constrained,\n",
" 'unconstrained':eval_result_unconstrained,\n",
"}\n",
"plot_multi_fairness_indicators(multi_eval_results=multi_results)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EfKo5O3QWs9-"
},
"source": [
"As we can see from the Fairness Indicators, compared to the unconstrained model, the constrained model yields significantly lower false positive rates for the sexuality-related comments, with only a slight dip in overall accuracy."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Fairness Indicators TFCO Wiki Comments Case Study.ipynb",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.22"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/tutorials/Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "_E4uORykIpG4"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "aBT221yVIujn"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aalPefrUUplk"
},
"source": [
"# Fairness Indicators TensorBoard Plugin Example Colab"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w0zsksbydmNp"
},
"source": [
"In this tutorial, you will learn how to use [Fairness Indicators](https://github.com/tensorflow/fairness-indicators) to evaluate embeddings from [TF Hub](https://www.tensorflow.org/hub). This notebook uses the [Civil Comments dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u33JXdluZ2lG"
},
"source": [
"## Setup\n",
"\n",
"Install the required libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BAUEkqYlzP3W"
},
"outputs": [],
"source": [
"!pip install -q -U pip==20.2\n",
"\n",
"!pip install fairness-indicators \\\n",
" \"absl-py==0.12.0\" \\\n",
" \"pyarrow==10.0.1\" \\\n",
" \"apache-beam==2.50.0\" \\\n",
" \"avro-python3==1.9.1\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e6pe8c6L7kCW"
},
"source": [
"Import other required libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B8dlyTyiTe-9"
},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import apache_beam as beam\n",
"from datetime import datetime\n",
"import tensorflow as tf\n",
"import tensorflow_hub as hub\n",
"import tensorflow_model_analysis as tfma\n",
"from tensorflow_model_analysis.addons.fairness.view import widget_view\n",
"from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators\n",
"from fairness_indicators import example_model\n",
"from fairness_indicators.tutorial_utils import util"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xz4PcI0hSVcq"
},
"source": [
"### Dataset\n",
"\n",
"In this notebook, you work with the [Civil Comments dataset](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), which contains approximately 2 million public comments released by the [Civil Comments platform](https://github.com/reaktivstudios/civil-comments) in 2017 for ongoing research. This effort was sponsored by Jigsaw, which has hosted competitions on Kaggle to help classify toxic comments as well as minimize unintended model bias.\n",
"\n",
"Each individual text comment in the dataset has a toxicity label, with the label being 1 if the comment is toxic and 0 if the comment is non-toxic. Within the data, a subset of comments is labeled with a variety of identity attributes, including categories for gender, sexual orientation, religion, and race or ethnicity."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9ekzb7vVnPCc"
},
"source": [
"### Prepare the data\n",
"\n",
"TensorFlow parses features from data using [`tf.io.FixedLenFeature`](https://www.tensorflow.org/api_docs/python/tf/io/FixedLenFeature) and [`tf.io.VarLenFeature`](https://www.tensorflow.org/api_docs/python/tf/io/VarLenFeature). Map out the input feature, output feature, and all other slicing features of interest."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "n4_nXQDykX6W"
},
"outputs": [],
"source": [
"BASE_DIR = tempfile.gettempdir()\n",
"\n",
"# The input and output features of the classifier\n",
"TEXT_FEATURE = 'comment_text'\n",
"LABEL = 'toxicity'\n",
"\n",
"FEATURE_MAP = {\n",
" # input and output features\n",
" LABEL: tf.io.FixedLenFeature([], tf.float32),\n",
" TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),\n",
"\n",
" # slicing features\n",
" 'sexual_orientation': tf.io.VarLenFeature(tf.string),\n",
" 'gender': tf.io.VarLenFeature(tf.string),\n",
" 'religion': tf.io.VarLenFeature(tf.string),\n",
" 'race': tf.io.VarLenFeature(tf.string),\n",
" 'disability': tf.io.VarLenFeature(tf.string)\n",
"}\n",
"\n",
"IDENTITY_TERMS = ['gender', 'sexual_orientation', 'race', 'religion', 'disability']"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CeUtnaT49Doq"
},
"source": [
"By default, the notebook downloads a preprocessed version of this dataset, but\n",
"you may use the original dataset and re-run the processing steps if\n",
"desired.\n",
"\n",
"In the original dataset, each comment is labeled with the percentage\n",
"of raters who believed that a comment corresponds to a particular\n",
"identity. For example, a comment might be labeled with the following:\n",
"`{ male: 0.3, female: 1.0, transgender: 0.0, heterosexual: 0.8,\n",
"homosexual_gay_or_lesbian: 1.0 }`.\n",
"\n",
"The processing step groups identities by category (gender,\n",
"sexual_orientation, etc.) and removes identities with a score less\n",
"than 0.5. So the example above would be converted to the following:\n",
"`{ gender: [female], sexual_orientation: [heterosexual,\n",
"homosexual_gay_or_lesbian] }`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FHxa31VX9eP2"
},
"source": [
"Download the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NUmSmqYGS0n8"
},
"outputs": [],
"source": [
"download_original_data = False #@param {type:\"boolean\"}\n",
"\n",
"if download_original_data:\n",
" train_tf_file = tf.keras.utils.get_file('train_tf.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/train_tf.tfrecord')\n",
" validate_tf_file = tf.keras.utils.get_file('validate_tf.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/validate_tf.tfrecord')\n",
"\n",
" # The identity terms list will be grouped together by their categories\n",
" # (see 'IDENTITY_COLUMNS') on threshold 0.5. Only the identity term column,\n",
" # text column and label column will be kept after processing.\n",
" train_tf_file = util.convert_comments_data(train_tf_file)\n",
" validate_tf_file = util.convert_comments_data(validate_tf_file)\n",
"\n",
"else:\n",
" train_tf_file = tf.keras.utils.get_file('train_tf_processed.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/train_tf_processed.tfrecord')\n",
" validate_tf_file = tf.keras.utils.get_file('validate_tf_processed.tfrecord',\n",
" 'https://storage.googleapis.com/civil_comments_dataset/validate_tf_processed.tfrecord')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zz1NLR5Uu3oQ"
},
"source": [
"## Create a TensorFlow Model Analysis Pipeline\n",
"\n",
"The Fairness Indicators library operates on [TensorFlow Model Analysis (TFMA) models](https://tensorflow.github.io/model-analysis/get_started). TFMA models wrap TensorFlow models with additional functionality to evaluate and visualize their results. The actual evaluation occurs inside of an [Apache Beam pipeline](https://beam.apache.org/documentation/programming-guide/).\n",
"\n",
"The steps you follow to create a TFMA pipeline are:\n",
"1. Build a TensorFlow model\n",
"2. Build a TFMA model on top of the TensorFlow model\n",
"3. Run the model analysis in an orchestrator. The example model in this notebook uses Apache Beam as the orchestrator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7nSvu4IUCigW"
},
"outputs": [],
"source": [
"def embedding_fairness_result(embedding, identity_term='gender'):\n",
" \n",
" model_dir = os.path.join(BASE_DIR, 'train',\n",
" datetime.now().strftime('%Y%m%d-%H%M%S'))\n",
"\n",
" print(\"Training classifier for \" + embedding)\n",
" classifier = example_model.train_model(model_dir,\n",
" train_tf_file,\n",
" LABEL,\n",
" TEXT_FEATURE,\n",
" FEATURE_MAP,\n",
" embedding)\n",
"\n",
" # Create a unique path to store the results for this embedding.\n",
" embedding_name = embedding.split('/')[-2]\n",
" eval_result_path = os.path.join(BASE_DIR, 'eval_result', embedding_name)\n",
"\n",
" example_model.evaluate_model(classifier,\n",
" validate_tf_file,\n",
" eval_result_path,\n",
" identity_term,\n",
" LABEL,\n",
" FEATURE_MAP)\n",
" return tfma.load_eval_result(output_path=eval_result_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jTPqije9Eg5b"
},
"source": [
"## Run TFMA & Fairness Indicators"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8AvInTNt8Gyn"
},
"source": [
"### Fairness Indicators Metrics\n",
"\n",
"Some of the metrics available with Fairness Indicators are:\n",
"\n",
"* [Negative Rate, False Negative Rate (FNR), and True Negative Rate (TNR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_and_false_negative_rates)\n",
"* [Positive Rate, False Positive Rate (FPR), and True Positive Rate (TPR)](https://en.wikipedia.org/wiki/False_positives_and_false_negatives#False_positive_and_false_negative_rates)\n",
"* [Accuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/Accuracy)\n",
"* [Precision and Recall](https://en.wikipedia.org/wiki/Precision_and_recall)\n",
"* [Precision-Recall AUC](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/AUC)\n",
"* [ROC AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "LGXCFtScblYt"
},
"source": [
"### Text Embeddings"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1CI-1M5qXGjG"
},
"source": [
"**[TF-Hub](https://www.tensorflow.org/hub)** provides several **text embeddings**. These embeddings will serve as the feature column for the different models. This tutorial uses the following embeddings:\n",
"\n",
"* [**random-nnlm-en-dim128**](https://tfhub.dev/google/random-nnlm-en-dim128/1): random text embeddings; this serves as a convenient baseline.\n",
"* [**nnlm-en-dim128**](https://tfhub.dev/google/nnlm-en-dim128/1): a text embedding based on [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf).\n",
"* [**universal-sentence-encoder**](https://tfhub.dev/google/universal-sentence-encoder/2): a text embedding based on [Universal Sentence Encoder](https://arxiv.org/pdf/1803.11175.pdf)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xxq97Qt7itVL"
},
"source": [
"## Fairness Indicator Results"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "27FX15awixuK"
},
"source": [
"Compute fairness indicators with the `embedding_fairness_result` pipeline, and then render the results in the Fairness Indicator UI widget with `widget_view.render_fairness_indicator` for all the above embeddings.\n",
"\n",
"Note: You may need to run the `widget_view.render_fairness_indicator` cells twice for the visualization to be displayed."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "yEUbZ93y8NCW"
},
"source": [
"#### Random NNLM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DkSuox-Pb6Pz"
},
"outputs": [],
"source": [
"eval_result_random_nnlm = embedding_fairness_result('https://tfhub.dev/google/random-nnlm-en-dim128/1')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "05xUesz6VpAe"
},
"outputs": [],
"source": [
"widget_view.render_fairness_indicator(eval_result=eval_result_random_nnlm)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jmKe8Z1b8SBy"
},
"source": [
"#### NNLM"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5b8HcTUBckj1"
},
"outputs": [],
"source": [
"eval_result_nnlm = embedding_fairness_result('https://tfhub.dev/google/nnlm-en-dim128/1')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "n6hasLzFVrDN"
},
"outputs": [],
"source": [
"widget_view.render_fairness_indicator(eval_result=eval_result_nnlm)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1I4xEDNq8T0X"
},
"source": [
"#### Universal Sentence Encoder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GrdweWRkck8A"
},
"outputs": [],
"source": [
"eval_result_use = embedding_fairness_result('https://tfhub.dev/google/universal-sentence-encoder/2')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JBABAkZMVtTK"
},
"outputs": [],
"source": [
"widget_view.render_fairness_indicator(eval_result=eval_result_use)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "402oTKbap77R"
},
"source": [
"### Comparing Embeddings"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UgnqwNjpqBuv"
},
"source": [
"You can also use Fairness Indicators to compare embeddings directly. For example, compare the models generated from the NNLM and USE embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "49ECfYWUp7Kk"
},
"outputs": [],
"source": [
"widget_view.render_fairness_indicator(multi_eval_results={'nnlm': eval_result_nnlm, 'use': eval_result_use})"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Fairness Indicators on TF-Hub Text Embeddings",
"private_outputs": true,
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.22"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
================================================
FILE: docs/tutorials/README.md
================================================
The demos listed here are designed to be used with [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb), a free cloud-based environment for Jupyter notebooks. They can also be run in a local Jupyter environment.
## Google Colaboratory
To run these demos on the cloud, go to `File` -> `Open notebook` in the Colaboratory toolbar, then click on `Github` and paste in the demo's URL. Alternatively, you can use the [Open in Colab](https://chrome.google.com/webstore/detail/open-in-colab/iogfkhleblhcpcekbiedikdehleodpjo?hl=en) Chrome extension to open a notebook directly from GitHub.
## Local Jupyter Environment
To run these demos on your local machine, you will need to install [Jupyter](https://jupyter.org/install). Then, run the following commands:

```shell
jupyter nbextension enable --py widgetsnbextension --sys-prefix
jupyter nbextension install --py --symlink tensorflow_model_analysis --sys-prefix
jupyter nbextension enable --py tensorflow_model_analysis --sys-prefix
```

Afterwards, you can download any of the `.ipynb` files in this directory and run them via `jupyter notebook`.
================================================
FILE: docs/tutorials/_Deprecated_Fairness_Indicators_Lineage_Case_Study.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "ueCj9KW2QTCP"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wFk_qMvcQZ8S"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HKYXncPn7mSs"
},
"source": [
"# Fairness Indicators Lineage Case Study"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d7A099z02DB6"
},
"source": [
"\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Lineage_Case_Study\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/fairness-indicators/blob/master/g3doc/tutorials/Fairness_Indicators_Lineage_Case_Study.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/fairness-indicators/tree/master/docs/tutorials/Fairness_Indicators_Lineage_Case_Study.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/fairness-indicators/g3doc/tutorials/Fairness_Indicators_Lineage_Case_Study.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca href=\"https://tfhub.dev/google/random-nnlm-en-dim128/1\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/hub_logo_32px.png\" /\u003eSee TF Hub model\u003c/a\u003e\n",
" \u003c/td\u003e\n",
"\u003c/table\u003e"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lOKe4l_TSoKy"
},
"source": [
"\u003e Warning: Estimators are deprecated (not recommended for new code). Estimators run `v1.Session`-style code which is more difficult to write correctly, and can behave unexpectedly, especially when combined with TF 2 code. Estimators do fall under our [compatibility guarantees](https://tensorflow.org/guide/versions), but will receive no fixes other than security vulnerabilities. See the [migration guide](https://tensorflow.org/guide/migrate) for details.\n",
"\n",
"\u003c!--\n",
"TODO(b/192933099): update this to use keras instead of estimators.\n",
"--\u003e"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oZWUeUxjlMjQ"
},
"source": [
"## COMPAS Dataset\n",
"[COMPAS](https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis) (Correctional Offender Management Profiling for Alternative Sanctions) is a public dataset that contains approximately 18,000 criminal cases from Broward County, Florida, between January 2013 and December 2014. The data contains information about 11,000 unique defendants, including criminal history, demographics, and a risk score intended to represent the defendant’s likelihood of reoffending (recidivism). A machine learning model trained on this data has been used by judges and parole officers to determine whether or not to set bail and whether or not to grant parole.\n",
"\n",
"In 2016, [an article published in ProPublica](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) found that the COMPAS model was incorrectly predicting that African-American defendants would recidivate at much higher rates than their white counterparts. For Caucasian defendants, the model made mistakes in the opposite direction, incorrectly predicting that they wouldn’t commit another crime. The authors went on to show that these biases were likely due to an uneven distribution in the data between African-American and Caucasian defendants. Specifically, the ground truth labels of negative examples (a defendant **would not** commit another crime) and positive examples (a defendant **would** commit another crime) were disproportionate between the two races. Since 2016, the COMPAS dataset has appeared frequently in the ML fairness literature \u003csup\u003e1, 2, 3\u003c/sup\u003e, with researchers using it to demonstrate techniques for identifying and remediating fairness concerns. This [tutorial from the FAT* 2018 conference](https://youtu.be/hEThGT-_5ho?t=1) illustrates how COMPAS can dramatically impact a defendant’s prospects in the real world.\n",
"\n",
"It is important to note that developing a machine learning model to predict pre-trial detention raises a number of important ethical considerations. You can learn more about these issues in the Partnership on AI “[Report on Algorithmic Risk Assessment Tools in the U.S. Criminal Justice System](https://www.partnershiponai.org/report-on-machine-learning-in-risk-assessment-tools-in-the-u-s-criminal-justice-system/).” The Partnership on AI is a multi-stakeholder organization, of which Google is a member, that creates guidelines around AI.\n",
"\n",
"We’re using the COMPAS dataset only as an example of how to identify and remediate fairness concerns in data. This dataset is canonical in the algorithmic fairness literature. \n",
"\n",
"## About the Tools in this Case Study\n",
"* **[TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx)** is a Google-production-scale machine learning platform based on TensorFlow. It provides a configuration framework and shared libraries to integrate common components needed to define, launch, and monitor your machine learning system.\n",
"\n",
"* **[TensorFlow Model Analysis](https://www.tensorflow.org/tfx/tutorials/model_analysis/tfma_basic)** is a library for evaluating machine learning models. Users can evaluate their models on a large amount of data in a distributed manner and view metrics over different slices within a notebook.\n",
"\n",
"* **[Fairness Indicators](https://tensorflow.github.io/fairness-indicators)** is a suite of tools built on top of TensorFlow Model Analysis that enables regular evaluation of fairness metrics in product pipelines.\n",
"\n",
"* **[ML Metadata](https://www.tensorflow.org/tfx/guide/mlmd)** is a library for recording and retrieving the lineage and metadata of ML artifacts such as models, datasets, and metrics. Within TFX, ML Metadata will help us understand the artifacts created in a pipeline; an artifact is a unit of data that is passed between TFX components.\n",
"\n",
"* **[TensorFlow Data Validation](https://www.tensorflow.org/tfx/guide/tfdv)** is a library to analyze your data and check for errors that can affect model training or serving.\n",
"\n",
"\n",
"## Case Study Overview\n",
"\n",
"For the duration of this case study, we will define “fairness concerns” as a bias within a model that negatively impacts a slice within our data. Specifically, we’re trying to limit any recidivism prediction that could be biased with respect to race.\n",
"\n",
"The walkthrough of the case study will proceed as follows:\n",
"\n",
"1. Download the data, preprocess, and explore the initial dataset.\n",
"2. Build a TFX pipeline with the COMPAS dataset using a Keras binary classifier.\n",
"3. Run our results through TensorFlow Model Analysis, TensorFlow Data Validation, and load Fairness Indicators to explore any potential fairness concerns within our model.\n",
"4. Use ML Metadata to track all the artifacts for a model that we trained with TFX.\n",
"5. Weight the initial COMPAS dataset for our second model to account for the uneven distribution between recidivism and race.\n",
"6. Review the performance changes within the new dataset.\n",
"7. Check the underlying changes within our TFX pipeline with ML Metadata to understand what changes were made between the two models. \n",
"\n",
"## Helpful Resources\n",
"This case study is an extension of the case studies below. We recommend working through them first.\n",
"* [TFX Pipeline Overview](https://github.com/tensorflow/workshops/blob/master/tfx_labs/Lab_1_Pipeline_in_Colab.ipynb)\n",
"* [Fairness Indicator Case Study](https://github.com/tensorflow/fairness-indicators/blob/master/docs/tutorials/Fairness_Indicators_Example_Colab.ipynb)\n",
"* [TFX Data Validation](https://github.com/tensorflow/tfx/blob/master/tfx/examples/airflow_workshop/notebooks/step3.ipynb)\n",
"\n",
"\n",
"## Setup\n",
"To start, we will install the necessary packages, download the data, and import the required modules for the case study.\n",
"\n",
"To install the required packages for this case study in your notebook, run the pip command below.\n",
"\n",
"**Note:** See [here](https://github.com/tensorflow/tfx#compatible-versions) for a reference on compatibility between different versions of the libraries used in this case study.\n",
"\n",
"___\n",
"\n",
"1. Wadsworth, C., Vera, F., Piech, C. (2018). Achieving Fairness Through Adversarial Learning: an Application to Recidivism Prediction. https://arxiv.org/abs/1807.00199.\n",
"\n",
"2. Chouldechova, A., G’Sell, M. (2017). Fairer and more accurate, but for whom? https://arxiv.org/abs/1707.00046.\n",
"\n",
"3. Berk, R., et al. (2017). Fairness in Criminal Justice Risk Assessments: The State of the Art. https://arxiv.org/abs/1703.09207.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"id": "42BmC-ctlMjR"
},
"outputs": [],
"source": [
"!python -m pip install -q -U \\\n",
" tfx \\\n",
" tensorflow-model-analysis \\\n",
" tensorflow-data-validation \\\n",
" tensorflow-metadata \\\n",
" tensorflow-transform \\\n",
" ml-metadata \\\n",
" tfx-bsl"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yeS4Xy2MlMjW",
"scrolled": true
},
"outputs": [],
"source": [
"import os\n",
"import tempfile\n",
"import six.moves.urllib as urllib\n",
"\n",
"from ml_metadata.metadata_store import metadata_store\n",
"from ml_metadata.proto import metadata_store_pb2\n",
"\n",
"import pandas as pd\n",
"from google.protobuf import text_format\n",
"from sklearn.utils import shuffle\n",
"import tensorflow as tf\n",
"import tensorflow_data_validation as tfdv\n",
"\n",
"import tensorflow_model_analysis as tfma\n",
"from tensorflow_model_analysis.addons.fairness.post_export_metrics import fairness_indicators\n",
"from tensorflow_model_analysis.addons.fairness.view import widget_view\n",
"\n",
"import tfx\n",
"from tfx.components.evaluator.component import Evaluator\n",
"from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen\n",
"from tfx.components.schema_gen.component import SchemaGen\n",
"from tfx.components.statistics_gen.component import StatisticsGen\n",
"from tfx.components.trainer.component import Trainer\n",
"from tfx.components.transform.component import Transform\n",
"from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext\n",
"from tfx.proto import evaluator_pb2\n",
"from tfx.proto import trainer_pb2"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YZQLS05WlMjV"
},
"source": [
"## Download and preprocess the dataset\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7uOVs7WJlMjl"
},
"outputs": [],
"source": [
"# Download the COMPAS dataset and setup the required filepaths.\n",
"_DATA_ROOT = tempfile.mkdtemp(prefix='tfx-data')\n",
"_DATA_PATH = 'https://storage.googleapis.com/compas_dataset/cox-violent-parsed.csv'\n",
"_DATA_FILEPATH = os.path.join(_DATA_ROOT, 'compas-scores-two-years.csv')\n",
"\n",
"data = urllib.request.urlopen(_DATA_PATH)\n",
"_COMPAS_DF = pd.read_csv(data)\n",
"\n",
"# To simplify the case study, we will only use the columns that will be used for\n",
"# our model.\n",
"_COLUMN_NAMES = [\n",
" 'age',\n",
" 'c_charge_desc',\n",
" 'c_charge_degree',\n",
" 'c_days_from_compas',\n",
" 'is_recid',\n",
" 'juv_fel_count',\n",
" 'juv_misd_count',\n",
" 'juv_other_count',\n",
" 'priors_count',\n",
" 'r_days_from_arrest',\n",
" 'race',\n",
" 'sex',\n",
" 'vr_charge_desc',\n",
"]\n",
"_COMPAS_DF = _COMPAS_DF[_COLUMN_NAMES]\n",
"\n",
"# We will use 'is_recid' as our ground truth label, which is a boolean value\n",
"# indicating if a defendant committed another crime. There are some rows with -1\n",
"# indicating that there is no data. These rows we will drop from training.\n",
"_COMPAS_DF = _COMPAS_DF[_COMPAS_DF['is_recid'] != -1]\n",
"\n",
"# Given the distribution between races in this dataset, we will only focus on\n",
"# recidivism for African-Americans and Caucasians.\n",
"_COMPAS_DF = _COMPAS_DF[\n",
" _COMPAS_DF['race'].isin(['African-American', 'Caucasian'])].copy()\n",
"\n",
"# Add a weight feature that will be used during the second part of this\n",
"# case study to help improve fairness concerns.\n",
"_COMPAS_DF['sample_weight'] = 0.8\n",
"\n",
"# Write the DataFrame back to a CSV file for our TFX model.\n",
"_COMPAS_DF.to_csv(_DATA_FILEPATH, index=False, na_rep='')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JyCQbe5RlMjn"
},
"source": [
"## Building a TFX Pipeline\n",
"\n",
"---\n",
"There are several [TFX Pipeline Components](https://www.tensorflow.org/tfx/guide#tfx_pipeline_components) that can be used for a production model, but for the purpose of this case study we will use only the following components:\n",
"* **ExampleGen** to read our dataset.\n",
"* **StatisticsGen** to calculate the statistics of our dataset.\n",
"* **SchemaGen** to create a data schema.\n",
"* **Transform** for feature engineering.\n",
"* **Trainer** to run our machine learning model.\n",
"\n",
"## Create the InteractiveContext\n",
"\n",
"To run TFX within a notebook, we first need to create an `InteractiveContext` to run the components interactively.\n",
"\n",
"`InteractiveContext` will use a temporary directory with an ephemeral ML Metadata database instance. To use your own pipeline root or database, the optional properties `pipeline_root` and `metadata_connection_config` may be passed to `InteractiveContext`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XVMS3Dz7xk8M"
},
"outputs": [],
"source": [
"context = InteractiveContext()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NxAOGNCelMjq"
},
"source": [
"### TFX ExampleGen Component\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0hzCIDdblMjr"
},
"outputs": [],
"source": [
"# The ExampleGen TFX Pipeline component ingests data into TFX pipelines.\n",
"# It consumes external files/services to generate Examples which will be read by\n",
"# other TFX components. It also provides consistent and configurable\n",
"# partitioning, and shuffles the dataset for ML best practice.\n",
"\n",
"example_gen = CsvExampleGen(input_base=_DATA_ROOT)\n",
"context.run(example_gen)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SW23fvThlMjz"
},
"source": [
"### TFX StatisticsGen Component\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "28D_qP3IlMj0",
"scrolled": false
},
"outputs": [],
"source": [
"# The StatisticsGen TFX pipeline component generates feature statistics over\n",
"# both training and serving data, which can be used by other pipeline\n",
"# components. StatisticsGen uses Beam to scale to large datasets.\n",
"\n",
"statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])\n",
"context.run(statistics_gen)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a72E7hT5lMj9"
},
"source": [
"### TFX SchemaGen Component"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dkfTgKCBlMj9"
},
"outputs": [],
"source": [
"# Some TFX components use a description of your input data called a schema. The\n",
"# schema is an instance of schema.proto. It can specify data types for feature\n",
"# values, whether a feature has to be present in all examples, allowed value\n",
"# ranges, and other properties. A SchemaGen pipeline component will\n",
"# automatically generate a schema by inferring types, categories, and ranges\n",
"# from the training data.\n",
"\n",
"infer_schema = SchemaGen(statistics=statistics_gen.outputs['statistics'])\n",
"context.run(infer_schema)"
]
},
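{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is what inferring types, categories, and ranges can mean in practice. The plain-Python sketch below builds a simplified description of each feature from a few toy rows. It records a type, the observed value domain, and whether the feature appears in every example. `infer_simple_schema` is a hypothetical helper, not a TFX or TensorFlow Data Validation API. The real SchemaGen works from the statistics computed by StatisticsGen and emits a `schema.proto`.\n",
"\n",
"```python\n",
"def infer_simple_schema(rows):\n",
"  # rows: list of dicts mapping feature name -> value (None means missing).\n",
"  schema = {}\n",
"  features = sorted({key for row in rows for key in row})\n",
"  for feature in features:\n",
"    present = [row.get(feature) for row in rows if row.get(feature) is not None]\n",
"    if all(isinstance(v, int) for v in present):\n",
"      # Integer feature: record the observed (min, max) range.\n",
"      kind = 'INT'\n",
"      domain = (min(present), max(present)) if present else None\n",
"    else:\n",
"      # Anything else is treated as a categorical string feature.\n",
"      kind = 'BYTES'\n",
"      domain = sorted({str(v) for v in present})\n",
"    schema[feature] = {\n",
"        'type': kind,\n",
"        'domain': domain,\n",
"        'required': len(present) == len(rows),\n",
"    }\n",
"  return schema\n",
"\n",
"\n",
"rows = [\n",
"    {'age': 25, 'race': 'Caucasian', 'priors_count': 0},\n",
"    {'age': 41, 'race': 'African-American', 'priors_count': 3},\n",
"    {'age': 33, 'race': 'Hispanic', 'priors_count': None},\n",
"]\n",
"for name, spec in infer_simple_schema(rows).items():\n",
"  print(name, spec)\n",
"```"
]
},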
{
"cell_type": "markdown",
"metadata": {
"id": "43z_COkolMkI"
},
"source": [
"### TFX Transform Component\n",
"\n",
"The `Transform` component performs data transformations and feature engineering. The results include an input TensorFlow graph which is used during both training and serving to preprocess the data before training or inference. This graph becomes part of the SavedModel that is the result of model training. Since the same input graph is used for both training and serving, the preprocessing will always be the same, and only needs to be written once.\n",
"\n",
"The Transform component requires more code than many other components because of the arbitrary complexity of the feature engineering that you may need for the data and/or model that you're working with.\n",
"\n",
"Define some constants and functions for both the `Transform` component and the `Trainer` component. Define them in a Python module, in this case saved to disk using the `%%writefile` magic command since you are working in a notebook.\n",
"\n",
"The transformations that we will perform in this case study are as follows:\n",
"* For string values, we will generate a vocabulary that maps each value to an integer via `tft.compute_and_apply_vocabulary`.\n",
"* For integer values, we will standardize each column to mean 0 and variance 1 via `tft.scale_to_z_score`.\n",
"* Fill in missing values with an empty string or 0, depending on the feature type.\n",
"* Append `_xf` to column names to denote the features that were processed in the Transform component.\n",
"\n",
"Now let's define a module containing the `preprocessing_fn()` function that we will pass to the `Transform` component:"
]
},
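{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plain-Python sketch below approximates what the two `tft` transformations compute over a full column.\n",
"\n",
"* `scale_to_z_score` subtracts the column mean and divides by the (population) standard deviation.\n",
"* `compute_and_apply_vocabulary` maps each distinct string to an integer id, with the most frequent values first.\n",
"\n",
"This is an illustration only. The real `tft` analyzers make a full pass over the dataset using Beam. Two behaviors here are choices made for this sketch, not claims about `tft`: breaking frequency ties alphabetically, and simply centering a constant column.\n",
"\n",
"```python\n",
"import math\n",
"from collections import Counter\n",
"\n",
"\n",
"def scale_to_z_score(values):\n",
"  # Standardize a numeric column to mean 0 and variance 1 (population variance).\n",
"  n = len(values)\n",
"  mean = sum(values) / n\n",
"  var = sum((v - mean) ** 2 for v in values) / n\n",
"  if var == 0:\n",
"    # A constant column cannot be scaled; this sketch just centers it.\n",
"    return [v - mean for v in values]\n",
"  std = math.sqrt(var)\n",
"  return [(v - mean) / std for v in values]\n",
"\n",
"\n",
"def compute_and_apply_vocabulary(values):\n",
"  # Assign ids by descending frequency; ties are broken alphabetically.\n",
"  counts = Counter(values)\n",
"  ordered = sorted(counts, key=lambda token: (-counts[token], token))\n",
"  vocab = {token: idx for idx, token in enumerate(ordered)}\n",
"  return [vocab[v] for v in values], vocab\n",
"\n",
"\n",
"ages = [20, 30, 40, 50]\n",
"print(scale_to_z_score(ages))\n",
"ids, vocab = compute_and_apply_vocabulary(['M', 'F', 'M', 'M'])\n",
"print(ids, vocab)\n",
"```"
]
},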
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "83MZZqUQlMkJ"
},
"outputs": [],
"source": [
"# Set up paths for the Transform component.\n",
"_transform_module_file = 'compas_transform.py'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NLzxWiOBlMkL"
},
"outputs": [],
"source": [
"%%writefile {_transform_module_file}\n",
"import tensorflow as tf\n",
"import tensorflow_transform as tft\n",
"\n",
"CATEGORICAL_FEATURE_KEYS = [\n",
" 'sex',\n",
" 'race',\n",
" 'c_charge_desc',\n",
" 'c_charge_degree',\n",
"]\n",
"\n",
"INT_FEATURE_KEYS = [\n",
" 'age',\n",
" 'c_days_from_compas',\n",
" 'juv_fel_count',\n",
" 'juv_misd_count',\n",
" 'juv_other_count',\n",
" 'priors_count',\n",
" 'sample_weight',\n",
"]\n",
"\n",
"LABEL_KEY = 'is_recid'\n",
"\n",
"# Number of unique values for each feature in CATEGORICAL_FEATURE_KEYS.\n",
"MAX_CATEGORICAL_FEATURE_VALUES = [\n",
" 2,\n",
" 6,\n",
" 513,\n",
" 14,\n",
"]\n",
"\n",
"\n",
"def transformed_name(key):\n",
" return '{}_xf'.format(key)\n",
"\n",
"\n",
"def preprocessing_fn(inputs):\n",
" \"\"\"tf.transform's callback function for preprocessing inputs.\n",
"\n",
" Args:\n",
" inputs: Map from feature keys to raw features.\n",
"\n",
" Returns:\n",
" Map from string feature key to transformed feature operations.\n",
" \"\"\"\n",
" outputs = {}\n",
" for key in CATEGORICAL_FEATURE_KEYS:\n",
" outputs[transformed_name(key)] = tft.compute_and_apply_vocabulary(\n",
" _fill_in_missing(inputs[key]),\n",
" vocab_filename=key)\n",
"\n",
" for key in INT_FEATURE_KEYS:\n",
" outputs[transformed_name(key)] = tft.scale_to_z_score(\n",
" _fill_in_missing(inputs[key]))\n",
"\n",
"  # The target label indicates whether the defendant is charged with another crime.\n",
" outputs[transformed_name(LABEL_KEY)] = _fill_in_missing(inputs[LABEL_KEY])\n",
" return outputs\n",
"\n",
"\n",
"def _fill_in_missing(tensor_value):\n",
"  \"\"\"Replaces missing values in a SparseTensor.\n",
"\n",
" Fills in missing values of `tensor_value` with '' or 0, and converts to a\n",
" dense tensor.\n",
"\n",
" Args:\n",
" tensor_value: A `SparseTensor` of rank 2. Its dense shape should have size\n",
" at most 1 in the second dimension.\n",
"\n",
" Returns:\n",
" A rank 1 tensor where missing values of `tensor_value` are filled in.\n",
" \"\"\"\n",
" if not isinstance(tensor_value, tf.sparse.SparseTensor):\n",
" return tensor_value\n",
" default_value = '' if tensor_value.dtype == tf.string else 0\n",
" sparse_tensor = tf.SparseTensor(\n",
" tensor_value.indices,\n",
" tensor_value.values,\n",
" [tensor_value.dense_shape[0], 1])\n",
" dense_tensor = tf.sparse.to_dense(sparse_tensor, default_value)\n",
" return tf.squeeze(dense_tensor, axis=1)\n"
]
},
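{
"cell_type": "markdown",
"metadata": {},
"source": [
"`_fill_in_missing` turns a rank-2 `SparseTensor` with at most one value per row into a dense rank-1 tensor. Rows with no value get `''` or `0`. The plain-Python sketch below mirrors that logic so the behavior can be checked without TensorFlow. It takes a hypothetical (indices, values, num_rows) triple as input. It is an illustration, not a replacement for the TensorFlow version above.\n",
"\n",
"```python\n",
"def fill_in_missing(indices, values, num_rows, is_string):\n",
"  # indices: list of (row, col) pairs, as in a rank-2 SparseTensor.\n",
"  # values: the value stored at each index, in the same order.\n",
"  default_value = '' if is_string else 0\n",
"  dense = [default_value] * num_rows\n",
"  for (row, col), value in zip(indices, values):\n",
"    # With a dense shape of [num_rows, 1], only column 0 holds values.\n",
"    if col == 0:\n",
"      dense[row] = value\n",
"  return dense\n",
"\n",
"\n",
"# Rows 1 and 3 have no value for this feature.\n",
"print(fill_in_missing([(0, 0), (2, 0)], ['F', 'M'], 4, is_string=True))\n",
"print(fill_in_missing([(0, 0), (3, 0)], [7, 2], 4, is_string=False))\n",
"```"
]
},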
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "5yzFOQrPlMkM"
},
"outputs": [],
"source": [
"# Build and run the Transform Component.\n",
"transform = Transform(\n",
" examples=example_gen.outputs['examples'],\n",
" schema=infer_schema.outputs['schema'],\n",
" module_file=_transform_module_file\n",
")\n",
"context.run(transform)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A_ubj158lMkP"
},
"source": [
"### TFX Trainer Component\n",
"The `Trainer` component trains a specified TensorFlow model.\n",
"\n",
"To run the `Trainer` component, we need to create a Python module containing a function called `trainer_fn` that TFX will call. This function returns an estimator for our model together with the training and evaluation specs.\n",
"\n",
"For our case study, we will build a Keras model and convert it to an estimator using [`tf.keras.estimator.model_to_estimator()`](https://www.tensorflow.org/api_docs/python/tf/keras/estimator/model_to_estimator)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "K9zxx6CnlMkQ"
},
"outputs": [],
"source": [
"# Set up paths for the Trainer component.\n",
"_trainer_module_file = 'compas_trainer.py'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KhuwfYIRlMkR",
"scrolled": true
},
"outputs": [],
"source": [
"%%writefile {_trainer_module_file}\n",
"import tensorflow as tf\n",
"\n",
"import tensorflow_model_analysis as tfma\n",
"import tensorflow_transform as tft\n",
"from tensorflow_transform.tf_metadata import schema_utils\n",
"\n",
"from compas_transform import *\n",
"\n",
"_BATCH_SIZE = 1000\n",
"_LEARNING_RATE = 0.00001\n",
"_MAX_CHECKPOINTS = 1\n",
"_SAVE_CHECKPOINT_STEPS = 999\n",
"\n",
"\n",
"def transformed_names(keys):\n",
" return [transformed_name(key) for key in keys]\n",
"\n",
"\n",
"def transformed_name(key):\n",
" return '{}_xf'.format(key)\n",
"\n",
"\n",
"def _gzip_reader_fn(filenames):\n",
" \"\"\"Returns a record reader that can read gzip'ed files.\n",
"\n",
" Args:\n",
" filenames: A tf.string tensor or tf.data.Dataset containing one or more\n",
" filenames.\n",
"\n",
"  Returns:\n",
"    A `tf.data.TFRecordDataset` that reads the GZIP-compressed files.\n",
" \"\"\"\n",
" return tf.data.TFRecordDataset(filenames, compression_type='GZIP')\n",
"\n",
"\n",
"# Tf.Transform considers these features as \"raw\".\n",
"def _get_raw_feature_spec(schema):\n",
" \"\"\"Generates a feature spec from a Schema proto.\n",
"\n",
" Args:\n",
" schema: A Schema proto.\n",
"\n",
" Returns:\n",
" A feature spec defined as a dict whose keys are feature names and values are\n",
" instances of FixedLenFeature, VarLenFeature or SparseFeature.\n",
" \"\"\"\n",
" return schema_utils.schema_as_feature_spec(schema).feature_spec\n",
"\n",
"\n",
"def _example_serving_receiver_fn(tf_transform_output, schema):\n",
"  \"\"\"Builds the serving inputs.\n",
"\n",
" Args:\n",
" tf_transform_output: A TFTransformOutput.\n",
" schema: the schema of the input data.\n",
"\n",
" Returns:\n",
" TensorFlow graph which parses examples, applying tf-transform to them.\n",
" \"\"\"\n",
" raw_feature_spec = _get_raw_feature_spec(schema)\n",
" raw_feature_spec.pop(LABEL_KEY)\n",
"\n",
" raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(\n",
" raw_feature_spec)\n",
" serving_input_receiver = raw_input_fn()\n",
"\n",
" transformed_features = tf_transform_output.transform_raw_features(\n",
" serving_input_receiver.features)\n",
" transformed_features.pop(transformed_name(LABEL_KEY))\n",
" return tf.estimator.export.ServingInputReceiver(\n",
" transformed_features, serving_input_receiver.receiver_tensors)\n",
"\n",
"\n",
"def _eval_input_receiver_fn(tf_transform_output, schema):\n",
" \"\"\"Builds everything needed for the tf-model-analysis to run the model.\n",
"\n",
" Args:\n",
" tf_transform_output: A TFTransformOutput.\n",
" schema: the schema of the input data.\n",
"\n",
" Returns:\n",
" EvalInputReceiver function, which contains:\n",
" - TensorFlow graph which parses raw untransformed features, applies the\n",
" tf-transform preprocessing operators.\n",
" - Set of raw, untransformed features.\n",
" - Label against which predictions will be compared.\n",
" \"\"\"\n",
" # Notice that the inputs are raw features, not transformed features here.\n",
" raw_feature_spec = _get_raw_feature_spec(schema)\n",
"\n",
" serialized_tf_example = tf.compat.v1.placeholder(\n",
" dtype=tf.string, shape=[None], name='input_example_tensor')\n",
"\n",
" # Add a parse_example operator to the tensorflow graph, which will parse\n",
" # raw, untransformed, tf examples.\n",
" features = tf.io.parse_example(\n",
" serialized=serialized_tf_example, features=raw_feature_spec)\n",
"\n",
" transformed_features = tf_transform_output.transform_raw_features(features)\n",
" labels = transformed_features.pop(transformed_name(LABEL_KEY))\n",
"\n",
" receiver_tensors = {'examples': serialized_tf_example}\n",
"\n",
" return tfma.export.EvalInputReceiver(\n",
" features=transformed_features,\n",
" receiver_tensors=receiver_tensors,\n",
" labels=labels)\n",
"\n",
"\n",
"def _input_fn(filenames, tf_transform_output, batch_size=200):\n",
" \"\"\"Generates features and labels for training or evaluation.\n",
"\n",
" Args:\n",
"    filenames: List of GZIP-compressed TFRecord files to read data from.\n",
" tf_transform_output: A TFTransformOutput.\n",
" batch_size: First dimension size of the Tensors returned by input_fn.\n",
"\n",
" Returns:\n",
"    A (features, labels) tuple where features is a dictionary of\n",
"    Tensors, and labels is a single Tensor of labels.\n",
" \"\"\"\n",
" transformed_feature_spec = (\n",
" tf_transform_output.transformed_feature_spec().copy())\n",
"\n",
" dataset = tf.compat.v1.data.experimental.make_batched_features_dataset(\n",
" filenames,\n",
" batch_size,\n",
" transformed_feature_spec,\n",
" shuffle=False,\n",
" reader=_gzip_reader_fn)\n",
"\n",
" transformed_features = dataset.make_one_shot_iterator().get_next()\n",
"\n",
" # We pop the label because we do not want to use it as a feature while we're\n",
" # training.\n",
" return transformed_features, transformed_features.pop(\n",
" transformed_name(LABEL_KEY))\n",
"\n",
"\n",
"def _keras_model_builder():\n",
" \"\"\"Build a keras model for COMPAS dataset classification.\n",
" \n",
" Returns:\n",
" A compiled Keras model.\n",
" \"\"\"\n",
" feature_columns = []\n",
" feature_layer_inputs = {}\n",
"\n",
" for key in transformed_names(INT_FEATURE_KEYS):\n",
" feature_columns.append(tf.feature_column.numeric_column(key))\n",
" feature_layer_inputs[key] = tf.keras.Input(shape=(1,), name=key)\n",
"\n",
" for key, num_buckets in zip(transformed_names(CATEGORICAL_FEATURE_KEYS),\n",
" MAX_CATEGORICAL_FEATURE_VALUES):\n",
" feature_columns.append(\n",
" tf.feature_column.indicator_column(\n",
" tf.feature_column.categorical_column_with_identity(\n",
" key, num_buckets=num_buckets)))\n",
" feature_layer_inputs[key] = tf.keras.Input(\n",
" shape=(1,), name=key, dtype=tf.dtypes.int32)\n",
"\n",
" feature_columns_input = tf.keras.layers.DenseFeatures(feature_columns)\n",
" feature_layer_outputs = feature_columns_input(feature_layer_inputs)\n",
"\n",
" dense_layers = tf.keras.layers.Dense(\n",
" 20, activation='relu', name='dense_1')(feature_layer_outputs)\n",
" dense_layers = tf.keras.layers.Dense(\n",
" 10, activation='relu', name='dense_2')(dense_layers)\n",
" output = tf.keras.layers.Dense(\n",
" 1, name='predictions')(dense_layers)\n",
"\n",
" model = tf.keras.Model(\n",
" inputs=[v for v in feature_layer_inputs.values()], outputs=output)\n",
"\n",
" model.compile(\n",
" loss=tf.keras.losses.MeanAbsoluteError(),\n",
" optimizer=tf.optimizers.Adam(learning_rate=_LEARNING_RATE))\n",
"\n",
" return model\n",
"\n",
"\n",
"# TFX will call this function.\n",
"def trainer_fn(hparams, schema):\n",
" \"\"\"Build the estimator using the high level API.\n",
"\n",
" Args:\n",
" hparams: Hyperparameters used to train the model as name/value pairs.\n",
" schema: Holds the schema of the training examples.\n",
"\n",
" Returns:\n",
" A dict of the following:\n",
" - estimator: The estimator that will be used for training and eval.\n",
" - train_spec: Spec for training.\n",
" - eval_spec: Spec for eval.\n",
" - eval_input_receiver_fn: Input function for eval.\n",
" \"\"\"\n",
" tf_transform_output = tft.TFTransformOutput(hparams.transform_output)\n",
"\n",
" train_input_fn = lambda: _input_fn(\n",
" hparams.train_files,\n",
" tf_transform_output,\n",
" batch_size=_BATCH_SIZE)\n",
"\n",
" eval_input_fn = lambda: _input_fn(\n",
" hparams.eval_files,\n",
" tf_transform_output,\n",
" batch_size=_BATCH_SIZE)\n",
"\n",
" train_spec = tf.estimator.TrainSpec(\n",
" train_input_fn,\n",
" max_steps=hparams.train_steps)\n",
"\n",
" serving_receiver_fn = lambda: _example_serving_receiver_fn(\n",
" tf_transform_output, schema)\n",
"\n",
" exporter = tf.estimator.FinalExporter('compas', serving_receiver_fn)\n",
" eval_spec = tf.estimator.EvalSpec(\n",
" eval_input_fn,\n",
" steps=hparams.eval_steps,\n",
" exporters=[exporter],\n",
" name='compas-eval')\n",
"\n",
" run_config = tf.estimator.RunConfig(\n",
" save_checkpoints_steps=_SAVE_CHECKPOINT_STEPS,\n",
" keep_checkpoint_max=_MAX_CHECKPOINTS)\n",
"\n",
" run_config = run_config.replace(model_dir=hparams.serving_model_dir)\n",
"\n",
" estimator = tf.keras.estimator.model_to_estimator(\n",
" keras_model=_keras_model_builder(), config=run_config)\n",
"\n",
" # Create an input receiver for TFMA processing.\n",
" receiver_fn = lambda: _eval_input_receiver_fn(tf_transform_output, schema)\n",
"\n",
" return {\n",
" 'estimator': estimator,\n",
" 'train_spec': train_spec,\n",
" 'eval_spec': eval_spec,\n",
" 'eval_input_receiver_fn': receiver_fn\n",
" }"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oiC1wABllMkU",
"scrolled": false
},
"outputs": [],
"source": [
"# Uses a user-provided Python function that implements a model using\n",
"# TensorFlow's Estimator API.\n",
"trainer = Trainer(\n",
" module_file=_trainer_module_file,\n",
" transformed_examples=transform.outputs['transformed_examples'],\n",
" schema=infer_schema.outputs['schema'],\n",
" transform_graph=transform.outputs['transform_graph'],\n",
" train_args=trainer_pb2.TrainArgs(num_steps=10000),\n",
" eval_args=trainer_pb2.EvalArgs(num_steps=5000)\n",
")\n",
"context.run(trainer)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0tfnGpl2lMkv"
},
"source": [
"## TensorFlow Model Analysis\n",
"\n",
"Now that our model has been developed and trained within TFX, we can use several additional components within the TFX ecosystem to understand our model's performance in more detail. Different metrics give a better picture of how the model performs on different slices of the data. That helps us make sure it is not underperforming for any subgroup.\n",
"\n",
"First we'll examine TensorFlow Model Analysis, which is a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer. These metrics can be computed over different slices of data and visualized in a notebook.\n",
"\n",
"For a list of possible metrics that can be added to TensorFlow Model Analysis, see the [TFMA metrics documentation](https://github.com/tensorflow/model-analysis/blob/master/g3doc/metrics.md).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "i8VdZ4z3lMk0"
},
"outputs": [],
"source": [
"# Uses TensorFlow Model Analysis to compute evaluation statistics over\n",
"# features of a model.\n",
"model_analyzer = Evaluator(\n",
"    examples=example_gen.outputs['examples'],\n",
"    model=trainer.outputs['model'],\n",
"    eval_config=text_format.Parse(\"\"\"\n",
" model_specs {\n",
" label_key: 'is_recid'\n",
" }\n",
" metrics_specs {\n",
" metrics {class_name: \"BinaryAccuracy\"}\n",
" metrics {class_name: \"AUC\"}\n",
" metrics {\n",
" class_name: \"FairnessIndicators\"\n",
" config: '{\"thresholds\": [0.25, 0.5, 0.75]}'\n",
" }\n",
" }\n",
" slicing_specs {\n",
" feature_keys: 'race'\n",
" }\n",
" \"\"\", tfma.EvalConfig())\n",
")\n",
"context.run(model_analyzer)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gXGxPEAnBkUM"
},
"source": [
"## Fairness Indicators\n",
"\n",
"Load Fairness Indicators to examine the evaluation results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4ZgUtH_OBg2x"
},
"outputs": [],
"source": [
"evaluation_uri = model_analyzer.outputs['evaluation'].get()[0].uri\n",
"eval_result = tfma.load_eval_result(evaluation_uri)\n",
"tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_result)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "igoChEEblMk4"
},
"source": [
"Fairness Indicators lets us drill down into the performance of different slices. It is designed to support teams in evaluating and improving models for fairness concerns. It enables easy computation of commonly identified fairness metrics for binary and multiclass classifiers and scales to use cases of any size.\n",
"\n",
"We will load Fairness Indicators into this notebook and analyze the results. After you have spent a moment exploring Fairness Indicators, examine the False Positive Rate and False Negative Rate tabs in the tool. In this case study, we're concerned with reducing the number of false predictions of recidivism, which corresponds to the [False Positive Rate](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).\n",
"\n",
"Within the Fairness Indicators tool you'll see two dropdown options:\n",
"1. A \"Baseline\" option that is set by `column_for_slicing`.\n",
"2. A \"Thresholds\" option that is set by `fairness_indicator_thresholds`.\n",
"\n",
"\"Baseline\" is the slice you want to compare all other slices to. Most commonly, it is the overall slice, but it can also be one of the specific slices.\n",
"\n",
"\"Threshold\" is the cutoff applied to a binary classification model's score to decide which class a prediction is assigned to. When setting a threshold, there are two things you should keep in mind.\n",
"\n",
"1. Precision: What is the downside if your prediction results in a Type I error? In this case study, a lower threshold would mean we're predicting more defendants *will* commit another crime when they actually *don't*.\n",
"2. Recall: What is the downside of a Type II error? In this case study, a higher threshold would mean we're predicting more defendants *will not* commit another crime when they actually *do*.\n",
"\n",
"We will use an arbitrary threshold of 0.75. We will focus only on the fairness metrics for African-American and Caucasian defendants, because the sample sizes for the other races aren't large enough to draw statistically significant conclusions.\n",
"\n",
"The rates below might differ slightly depending on how the data was shuffled at the beginning of this case study. Even so, look at the difference between African-American and Caucasian defendants. At a lower threshold, our model is more likely to predict that a Caucasian defendant will commit a second crime than an African-American defendant. However, this prediction inverts as we increase the threshold.\n",
"\n",
"* **False Positive Rate @ 0.75**\n",
" * **African-American:** ~30%\n",
" * AUC: 0.71\n",
" * Binary Accuracy: 0.67\n",
" * **Caucasian:** ~8%\n",
" * AUC: 0.71\n",
"    * Binary Accuracy: 0.67\n",
"\n",
"More information on Type I/II errors and threshold setting can be found [here](https://developers.google.com/machine-learning/crash-course/classification/thresholding).\n"
]
},
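{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below shows why the choice of threshold matters. It computes the false positive rate and false negative rate per slice at the same thresholds configured in the `FairnessIndicators` metric above (0.25, 0.5, 0.75). The scores, labels, and group names are made-up toy data, not results from this case study. The example only shows how raising the threshold trades false positives for false negatives.\n",
"\n",
"```python\n",
"def error_rates(labels, scores, threshold):\n",
"  # In this sketch, a score strictly above the threshold is a positive prediction.\n",
"  fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)\n",
"  fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= threshold)\n",
"  negatives = labels.count(0)\n",
"  positives = labels.count(1)\n",
"  fpr = fp / negatives if negatives else 0.0\n",
"  fnr = fn / positives if positives else 0.0\n",
"  return fpr, fnr\n",
"\n",
"\n",
"# Toy (labels, scores) per slice. Not real COMPAS results.\n",
"slices = {\n",
"    'group_a': ([0, 0, 0, 1, 1, 1], [0.2, 0.6, 0.8, 0.4, 0.7, 0.9]),\n",
"    'group_b': ([0, 0, 0, 1, 1, 1], [0.1, 0.3, 0.5, 0.3, 0.6, 0.8]),\n",
"}\n",
"for threshold in (0.25, 0.5, 0.75):\n",
"  for name, (labels, scores) in slices.items():\n",
"    fpr, fnr = error_rates(labels, scores, threshold)\n",
"    print('threshold={} {}: FPR={:.2f} FNR={:.2f}'.format(threshold, name, fpr, fnr))\n",
"```\n",
"\n",
"As the threshold rises, each slice's false positive rate can only fall and its false negative rate can only rise. Comparing slices at a fixed threshold is what reveals a disparity."
]
},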
{
"cell_type": "markdown",
"metadata": {
"id": "Mpbs4x9dB2PA"
},
"source": [
"## ML Metadata\n",
"\n",
"To understand where disparity could be coming from and to take a snapshot of our current model, we can use ML Metadata for recording and retrieving metadata associated with our model. ML Metadata is an integral part of TFX, but is designed so that it can be used independently.\n",
"\n",
"For this case study, we will list all the artifacts that we developed previously. Cycling through the artifacts, executions, and contexts gives us a high-level view of our TFX model, so we can dig into where any potential issues are coming from. This gives us a baseline overview of how our model was developed and which TFX components helped develop our initial model.\n",
"\n",
"We will start by laying out the high-level artifact, execution, and context types in our model.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0wjiFKOxlMkn"
},
"outputs": [],
"source": [
"# Connect to the TFX database.\n",
"connection_config = metadata_store_pb2.ConnectionConfig()\n",
"\n",
"connection_config.sqlite.filename_uri = os.path.join(\n",
" context.pipeline_root, 'metadata.sqlite')\n",
"store = metadata_store.MetadataStore(connection_config)\n",
"\n",
"def _mlmd_type_to_dataframe(mlmd_type):\n",
"  \"\"\"Helper function to turn MLMD types into a Pandas DataFrame.\n",
"\n",
"  Args:\n",
"    mlmd_type: List of metadata store types.\n",
"\n",
"  Returns:\n",
"    DataFrame containing type ID, Name, and Properties.\n",
"  \"\"\"\n",
"  pd.set_option('display.max_columns', None)\n",
"  pd.set_option('display.expand_frame_repr', False)\n",
"\n",
"  column_names = ['ID', 'Name', 'Properties']\n",
"  # DataFrame.append was removed in pandas 2.0, so build all rows at once.\n",
"  rows = [[a_type.id, a_type.name, a_type.properties] for a_type in mlmd_type]\n",
"  return pd.DataFrame(rows, columns=column_names)\n",
"\n",
"# ML Metadata stores strong-typed Artifacts, Executions, and Contexts.\n",
"# First, we can use type APIs to understand what is defined in ML Metadata\n",
"# by the current version of TFX. We'll be able to view all the previous runs\n",
"# that created our initial model.\n",
"print('Artifact Types:')\n",
"display(_mlmd_type_to_dataframe(store.get_artifact_types()))\n",
"\n",
"print('\\nExecution Types:')\n",
"display(_mlmd_type_to_dataframe(store.get_execution_types()))\n",
"\n",
"print('\\nContext Types:')\n",
"display(_mlmd_type_to_dataframe(store.get_context_types()))\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lJQoer33ZEXD"
},
"source": [
"## Identify where the fairness issue could be coming from\n",
"\n",
"For each of the above artifact, execution, and context types, we can use ML Metadata to dig into the attributes and how each part of our ML pipeline was developed.\n",
"\n",
"We'll start by diving into `StatisticsGen` to examine the underlying data that we initially fed into the model. Once we know the artifacts within our pipeline, we can use ML Metadata and TensorFlow Data Validation to look backward and forward within the pipeline and identify where a potential problem is coming from.\n",
"\n",
"After running the cell below, select `Lift (Y=1)` in the second chart on the `Chart to show` tab to see the [lift](https://en.wikipedia.org/wiki/Lift_(data_mining)) between the different data slices. Within `race`, the lift for African-American is approximately 1.08, whereas for Caucasian it is approximately 0.86."
]
},
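{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lift for a slice compares how often the label is positive inside that slice with how often it is positive overall: lift = P(Y=1 | slice) / P(Y=1). A value above 1 means the slice has a higher-than-average rate of `is_recid = 1`. The sketch below computes lift on toy rows. `lift_by_slice` is a hypothetical helper, and its numbers are illustrative only.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"\n",
"def lift_by_slice(rows, slice_key, label_key, positive=1):\n",
"  # Overall rate of the positive label, P(Y=1).\n",
"  overall = sum(1 for r in rows if r[label_key] == positive) / len(rows)\n",
"  if overall == 0:\n",
"    raise ValueError('lift is undefined when no example has the positive label')\n",
"  counts = defaultdict(lambda: [0, 0])  # slice value -> [positives, total]\n",
"  for r in rows:\n",
"    counts[r[slice_key]][1] += 1\n",
"    if r[label_key] == positive:\n",
"      counts[r[slice_key]][0] += 1\n",
"  # Lift = P(Y=1 | slice) / P(Y=1).\n",
"  return {value: (pos / total) / overall for value, (pos, total) in counts.items()}\n",
"\n",
"\n",
"rows = (\n",
"    [{'race': 'A', 'is_recid': 1}] * 6 + [{'race': 'A', 'is_recid': 0}] * 4 +\n",
"    [{'race': 'B', 'is_recid': 1}] * 2 + [{'race': 'B', 'is_recid': 0}] * 8\n",
")\n",
"print(lift_by_slice(rows, 'race', 'is_recid'))\n",
"```"
]
},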
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xvcw9KL0byeY"
},
"outputs": [],
"source": [
"statistics_gen = StatisticsGen(\n",
" examples=example_gen.outputs['examples'],\n",
" schema=infer_schema.outputs['schema'],\n",
" stats_options=tfdv.StatsOptions(label_feature='is_recid'))\n",
"exec_result = context.run(statistics_gen)\n",
"\n",
"for event in store.get_events_by_execution_ids([exec_result.execution_id]):\n",
" if event.path.steps[0].key == 'statistics':\n",
" statistics_w_schema_uri = store.get_artifacts_by_id([event.artifact_id])[0].uri\n",
"\n",
"model_stats = tfdv.load_statistics(\n",
" os.path.join(statistics_w_schema_uri, 'eval/stats_tfrecord/'))\n",
"tfdv.visualize_statistics(model_stats)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ofWXz48zzlGT"
},
"source": [
"## Tracking a Model Change\n",
"\n",
"Now that we have an idea of how we could improve the fairness of our model, we will first document our initial run within ML Metadata. This keeps a record for ourselves and for anyone else who might review our changes in the future.\n",
"\n",
"ML Metadata can keep a log of our past models along with any notes that we would like to add between runs. We'll add a simple note on our first run denoting that this run was done on the full COMPAS dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GCQ-7kzMRbXM"
},
"outputs": [],
"source": [
"_MODEL_NOTE_TO_ADD = 'First model that contains fairness concerns in the model.'\n",
"\n",
"first_trained_model = store.get_artifacts_by_type('Model')[-1]\n",
"\n",
"# Add the note above to the ML metadata.\n",
"first_trained_model.custom_properties['note'].string_value = _MODEL_NOTE_TO_ADD\n",
"store.put_artifacts([first_trained_model])\n",
"\n",
"def _mlmd_model_to_dataframe(model, model_number):\n",
"  \"\"\"Helper function to turn an MLMD model into a Pandas DataFrame.\n",
"\n",
" Args:\n",
" model: Metadata store model.\n",
" model_number: Number of model run within ML Metadata.\n",
"\n",
" Returns:\n",
" DataFrame containing the ML Metadata model.\n",
" \"\"\"\n",
" pd.set_option('display.max_columns', None) \n",
" pd.set_option('display.expand_frame_repr', False)\n",
"\n",
" df = pd.DataFrame()\n",
" custom_properties = ['name', 'note', 'state', 'producer_component',\n",
" 'pipeline_name']\n",
" df['id'] = [model[model_number].id]\n",
" df['uri'] = [model[model_number].uri]\n",
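"  for prop in custom_properties:\n",
"    value = model[model_number].custom_properties.get(prop)\n",
"    # Read the proto's string field directly: str.lstrip() strips a set of\n",
"    # characters rather than a prefix and can truncate values like 'trainer'.\n",
"    df[prop] = [value.string_value if value is not None else None]\n",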
" return df\n",
"\n",
"# Print the current model to see the results of the ML Metadata for the model.\n",
"display(_mlmd_model_to_dataframe(store.get_artifacts_by_type('Model'), 0))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-gwiNtcoeO8S"
},
"source": [
"## Improving fairness concerns by weighting the model\n",
"\n",
"\n",
"There are several ways to approach fixing fairness concerns within a model. Techniques\u003csup\u003e1\u003c/sup\u003e that have been used include manipulating observed data/labels, implementing fairness constraints, and removing prejudice through regularization. In this case study, we will reweight the model by implementing a custom loss function in Keras.\n",
"\n",
"The code below is the same as the Trainer module above, except for a few parameter changes and a new class called `LogisticEndpoint` that we will use for our loss within Keras.\n",
"\n",
"___\n",
"\n",
"1. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., Galstyan, A. (2019). A Survey on Bias and Fairness in Machine Learning. https://arxiv.org/pdf/1908.09635.pdf\n"
]
},
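{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below illustrates one simple reweighting scheme. Each example gets a weight inversely proportional to the frequency of its (group, label) combination, and those weights feed into a weighted binary cross-entropy. This is a hypothetical sketch in plain Python: `group_label_weights` and `weighted_log_loss` are names introduced here. It is not the `LogisticEndpoint` loss used in the Trainer module of this case study.\n",
"\n",
"```python\n",
"import math\n",
"from collections import Counter\n",
"\n",
"\n",
"def group_label_weights(groups, labels):\n",
"  # Weight each example by N / (K * count(group, label)), so every\n",
"  # (group, label) cell contributes the same total weight, N / K.\n",
"  cells = Counter(zip(groups, labels))\n",
"  n, k = len(labels), len(cells)\n",
"  return [n / (k * cells[(g, y)]) for g, y in zip(groups, labels)]\n",
"\n",
"\n",
"def weighted_log_loss(labels, probs, weights, eps=1e-7):\n",
"  # Weighted binary cross-entropy, normalized by the sum of the weights.\n",
"  total = 0.0\n",
"  for y, p, w in zip(labels, probs, weights):\n",
"    p = min(max(p, eps), 1 - eps)  # Clip to avoid log(0).\n",
"    total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))\n",
"  return total / sum(weights)\n",
"\n",
"\n",
"groups = ['a', 'a', 'a', 'b']\n",
"labels = [1, 1, 0, 1]\n",
"weights = group_label_weights(groups, labels)\n",
"print(weights)\n",
"print(weighted_log_loss(labels, [0.9, 0.8, 0.3, 0.6], weights))\n",
"```\n",
"\n",
"Under-represented (group, label) combinations receive larger weights. Their errors therefore count for more in the loss than they would under uniform weighting."
]
},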
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yzLWm3-1Zjvv"
},
"outputs": [],
"source": [
"%%writefile {_trainer_module_file}\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"import tensorflow_model_analysis as tfma\n",
"import tensorflow_transform as tft\n",
"from tensorflow_transform.tf_metadata import schema_utils\n",
"\n",
"from compas_transform import *\n",
"\n",
"_BATCH_SIZE = 1000\n",
"_LEARNING_RATE = 0.00001\n",
"_MAX_CHECKPOINTS = 1\n",
"_SAVE_CHECKPOINT_STEPS = 999\n",
"\n",
"\n",
"def transformed_names(keys):\n",
" return [transformed_name(key) for key in keys]\n",
"\n",
"\n",
"def transformed_name(key):\n",
" return '{}_xf'.format(key)\n",
"\n",
"\n",
"def _gzip_reader_fn(filenames):\n",
" \"\"\"Returns a record reader that can read gzip'ed files.\n",
"\n",
" Args:\n",
" filenames: A tf.string tensor or tf.data.Dataset containing one or more\n",
" filenames.\n",
"\n",
"  Returns:\n",
"    A `tf.data.TFRecordDataset` that reads the GZIP-compressed files.\n",
" \"\"\"\n",
" return tf.data.TFRecordDataset(filenames, compression_type='GZIP')\n",
"\n",
"\n",
"# Tf.Transform considers these features as \"raw\".\n",
"def _get_raw_feature_spec(schema):\n",
" \"\"\"Generates a feature spec from a Schema proto.\n",
"\n",
" Args:\n",
" schema: A Schema proto.\n",
"\n",
" Returns:\n",
" A feature spec defined as a dict whose keys are feature names and values are\n",
" instances of FixedLenFeature, VarLenFeature or SparseFeature.\n",
" \"\"\"\n",
" return schema_utils.schema_as_feature_spec(schema).feature_spec\n",
"\n",
"\n",
"def _example_serving_receiver_fn(tf_transform_output, schema):\n",
"  \"\"\"Builds the serving inputs.\n",
"\n",
" Args:\n",
" tf_transform_output: A TFTransformOutput.\n",
" schema: the schema of the input data.\n",
"\n",
" Returns:\n",
" TensorFlow graph which parses examples, applying tf-transform to them.\n",
" \"\"\"\n",
" raw_feature_spec = _get_raw_feature_spec(schema)\n",
" raw_feature_spec.pop(LABEL_KEY)\n",
"\n",
" raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(\n",
" raw_feature_spec)\n",
" serving_input_receiver = raw_input_fn()\n",
"\n",
" transformed_features = tf_transform_output.transform_raw_features(\n",
" serving_input_receiver.features)\n",
" transformed_features.pop(transformed_name(LABEL_KEY))\n",
" return tf.estimator.export.ServingInputReceiver(\n",
" transformed_features, serving_input_receiver.receiver_tensors)\n",
"\n",
"\n",
"def _eval_input_receiver_fn(tf_transform_output, schema):\n",
" \"\"\"Builds everything needed for the tf-model-analysis to run the model.\n",
"\n",
" Args:\n",
" tf_transform_output: A TFTransformOutput.\n",
" schema: the schema of the input data.\n",
"\n",
" Returns:\n",
" EvalInputReceiver function, which contains:\n",
" - TensorFlow graph which parses raw untransformed features, applies the\n",
" tf-transform preprocessing operators.\n",
" - Set of raw, untransformed features.\n",
" - Label against which predictions will be compared.\n",
" \"\"\"\n",
" # Notice that the inputs are raw features, not transformed features here.\n",
" raw_feature_spec = _get_raw_feature_spec(schema)\n",
"\n",
" serialized_tf_example = tf.compat.v1.placeholder(\n",
" dtype=tf.string, shape=[None], name='input_example_tensor')\n",
"\n",
" # Add a parse_example operator to the tensorflow graph, which will parse\n",
" # raw, untransformed, tf examples.\n",
" features = tf.io.parse_example(\n",
" serialized=serialized_tf_example, features=raw_feature_spec)\n",
"\n",
" transformed_features = tf_transform_output.transform_raw_features(features)\n",
" labels = transformed_features.pop(transformed_name(LABEL_KEY))\n",
"\n",
" receiver_tensors = {'examples': serialized_tf_example}\n",
"\n",
" return tfma.export.EvalInputReceiver(\n",
" features=transformed_features,\n",
" receiver_tensors=receiver_tensors,\n",
" labels=labels)\n",
"\n",
"\n",
"def _input_fn(filenames, tf_transform_output, batch_size=200):\n",
" \"\"\"Generates features and labels for training or evaluation.\n",
"\n",
" Args:\n",
" filenames: List of CSV files to read data from.\n",
" tf_transform_output: A TFTransformOutput.\n",
" batch_size: First dimension size of the Tensors returned by input_fn.\n",
"\n",
" Returns:\n",
" A (features, indices) tuple where features is a dictionary of\n",
" Tensors, and indices is a single Tensor of label indices.\n",
" \"\"\"\n",
" transformed_feature_spec = (\n",
" tf_transform_output.transformed_feature_spec().copy())\n",
"\n",
" dataset = tf.compat.v1.data.experimental.make_batched_features_dataset(\n",
" filenames,\n",
" batch_size,\n",
" transformed_feature_spec,\n",
" shuffle=False,\n",
" reader=_gzip_reader_fn)\n",
"\n",
" transformed_features = dataset.make_one_shot_iterator().get_next()\n",
"\n",
" # We pop the label because we do not want to use it as a feature while we're\n",
" # training.\n",
" return transformed_features, transformed_features.pop(\n",
" transformed_name(LABEL_KEY))\n",
"\n",
"\n",
"# TFX will call this function.\n",
"def trainer_fn(hparams, schema):\n",
" \"\"\"Build the estimator using the high level API.\n",
"\n",
" Args:\n",
" hparams: Hyperparameters used to train the model as name/value pairs.\n",
" schema: Holds the schema of the training examples.\n",
"\n",
" Returns:\n",
" A dict of the following:\n",
" - estimator: The estimator that will be used for training and eval.\n",
" - train_spec: Spec for training.\n",
" - eval_spec: Spec for eval.\n",
" - eval_input_receiver_fn: Input function for eval.\n",
" \"\"\"\n",
" tf_transform_output = tft.TFTransformOutput(hparams.transform_output)\n",
"\n",
" train_input_fn = lambda: _input_fn(\n",
" hparams.train_files,\n",
" tf_transform_output,\n",
" batch_size=_BATCH_SIZE)\n",
"\n",
" eval_input_fn = lambda: _input_fn(\n",
" hparams.eval_files,\n",
" tf_transform_output,\n",
" batch_size=_BATCH_SIZE)\n",
"\n",
" train_spec = tf.estimator.TrainSpec(\n",
" train_input_fn,\n",
" max_steps=hparams.train_steps)\n",
"\n",
" serving_receiver_fn = lambda: _example_serving_receiver_fn(\n",
" tf_transform_output, schema)\n",
"\n",
" exporter = tf.estimator.FinalExporter('compas', serving_receiver_fn)\n",
" eval_spec = tf.estimator.EvalSpec(\n",
" eval_input_fn,\n",
" steps=hparams.eval_steps,\n",
" exporters=[exporter],\n",
" name='compas-eval')\n",
"\n",
" run_config = tf.estimator.RunConfig(\n",
" save_checkpoints_steps=_SAVE_CHECKPOINT_STEPS,\n",
" keep_checkpoint_max=_MAX_CHECKPOINTS)\n",
"\n",
" run_config = run_config.replace(model_dir=hparams.serving_model_dir)\n",
"\n",
" estimator = tf.keras.estimator.model_to_estimator(\n",
" keras_model=_keras_model_builder(), config=run_config)\n",
"\n",
" # Create an input receiver for TFMA processing.\n",
" receiver_fn = lambda: _eval_input_receiver_fn(tf_transform_output, schema)\n",
"\n",
" return {\n",
" 'estimator': estimator,\n",
" 'train_spec': train_spec,\n",
" 'eval_spec': eval_spec,\n",
" 'eval_input_receiver_fn': receiver_fn\n",
" }\n",
"\n",
"\n",
"def _keras_model_builder():\n",
" \"\"\"Build a keras model for COMPAS dataset classification.\n",
" \n",
" Returns:\n",
" A compiled Keras model.\n",
" \"\"\"\n",
" feature_columns = []\n",
" feature_layer_inputs = {}\n",
"\n",
" for key in transformed_names(INT_FEATURE_KEYS):\n",
" feature_columns.append(tf.feature_column.numeric_column(key))\n",
" feature_layer_inputs[key] = tf.keras.Input(shape=(1,), name=key)\n",
"\n",
" for key, num_buckets in zip(transformed_names(CATEGORICAL_FEATURE_KEYS),\n",
" MAX_CATEGORICAL_FEATURE_VALUES):\n",
" feature_columns.append(\n",
" tf.feature_column.indicator_column(\n",
" tf.feature_column.categorical_column_with_identity(\n",
" key, num_buckets=num_buckets)))\n",
" feature_layer_inputs[key] = tf.keras.Input(\n",
" shape=(1,), name=key, dtype=tf.dtypes.int32)\n",
"\n",
" feature_columns_input = tf.keras.layers.DenseFeatures(feature_columns)\n",
" feature_layer_outputs = feature_columns_input(feature_layer_inputs)\n",
"\n",
" dense_layers = tf.keras.layers.Dense(\n",
" 20, activation='relu', name='dense_1')(feature_layer_outputs)\n",
" dense_layers = tf.keras.layers.Dense(\n",
" 10, activation='relu', name='dense_2')(dense_layers)\n",
" output = tf.keras.layers.Dense(\n",
" 1, name='predictions')(dense_layers)\n",
"\n",
" model = tf.keras.Model(\n",
" inputs=[v for v in feature_layer_inputs.values()], outputs=output)\n",
"\n",
" # To weight our model, we define a custom loss class in Keras.\n",
" # The original loss is commented out below and replaced by the new one.\n",
" model.compile(\n",
" # loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),\n",
" loss=LogisticEndpoint(),\n",
" optimizer=tf.optimizers.Adam(learning_rate=_LEARNING_RATE))\n",
"\n",
" return model\n",
"\n",
"\n",
"class LogisticEndpoint(tf.keras.layers.Layer):\n",
"\n",
" def __init__(self, name=None):\n",
" super(LogisticEndpoint, self).__init__(name=name)\n",
" self.loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)\n",
"\n",
" def __call__(self, y_true, y_pred, sample_weight=None):\n",
" inputs = [y_true, y_pred]\n",
" inputs += sample_weight or ['sample_weight_xf']\n",
" return super(LogisticEndpoint, self).__call__(inputs)\n",
"\n",
" def call(self, inputs):\n",
" y_true, y_pred = inputs[0], inputs[1]\n",
" if len(inputs) == 3:\n",
" sample_weight = inputs[2]\n",
" else:\n",
" sample_weight = None\n",
" loss = self.loss_fn(y_true, y_pred, sample_weight)\n",
" self.add_loss(loss)\n",
" reduce_loss = tf.math.divide_no_nan(\n",
" tf.math.reduce_sum(tf.nn.softmax(y_pred)), _BATCH_SIZE)\n",
" return reduce_loss\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "thSmshFN94pt"
},
"source": [
"## Retrain the TFX model with the weighted model\n",
"\n",
"In this next part, we use the weighted Transform component to retrain the same Trainer model as before and check how fairness changes once the weighting is applied."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Bb0Rl9UOFgoM"
},
"outputs": [],
"source": [
"trainer_weighted = Trainer(\n",
" module_file=_trainer_module_file,\n",
" transformed_examples=transform.outputs['transformed_examples'],\n",
" schema=infer_schema.outputs['schema'],\n",
" transform_graph=transform.outputs['transform_graph'],\n",
" train_args=trainer_pb2.TrainArgs(num_steps=10000),\n",
" eval_args=trainer_pb2.EvalArgs(num_steps=5000)\n",
")\n",
"context.run(trainer_weighted)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "n7xH61MCPwUO"
},
"outputs": [],
"source": [
"# Again, we will run TensorFlow Model Analysis and load Fairness Indicators\n",
"# to examine the performance change in our weighted model.\n",
"model_analyzer_weighted = Evaluator(\n",
" examples=example_gen.outputs['examples'],\n",
" model=trainer_weighted.outputs['model'],\n",
"\n",
" eval_config = text_format.Parse(\"\"\"\n",
" model_specs {\n",
" label_key: 'is_recid'\n",
" }\n",
" metrics_specs {\n",
" metrics {class_name: 'BinaryAccuracy'}\n",
" metrics {class_name: 'AUC'}\n",
" metrics {\n",
" class_name: 'FairnessIndicators'\n",
" config: '{\"thresholds\": [0.25, 0.5, 0.75]}'\n",
" }\n",
" }\n",
" slicing_specs {\n",
" feature_keys: 'race'\n",
" }\n",
" \"\"\", tfma.EvalConfig())\n",
")\n",
"context.run(model_analyzer_weighted)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "206gQS1r-1FX"
},
"outputs": [],
"source": [
"evaluation_uri_weighted = model_analyzer_weighted.outputs['evaluation'].get()[0].uri\n",
"eval_result_weighted = tfma.load_eval_result(evaluation_uri_weighted)\n",
"\n",
"multi_eval_results = {\n",
" 'Unweighted Model': eval_result,\n",
" 'Weighted Model': eval_result_weighted\n",
"}\n",
"tfma.addons.fairness.view.widget_view.render_fairness_indicator(\n",
" multi_eval_results=multi_eval_results)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bwoz69Wzvt8q"
},
"source": [
"After retraining with the weighted model, we can once again look at the fairness metrics to gauge any improvement. This time, however, we use the model comparison feature within Fairness Indicators to see the difference between the weighted and unweighted models. Although the weighted model still shows some fairness concerns, the discrepancy is far less pronounced.\n",
"\n",
"The drawback, however, is that our AUC and binary accuracy have also dropped after weighting the model.\n",
"\n",
"\n",
"* **False Positive Rate @ 0.75**\n",
" * **African-American:** ~1%\n",
" * AUC: 0.47\n",
" * Binary Accuracy: 0.59\n",
" * **Caucasian:** ~0%\n",
" * AUC: 0.47\n",
" * Binary Accuracy: 0.58\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oEhq3ne7gazf"
},
"source": [
"## Examine the data of the second run\n",
"\n",
"Finally, we can visualize the data with TensorFlow Data Validation, overlay the data changes between the two models, and add a note to ML Metadata indicating that this model has reduced the fairness concerns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WM-uqqfOggcw"
},
"outputs": [],
"source": [
"# Pull the URI for the two models that we ran in this case study.\n",
"first_model_uri = store.get_artifacts_by_type('ExampleStatistics')[-1].uri\n",
"second_model_uri = store.get_artifacts_by_type('ExampleStatistics')[0].uri\n",
"\n",
"# Load the stats for both models.\n",
"first_model_stats = tfdv.load_statistics(os.path.join(\n",
" first_model_uri, 'eval/stats_tfrecord/'))\n",
"second_model_stats = tfdv.load_statistics(os.path.join(\n",
" second_model_uri, 'eval/stats_tfrecord/'))\n",
"\n",
"# Visualize the statistics between the two models.\n",
"tfdv.visualize_statistics(\n",
" lhs_statistics=second_model_stats,\n",
" lhs_name='Sampled Model',\n",
" rhs_statistics=first_model_stats,\n",
" rhs_name='COMPAS Original')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YOMbqITkhNkO"
},
"outputs": [],
"source": [
"# Add a new note within ML Metadata describing the weighted model.\n",
"_NOTE_TO_ADD = 'Weighted model between race and is_recid.'\n",
"\n",
"# Pull the artifact for the weighted trained model.\n",
"second_trained_model = store.get_artifacts_by_type('Model')[-1]\n",
"\n",
"# Add the note to ML Metadata.\n",
"second_trained_model.custom_properties['note'].string_value = _NOTE_TO_ADD\n",
"store.put_artifacts([second_trained_model])\n",
"\n",
"display(_mlmd_model_to_dataframe(store.get_artifacts_by_type('Model'), -1))\n",
"display(_mlmd_model_to_dataframe(store.get_artifacts_by_type('Model'), 0))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f0fGWt-OIzEb"
},
"source": [
"## Conclusion\n",
"\n",
"In this case study, we developed a Keras classifier within a TFX pipeline on the COMPAS dataset to examine fairness concerns in the dataset. After the initial TFX pipeline was built, fairness concerns were not apparent until we examined the individual slices of our model by a sensitive feature, in our case race. After identifying the issues, we used TensorFlow Data Validation to track down the source of the fairness issue and identified a way to mitigate it via model weighting, while tracking and annotating the changes in ML Metadata. Although we were not able to fix all the fairness concerns within the dataset, adding a note to ML Metadata will help future developers understand the issues we faced while developing this model.\n",
"\n",
"Finally, it is important to note that this case study did not fix the fairness issues present in the COMPAS dataset. By reducing the fairness concerns in the model, we also reduced its AUC and accuracy. What we were able to do, however, was build a model that surfaced the fairness concerns, track down where the problems could be coming from by tracing our model's lineage, and annotate any model concerns within the metadata.\n",
"\n",
"For more information on the issues that predicting pre-trial detention can raise, see the FAT* 2018 talk [\"Understanding the Context and Consequences of Pre-trial Detention\"](https://www.youtube.com/watch?v=hEThGT-_5ho\u0026feature=youtu.be\u0026t=1)."
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Fairness Indicators Lineage Case Study",
"provenance": [],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: docs/tutorials/_toc.yaml
================================================
toc:
- title: Introduction to Fairness Indicators
path: /responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Example_Colab
- title: Evaluate fairness using TF-Hub models
path: /responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_on_TF_Hub_Text_Embeddings
- title: Visualize with TensorBoard Plugin
path: /responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TensorBoard_Plugin_Example_Colab
- title: Evaluate toxicity in Wiki comments
path: /responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_Wiki_Case_Study
- title: TensorFlow constrained optimization example
path: /responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study
- title: Pandas DataFrame case study
path: /responsible_ai/fairness_indicators/tutorials/Fairness_Indicators_Pandas_Case_Study
- title: FaceSSD example Colab
path: /responsible_ai/fairness_indicators/tutorials/Facessd_Fairness_Indicators_Example_Colab
================================================
FILE: fairness_indicators/__init__.py
================================================
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Init module for Fairness Indicators."""
# Import version string.
from fairness_indicators.version import __version__
================================================
FILE: fairness_indicators/example_model.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Demo script to train and evaluate a model.
This script contains boilerplate code to train a Keras text classifier
and evaluate it using TensorFlow Model Analysis. Evaluation
results can be visualized using tools like TensorBoard.
"""
from typing import Any
import tensorflow.compat.v1 as tf
import tensorflow_model_analysis as tfma
from tensorflow import keras
from fairness_indicators import fairness_indicators_metrics # noqa: F401
TEXT_FEATURE = "comment_text"
LABEL = "toxicity"
SLICE = "slice"
FEATURE_MAP = {
LABEL: tf.io.FixedLenFeature([], tf.float32),
TEXT_FEATURE: tf.io.FixedLenFeature([], tf.string),
SLICE: tf.io.VarLenFeature(tf.string),
}
class ExampleParser(keras.layers.Layer):
"""A Keras layer that parses the tf.Example."""
def __init__(self, input_feature_key):
self._input_feature_key = input_feature_key
self.input_spec = keras.layers.InputSpec(shape=(1,), dtype=tf.string)
super().__init__()
def compute_output_shape(self, input_shape: Any):
return [1, 1]
def call(self, serialized_examples):
def get_feature(serialized_example):
parsed_example = tf.io.parse_single_example(
serialized_example, features=FEATURE_MAP
)
return parsed_example[self._input_feature_key]
serialized_examples = tf.cast(serialized_examples, tf.string)
return tf.map_fn(get_feature, serialized_examples)
class Reshaper(keras.layers.Layer):
"""A Keras layer that reshapes the input."""
def call(self, inputs):
return tf.reshape(inputs, (1, 32))
class Caster(keras.layers.Layer):
"""A Keras layer that casts the input to float32."""
def call(self, inputs):
return tf.cast(inputs, tf.float32)
def get_example_model(input_feature_key: str):
"""Returns a Keras model for testing."""
parser = ExampleParser(input_feature_key)
text_vectorization = keras.layers.TextVectorization(
max_tokens=32,
output_mode="int",
output_sequence_length=32,
)
text_vectorization.adapt(
["nontoxic", "toxic comment", "test comment", "abc", "abcdef", "random"]
)
dense1 = keras.layers.Dense(
32,
activation=None,
use_bias=True,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
)
dense2 = keras.layers.Dense(
1,
activation=None,
use_bias=False,
kernel_initializer="glorot_uniform",
bias_initializer="zeros",
)
inputs = tf.keras.Input(shape=(), dtype=tf.string)
parsed_example = parser(inputs)
text_vector = text_vectorization(parsed_example)
text_vector = Reshaper()(text_vector)
text_vector = Caster()(text_vector)
output1 = dense1(text_vector)
output2 = dense2(output1)
return tf.keras.Model(inputs=inputs, outputs=output2)
def evaluate_model(
classifier_model_path,
validate_tf_file_path,
tfma_eval_result_path,
eval_config,
):
"""Evaluates a model using TensorFlow Model Analysis.
Args:
----
classifier_model_path: Path to the trained classifier model to be evaluated.
validate_tf_file_path: Path to the TFRecord file containing validation examples.
tfma_eval_result_path: Output path for the TFMA evaluation results.
eval_config: tfma eval_config.
"""
eval_shared_model = tfma.default_eval_shared_model(
eval_saved_model_path=classifier_model_path, eval_config=eval_config
)
# Run the fairness evaluation.
tfma.run_model_analysis(
eval_shared_model=eval_shared_model,
data_location=validate_tf_file_path,
output_path=tfma_eval_result_path,
eval_config=eval_config,
)
================================================
FILE: fairness_indicators/example_model_test.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for example_model.py.
It also serves as an example of how to use fairness indicators with a Keras
model.
"""
import datetime
import os
import tempfile
import numpy as np
import six
import tensorflow.compat.v1 as tf
import tensorflow_model_analysis as tfma
from google.protobuf import text_format
from tensorflow import keras
from fairness_indicators import example_model
tf.compat.v1.enable_eager_execution()
class ExampleModelTest(tf.test.TestCase):
def setUp(self):
super(ExampleModelTest, self).setUp()
self._base_dir = tempfile.gettempdir()
self._model_dir = os.path.join(
self._base_dir, "train", datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
)
def _create_example(self, comment_text, label, slice_value):
example = tf.train.Example()
example.features.feature[example_model.TEXT_FEATURE].bytes_list.value[:] = [
six.ensure_binary(comment_text, "utf8")
]
example.features.feature[example_model.SLICE].bytes_list.value[:] = [
six.ensure_binary(slice_value, "utf8")
]
example.features.feature[example_model.LABEL].float_list.value[:] = [label]
return example
def _create_data(self):
examples = []
examples.append(self._create_example("test comment", 0.0, "slice1"))
examples.append(self._create_example("toxic comment", 1.0, "slice1"))
examples.append(self._create_example("non-toxic comment", 0.0, "slice1"))
examples.append(self._create_example("test comment", 1.0, "slice2"))
examples.append(self._create_example("non-toxic comment", 0.0, "slice2"))
examples.append(self._create_example("test comment", 0.0, "slice3"))
examples.append(self._create_example("toxic comment", 1.0, "slice3"))
examples.append(self._create_example("toxic comment", 1.0, "slice3"))
examples.append(self._create_example("non toxic comment", 0.0, "slice3"))
examples.append(self._create_example("abc", 0.0, "slice1"))
examples.append(self._create_example("abcdef", 0.0, "slice3"))
examples.append(self._create_example("random", 0.0, "slice1"))
return examples
def _write_tf_records(self, examples):
data_location = os.path.join(self._base_dir, "input_data.rio")
with tf.io.TFRecordWriter(data_location) as writer:
for example in examples:
writer.write(example.SerializeToString())
return data_location
def test_example_model(self):
data = self._create_data()
classifier = example_model.get_example_model(example_model.TEXT_FEATURE)
classifier.compile(optimizer=keras.optimizers.Adam(), loss="mse")
classifier.fit(
tf.constant([e.SerializeToString() for e in data]),
np.array(
[
e.features.feature[example_model.LABEL].float_list.value[:][0]
for e in data
]
),
batch_size=1,
)
tf.saved_model.save(classifier, self._model_dir)
eval_config = text_format.Parse(
"""
model_specs {
signature_name: "serving_default"
prediction_key: "predictions" # placeholder
label_key: "toxicity" # placeholder
}
slicing_specs {}
slicing_specs {
feature_keys: ["slice"]
}
metrics_specs {
metrics {
class_name: "ExampleCount"
}
metrics {
class_name: "FairnessIndicators"
}
}
""",
tfma.EvalConfig(),
)
validate_tf_file_path = self._write_tf_records(data)
tfma_eval_result_path = os.path.join(self._model_dir, "tfma_eval_result")
example_model.evaluate_model(
self._model_dir,
validate_tf_file_path,
tfma_eval_result_path,
eval_config,
)
evaluation_results = tfma.load_eval_result(tfma_eval_result_path)
expected_slice_keys = [
(),
(("slice", "slice1"),),
(("slice", "slice2"),),
(("slice", "slice3"),),
]
slice_keys = [slice_key for slice_key, _ in evaluation_results.slicing_metrics]
self.assertEqual(set(expected_slice_keys), set(slice_keys))
# Verify part of the metrics of fairness indicators
metric_values = dict(evaluation_results.slicing_metrics)[
(("slice", "slice1"),)
][""][""]
self.assertEqual(metric_values["example_count"], {"doubleValue": 5.0})
self.assertEqual(
metric_values["fairness_indicators_metrics/false_positive_rate@0.1"],
{"doubleValue": 0.0},
)
self.assertEqual(
metric_values["fairness_indicators_metrics/false_negative_rate@0.1"],
{"doubleValue": 1.0},
)
self.assertEqual(
metric_values["fairness_indicators_metrics/true_positive_rate@0.1"],
{"doubleValue": 0.0},
)
self.assertEqual(
metric_values["fairness_indicators_metrics/true_negative_rate@0.1"],
{"doubleValue": 1.0},
)
================================================
FILE: fairness_indicators/fairness_indicators_metrics.py
================================================
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Fairness Indicators Metrics."""
import collections
from typing import Any, Dict, List, Optional, Sequence
from tensorflow_model_analysis.metrics import (
binary_confusion_matrices,
metric_types,
metric_util,
)
from tensorflow_model_analysis.proto import config_pb2
FAIRNESS_INDICATORS_METRICS_NAME = "fairness_indicators_metrics"
FAIRNESS_INDICATORS_SUB_METRICS = (
"false_positive_rate",
"false_negative_rate",
"true_positive_rate",
"true_negative_rate",
"positive_rate",
"negative_rate",
"false_discovery_rate",
"false_omission_rate",
"precision",
"recall",
)
DEFAULT_THRESHOLDS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
class FairnessIndicators(metric_types.Metric):
"""Fairness indicators metrics."""
def computations_with_logging(self):
"""Add streamz logging for fairness indicators."""
computations_fn = metric_util.merge_per_key_computations(
_fairness_indicators_metrics_at_thresholds
)
def merge_and_log_computations_fn(
eval_config: Optional[config_pb2.EvalConfig] = None,
# A tf metadata schema.
schema: Optional[Any] = None,
model_names: Optional[List[str]] = None,
output_names: Optional[List[str]] = None,
sub_keys: Optional[List[Optional[metric_types.SubKey]]] = None,
aggregation_type: Optional[metric_types.AggregationType] = None,
class_weights: Optional[Dict[int, float]] = None,
example_weighted: bool = False,
query_key: Optional[str] = None,
**kwargs,
):
return computations_fn(
eval_config,
schema,
model_names,
output_names,
sub_keys,
aggregation_type,
class_weights,
example_weighted,
query_key,
**kwargs,
)
return merge_and_log_computations_fn
def __init__(
self,
thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
name: str = FAIRNESS_INDICATORS_METRICS_NAME,
):
"""Initializes fairness indicators metrics.
Args:
----
thresholds: Thresholds to use for fairness metrics.
name: Metric name.
"""
super().__init__(
self.computations_with_logging(), thresholds=thresholds, name=name
)
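def calculate_digits(thresholds):
"""Returns the number of decimal digits used to format thresholds in metric names.
For example, thresholds (0.25, 0.5) yield 2, so 0.5 is rendered as "@0.50";
at least one digit is always used.
"""
digits = [len(str(t)) - 2 for t in thresholds]
return max(max(digits), 1)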
def _fairness_indicators_metrics_at_thresholds(
thresholds: List[float],
name: str = FAIRNESS_INDICATORS_METRICS_NAME,
eval_config: Optional[config_pb2.EvalConfig] = None,
model_name: str = "",
output_name: str = "",
aggregation_type: Optional[metric_types.AggregationType] = None,
sub_key: Optional[metric_types.SubKey] = None,
class_weights: Optional[Dict[int, float]] = None,
example_weighted: bool = False,
) -> metric_types.MetricComputations:
"""Returns computations for fairness metrics at thresholds."""
metric_key_by_name_by_threshold = collections.defaultdict(dict)
keys = []
digits_num = calculate_digits(thresholds)
for t in thresholds:
for m in FAIRNESS_INDICATORS_SUB_METRICS:
key = metric_types.MetricKey(
name="%s/%s@%.*f"
% (
name,
m,
digits_num,
t,
), # e.g. "fairness_indicators_metrics/positive_rate@0.5"
model_name=model_name,
output_name=output_name,
sub_key=sub_key,
example_weighted=example_weighted,
)
keys.append(key)
metric_key_by_name_by_threshold[t][m] = key
# Make sure matrices are calculated.
computations = binary_confusion_matrices.binary_confusion_matrices(
eval_config=eval_config,
model_name=model_name,
output_name=output_name,
sub_key=sub_key,
aggregation_type=aggregation_type,
class_weights=class_weights,
example_weighted=example_weighted,
thresholds=thresholds,
)
confusion_matrices_key = computations[-1].keys[-1]
def result(
metrics: Dict[metric_types.MetricKey, Any],
) -> Dict[metric_types.MetricKey, Any]:
"""Returns fairness metrics values."""
metric = metrics[confusion_matrices_key]
output = {}
for i, threshold in enumerate(thresholds):
num_positives = metric.tp[i] + metric.fn[i]
num_negatives = metric.tn[i] + metric.fp[i]
tpr = metric.tp[i] / (num_positives or float("nan"))
tnr = metric.tn[i] / (num_negatives or float("nan"))
fpr = metric.fp[i] / (num_negatives or float("nan"))
fnr = metric.fn[i] / (num_positives or float("nan"))
pr = (metric.tp[i] + metric.fp[i]) / (
(num_positives + num_negatives) or float("nan")
)
nr = (metric.tn[i] + metric.fn[i]) / (
(num_positives + num_negatives) or float("nan")
)
precision = metric.tp[i] / ((metric.tp[i] + metric.fp[i]) or float("nan"))
recall = metric.tp[i] / ((metric.tp[i] + metric.fn[i]) or float("nan"))
fdr = metric.fp[i] / ((metric.fp[i] + metric.tp[i]) or float("nan"))
fomr = metric.fn[i] / ((metric.fn[i] + metric.tn[i]) or float("nan"))
output[
metric_key_by_name_by_threshold[threshold]["false_positive_rate"]
] = fpr
output[
metric_key_by_name_by_threshold[threshold]["false_negative_rate"]
] = fnr
output[metric_key_by_name_by_threshold[threshold]["true_positive_rate"]] = (
tpr
)
output[metric_key_by_name_by_threshold[threshold]["true_negative_rate"]] = (
tnr
)
output[metric_key_by_name_by_threshold[threshold]["positive_rate"]] = pr
output[metric_key_by_name_by_threshold[threshold]["negative_rate"]] = nr
output[
metric_key_by_name_by_threshold[threshold]["false_discovery_rate"]
] = fdr
output[
metric_key_by_name_by_threshold[threshold]["false_omission_rate"]
] = fomr
output[metric_key_by_name_by_threshold[threshold]["precision"]] = precision
output[metric_key_by_name_by_threshold[threshold]["recall"]] = recall
return output
derived_computation = metric_types.DerivedMetricComputation(
keys=keys, result=result
)
computations.append(derived_computation)
return computations
metric_types.register_metric(FairnessIndicators)
================================================
FILE: fairness_indicators/remediation/__init__.py
================================================
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
================================================
FILE: fairness_indicators/remediation/weight_utils.py
================================================
"""Utilities to suggest weights based on model analysis results."""
from typing import Any, Dict, Mapping
import tensorflow_model_analysis as tfma
def create_percentage_difference_dictionary(
eval_result: tfma.EvalResult, baseline_name: str, metric_name: str
) -> Dict[str, Any]:
"""Creates a dictionary of the % difference between a baseline and other slices.
Args:
----
eval_result: Loaded eval result from running TensorFlow Model Analysis.
baseline_name: Name of the baseline slice, 'Overall' or a specified tuple.
metric_name: Name of the metric on which to perform comparisons.
Returns:
-------
Dictionary mapping slices to percentage difference from the baseline slice.
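Example:
-------
Illustrative values only, not real results: if the "Overall" baseline has a
false_positive_rate of 0.2 and the slice (("race", "Caucasian"),) has 0.1,
the result maps "race" -> "Caucasian" to (0.1 - 0.2) / 0.2 = -0.5. The
overall slice itself is stored under the empty keys, i.e. [""][""] == 0.0.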
"""
baseline_value = get_baseline_value(eval_result, baseline_name, metric_name)
difference = {}
for metrics_tuple in eval_result.slicing_metrics:
slice_key = metrics_tuple[0]
metrics = metrics_tuple[1]
# Concatenate feature name/values for intersectional features.
column = "-".join([elem[0] for elem in slice_key])
feature_val = "-".join([elem[1] for elem in slice_key])
if column not in difference:
difference[column] = {}
difference[column][feature_val] = (
_get_metric_value(metrics, metric_name) - baseline_value
) / baseline_value
return difference
def _get_metric_value(
nested_dict: Mapping[str, Mapping[str, Any]], metric_name: str
) -> float:
"""Returns the value of the named metric from a slice's metrics.
Args:
----
nested_dict: Dictionary of metrics from slice.
metric_name: Value to return from the metric slice.
Returns:
-------
Percentage value of the baseline slice name requested.
Raises:
------
KeyError: If the metric name isn't found in the metrics dictionary or if the
input metrics dictionary is empty.
TypeError: If the metric's value has an unsupported type.
"""
for value in nested_dict.values():
if metric_name in value[""]:
typed_value = value[""][metric_name]
if "doubleValue" in typed_value:
return typed_value["doubleValue"]
if "boundedValue" in typed_value:
return typed_value["boundedValue"]["value"]
raise TypeError("Unsupported value type: %s" % typed_value)
else:
raise KeyError(
"Key %s not found in %s" % (metric_name, list(value[""].keys()))
)
raise KeyError(
"Unable to return a metric value because the dictionary passed is empty."
)
def get_baseline_value(
eval_result: tfma.EvalResult, baseline_name: str, metric_name: str
) -> float:
"""Looks through the evaluation result for the value of the baseline slice.
Args:
----
eval_result: Loaded eval result from running TensorFlow Model Analysis.
baseline_name: Name of the baseline slice: 'Overall' or a slice key tuple
such as (("gender", "female"),).
metric_name: Name of the metric on which to perform comparisons.
Returns:
-------
Value of the named metric in the baseline slice.
Raises:
------
ValueError: If the baseline slice is not found in eval_result.
KeyError: If the metric is not found in the baseline slice.
"""
for metrics_tuple in eval_result.slicing_metrics:
slice_tuple = metrics_tuple[0]
if baseline_name == "Overall" and not slice_tuple:
return _get_metric_value(metrics_tuple[1], metric_name)
if baseline_name == slice_tuple:
return _get_metric_value(metrics_tuple[1], metric_name)
raise ValueError(
"Could not find baseline %s in eval_result: %s" % (baseline_name, eval_result)
)
================================================
FILE: fairness_indicators/remediation/weight_utils_test.py
================================================
"""Tests for fairness_indicators.remediation.weight_utils."""
import collections
import tensorflow.compat.v1 as tf
from fairness_indicators.remediation import weight_utils
EvalResult = collections.namedtuple("EvalResult", ["slicing_metrics"])
class WeightUtilsTest(tf.test.TestCase):
def create_eval_result(self):
return EvalResult(
slicing_metrics=[
(
(),
{
"": {
"": {
"post_export_metrics/negative_rate@0.10": {
"doubleValue": 0.08
},
"accuracy": {"doubleValue": 0.444},
}
}
},
),
(
(("gender", "female"),),
{
"": {
"": {
"post_export_metrics/negative_rate@0.10": {
"doubleValue": 0.09
},
"accuracy": {"doubleValue": 0.333},
}
}
},
),
(
(
("gender", "female"),
("sexual_orientation", "homosexual_gay_or_lesbian"),
),
{
"": {
"": {
"post_export_metrics/negative_rate@0.10": {
"doubleValue": 0.1
},
"accuracy": {"doubleValue": 0.222},
}
}
},
),
]
)
def create_bounded_result(self):
return EvalResult(
slicing_metrics=[
(
(),
{
"": {
"": {
"post_export_metrics/negative_rate@0.10": {
"boundedValue": {
"lowerBound": 0.07,
"upperBound": 0.09,
"value": 0.08,
"methodology": "POISSON_BOOTSTRAP",
}
},
"accuracy": {
"boundedValue": {
"lowerBound": 0.07,
"upperBound": 0.09,
"value": 0.444,
"methodology": "POISSON_BOOTSTRAP",
}
},
}
}
},
),
(
(("gender", "female"),),
{
"": {
"": {
"post_export_metrics/negative_rate@0.10": {
"boundedValue": {
"lowerBound": 0.07,
"upperBound": 0.09,
"value": 0.09,
"methodology": "POISSON_BOOTSTRAP",
}
},
"accuracy": {
"boundedValue": {
"lowerBound": 0.07,
"upperBound": 0.09,
"value": 0.333,
"methodology": "POISSON_BOOTSTRAP",
}
},
}
}
},
),
(
(
("gender", "female"),
("sexual_orientation", "homosexual_gay_or_lesbian"),
),
{
"": {
"": {
"post_export_metrics/negative_rate@0.10": {
"boundedValue": {
"lowerBound": 0.07,
"upperBound": 0.09,
"value": 0.1,
"methodology": "POISSON_BOOTSTRAP",
}
},
"accuracy": {
"boundedValue": {
"lowerBound": 0.07,
"upperBound": 0.09,
"value": 0.222,
"methodology": "POISSON_BOOTSTRAP",
}
},
}
}
},
),
]
)
def test_baseline(self):
test_eval_result = self.create_eval_result()
self.assertEqual(
0.08,
weight_utils.get_baseline_value(
test_eval_result, "Overall", "post_export_metrics/negative_rate@0.10"
),
)
self.assertEqual(
0.09,
weight_utils.get_baseline_value(
test_eval_result,
(("gender", "female"),),
"post_export_metrics/negative_rate@0.10",
),
)
# Test 'accuracy'.
self.assertEqual(
0.444,
weight_utils.get_baseline_value(test_eval_result, "Overall", "accuracy"),
)
# Test intersectional metrics.
self.assertEqual(
0.222,
weight_utils.get_baseline_value(
test_eval_result,
(
("gender", "female"),
("sexual_orientation", "homosexual_gay_or_lesbian"),
),
"accuracy",
),
)
with self.assertRaises(ValueError):
# Test slice not found.
weight_utils.get_baseline_value(
test_eval_result, (("nonexistent", "slice"),), "accuracy"
)
with self.assertRaises(KeyError):
# Test metric not found.
weight_utils.get_baseline_value(
test_eval_result, (("gender", "female"),), "nonexistent_metric"
)
def test_get_metric_value_raise_key_error(self):
input_dict = {"": {"": {"accuracy": 0.1}}}
metric_name = "nonexistent_metric"
with self.assertRaises(KeyError):
weight_utils._get_metric_value(input_dict, metric_name)
def test_get_metric_value_raise_unsupported_value(self):
input_dict = {"": {"": {"accuracy": {"boundedValue": {1}}}}}
metric_name = "accuracy"
with self.assertRaises(TypeError):
weight_utils._get_metric_value(input_dict, metric_name)
def test_get_metric_value_raise_empty_dict(self):
with self.assertRaises(KeyError):
weight_utils._get_metric_value({}, "metric_name")
def test_create_difference_dictionary(self):
test_eval_result = self.create_eval_result()
res = weight_utils.create_percentage_difference_dictionary(
test_eval_result, "Overall", "post_export_metrics/negative_rate@0.10"
)
self.assertEqual(3, len(res))
self.assertIn("gender-sexual_orientation", res)
self.assertIn("gender", res)
self.assertAlmostEqual(res["gender"]["female"], 0.125)
self.assertAlmostEqual(res[""][""], 0)
def test_create_difference_dictionary_baseline(self):
test_eval_result = self.create_eval_result()
res = weight_utils.create_percentage_difference_dictionary(
test_eval_result,
(("gender", "female"),),
"post_export_metrics/negative_rate@0.10",
)
self.assertEqual(3, len(res))
self.assertIn("gender-sexual_orientation", res)
self.assertIn("gender", res)
self.assertAlmostEqual(res["gender"]["female"], 0)
self.assertAlmostEqual(res[""][""], -0.11111111)
def test_create_difference_dictionary_bounded_metrics(self):
test_eval_result = self.create_bounded_result()
res = weight_utils.create_percentage_difference_dictionary(
test_eval_result, "Overall", "post_export_metrics/negative_rate@0.10"
)
self.assertEqual(3, len(res))
self.assertIn("gender-sexual_orientation", res)
self.assertIn("gender", res)
self.assertAlmostEqual(res["gender"]["female"], 0.125)
self.assertAlmostEqual(res[""][""], 0)
================================================
FILE: fairness_indicators/test_cases/dlvm/fairness_indicators_dlvm_test_case.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Tce3stUlHN0L"
},
"source": [
"##### Copyright 2020 The TensorFlow Authors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "tuOe1ymfHZPu"
},
"outputs": [],
"source": [
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "aalPefrUUplk"
},
"source": [
"# Fairness Indicators DLVM Test Case"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u33JXdluZ2lG"
},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B8dlyTyiTe-9"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"print('TF version: {}'.format(tf.__version__))\n",
"\n",
"import tensorflow_model_analysis as tfma\n",
"print('TFMA version: {}'.format(tfma.__version__))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HG4ww5SwVUaq"
},
"outputs": [],
"source": [
"# Download the tar file from GCP and extract it\n",
"import io, os, tempfile\n",
"TAR_NAME = 'saved_models-2.2'\n",
"BASE_DIR = tempfile.mkdtemp()\n",
"DATA_DIR = os.path.join(BASE_DIR, TAR_NAME, 'data')\n",
"MODELS_DIR = os.path.join(BASE_DIR, TAR_NAME, 'models')\n",
"SCHEMA = os.path.join(BASE_DIR, TAR_NAME, 'schema.pbtxt')\n",
"OUTPUT_DIR = os.path.join(BASE_DIR, 'output')\n",
"\n",
"!curl -O https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/{TAR_NAME}.tar\n",
"!tar xf {TAR_NAME}.tar\n",
"!mv {TAR_NAME} {BASE_DIR}\n",
"!rm {TAR_NAME}.tar\n",
"\n",
"print(\"Here's what we downloaded:\")\n",
"!ls -R {BASE_DIR}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "h8i1NGecVZv1"
},
"outputs": [],
"source": [
"from google.protobuf import text_format\n",
"from tensorflow.python.lib.io import file_io\n",
"from tensorflow_metadata.proto.v0 import schema_pb2\n",
"from tensorflow.core.example import example_pb2\n",
"\n",
"schema = schema_pb2.Schema()\n",
"contents = file_io.read_file_to_string(SCHEMA)\n",
"schema = text_format.Parse(contents, schema)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BPg2wEx_Vk3o"
},
"outputs": [],
"source": [
"import csv\n",
"\n",
"datafile = os.path.join(DATA_DIR, 'eval', 'data.csv')\n",
"reader = csv.DictReader(open(datafile, 'r'))\n",
"examples = []\n",
"for line in reader:\n",
" example = example_pb2.Example()\n",
" for feature in schema.feature:\n",
" key = feature.name\n",
" if feature.type == schema_pb2.FLOAT:\n",
" example.features.feature[key].float_list.value[:] = (\n",
" [float(line[key])] if len(line[key]) \u003e 0 else [])\n",
" elif feature.type == schema_pb2.INT:\n",
" example.features.feature[key].int64_list.value[:] = (\n",
" [int(line[key])] if len(line[key]) \u003e 0 else [])\n",
" elif feature.type == schema_pb2.BYTES:\n",
" example.features.feature[key].bytes_list.value[:] = (\n",
" [line[key].encode('utf8')] if len(line[key]) \u003e 0 else [])\n",
" # Add a new column 'big_tipper' that indicates whether tips were \u003e 20% of the fare.\n",
" # TODO(b/157064428): Remove after label transformation is supported for Keras.\n",
" big_tipper = float(line['tips']) \u003e float(line['fare']) * 0.2\n",
" example.features.feature['big_tipper'].float_list.value[:] = [big_tipper]\n",
" examples.append(example)\n",
"\n",
"tfrecord_file = os.path.join(BASE_DIR, 'train_data.rio')\n",
"with tf.io.TFRecordWriter(tfrecord_file) as writer:\n",
" for example in examples:\n",
" writer.write(example.SerializeToString())\n",
"\n",
"!ls {tfrecord_file}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "B8AQqw20YAB9"
},
"source": [
"## Run Fairness Indicators and TFMA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RhN80nIvVn49"
},
"outputs": [],
"source": [
"# Setup tfma.EvalConfig settings\n",
"keras_eval_config = text_format.Parse(\"\"\"\n",
" ## Model information\n",
" model_specs {\n",
" # For keras (and serving models) we need to add a `label_key`.\n",
" label_key: \"big_tipper\"\n",
" }\n",
"\n",
" ## Post training metric information. These will be merged with any built-in\n",
" ## metrics from training.\n",
" metrics_specs {\n",
" metrics { class_name: \"ExampleCount\" }\n",
" metrics { class_name: \"BinaryAccuracy\" }\n",
" metrics { class_name: \"AUC\" }\n",
" metrics { class_name: \"MeanLabel\" }\n",
" metrics { class_name: \"MeanPrediction\" }\n",
" metrics {\n",
" class_name: \"FairnessIndicators\"\n",
" config: '{ \"thresholds\": [0.3, 0.5, 0.7] }'\n",
" }\n",
" }\n",
"\n",
" ## Slicing information\n",
" slicing_specs {} # overall slice\n",
" slicing_specs {\n",
" feature_keys: [\"trip_start_hour\"]\n",
" }\n",
" slicing_specs {\n",
" feature_keys: [\"trip_start_day\"]\n",
" }\n",
" slicing_specs {\n",
" feature_values: {\n",
" key: \"trip_start_month\"\n",
" value: \"1\"\n",
" }\n",
" }\n",
" slicing_specs {\n",
" feature_keys: [\"trip_start_hour\", \"trip_start_day\"]\n",
" }\n",
"\"\"\", tfma.EvalConfig())\n",
"\n",
"# Create a tfma.EvalSharedModel that points at our keras model.\n",
"keras_model_path = os.path.join(MODELS_DIR, 'keras', '2')\n",
"keras_eval_shared_model = tfma.default_eval_shared_model(\n",
" eval_saved_model_path=keras_model_path,\n",
" eval_config=keras_eval_config)\n",
"\n",
"keras_output_path = os.path.join(OUTPUT_DIR, 'keras')\n",
"\n",
"# Run TFMA\n",
"keras_eval_result = tfma.run_model_analysis(\n",
" eval_shared_model=keras_eval_shared_model,\n",
" eval_config=keras_eval_config,\n",
" data_location=tfrecord_file,\n",
" output_path=keras_output_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ktlASJQIzE3l"
},
"source": [
"## Render Fairness Indicators"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ul0Ud9TVWB_b"
},
"outputs": [],
"source": [
"tfma.addons.fairness.view.widget_view.render_fairness_indicator(eval_result=keras_eval_result)"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"name": "Fairness Indicators DLVM Test Case.ipynb",
"private_outputs": true,
"provenance": [
],
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
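The notebook converts each CSV row into a `tf.Example`, turning empty cells into empty feature lists and deriving the `big_tipper` label from `tips` and `fare`. The plain-Python sketch below isolates just those two rules so they can be checked without TensorFlow; the helper names and the two-row sample are hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_float(cell: str) -> List[float]:
    # An empty CSV cell becomes an empty feature list, as in the notebook.
    return [float(cell)] if len(cell) > 0 else []


def big_tipper(row: Dict[str, str]) -> float:
    # 1.0 when the tip is strictly greater than 20% of the fare, else 0.0.
    return float(float(row["tips"]) > float(row["fare"]) * 0.2)


# Hypothetical two-row sample with only the columns these rules need.
sample = io.StringIO("fare,tips,trip_miles\n10.0,3.0,\n10.0,1.0,2.5\n")
rows = list(csv.DictReader(sample))
print([big_tipper(r) for r in rows])  # [1.0, 0.0]
print([parse_float(r["trip_miles"]) for r in rows])  # [[], [2.5]]
```

Unlike the feature columns, the label rule does not guard against empty cells: `float('')` raises `ValueError`, so a row with a missing fare or tip would stop the conversion loop.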
================================================
FILE: fairness_indicators/test_cases/dlvm/fi_test_installed.sh
================================================
#!/bin/bash
#
# Copyright 2021 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A script to test the Fairness Indicators installation in the current
# JupyterLab environment.
#
# Internally, this script is used to test the Fairness Indicators installation
# on DLVM/DL Container images.
# - https://cloud.google.com/deep-learning-vm
# - https://cloud.google.com/ai-platform/deep-learning-containers
#
# The list of the container images can be found in:
# https://cloud.google.com/ai-platform/deep-learning-containers/docs/choosing-container
#
notebook_test() {
FILENAME=$1
OUTPUT_FILENAME="results_${1}"
if ! papermill --no-progress-bar --no-log-output "$FILENAME" "$OUTPUT_FILENAME"; then
echo "Notebook test failed. Unable to run the test using papermill for the file: ${FILENAME}"
exit 1
fi
}
set -ex
PYTHON_BINARY=$(which python)
TENSORFLOW_VERSION=$(${PYTHON_BINARY} -c 'import tensorflow; print(tensorflow.__version__)')
{
FI_VERSION=$(${PYTHON_BINARY} -c 'import fairness_indicators; print(fairness_indicators.__version__)');
} || {
if [[ "${TENSORFLOW_VERSION}" == 2.3.* || "${TENSORFLOW_VERSION}" == 2.4.* ]]; then
echo "ERROR: Fairness Indicators should be installed on TensorFlow ${TENSORFLOW_VERSION}"
exit 1
else
echo "Fairness Indicators is not installed on TensorFlow ${TENSORFLOW_VERSION}"
exit 0
fi
}
if [[ "${FI_VERSION}" != *dev* ]]; then
VERSION_TAG_FLAG="-b v${FI_VERSION} --depth 1"
fi
rm -rf fairness-indicators
# Check FI v0.26.* with TF 2.3.*
if [[ "${TENSORFLOW_VERSION}" == 2.3.* ]]; then
if [[ "${FI_VERSION}" > 0.26.* && "${FI_VERSION}" < 0.27.* ]]; then
# The DLVM test case was first added in v0.27.0, so clone that tag.
git clone -b v0.27.0 --depth 1 https://github.com/tensorflow/fairness-indicators.git
else
echo "ERROR: Fairness Indicators ${FI_VERSION} should not be installed on TensorFlow ${TENSORFLOW_VERSION}."
exit 1
fi
# Check FI v0.27.* with TF 2.4.*
elif [[ "${TENSORFLOW_VERSION}" == 2.4.* ]]; then
if [[ "${FI_VERSION}" > 0.27.* ]]; then
git clone ${VERSION_TAG_FLAG} https://github.com/tensorflow/fairness-indicators.git
else
echo "ERROR: Fairness Indicators ${FI_VERSION} should not be installed on TensorFlow ${TENSORFLOW_VERSION}."
exit 1
fi
else
echo "Fairness Indicators should not be installed on TensorFlow ${TENSORFLOW_VERSION}."
exit 0
fi
cd fairness-indicators/fairness_indicators/test_cases/dlvm/
notebook_test fairness_indicators_dlvm_test_case.ipynb
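The script gates FI versions with `[[ "${FI_VERSION}" > 0.27.* ]]`, which in bash is a lexicographic string comparison (using the locale's collation), not a numeric one, and the right-hand side is not treated as a glob. That happens to work for 0.26.x and 0.27.x, but a string such as "0.3.0" would also compare greater than "0.27.*". The Python sketch below restates the same decision table with numeric version tuples; `parse_version` and `clone_ref` are hypothetical names, and returning "master" for dev builds is an interpretation of cloning without a tag flag.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'0.27.1' -> (0, 27, 1); parsing stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def clone_ref(tf_version: str, fi_version: str) -> Optional[str]:
    """Git ref whose DLVM test case should run, or None to skip the test."""
    tf_minor = parse_version(tf_version)[:2]
    fi = parse_version(fi_version)
    if tf_minor == (2, 3):
        if fi[:2] == (0, 26):
            return "v0.27.0"  # The test case first ships in v0.27.0.
        raise ValueError(f"FI {fi_version} should not be installed on TF {tf_version}")
    if tf_minor == (2, 4):
        if fi[:2] >= (0, 27):
            # Dev builds use the default branch; releases use their own tag.
            return "master" if "dev" in fi_version else f"v{fi_version}"
        raise ValueError(f"FI {fi_version} should not be installed on TF {tf_version}")
    return None  # Other TF versions are not checked.


print(clone_ref("2.4.0", "0.27.1"))  # v0.27.1
```

Comparing integer tuples makes (0, 3) sort below (0, 27), which is the ordering the gating logic intends.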
================================================
FILE: fairness_indicators/tutorial_utils/__init__.py
================================================
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Init file for fairness_indicators.tutorial_utils."""
from fairness_indicators.tutorial_utils.util import (
convert_comments_data,
get_eval_results,
)
================================================
FILE: fairness_indicators/tutorial_utils/util.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Util methods for the example colabs."""
import os
import os.path
import tempfile
import pandas as pd
import tensorflow as tf
import tensorflow_model_analysis as tfma
from google.protobuf import text_format
TEXT_FEATURE = "comment_text"
LABEL = "toxicity"
SEXUAL_ORIENTATION_COLUMNS = [
"heterosexual",
"homosexual_gay_or_lesbian",
"bisexual",
"other_sexual_orientation",
]
GENDER_COLUMNS = ["male", "female", "transgender", "other_gender"]
RELIGION_COLUMNS = [
"christian",
"jewish",
"muslim",
"hindu",
"buddhist",
"atheist",
"other_religion",
]
RACE_COLUMNS = ["black", "white", "asian", "latino", "other_race_or_ethnicity"]
DISABILITY_COLUMNS = [
"physical_disability",
"intellectual_or_learning_disability",
"psychiatric_or_mental_illness",
"other_disability",
]
IDENTITY_COLUMNS = {
"gender": GENDER_COLUMNS,
"sexual_orientation": SEXUAL_ORIENTATION_COLUMNS,
"religion": RELIGION_COLUMNS,
"race": RACE_COLUMNS,
"disability": DISABILITY_COLUMNS,
}
_THRESHOLD = 0.5
def convert_comments_data(input_filename, output_filename=None):
"""Convert the public Civil Comments data.
In the original dataset
https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data
each identity annotation column holds the fraction of raters who thought the
comment referenced that identity. During processing, an identity counts as
present when this fraction is at least 0.5, and the identity terms are grouped
by category. For example, a comment with { male: 0.3, female: 1.0,
transgender: 0.0, heterosexual: 0.8, homosexual_gay_or_lesbian: 1.0 } becomes
{ gender: [female], sexual_orientation: [heterosexual,
homosexual_gay_or_lesbian] }. The toxicity label is binarized with the same
threshold, and rows with empty comment text are dropped.
Args:
----
input_filename: The path to the raw civil comments data, with extension
'tfrecord' or 'csv'.
output_filename: The path to write the processed civil comments data. If
omitted, a file in a new temporary directory is used.
Returns:
-------
The file path to the converted dataset.
Raises:
------
ValueError: If the input_filename does not have a supported extension.
"""
extension = os.path.splitext(input_filename)[1][1:]
if not output_filename:
output_filename = os.path.join(tempfile.mkdtemp(), "output." + extension)
if extension == "tfrecord":
return _convert_comments_data_tfrecord(input_filename, output_filename)
elif extension == "csv":
return _convert_comments_data_csv(input_filename, output_filename)
raise ValueError(
"input_filename must have supported file extension csv or tfrecord, "
f"given: {input_filename}"
)
def _convert_comments_data_tfrecord(input_filename, output_filename=None):
"""Convert the public civil comments data, for tfrecord data."""
with tf.io.TFRecordWriter(output_filename) as writer:
for serialized in tf.data.TFRecordDataset(filenames=[input_filename]):
example = tf.train.Example()
example.ParseFromString(serialized.numpy())
if not example.features.feature[TEXT_FEATURE].bytes_list.value:
continue
new_example = tf.train.Example()
new_example.features.feature[TEXT_FEATURE].bytes_list.value.extend(
example.features.feature[TEXT_FEATURE].bytes_list.value
)
new_example.features.feature[LABEL].float_list.value.append(
1
if example.features.feature[LABEL].float_list.value[0] >= _THRESHOLD
else 0
)
for identity_category, identity_list in IDENTITY_COLUMNS.items():
grouped_identity = []
for identity in identity_list:
if (
example.features.feature[identity].float_list.value
and example.features.feature[identity].float_list.value[0]
>= _THRESHOLD
):
grouped_identity.append(identity.encode())
new_example.features.feature[identity_category].bytes_list.value.extend(
grouped_identity
)
writer.write(new_example.SerializeToString())
return output_filename
def _convert_comments_data_csv(input_filename, output_filename=None):
"""Convert the public civil comments data, for csv data."""
df = pd.read_csv(input_filename)
# Filter out rows with empty comment text values.
df = df[df[TEXT_FEATURE].ne("")]
df = df[df[TEXT_FEATURE].notnull()]
new_df = pd.DataFrame()
new_df[TEXT_FEATURE] = df[TEXT_FEATURE]
# Binarize the label: 1 if toxicity >= _THRESHOLD, else 0.
new_df[LABEL] = df[LABEL].ge(_THRESHOLD).astype(int)
# Extract the list of all identity terms that meet or exceed the threshold.
def identity_conditions(df, identity_list):
group = []
for identity in identity_list:
if df[identity] >= _THRESHOLD:
group.append(identity)
return group
for identity_category, identity_list in IDENTITY_COLUMNS.items():
new_df[identity_category] = df.apply(
identity_conditions, args=((identity_list),), axis=1
)
new_df.to_csv(
output_filename,
header=[TEXT_FEATURE, LABEL, *IDENTITY_COLUMNS.keys()],
index=False,
)
return output_filename
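Both converters apply the same two rules: binarize the toxicity label at 0.5 and keep, per category, the identity terms whose rater fraction is at least 0.5. The sketch below reproduces those rules on a plain dictionary, using the docstring's own example. Only two of the five categories in `IDENTITY_COLUMNS` are copied, and `group_identities` and `binarize_label` are hypothetical helpers, not functions in `util.py`.

```python
from typing import Dict, List, Mapping

THRESHOLD = 0.5  # Same cutoff as _THRESHOLD in util.py.

# Two of the five categories from IDENTITY_COLUMNS, copied for brevity.
IDENTITY_COLUMNS = {
    "gender": ["male", "female", "transgender", "other_gender"],
    "sexual_orientation": [
        "heterosexual",
        "homosexual_gay_or_lesbian",
        "bisexual",
        "other_sexual_orientation",
    ],
}


def group_identities(scores: Mapping[str, float]) -> Dict[str, List[str]]:
    """Keeps terms whose rater fraction is >= THRESHOLD, grouped by category.

    Missing terms count as absent, as in the tfrecord converter.
    """
    return {
        category: [term for term in terms if scores.get(term, 0.0) >= THRESHOLD]
        for category, terms in IDENTITY_COLUMNS.items()
    }


def binarize_label(toxicity: float) -> int:
    return 1 if toxicity >= THRESHOLD else 0


comment = {
    "male": 0.3,
    "female": 1.0,
    "transgender": 0.0,
    "heterosexual": 0.8,
    "homosexual_gay_or_lesbian": 1.0,
}
print(group_identities(comment))
# {'gender': ['female'], 'sexual_orientation': ['heterosexual', 'homosexual_gay_or_lesbian']}
```

The CSV converter, by contrast, indexes every identity column directly, so an input CSV missing one of those columns would raise a `KeyError` rather than treating the term as absent.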
def get_eval_results(
model_location,
eval_result_path,
validate_tfrecord_file,
slice_selection="religion",
thresholds=None,
compute_confidence_intervals=True,
):
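"""Runs TFMA with Fairness Indicators metrics and returns the eval result.
Args:
----
model_location: Path to the SavedModel to evaluate.
eval_result_path: Directory where TFMA writes its output.
validate_tfrecord_file: TFRecord file of examples to evaluate on.
slice_selection: Feature to slice on, in addition to the overall slice.
thresholds: Decision thresholds for Fairness Indicators; defaults to nine
values between 0.4 and 0.5.
compute_confidence_intervals: Whether TFMA computes confidence intervals.
Returns:
-------
The tfma.EvalResult returned by tfma.run_model_analysis.
"""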
if thresholds is None:
thresholds = [0.4, 0.4125, 0.425, 0.4375, 0.45, 0.4675, 0.475, 0.4875, 0.5]
# Define the metrics and slices for the evaluation.
eval_config = text_format.Parse(
"""
model_specs {
label_key: '%s'
}
metrics_specs {
metrics {class_name: "AUC"}
metrics {class_name: "ExampleCount"}
metrics {class_name: "Accuracy"}
metrics {
class_name: "FairnessIndicators"
config: '{"thresholds": %s}'
}
}
slicing_specs {
feature_keys: '%s'
}
slicing_specs {}
options {
compute_confidence_intervals { value: %s }
disabled_outputs{values: "analysis"}
}
"""
% (
LABEL,
thresholds,
slice_selection,
"true" if compute_confidence_intervals else "false",
),
tfma.EvalConfig(),
)
eval_shared_model = tfma.default_eval_shared_model(
eval_saved_model_path=model_location, tags=[tf.saved_model.SERVING]
)
# Run the fairness evaluation.
return tfma.run_model_analysis(
eval_shared_model=eval_shared_model,
data_location=validate_tfrecord_file,
file_format="tfrecords",
eval_config=eval_config,
output_path=eval_result_path,
extractors=None,
)
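`get_eval_results` splices `thresholds` into the Fairness Indicators `config` string with `%s`, which relies on `str(list)` of plain Python floats happening to be valid JSON. The sketch below contrasts that with a `json.dumps` variant; both helper names are hypothetical. Values whose `repr` is not a bare number (for example `Decimal`, or NumPy 2 scalars, which print as `np.float64(0.4)`) would silently produce an invalid config under `%s`.

```python
import json
from decimal import Decimal
from typing import Sequence


def thresholds_config_percent(thresholds: Sequence[float]) -> str:
    # Same %-formatting as get_eval_results: relies on str(list) looking like JSON.
    return '{"thresholds": %s}' % (list(thresholds),)


def thresholds_config_json(thresholds: Sequence[float]) -> str:
    # Hypothetical alternative: coerce to float, then let json.dumps serialize.
    return json.dumps({"thresholds": [float(t) for t in thresholds]})


print(thresholds_config_percent([0.4, 0.45, 0.5]))  # {"thresholds": [0.4, 0.45, 0.5]}
print(thresholds_config_percent([Decimal("0.5")]))  # {"thresholds": [Decimal('0.5')]}
print(thresholds_config_json([Decimal("0.5")]))  # {"thresholds": [0.5]}
```

For the plain float lists used in the tutorials, both helpers produce the same string, so switching to `json.dumps` would be a robustness change rather than a behavioral one.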
================================================
FILE: fairness_indicators/tutorial_utils/util_test.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for fairness_indicators.tutorial_utils.util."""
import csv
import os
import tempfile
from unittest import mock
import pandas as pd
import tensorflow as tf
import tensorflow_model_analysis as tfma
from google.protobuf import text_format
from fairness_indicators.tutorial_utils import util
class UtilTest(tf.test.TestCase):
def _create_example_tfrecord(self):
example = text_format.Parse(
"""
features {
feature { key: "comment_text"
value { bytes_list { value: [ "comment 1" ] }}
}
feature { key: "toxicity" value { float_list { value: [ 0.1 ] }}}
feature { key: "heterosexual" value { float_list { value: [ 0.1 ] }}}
feature { key: "homosexual_gay_or_lesbian"
value { float_list { value: [ 0.1 ] }}
}
feature { key: "bisexual" value { float_list { value: [ 0.5 ] }}}
feature { key: "other_sexual_orientation"
value { float_list { value: [ 0.1 ] }}
}
feature { key: "male" value { float_list { value: [ 0.1 ] }}}
feature { key: "female" value { float_list { value: [ 0.2 ] }}}
feature { key: "transgender" value { float_list { value: [ 0.3 ] }}}
feature { key: "other_gender" value { float_list { value: [ 0.4 ] }}}
feature { key: "christian" value { float_list { value: [ 0.0 ] }}}
feature { key: "jewish" value { float_list { value: [ 0.1 ] }}}
feature { key: "muslim" value { float_list { value: [ 0.2 ] }}}
feature { key: "hindu" value { float_list { value: [ 0.3 ] }}}
feature { key: "buddhist" value { float_list { value: [ 0.4 ] }}}
feature { key: "atheist" value { float_list { value: [ 0.5 ] }}}
feature { key: "other_religion"
value { float_list { value: [ 0.6 ] }}
}
feature { key: "black" value { float_list { value: [ 0.1 ] }}}
feature { key: "white" value { float_list { value: [ 0.2 ] }}}
feature { key: "asian" value { float_list { value: [ 0.3 ] }}}
feature { key: "latino" value { float_list { value: [ 0.4 ] }}}
feature { key: "other_race_or_ethnicity"
value { float_list { value: [ 0.5 ] }}
}
feature { key: "physical_disability"
value { float_list { value: [ 0.6 ] }}
}
feature { key: "intellectual_or_learning_disability"
value { float_list { value: [ 0.7 ] }}
}
feature { key: "psychiatric_or_mental_illness"
value { float_list { value: [ 0.8 ] }}
}
feature { key: "other_disability"
value { float_list { value: [ 1.0 ] }}
}
}
""",
tf.train.Example(),
)
empty_comment_example = text_format.Parse(
"""
features {
feature { key: "comment_text"
value { bytes_list {} }
}
feature { key: "toxicity" value { float_list { value: [ 0.1 ] }}}
}
""",
tf.train.Example(),
)
return [example, empty_comment_example]
def _write_tf_records(self, examples):
filename = os.path.join(tempfile.mkdtemp(), "input.tfrecord")
with tf.io.TFRecordWriter(filename) as writer:
for e in examples:
writer.write(e.SerializeToString())
return filename
def test_convert_data_tfrecord(self):
input_file = self._write_tf_records(self._create_example_tfrecord())
output_file = util.convert_comments_data(input_file)
output_example_list = []
for serialized in tf.data.TFRecordDataset(filenames=[output_file]):
output_example = tf.train.Example()
output_example.ParseFromString(serialized.numpy())
output_example_list.append(output_example)
self.assertEqual(len(output_example_list), 1)
self.assertEqual(
output_example_list[0],
text_format.Parse(
"""
features {
feature { key: "comment_text"
value { bytes_list {value: [ "comment 1" ] }}
}
feature { key: "toxicity" value { float_list { value: [ 0.0 ] }}}
feature { key: "sexual_orientation"
value { bytes_list { value: ["bisexual"] }}
}
feature { key: "gender" value { bytes_list { }}}
feature { key: "race"
value { bytes_list { value: [ "other_race_or_ethnicity" ] }}
}
feature { key: "religion"
value { bytes_list {
value: [ "atheist", "other_religion" ] }
}
}
feature { key: "disability" value { bytes_list {
value: [
"physical_disability",
"intellectual_or_learning_disability",
"psychiatric_or_mental_illness",
"other_disability"] }}
}
}
""",
tf.train.Example(),
),
)
def _create_example_csv(self, use_fake_embedding=False):
header = [
"comment_text",
"toxicity",
"heterosexual",
"homosexual_gay_or_lesbian",
"bisexual",
"other_sexual_orientation",
"male",
"female",
"transgender",
"other_gender",
"christian",
"jewish",
"muslim",
"hindu",
"buddhist",
"atheist",
"other_religion",
"black",
"white",
"asian",
"latino",
"other_race_or_ethnicity",
"physical_disability",
"intellectual_or_learning_disability",
"psychiatric_or_mental_illness",
"other_disability",
]
example = [
"comment 1" if not use_fake_embedding else 0.35,
0.1,
# sexual orientation
0.1,
0.1,
0.5,
0.1,
# gender
0.1,
0.2,
0.3,
0.4,
# religion
0.0,
0.1,
0.2,
0.3,
0.4,
0.5,
0.6,
# race or ethnicity
0.1,
0.2,
0.3,
0.4,
0.5,
# disability
0.6,
0.7,
0.8,
1.0,
]
empty_comment_example = [
"" if not use_fake_embedding else 0.35,
0.1,
0.1,
0.1,
0.5,
0.1,
0.1,
0.2,
0.3,
0.4,
0.0,
0.1,
0.2,
0.3,
0.4,
0.5,
0.6,
0.1,
0.2,
0.3,
0.4,
0.5,
0.6,
0.7,
0.8,
1.0,
]
return [header, example, empty_comment_example]
def _write_csv(self, examples):
filename = os.path.join(tempfile.mkdtemp(), "input.csv")
with open(filename, "w", newline="") as csvfile:
csvwriter = csv.writer(csvfile, delimiter=",")
for example in examples:
csvwriter.writerow(example)
return filename
def test_convert_data_csv(self):
input_file = self._write_csv(self._create_example_csv())
output_file = util.convert_comments_data(input_file)
# to_csv writes each identity list via str(), e.g. "['bisexual']"; strip the quotes.
df = pd.read_csv(output_file).replace("'", "", regex=True)
expected_df = pd.DataFrame()
expected_df = pd.concat(
[
expected_df,
pd.DataFrame.from_dict(
{
"comment_text": ["comment 1"],
"toxicity": [0.0],
"gender": [[]],
"sexual_orientation": [["bisexual"]],
"race": [["other_race_or_ethnicity"]],
"religion": [["atheist", "other_religion"]],
"disability": [
[
"physical_disability",
"intellectual_or_learning_disability",
"psychiatric_or_mental_illness",
"other_disability",
]
],
}
),
],
ignore_index=True,
)
self.assertEqual(
df.reset_index(drop=True, inplace=True),
expected_df.reset_index(drop=True, inplace=True),
)
# TODO(b/172260507): we should also look into testing the e2e call with tfma.
@mock.patch("tensorflow_model_analysis.default_eval_shared_model", autospec=True)
@mock.patch("tensorflow_model_analysis.run_model_analysis", autospec=True)
def test_get_eval_results_called_correctly(
self, mock_run_model_analysis, mock_shared_model
):
mock_model = "model"
mock_shared_model.return_value = mock_model
model_location = "saved_model"
eval_results_path = "eval_results"
data_file = "data"
util.get_eval_results(model_location, eval_results_path, data_file)
mock_shared_model.assert_called_once_with(
eval_saved_model_path=model_location, tags=[tf.saved_model.SERVING]
)
expected_eval_config = text_format.Parse(
"""
model_specs {
label_key: 'toxicity'
}
metrics_specs {
metrics {class_name: "AUC"}
metrics {class_name: "ExampleCount"}
metrics {class_name: "Accuracy"}
metrics {
class_name: "FairnessIndicators"
config: '{"thresholds": [0.4, 0.4125, 0.425, 0.4375, 0.45, 0.4675, 0.475, 0.4875, 0.5]}'
}
}
slicing_specs {
feature_keys: 'religion'
}
slicing_specs {}
options {
compute_confidence_intervals { value: true }
disabled_outputs{values: "analysis"}
}
""",
tfma.EvalConfig(),
)
mock_run_model_analysis.assert_called_once_with(
eval_shared_model=mock_model,
data_location=data_file,
file_format="tfrecords",
eval_config=expected_eval_config,
output_path=eval_results_path,
extractors=None,
)
================================================
FILE: fairness_indicators/version.py
================================================
# Copyright 2019 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains the version string of Fairness Indicators."""
# Note that setup.py uses this version.
__version__ = "0.49.0.dev"
================================================
FILE: mkdocs.yml
================================================
site_name: Fairness Indicators
repo_name: "fairness-indicators"
repo_url: https://github.com/tensorflow/fairness-indicators
theme:
name: material
logo: images/tf_full_color_primary_icon.svg
palette:
# Palette toggle for automatic mode
- media: "(prefers-color-scheme)"
primary: custom
accent: custom
toggle:
icon: material/brightness-auto
name: Switch to light mode
# Palette toggle for light mode
- media: "(prefers-color-scheme: light)"
primary: custom
accent: custom
scheme: default
toggle:
icon: material/brightness-7
name: Switch to dark mode
# Palette toggle for dark mode
- media: "(prefers-color-scheme: dark)"
primary: custom
accent: custom
scheme: slate
toggle:
icon: material/brightness-4
name: Switch to system preference
favicon: images/tf_full_color_primary_icon.svg
features:
- content.code.copy
- content.code.select
- content.action.edit
extra_css:
- stylesheets/extra.css
extra_javascript:
- javascripts/mathjax.js
- https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
plugins:
- mkdocs-jupyter:
execute: false
markdown_extensions:
- admonition
- attr_list
- def_list
- tables
- toc:
permalink: true
- pymdownx.highlight:
anchor_linenums: true
linenums: false
line_spans: __span
pygments_lang_class: true
- pymdownx.inlinehilite
- pymdownx.snippets
- pymdownx.superfences
- pymdownx.arithmatex:
generic: true
- pymdownx.critic
- pymdownx.caret
- pymdownx.keys
- pymdownx.mark
- pymdownx.tilde
- pymdownx.blocks.html
- md_in_html
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
nav:
- "Overview": index.md
- "Thinking about Fairness Evaluation": guide/guidance.md
- "Introduction to Fairness Indicators": tutorials/Fairness_Indicators_Example_Colab.ipynb
- "Evaluate fairness using TF-Hub models": tutorials/Fairness_Indicators_on_TF_Hub_Text_Embeddings.ipynb
- "Visualize with TensorBoard Plugin": tutorials/Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb
- "Evaluate toxicity in Wiki comments": tutorials/Fairness_Indicators_TFCO_Wiki_Case_Study.ipynb
- "TensorFlow constrained optimization example": tutorials/Fairness_Indicators_TFCO_CelebA_Case_Study.ipynb
- "Pandas DataFrame case study": tutorials/Fairness_Indicators_Pandas_Case_Study.ipynb
- "Face SSD example Colab": tutorials/Facessd_Fairness_Indicators_Example_Colab.ipynb
================================================
FILE: pyproject.toml
================================================
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
[build-system]
requires = [
"setuptools",
"wheel",
]
[tool.ruff]
line-length = 88
[tool.ruff.lint]
select = [
# pycodestyle
"E",
"W",
# Pyflakes
"F",
# pyupgrade
"UP",
# flake8-bugbear
"B",
# flake8-simplify
"SIM",
# isort
"I",
# pep8 naming
"N",
# pydocstyle
"D",
# annotations
"ANN",
# debugger
"T10",
# flake8-pytest
"PT",
# flake8-return
"RET",
# flake8-unused-arguments
"ARG",
# flake8-fixme
"FIX",
# flake8-eradicate
"ERA",
# pandas-vet
"PD",
# numpy-specific rules
"NPY",
]
ignore = [
"D104", # Missing docstring in public package
"D100", # Missing docstring in public module
"D211", # No blank line before class
"PD901", # Avoid using 'df' for pandas dataframes. Perfectly fine in functions with limited scope
"ANN201", # Missing return type annotation for public function (makes no sense for NoneType return types...)
"ANN101", # Missing type annotation for `self`
"ANN204", # Missing return type annotation for special method
"ANN002", # Missing type annotation for `*args`
"ANN003", # Missing type annotation for `**kwargs`
"D105", # Missing docstring in magic method
"D203", # 1 blank line required before class docstring
"D204", # 1 blank line required after class docstring
"D413", # 1 blank line after parameters
"SIM108", # Simplify if/else to one line; not always clearer
"D206", # Docstrings should be indented with spaces; unnecessary when running ruff-format
"E501", # Line length too long; unnecessary when running ruff-format
"W191", # Indentation contains tabs; unnecessary when running ruff-format
# REMOVE THESE AS FIXED
"ANN001", # Missing type annotation for function argument
"ANN202", # Missing return type annotation for private function
"ANN401", # Dynamically typed expressions (typing.Any) are disallowed
"ARG001", # Unused function argument
"ARG002", # Unused method argument
"B018", # Found useless expression
"D101", # Missing docstring in public class
"D102", # Missing docstring in public method
"D103", # Missing docstring in public function
"D107", # Missing docstring in `__init__`
"D401", # First line of docstring should be in imperative mood
"ERA001", # Found commented-out code
"FIX002", # Line contains TODO
"N802", # Function name should be lowercase
"PD002", # `inplace=True` should be avoided
"PD004", # `.notna` is preferred to `.notnull`
"PT009", # Use a regular `assert` instead of unittest-style
"PT027", # Use `pytest.raises` instead of unittest-style `assertRaises`
"RET505", # Unnecessary `elif` after `return` statement
"RET506", # Unnecessary `else` after `raise` statement
"SIM105", # Use `contextlib.suppress` instead of `try`-`except`-`pass`
"UP008", # Use `super()` instead of `super(__class__, self)`
"UP031", # Use format specifiers instead of percent format
]
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["F401"]
[tool.pytest.ini_options]
addopts = "--import-mode=importlib"
testpaths = ["fairness_indicators"]
python_files = ["*_test.py"]
================================================
FILE: requirements-docs.txt
================================================
mkdocs
mkdocs-material
mkdocs-jupyter
================================================
FILE: setup.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Setup to install Fairness Indicators."""
import os
import sys
from pathlib import Path
import setuptools
if sys.version_info >= (3, 11):
sys.exit("Sorry, Python >= 3.11 is not supported")
def select_constraint(default, nightly=None, git_master=None):
"""Select dependency constraint based on TFX_DEPENDENCY_SELECTOR env var."""
selector = os.environ.get("TFX_DEPENDENCY_SELECTOR")
if selector == "UNCONSTRAINED":
return ""
elif selector == "NIGHTLY" and nightly is not None:
return nightly
elif selector == "GIT_MASTER" and git_master is not None:
return git_master
else:
return default
REQUIRED_PACKAGES = [
"tensorflow>=2.17,<2.18",
"tensorflow-hub>=0.16.1,<1.0.0",
"tensorflow-data-validation>=1.17.0,<2.0.0",
"tensorflow-model-analysis>=0.48.0,<0.49.0",
"witwidget>=1.4.4,<2",
"protobuf>=4.21.6,<6.0.0",
]
TEST_PACKAGES = [
"pytest>=8.3.0,<9",
]
with open(Path("./requirements-docs.txt").expanduser().absolute()) as f:
DOCS_PACKAGES = [req.strip() for req in f.readlines()]
# Get version from version module.
with open("fairness_indicators/version.py") as fp:
globals_dict = {}
exec(fp.read(), globals_dict) # pylint: disable=exec-used
__version__ = globals_dict["__version__"]
with open("README.md", encoding="utf-8") as fh:
long_description = fh.read()
setuptools.setup(
name="fairness_indicators",
version=__version__,
description="Fairness Indicators",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/tensorflow/fairness-indicators",
author="Google LLC",
author_email="packages@tensorflow.org",
packages=setuptools.find_packages(exclude=["tensorboard_plugin"]),
package_data={
"fairness_indicators": ["documentation/*"],
},
python_requires=">=3.9,<4",
install_requires=REQUIRED_PACKAGES,
tests_require=REQUIRED_PACKAGES,
extras_require={"docs": DOCS_PACKAGES, "test": TEST_PACKAGES, "dev": ["pre-commit"]},
# PyPI package information.
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Intended Audience :: Education",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3 :: Only",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Mathematics",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Software Development",
"Topic :: Software Development :: Libraries",
"Topic :: Software Development :: Libraries :: Python Modules",
],
license="Apache 2.0",
keywords=(
"tensorflow model analysis fairness indicators tensorboard machine" " learning"
),
)
================================================
FILE: tensorboard_plugin/README.md
================================================
# Evaluating Models with the Fairness Indicators Dashboard [Beta]

Fairness Indicators for TensorBoard enables easy computation of
commonly identified fairness metrics for _binary_ and _multiclass_ classifiers.
With the plugin, you can visualize fairness evaluations for your runs and easily
compare performance across groups.
In particular, Fairness Indicators for TensorBoard allows you to evaluate and
visualize model performance, sliced across defined groups of users. Feel
confident about your results with confidence intervals and evaluations at
multiple thresholds.
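
The idea of slicing by group and sweeping several decision thresholds can be illustrated in plain Python. The sketch below is a simplified illustration under stated assumptions, not how Fairness Indicators or TFMA compute their metrics: it assumes binary labels, exactly one group per example, and treats a score at or above the threshold as a positive prediction. `rates_by_slice` is a hypothetical helper and the example data is made up.

```python
from collections import defaultdict


def rates_by_slice(examples, thresholds):
    """Return {(group, threshold): {"fpr": ..., "fnr": ...}}.

    `examples` is an iterable of (group, label, score) tuples with label in {0, 1}.
    Assumption: a prediction is positive when score >= threshold.
    """
    # One confusion matrix per (group, threshold) pair.
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, label, score in examples:
        for t in thresholds:
            predicted = score >= t
            key = (group, t)
            if label == 1:
                counts[key]["tp" if predicted else "fn"] += 1
            else:
                counts[key]["fp" if predicted else "tn"] += 1

    results = {}
    for key, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        results[key] = {
            # None when the slice has no examples of that class.
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return results


examples = [
    ("group_a", 0, 0.30), ("group_a", 0, 0.60), ("group_a", 1, 0.80),
    ("group_b", 0, 0.45), ("group_b", 1, 0.40), ("group_b", 1, 0.90),
]
for (group, t), r in sorted(rates_by_slice(examples, [0.4, 0.5]).items()):
    print(group, t, r)
```

Comparing rows for the same threshold across groups (here, `group_b` has a higher false positive rate than `group_a` at 0.4 but a lower one at 0.5) is the kind of comparison the dashboard is meant to make easy.
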
Many existing tools for evaluating fairness concerns don’t work well on
large-scale datasets and models. At Google, it is important for us to have tools
that can work on billion-user systems. Fairness Indicators lets you evaluate use
cases of any size, in the TensorBoard environment or in
[Colab](https://github.com/tensorflow/fairness-indicators/blob/master/docs/tutorials/).
## Requirements
To install Fairness Indicators for TensorBoard, run:
```bash
python3 -m virtualenv ~/tensorboard_demo
source ~/tensorboard_demo/bin/activate
pip install --upgrade pip
pip install fairness_indicators
pip install tensorboard-plugin-fairness-indicators
```
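
After installing, you may want to confirm that TensorBoard can discover the plugin. The plugin's `setup.py` registers it under the `tensorboard_plugins` entry-point group with the name `fairness_indicators`; the sketch below looks that entry point up with the standard library's `importlib.metadata`. The helper name `find_tensorboard_plugin` is hypothetical, and the fallback branch assumes Python 3.9, where `entry_points()` returns a plain dict keyed by group.

```python
import importlib.metadata


def find_tensorboard_plugin(name="fairness_indicators", group="tensorboard_plugins"):
    """Return the entry point's target ("module:attr") or None if not registered."""
    eps = importlib.metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        candidates = eps.select(group=group)
    else:  # Python 3.9: a dict mapping group name -> entry points
        candidates = eps.get(group, [])
    for ep in candidates:
        if ep.name == name:
            return ep.value
    return None


print(find_tensorboard_plugin())
```

With the plugin installed, this should return the target declared in `setup.py` (`tensorboard_plugin_fairness_indicators.plugin:FairnessIndicatorsPlugin`); `None` means the entry point was not found in the active environment.
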
### Nightly Packages
The TensorBoard plugin also hosts nightly packages at
https://pypi-nightly.tensorflow.org on Google Cloud. To install the latest
nightly package, use the following command:
```bash
pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tensorboard-plugin-fairness-indicators
```
This will install the nightly packages for the major dependencies of the
TensorBoard plugin, such as TensorFlow Model Analysis (TFMA).
## Demo Colab
[Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb](https://github.com/tensorflow/fairness-indicators/blob/master/docs/tutorials/Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb)
contains an end-to-end demo that trains and evaluates a model and visualizes
the fairness evaluation results in TensorBoard.
## Usage
To use Fairness Indicators with your own data and evaluations:

1. Train a new model and evaluate it using the
   `tensorflow_model_analysis.run_model_analysis` or
   `tensorflow_model_analysis.ExtractEvaluateAndWriteResult` API in
   [model_eval_lib](https://github.com/tensorflow/model-analysis/blob/master/tensorflow_model_analysis/api/model_eval_lib.py).
   For code snippets showing how to do this, see the Fairness Indicators colab
   [here](https://github.com/tensorflow/fairness-indicators).

2. Write a summary data file using
   [`demo.py`](https://github.com/tensorflow/fairness-indicators/blob/master/tensorboard_plugin/tensorboard_plugin_fairness_indicators/demo.py),
   which TensorBoard reads to render the Fairness Indicators dashboard (see the
   [TensorBoard tutorial](https://github.com/tensorflow/tensorboard/blob/master/README.md)
   for more information on summary data files).

   Flags to be used with the `demo.py` utility:

   - `--logdir`: Directory where the summary will be written. Pass the same
     directory to TensorBoard in step 3.
   - `--eval_result_output_dir`: Directory containing the evaluation results
     produced by TFMA.

   ```bash
   python demo.py --logdir=<logdir> --eval_result_output_dir=<eval_result_output_dir>
   ```

   Alternatively, you can use the `tensorboard_plugin_fairness_indicators.summary_v2`
   API to write the summary file:

   ```python
   import tensorflow as tf
   from tensorboard_plugin_fairness_indicators import summary_v2

   writer = tf.summary.create_file_writer("<logdir>")
   with writer.as_default():
       summary_v2.FairnessIndicators("<eval_result_output_dir>", step=1)
   writer.close()
   ```

3. Run TensorBoard.

   Note: This starts a local instance. Once it is running, a link is printed
   to the terminal; open it in your browser to view the Fairness Indicators
   dashboard.

   - `tensorboard --logdir=<logdir>`
   - Select the new evaluation run using the drop-down on the left side of
     the dashboard to visualize results.
## Compatible versions
The following table shows the package versions that are
compatible with each other. This is determined by our testing framework, but
other *untested* combinations may also work.
|tensorboard-plugin | tensorflow | tensorflow-model-analysis |
|-------------------------------------------------------------------------------------------------------------|---------------|---------------------------|
|[GitHub master](https://github.com/tensorflow/fairness-indicators/blob/master/tensorboard_plugin/README.md) | nightly (2.x) | 0.48.0 |
|[v0.48.0](https://github.com/tensorflow/fairness-indicators/blob/v0.48.0/tensorboard_plugin/README.md) | 2.17.1 | 0.48.0 |
|[v0.47.0](https://github.com/tensorflow/fairness-indicators/blob/v0.47.0/tensorboard_plugin/README.md) | 2.16.2 | 0.47.1 |
|[v0.46.0](https://github.com/tensorflow/fairness-indicators/blob/v0.46.0/tensorboard_plugin/README.md) | 2.15.0 | 0.46.0 |
|[v0.44.0](https://github.com/tensorflow/fairness-indicators/blob/v0.44.0/tensorboard_plugin/README.md) | 2.12.0 | 0.44.0 |
|[v0.43.0](https://github.com/tensorflow/fairness-indicators/blob/v0.43.0/tensorboard_plugin/README.md) | 2.11.0 | 0.43.0 |
|[v0.42.0](https://github.com/tensorflow/fairness-indicators/blob/v0.42.0/tensorboard_plugin/README.md) | 2.10.0 | 0.42.0 |
|[v0.41.0](https://github.com/tensorflow/fairness-indicators/blob/v0.41.0/tensorboard_plugin/README.md) | 2.9.0 | 0.41.0 |
|[v0.40.0](https://github.com/tensorflow/fairness-indicators/blob/v0.40.0/tensorboard_plugin/README.md) | 2.9.0 | 0.40.0 |
|[v0.39.0](https://github.com/tensorflow/fairness-indicators/blob/v0.39.0/tensorboard_plugin/README.md) | 2.8.0 | 0.39.0 |
|[v0.38.0](https://github.com/tensorflow/fairness-indicators/blob/v0.38.0/tensorboard_plugin/README.md) | 2.8.0 | 0.38.0 |
|[v0.37.0](https://github.com/tensorflow/fairness-indicators/blob/v0.37.0/tensorboard_plugin/README.md) | 2.7.0 | 0.37.0 |
|[v0.36.0](https://github.com/tensorflow/fairness-indicators/blob/v0.36.0/tensorboard_plugin/README.md) | 2.7.0 | 0.36.0 |
|[v0.35.0](https://github.com/tensorflow/fairness-indicators/blob/v0.35.0/tensorboard_plugin/README.md) | 2.6.0 | 0.35.0 |
|[v0.34.0](https://github.com/tensorflow/fairness-indicators/blob/v0.34.0/tensorboard_plugin/README.md) | 2.6.0 | 0.34.0 |
|[v0.33.0](https://github.com/tensorflow/fairness-indicators/blob/v0.33.0/tensorboard_plugin/README.md) | 2.5.0 | 0.33.0 |
|[v0.30.0](https://github.com/tensorflow/fairness-indicators/blob/v0.30.0/tensorboard_plugin/README.md) | 2.4.0 | 0.30.0 |
|[v0.29.0](https://github.com/tensorflow/fairness-indicators/blob/v0.29.0/tensorboard_plugin/README.md) | 2.4.0 | 0.29.0 |
|[v0.28.0](https://github.com/tensorflow/fairness-indicators/blob/v0.28.0/tensorboard_plugin/README.md) | 2.4.0 | 0.28.0 |
|[v0.27.0](https://github.com/tensorflow/fairness-indicators/blob/v0.27.0/tensorboard_plugin/README.md) | 2.4.0 | 0.27.0 |
|[v0.26.0](https://github.com/tensorflow/fairness-indicators/blob/v0.26.0/tensorboard_plugin/README.md) | 2.3.0 | 0.26.0 |
|[v0.25.0](https://github.com/tensorflow/fairness-indicators/blob/v0.25.0/tensorboard_plugin/README.md) | 2.3.0 | 0.25.0 |
|[v0.24.0](https://github.com/tensorflow/fairness-indicators/blob/v0.24.0/tensorboard_plugin/README.md) | 2.3.0 | 0.24.0 |
|[v0.23.0](https://github.com/tensorflow/fairness-indicators/blob/v0.23.0/tensorboard_plugin/README.md) | 2.3.0 | 0.23.0 |
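
If you want to check an environment against this table programmatically, one minimal approach is to encode the rows you care about as data and compare them with the installed versions. In the sketch below only a few rows are copied, the helper names are hypothetical, and the match is on exact version strings. As noted above, a combination that is missing from the table is untested, not known to be broken.

```python
from importlib.metadata import PackageNotFoundError, version

# A few rows from the table above: plugin -> (tensorflow, tensorflow-model-analysis).
TESTED_COMBINATIONS = {
    "0.48.0": ("2.17.1", "0.48.0"),
    "0.47.0": ("2.16.2", "0.47.1"),
    "0.46.0": ("2.15.0", "0.46.0"),
}


def is_tested_combination(plugin_version, tf_version, tfma_version):
    """True only if this exact version triple appears in TESTED_COMBINATIONS."""
    return TESTED_COMBINATIONS.get(plugin_version) == (tf_version, tfma_version)


def installed_versions():
    """Installed (plugin, tensorflow, tfma) versions; None for a missing package."""
    # Distribution names as published; TensorFlow variants such as
    # tensorflow-cpu would need their own name here.
    names = (
        "tensorboard_plugin_fairness_indicators",
        "tensorflow",
        "tensorflow-model-analysis",
    )
    result = []
    for name in names:
        try:
            result.append(version(name))
        except PackageNotFoundError:
            result.append(None)
    return tuple(result)


print(is_tested_combination(*installed_versions()))
```
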
================================================
FILE: tensorboard_plugin/pytest.ini
================================================
[pytest]
addopts = "--import-mode=importlib"
testpaths = "tensorboard_plugin_fairness_indicators"
python_files = "*_test.py"
================================================
FILE: tensorboard_plugin/setup.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Setup to install Fairness Indicators Tensorboard plugin."""
import os
import sys
from setuptools import find_packages, setup
if sys.version_info >= (3, 11):
sys.exit("Sorry, Python >= 3.11 is not supported")
def select_constraint(default, nightly=None, git_master=None):
"""Select dependency constraint based on TFX_DEPENDENCY_SELECTOR env var."""
selector = os.environ.get("TFX_DEPENDENCY_SELECTOR")
if selector == "UNCONSTRAINED":
return ""
elif selector == "NIGHTLY" and nightly is not None:
return nightly
elif selector == "GIT_MASTER" and git_master is not None:
return git_master
else:
return default
REQUIRED_PACKAGES = [
"protobuf>=4.21.6,<6.0.0",
"tensorboard>=2.17.0,<2.18.0",
"tensorflow>=2.17,<2.18",
"tf-keras>=2.17,<2.18",
"tensorflow-model-analysis>=0.48,<0.49",
"werkzeug<2",
]
TEST_PACKAGES = [
"pytest>=8.3.0,<9",
]
with open("README.md", encoding="utf-8") as fh:
long_description = fh.read()
# Get version from version module.
with open("tensorboard_plugin_fairness_indicators/version.py") as fp:
globals_dict = {}
exec(fp.read(), globals_dict) # pylint: disable=exec-used
__version__ = globals_dict["__version__"]
setup(
name="tensorboard_plugin_fairness_indicators",
version=__version__,
description="Fairness Indicators TensorBoard Plugin",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/tensorflow/fairness-indicators",
author="Google LLC",
author_email="packages@tensorflow.org",
packages=find_packages(),
package_data={
"tensorboard_plugin_fairness_indicators": ["static/**"],
},
entry_points={
"tensorboard_plugins": [
"fairness_indicators = tensorboard_plugin_fairness_indicators.plugin:FairnessIndicatorsPlugin",
],
},
python_requires=">=3.9,<4",
install_requires=REQUIRED_PACKAGES,
tests_require=REQUIRED_PACKAGES,
extras_require={
"test": TEST_PACKAGES,
},
classifiers=[
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"Intended Audience :: Education",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3 :: Only",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Mathematics",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Software Development",
"Topic :: Software Development :: Libraries",
"Topic :: Software Development :: Libraries :: Python Modules",
],
license="Apache 2.0",
keywords="tensorflow model analysis fairness indicators tensorboard machine learning",
)
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/RELEASE.md
================================================
# Current Version (Still in Development)
## Major Features and Improvements
## Bug Fixes and Other Changes
## Breaking Changes
## Deprecations
# Version 0.48.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Support `tensorflow>=2.17,<2.18`.
* Depends on `tensorflow-model-analysis>=0.48,<0.49`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.47.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Support `tensorflow>=2.16,<2.17`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.46.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* N/A
## Breaking Changes
* N/A
## Deprecations
* Deprecating python3.8 support.
# Version 0.44.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.12,<2.13`.
* Depends on `tensorflow-model-analysis>=0.44,<0.45`.
* Depends on `protobuf>=3.20.3,<5`.
## Breaking Changes
* N/A
## Deprecations
* Deprecating python3.7 support.
# Version 0.43.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on tensorflow>=2.11,<3.
* Depends on tensorflow-model-analysis>=0.43,<0.44.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.42.0
## Major Features and Improvements
* This is the last version that supports TensorFlow 1.15.x. TF 1.15.x support
will be removed in the next version. Please check the
[TF2 migration guide](https://www.tensorflow.org/guide/migrate) to migrate
to TF2.
## Bug Fixes and Other Changes
* Depends on tensorflow>=2.10.0,<3.
* Depends on tensorflow-model-analysis>=0.42,<0.43.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.41.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on tensorflow>=2.9.0,<3.
* Depends on tensorflow-model-analysis>=0.41,<0.42.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.40.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on tensorflow>=2.8.0,<3.
* Depends on tensorflow-model-analysis>=0.40,<0.41.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.39.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `Werkzeug<2`.
* Depends on `tensorflow>=2.8.0,<3`.
* Depends on `tensorboard>=2.8.0,<3`.
* Depends on `tensorflow-model-analysis>=0.38,<0.39`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.38.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.8.0,<3`.
* Depends on `tensorboard>=2.8.0,<3`.
* Depends on `tensorflow-model-analysis>=0.38,<0.39`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.37.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* N/A
## Breaking Changes
* Depends on `tensorflow-model-analysis>=0.37,<0.38`.
## Deprecations
* N/A
# Version 0.36.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow>=2.7.0,<3`.
* Depends on `tensorboard>=2.7.0,<3`.
* Depends on `tensorflow-model-analysis>=0.36,<0.37`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.35.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* N/A
## Breaking Changes
* Depends on `tensorflow-model-analysis>=0.35,<0.36`.
## Deprecations
* Deprecating python3.6 support.
# Version 0.34.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorboard>=2.5.0,<3`.
* Depends on `tensorflow>=2.6.0,<3`.
* Depends on `tensorflow-model-analysis>=0.34,<0.35`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.33.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorboard>=2.5.0,<3`.
* Depends on `tensorflow>=2.5.0,<3`.
* Depends on `protobuf>=3.13,<4`.
* Depends on `tensorflow-model-analysis>=0.33,<0.34`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.30.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorboard>=2.4.0,!=2.5.*,<3`.
* Depends on `tensorflow>=2.4.0,!=2.5.*,<3`.
* Depends on `tensorflow-model-analysis>=0.30,<0.31`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.29.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-model-analysis>=0.29,<0.30`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.28.0
## Major Features and Improvements
* N/A
## Bug Fixes and Other Changes
* Depends on `tensorflow-model-analysis>=0.28,<0.29`.
## Breaking Changes
* N/A
## Deprecations
* N/A
# Version 0.27.0
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Depends on `tensorboard>=2.4.0,<3`.
* Depends on `tensorflow>=2.4.0,<3`.
* Depends on `tensorflow-model-analysis>=0.27,<0.28`.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.26.0
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Depends on `tensorboard>=2.3.0,!=2.4.*,<3`.
* Depends on `tensorflow>=2.3.0,!=2.4.*,<3`.
* Depends on `tensorflow-model-analysis>=0.26,<0.27`.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.25.0
## Major Features and Improvements
* From this release Tensorboard Plugin will also be hosting nightly packages
on https://pypi-nightly.tensorflow.org. To install the nightly package use
the following command:
```
pip install --extra-index-url https://pypi-nightly.tensorflow.org/simple tensorboard-plugin-fairness-indicators
```
Note: These nightly packages are unstable and breakages are likely to
happen. Depending on the complexity involved, a fix can often take a week or
more before the corrected wheels are available on the PyPI cloud service. You
can always use the stable version of the TensorBoard plugin available on PyPI
by running `pip install tensorboard-plugin-fairness-indicators`.
## Bug fixes and other changes
* Adding support for model comparison using dynamic URL in TensorBoard plugin.
* Depends on `tensorflow-model-analysis>=0.25,<0.26`.
## Breaking changes
* N/A
## Deprecations
* N/A
# Version 0.24.0
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Fix in the error message while rendering evaluation results in
TensorBoard plugin from evaluation output path provided in the URL.
* Depends on `tensorflow-model-analysis>=0.24,<0.25`.
## Breaking changes
* N/A
## Deprecations
* Deprecating Py3.5 support.
# Version 0.23.0
## Major Features and Improvements
* N/A
## Bug fixes and other changes
* Depends on `tensorboard>=2.3.0,<3`.
* Depends on `tensorflow>=2.3.0,<3`.
* Depends on `tensorflow-model-analysis>=0.23,<0.24`.
* Adding model comparison support in TensorBoard Plugin.
## Breaking changes
* N/A
## Deprecations
* Deprecating Py2 support.
* Note: We plan to drop py3.5 support in the next release.
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/__init__.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/demo.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Fairness Indicators Plugin Demo."""
import tensorflow.compat.v1 as tf
import tensorflow.compat.v2 as tf2
from absl import app, flags
from tensorboard_plugin_fairness_indicators import summary_v2
tf.enable_eager_execution()
tf = tf2
FLAGS = flags.FLAGS
flags.DEFINE_string(
"eval_result_output_dir", "", "Log dir containing evaluation results."
)
flags.DEFINE_string("logdir", "", "Log dir where demo logs will be written.")
def main(unused_argv):
writer = tf.summary.create_file_writer(FLAGS.logdir)
with writer.as_default():
summary_v2.FairnessIndicators(FLAGS.eval_result_output_dir, step=1)
writer.close()
if __name__ == "__main__":
app.run(main)
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/metadata.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Plugin-specific global metadata."""

from tensorboard.compat.proto import summary_pb2

PLUGIN_NAME = "fairness_indicators"


def CreateSummaryMetadata(description=None):
    """Creates a SummaryMetadata proto tagged with this plugin's name."""
    return summary_pb2.SummaryMetadata(
        summary_description=description,
        plugin_data=summary_pb2.SummaryMetadata.PluginData(plugin_name=PLUGIN_NAME),
    )
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/metadata_test.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for util function to create plugin metadata."""

import tensorflow.compat.v1 as tf

from tensorboard_plugin_fairness_indicators import metadata


class MetadataTest(tf.test.TestCase):
    def testCreateSummaryMetadata(self):
        summary_metadata = metadata.CreateSummaryMetadata("description")
        self.assertEqual(metadata.PLUGIN_NAME, summary_metadata.plugin_data.plugin_name)
        self.assertEqual("description", summary_metadata.summary_description)

    def testCreateSummaryMetadata_withoutDescription(self):
        summary_metadata = metadata.CreateSummaryMetadata()
        self.assertEqual(metadata.PLUGIN_NAME, summary_metadata.plugin_data.plugin_name)
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/plugin.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""TensorBoard Fairness Indicators plugin."""

import os
from typing import Any, Union

import six
import tensorflow as tf
import tensorflow_model_analysis as tfma
from absl import logging
from google.protobuf import json_format
from tensorboard.backend import http_util
from tensorboard.plugins import base_plugin
from werkzeug import wrappers

from tensorboard_plugin_fairness_indicators import metadata

_TEMPLATE_LOCATION = os.path.normpath(
    os.path.join(__file__, "../../tensorflow_model_analysis/static/vulcanized_tfma.js")
)


def stringify_slice_key_value(
    slice_key: tfma.slicer.slicer_lib.SliceKeyType,
) -> str:
    """Stringifies a slice key value.

    The string representation of a SingletonSliceKeyType is "feature:value".
    This function returns only the value.

    When multiple columns / features are specified, the string representation
    of a SliceKeyType's value is "v1_X_v2_X_..." where v1, v2, ... are values.
    For example, (('gender', 'f'), ('age', 5)) becomes "f_X_5". If no columns /
    features are specified, "Overall" is returned.

    Note that we do not perform special escaping for slice values that contain
    '_X_'. This stringified representation is meant to be human-readable rather
    than a reversible encoding.

    The columns will be in the same order as in SliceKeyType. If they are
    generated using SingleSliceSpec.generate_slices, they will be sorted in
    ascending order.

    Technically, float values are not supported, but we don't check for them
    here.

    Args:
    ----
      slice_key: Slice key to stringify. The constituent SingletonSliceKeyTypes
        should be sorted in ascending order.

    Returns:
    -------
      String representation of the slice key's value.
    """
    if not slice_key:
        return "Overall"

    # Since this is meant to be a human-readable string, we assume that the
    # feature values are valid UTF-8 strings (this might not hold when, for
    # instance, people store serialized protos in the features).
    # as_str_any converts non-string (e.g. integer) values to strings first,
    # and as_text then guarantees a unicode `str` result.
    values = [
        tf.compat.as_text(tf.compat.as_str_any(value)) for _, value in slice_key
    ]
    return "_X_".join(values)


def _add_cross_slice_key_data(
    slice_key: tfma.slicer.slicer_lib.CrossSliceKeyType,
    metrics: tfma.view.view_types.MetricsByTextKey,
    data: list[Any],
):
    """Appends UI data for a cross slice key.

    The baseline and comparison slice keys are joined by '__XX__'.

    Args:
    ----
      slice_key: Cross slice key, a (baseline_key, comparison_key) pair.
      metrics: Metrics data for the cross slice key.
      data: List to which the UI data is appended.
    """
    baseline_key = slice_key[0]
    comparison_key = slice_key[1]
    stringify_slice_value = (
        stringify_slice_key_value(baseline_key)
        + "__XX__"
        + stringify_slice_key_value(comparison_key)
    )
    stringify_slice = (
        tfma.slicer.slicer_lib.stringify_slice_key(baseline_key)
        + "__XX__"
        + tfma.slicer.slicer_lib.stringify_slice_key(comparison_key)
    )
    data.append(
        {
            "sliceValue": stringify_slice_value,
            "slice": stringify_slice,
            "metrics": metrics,
        }
    )


def convert_slicing_metrics_to_ui_input(
    slicing_metrics: list[
        tuple[
            tfma.slicer.slicer_lib.SliceKeyOrCrossSliceKeyType,
            tfma.view.view_types.MetricsByOutputName,
        ]
    ],
    slicing_column: Union[str, None] = None,
    slicing_spec: Union[tfma.slicer.slicer_lib.SingleSliceSpec, None] = None,
    output_name: str = "",
    multi_class_key: str = "",
) -> Union[list[dict[str, Any]], None]:
    """Converts slicing metrics into Fairness Indicators UI input.

    Args:
    ----
      slicing_metrics: tfma.EvalResult.slicing_metrics.
      slicing_column: The slicing column used to filter results. If both
        slicing_column and slicing_spec are None, all eval results are shown.
      slicing_spec: The slicing spec used to filter results. If both
        slicing_column and slicing_spec are None, all eval results are shown.
      output_name: The output name associated with the metric (for multi-output
        models).
      multi_class_key: The multi-class key associated with the metric (for
        multi-class models).

    Returns:
    -------
      A list of dicts, one per slice, where each dict contains the keys
      'sliceValue', 'slice', and 'metrics'.

    Raises:
    ------
      ValueError: If no matching eval result is found, or if both
        slicing_column and slicing_spec are set.
    """
    if slicing_column and slicing_spec:
        raise ValueError(
            'Only one of the "slicing_column" and "slicing_spec" parameters '
            "can be set."
        )
    if slicing_column:
        slicing_spec = tfma.slicer.slicer_lib.SingleSliceSpec(columns=[slicing_column])

    data = []
    for slice_key, metric_value in slicing_metrics:
        if (
            metric_value is not None
            and output_name in metric_value
            and multi_class_key in metric_value[output_name]
        ):
            metrics = metric_value[output_name][multi_class_key]
            # Add evaluation data for cross-slice comparisons.
            if tfma.slicer.slicer_lib.is_cross_slice_key(slice_key):
                _add_cross_slice_key_data(slice_key, metrics, data)
            # Add evaluation data for regular slices.
            elif (
                slicing_spec is None
                or not slice_key
                or slicing_spec.is_slice_applicable(slice_key)
            ):
                data.append(
                    {
                        "sliceValue": stringify_slice_key_value(slice_key),
                        "slice": tfma.slicer.slicer_lib.stringify_slice_key(slice_key),
                        "metrics": metrics,
                    }
                )
    if not data:
        raise ValueError(
            'No eval result found for output_name:"%s" and '
            'multi_class_key:"%s" and slicing_column:"%s" and slicing_spec:"%s".'
            % (output_name, multi_class_key, slicing_column, slicing_spec)
        )
    return data


class FairnessIndicatorsPlugin(base_plugin.TBPlugin):
    """A plugin to visualize Fairness Indicators."""

    plugin_name = metadata.PLUGIN_NAME

    def __init__(self, context):
        """Instantiates the plugin via TensorBoard core.

        Args:
        ----
          context: A base_plugin.TBContext instance. A magic container that
            TensorBoard uses to make objects available to the plugin.
        """
        self._multiplexer = context.multiplexer

    def get_plugin_apps(self):
        """Gets all routes offered by the plugin.

        This method is called by TensorBoard when retrieving all the
        routes offered by the plugin.

        Returns
        -------
          A dictionary mapping each URL path to the route that handles it.
        """
        return {
            "/get_evaluation_result": self._get_evaluation_result,
            "/get_evaluation_result_from_remote_path": self._get_evaluation_result_from_remote_path,
            "/index.js": self._serve_js,
            "/vulcanized_tfma.js": self._serve_vulcanized_js,
        }

    def frontend_metadata(self):
        return base_plugin.FrontendMetadata(
            es_module_path="/index.js",
            disable_reload=False,
            tab_name="Fairness Indicators",
            remove_dom=False,
            element_name=None,
        )

    def is_active(self):
        """Determines whether this plugin is active.

        This plugin is only active if TensorBoard sampled any summaries
        relevant to the plugin.

        Returns
        -------
          Whether this plugin is active.
        """
        return bool(
            self._multiplexer.PluginRunToTagToContent(
                FairnessIndicatorsPlugin.plugin_name
            )
        )

    # pytype: disable=wrong-arg-types
    @wrappers.Request.application
    def _serve_js(self, request):
        filepath = os.path.join(os.path.dirname(__file__), "static", "index.js")
        with open(filepath) as infile:
            contents = infile.read()
        return http_util.Respond(
            request, contents, content_type="application/javascript"
        )

    @wrappers.Request.application
    def _serve_vulcanized_js(self, request):
        with open(_TEMPLATE_LOCATION) as infile:
            contents = infile.read()
        return http_util.Respond(
            request, contents, content_type="application/javascript"
        )

    @wrappers.Request.application
    def _get_evaluation_result(self, request):
        run = request.args.get("run")
        try:
            run = six.ensure_text(run)
        except (UnicodeDecodeError, AttributeError):
            pass

        data = []
        try:
            # The summary tensor holds the TFMA output directory as a string.
            tensor_events = self._multiplexer.Tensors(
                run, FairnessIndicatorsPlugin.plugin_name
            )
            eval_result_output_dir = six.ensure_text(
                tensor_events[0].tensor_proto.string_val[0]
            )
            eval_result = tfma.load_eval_result(output_path=eval_result_output_dir)
            # TODO(b/141283811): Allow users to choose different model output names
            # and class keys in case of multi-output and multi-class models.
            data = convert_slicing_metrics_to_ui_input(eval_result.slicing_metrics)
        except (KeyError, json_format.ParseError) as error:
            logging.info("Error while fetching evaluation data, %s", error)
        return http_util.Respond(request, data, content_type="application/json")

    def _get_output_file_format(self, evaluation_output_path):
        file_format = os.path.splitext(evaluation_output_path)[1]
        if file_format:
            return file_format[1:]
        return ""

    @wrappers.Request.application
    def _get_evaluation_result_from_remote_path(self, request):
        evaluation_output_path = request.args.get("evaluation_output_path")
        try:
            evaluation_output_path = six.ensure_text(evaluation_output_path)
        except (UnicodeDecodeError, AttributeError):
            pass
        try:
            eval_result = tfma.load_eval_result(
                os.path.dirname(evaluation_output_path),
                output_file_format=self._get_output_file_format(evaluation_output_path),
            )
            data = convert_slicing_metrics_to_ui_input(eval_result.slicing_metrics)
        except (KeyError, json_format.ParseError) as error:
            logging.info("Error while fetching evaluation data, %s", error)
            data = []
        return http_util.Respond(request, data, content_type="application/json")

    # pytype: enable=wrong-arg-types
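The slice-key handling in `stringify_slice_key_value`, `_add_cross_slice_key_data`, and `convert_slicing_metrics_to_ui_input` can be illustrated without TensorFlow. The following is a minimal pure-Python sketch. It assumes that a slice key is a tuple of `(column, value)` pairs and that metrics are nested as `{output_name: {multi_class_key: metrics}}`. The helper names are hypothetical. The sketch omits the `slice` field (which the plugin builds with TFMA's `stringify_slice_key`), cross-slice detection, and `slicing_spec` filtering.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical alias: a slice key is a tuple of (column, value) pairs;
# the empty tuple stands for the overall (unsliced) dataset.
SliceKey = Tuple[Tuple[str, Any], ...]


def stringify_value(slice_key: SliceKey) -> str:
    """Joins the slice values with '_X_', or returns 'Overall' if unsliced."""
    if not slice_key:
        return "Overall"
    # Decode bytes as UTF-8 and convert everything else with str(); no escaping
    # is done, so values containing '_X_' make the result ambiguous.
    values = [
        value.decode("utf-8") if isinstance(value, bytes) else str(value)
        for _, value in slice_key
    ]
    return "_X_".join(values)


def stringify_cross_value(baseline: SliceKey, comparison: SliceKey) -> str:
    """Joins the baseline and comparison values with '__XX__'."""
    return stringify_value(baseline) + "__XX__" + stringify_value(comparison)


def select_metrics(
    slicing_metrics: Sequence[Tuple[SliceKey, Optional[Dict[str, Dict[str, Any]]]]],
    output_name: str = "",
    multi_class_key: str = "",
) -> List[Dict[str, Any]]:
    """Keeps slices that have metrics for the given output and class key."""
    data = []
    for slice_key, by_output in slicing_metrics:
        if (
            by_output is not None
            and output_name in by_output
            and multi_class_key in by_output[output_name]
        ):
            data.append(
                {
                    "sliceValue": stringify_value(slice_key),
                    "metrics": by_output[output_name][multi_class_key],
                }
            )
    # Like the plugin, treat an empty result as an error rather than returning [].
    if not data:
        raise ValueError(
            f"No eval result found for output_name={output_name!r} "
            f"and multi_class_key={multi_class_key!r}."
        )
    return data
```

For example, `stringify_value((("gender", "f"), ("age", 5)))` yields `"f_X_5"`, matching the docstring. Because `_X_` is not escaped, this encoding is for display only and cannot be parsed back into a slice key.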
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/plugin_test.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests the TensorBoard Fairness Indicators plugin."""

import os
import shutil
from collections import abc
from unittest import mock

import pytest
import six
import tensorflow.compat.v1 as tf
import tensorflow.compat.v2 as tf2
import tensorflow_model_analysis as tfma
from google.protobuf import text_format
from tensorboard.backend import application
from tensorboard.backend.event_processing import (
    plugin_event_multiplexer as event_multiplexer,
)
from tensorboard.plugins import base_plugin
from tensorflow_model_analysis.utils import example_keras_model
from werkzeug import test as werkzeug_test
from werkzeug import wrappers

from tensorboard_plugin_fairness_indicators import plugin, summary_v2

tf.enable_eager_execution()
tf = tf2


class PluginTest(tf.test.TestCase):
    """Tests for Fairness Indicators plugin server."""

    def setUp(self):
        super(PluginTest, self).setUp()
        # Log dir to save temp events into.
        self._log_dir = self.get_temp_dir()
        self._eval_result_output_dir = os.path.join(self.get_temp_dir(), "eval_result")
        if not os.path.isdir(self._eval_result_output_dir):
            os.mkdir(self._eval_result_output_dir)

        writer = tf.summary.create_file_writer(self._log_dir)
        with writer.as_default():
            summary_v2.FairnessIndicators(self._eval_result_output_dir, step=1)
        writer.close()

        # Start a server that will receive requests.
        self._multiplexer = event_multiplexer.EventMultiplexer(
            {
                ".": self._log_dir,
            }
        )
        self._context = base_plugin.TBContext(
            logdir=self._log_dir, multiplexer=self._multiplexer
        )
        self._plugin = plugin.FairnessIndicatorsPlugin(self._context)
        self._multiplexer.Reload()
        wsgi_app = application.TensorBoardWSGI([self._plugin])
        self._server = werkzeug_test.Client(wsgi_app, wrappers.Response)
        self._routes = self._plugin.get_plugin_apps()

    def tearDown(self):
        super(PluginTest, self).tearDown()
        shutil.rmtree(self._log_dir, ignore_errors=True)

    def _export_keras_model(self, classifier):
        temp_eval_export_dir = os.path.join(self.get_temp_dir(), "eval_export_dir")
        classifier.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
        tf.saved_model.save(classifier, temp_eval_export_dir)
        return temp_eval_export_dir

    def _write_tf_examples_to_tfrecords(self, examples):
        data_location = os.path.join(self.get_temp_dir(), "input_data.rio")
        with tf.io.TFRecordWriter(data_location) as writer:
            for example in examples:
                writer.write(example.SerializeToString())
        return data_location

    def _make_example(self, age, language, label):
        example = tf.train.Example()
        example.features.feature["age"].float_list.value[:] = [age]
        example.features.feature["language"].bytes_list.value[:] = [
            six.ensure_binary(language, "utf8")
        ]
        example.features.feature["label"].float_list.value[:] = [label]
        return example

    def _make_eval_config(self):
        return text_format.Parse(
            """
            model_specs {
              signature_name: "serving_default"
              prediction_key: "predictions" # placeholder
              label_key: "label" # placeholder
            }
            slicing_specs {}
            metrics_specs {
              metrics {
                class_name: "ExampleCount"
              }
              metrics {
                class_name: "Accuracy"
              }
            }
            """,
            tfma.EvalConfig(),
        )

    def testRoutes(self):
        self.assertIsInstance(self._routes["/get_evaluation_result"], abc.Callable)
        self.assertIsInstance(
            self._routes["/get_evaluation_result_from_remote_path"], abc.Callable
        )
        self.assertIsInstance(self._routes["/index.js"], abc.Callable)
        self.assertIsInstance(self._routes["/vulcanized_tfma.js"], abc.Callable)

    @mock.patch.object(
        event_multiplexer.EventMultiplexer,
        "PluginRunToTagToContent",
        return_value={"bar": {"foo": b""}},
    )
    def testIsActive(self, get_random_stub):  # pylint: disable=unused-argument
        self.assertTrue(self._plugin.is_active())

    @mock.patch.object(
        event_multiplexer.EventMultiplexer, "PluginRunToTagToContent", return_value={}
    )
    def testIsInactive(self, get_random_stub):  # pylint: disable=unused-argument
        self.assertFalse(self._plugin.is_active())

    def testIndexJsRoute(self):
        """Tests that the /index.js route responds with HTTP 200."""
        response = self._server.get("/data/plugin/fairness_indicators/index.js")
        self.assertEqual(200, response.status_code)

    @pytest.mark.xfail(
        reason=(
            "Failing on `master` as of `942b672457e07ac2ac27de0bcc45a4c80276785c`. "
            "Please remove once fixed."
        )
    )
    def testVulcanizedTemplateRoute(self):
        """Tests that the /vulcanized_tfma.js route responds with HTTP 200."""
        response = self._server.get(
            "/data/plugin/fairness_indicators/vulcanized_tfma.js"
        )
        self.assertEqual(200, response.status_code)

    def testGetEvalResultsRoute(self):
        model_location = self._export_keras_model(
            example_keras_model.get_example_classifier_model(
                input_feature_key="language"
            )
        )
        examples = [
            self._make_example(age=3.0, language="english", label=1.0),
            self._make_example(age=3.0, language="chinese", label=0.0),
            self._make_example(age=4.0, language="english", label=1.0),
            self._make_example(age=5.0, language="chinese", label=1.0),
            self._make_example(age=5.0, language="hindi", label=1.0),
        ]
        eval_config = self._make_eval_config()
        data_location = self._write_tf_examples_to_tfrecords(examples)
        _ = tfma.run_model_analysis(
            eval_shared_model=tfma.default_eval_shared_model(
                eval_saved_model_path=model_location, eval_config=eval_config
            ),
            eval_config=eval_config,
            data_location=data_location,
            output_path=self._eval_result_output_dir,
        )

        response = self._server.get(
            "/data/plugin/fairness_indicators/get_evaluation_result?run=."
        )
        self.assertEqual(200, response.status_code)

    def testGetEvalResultsFromURLRoute(self):
        model_location = self._export_keras_model(
            example_keras_model.get_example_classifier_model(
                input_feature_key="language"
            )
        )
        examples = [
            self._make_example(age=3.0, language="english", label=1.0),
            self._make_example(age=3.0, language="chinese", label=0.0),
            self._make_example(age=4.0, language="english", label=1.0),
            self._make_example(age=5.0, language="chinese", label=1.0),
            self._make_example(age=5.0, language="hindi", label=1.0),
        ]
        eval_config = self._make_eval_config()
        data_location = self._write_tf_examples_to_tfrecords(examples)
        _ = tfma.run_model_analysis(
            eval_shared_model=tfma.default_eval_shared_model(
                eval_saved_model_path=model_location, eval_config=eval_config
            ),
            eval_config=eval_config,
            data_location=data_location,
            output_path=self._eval_result_output_dir,
        )

        response = self._server.get(
            "/data/plugin/fairness_indicators/"
            + "get_evaluation_result_from_remote_path?evaluation_output_path="
            + os.path.join(self._eval_result_output_dir, tfma.METRICS_KEY)
        )
        self.assertEqual(200, response.status_code)

    def testGetOutputFileFormat(self):
        self.assertEqual("", self._plugin._get_output_file_format("abc_path"))
        self.assertEqual(
            "tfrecord", self._plugin._get_output_file_format("abc_path.tfrecord")
        )
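The tests above build request URLs by plain string concatenation. This works for temporary directories, but it could break if a path contained characters such as `&`, `#`, or spaces. The following sketch shows a safer construction with `urllib.parse.urlencode`. It assumes the routes are mounted under `/data/plugin/fairness_indicators/`, as in the tests. The helper names `plugin_url` and `output_file_format` are hypothetical; the latter mirrors `_get_output_file_format`. The handler passes `os.path.dirname(...)` of `evaluation_output_path` to `tfma.load_eval_result`, so the parameter should name a file inside the output directory, such as the metrics file (`tfma.METRICS_KEY` in the test).

```python
import os
from urllib.parse import urlencode

# Assumed mount point, matching the URLs requested in the tests above.
PLUGIN_NAME = "fairness_indicators"


def plugin_url(route: str, **params: str) -> str:
    """Builds a plugin route URL with a percent-encoded query string."""
    base = f"/data/plugin/{PLUGIN_NAME}/{route.lstrip('/')}"
    # urlencode escapes '/', '&', '#', and spaces, so arbitrary paths survive.
    return f"{base}?{urlencode(params)}" if params else base


def output_file_format(evaluation_output_path: str) -> str:
    """Returns the file extension without its dot, or '' if there is none."""
    extension = os.path.splitext(evaluation_output_path)[1]
    return extension[1:] if extension else ""
```

With the Werkzeug test client, `self._server.get(plugin_url("get_evaluation_result_from_remote_path", evaluation_output_path=path))` could replace the concatenation. Werkzeug decodes the query string before the handler reads `request.args`.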
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/static/index.js
================================================
// Copyright 2019 The TensorFlow Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ==============================================================================
/** Renders the Fairness Indicators UI. */
export async function render() {
  const script = document.createElement('script');
  script.src = "./vulcanized_tfma.js";
  document.body.appendChild(script);
  const container = document.createElement('fairness-tensorboard-container');
  document.body.appendChild(container);
}
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/summary_v2.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Summaries for Fairness Indicators plugin."""

from tensorboard.compat import tf2 as tf

from tensorboard_plugin_fairness_indicators import metadata


def FairnessIndicators(eval_result_output_dir, step=None, description=None):
    """Writes a Fairness Indicators summary.

    Arguments:
    ---------
      eval_result_output_dir: Output directory created by the
        tfma.model_eval_lib.ExtractEvaluateAndWriteResults API, which contains
        the 'metrics' file with MetricsForSlice results.
      step: Explicit `int64`-castable monotonic step value for this summary. If
        omitted, this defaults to `tf.summary.experimental.get_step()`, which
        must not be None.
      description: Optional long-form description for this summary, as a
        constant `str`. Markdown is supported. Defaults to empty.

    Returns:
    -------
      True on success, or False if no summary was written because no default
      summary writer was available.

    Raises:
    ------
      ValueError: If a default writer exists, but no step was provided and
        `tf.summary.experimental.get_step()` is None.
    """
    with tf.summary.experimental.summary_scope(metadata.PLUGIN_NAME):
        return tf.summary.write(
            tag=metadata.PLUGIN_NAME,
            tensor=tf.constant(eval_result_output_dir),
            step=step,
            metadata=metadata.CreateSummaryMetadata(description),
        )
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/summary_v2_test.py
================================================
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for Fairness Indicators summary."""

import glob
import os

import six
import tensorflow.compat.v1 as tf
from tensorboard.compat import tf2

from tensorboard_plugin_fairness_indicators import metadata, summary_v2

try:
    tf2.__version__  # Force lazy import to resolve
except ImportError:
    tf2 = None

try:
    tf.enable_eager_execution()
except AttributeError:
    # TF 2.0 doesn't have this symbol because eager is the default.
    pass


class SummaryV2Test(tf.test.TestCase):
    def _write_summary(self, eval_result_output_dir):
        writer = tf2.summary.create_file_writer(self.get_temp_dir())
        with writer.as_default():
            summary_v2.FairnessIndicators(eval_result_output_dir, step=1)
        writer.close()

    def _get_event(self):
        event_files = sorted(glob.glob(os.path.join(self.get_temp_dir(), "*")))
        self.assertEqual(len(event_files), 1)
        events = list(tf.train.summary_iterator(event_files[0]))
        # Expect a boilerplate event for the file_version, then the summary one.
        self.assertEqual(len(events), 2)
        return events[1]

    def testSummary(self):
        self._write_summary("output_dir")
        event = self._get_event()
        self.assertEqual(1, event.step)

        summary_value = event.summary.value[0]
        self.assertEqual(metadata.PLUGIN_NAME, summary_value.tag)
        self.assertEqual(
            "output_dir", six.ensure_text(summary_value.tensor.string_val[0], "utf-8")
        )
        self.assertEqual(
            metadata.PLUGIN_NAME, summary_value.metadata.plugin_data.plugin_name
        )
================================================
FILE: tensorboard_plugin/tensorboard_plugin_fairness_indicators/version.py
================================================
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains the version string of Fairness Indicators Tensorboard Plugin."""
# Note that setup.py uses this version.
__version__ = "0.49.0.dev"