Repository: linkedin/LiFT Branch: main Commit: b4fe810f202f Files: 54 Total size: 62.1 MB Directory structure: gitextract_uglroetk/ ├── .github/ │ └── workflows/ │ └── ci.yml ├── .gitignore ├── CONTRIBUTING.md ├── LICENSE ├── NOTICE ├── README.md ├── acknowledgements.md ├── build.gradle ├── dataset-fairness.md ├── dependencies.md ├── docs/ │ └── release-notes.md ├── equality-of-opportunity.md ├── gradle/ │ ├── java-publication.gradle │ ├── release.gradle │ └── wrapper/ │ ├── gradle-wrapper.jar │ └── gradle-wrapper.properties ├── gradle.properties ├── gradlew ├── gradlew.bat ├── lift/ │ ├── build.gradle │ └── src/ │ ├── main/ │ │ └── scala/ │ │ └── com/ │ │ └── linkedin/ │ │ └── lift/ │ │ ├── eval/ │ │ │ ├── FairnessMetricsUtils.scala │ │ │ ├── MeasureDatasetFairnessMetricsCmdLineArgs.scala │ │ │ ├── MeasureModelFairnessMetricsCmdLineArgs.scala │ │ │ └── jobs/ │ │ │ ├── MeasureDatasetFairnessMetrics.scala │ │ │ └── MeasureModelFairnessMetrics.scala │ │ ├── lib/ │ │ │ ├── DivergenceUtils.scala │ │ │ ├── PermutationTestUtils.scala │ │ │ ├── PositionBiasUtils.scala │ │ │ ├── StatsUtils.scala │ │ │ └── testing/ │ │ │ ├── TestCustomMetric.scala │ │ │ ├── TestUtils.scala │ │ │ └── TestValues.scala │ │ ├── mitigation/ │ │ │ └── EOppUtils.scala │ │ └── types/ │ │ ├── BenefitMap.scala │ │ ├── CustomMetric.scala │ │ ├── Distribution.scala │ │ ├── EOppCaseClasses.scala │ │ ├── FairnessResult.scala │ │ └── ModelPrediction.scala │ └── test/ │ ├── data/ │ │ ├── TrainingData.csv │ │ └── ValidationData.csv │ └── scala/ │ └── com/ │ └── linkedin/ │ └── lift/ │ ├── eval/ │ │ └── FairnessMetricsUtilsTest.scala │ ├── lib/ │ │ ├── DivergenceUtilsTest.scala │ │ ├── PermutationTestUtilsTest.scala │ │ ├── PositionBiasUtilsTest.scala │ │ └── StatsUtilsTest.scala │ ├── mitigation/ │ │ └── EOppUtilsTest.scala │ └── types/ │ ├── BenefitMapTest.scala │ ├── DistributionTest.scala │ ├── FairnessResultTest.scala │ └── ModelPredictionTest.scala ├── model-fairness.md ├── settings.gradle └── 
version.properties ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/workflows/ci.yml ================================================ # CI build that assembles artifacts and runs tests. # If validation is successful this workflow releases from the main dev branch. # # - skipping CI: add [skip ci] to the commit message # - skipping release: add [skip release] to the commit message name: CI on: push: branches: ['main'] tags-ignore: [v*] # release tags are autogenerated after a successful CI, no need to run CI against them paths-ignore: - 'docs/**' - '**.md' pull_request: branches: ['**'] jobs: build: runs-on: ubuntu-latest strategy: matrix: include: - scala-version: 2.11.8 spark-version: 2.3.0 - scala-version: 2.11.8 spark-version: 2.4.3 - scala-version: 2.12.11 spark-version: 2.4.3 if: "! contains(toJSON(github.event.commits.*.message), '[skip ci]')" steps: - name: Check out code # https://github.com/actions/checkout uses: actions/checkout@v6 with: # Needed to get all tags. 
Refer https://github.com/shipkit/shipkit-changelog#fetch-depth-on-ci fetch-depth: '0' - name: Setup Java uses: actions/setup-java@v5 with: java-version: 1.8 distribution: temurin # Added for actions/setup-java v2+ compatibility - name: Build and test code, and test artifact publishing run: ./gradlew build publishToMavenLocal artifactoryPublishAll -s -Partifactory.dryRun -PscalaVersion=$SCALA_VERSION -PsparkVersion=$SPARK_VERSION env: SCALA_VERSION: ${{ matrix.scala-version }} SPARK_VERSION: ${{ matrix.spark-version }} - name: Release to Maven Central and LinkedIn Artifactory # Release job, only for pushes to the main development branch if: github.event_name == 'push' && github.ref == 'refs/heads/main' && github.repository == 'linkedin/LiFT' && !contains(toJSON(github.event.commits.*.message), '[skip release]') run: ./gradlew publishToSonatype closeAndReleaseStagingRepository artifactoryPublishAll -s -PscalaVersion=$SCALA_VERSION -PsparkVersion=$SPARK_VERSION env: SCALA_VERSION: ${{ matrix.scala-version }} SPARK_VERSION: ${{ matrix.spark-version }} SONATYPE_USER: ${{ secrets.SONATYPE_USER }} SONATYPE_PWD: ${{ secrets.SONATYPE_PWD }} PGP_KEY: ${{ secrets.PGP_KEY }} PGP_PWD: ${{ secrets.PGP_PWD }} ARTIFACTORY_USER: ${{ secrets.ARTIFACTORY_USER }} ARTIFACTORY_KEY: ${{ secrets.ARTIFACTORY_KEY }} github-release: runs-on: ubuntu-latest needs: build # Release job, only for pushes to the main development branch if: github.event_name == 'push' && github.ref == 'refs/heads/main' && github.repository == 'linkedin/LiFT' && !contains(toJSON(github.event.commits.*.message), '[skip release]') steps: - name: Check out code # https://github.com/actions/checkout uses: actions/checkout@v6 with: # Needed to get all tags.
Refer https://github.com/shipkit/shipkit-changelog#fetch-depth-on-ci fetch-depth: '0' - name: Setup Java uses: actions/setup-java@v5 with: java-version: 1.8 distribution: temurin # Added for actions/setup-java v2+ compatibility - name: Release run: ./gradlew githubRelease -s env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} ================================================ FILE: .gitignore ================================================ .gradle out build .DS_Store .idea spark-warehouse *.ipr *.iml *.iws *~ ================================================ FILE: CONTRIBUTING.md ================================================ # Contribution Agreement As a contributor, you represent that the code you submit is your original work or that of your employer (in which case you represent you have the right to bind your employer). By submitting code, you (and, if applicable, your employer) are licensing the submitted code to LinkedIn and the open source community subject to the BSD 2-Clause license. # Responsible Disclosure of Security Vulnerabilities **Do not file an issue on Github for security issues.** Please review the [guidelines for disclosure][disclosure_guidelines]. Reports should be encrypted using PGP ([public key][pubkey]) and sent to [security@linkedin.com][disclosure_email] preferably with the title "Vulnerability in Github LinkedIn/lift - <short summary>". # Tips for Getting Your Pull Request Accepted 1. Make sure all new features are tested and the tests pass. 2. Bug fixes must include a test case demonstrating the error that they fix. 3. Open an issue first and seek advice for your change before submitting a pull request. Large features which have never been discussed are unlikely to be accepted.
**You have been warned.** [disclosure_guidelines]: https://www.linkedin.com/help/linkedin/answer/62924 [pubkey]: https://www.linkedin.com/help/linkedin/answer/79676 [disclosure_email]: mailto:security@linkedin.com?subject=Vulnerability%20in%20Github%20LinkedIn/fairsscale%20-%20%3Csummary%3E ================================================ FILE: LICENSE ================================================ BSD 2-CLAUSE LICENSE Copyright 2020 LinkedIn Corporation All Rights Reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ================================================ FILE: NOTICE ================================================ Copyright 2020 LinkedIn Corporation All Rights Reserved. Licensed under the BSD 2-Clause License (the "License"). See License in the project root for license information. 
================================================================================ This product uses Gradle (https://github.com/gradle/gradle) as its build system and includes the Gradle wrapper script. Copyright 2020 Gradle License: Apache-2.0 ======================================================================== External dependencies ======================================================================== In addition, this product automatically loads third party code from an external repository. Such third party code is subject to other license terms than as set forth above. In addition, such third party code may also depend on and load multiple tiers of dependencies. ================================================ FILE: README.md ================================================ # The LinkedIn Fairness Toolkit (LiFT) [![Build Status](https://github.com/linkedin/LiFT/actions/workflows/ci.yml/badge.svg?branch=main&event=push)](https://github.com/linkedin/LiFT/actions/workflows/ci.yml?query=branch%3Amain+event%3Apush) [![Release](https://img.shields.io/github/v/release/linkedin/LiFT)](https://github.com/linkedin/LiFT/releases/) [![License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](LICENSE) --- > 📣 We've moved from Bintray to [Artifactory](https://linkedin.jfrog.io/artifactory/LiFT/)! > > As of version [0.2.2](https://github.com/linkedin/LiFT/releases/tag/v0.2.2), we are only publishing versions > to LinkedIn's Artifactory instance rather than Bintray, which is approaching end of life. ## Introduction The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness and the mitigation of bias in large-scale machine learning workflows. The measurement module includes measuring biases in training data, evaluating fairness metrics for ML models, and detecting statistically significant differences in their performance across different subgroups. It can also be used for ad-hoc fairness analysis. 
The mitigation part includes a post-processing method for transforming model scores to ensure the so-called equality of opportunity for rankings (in the presence/absence of position bias). This method can be directly applied to the model-generated scores without changing the existing model training pipeline. This library was created by [Sriram Vasudevan](https://www.linkedin.com/in/vasudevansriram/) and [Krishnaram Kenthapadi](https://www.linkedin.com/in/krishnaramkenthapadi/) (work done while at LinkedIn). Additional Contributors: 1. [Preetam Nandy](https://www.linkedin.com/in/preetamnandy/) ## Copyright Copyright 2020 LinkedIn Corporation All Rights Reserved. Licensed under the BSD 2-Clause License (the "License"). See [License](LICENSE) in the project root for license information. ## Features LiFT provides a configuration-driven Spark job for scheduled deployments, with support for custom metrics through User Defined Functions (UDFs). APIs at various levels are also exposed to enable users to build upon the library's capabilities as they see fit. One can thus opt for a plug-and-play approach or deploy a customized job that uses LiFT. As a result, the library can be easily integrated into ML pipelines. It can also be utilized in Jupyter notebooks for more exploratory fairness analyses. LiFT leverages Apache Spark to load input data into in-memory, fault-tolerant and scalable data structures. It strategically caches datasets and any pre-computation performed. Distributed computation is balanced with single system execution to obtain a good mix of scalability and speed. For example, distance, distribution and divergence related metrics are computed on the entire dataset in a distributed manner, while benefit vectors and permutation tests (for model performance) are computed on scored dataset samples that can be collected to the driver. The LinkedIn Fairness Toolkit (LiFT) provides the following capabilities: 1. 
[Measuring Fairness Metrics on Training Data](dataset-fairness.md) 2. [Measuring Fairness Metrics for Model Performance](model-fairness.md) 3. [Achieving Equality of Opportunity](equality-of-opportunity.md) As part of the model performance metrics, it also contains the implementation of a new permutation testing framework that detects statistically significant differences in model performance (as measured by an arbitrary performance metric) across different subgroups. High-level details about the parameters, metrics supported and usage are described below. More details about the metrics themselves are provided in the links above. A list of automatically downloaded direct dependencies is provided [here](dependencies.md). ## Usage ### Building the Library It is recommended to use Scala 2.11.8 and Spark 2.3.0. To build, run the following: ```bash ./gradlew build ``` This will produce a JAR file in the ``./lift/build/libs/`` directory. If you want to use the library with Spark 2.4 (and the Scala 2.11.8 default), you can specify this when running the build command. ```bash ./gradlew build -PsparkVersion=2.4.3 ``` You can also build an artifact with Spark 2.4 and Scala 2.12. ```bash ./gradlew build -PsparkVersion=2.4.3 -PscalaVersion=2.12.11 ``` Tests typically run with the `test` task. If you want to force-run all tests, you can use: ```bash ./gradlew cleanTest test --no-build-cache ``` To force rebuild the library, you can use: ```bash ./gradlew clean build --no-build-cache ``` ### Add a LiFT Dependency to Your Project Please check [Artifactory](https://linkedin.jfrog.io/artifactory/LiFT/) for the latest artifact versions. #### Gradle Example The artifacts are available in LinkedIn's Artifactory instance and in Maven Central, so you can specify either repository in the top-level build.gradle file. ``` repositories { mavenCentral() maven { url "https://linkedin.jfrog.io/artifactory/open-source/" } } ``` Add the LiFT dependency to the module-level `build.gradle` file.
Here are some examples for multiple recent Spark/Scala version combinations: ``` dependencies { compile 'com.linkedin.lift:lift_2.3.0_2.11:0.1.4' } ``` ``` dependencies { compile 'com.linkedin.lift:lift_2.4.3_2.11:0.1.4' } ``` ``` dependencies { compile 'com.linkedin.lift:lift_2.4.3_2.12:0.1.4' } ``` #### Using the JAR File Depending on the mode of usage, the built JAR can be deployed as part of an offline data pipeline, depended upon to build jobs using its APIs, or added to the classpath of a Spark Jupyter notebook or a Spark Shell instance. For example: ```bash $SPARK_HOME/bin/spark-shell --jars target/lift_2.3.0_2.11_0.1.4.jar ``` ### Usage Examples #### Measuring Dataset Fairness Metrics using the provided Spark job LiFT provides a Spark job for measuring fairness metrics for training data, as well as for the validation or test dataset: `com.linkedin.lift.eval.jobs.MeasureDatasetFairnessMetrics` This job can be configured using various parameters to compute fairness metrics on the dataset of interest: ``` 1. datasetPath: Input data path 2. protectedDatasetPath: Input path to the protected dataset (optional). If not provided, the library attempts to use the right dataset based on the protected attribute. 3. dataFormat: Format of the input datasets. This is the parameter passed to the Spark reader's format method. Defaults to avro. 4. dataOptions: A map of options to be used with Spark's reader (optional). 5. uidField: The unique ID field, like a memberId field. It acts as the join key for the primary dataset. 6. labelField: The label field 7. protectedAttributeField: The protected attribute field 8. uidProtectedAttributeField: The uid field (join key) for the protected attribute dataset 9. outputPath: Output data path 10. referenceDistribution: A reference distribution to compare against (optional). Only accepted value currently is UNIFORM. 11.
distanceMetrics: Distance and divergence metrics like SKEWS, INF_NORM_DIST, TOTAL_VAR_DIST, JS_DIVERGENCE, KL_DIVERGENCE and DEMOGRAPHIC_PARITY (optional). 12. overallMetrics: Aggregate metrics like GENERALIZED_ENTROPY_INDEX, ATKINSONS_INDEX, THEIL_L_INDEX, THEIL_T_INDEX and COEFFICIENT_OF_VARIATION, along with their corresponding parameters. 13. benefitMetrics: The distance/divergence metrics to use as the benefit vector when computing the overall metrics. Acceptable values are SKEWS and DEMOGRAPHIC_PARITY. ``` The most up-to-date information on these parameters can always be found [here](lift/src/main/scala/com/linkedin/lift/eval/MeasureDatasetFairnessMetricsCmdLineArgs.scala). The Spark job performs no preprocessing of the input data. It assumes, for example, that the unique ID field (the join key) is stored in the same format in the input data and the `protectedAttribute` data. This might not be the case for your dataset, in which case you can always create your own Spark job similar to the provided example (described below). #### Measuring Model Fairness Metrics using the provided Spark job LiFT provides a Spark job for measuring fairness metrics for model performance, based on the labels and scores of the test or validation data: `com.linkedin.lift.eval.jobs.MeasureModelFairnessMetrics` This job can be configured using various parameters to compute fairness metrics on the dataset of interest: ``` 1. datasetPath Input data path 2. protectedDatasetPath Input path to the protected dataset (optional). If not provided, the library attempts to use the right dataset based on the protected attribute. 3. dataFormat: Format of the input datasets. This is the parameter passed to the Spark reader's format method. Defaults to avro. 4. dataOptions: A map of options to be used with Spark's reader (optional). 5. uidField The unique ID field, like a memberId field. It acts as the join key for the primary dataset. 6. labelField The label field 7.
scoreField The score field 8. scoreType Whether the scores are raw scores or probabilities. Accepted values are RAW or PROB. 9. protectedAttributeField The protected attribute field 10. uidProtectedAttributeField The uid field (join key) for the protected attribute dataset. 11. groupIdField An optional field to be used for grouping, in case of ranking metrics 12. outputPath Output data path 13. referenceDistribution A reference distribution to compare against (optional). Only accepted value currently is UNIFORM. 14. approxRows The approximate number of rows to sample from the input data when computing model metrics. The final sampled value is min(numRowsInDataset, approxRows) 15. labelZeroPercentage The percentage of the sampled data that must be negatively labeled. This is useful in case the input data is highly skewed and you believe that stratified sampling will not obtain a sufficient number of examples of a certain label. 16. thresholdOpt An optional value that contains a threshold. It is used in case you want to generate hard binary classifications. If not provided and you request metrics that depend on explicit label predictions (e.g. precision), the scoreType information is used to convert the scores into the probabilities of predicting positives. This is used for computing expected positive prediction counts. 17. numTrials Number of trials to run the permutation test for. More trials yield results with lower variance in the computed p-value, but take more time 18. seed The random value seed 19. distanceMetrics Distance and divergence metrics that are to be computed. These are metrics such as Demographic Parity and Equalized Odds. 20. permutationMetrics The metrics to use for permutation testing 21. distanceBenefitMetrics The model metrics that are to be used for computing benefit vectors, one for each distance metric specified. 22.
performanceBenefitMetrics The model metrics that are to be used for computing benefit vectors, one for each model performance metric specified. 23. overallMetrics The aggregate metrics that are to be computed on each of the benefit vectors generated. ``` The most up-to-date information on these parameters can always be found [here](lift/src/main/scala/com/linkedin/lift/eval/MeasureModelFairnessMetricsCmdLineArgs.scala). The Spark job performs no preprocessing of the input data. It assumes, for example, that the unique ID field (the join key) is stored in the same format in the input data and the `protectedAttribute` data. This might not be the case for your dataset, in which case you can always create your own Spark job similar to the provided example (described below). #### Learning and Applying Equality of Opportunity (EOpp) on Local Datasets An example is provided in [EOppUtilsTest](lift/src/test/scala/com/linkedin/lift/mitigation/EOppUtilsTest.scala) for applying the EOpp transformation to local datasets. We provide two simulated datasets, [TrainingData.csv](lift/src/test/data/TrainingData.csv) and [ValidationData.csv](lift/src/test/data/ValidationData.csv), each containing 1M samples. The workflow is provided as a test function `eOppTransformationTest()` consisting of the following steps: 1. Learning position bias corrected EOpp transformation using the training data 2. Applying the EOpp transformation on the validation data 3. Checking EOpp in the transformed validation data with position bias 4.
Checking the (optional) score distribution preserving property of the EOpp transformation #### Custom Spark jobs built on LiFT If you are implementing your own driver program to measure dataset metrics, here's how you can make use of LiFT: ```scala object MeasureDatasetFairnessMetrics { def main(progArgs: Array[String]): Unit = { // Get spark session val spark = SparkSession .builder() .appName(getClass.getSimpleName) .getOrCreate() // Parse args val args = MeasureDatasetFairnessMetricsCmdLineArgs.parseArgs(progArgs) // Load and preprocess data val df = spark.read.format(args.dataFormat) .load(args.datasetPath) .select(args.uidField, args.labelField) // Load protected data and join val joinedDF = ... joinedDF.persist // Obtain reference distribution (optional). This can be used to provide a // custom distribution to compare the dataset against. val referenceDistrOpt = ... // Passing in the appropriate parameters to this API computes and writes // out the fairness metrics FairnessMetricsUtils.computeAndWriteDatasetMetrics(distribution, referenceDistrOpt, args) } } ``` A complete example for the above can be found [here](lift/src/main/scala/com/linkedin/lift/eval/jobs/MeasureDatasetFairnessMetrics.scala). In the case of measuring model metrics, a similar Spark job can be implemented: ```scala object MeasureModelFairnessMetrics { def main(progArgs: Array[String]): Unit = { // Get spark session val spark = SparkSession .builder() .appName(getClass.getSimpleName) .getOrCreate() // Parse args val args = MeasureModelFairnessMetricsCmdLineArgs.parseArgs(progArgs) // Load and preprocess data val df = spark.read.format(args.dataFormat) .load(args.datasetPath) .select(args.uidField, args.labelField) // Load protected data and join val joinedDF = ... joinedDF.persist // Obtain reference distribution (optional). This can be used to provide a // custom distribution to compare the dataset against. val referenceDistrOpt = ... 
// Passing in the appropriate parameters to this API computes and writes // out the fairness metrics FairnessMetricsUtils.computeAndWriteModelMetrics( joinedDF, referenceDistrOpt, args) } } ``` A complete example for the above can be found [here](lift/src/main/scala/com/linkedin/lift/eval/jobs/MeasureModelFairnessMetrics.scala). ## Contributions If you would like to contribute to this project, please review the instructions [here](CONTRIBUTING.md). ## Acknowledgments Implementations of some methods in LiFT were inspired by other open-source libraries. LiFT also contains the implementation of a new permutation testing framework. Discussions with several LinkedIn employees influenced aspects of this library. A full list of acknowledgements can be found [here](acknowledgements.md). ## Citations If you publish material that references the LinkedIn Fairness Toolkit (LiFT), you can use the following citations: ``` @inproceedings{vasudevan20lift, author = {Vasudevan, Sriram and Kenthapadi, Krishnaram}, title = {{LiFT}: A Scalable Framework for Measuring Fairness in ML Applications}, booktitle = {Proceedings of the 29th ACM International Conference on Information and Knowledge Management}, series = {CIKM '20}, year = {2020}, pages = {}, numpages = {8} } @misc{lift, author = {Vasudevan, Sriram and Kenthapadi, Krishnaram}, title = {The LinkedIn Fairness Toolkit ({LiFT})}, howpublished = {\url{https://github.com/linkedin/lift}}, month = aug, year = 2020 } ``` If you publish material that references the permutation testing methodology that is available as part of LiFT, you can use the following citation: ``` @inproceedings{diciccio20evaluating, author = {DiCiccio, Cyrus and Vasudevan, Sriram and Basu, Kinjal and Kenthapadi, Krishnaram and Agarwal, Deepak}, title = {Evaluating Fairness Using Permutation Tests}, booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}, series = {KDD '20}, year = {2020}, pages = {}, numpages 
= {11} } ``` If you publish material that references the equality of opportunity methodology that is available as part of LiFT, you can use the following citation: ``` @misc{nandy21mitigation, author = {Preetam Nandy and Cyrus Diciccio and Divya Venugopalan and Heloise Logan and Kinjal Basu and Noureddine El Karoui}, title = {Achieving Fairness via Post-Processing in Web-Scale Recommender Systems}, year = {2021}, eprint = {2006.11350}, archivePrefix = {arXiv} } ``` ================================================ FILE: acknowledgements.md ================================================ # Acknowledgements The computation of AUC and ROC in this library was inspired by the following implementations: - numpy (3-clause BSD): https://numpy.org/doc/stable/reference/generated/numpy.trapz.html - spark-mllib (Apache 2.0): https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/AreaUnderCurve.scala The computation of a generalized confusion matrix was inspired by the following implementation: - AIF360 (Apache 2.0): https://github.com/IBM/AIF360/blob/master/aif360/metrics/classification_metric.py The permutation test implemented in this library was developed in conjunction with Cyrus DiCiccio, Kinjal Basu and Deepak Agarwal. The LinkedIn Fairness Toolkit (LiFT) was validated through its deployment as part of different product vertical ML workflows and existing ML platforms at LinkedIn.
Deepak Agarwal, Parvez Ahammad, Stuart Ambler, Romil Bansal, Kinjal Basu, Bee-Chung Chen, Cyrus DiCiccio, Carlos Faham, Divya Gadde, Priyanka Gariba, Sahin Cem Geyik, Daniel Hewlett, Roshan Lal, Nicole Li, Heloise Logan, Sofus Macskassy, Varun Mithal, Arashpreet Singh Mor, Tanvi Motwani, Preetam Nandy, Cagri Ozcaglar, Nitin Panjwani, Igor Perisic, Romer Rosales, Guillaume Saint-Jacques, Badrul Sarwar, Amir Sepehri, Arun Swami, Ram Swaminathan, Grace Tang, Xin Wang, Ya Xu, and Yang Yang provided insightful feedback and discussions that influenced various aspects of LiFT. ================================================ FILE: build.gradle ================================================ buildscript { repositories { mavenLocal() // for local testing maven { url "https://plugins.gradle.org/m2/" } } dependencies { classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.+" classpath "io.github.gradle-nexus:publish-plugin:1.+" classpath "org.shipkit:shipkit-auto-version:1.+" classpath "org.shipkit:shipkit-changelog:1.+" } } apply from: "gradle/release.gradle" allprojects { apply plugin: "eclipse" apply plugin: "idea" group = "com.linkedin.lift" repositories { mavenCentral() } } task clean(type: Delete) { delete "build" } ================================================ FILE: dataset-fairness.md ================================================ # Dataset-level Fairness Metrics We provide here a list of the various metrics available for measuring fairness of datasets, as well as a short description of each of them. 1. **Metrics that compare against a given reference distribution:** These metrics involve computing some measure of distance or divergence from a given reference distribution provided by the user. The library supports only the `UNIFORM` distribution out of the box (all `label-protectedAttribute` combinations must have an equal number of records), but users may supply their own distribution (such as an a priori known gender distribution).
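As a rough illustration of the `UNIFORM` reference described above, the following Scala sketch spreads the total record count evenly across the observed `label-protectedAttribute` cells. The object and method names (`UniformReferenceSketch`, `uniformReference`, `normalize`) and the sample counts are hypothetical and are not part of LiFT's API; the library's own implementation may differ.

```scala
// Illustrative sketch only: hypothetical names, not LiFT's API.
object UniformReferenceSketch {
  // A cell is one (label, protectedAttributeValue) combination.
  type Cell = (String, String)

  // Spread the total record count evenly over the observed cells,
  // so that every combination has an equal number of records.
  def uniformReference(observed: Map[Cell, Double]): Map[Cell, Double] = {
    require(observed.nonEmpty, "need at least one cell")
    val perCell = observed.values.sum / observed.size
    observed.map { case (cell, _) => cell -> perCell }
  }

  // Convert raw counts into relative frequencies that sum to 1.
  def normalize(counts: Map[Cell, Double]): Map[Cell, Double] = {
    val total = counts.values.sum
    require(total > 0, "total count must be positive")
    counts.map { case (cell, n) => cell -> n / total }
  }

  // Made-up counts for a label-gender dataset (assumption for illustration).
  val observedCounts: Map[Cell, Double] = Map(
    ("0.0", "MALE") -> 40.0,
    ("1.0", "MALE") -> 20.0,
    ("0.0", "FEMALE") -> 30.0,
    ("1.0", "FEMALE") -> 10.0
  )

  // Each of the four cells receives the same share of the 100 records.
  val referenceCounts: Map[Cell, Double] = uniformReference(observedCounts)
}
```

A user-supplied reference distribution would take the place of `referenceCounts` in this sketch, keyed by the same cells as the observed distribution.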
For the most up-to-date documentation on the supported metrics, please look at the link [here](lift/src/main/scala/com/linkedin/lift/lib/DivergenceUtils.scala), and look for the `computeDatasetDistanceMetrics` method as the starting point. The following metrics fall under this category: 1. **Skews:** Computes the logarithm of the ratio of the observed value to the expected value. For example, if we are dealing with label-gender distributions, this metric computes ![\log\left(\frac{(0.0, MALE)_{obs}}{(0.0, MALE)_{exp}}\right), \log\left(\frac{(1.0, MALE)_{obs}}{(1.0, MALE)_{exp}}\right), \log\left(\frac{(0.0, FEMALE)_{obs}}{(0.0, FEMALE)_{exp}}\right), \log\left(\frac{(1.0, FEMALE)_{obs}}{(1.0, FEMALE)_{exp}}\right)](https://render.githubusercontent.com/render/math?math=%5Clog%5Cleft(%5Cfrac%7B(0.0%2C%20MALE)_%7Bobs%7D%7D%7B(0.0%2C%20MALE)_%7Bexp%7D%7D%5Cright)%2C%20%5Clog%5Cleft(%5Cfrac%7B(1.0%2C%20MALE)_%7Bobs%7D%7D%7B(1.0%2C%20MALE)_%7Bexp%7D%7D%5Cright)%2C%20%5Clog%5Cleft(%5Cfrac%7B(0.0%2C%20FEMALE)_%7Bobs%7D%7D%7B(0.0%2C%20FEMALE)_%7Bexp%7D%7D%5Cright)%2C%20%5Clog%5Cleft(%5Cfrac%7B(1.0%2C%20FEMALE)_%7Bobs%7D%7D%7B(1.0%2C%20FEMALE)_%7Bexp%7D%7D%5Cright)) 2. **Infinity Norm Distance:** Computes the Chebyshev Distance between the observed and reference distribution. It equals the maximum difference between the two distributions. 3. **Total Variation Distance:** Computes the Total Variation Distance between the observed and reference distribution. It is equal to half the L1 distance between the two distributions. 4. **JS Divergence:** The Jensen-Shannon Divergence between the observed and reference distribution. Suppose that the average of these two distributions is given by M. Then, the JS Divergence is the average of the KL Divergences between the observed distribution and M, and the reference distribution and M. 5. **KL Divergence:** The Kullback-Leibler Divergence between the observed and reference distribution. 
It is the expectation (over the observed distribution) of the logarithmic differences between the observed and reference distributions. The latter is the Skew we measure above. 2. **Metrics computed on the observed distribution only:** These metrics compute some notion of distance or divergence between various segments of the observed distribution. For the most up-to-date documentation on the supported metrics, please look at the link [here](lift/src/main/scala/com/linkedin/lift/lib/DivergenceUtils.scala), and look for the `computeDatasetDistanceMetrics` method as the starting point. The following metrics are supported for training data: 1. **Demographic Parity:** It measures the difference between the conditional expected value of the prediction (given one protected attribute value) and the conditional expected value of the prediction (given the other protected attribute value). This is measured for all pairs of protected attribute values. ![DP_{(g_1, g_2)} = E\\[Y(X)|G=g_1\\] - E\\[Y(X)|G=g_2\\] = P(Y(X)=1|G=g_1) - P(Y(X)=1|G=g_2)](https://render.githubusercontent.com/render/math?math=DP_%7B(g_1%2C%20g_2)%7D%20%3D%20E%5C%5BY(X)%7CG%3Dg_1%5C%5D%20-%20E%5C%5BY(X)%7CG%3Dg_2%5C%5D%20%3D%20P(Y(X)%3D1%7CG%3Dg_1)%20-%20P(Y(X)%3D1%7CG%3Dg_2)) 3. **Aggregate Metrics:** These metrics are useful to obtain higher level (or second order) notions of inequality, when comparing multiple per-protected-attribute-value inequality metrics. For example, these could be used to say if one set of Skews measured is more equally distributed than another set of Skews. These lower-level metrics are called benefit vectors, and the aggregate metrics provide a notion of how uniformly these inequalities are distributed. Note that these metrics capture inequalities within the vector. Thus, going by this metric alone is not sufficient. For example, take a benefit vector that captures Demographic Parity differences between (MALE, FEMALE), (FEMALE, UNKNOWN), and (MALE, UNKNOWN).
Suppose that the vector for one distribution is (a, 2a, 3a) and the vector for the other is (0.5a, 1.5a, 2a). Even though the individual differences are smaller in the second distribution (for each pair of protected attribute values), an aggregate metric will deem it to be more unfair than the former because the differences between the elements of its vector are more drastic (for the first one, the ratio is 1:2:3 while for the second it is 1:3:4). However, the latter has better Demographic Parity. Hence, there may be conflicting notions of fairness being measured, and it is up to the end user to identify which one they would like to focus on. For the most up-to-date documentation on the supported metrics, please look at the link [here](lift/src/main/scala/com/linkedin/lift/types/BenefitMap.scala), and look for the `computeMetric` method as the starting point. The following aggregate metrics are available: 1. **Generalized Entropy Index:** Computes an average of the relative benefits based on some input parameters. 2. **Atkinson Index:** A derivative of the Generalized Entropy Index. Used more commonly in the field of economics. 3. **Theil's L Index:** The Generalized Entropy Index when its parameter is set to 0. It is more sensitive to differences at the lower end of the distribution (the benefit vector values). 4. **Theil's T Index:** The Generalized Entropy Index when its parameter is set to 1. It is more sensitive to differences at the higher end of the distribution (the benefit vector values). 5. **Coefficient of Variation:** A derivative of the Generalized Entropy Index. It computes the value of the standard deviation divided by the mean of the benefit vector. ================================================ FILE: dependencies.md ================================================ # Dependencies This product automatically downloads the following: 1.
Apache Spark and its subcomponents: While generally licensed under the Apache License 2.0, the Apache Spark project contains subcomponents with separate copyright notices and license terms. See https://github.com/apache/spark/blob/branch-2.3/NOTICE. 2. ScalaTest: While generally licensed under the Apache License 2.0, the ScalaTest library contains subcomponents with separate copyright notices and license terms. See https://github.com/scalatest/scalatest/blob/3.1.x/NOTICE. 3. scopt: While generally licensed under the MIT License, the scopt library contains subcomponents with separate copyright notices and license terms. See https://github.com/scopt/scopt/blob/scopt4/LICENSE.md. 4. TestNG: While generally licensed under the Apache License 2.0, the TestNG library contains subcomponents with separate copyright notices and license terms. See https://github.com/cbeust/testng/blob/master/LICENSE.txt ================================================ FILE: docs/release-notes.md ================================================ *Release notes were automatically generated by [Shipkit](http://shipkit.org/)* #### 0.2.1 - 2021-03-29 - [1 commit](https://github.com/linkedin/LiFT/compare/v0.2.0...v0.2.1) by [Preetam Nandy](https://github.com/preetamnandy) - published to [![Bintray](https://img.shields.io/badge/Bintray-0.2.1-green.svg)](https://bintray.com/linkedin/maven/LiFT/0.2.1) - mitigation: equality of opportunity [(#7)](https://github.com/linkedin/LiFT/pull/7) #### 0.2.0 - 2020-09-15 - [1 commit](https://github.com/linkedin/LiFT/compare/v0.1.4...v0.2.0) by [Sriram Vasudevan](https://github.com/sriramvasudevan) - published to [![Bintray](https://img.shields.io/badge/Bintray-0.2.0-green.svg)](https://bintray.com/linkedin/maven/LiFT/0.2.0) - Update computeJoinedDF to take a DataFrame [(#3)](https://github.com/linkedin/LiFT/pull/3) #### 0.1.4 - 2020-09-10 - [2 commits](https://github.com/linkedin/LiFT/compare/v0.1.3...v0.1.4) by [Sriram 
Vasudevan](https://github.com/sriramvasudevan) - published to [![Bintray](https://img.shields.io/badge/Bintray-0.1.4-green.svg)](https://bintray.com/linkedin/maven/LiFT/0.1.4) - Update Readme to document JCenter details [(#2)](https://github.com/linkedin/LiFT/pull/2) #### 0.1.3 - 2020-09-08 - no code changes (no commits) - published to [![Bintray](https://img.shields.io/badge/Bintray-0.1.3-green.svg)](https://bintray.com/linkedin/maven/LiFT/0.1.3) #### 0.1.2 - 2020-09-07 - 5 commits by [Sriram Vasudevan](https://github.com/sriramvasudevan) - published to [![Bintray](https://img.shields.io/badge/Bintray-0.1.2-green.svg)](https://bintray.com/linkedin/maven/LiFT/0.1.2) - No pull requests referenced in commit messages.
================================================ FILE: equality-of-opportunity.md ================================================
# Equality of Opportunity (EOpp)

## EOpp Definition

Equality of opportunity is one of the most widely used definitions of fairness. For a recommender system, EOpp suggests that randomly chosen "qualified" candidates should be represented equally regardless of which group they belong to; in other words, the exposure of qualified candidates from any group should be equal. Most recommender systems generate scores s(X) (predicting the relevance of an item to a binary response variable Y) to rank candidate items based on a feature set X. In these cases, EOpp corresponds to the independence of s(X) and the characteristic/attribute C given the response/label Y=1, i.e.

![P(s(X) \leq t \mid C= c_1, Y=1) = P(s(X) \leq t \mid C= c_2, Y=1),\forall c_1, c_2.](https://render.githubusercontent.com/render/math?math=P(s(X)%20%5Cleq%20t%20%5Cmid%20C%3D%20c_1%2C%20Y%3D1)%20%3D%20P(s(X)%20%5Cleq%20t%20%5Cmid%20C%3D%20c_2%2C%20Y%3D1)%2C%5Cforall%20c_1%2C%20c_2.)

## EOpp Algorithm

We provide the post-processing technique presented in *[Nandy et al. (2021)](https://arxiv.org/abs/2006.11350)*.
The function eOppTransformation() (see [EOppUtils](lift/src/main/scala/com/linkedin/lift/mitigation/EOppUtils.scala)) can be used to learn a transformation that can be applied to model scores to achieve EOpp. The distribution of the transformed scores can be forced to match the distribution of the scores before transformation by setting the argument originalScale = true. This is useful for blending the transformed scores s\*(X) with the original scores s(X) as t \* s\*(X) + (1-t) \* s(X) to achieve a fairness-performance trade-off by adjusting the tuning parameter t in [0, 1].

## Position bias adjustment

To define EOpp in the presence of position bias, we need to take into account the dependency of the response variable Y on the position where the item is shown. To this end, we denote the counterfactual response when an item appears at position j by Y(j). Furthermore, we use γ to denote the position of an item in the ranking generated by s(X). Therefore, the observed response is given by Y(γ). A scoring function s(X) of a recommendation system satisfies EOpp with respect to a characteristic C if

![P(s(X) \leq t \mid C=c_1,Y(\gamma)=1) = P(s(X) \leq t \mid C=c_2,Y(\gamma)=1), \forall t, c_1, c_2.](https://render.githubusercontent.com/render/math?math=P(s(X)%20%5Cleq%20t%20%5Cmid%20C%3Dc_1%2CY(%5Cgamma)%3D1)%20%3D%20P(s(X)%20%5Cleq%20t%20%5Cmid%20C%3Dc_2%2CY(%5Cgamma)%3D1)%2C%20%5Cforall%20t%2C%20c_1%2C%20c_2.)

We provide a debiasing technique that should be applied before the EOpp algorithm in the presence of position bias. The function debiasPositiveLabelScores() (see [PositionBiasUtils](lift/src/main/scala/com/linkedin/lift/lib/PositionBiasUtils.scala)) removes the effect of the position bias from the training data, and its output can be directly used by eOppTransformation() (see [EOppUtils](lift/src/main/scala/com/linkedin/lift/mitigation/EOppUtils.scala)) for learning the EOpp transformation.
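The blending of transformed and original scores described under EOpp Algorithm can be illustrated with a minimal Scala sketch. It assumes the original scores s(X) and the transformed scores s\*(X) are already available as aligned in-memory sequences; the names `ScoreBlendingSketch` and `blend` are hypothetical and are not part of LiFT's API.

```scala
// Illustrative sketch only: not LiFT code. Names are hypothetical.
object ScoreBlendingSketch {

  /**
   * Convex combination t * s*(X) + (1 - t) * s(X) of transformed and
   * original scores. t = 0 keeps the original scores unchanged, t = 1
   * uses only the EOpp-transformed scores.
   */
  def blend(original: Seq[Double], transformed: Seq[Double], t: Double): Seq[Double] = {
    require(t >= 0.0 && t <= 1.0, "t must lie in [0, 1]")
    require(original.length == transformed.length, "score sequences must be aligned")
    original.zip(transformed).map { case (s, sStar) => t * sStar + (1.0 - t) * s }
  }
}
```

Blending on a common scale is only meaningful when both score sets are comparable, which is presumably why the text recommends originalScale = true before mixing the two.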
## Example

We illustrate the position bias adjusted EOpp algorithm in [EOppUtilsTest](lift/src/test/scala/com/linkedin/lift/mitigation/EOppUtilsTest.scala).

Data Generation (as in *[Nandy et al. (2021)](https://arxiv.org/abs/2006.11350)*): We generate a population of p = 50,000 items, where each item consists of id i, characteristic Ci in {0, 1}, label at position 1 Yi(1) and relevance Ri. We independently generate Ci's from a Bernoulli(0.6) distribution. The conditional distribution of Yi(1) given Ci = 0 is Bernoulli(0.4), and the conditional distribution of Yi(1) given Ci = 1 is Bernoulli(0.5). Finally, Ri | (Ci, Yi(1)) is generated from Gaussian(0.6Yi(1) + 2Ci, 0.5) + (1 - Ci) \* Uniform[0, (1 + Yi(1))]. We consider a recommendation system with K = 50 slots. For each session, we randomly select 50 items from the population and assign a score si = Ri + Gaussian(0, 0.1) to each selected item i = 1,..., 50000. The selected items are then ranked according to si (in descending order) and assigned a position according to rank(i). Finally, the item i at position j gets observed response Y(j) = Y(1) \* Bernoulli(wj) with position bias wj = 1 / log2(1+j).

Validation: We learn the EOpp transformation using [training data](lift/src/test/data/TrainingData.csv) containing 20K i.i.d. sessions (i.e. 20K \* 50 = 1M samples). For testing, we apply the transformation on [validation data](lift/src/test/data/ValidationData.csv) containing 20K i.i.d. sessions. To apply the effect of position bias in the transformed validation data, we multiply the labels Y(1) with Bernoulli(1/(1 + position)) random numbers, where the position corresponds to the rank of an item according to the transformed score. We validate the EOpp transformation by computing the 2nd Wasserstein distance between the transformed positive label score distributions corresponding to C=0 and C=1. Additionally, we validate that the distribution of the transformed scores equals the distribution of the scores before the transformation.
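Two ingredients of the example above, the position bias wj = 1 / log2(1+j) from the data generation step and the 2nd Wasserstein distance used for validation, can be sketched in plain Scala. This is a simplified illustration, not the code in EOppUtilsTest: the names `EOppValidationSketch`, `positionBiasWeight`, `observedResponse` and `wasserstein2` are hypothetical, and the distance assumes two equally sized one-dimensional samples, in which case it reduces to the root mean squared gap between sorted values.

```scala
import scala.util.Random

// Illustrative sketch only: not LiFT code. Names are hypothetical.
object EOppValidationSketch {

  /** Position bias w_j = 1 / log2(1 + j) for a 1-based position j. */
  def positionBiasWeight(position: Int): Double = {
    require(position >= 1, "positions are 1-based")
    math.log(2.0) / math.log(1.0 + position)
  }

  /** Observed response Y(j) = Y(1) * Bernoulli(w_j) for an item shown at position j. */
  def observedResponse(labelAtTop: Int, position: Int, rng: Random): Int =
    if (labelAtTop == 1 && rng.nextDouble() < positionBiasWeight(position)) 1 else 0

  /**
   * Empirical 2-Wasserstein distance between two equally sized 1-D samples.
   * Assumption: equal sample sizes, so the optimal coupling pairs sorted values.
   */
  def wasserstein2(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.nonEmpty && a.length == b.length, "samples must be non-empty and equally sized")
    val squaredGaps = a.sorted.zip(b.sorted).map { case (x, y) => (x - y) * (x - y) }
    math.sqrt(squaredGaps.sum / a.length)
  }
}
```

Under this sketch, a distance close to zero between the transformed positive-label scores for C=0 and C=1 would indicate that the two groups' score distributions nearly coincide. Samples of unequal size would need a quantile-based variant instead.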
================================================ FILE: gradle/java-publication.gradle ================================================ assert plugins.hasPlugin(JavaPlugin) tasks.withType(Jar) { from "$rootDir/LICENSE" from "$rootDir/NOTICE" } // Auxiliary jar files required by Maven module publications task sourcesJar(type: Jar, dependsOn: classes) { classifier 'sources' from sourceSets.main.allSource } task javadocJar(type: Jar, dependsOn: javadoc) { classifier 'javadoc' from javadoc.destinationDir } artifacts { archives sourcesJar archives javadocJar } apply plugin: "maven-publish" // https://docs.gradle.org/current/userguide/publishing_maven.html publishing { publications { liftJar(MavenPublication) { from components.java artifact sourcesJar artifact javadocJar artifactId = project.archivesBaseName pom { name = artifactId description = "A Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows." url = "https://github.com/linkedin/LiFT" licenses { license { name = 'BSD 2-CLAUSE' url = 'https://github.com/linkedin/LiFT/blob/master/LICENSE' distribution = 'repo' } } developers { [ 'sriramvasudevan:Sriram Vasudevan' ].each { devData -> developer { def devInfo = devData.split(':') id = devInfo[0] name = devInfo[1] url = 'https://github.com/' + devInfo[0] roles = ["Core developer"] } } } scm { url = 'https://github.com/linkedin/LiFT.git' } issueManagement { url = 'https://github.com/linkedin/LiFT/issues' system = 'GitHub issues' } ciManagement { url = 'https://travis-ci.com/linkedin/LiFT' system = 'Travis CI' } } } } //useful for testing - running "publish" will create artifacts/pom in a local dir repositories { maven { url = "$rootProject.buildDir/repo" } } } //fleshes out problems with Maven pom generation when building tasks.build.dependsOn("publishLiftJarPublicationToMavenLocal") apply plugin: 'signing' //https://docs.gradle.org/current/userguide/signing_plugin.html signing { if (System.getenv("PGP_KEY")) { 
useInMemoryPgpKeys(System.getenv("PGP_KEY"), System.getenv("PGP_PWD")) sign publishing.publications.liftJar } } ////////////////////////////////// // LinkedIn Artifactory Config ////////////////////////////////// apply plugin: "com.jfrog.artifactory" //https://www.jfrog.com/confluence/display/rtf/gradle+artifactory+plugin artifactory { contextUrl = 'https://linkedin.jfrog.io/artifactory' publish { repository { repoKey = 'LiFT' username = System.getenv('ARTIFACTORY_USER') password = System.getenv('ARTIFACTORY_KEY') maven = true } defaults { publications('liftJar') publishBuildInfo = true publishArtifacts = true publishPom = true publishIvy = true } } clientConfig.setIncludeEnvVars(false) } artifactoryPublish { skip = project.hasProperty('artifactory.dryRun') } ================================================ FILE: gradle/release.gradle ================================================ //Plugin jars are added to the buildscript classpath in the root build.gradle file ////////////////////////////////// // Token Verification Tasks ////////////////////////////////// task checkGitHubToken { doFirst { if (System.getenv("GITHUB_TOKEN") == null) { throw new Exception("Environment variable GITHUB_TOKEN not set."); } println "Using repository " + System.getenv("GITHUB_REPOSITORY") } } task verifyArtifactoryProperties { doFirst { if (!project.hasProperty('artifactory.dryRun')) { if (System.getenv('ARTIFACTORY_USER') == null) { throw new Exception("Environment variable ARTIFACTORY_USER not set."); } if (System.getenv('ARTIFACTORY_KEY') == null) { throw new Exception("Environment variable ARTIFACTORY_KEY not set."); } } } } ////////////////////////////////// // Shipkit Tasks ////////////////////////////////// apply plugin: "org.shipkit.shipkit-auto-version" //https://github.com/shipkit/shipkit-auto-version apply plugin: "org.shipkit.shipkit-changelog" //https://github.com/shipkit/shipkit-changelog tasks.named("generateChangelog") { dependsOn checkGitHubToken previousRevision = 
project.ext.'shipkit-auto-version.previous-tag' githubToken = System.getenv("GITHUB_TOKEN") repository = "linkedin/LiFT" } apply plugin: "org.shipkit.shipkit-github-release" //https://github.com/shipkit/shipkit-changelog tasks.named("githubRelease") { def genTask = tasks.named("generateChangelog").get() dependsOn genTask dependsOn checkGitHubToken repository = genTask.repository changelog = genTask.outputFile githubToken = System.getenv("GITHUB_TOKEN") newTagRevision = System.getenv("GITHUB_SHA") } ////////////////////////////////// // Maven Central Config ////////////////////////////////// apply plugin: "io.github.gradle-nexus.publish-plugin" //https://github.com/gradle-nexus/publish-plugin/ nexusPublishing { repositories { if (System.getenv("SONATYPE_PWD")) { sonatype { username = System.getenv("SONATYPE_USER") password = System.getenv("SONATYPE_PWD") } } } } ////////////////////////////////// // Additional Release Tasks ////////////////////////////////// task artifactoryPublishAll { description = "Runs 'artifactoryPublish' tasks from all projects" } allprojects { tasks.matching { it.name == "artifactoryPublish" }.all { it.dependsOn verifyArtifactoryProperties artifactoryPublishAll.dependsOn it } } ================================================ FILE: gradle/wrapper/gradle-wrapper.properties ================================================ distributionBase=GRADLE_USER_HOME distributionPath=wrapper/dists distributionUrl=https\://services.gradle.org/distributions/gradle-6.8.3-bin.zip zipStoreBase=GRADLE_USER_HOME zipStorePath=wrapper/dists ================================================ FILE: gradle.properties ================================================ org.gradle.caching=true ================================================ FILE: gradlew ================================================ #!/usr/bin/env sh # # Copyright 2015 the original author or authors. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # https://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ############################################################################## ## ## Gradle start up script for UN*X ## ############################################################################## # Attempt to set APP_HOME # Resolve links: $0 may be a link PRG="$0" # Need this for relative symlinks. while [ -h "$PRG" ] ; do ls=`ls -ld "$PRG"` link=`expr "$ls" : '.*-> \(.*\)$'` if expr "$link" : '/.*' > /dev/null; then PRG="$link" else PRG=`dirname "$PRG"`"/$link" fi done SAVED="`pwd`" cd "`dirname \"$PRG\"`/" >/dev/null APP_HOME="`pwd -P`" cd "$SAVED" >/dev/null APP_NAME="Gradle" APP_BASE_NAME=`basename "$0"` # Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"' # Use the maximum available, or set MAX_FD != -1 to use that value. MAX_FD="maximum" warn () { echo "$*" } die () { echo echo "$*" echo exit 1 } # OS specific support (must be 'true' or 'false'). cygwin=false msys=false darwin=false nonstop=false case "`uname`" in CYGWIN* ) cygwin=true ;; Darwin* ) darwin=true ;; MINGW* ) msys=true ;; NONSTOP* ) nonstop=true ;; esac CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar # Determine the Java command to use to start the JVM. 
if [ -n "$JAVA_HOME" ] ; then if [ -x "$JAVA_HOME/jre/sh/java" ] ; then # IBM's JDK on AIX uses strange locations for the executables JAVACMD="$JAVA_HOME/jre/sh/java" else JAVACMD="$JAVA_HOME/bin/java" fi if [ ! -x "$JAVACMD" ] ; then die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME Please set the JAVA_HOME variable in your environment to match the location of your Java installation." fi else JAVACMD="java" which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. Please set the JAVA_HOME variable in your environment to match the location of your Java installation." fi # Increase the maximum file descriptors if we can. if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then MAX_FD_LIMIT=`ulimit -H -n` if [ $? -eq 0 ] ; then if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then MAX_FD="$MAX_FD_LIMIT" fi ulimit -n $MAX_FD if [ $? -ne 0 ] ; then warn "Could not set maximum file descriptor limit: $MAX_FD" fi else warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT" fi fi # For Darwin, add options to specify how the application appears in the dock if $darwin; then GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\"" fi # For Cygwin or MSYS, switch paths to Windows format before running java if [ "$cygwin" = "true" -o "$msys" = "true" ] ; then APP_HOME=`cygpath --path --mixed "$APP_HOME"` CLASSPATH=`cygpath --path --mixed "$CLASSPATH"` JAVACMD=`cygpath --unix "$JAVACMD"` # We build the pattern for arguments to be converted via cygpath ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null` SEP="" for dir in $ROOTDIRSRAW ; do ROOTDIRS="$ROOTDIRS$SEP$dir" SEP="|" done OURCYGPATTERN="(^($ROOTDIRS))" # Add a user-defined pattern to the cygpath arguments if [ "$GRADLE_CYGPATTERN" != "" ] ; then OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)" fi # Now convert the arguments - kludge to limit ourselves to 
/bin/sh i=0 for arg in "$@" ; do CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -` CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"` else eval `echo args$i`="\"$arg\"" fi i=$((i+1)) done case $i in (0) set -- ;; (1) set -- "$args0" ;; (2) set -- "$args0" "$args1" ;; (3) set -- "$args0" "$args1" "$args2" ;; (4) set -- "$args0" "$args1" "$args2" "$args3" ;; (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;; (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;; (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;; (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;; (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;; esac fi # Escape application args save () { for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done echo " " } APP_ARGS=$(save "$@") # Collect all arguments for the java command, following the shell quoting and substitution rules eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS" # by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then cd "$(dirname "$0")" fi exec "$JAVACMD" "$@" ================================================ FILE: gradlew.bat ================================================ @rem @rem Copyright 2015 the original author or authors. @rem @rem Licensed under the Apache License, Version 2.0 (the "License"); @rem you may not use this file except in compliance with the License. 
@rem You may obtain a copy of the License at @rem @rem https://www.apache.org/licenses/LICENSE-2.0 @rem @rem Unless required by applicable law or agreed to in writing, software @rem distributed under the License is distributed on an "AS IS" BASIS, @rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @rem See the License for the specific language governing permissions and @rem limitations under the License. @rem @if "%DEBUG%" == "" @echo off @rem ########################################################################## @rem @rem Gradle startup script for Windows @rem @rem ########################################################################## @rem Set local scope for the variables with windows NT shell if "%OS%"=="Windows_NT" setlocal set DIRNAME=%~dp0 if "%DIRNAME%" == "" set DIRNAME=. set APP_BASE_NAME=%~n0 set APP_HOME=%DIRNAME% @rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m" @rem Find java.exe if defined JAVA_HOME goto findJavaFromJavaHome set JAVA_EXE=java.exe %JAVA_EXE% -version >NUL 2>&1 if "%ERRORLEVEL%" == "0" goto init echo. echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. echo. echo Please set the JAVA_HOME variable in your environment to match the echo location of your Java installation. goto fail :findJavaFromJavaHome set JAVA_HOME=%JAVA_HOME:"=% set JAVA_EXE=%JAVA_HOME%/bin/java.exe if exist "%JAVA_EXE%" goto init echo. echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% echo. echo Please set the JAVA_HOME variable in your environment to match the echo location of your Java installation. goto fail :init @rem Get command-line arguments, handling Windows variants if not "%OS%" == "Windows_NT" goto win9xME_args :win9xME_args @rem Slurp the command line arguments. 
set CMD_LINE_ARGS= set _SKIP=2 :win9xME_args_slurp if "x%~1" == "x" goto execute set CMD_LINE_ARGS=%* :execute @rem Setup the command line set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar @rem Execute Gradle "%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS% :end @rem End local scope for the variables with windows NT shell if "%ERRORLEVEL%"=="0" goto mainEnd :fail rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of rem the _cmd.exe /c_ return code! if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1 exit /b 1 :mainEnd if "%OS%"=="Windows_NT" endlocal :omega ================================================ FILE: lift/build.gradle ================================================ import org.gradle.util.VersionNumber plugins { id 'scala' } def scalaVersion = findProperty("scalaVersion") ?: "2.11.8" // Scala 2.11.8 is the default Scala build version. println "Scala version: $scalaVersion" // If scalaVersion == "2.11.8", then scalaVersionShort == "2.11". def scalaVersionShort = "${VersionNumber.parse(scalaVersion).getMajor()}.${VersionNumber.parse(scalaVersion).getMinor()}" def sparkVersion = findProperty("sparkVersion") ?: "2.3.0" // Spark 2.3.0 is the default Spark build version. 
println "Spark version: $sparkVersion" dependencies { if(VersionNumber.parse(sparkVersion) >= VersionNumber.parse("2.4.0")) { compile("org.apache.spark:spark-avro_$scalaVersionShort:$sparkVersion") } else { compile("com.databricks:spark-avro_$scalaVersionShort:4.0.0") } compile("com.github.scopt:scopt_$scalaVersionShort:3.5.0") compile("org.apache.spark:spark-core_$scalaVersionShort:$sparkVersion") compile("org.apache.spark:spark-sql_$scalaVersionShort:$sparkVersion") compile("org.apache.spark:spark-mllib_$scalaVersionShort:$sparkVersion") compile("org.scalatest:scalatest_$scalaVersionShort:3.1.0") compile("org.testng:testng:6.8.8") } test { useTestNG() } archivesBaseName = "${project.name}_${sparkVersion}_${scalaVersionShort}" apply from: "$rootDir/gradle/java-publication.gradle" ================================================ FILE: lift/src/main/scala/com/linkedin/lift/eval/FairnessMetricsUtils.scala ================================================ package com.linkedin.lift.eval import com.linkedin.lift.lib.{DivergenceUtils, PermutationTestUtils, StatsUtils} import com.linkedin.lift.types.{BenefitMap, Distribution, FairnessResult, ModelPrediction} import org.apache.spark.sql.functions._ import org.apache.spark.sql.types.IntegerType import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession} /** * Utilities that stitch together various fairness metrics methods, * to provide higher-level APIs. */ object FairnessMetricsUtils { /** * Extract a flattened DataFrame containing (id, label, score) columns. * The input DataFrame is expected to have an ID field, a label field and a score field.
* * @param df The input DataFrame * @param uidField The unique ID field, like a Member ID * @param labelField The label field * @param scoreField The score field * @param groupIdField The grouping ID field * @return A flattened DataFrame containing the above 3 fields, plus the * grouping ID field when it is specified and present in the input */ def projectIdLabelsAndScores(df: DataFrame, uidField: String, labelField: String, scoreField: String, groupIdField: String): DataFrame = { if (groupIdField.isEmpty) { df.select(col(uidField), col(labelField), col(scoreField)) } else { val allFields = df.schema.fieldNames if (allFields.contains(groupIdField)) { df.select(col(uidField), col(labelField), col(scoreField), col(groupIdField)) } else { df.select(col(uidField), col(labelField), col(scoreField)) } } } /** * Evaluate if the difference in metric values is significant, for every * pair of protected attribute values (in the given set of predictions), * using permutation testing. * * @param predictions The input predictions to evaluate * @param dimType The dimension type, such as gender or age. * @param metrics The metrics of interest * @param numTrials Number of trials to run the permutation test for * @param seed A random seed. * @return A sequence containing the results of the permutation test, one * per metric and pair of protected attribute values */ def computePermutationTestMetrics(predictions: Seq[ModelPrediction], dimType: String, metrics: Seq[String], numTrials: Int, seed: Long): Seq[FairnessResult] = { // Compute all subsets of size 2 - nC2 such pairs in total val allDimValPairs = predictions.map(_.dimensionValue) .toSet .subsets(2) .toList metrics.flatMap { metric => allDimValPairs.map { dimValPair => PermutationTestUtils.permutationTest(predictions, dimType, dimValPair.head, dimValPair.last, metric, numTrials, seed) } } } /** * Evaluate a requested set of overall fairness metrics on a benefit * vector generated by computing a benefit metric for each * protected attribute value (in the given set of predictions).
* * @param sampledDF The sampled input DataFrame * @param args The Model Fairness Measurement command line args * @return A sequence of various model-performance-related fairness metrics. */ def computeModelPerformanceMetrics(sampledDF: DataFrame, args: MeasureModelFairnessMetricsCmdLineArgs): Seq[FairnessResult] = { val samplePredictions = ModelPrediction.compute(sampledDF, args.labelField, args.scoreField, args.groupIdField, args.protectedAttributeField) val permutationTestMetrics = computePermutationTestMetrics( samplePredictions, args.protectedAttributeField, args.permutationMetrics, args.numTrials, args.seed) val benefitMaps = args.performanceBenefitMetrics.map { benefitMetric => BenefitMap.compute(samplePredictions, args.protectedAttributeField, benefitMetric) } val overallFairnessMetrics = benefitMaps.flatMap(_.computeOverallMetrics(args.overallMetrics)) permutationTestMetrics ++ overallFairnessMetrics } /** * Join the input DataFrame with the protectedAttribute DataFrame, and return * the input DataFrame appended with the protectedAttribute. * * @param protectedDF DataFrame containing the protected attribute data * @param df The input DataFrame * @param uidField The unique ID field, such as memberId * @param protectedDatasetPath Path to the protected dataset. If empty, we * attempt to load the right dataset based on the * protectedAttribute specified. * @param uidProtectedAttributeField The uid field of the protected * attribute dataset * @param protectedAttributeField The protected attribute field in the * protectedAttribute DataFrame * @return The joined DataFrame */ def computeJoinedDF(protectedDF: DataFrame, df: DataFrame, uidField: String, protectedDatasetPath: String, uidProtectedAttributeField: String, protectedAttributeField: String): DataFrame = { protectedDF.select(col(uidProtectedAttributeField).as(uidField), col(protectedAttributeField)) .join(df, uidField) } /** * Computes a reference distribution based on the input dataset's * distribution. 
* * @param inputDistr The input dataset's distribution of protectedAttributes only * @param referenceDistribution The kind of reference distribution desired. * Currently, only a uniform distribution is supported. * @return The computed reference distribution, or None if the specified * referenceDistribution parameter is invalid. */ def computeReferenceDistributionOpt(inputDistr: Distribution, referenceDistribution: String): Option[Distribution] = { if (referenceDistribution != "UNIFORM") { None } else { val numDims = inputDistr.entries.size val uniformWeight: Double = 1.0 / numDims val uniformEntries = inputDistr.entries.map { case (dimVal, _) => (dimVal, uniformWeight) } Some(Distribution(uniformEntries)) } } /** * Computes all requested distance and divergence metrics for a given * distribution. It compares it to a reference distribution if necessary. * * We assume that the original and reference distributions are for the training * data, and are over (label, protectedAttribute), and that the reference * distribution also contains similar dimensions. * * @param distribution The distribution of (label, protectedAttribute) for the * training dataset * @param referenceDistrOpt An optional field that contains a reference * distribution of (label, protectedAttribute) to * compare against. If not provided, distance and * divergence metrics that perform comparisons will * return empty results. * @param args The set of parsed command line arguments for this job. * @return A sequence of Metric values that contain the name of the metric, * any parameters used, and the result of the computation. 
*/ def computeDatasetMetrics(distribution: Distribution, referenceDistrOpt: Option[Distribution], args: MeasureDatasetFairnessMetricsCmdLineArgs, scoreField: String = ""): Seq[FairnessResult] = { val computedDistanceMetrics = DivergenceUtils.computeDistanceMetrics( args.distanceMetrics, distribution, referenceDistrOpt, args.labelField, scoreField, args.protectedAttributeField) val benefitMaps = DivergenceUtils.computeDistanceMetrics( args.benefitMetrics, distribution, referenceDistrOpt, args.labelField, scoreField, args.protectedAttributeField) .map(_.toBenefitMap) val computedOverallMetrics = benefitMaps.flatMap(_.computeOverallMetrics(args.overallMetrics)) computedDistanceMetrics ++ computedOverallMetrics } /** * Computes a DataFrame with probabilities for the score field. If the * threshold is specified, we use it to obtain 0/1 values for the scores, * thus giving us probabilities. If we have raw scores, we convert it into * probabilities using the sigmoid function. If we have probabilities, we * return them as is. * * @param df The input DataFrame * @param thresholdOpt An optional threshold that can be provided to convert * the scores into 0/1 predictions directly. * @param labelField The label field * @param scoreField The score field * @param protectedAttributeField The protected attribute field * @param scoreType Whether the scores are raw scores or probabilities * @return The input DataFrame with its scores transformed into probabilities. 
   */
  def computeProbabilityDF(df: DataFrame, thresholdOpt: Option[Double],
    labelField: String, scoreField: String, protectedAttributeField: String,
    scoreType: String): DataFrame = {
    val probDF =
      if (scoreType.equals("RAW")) {
        // Compute sigmoid(x) = 1.0 / (1.0 + exp(-x))
        df.select(col(labelField),
          (lit(1.0) / (lit(1.0) + exp(-col(scoreField))))
            .as(scoreField),
          col(protectedAttributeField))
      } else {
        df.select(labelField, scoreField, protectedAttributeField)
      }

    thresholdOpt.map { threshold =>
      probDF.select(col(labelField),
        (col(scoreField) > threshold).cast(IntegerType).as(scoreField),
        col(protectedAttributeField))
    }.getOrElse(probDF)
  }

  /**
   * Computes all requested model-related fairness metrics for a given
   * dataset. We assume that the dataset has the (label, score/prediction,
   * protectedAttribute) fields at the very least.
   *
   * At a high level, there are three kinds of metrics. The first involves
   * checking for statistically significant differences in a particular metric
   * across various protected groups. The second kind computes conventional
   * notions of fairness such as Demographic Parity and Equalized Odds. The
   * third kind is to compute aggregate metrics. This can further be divided
   * into two kinds: the first is used to summarize metrics such as Demographic
   * Parity and Equalized Odds, while the second is used to summarize model
   * performance metrics across various groups (such as precision, TPR, FPR, AUC etc.).
   *
   * This method works by sampling the input data and computing metrics on the
   * sample. Typically, 50k-100k rows of data are more than sufficient to compute
   * good estimates without having to analyze the entire output. If the number of
   * data points in the input DataFrame is less than this, the entire dataset is
   * analyzed.
   *
   * @param df The input DataFrame
   * @param referenceDistrOpt An optional reference distribution to compare against
   * @param args The set of model-related fairness measurement command line args
   * @return A sequence of Metric values that contain the name of the metric,
   *         any parameters used, and the result of the computation.
   */
  def computeModelMetrics(df: DataFrame, referenceDistrOpt: Option[Distribution],
    args: MeasureModelFairnessMetricsCmdLineArgs): Seq[FairnessResult] = {
    val probabilityDF = computeProbabilityDF(df, args.thresholdOpt,
      args.labelField, args.scoreField, args.protectedAttributeField,
      args.scoreType)
    val distribution = DivergenceUtils
      .computeGeneralizedPredictionCountDistribution(probabilityDF,
        args.labelField, args.scoreField, args.protectedAttributeField)
    val distMetrics = computeDatasetMetrics(distribution, referenceDistrOpt,
      MeasureDatasetFairnessMetricsCmdLineArgs(
        datasetPath = args.datasetPath,
        protectedDatasetPath = args.protectedDatasetPath,
        uidField = args.uidField,
        labelField = args.labelField,
        protectedAttributeField = args.protectedAttributeField,
        outputPath = args.outputPath,
        referenceDistribution = args.referenceDistribution,
        distanceMetrics = args.distanceMetrics,
        overallMetrics = args.overallMetrics,
        benefitMetrics = args.distanceBenefitMetrics),
      args.scoreField)

    val sampledDF =
      if (args.groupIdField.isEmpty) {
        StatsUtils.sampleDataFrame(probabilityDF, args.labelField,
          args.approxRows, args.labelZeroPercentage, args.seed)
      } else {
        StatsUtils.sampleDataFrameByGroupId(df, args.labelField,
          args.scoreField, args.groupIdField, args.protectedAttributeField,
          args.approxRows, args.seed)
      }
    val modelPerfMetrics = computeModelPerformanceMetrics(sampledDF, args)

    distMetrics ++ modelPerfMetrics
  }

  /**
   * Writes a sequence of FairnessResults out to disk.
   *
   * @param spark The Spark Session to use
   * @param dataFormat Output data format
   * @param dataOptions Options for the DataFrameWriter
   * @param outputPath Output path for the results
   * @param fairnessResults Sequence of results to be written out
   */
  def writeFairnessResults(spark: SparkSession, dataFormat: String,
    dataOptions: Map[String, String], outputPath: String,
    fairnessResults: Seq[FairnessResult]): Unit = {
    FairnessResult.toDF(spark, fairnessResults)
      .repartition(1)
      .write
      .mode(SaveMode.Overwrite)
      .format(dataFormat)
      .options(dataOptions)
      .save(outputPath)
  }

  /**
   * Computes dataset metrics and writes them out for tracking purposes.
   *
   * @param df The input data
   * @param referenceDistrOpt An optional reference distribution
   * @param args Command line args for dataset metrics measurement
   */
  def computeAndWriteDatasetMetrics(df: DataFrame,
    referenceDistrOpt: Option[Distribution],
    args: MeasureDatasetFairnessMetricsCmdLineArgs): Unit = {
    // Compute the label-protected attribute distribution of the input data
    val distribution = Distribution.compute(df,
      Set(args.labelField, args.protectedAttributeField))

    // Passing in the appropriate parameters to this API returns the fairness metrics
    val fairnessMetrics = computeDatasetMetrics(distribution, referenceDistrOpt, args)

    // The above fairness metrics can be written out to HDFS
    writeFairnessResults(df.sparkSession, args.dataFormat, args.dataOptions,
      args.outputPath, fairnessMetrics)
  }

  /**
   * Computes model metrics and logs the output for tracking purposes.
   *
   * @param df The input data
   * @param referenceDistrOpt An optional reference distribution
   * @param args Command line args for model metrics measurement
   */
  def computeAndWriteModelMetrics(df: DataFrame,
    referenceDistrOpt: Option[Distribution],
    args: MeasureModelFairnessMetricsCmdLineArgs): Unit = {
    // Passing in the appropriate parameters to this API returns the fairness metrics
    val fairnessMetrics = computeModelMetrics(df, referenceDistrOpt, args)

    // The above fairness metrics can be written out to HDFS
    writeFairnessResults(df.sparkSession, args.dataFormat, args.dataOptions,
      args.outputPath, fairnessMetrics)
  }
}

================================================
FILE: lift/src/main/scala/com/linkedin/lift/eval/MeasureDatasetFairnessMetricsCmdLineArgs.scala
================================================
package com.linkedin.lift.eval

/**
 * Contains the dataset metrics command line arguments
 *
 * @param datasetPath Input data path
 * @param protectedDatasetPath Input path to the protected dataset (optional).
 *                             If not provided, the library attempts to use
 *                             the right dataset based on the protected attribute.
 * @param dataFormat Format of the input datasets. This is the parameter passed
 *                   to the Spark reader's format method. Defaults to avro.
 * @param dataOptions A map of options to be used with Spark's reader (optional).
 * @param uidField The unique ID field, like a memberId field.
 * @param labelField The label field
 * @param protectedAttributeField The protected attribute field
 * @param uidProtectedAttributeField The uid field for the protected attribute dataset
 * @param outputPath Output data path
 * @param referenceDistribution A reference distribution to compare against (optional).
 *                              Only accepted value currently is UNIFORM.
 * @param distanceMetrics Distance and divergence metrics like SKEWS, INF_NORM_DIST,
 *                        TOTAL_VAR_DIST, JS_DIVERGENCE, KL_DIVERGENCE and
 *                        DEMOGRAPHIC_PARITY (optional).
 * @param overallMetrics Aggregate metrics like GENERALIZED_ENTROPY_INDEX,
 *                       ATKINSONS_INDEX, THEIL_L_INDEX, THEIL_T_INDEX and
 *                       COEFFICIENT_OF_VARIATION, along with their corresponding
 *                       parameters.
 * @param benefitMetrics The distance/divergence metrics to use as the benefit
 *                       vector when computing the overall metrics. Acceptable
 *                       values are SKEWS and DEMOGRAPHIC_PARITY.
 */
case class MeasureDatasetFairnessMetricsCmdLineArgs(
  datasetPath: String = "",
  protectedDatasetPath: String = "",
  dataFormat: String = "com.databricks.spark.avro",
  dataOptions: Map[String, String] = Map(),
  uidField: String = "",
  labelField: String = "",
  protectedAttributeField: String = "",
  uidProtectedAttributeField: String = "memberId",
  outputPath: String = "",
  referenceDistribution: String = "",
  distanceMetrics: Seq[String] = Seq(),
  overallMetrics: Map[String, String] = Map(),
  benefitMetrics: Seq[String] = Seq()
)

object MeasureDatasetFairnessMetricsCmdLineArgs {
  /**
   * Parse command line arguments to generate a structured case class.
   *
   * @param args The command line args
   * @return A case class with the populated parameters
   */
  def parseArgs(args: Seq[String]): MeasureDatasetFairnessMetricsCmdLineArgs = {
    val parser = new scopt.OptionParser[MeasureDatasetFairnessMetricsCmdLineArgs](
      "MeasureDatasetFairnessMetrics") {
      opt[String]("datasetPath") required() action { (x, c) =>
        c.copy(datasetPath = x)
      }
      opt[String]("protectedDatasetPath") optional() action { (x, c) =>
        c.copy(protectedDatasetPath = x)
      }
      opt[String]("dataFormat") optional() action { (x, c) =>
        c.copy(dataFormat = x)
      }
      opt[Map[String, String]]("dataOptions") optional() action { (x, c) =>
        c.copy(dataOptions = x)
      }
      opt[String]("uidField") required() action { (x, c) =>
        c.copy(uidField = x)
      }
      opt[String]("labelField") required() action { (x, c) =>
        c.copy(labelField = x)
      }
      opt[String]("protectedAttributeField") required() action { (x, c) =>
        c.copy(protectedAttributeField = x)
      }
      opt[String]("uidProtectedAttributeField") optional() action { (x, c) =>
        c.copy(uidProtectedAttributeField = x)
      }
      opt[String]("outputPath") required() action { (x, c) =>
        c.copy(outputPath = x)
      }
      opt[String]("referenceDistribution") optional() action { (x, c) =>
        c.copy(referenceDistribution = x)
      }
      opt[Seq[String]]("distanceMetrics") optional() action { (x, c) =>
        c.copy(distanceMetrics = x)
      }
      opt[Map[String, String]]("overallMetrics") optional() action { (x, c) =>
        c.copy(overallMetrics = x)
      }
      opt[Seq[String]]("benefitMetrics") optional() action { (x, c) =>
        c.copy(benefitMetrics = x)
      }
    }

    // If the parser was unable to read the arguments correctly,
    // this will generate an exception and end the job
    val cmdLineArgsOpt: Option[MeasureDatasetFairnessMetricsCmdLineArgs] = parser.parse(
      args, MeasureDatasetFairnessMetricsCmdLineArgs())
    require(cmdLineArgsOpt.isDefined)
    cmdLineArgsOpt.get
  }
}

================================================
FILE: lift/src/main/scala/com/linkedin/lift/eval/MeasureModelFairnessMetricsCmdLineArgs.scala
================================================
package com.linkedin.lift.eval

/**
 * Contains the model metrics command line arguments
 *
 * @param datasetPath Input data path
 * @param protectedDatasetPath Input path to the protected dataset (optional).
 *                             If not provided, the library attempts to use
 *                             the right dataset based on the protected attribute.
 * @param dataFormat Format of the input datasets. This is the parameter passed
 *                   to the Spark reader's format method. Defaults to avro.
 * @param dataOptions A map of options to be used with Spark's reader (optional).
 * @param uidField The unique ID field, like a memberId field.
 * @param labelField The label field
 * @param scoreField The score field
 * @param scoreType Whether the scores are raw scores or probabilities.
 *                  Accepted values are RAW or PROB.
 * @param protectedAttributeField The protected attribute field
 * @param uidProtectedAttributeField The uid field for the protected attribute dataset
 * @param groupIdField An optional field to be used for grouping, in case of ranking metrics
 * @param outputPath Output data path
 * @param referenceDistribution A reference distribution to compare against (optional).
 *                              Only accepted value currently is UNIFORM.
 * @param approxRows The approximate number of rows to sample from the input data
 *                   when computing model metrics. The final sampled value is
 *                   min(numRowsInDataset, approxRows)
 * @param labelZeroPercentage The percentage of the sampled data that must
 *                            be negatively labeled. This is useful in case
 *                            the input data is highly skewed and you believe
 *                            that stratified sampling will not obtain a sufficient
 *                            number of examples of a certain label.
 * @param thresholdOpt An optional value that contains a threshold. It is used
 *                     in case you want to generate hard binary classifications.
 *                     If not provided and you request metrics that depend on
 *                     explicit label predictions (e.g., precision), the scoreType
 *                     information is used to convert the scores into the
 *                     probabilities of predicting positives.
 *                     This is used for computing expected positive
 *                     prediction counts.
 * @param numTrials Number of trials to run the permutation test for. More trials
 *                  yield results with lower variance in the computed p-value,
 *                  but take more time
 * @param seed The random value seed
 * @param distanceMetrics Distance and divergence metrics that are to be computed.
 *                        These are metrics such as Demographic Parity
 *                        and Equalized Odds.
 * @param permutationMetrics The metrics to use for permutation testing
 * @param distanceBenefitMetrics The model metrics that are to be used for
 *                               computing benefit vectors, one for each
 *                               distance metric specified.
 * @param performanceBenefitMetrics The model metrics that are to be used for
 *                                  computing benefit vectors, one for each
 *                                  model performance metric specified.
 * @param overallMetrics The aggregate metrics that are to be computed on each
 *                       of the benefit vectors generated.
 */
case class MeasureModelFairnessMetricsCmdLineArgs(
  datasetPath: String = "",
  protectedDatasetPath: String = "",
  dataFormat: String = "com.databricks.spark.avro",
  dataOptions: Map[String, String] = Map(),
  uidField: String = "",
  labelField: String = "",
  scoreField: String = "",
  scoreType: String = "PROB",
  protectedAttributeField: String = "",
  uidProtectedAttributeField: String = "memberId",
  groupIdField: String = "",
  outputPath: String = "",
  referenceDistribution: String = "",
  approxRows: Long = 500000L,
  labelZeroPercentage: Double = -1.0,
  thresholdOpt: Option[Double] = None,
  numTrials: Int = 1000,
  seed: Long = 0L,
  distanceMetrics: Seq[String] = Seq(),
  permutationMetrics: Seq[String] = Seq(),
  distanceBenefitMetrics: Seq[String] = Seq(),
  performanceBenefitMetrics: Seq[String] = Seq(),
  overallMetrics: Map[String, String] = Map()
)

object MeasureModelFairnessMetricsCmdLineArgs {
  /**
   * Parse command line arguments to generate a structured case class.
   *
   * @param args The command line args
   * @return A case class with the populated parameters
   */
  def parseArgs(args: Seq[String]): MeasureModelFairnessMetricsCmdLineArgs = {
    val parser = new scopt.OptionParser[MeasureModelFairnessMetricsCmdLineArgs](
      "MeasureModelFairnessMetrics") {
      opt[String]("datasetPath") required() action { (x, c) =>
        c.copy(datasetPath = x)
      }
      opt[String]("protectedDatasetPath") optional() action { (x, c) =>
        c.copy(protectedDatasetPath = x)
      }
      opt[String]("dataFormat") optional() action { (x, c) =>
        c.copy(dataFormat = x)
      }
      opt[Map[String, String]]("dataOptions") optional() action { (x, c) =>
        c.copy(dataOptions = x)
      }
      opt[String]("uidField") required() action { (x, c) =>
        c.copy(uidField = x)
      }
      opt[String]("labelField") required() action { (x, c) =>
        c.copy(labelField = x)
      }
      opt[String]("scoreField") required() action { (x, c) =>
        c.copy(scoreField = x)
      }
      opt[String]("scoreType") required() action { (x, c) =>
        c.copy(scoreType = x)
      }
      opt[String]("protectedAttributeField") required() action { (x, c) =>
        c.copy(protectedAttributeField = x)
      }
      opt[String]("uidProtectedAttributeField") optional() action { (x, c) =>
        c.copy(uidProtectedAttributeField = x)
      }
      opt[String]("groupIdField") optional() action { (x, c) =>
        c.copy(groupIdField = x)
      }
      opt[String]("outputPath") required() action { (x, c) =>
        c.copy(outputPath = x)
      }
      opt[String]("referenceDistribution") optional() action { (x, c) =>
        c.copy(referenceDistribution = x)
      }
      opt[Long]("approxRows") optional() action { (x, c) =>
        c.copy(approxRows = x)
      }
      opt[Double]("labelZeroPercentage") optional() action { (x, c) =>
        c.copy(labelZeroPercentage = x)
      }
      opt[Double]("threshold") optional() action { (x, c) =>
        c.copy(thresholdOpt = Some(x))
      }
      opt[Int]("numTrials") optional() action { (x, c) =>
        c.copy(numTrials = x)
      }
      opt[Long]("seed") optional() action { (x, c) =>
        c.copy(seed = x)
      }
      opt[Seq[String]]("distanceMetrics") optional() action { (x, c) =>
        c.copy(distanceMetrics = x)
      }
      opt[Seq[String]]("permutationMetrics")
        optional() action { (x, c) =>
        c.copy(permutationMetrics = x)
      }
      opt[Map[String, String]]("overallMetrics") optional() action { (x, c) =>
        c.copy(overallMetrics = x)
      }
      opt[Seq[String]]("distanceBenefitMetrics") optional() action { (x, c) =>
        c.copy(distanceBenefitMetrics = x)
      }
      opt[Seq[String]]("performanceBenefitMetrics") optional() action { (x, c) =>
        c.copy(performanceBenefitMetrics = x)
      }
    }

    // If the parser was unable to read the arguments correctly,
    // this will generate an exception and end the job
    val cmdLineArgsOpt: Option[MeasureModelFairnessMetricsCmdLineArgs] = parser.parse(
      args, MeasureModelFairnessMetricsCmdLineArgs())
    require(cmdLineArgsOpt.isDefined)
    cmdLineArgsOpt.get
  }
}

================================================
FILE: lift/src/main/scala/com/linkedin/lift/eval/jobs/MeasureDatasetFairnessMetrics.scala
================================================
package com.linkedin.lift.eval.jobs

import com.linkedin.lift.eval.{FairnessMetricsUtils, MeasureDatasetFairnessMetricsCmdLineArgs}
import com.linkedin.lift.types.Distribution
import org.apache.spark.sql.SparkSession

/**
 * A basic dataset-level fairness metrics measurement program. If your use case
 * is more involved, you can create a similar wrapper driver program that
 * prepares the data and calls the computeDatasetMetrics API.
 */
object MeasureDatasetFairnessMetrics {
  /**
   * Driver program to measure various fairness metrics
   *
   * @param progArgs Command line arguments
   */
  def main(progArgs: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(getClass.getSimpleName)
      .getOrCreate()

    val args = MeasureDatasetFairnessMetricsCmdLineArgs.parseArgs(progArgs)

    // One could choose to do their own preprocessing here
    // For example, filtering out only certain records based on some threshold
    val dfReader = spark.read.format(args.dataFormat).options(args.dataOptions)
    val df = dfReader.load(args.datasetPath)
      .select(args.uidField, args.labelField)
    val protectedDF = dfReader.load(args.protectedDatasetPath)

    // Similar preprocessing can be done with the protected attribute data
    val joinedDF = FairnessMetricsUtils.computeJoinedDF(protectedDF, df, args.uidField,
      args.protectedDatasetPath, args.uidProtectedAttributeField,
      args.protectedAttributeField)

    // Input distributions are computed using the joined data
    val referenceDistrOpt =
      if (args.referenceDistribution.isEmpty) {
        None
      } else {
        val distribution = Distribution.compute(joinedDF,
          Set(args.labelField, args.protectedAttributeField))
        FairnessMetricsUtils.computeReferenceDistributionOpt(
          distribution, args.referenceDistribution)
      }

    // Passing in the appropriate parameters to this API computes and writes
    // out the fairness metrics
    FairnessMetricsUtils.computeAndWriteDatasetMetrics(joinedDF,
      referenceDistrOpt, args)
  }
}

================================================
FILE: lift/src/main/scala/com/linkedin/lift/eval/jobs/MeasureModelFairnessMetrics.scala
================================================
package com.linkedin.lift.eval.jobs

import com.linkedin.lift.eval.{FairnessMetricsUtils, MeasureModelFairnessMetricsCmdLineArgs}
import com.linkedin.lift.lib.DivergenceUtils
import org.apache.spark.sql.SparkSession

/**
 * A basic model-level fairness metrics measurement program.
 * If your use case is more involved, you can create a similar wrapper driver
 * program that prepares the data and calls the computeModelMetrics API.
 */
object MeasureModelFairnessMetrics {
  /**
   * Driver program to measure various fairness metrics
   *
   * @param progArgs Command line arguments
   */
  def main(progArgs: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(getClass.getSimpleName)
      .getOrCreate()

    val args = MeasureModelFairnessMetricsCmdLineArgs.parseArgs(progArgs)

    // One could choose to do their own preprocessing here
    // For example, filtering out only certain records based on some threshold
    val dfReader = spark.read.format(args.dataFormat).options(args.dataOptions)
    val df = FairnessMetricsUtils.projectIdLabelsAndScores(dfReader.load(args.datasetPath),
      args.uidField, args.labelField, args.scoreField, args.groupIdField)
    val protectedDF = dfReader.load(args.protectedDatasetPath)

    // Similar preprocessing can be done with the protected attribute data
    val joinedDF = FairnessMetricsUtils.computeJoinedDF(protectedDF, df, args.uidField,
      args.protectedDatasetPath, args.uidProtectedAttributeField,
      args.protectedAttributeField)
    joinedDF.persist

    // Input distributions are computed using the joined data
    val referenceDistrOpt =
      if (args.referenceDistribution.isEmpty) {
        None
      } else {
        val probabilityDF = FairnessMetricsUtils.computeProbabilityDF(joinedDF,
          args.thresholdOpt, args.labelField, args.scoreField,
          args.protectedAttributeField, args.scoreType)
        val distribution = DivergenceUtils
          .computeGeneralizedPredictionCountDistribution(probabilityDF,
            args.labelField, args.scoreField, args.protectedAttributeField)
          .computeMarginal(Set(args.scoreField, args.protectedAttributeField))
        FairnessMetricsUtils.computeReferenceDistributionOpt(
          distribution, args.referenceDistribution)
      }

    // Passing in the appropriate parameters to this API computes and writes
    // out the fairness metrics
    FairnessMetricsUtils.computeAndWriteModelMetrics(joinedDF,
      referenceDistrOpt, args)
  }
}
================================================
FILE: lift/src/main/scala/com/linkedin/lift/lib/DivergenceUtils.scala
================================================
package com.linkedin.lift.lib

import com.linkedin.lift.types.Distribution.DimensionValues
import com.linkedin.lift.types.{Distribution, FairnessResult}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

/**
 * Utilities to compute divergence, distance, and skew measures.
 */
object DivergenceUtils {
  /**
   * KL divergence from q to p. Note that this is asymmetric.
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * This method normalizes the values into probabilities.
   *
   * There is support for Laplace smoothing for the source distribution
   * to avoid divide-by-zero errors. To ensure numerical stability, we compute
   * KL divergence on the counts and then adjust this to convert it into the
   * actual KL divergence on probabilities. We use log base 2 to measure info
   * in terms of bits.
   *
   * @param p Target distribution
   * @param q Source distribution
   * @param alpha Parameter to set amount of Laplace smoothing.
   *              Defaults to 1.0 (add one smoothing)
   * @return Kullback-Leibler Divergence
   */
  def computeKullbackLeiblerDivergence(p: Distribution, q: Distribution,
    alpha: Double = 1.0): Double = {
    val logVals = p.zip(q).map { case (_, pVal, qVal) =>
      if (pVal == 0.0) {
        0.0
      } else {
        pVal * math.log(pVal / (qVal + alpha))
      }
    }
    val pSum = p.sum
    val qSum = q.sum + (alpha * logVals.size)
    1.0 / math.log(2.0) * ((logVals.sum / pSum) + math.log(qSum / pSum))
  }

  /**
   * JS divergence of p and q. Note that this is symmetric.
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * The JS divergence is the average of the KL divergences of p and q from M,
   * where M is the average of the probability distributions of p and q.
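   *
   * As a sketch in LaTeX notation, writing P and Q for p and q normalized to
   * probabilities (the symbols P, Q and JSD are introduced here only for
   * illustration and do not appear in the code):
   * {{{
   * M = \frac{1}{2} (P + Q)
   * JSD(P \| Q) = \frac{1}{2} KL(P \| M) + \frac{1}{2} KL(Q \| M)
   * }}}
   * Since computeKullbackLeiblerDivergence measures the KL terms in bits
   * (log base 2), the result lies between 0 and 1.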
   *
   * @param p First distribution
   * @param q Second distribution
   * @return Jensen-Shannon Divergence
   */
  def computeJensenShannonDivergence(p: Distribution, q: Distribution): Double = {
    val pSum = p.sum
    val qSum = q.sum
    val avgDistributionEntries = p.zip(q)
      .map { case (dimensions, pVal, qVal) =>
        (dimensions, 0.5 * ((pVal / pSum) + (qVal / qSum)))
      }
      .toMap
    val avgDistribution = Distribution(avgDistributionEntries)

    // We don't need any smoothing since an avgDistribution value will be zero
    // iff the corresponding p and q values are both 0.0. But if this is the
    // case, p * log(p/avg) will be zero, so no divide by zero errors.
    0.5 * (computeKullbackLeiblerDivergence(p, avgDistribution, 0.0) +
      computeKullbackLeiblerDivergence(q, avgDistribution, 0.0))
  }

  /**
   * Total variation distance between p and q. Note that this is symmetric.
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * Total variation distance between p and q equals half the L1-distance
   * between the underlying probability distribution vectors. It also equals
   * the largest possible difference between the probabilities that the two
   * distributions can assign to the same event.
   * https://en.wikipedia.org/wiki/Total_variation_distance_of_probability_measures
   *
   * @param p First distribution
   * @param q Second distribution
   * @return Total variation distance
   */
  def computeTotalVariationDistance(p: Distribution, q: Distribution): Double = {
    val pSum = p.sum
    val qSum = q.sum
    val l1Distance = p.zip(q)
      .map { case (_, pVal, qVal) =>
        math.abs((pVal / pSum) - (qVal / qSum))
      }
      .sum
    0.5 * l1Distance
  }

  /**
   * Infinity norm distance (Chebyshev distance) between probability
   * distributions corresponding to p and q. Note that this is symmetric.
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * Infinity norm distance (Chebyshev distance) equals the maximum
   * difference between the probabilities assigned by p and q along any
   * dimension.
   * https://en.wikipedia.org/wiki/Chebyshev_distance
   *
   * @param p First distribution
   * @param q Second distribution
   * @return Infinity norm distance
   */
  def computeInfinityNormDistance(p: Distribution, q: Distribution): Double = {
    val pSum = p.sum
    val qSum = q.sum
    val infinityNormDistance = p.zip(q)
      .map { case (_, pVal, qVal) =>
        math.abs((pVal / pSum) - (qVal / qSum))
      }
      .max
    infinityNormDistance
  }

  /**
   * Skew for a category (dimensions) in the observed distribution (p)
   * with respect to the desired distribution (q), defined as the logarithmic
   * ratio of the proportion for the category (dimensions) observed in p to the
   * corresponding desired proportion in q. Note that this is asymmetric.
   *
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * There is support for Laplace smoothing for the source distribution
   * to avoid divide-by-zero errors.
   *
   * @param p Observed distribution
   * @param q Desired distribution
   * @param dimensions Category for which skew is to be computed
   * @param alpha Parameter to set amount of Laplace smoothing.
   *              Defaults to 1.0 (add one smoothing)
   * @return Skew
   */
  def computeSkew(p: Distribution, q: Distribution,
    dimensions: DimensionValues, alpha: Double = 1.0): Double = {
    val totalCategoryCount = p.zip(q).size
    val pSum = p.sum + (alpha * totalCategoryCount)
    val qSum = q.sum + (alpha * totalCategoryCount)
    math.log(p.getValue(dimensions) + alpha) - math.log(pSum) +
      math.log(qSum) - math.log(q.getValue(dimensions) + alpha)
  }

  /**
   * Minimum skew over all categories in the observed distribution (p) with
   * respect to the desired distribution (q), defined as the minimum over all
   * categories of the logarithmic ratio of the proportion for a category
   * observed in p to the corresponding desired proportion in q. Note that
   * this is asymmetric.
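   *
   * As a sketch of the quantity being minimized, in LaTeX notation with
   * natural logarithms. The symbols p_c and q_c (the counts of category c
   * in p and q), K (the number of categories) and \alpha (the Laplace
   * smoothing parameter alpha) are introduced here for illustration only:
   * {{{
   * skew(c) = \ln\frac{p_c + \alpha}{\sum_j p_j + \alpha K}
   *         - \ln\frac{q_c + \alpha}{\sum_j q_j + \alpha K}
   * }}}
   * For instance, with made-up counts p = (a: 3, b: 1), q = (a: 2, b: 2)
   * and alpha = 1, both smoothed totals equal 6, so
   * skew(a) = ln(4/6) - ln(3/6) = ln(4/3), roughly 0.288, and
   * skew(b) = ln(2/6) - ln(3/6) = ln(2/3), roughly -0.405.
   * The minimum skew is therefore attained at category b.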
   *
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * There is support for Laplace smoothing for the source distribution
   * to avoid divide-by-zero errors.
   *
   * @param p Observed distribution
   * @param q Desired distribution
   * @param alpha Parameter to set amount of Laplace smoothing.
   *              Defaults to 1.0 (add one smoothing)
   * @return (Category, skew) corresponding to the minimum skew
   */
  def computeMinSkew(p: Distribution, q: Distribution,
    alpha: Double = 1.0): (DimensionValues, Double) = {
    val probRatios = p.zip(q).map { case (dimensions, pVal, qVal) =>
      (dimensions, (pVal + alpha) / (qVal + alpha))
    }
    val pSum = p.sum + (alpha * probRatios.size)
    val qSum = q.sum + (alpha * probRatios.size)
    val (minDimensions, minProbRatios) = probRatios.minBy(_._2)
    (minDimensions, math.log(minProbRatios) + math.log(qSum / pSum))
  }

  /**
   * Maximum skew over all categories in the observed distribution (p) with
   * respect to the desired distribution (q), defined as the maximum over all
   * categories of the logarithmic ratio of the proportion for a category
   * observed in p to the corresponding desired proportion in q. Note that
   * this is asymmetric.
   *
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * There is support for Laplace smoothing for the source distribution
   * to avoid divide-by-zero errors.
   *
   * @param p Observed distribution
   * @param q Desired distribution
   * @param alpha Parameter to set amount of Laplace smoothing.
   *              Defaults to 1.0 (add one smoothing)
   * @return (Category, skew) corresponding to the maximum skew
   */
  def computeMaxSkew(p: Distribution, q: Distribution,
    alpha: Double = 1.0): (DimensionValues, Double) = {
    val probRatios = p.zip(q).map { case (dimensions, pVal, qVal) =>
      (dimensions, (pVal + alpha) / (qVal + alpha))
    }
    val pSum = p.sum + (alpha * probRatios.size)
    val qSum = q.sum + (alpha * probRatios.size)
    val (maxDimensions, maxProbRatios) = probRatios.maxBy(_._2)
    (maxDimensions, math.log(maxProbRatios) + math.log(qSum / pSum))
  }

  /**
   * Compute skew for all categories, where the skew for a category
   * (dimensions) in the observed distribution (p) with respect to the
   * desired distribution (q) is defined as the logarithmic ratio of the
   * proportion for the category (dimensions) observed in p to the
   * corresponding desired proportion in q. Note that this is asymmetric.
   *
   * We assume that the distributions are valid
   * (i.e., they don't have any negative values).
   *
   * There is support for Laplace smoothing for the source distribution
   * to avoid divide-by-zero errors.
   *
   * @param p Observed distribution
   * @param q Desired distribution
   * @param alpha Parameter to set amount of Laplace smoothing.
   *              Defaults to 1.0 (add one smoothing)
   * @return A map of (category, skew) tuples
   */
  def computeAllSkews(p: Distribution, q: Distribution,
    alpha: Double = 1.0): Map[DimensionValues, Double] = {
    val pzipq = p.zip(q)
    val totalCategoryCount = pzipq.size
    val pSum = p.sum + (alpha * totalCategoryCount)
    val qSum = q.sum + (alpha * totalCategoryCount)
    val logSumDiff = math.log(pSum) - math.log(qSum)
    pzipq.map { case (dimensions, pVal, qVal) =>
      val skew = math.log(pVal + alpha) - math.log(qVal + alpha) - logSumDiff
      (dimensions, skew)
    }.toMap
  }

  /**
   * Compute a distribution of (protectedAttributeValue, label, prediction)
   * counts that works both for cases when prediction is {0.0, 1.0}, and when
   * it is a probability P(y=1) in [0.0, 1.0]. The labels are assumed to be
   * binary.
   * The working is straightforward in the former case.
   * In the latter, we compute the expected number of FPs, TPs, FNs and TNs.
   * E[FPs] = E[C(label = 0, prediction = 1)]. Hence doing this for all
   * protected attribute values gets us the expected counts as desired. The logic
   * is similar to that used for computing the generalized confusion matrix.
   *
   * This method is typically to be used when there is no notion of a threshold
   * for the classifier, i.e., the model's scores are directly being used for
   * things like ranking, but the model is actually a binary classifier.
   *
   * @param df The input DataFrame
   * @param labelField Label field name
   * @param scoreField Score field name
   * @param protectedAttributeField Protected attribute field name
   * @return A generalized count distribution
   */
  def computeGeneralizedPredictionCountDistribution(df: DataFrame,
    labelField: String, scoreField: String,
    protectedAttributeField: String): Distribution = {
    // E[number of positive predictions] = sum(prob that ith example is positive * 1.0)
    // Score of ith example is the probability it is positive (We assume that
    // the scores are probabilities.
    // Raw scores can be passed through a sigmoid
    // to convert them into probabilities)
    val entries = df.select(protectedAttributeField, labelField, scoreField)
      .groupBy(protectedAttributeField, labelField)
      .agg(sum(col(scoreField)), sum(lit(1.0) - col(scoreField)))
      .collect
      .flatMap { row =>
        val rowSeq = row.toSeq.map { Option(_).fold("") { _.toString } }
        val protectedAttr = rowSeq.head
        val label = rowSeq(1)
        // score1 is the E[C(positive predictions | label, protectedAttr)]
        // score0 is the E[C(negative predictions | label, protectedAttr)]
        // They always sum to C(label, protectedAttr)
        val score1 = rowSeq(2).toDouble
        val score0 = rowSeq(3).toDouble
        val dimVals0: DimensionValues = Map(
          protectedAttributeField -> protectedAttr,
          labelField -> label,
          scoreField -> "0.0")
        val dimVals1: DimensionValues = Map(
          protectedAttributeField -> protectedAttr,
          labelField -> label,
          scoreField -> "1.0")
        Seq((dimVals0, score0), (dimVals1, score1))
      }.toMap
    Distribution(entries)
  }

  /**
   * Computes Demographic Parity deviations for all combinations of the protected
   * attribute values. Demographic Parity is defined as
   * P(Y=1|G=g1) = P(Y=1|G=g2) for all g1, g2 in G (the protected attribute).
   * The variable Y is the label in the case of training data, and is the
   * prediction in the case of scored outputs.
   *
   * Note that aiming to achieve Demographic Parity is not necessarily an ideal
   * solution, since it only requires that the positive label rates are equal,
   * and does not look into more meaningful values like true and false positive
   * rates. Nevertheless, given two models with similar performance, the one
   * with lower DP is generally more desirable.
   *
   * This metric is ideally suited for binary classifier problems.
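   *
   * A small usage sketch. The counts and the field names "label" and
   * "gender" are made up for illustration, and the Distribution is built
   * directly from a map of entries in the same way as elsewhere in this file:
   * {{{
   * import com.linkedin.lift.lib.DivergenceUtils
   * import com.linkedin.lift.types.Distribution
   *
   * // Hypothetical counts: 30 of 100 F rows and 50 of 100 M rows are positive
   * val p = Distribution(Map(
   *   Map("gender" -> "F", "label" -> "1") -> 30.0,
   *   Map("gender" -> "F", "label" -> "0") -> 70.0,
   *   Map("gender" -> "M", "label" -> "1") -> 50.0,
   *   Map("gender" -> "M", "label" -> "0") -> 50.0))
   *
   * // P(Y=1|G=F) = 30/100 = 0.3 and P(Y=1|G=M) = 50/100 = 0.5, so the
   * // deviation for the pair {F, M} is |0.3 - 0.5| = 0.2 (up to rounding)
   * val result = DivergenceUtils.computeDemographicParity(p, "label", "gender")
   * }}}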
* * @param p The input distribution of (label/prediction, protectedAttribute) counts * @param labelField The label/prediction field name * @param protectedAttributeField The protected attribute field name * @return A list of Demographic Parity deviations for all combinations of the * protected attribute values. */ def computeDemographicParity(p: Distribution, labelField: String, protectedAttributeField: String): FairnessResult = { // Find out if the labels are 1/0 or 1.0/0.0 val labelVals = p.entries.map { case (dimVals, _) => dimVals(labelField) }.toSet val labelValueOne = if (labelVals.contains("1")) { "1" } else { "1.0" } // Compute P(Y=1 | G=g) for all g in G val protectedAttributeDistr = p.computeMarginal(Set(protectedAttributeField)) val positiveLabelRates = protectedAttributeDistr.entries.map { case (dimVals, protectedAttrCount) => val labelProtectedAttrCount = p.getValue(dimVals ++ Map(labelField -> labelValueOne)) (dimVals.values.mkString(","), StatsUtils.roundDouble(labelProtectedAttrCount / protectedAttrCount)) } // Compute all pairs {g1, g2} from G val allDimValPairs = positiveLabelRates.keys .toSet .subsets(2) // Compute differences for all the pairs val constituentVals = allDimValPairs.map { dimValPair => val dimVal1 = dimValPair.head val dimVal2 = dimValPair.last (Map(protectedAttributeField + "1" -> dimVal1, protectedAttributeField + "2" -> dimVal2), StatsUtils.roundDouble( math.abs(positiveLabelRates(dimVal1) - positiveLabelRates(dimVal2)))) }.toMap FairnessResult( resultType = "DEMOGRAPHIC_PARITY", resultValOpt = None, constituentVals = constituentVals, additionalStats = positiveLabelRates) } /** * Computes Equalized Odds deviations for all combinations of the protected * attribute values. Equalized Odds is defined as * P(Y_hat=1|Y=y,G=g1) = P(Y_hat=1|Y=y,G=g2) for y in {0, 1} (label) and * for all g1, g2 in G (the protected attribute). * The variable Y_hat is the predicted value. 
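 * Illustrative sketch (not part of the library): the Equalized Odds definition
 * above, applied to a small hand-made sample. The Rec case class, the object
 * name and the sample are hypothetical; predictions are assumed to be already
 * thresholded to 0/1.

```scala
object EqualizedOddsSketch {
  // One hypothetical record: protected group, true label, thresholded prediction.
  final case class Rec(group: String, label: Int, prediction: Int)

  // P(Y_hat=1 | Y=y, G=g) for each (label, group) cell present in the data.
  def positiveRates(data: Seq[Rec]): Map[(Int, String), Double] =
    data.groupBy(r => (r.label, r.group)).map { case (key, rows) =>
      key -> rows.count(_.prediction == 1).toDouble / rows.size
    }

  // Gap between two groups for a fixed label: y=1 compares TPRs, y=0 compares FPRs.
  def gap(rates: Map[(Int, String), Double], y: Int, g1: String, g2: String): Double =
    math.abs(rates((y, g1)) - rates((y, g2)))

  val sample: Seq[Rec] = Seq(
    Rec("A", 1, 1), Rec("A", 1, 1), Rec("A", 1, 0), Rec("A", 1, 0),
    Rec("A", 0, 1), Rec("A", 0, 0), Rec("A", 0, 0), Rec("A", 0, 0),
    Rec("B", 1, 1), Rec("B", 1, 0), Rec("B", 1, 0), Rec("B", 1, 0),
    Rec("B", 0, 1), Rec("B", 0, 1), Rec("B", 0, 0), Rec("B", 0, 0))
}
```

 * In this sample group A has a TPR of 2/4 and an FPR of 1/4, while group B has a
 * TPR of 1/4 and an FPR of 2/4, so both the y=1 and the y=0 gaps equal 0.25.
 * Rates are only ever compared within the same label value.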
* * Note that aiming to achieve perfect Equalized Odds is not always possible, * especially if the application at hand has requirements such as ensuring * high precision across all groups. In such scenarios, it is possible only in * trivial cases of perfect classifiers or equal prevalence rates amongst * various protected groups. That is, all |gC2|*|y| equations might not * all be simultaneously satisfiable. This is due to the impossibility results * that link FPR, TPR (recall), precision and prevalence rates. Equalized Odds * attempts to ensure that FPRs are equal (y=0) and TPRs are equal (y=1). * Thus, this will come at the cost of precision of the model when prevalence * rates are unequal. * * Nevertheless, obtaining these deviations is helpful to understand model * biases upfront. * * This metric is ideally suited for binary classifier problems. * * @param p The input distribution of (label, prediction, protectedAttribute) counts * @param labelField The label field name * @param predictionField The prediction field name * @param protectedAttributeField The protected attribute field name * @return A list of EO deviations for all combinations of the * protected attribute values. 
*/ def computeEqualizedOdds(p: Distribution, labelField: String, predictionField: String, protectedAttributeField: String): FairnessResult = { // Find out if the predictions are 1/0 or 1.0/0.0 val predictionVals = p.entries.map { case (dimVals, _) => dimVals(predictionField) }.toSet val predictionValueOne = if (predictionVals.contains("1")) { "1" } else { "1.0" } // Compute P(Y_hat=1 | Y=y, G=g) for all y in Y and g in G val labelProtectedAttributeDistr = p.computeMarginal(Set(labelField, protectedAttributeField)) val trueFalsePositiveRates = labelProtectedAttributeDistr.entries.map { case (dimVals, labelProtectedAttrCount) => val predictionLabelProtectedAttrCount = p.getValue(dimVals ++ Map(predictionField -> predictionValueOne)) (dimVals, StatsUtils.roundDouble(predictionLabelProtectedAttrCount / labelProtectedAttrCount)) } // Group by Y, so that we don't compare TPRs and FPRs with each other val constituentVals = trueFalsePositiveRates.groupBy { case (dimVals, _) => dimVals(labelField) }.flatMap { case (label, positiveRatesForLabel) => // Compute all pairs {g1, g2} from G val allDimValPairs = positiveRatesForLabel.keys .toSet .subsets(2) // Compute differences for all the pairs allDimValPairs.map { dimValPair => val dimVal1 = dimValPair.head val dimVal2 = dimValPair.last (Map(protectedAttributeField + "1" -> dimVal1(protectedAttributeField), protectedAttributeField + "2" -> dimVal2(protectedAttributeField), labelField -> label), StatsUtils.roundDouble( math.abs(positiveRatesForLabel(dimVal1) - positiveRatesForLabel(dimVal2)))) } } val additionalStats = trueFalsePositiveRates.map { case (dimVals, positiveRate) => (dimVals.values.mkString(","), positiveRate) } FairnessResult( resultType = "EQUALIZED_ODDS", resultValOpt = None, constituentVals = constituentVals, additionalStats = additionalStats) } /** * Computes a list of distance/divergence related fairness metrics over * (protectedAttributeField, labelField/scoreField).
* * @param distanceMetrics The set of metrics to compute * @param distribution The input distribution to compute the metrics for. * This is a distribution over (protectedAttribute, labelField) * @param referenceDistrOpt An optional reference distribution, for metrics that * compare the input distribution against another distribution * @param labelField The label field. This could be the score field, in * case one wants to compute the statistics on the * (protectedAttribute, scoreField) distribution instead. * @param protectedAttributeField The protected attribute field * @return A sequence of FairnessResults containing distance/divergence metrics */ def computeDatasetDistanceMetrics(distanceMetrics: Seq[String], distribution: Distribution, referenceDistrOpt: Option[Distribution], labelField: String, protectedAttributeField: String): Seq[FairnessResult] = { distanceMetrics.flatMap { case "SKEWS" => referenceDistrOpt.map { referenceDistr => val allSkews = computeAllSkews(distribution, referenceDistr) FairnessResult( resultType = "SKEWS", resultValOpt = None, parameters = referenceDistr.toString, constituentVals = allSkews) } case "INF_NORM_DIST" => referenceDistrOpt.map { referenceDistr => val infNormDist = computeInfinityNormDistance(distribution, referenceDistr) FairnessResult( resultType = "INF_NORM_DIST", resultValOpt = Some(infNormDist), parameters = referenceDistr.toString, constituentVals = Map()) } case "TOTAL_VAR_DIST" => referenceDistrOpt.map { referenceDistr => val totalVarDist = computeTotalVariationDistance(distribution, referenceDistr) FairnessResult( resultType = "TOTAL_VAR_DIST", resultValOpt = Some(totalVarDist), parameters = referenceDistr.toString, constituentVals = Map()) } case "JS_DIVERGENCE" => referenceDistrOpt.map { referenceDistr => val JSDivergence = computeJensenShannonDivergence(distribution, referenceDistr) FairnessResult( resultType = "JS_DIVERGENCE", resultValOpt = Some(JSDivergence), parameters = referenceDistr.toString, 
constituentVals = Map()) } case "KL_DIVERGENCE" => referenceDistrOpt.map { referenceDistr => val KLDivergence = computeKullbackLeiblerDivergence(distribution, referenceDistr) FairnessResult( resultType = "KL_DIVERGENCE", resultValOpt = Some(KLDivergence), parameters = referenceDistr.toString, constituentVals = Map()) } case "DEMOGRAPHIC_PARITY" => Some(computeDemographicParity(distribution, labelField, protectedAttributeField)) case _ => None } } /** * Computes a list of distance/divergence related fairness metrics over * (protectedAttributeField, labelField, scoreField). If the scoreField is * missing, it assumes that dataset metrics are being computed, and calls * computeDatasetDistanceMetrics with the appropriate parameters. * * @param distanceMetrics The set of metrics to compute * @param distribution The input distribution to compute the metrics for. This is * a distribution over (protectedAttribute, label, score) * @param referenceDistrOpt An optional reference distribution over * (protectedAttributeField, scoreField) * @param labelField The label field * @param scoreField The score field. 
If empty, computes dataset-only metrics * @param protectedAttributeField The protected attribute field * @return A sequence of FairnessResults containing distance/divergence metrics */ def computeDistanceMetrics(distanceMetrics: Seq[String], distribution: Distribution, referenceDistrOpt: Option[Distribution], labelField: String, scoreField: String, protectedAttributeField: String): Seq[FairnessResult] = { if (scoreField.isEmpty) { computeDatasetDistanceMetrics(distanceMetrics, distribution, referenceDistrOpt, labelField, protectedAttributeField) } else { // Metrics that need only the score and protectedAttribute val scoreProtectedAttrDistr = distribution.computeMarginal(Set(scoreField, protectedAttributeField)) val computedMetrics = computeDatasetDistanceMetrics(distanceMetrics, scoreProtectedAttrDistr, referenceDistrOpt, scoreField, protectedAttributeField) // Metrics that need both score and label, and the protectedAttribute val additionalOnes = distanceMetrics.flatMap { case "EQUALIZED_ODDS" => Some(computeEqualizedOdds(distribution, labelField, scoreField, protectedAttributeField)) case _ => None } computedMetrics ++ additionalOnes } } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/lib/PermutationTestUtils.scala ================================================ package com.linkedin.lift.lib import com.linkedin.lift.types.{FairnessResult, ModelPrediction} import scala.util.Random /** * Utilities to perform statistical tests */ object PermutationTestUtils { /** * Generates a bootstrap sample: Given a sequence of size N, generates a new * sequence of the same size that has been obtained by sampling the input * sequence (with replacement) * * @param predictions The input sequence * @return The bootstrap sample */ private[lift] def generateBootstrapSample( predictions: Seq[ModelPrediction]): Seq[ModelPrediction] = { val n = predictions.length (0 until n).map { _ => predictions(Random.nextInt(n)) } } /** * Estimates the 
standard deviation of a statistic (computed on a given sample). * It achieves this by computing the statistic on multiple bootstrap samples of * the input, and computes the standard deviation of the resulting * distribution (of the statistic). In effect, this simulates the act of * picking multiple samples (of the same size) from the original population * and computing the same statistic for each of these. * * @param predictions The input sample to operate on * @param bootstrapFn The statistic to compute on the bootstrap sample * @param numTrials The number of trials to run the bootstrap sampling for. * More trials produce a better estimate of the distribution. * @return An estimate of the standard deviation of the statistic */ private[lift] def computeBootstrapStdDev(predictions: Seq[ModelPrediction], bootstrapFn: Seq[ModelPrediction] => Double, numTrials: Int): Double = { val bootstrapDifferences = (0 until numTrials).map { _ => val sampledPredictions = generateBootstrapSample(predictions) bootstrapFn(sampledPredictions) } StatsUtils.computeStdDev(bootstrapDifferences) } /** * Computes the difference (in the same metric) between two different groups. * This is the test statistic for permutation testing. 
* * @param dim1 The first group * @param dim2 The second group * @param fn The metric/statistic to compute on each group * @param predictions The sample for which the difference is to be computed * @return The value of the input fn evaluated for dim1 and dim2, and their difference */ private[lift] def permutationFn(dim1: String, dim2: String, fn: Seq[ModelPrediction] => Double) (predictions: Seq[ModelPrediction]): (Double, Double, Double) = { val predictionsDim1 = predictions.filter(_.dimensionValue == dim1) val predictionsDim2 = predictions.filter(_.dimensionValue == dim2) val value1 = fn(predictionsDim1) val value2 = fn(predictionsDim2) (value1, value2, value1 - value2) } /** * Implementation of the permutation testing methodology described in: * "Cyrus DiCiccio, Sriram Vasudevan, Kinjal Basu, Krishnaram Kenthapadi, Deepak Agarwal. 2020. * Evaluating Fairness Using Permutation Tests. To Appear in Proceedings of the 26th ACM SIGKDD * International Conference on Knowledge Discovery & Data Mining (KDD '20). * Association for Computing Machinery, New York, NY, USA." * * Perform a two-sided permutation test (for a given function) to assess if * the difference between two groups is statistically significant. * * The null hypothesis is that there is no difference between these groups. * If this is the case, then randomly shuffling the samples around should * have no impact on the difference between the two groups. To generate the * distribution of data under the null hypothesis, we need to compute the * difference for all possible permutations of the data split into * two groups. To approximate this, we randomly shuffle the data numTrials times, * splitting it in the ratio of the two groups. * * We then compute the p-value, the probability (under the null hypothesis) * of observing a result as extreme as (or more extreme than) the result * we observed. * * Our sequence of extremeDiffs can be viewed as a biased coin with a bias * equal to the p-value.
This can then be looked at as a binomially distributed * observation. We can then estimate its standard error as sqrt(p * (1-p) / n) * (Refer: https://en.wikipedia.org/wiki/Margin_of_error#Calculations_assuming_random_sampling, * https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Normal_approximation_interval) * * To decide if the difference is meaningful, one can think of the following: * 1. Is the observed difference large enough to matter? * 2. Is it statistically significant (with respect to some significance level)? * 3. What is the confidence interval for the estimated p-value? * (It is [p - z * std_err, p + z * std_err] where z is the critical value of * a standard normal distribution corresponding to the (1 - alpha/2) quantile, * and alpha is the target error = 1 - confidence_interval_percentage. Some * useful values: 68.3% is about 1 std error, 95.4% is about 2, and 99.7% is about 3) * * @param predictions The sequence of model predictions to operate on * @param dimType The dimension type, such as gender or age. * @param dim1 The first dimension value / group * @param dim2 The second dimension value / group * @param metric The metric to evaluate using the permutation test * @param numTrials Number of trials to perform (for both permutation * testing and bootstrap estimate of std dev) * @param seed Random seed. If not provided (or set to 0), uses a random seed. * @return A FairnessResult containing the results of the permutation test */ def permutationTest(predictions: Seq[ModelPrediction], dimType: String, dim1: String, dim2: String, metric: String, numTrials: Int, seed: Long = 0): FairnessResult = { // Consider samples for only the two dimensions since simulations show // that testing with only these samples results in more statistical power.
val predictionsDim12 = predictions.filter { prediction => (prediction.dimensionValue == dim1) || (prediction.dimensionValue == dim2) } val bucket1Length = predictionsDim12.count(_.dimensionValue == dim1) // Set seed if provided and non-zero if (seed != 0) { Random.setSeed(seed) } // Obtain permutation functions val fn = StatsUtils.getMetricFn(metric) val permutationTestFn: Seq[ModelPrediction] => (Double, Double, Double) = permutationFn(dim1, dim2, fn) // Compute the observed difference and studentize it val (value1, value2, observedDifference) = permutationTestFn(predictionsDim12) val bootstrapStdDev = computeBootstrapStdDev(predictionsDim12, permutationTestFn(_)._3, numTrials) val studentizedObservedDifference = observedDifference / bootstrapStdDev // Compute differences for n random trials and studentize it val differenceHist: Seq[Double] = (0 until numTrials).map { _ => val (shuffledBucket1, shuffledBucket2) = Random.shuffle(predictionsDim12).splitAt(bucket1Length) fn(shuffledBucket1) - fn(shuffledBucket2) } val differenceHistStdDev = StatsUtils.computeStdDev(differenceHist) val studentizedDiffHist = differenceHist.map(_ / differenceHistStdDev) .filterNot(_.isNaN) // Compute p-value for the two-sided test val extremeDiffs = studentizedDiffHist.map(math.abs) .map(_ > math.abs(studentizedObservedDifference)) .map(_.compare(false)) val pVal = extremeDiffs.sum / numTrials.asInstanceOf[Double] val stdError = math.sqrt(pVal * (1 - pVal) / numTrials) // Build FairnessResults val metricMap = Map( "metric" -> metric, "numTrials" -> numTrials.toString, "seed" -> seed.toString) FairnessResult( resultType = "PERMUTATION_TEST", resultValOpt = Some(StatsUtils.roundDouble(observedDifference)), parameters = metricMap.toString, constituentVals = Map( Map(dimType -> dim1) -> StatsUtils.roundDouble(value1), Map(dimType -> dim2) -> StatsUtils.roundDouble(value2)), additionalStats = Map( "pValue" -> StatsUtils.roundDouble(pVal), "stdError" -> StatsUtils.roundDouble(stdError), 
"bootstrapStdDev" -> bootstrapStdDev, "testStatisticStdDev" -> differenceHistStdDev)) } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/lib/PositionBiasUtils.scala ================================================ package com.linkedin.lift.lib import com.linkedin.lift.types.ScoreWithLabelAndPosition import org.apache.spark.mllib.stat.KernelDensity import org.apache.spark.sql.expressions.Window import org.apache.spark.sql.functions.{broadcast, count, lit, min, pow, rand, stddev_pop} import org.apache.spark.sql.{DataFrame, Dataset} /** * Utilities for estimating position bias and removing the effect of position bias for learning * an equality of opportunity transformation */ object PositionBiasUtils { case class PositionBias(position: Int, positionBias: Double) /** * Bandwidth computation based on Silverman's rule for kernel density estimation * * @param df a dataframe containing i.i.d. samples as "value" * @return bandwidth */ def getBandwidth(df: DataFrame): Double = { import df.sparkSession.implicits._ df.agg(count($"value").as("numSamples"), stddev_pop($"value").as("stdDev")) .select(lit(1.06) * $"stdDev" * pow($"numSamples", lit(-0.2))).head.getDouble(0) } /** * Kernel density estimation with a Gaussian kernel for scores corresponding to a given position * * @param data dataset containing score, binary label/response, position * @param position position value for filtering * @return a probability density function */ def getDensity(data: Dataset[ScoreWithLabelAndPosition], position: Int): KernelDensity = { import data.sparkSession.implicits._ val scores = data.filter(_.position == position).map(_.score) val bw = getBandwidth(scores.toDF("value")) val density = new KernelDensity() .setSample(scores.rdd) .setBandwidth(bw) density } /** * Estimating the position bias at "targetPosition" with respect to "basePosition", * where the position bias is defined as the decay in the number of positive examples * from basePosition 
to targetPosition when similar items are served at each position. * Typically, (the quality of) items differ from one position to another in the observational data. * We correct for this discrepancy by matching the score distribution at the target position with * the score distribution at the base position via importance sampling. * * @param data dataset containing score, binary label {0, 1}, position * @param maxImportanceWeight to control the variance of an importance sampling estimator * @param targetPosition target position for position bias estimation * @param basePosition base position for position bias estimation * @return estimated position bias */ def estimateAdjacentPositionBias(data: Dataset[ScoreWithLabelAndPosition], maxImportanceWeight: Double, targetPosition: Int, basePosition: Int): Double = { import data.sparkSession.implicits._ // estimating the density of the scores at targetPosition val kdTargetPosition = getDensity(data, targetPosition) // estimating the density of the scores at basePosition val kdPreviousPosition = getDensity(data, basePosition) // extracting scores corresponding to positive labels at targetPosition val targetPositionPositiveScoresArray = data .filter(r => r.label == 1 && r.position == targetPosition) .map(r => r.score).collect // estimating the positive label probability at targetPosition with importance sampling adjustment val totalWeight = kdPreviousPosition.estimate(targetPositionPositiveScoresArray) .zip(kdTargetPosition.estimate(targetPositionPositiveScoresArray)) .map(x => Math.min(x._1 / x._2, maxImportanceWeight)).sum // estimating the positive label probability at basePosition val basePositionPositiveScoresCount = data .filter(r => r.label == 1 && r.position == basePosition) .count totalWeight / basePositionPositiveScoresCount } /** * Estimating the position bias with respect to the top most position * based on cumulative multiplications of estimated adjacent position biases.
* The adjacent position bias at a target position is estimated as the decay in the number of positive examples * from the previous position to the target position after adjusting for * the discrepancy in the score distributions with importance sampling (see above). * * @param data dataset containing score, binary label {0, 1}, position * @param maxImportanceWeight to control the variance of an importance sampling estimator * for the adjacent position bias estimations * @param positionBiasEstimationCutOff all adjacent position biases will be set to one beyond this cutoff * @return position bias estimates for all positions */ def estimatePositionBias(data: Dataset[ScoreWithLabelAndPosition], maxImportanceWeight: Double, positionBiasEstimationCutOff: Int): Seq[PositionBias] = { import data.sparkSession.implicits._ val positions = data.map(_.position).distinct.collect.toSeq.sorted val adjacentPositionBias = (1 until math.min(positions.size, positionBiasEstimationCutOff)).map(i => estimateAdjacentPositionBias( data.filter($"position" === positions(i) or $"position" === positions(i - 1)), maxImportanceWeight, positions(i), positions(i - 1))) var estimate = Seq(PositionBias(positions.head, 1.0)) for (i <- 1 until positions.size) { if (i < positionBiasEstimationCutOff) { // note that adjacentPositionBias(i-1) = adjacent position bias at position(i) estimate :+= PositionBias(positions(i), estimate.last.positionBias * adjacentPositionBias(i - 1)) } else { estimate :+= PositionBias(positions(i), estimate.last.positionBias) } } estimate } /** * Resampling data with weights corresponds to the inverse position bias for removing effect of position bias * from the data with positive labels. * We first compute normalized weights from the position bias estimates. * Note that the normalized weights are in [0, 1]. * This allows us to get a weighted sample by applying down-sampling with normalized weights as down-sample rate. 
* We repeat the down-sampling with multiple copies of the original sample to improve * the accuracy of the final weighted sample. * * @param data dataset containing score, binary label {0, 1}, position * @param maxImportanceWeight see estimatePositionBias() * @param positionBiasEstimationCutOff see estimatePositionBias() * @param repeatTimes number of times the dataset is repeated for resampling; * a larger number leads to a more computationally expensive but * more accurate debiasing * @param inflationRate the maximum allowed ratio of the number of data points with positive labels * after and before debiasing * @param numPartitions the number of partitions for repartitioning the data with positive labels * @param seed random seed, for reproducibility * @return debiased dataset with positive labels */ def debiasPositiveLabelScores(data: Dataset[ScoreWithLabelAndPosition], maxImportanceWeight: Double = 1000, positionBiasEstimationCutOff: Int, repeatTimes: Int = 1000, inflationRate: Double = 10, numPartitions: Int = 1000, seed: Long = scala.util.Random.nextLong()): Dataset[ScoreWithLabelAndPosition] = { import data.sparkSession.implicits._ if (positionBiasEstimationCutOff < 1) { return data.filter(_.label == 1) } val positionBiasEstimates = estimatePositionBias(data, maxImportanceWeight, positionBiasEstimationCutOff) .toDF val dataWithWeight = data .filter(_.label == 1) .repartition(numPartitions, $"position") .join(broadcast(positionBiasEstimates), Seq("position"), "left_outer") .withColumn("minPositionBias", min($"positionBias").over(Window.partitionBy(lit(1)))) .withColumn("weight", $"minPositionBias" / $"positionBias") val repeatedSamples = Seq.range(0, repeatTimes) .map(i => dataWithWeight.filter(rand(seed * i) <= $"weight").drop("weight")) .reduceOption(_ union _).getOrElse(throw new RuntimeException("Cannot create union data")) .as[ScoreWithLabelAndPosition] repeatedSamples.persist() val downSampleRate = inflationRate * data.count().toFloat /
repeatedSamples.count() if (downSampleRate < 1) { repeatedSamples.sample(downSampleRate) } else { repeatedSamples } } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/lib/StatsUtils.scala ================================================ package com.linkedin.lift.lib import com.linkedin.lift.types.{CustomMetric, ModelPrediction} import org.apache.spark.sql.DataFrame import org.apache.spark.sql.functions.col /** * Utilities to perform statistical tests */ object StatsUtils { /** * Represents a confusion matrix * * @param truePositive True Positive count * @param falsePositive False Positive count * @param trueNegative True Negative count * @param falseNegative False Negative count */ case class ConfusionMatrix( truePositive: Double, falsePositive: Double, trueNegative: Double, falseNegative: Double) /** * Round off a double to a certain number of digits of precision * @param d The double to round off * @param numDigits Number of digits of precision required * @return The rounded off double */ def roundDouble(d: Double, numDigits: Int = 5): Double = { val multiplier = math.pow(10, numDigits) math.round(d * multiplier) / multiplier } /** * Compute the percentage of positive and negative labels to sample, given * the source DataFrames for the positive and negative labels. * * @param posDF The original DataFrame containing only positive labels * @param negDF The original DataFrame containing only negative labels * @param approxRows The approximate number of rows to sample (in total) * @param labelZeroPercentage Percentage of negative labels to be present in * the sampled DataFrame. If not provided (or if * an invalid percentage is given), the sampling * ratio is according to that of the source * DataFrame. * @return The sampling percentages for the positive and negative labels * respectively, to be used for stratified sampling. 
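 * Illustrative sketch (not part of the library): the sampling-percentage
 * arithmetic described above, taking row counts directly instead of DataFrames
 * so that it can be run without Spark. The object and method names are
 * hypothetical.

```scala
object SamplePercentagesSketch {
  // Same arithmetic as described above, but on counts rather than DataFrames.
  def percentages(posCount: Double, negCount: Double, approxRows: Long,
      labelZeroPercentage: Double = -1.0): (Double, Double) = {
    val (pos, neg) =
      if (labelZeroPercentage >= 0.0 && labelZeroPercentage <= 1.0) {
        // Split the requested rows between the two classes as asked.
        (approxRows * (1.0 - labelZeroPercentage) / posCount,
          approxRows * labelZeroPercentage / negCount)
      } else {
        // No valid ratio given: keep the ratio of the source data.
        val rate = approxRows / (posCount + negCount)
        (rate, rate)
      }
    // A sampling fraction cannot exceed 1.0.
    (math.min(pos, 1.0), math.min(neg, 1.0))
  }
}
```

 * For example, with 1000 positives, 9000 negatives, approxRows = 500 and
 * labelZeroPercentage = 0.5, the positives are sampled at 250/1000 = 0.25 and
 * the negatives at 250/9000 (about 0.028). Without a valid labelZeroPercentage,
 * both classes are sampled at 500/10000 = 0.05.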
*/ def computePosNegSamplePercentages(posDF: DataFrame, negDF: DataFrame, approxRows: Long, labelZeroPercentage: Double = -1.0): (Double, Double) = { val posCount = posDF.count.toDouble val negCount = negDF.count.toDouble val totalCount = negCount + posCount val (samplePosPercentage, sampleNegPercentage) = if (labelZeroPercentage >= 0.0 && labelZeroPercentage <= 1.0) { ((approxRows * (1.0 - labelZeroPercentage)) / posCount, (approxRows * labelZeroPercentage) / negCount) } else { (approxRows / totalCount, approxRows / totalCount) } val updatedSamplePosPercentage = if (samplePosPercentage > 1.0) 1.0 else samplePosPercentage val updatedSampleNegPercentage = if (sampleNegPercentage > 1.0) 1.0 else sampleNegPercentage (updatedSamplePosPercentage, updatedSampleNegPercentage) } /** * Sample an approximate number of entries from a DataFrame (using stratified * sampling), ensuring that it contains a positive:negative label ratio * as specified. If no such input is provided, we attempt to sample according * to the ratio in which positives and negatives are present in the input * DataFrame. * * @param df The DataFrame to operate on * @param labelField The column containing the labels * @param approxRows An approximate number of rows to sample * @param labelZeroPercentage Percentage of negative labels to be present in * the sampled DataFrame. If not provided (or if * an invalid percentage is given), the sampling * ratio is according to that of the source * DataFrame. * @param seed Random seed. If not provided (or set to 0), uses a random seed. * @return The sampled DataFrame. 
*/ def sampleDataFrame(df: DataFrame, labelField: String, approxRows: Long, labelZeroPercentage: Double = -1.0, seed: Long = 0): DataFrame = { val posDF = df.filter(col(labelField) === 1.0) val negDF = df.filter(col(labelField) === 0.0) val (samplePosPercentage, sampleNegPercentage) = computePosNegSamplePercentages(posDF, negDF, approxRows, labelZeroPercentage) val (samplePosDF, sampleNegDF) = if (seed == 0) { (posDF.sample(samplePosPercentage), negDF.sample(sampleNegPercentage)) } else { (posDF.sample(samplePosPercentage, seed), negDF.sample(sampleNegPercentage, seed)) } samplePosDF.union(sampleNegDF) } /** * Sample an approximate number of entries from a DataFrame by selecting all * rows belonging to a group ID, for a bunch of randomly sampled group IDs. * The idea behind this sampling technique is to ensure that per-groupID * metrics have as little noise as possible (eg: a group ID may have only 25 * results, if the group ID is the search ID), while cutting down on the * number of groups being analyzed. Ranking metrics average their results * across group IDs, so sampling by this should only affect the averaging * process. * * @param df The DataFrame to operate on * @param labelField The column containing the labels * @param scoreField The column containing the scores * @param groupIdField The column containing the group IDs * @param protectedAttributeField The column containing the protected attributes * @param approxRows An approximate number of groupIDs to sample * @param seed Random seed. If not provided (or set to 0), uses a random seed. * @return The sampled DataFrame. 
*/ def sampleDataFrameByGroupId(df: DataFrame, labelField: String, scoreField: String, groupIdField: String, protectedAttributeField: String, approxRows: Long, seed: Long = 0): DataFrame = { val modelPredictionDF = ModelPrediction.getModelPredictionDS(df, labelField, scoreField, groupIdField, protectedAttributeField) .toDF val groupIdDF = modelPredictionDF.select("groupId").distinct val samplePercentage = math.min(1.0, approxRows.toDouble / groupIdDF.count) val sampledGroupIdDF = if (seed == 0) { groupIdDF.sample(samplePercentage) } else { groupIdDF.sample(samplePercentage, seed) } modelPredictionDF.join(sampledGroupIdDF, "groupId") .select(col("label").as(labelField), col("prediction").as(scoreField), col("groupId").as(groupIdField), col("dimensionValue").as(protectedAttributeField)) } /** * Computes Precision@K at a given threshold. Data points with labels * greater than or equal to this threshold are counted as true positives and others as false positives. * For example, if job views are labeled 1 and job applies are labeled 2, * using a threshold of 2 computes P@K for job applies * while 1 computes it for job views (includes job applies). Note that threshold * indicates whether a result is 'relevant' or not (TP or FP), while K indicates * the position up to which results are to be looked at.
* * @param threshold Threshold above which to mark as true positive * @param k The value of k for P@K * @param data The data to compute this over * @return The Precision@K value for the input data */ def computePrecisionAtK(threshold: Double, k: Int) (data: Seq[ModelPrediction]): Double = { def singleQueryPrecisionAtK(predicted: Seq[ModelPrediction]): Double = { // Consider only predictions with ranks [1, k] val predictedWithinK = predicted .filter(_.rank <= k) if (predictedWithinK.isEmpty) { 0.0 } else { // Ranking metrics are computed by looking at the predicted ordering // of labels rather than the predictions/scores themselves val numRelevant = predictedWithinK .count(_.label >= threshold) .toDouble numRelevant / predictedWithinK.length } } val precisions = data.groupBy(_.groupId) .map { case (_, perGroupPredictions) => singleQueryPrecisionAtK(perGroupPredictions) } precisions.sum / precisions.size } /** * Retrieve the metric function corresponding to the requested metric * * @param metric The metric of interest * @return The function that computes the requested metric */ def getMetricFn(metric: String): Seq[ModelPrediction] => Double = { if (metric.equals("AUC")) { computeAUC } else if (metric.equals("FNR")) { computeFalseNegativeRate } else if (metric.equals("FPR")) { computeFalsePositiveRate } else if (metric.equals("TNR")) { computeTrueNegativeRate } else if (metric.equals("PRECISION")) { computePrecision } else if (metric.equals("RECALL")) { computeRecall } else if (metric.matches("PRECISION/\\d*\\.*\\d+@\\d+")) { // The format is PRECISION/threshold@K val parameters = metric.split("/").last val threshold = parameters.split("@").head.toDouble val k = parameters.split("@").last.toInt computePrecisionAtK(threshold, k) } else { Class.forName(metric) .asInstanceOf[Class[CustomMetric]] .newInstance .compute } } /** * Compute the standard deviation of a given sample. 
This is obtained by * taking the square root of the unbiased estimator of the variance, but the * estimate of the standard deviation obtained as a result is biased. * The unbiased estimator of the standard deviation is fairly involved and * isn't worth it, especially when we're dealing with large samples. * * @param vals The values to obtain the standard deviation for * @return The standard deviation */ def computeStdDev(vals: Seq[Double]): Double = { val valsWithoutNan = vals.filterNot(_.isNaN) val variance = if (valsWithoutNan.length < 2) { 0.0 } else { // Compute an unbiased estimate of the variance val n = valsWithoutNan.length val mean = valsWithoutNan.sum / n val sumOfSquareDeviations = valsWithoutNan .map { v => (v - mean) * (v - mean) } .sum sumOfSquareDeviations / (n - 1) } math.sqrt(variance) } ////////////////////////////////////////////////////////////////////////////// // Metrics to evaluate using the permutation test are defined below. ////////////////////////////////////////////////////////////////////////////// /** * Computes a generalized confusion matrix for the given model prediction * data. The values of this matrix are the sum of the prediction scores. * * If the predicted scores are thresholded (ie., either 0.0 or 1.0 * only), the generalized confusion matrix reduces * to the traditional confusion matrix. * * @param data The model prediction details * @return A confusion matrix containing the true positive, false positive, * true negative and false negative scores/counts. 
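 * @example A worked sketch of the sums described above. Plain
 *          (prediction, label) tuples stand in for ModelPrediction here,
 *          purely for illustration:
 * {{{
 * val pairs = Seq((0.75, 1.0), (0.25, 0.0))
 * val tp = pairs.map { case (p, y) => p * y }.sum                 // 0.75
 * val fp = pairs.map { case (p, y) => p * (1.0 - y) }.sum         // 0.25
 * val tn = pairs.map { case (p, y) => (1.0 - p) * (1.0 - y) }.sum // 0.75
 * val fn = pairs.map { case (p, y) => (1.0 - p) * y }.sum         // 0.25
 * // With predictions thresholded to 0.0/1.0, the same sums are plain counts
 * }}}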
*/ def computeGeneralizedConfusionMatrix(data: Seq[ModelPrediction]): ConfusionMatrix = { if (data.isEmpty) { ConfusionMatrix(0, 0, 0, 0) } else { data.map { modelPrediction => // The label indicates if it is a positive label (1) or not (0) val tp = modelPrediction.prediction * modelPrediction.label val fp = modelPrediction.prediction * (1.0 - modelPrediction.label) val tn = (1.0 - modelPrediction.prediction) * (1.0 - modelPrediction.label) val fn = (1.0 - modelPrediction.prediction) * modelPrediction.label ConfusionMatrix( truePositive = tp, falsePositive = fp, trueNegative = tn, falseNegative = fn) }.reduce { (cm1, cm2) => // Add up entries ConfusionMatrix( truePositive = cm1.truePositive + cm2.truePositive, falsePositive = cm1.falsePositive + cm2.falsePositive, trueNegative = cm1.trueNegative + cm2.trueNegative, falseNegative = cm1.falseNegative + cm2.falseNegative) } } } /** * Computes the precision/Positive Predictive Value for a given set of * model predictions. * Refer: https://en.wikipedia.org/wiki/Confusion_matrix * * @param data Sequence of model predictions. * We assume that the predictions are thresholded to 0/1. * @return The precision for the given predictions */ def computePrecision(data: Seq[ModelPrediction]): Double = { val confusionMatrix = computeGeneralizedConfusionMatrix(data) if (confusionMatrix.truePositive == 0) { 0.0 } else { confusionMatrix.truePositive / (confusionMatrix.truePositive + confusionMatrix.falsePositive) } } /** * Computes the False Positive Rate for a given set of model predictions. * Refer: https://en.wikipedia.org/wiki/Confusion_matrix * * @param data Sequence of model predictions * We assume that the predictions are thresholded to 0/1. 
* @return The FPR for the given predictions */ def computeFalsePositiveRate(data: Seq[ModelPrediction]): Double = { val confusionMatrix = computeGeneralizedConfusionMatrix(data) if (confusionMatrix.falsePositive == 0) { 0.0 } else { confusionMatrix.falsePositive / (confusionMatrix.falsePositive + confusionMatrix.trueNegative) } } /** * Computes the False Negative Rate for a given set of model predictions. * Refer: https://en.wikipedia.org/wiki/Confusion_matrix * * @param data Sequence of model predictions * We assume that the predictions are thresholded to 0/1. * @return The FNR for the given predictions */ def computeFalseNegativeRate(data: Seq[ModelPrediction]): Double = { val confusionMatrix = computeGeneralizedConfusionMatrix(data) if (confusionMatrix.falseNegative == 0) { 0.0 } else { confusionMatrix.falseNegative / (confusionMatrix.truePositive + confusionMatrix.falseNegative) } } /** * Computes the recall/sensitivity/True Positive Rate for a given set of * model predictions. * Refer: https://en.wikipedia.org/wiki/Confusion_matrix * * @param data Sequence of model predictions * We assume that the predictions are thresholded to 0/1. * @return The recall for the given predictions */ def computeRecall(data: Seq[ModelPrediction]): Double = { val confusionMatrix = computeGeneralizedConfusionMatrix(data) if (confusionMatrix.truePositive == 0) { 0.0 } else { confusionMatrix.truePositive / (confusionMatrix.truePositive + confusionMatrix.falseNegative) } } /** * Computes the True Negative Rate for a given set of model predictions. * Refer: https://en.wikipedia.org/wiki/Confusion_matrix * * @param data Sequence of model predictions * We assume that the predictions are thresholded to 0/1. 
* @return The TNR for the given predictions */ def computeTrueNegativeRate(data: Seq[ModelPrediction]): Double = { val confusionMatrix = computeGeneralizedConfusionMatrix(data) if (confusionMatrix.trueNegative == 0) { 0.0 } else { confusionMatrix.trueNegative / (confusionMatrix.trueNegative + confusionMatrix.falsePositive) } } /** * Compute the ROC curve for a given sequence of labels and predictions. * The implementation here is similar to NumPy's ROC curve computation. * * @param data The sequence of labels and predictions * @return The False Positive Rate (FPR) and True Positive Rate (TPR) values * for the model, making up the X and Y axes of the ROC curve */ def computeROCCurve(data: Seq[ModelPrediction]): (Seq[Double], Seq[Double]) = { val descSortedData = data.sortBy(-_.prediction) // Select the largest index for each unique prediction value. For example, // for [0.8, 0.7, 0.7, 0.6, 0.5, 0.5, 0.5], we get [0, 2, 3, 6]. // We do this by finding the indices with non-zero differences between // adjacent elements. We need to force-select the last index. val thresholdIdxs = descSortedData.sliding(2) .zipWithIndex .collect { case (slidingWindow, idx) if (slidingWindow(1).prediction - slidingWindow.head.prediction) != 0 => idx }.toList :+ (descSortedData.length - 1) val cumSums = descSortedData.scanLeft(0.0)(_ + _.label) .tail // Drop the initial 0.0 that gets added to the list // Select the cumulative sums at the identified thresholds. This would be the // number of true positives, since the labels are 0 or 1. val truePositives = thresholdIdxs.collect(cumSums) val falsePositives = thresholdIdxs.zip(truePositives) .map { case (thresholdIdx, truePositive) => // 1 + thresholdIdx is the number of entries marked +ve at that threshold // so subtracting the TP count would give us the FP count. 1 + thresholdIdx - truePositive } // The last entries in the FP and TP counts would be the values when // all datapoints are predicted as 1. 
This means that TN and FN would be zero // due to no negative predictions. Thus, N = FP + TN = FP, and P = TP + FN = TP. // That is, the total positive and negative counts are given by the last entries // in the TP and FP lists respectively. val numPositives = truePositives.lastOption.getOrElse(0.0) val numNegatives = falsePositives.lastOption.getOrElse(0.0) val fpr = falsePositives.map(_ / numNegatives) val tpr = truePositives.map(_ / numPositives) (fpr, tpr) } /** * Compute AUC for a given sequence of labels and predictions. * * This works by computing the ROC curve, and estimates the integral of * y = f(x) (where x is the fpr and y is the tpr) by using the trapezoidal * rule. This is similar to how NumPy and Spark MLLib estimate AUC. * * @param data The sequence of labels and predictions * @return The AUC for the model */ def computeAUC(data: Seq[ModelPrediction]): Double = { val (fpr, tpr) = computeROCCurve(data) if (fpr.length == 1 && tpr.length == 1) { 0.0 } else { // Integral from a to b of f(x) is estimated by computing the area of the // trapezium (formed by a, b, f(a) and f(b)) as (b-a) * (f(a) + f(b)) / 2 fpr.zip(tpr) .sliding(2) .foldLeft(0.0) { case (auc, slidingWindow) => val xDiff = slidingWindow(1)._1 - slidingWindow.head._1 val yAvg = (slidingWindow(1)._2 + slidingWindow.head._2) / 2.0 auc + (xDiff * yAvg) } } } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/lib/testing/TestCustomMetric.scala ================================================ package com.linkedin.lift.lib.testing import com.linkedin.lift.types.{CustomMetric, ModelPrediction} /** * Custom metric test class. 
Returns 1.0 */ class TestCustomMetric extends CustomMetric { override def compute(data: Seq[ModelPrediction]): Double = { 1.0 } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/lib/testing/TestUtils.scala ================================================ package com.linkedin.lift.lib.testing import com.linkedin.lift.types.ScoreWithLabelAndAttribute import org.apache.spark.SparkConf import org.apache.spark.sql.expressions.Window import org.apache.spark.sql.functions.rank import org.apache.spark.sql.types.StructType import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} import scala.reflect.ClassTag import scala.reflect.runtime.universe._ /** * Common utilities for testing purposes */ object TestUtils { /** * Creates DataFrame from a subclass of Product */ def createDFFromProduct[T <: Product: ClassTag](spark: SparkSession, data: Seq[T]) (implicit t: TypeTag[T]): DataFrame = { val rdd = spark.sparkContext.parallelize(data) spark.createDataFrame(rdd) } /** * Create the local SparkSession used for general-purpose Spark unit tests. * * @param appName: Name of the local spark app. * @param numThreads: Parallelism of the local spark app, the input of * numThreads can either be an integer or the character '*' * which means spark will use as many worker * threads as the logical cores. 
*/ def createSparkSession(appName: String = "localtest", numThreads: Any = 4): SparkSession = { val sparkConf: SparkConf = { // Turn on Kryo serialization by default val conf = new SparkConf() conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") conf.set("spark.driver.host", "localhost") conf } // numThreads can either be an integer or '*' which means Spark will // use as many worker threads as the logical cores if (!numThreads.isInstanceOf[Int] && !numThreads.equals("*")) { throw new IllegalArgumentException(s"Invalid arguments: The number of " + s"threads ($numThreads) can only be integers or '*'.") } SparkSession.builder .appName(appName) .master(s"local[$numThreads]") .config(sparkConf) .getOrCreate() } /** * Loads CSV data * * @param spark spark session * @param dataPath data path * @param dataSchema data schema * @param delimiter data separating delimiter * @param numPartitions number of partitions * @return loaded data as a dataframe */ def loadCsvData(spark: SparkSession, dataPath: String, dataSchema: StructType, delimiter: String, numPartitions: Int = 100): DataFrame = { spark.read.format("csv") .option("header", value = true) .option("delimiter", delimiter) .option("numPartitions", numPartitions) .schema(dataSchema) .load(dataPath) } /** * To apply the effect of position bias (i.e. positive response decay), we multiply the labels with * Bernoulli(min(1, 1/log(1 + position))) random numbers (natural log, as implemented below), * where the position corresponds to the rank of an item in a session (according to item scores). 
* * @param dataWithoutPositionBias a dataset containing sessionId, score, position and label * @return dataset with modified labels */ def applyPositionBias(dataWithoutPositionBias: Dataset[ScoreWithLabelAndAttribute]): Dataset[ScoreWithLabelAndAttribute] = { import dataWithoutPositionBias.sparkSession.implicits._ dataWithoutPositionBias .withColumn("position", rank().over(Window.partitionBy($"sessionId").orderBy($"score".desc))) .as[ScoreWithLabelAndAttribute] .map(row => row.copy(label = if (math.random < 1 / math.log(1 + row.position.getOrElse(1)) && row.label == 1) 1 else 0)) } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/lib/testing/TestValues.scala ================================================ package com.linkedin.lift.lib.testing import com.linkedin.lift.types.ScoreWithLabelAndPosition import org.apache.spark.mllib.random.RandomRDDs.normalRDD import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} /** * Common values for testing purposes */ object TestValues { val spark: SparkSession = TestUtils.createSparkSession(numThreads = "*") import spark.implicits._ case class JoinedData(memberId: Int, label: String, predicted: String, gender: String, qid: String = "") val testData: Seq[JoinedData] = Seq( JoinedData(12340, "0", "0", "MALE"), JoinedData(12341, "1", "0", "MALE"), JoinedData(12342, "0", "1", "MALE"), JoinedData(12343, "0", "0", "MALE"), JoinedData(12344, "1", "1", "MALE"), JoinedData(12345, "0", "1", "UNKNOWN"), JoinedData(12346, "1", "1", "FEMALE"), JoinedData(12347, "1", "0", "FEMALE"), JoinedData(12348, "0", "0", "FEMALE"), JoinedData(12349, "0", "1", "FEMALE")) val df: DataFrame = TestUtils.createDFFromProduct(TestValues.spark, testData) val testData2: Seq[JoinedData] = Seq( JoinedData(12340, "0.0", "0.3", "MALE", "1"), JoinedData(12341, "1.0", "0.4", "MALE", "2"), JoinedData(12342, "0.0", "0.8", "MALE", "3"), JoinedData(12343, "0.0", "0.1", "MALE", "3"), 
JoinedData(12344, "1.0", "0.7", "MALE", "1"), JoinedData(12345, "0.0", "0.6", "UNKNOWN", "2"), JoinedData(12346, "1.0", "0.9", "FEMALE", "2"), JoinedData(12347, "1.0", "0.3", "FEMALE", "3"), JoinedData(12348, "0.0", "0.2", "FEMALE", "2"), JoinedData(12349, "0.0", "0.8", "FEMALE", "1")) val df2: DataFrame = TestUtils.createDFFromProduct(TestValues.spark, testData2) // test data for PositionBiasUtils val score00: RDD[ScoreWithLabelAndPosition] = normalRDD(spark.sparkContext, 1000L, 1, 12) .map(x => ScoreWithLabelAndPosition(x, 0, 1)) val score10: RDD[ScoreWithLabelAndPosition] = normalRDD(spark.sparkContext, 1000L, 1, 123) .map(x => ScoreWithLabelAndPosition(x, 1, 1)) val score01: RDD[ScoreWithLabelAndPosition] = normalRDD(spark.sparkContext, 200L, 1, 1234) .map(x => ScoreWithLabelAndPosition(x, 0, 2)) val score11: RDD[ScoreWithLabelAndPosition] = normalRDD(spark.sparkContext, 800L, 1, 12345) .map(x => ScoreWithLabelAndPosition(x, 1, 2)) val score02: RDD[ScoreWithLabelAndPosition] = normalRDD(spark.sparkContext, 100L, 1, 23) .map(x => ScoreWithLabelAndPosition(x - 0.5, 0, 3)) val score12: RDD[ScoreWithLabelAndPosition] = normalRDD(spark.sparkContext, 600L, 1, 234) .map(x => ScoreWithLabelAndPosition(x - 0.5, 1, 3)) val positionBiasData: Dataset[ScoreWithLabelAndPosition] = (score00 ++ score01 ++ score10 ++ score11 ++ score02 ++ score12).toDS } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/mitigation/EOppUtils.scala ================================================ package com.linkedin.lift.mitigation import com.linkedin.lift.types.{ScoreWithAttribute, ScoreWithLabelAndAttribute} import org.apache.spark.sql.{DataFrame, Dataset} /** * Utilities for learning and applying an equality of opportunity transformation * (based on https://arxiv.org/abs/2006.11350) */ object EOppUtils { /** * This is a helper function for applyTransformation() below. * Transforming a single score using a transformation function given as a scala map. 
* We first perform a binary search to determine the closest lowerBound and upperBound of the score in the keys * of the transformation map. Then we transform the score assuming that the transformation function is linear in * the interval (lowerBound, upperBound). * * @param score score value * @param sortedKeys sorted keys of the transformation map * @param transformation transformation function given as a scala map * @return transformed score */ def transformScore(score: Double, sortedKeys: Seq[Double], transformation: Map[Double, Double]): Double = { if (score <= sortedKeys.head) { return transformation(sortedKeys.head) } else if (score >= sortedKeys.last) { return transformation(sortedKeys.last) } var left = 0 var right = sortedKeys.length - 1 while (left <= right) { val mid = left + (right - left) / 2 if (sortedKeys(mid) <= score) left = mid + 1 else right = mid - 1 if (score <= sortedKeys(left)) { right = left - 1 } else if (score >= sortedKeys(right)) { left = right + 1 } } val lowerBound = sortedKeys(right) val upperBound = sortedKeys(left) val deltaProportion = (score - lowerBound) / (upperBound - lowerBound) transformation(lowerBound) + deltaProportion * (transformation(upperBound) - transformation(lowerBound)) } /** * Transform scores of a dataset based on the corresponding attribute using transformScore(). 
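 *
 * A sketch of the per-score mapping this relies on, using a hypothetical
 * three-point transformation (values chosen only for illustration):
 * {{{
 * val t = Map(0.0 -> 0.0, 0.5 -> 0.75, 1.0 -> 1.0)
 * val keys = t.keys.toSeq.sorted
 * EOppUtils.transformScore(0.25, keys, t) // 0.375, interpolated between 0.0 and 0.5
 * EOppUtils.transformScore(0.75, keys, t) // 0.875, interpolated between 0.5 and 1.0
 * EOppUtils.transformScore(2.0, keys, t)  // 1.0, clamped to the largest key
 * }}}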
* * @param data dataset containing score and attribute * @param attributeList list of attributes * @param transformations transformations represented as a scala Map for each attribute * @return transformed scores */ def applyTransformation(data: Dataset[ScoreWithAttribute], attributeList: Seq[String], transformations: Map[String, Map[Double, Double]], numPartitions: Int = 1000): Dataset[ScoreWithAttribute] = { import data.sparkSession.implicits._ val sortedKeys: Map[String, Seq[Double]] = attributeList.zip(attributeList.map( transformations(_).keys.toSeq.sorted)).toMap data .filter($"attribute".isin(attributeList: _*)) .repartition(numPartitions) .map(row => row.copy(score = transformScore(row.score, sortedKeys(row.attribute), transformations(row.attribute))) ) } /** * Computing the empirical CDF function. * * @param data dataframe containing "score" * @param probabilities array of probabilities for computing quantiles * @param relativeTolerance relative tolerance for computing approximate quantiles * @return the eCDF as a scala map */ def cdfTransformation(data: DataFrame, probabilities: Array[Double], relativeTolerance: Double): Map[Double, Double] = { val quantiles = data.stat.approxQuantile("score", probabilities, relativeTolerance) quantiles.zip(probabilities).toMap } /** * Adjust transformation such that the transformed score distribution is the same as the baseline score distribution * * @param baselineData dataset containing score, label, attribute * @param attributeList list of attributes * @param transformations transformations represented as a scala Map for each attribute * @param numQuantiles the number of quantiles for computing a quantile-quantile map between the original score * and the transformed score * @param relativeTolerance relative tolerance for computing approximate quantiles * @return modified transformations */ def adjustScale(baselineData: Dataset[ScoreWithAttribute], attributeList: Seq[String], transformations: Map[String, Map[Double, 
Double]], numQuantiles: Int, relativeTolerance: Double): Map[String, Map[Double, Double]] = { import baselineData.sparkSession.implicits._ val filteredData = baselineData .filter($"attribute".isin(attributeList: _*)) .as[ScoreWithAttribute] val probabilities = Array.range(0, numQuantiles + 1).map(x => x.toDouble / numQuantiles) val quantilesBeforeTransformation = filteredData.stat.approxQuantile("score", probabilities, relativeTolerance) val transformedData = applyTransformation(filteredData, attributeList, transformations) val quantilesAfterTransformation = transformedData.stat.approxQuantile("score", probabilities, relativeTolerance) val qqMap = quantilesAfterTransformation.zip(quantilesBeforeTransformation).toMap transformations.transform((attribute, innerMap) => innerMap.transform((key, value) => transformScore(value, quantilesAfterTransformation.toSeq, qqMap))) } /** * Learns the equality of opportunity (EOpp) transformation for datasets. * By setting originalScale = true, a score distribution preserving transformation can be learned. * However, this may affect the quality of the output (i.e. the EOpp transformation), especially when numQuantiles is * not large enough. * * @param data dataset containing score, label, attribute * @param attributeList list of attributes * @param numQuantiles number of points for representing transformation functions * (quantile-quantile mappings). * @param relativeTolerance relative tolerance for computing approximate quantiles * @param originalScale whether the distribution of the transformed score should be the same as the distribution * before transformation. 
* @return EOpp transformation represented as a scala Map[Double, Double] for each attribute */ def eOppTransformation(data: Dataset[ScoreWithLabelAndAttribute], attributeList: Seq[String], numQuantiles: Int = 10000, relativeTolerance: Double = 1e-6, originalScale: Boolean = false): Map[String, Map[Double, Double]] = { import data.sparkSession.implicits._ val probabilities = Array.range(0, numQuantiles + 1).map(x => x.toDouble / numQuantiles) val eOppMaps = attributeList.zip(attributeList.map(attribute => cdfTransformation(data.filter($"label" === 1 and $"attribute" === attribute).toDF, probabilities, relativeTolerance) )).toMap if (!originalScale) { eOppMaps } else { adjustScale(data.drop("label").as[ScoreWithAttribute], attributeList, eOppMaps, numQuantiles, relativeTolerance) } } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/types/BenefitMap.scala ================================================ package com.linkedin.lift.types import com.linkedin.lift.lib.StatsUtils import com.linkedin.lift.types.Distribution.DimensionValues /** * Class representing the benefits for different categories. It is a map * from a category (specified as DimensionValues) to the corresponding benefit * value. Examples of a category include [gender = Female], [gender = Male, * age >= 40], and [age < 40, disability = yes]. Examples of a benefit value * include AUC, precision, recall, error rate, FPR, FNR, FDR, and FOR. We * assume that this map is non-empty, the benefit values are non-negative, * at least one benefit value is positive, and the benefit value for missing * dimensions equals zero. * * @param entries The map that represents benefits for different categories. * @param benefitType The benefit metric whose values are stored in the entries. 
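 * @example A minimal sketch with two hypothetical categories (the benefit
 *          values and the EXAMPLE_BENEFIT name are made up so that the
 *          arithmetic is exact):
 * {{{
 * val benefits = BenefitMap(
 *   entries = Map(
 *     Map("gender" -> "FEMALE") -> 0.5,
 *     Map("gender" -> "MALE") -> 1.5),
 *   benefitType = "EXAMPLE_BENEFIT")
 * benefits.mean                                // 1.0
 * benefits.variance                            // 0.25 (population variance)
 * benefits.computeCoefficientOfVariation       // 0.5
 * benefits.computeGeneralizedEntropyIndex(2.0) // 0.125, half the squared CV
 * }}}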
*/ case class BenefitMap( entries: Map[DimensionValues, Double], benefitType: String ) { val errorTolerance = 1e-12 /** * Computes the mean of the benefit values * * @return The mean of all the entries */ def mean: Double = entries.values.sum / entries.size /** * Computes the population variance of the benefit values * * While computing the inequality measures, we would be considering the * entire "population" consisting of all dimensions (as opposed to * "sampling" from the set of potential dimensions). Hence, we treat as * though we calculate the population variance and do not apply Bessel's * correction (https://en.wikipedia.org/wiki/Bessel%27s_correction). * * @return The population variance of all the entries */ def variance: Double = { entries.values.map(math.pow(_, 2)).sum / entries.size - math.pow(this.mean, 2) } /** * Get the value corresponding to a given DimensionValue * * @param key DimensionValue of interest * @return value if present, else 0.0 */ def getValue(key: DimensionValues): Double = entries.getOrElse(key, 0.0) /** * Generalized entropy index as a measure of inequality of the distribution * of the benefits over categories. References: * https://arxiv.org/abs/1902.04783 * https://arxiv.org/abs/1807.00787 * https://en.wikipedia.org/wiki/Generalized_entropy_index * * We assume that the benefits are positive whenever alpha is set to 0 or 1, * so it is recommended to use the useAbsVal flag (to convert benefit vectors * into their positive counterparts) if the vector might contain negative values. * * @param alpha Parameter which regulates the weight given to distances * between benefits at different parts of the distribution. 
* @return Generalized entropy index */ def computeGeneralizedEntropyIndex(alpha: Double, useAbsVal: Boolean = false): Double = { val count = entries.size val updatedBenefitMap = if (useAbsVal) { val posEntries = entries.map { case (dimVal, entry) => (dimVal, math.abs(entry)) } BenefitMap(entries = posEntries, benefitType = this.benefitType) } else { this } val mean = updatedBenefitMap.mean val normalizedBenefits = updatedBenefitMap.entries.map { case (_, benefit) => benefit / mean } if (math.abs(alpha - 1.0) < errorTolerance) { normalizedBenefits.map(x => x * math.log(x)).sum / count } else if (math.abs(alpha) < errorTolerance) { normalizedBenefits.map(x => - math.log(x)).sum / count } else { normalizedBenefits.map(x => math.pow(x, alpha) - 1.0).sum / (count * alpha * (alpha - 1.0)) } } /** * Theil T index as a measure of inequality of the distribution of the * benefits over categories. Reference: * https://en.wikipedia.org/wiki/Theil_index * * We assume that the distribution has only positive values. * * @return Theil T index */ def computeTheilTIndex: Double = computeGeneralizedEntropyIndex(1.0, useAbsVal = true) /** * Theil L index as a measure of inequality of the distribution of the * benefits over categories (also known as the mean log deviation). * References: * https://en.wikipedia.org/wiki/Theil_index * https://en.wikipedia.org/wiki/Mean_log_deviation * * We assume that the distribution has only positive values. * * @return Theil L index */ def computeTheilLIndex: Double = computeGeneralizedEntropyIndex(0.0, useAbsVal = true) /** * Atkinson index as a measure of inequality of the distribution * of the benefits over categories. References: * https://en.wikipedia.org/wiki/Atkinson_index * Atkinson, On the measurement of inequality. 
* Journal of Economic Theory, 2 (3), 1970 * http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.521.849&rep=rep1&type=pdf * https://statisticalhorizons.com/wp-content/uploads/Inequality.pdf (Note * that there is a typo in equation 17) * * We assume that the benefits are positive whenever epsilon > 1. Although * Atkinson index can be expressed in terms of generalized entropy index, * we compute directly for simplicity and to avoid positivity assumption * when epsilon = 0. * * @param epsilon Inequality aversion parameter (greater than or equal to * zero, with zero corresponding to no aversion to inequality) * @return Atkinson index */ def computeAtkinsonIndex(epsilon: Double): Double = { val count = entries.size val mean = this.mean val normalizedBenefits = entries.map {case (_, benefit) => benefit / mean } val alpha = 1 - epsilon if (math.abs(alpha) < errorTolerance) { 1.0 - math.pow(normalizedBenefits.product, 1.0/count) } else { val normalizedBenefitPowerMean = normalizedBenefits .map(math.pow(_, alpha)) .sum / count 1.0 - math.pow(normalizedBenefitPowerMean, 1.0/alpha) } } /** * Coefficient of variation as a measure of inequality of the distribution * of the benefits over categories. References: * https://en.wikipedia.org/wiki/Coefficient_of_variation * https://statisticalhorizons.com/wp-content/uploads/Inequality.pdf * * Although coefficient of variation can be expressed in terms of * generalized entropy index (GEI with alpha = 2 equals half the squared * coefficient of variation), we compute directly for simplicity. * * @return Coefficient of variation */ def computeCoefficientOfVariation: Double = { math.sqrt(this.variance) / this.mean } /** * Compute the requested metric, passing along any metric-specific parameters. 
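 *
 * A usage sketch (the benefit values and the EXAMPLE_BENEFIT name are
 * hypothetical). Metric names that are not recognized yield None:
 * {{{
 * val benefits = BenefitMap(
 *   Map(Map("gender" -> "FEMALE") -> 0.5, Map("gender" -> "MALE") -> 1.5),
 *   "EXAMPLE_BENEFIT")
 * // The parameter is passed as a string and parsed with toDouble
 * val gei = benefits.computeMetric("GENERALIZED_ENTROPY_INDEX", "2.0")
 * gei.map(_.resultType)                      // Some("EXAMPLE_BENEFIT: GENERALIZED_ENTROPY_INDEX")
 * benefits.computeMetric("NOT_A_METRIC", "") // None
 * }}}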
* * @param metric The metric of interest * @param metricParam A metric-specific parameter * @return The computed metric of interest */ def computeMetric(metric: String, metricParam: String): Option[FairnessResult] = { metric match { case "GENERALIZED_ENTROPY_INDEX" => Some(FairnessResult( resultType = s"$benefitType: $metric", resultValOpt = Some(computeGeneralizedEntropyIndex(metricParam.toDouble)), parameters = metricParam, constituentVals = Map())) case "ATKINSON_INDEX" => Some(FairnessResult( resultType = s"$benefitType: $metric", resultValOpt = Some(computeAtkinsonIndex(metricParam.toDouble)), parameters = metricParam, constituentVals = Map())) case "THEIL_L_INDEX" => Some(FairnessResult( resultType = s"$benefitType: $metric", resultValOpt = Some(computeTheilLIndex), parameters = metricParam, constituentVals = Map())) case "THEIL_T_INDEX" => Some(FairnessResult( resultType = s"$benefitType: $metric", resultValOpt = Some(computeTheilTIndex), parameters = metricParam, constituentVals = Map())) case "COEFFICIENT_OF_VARIATION" => Some(FairnessResult( resultType = s"$benefitType: $metric", resultValOpt = Some(computeCoefficientOfVariation), parameters = metricParam, constituentVals = Map())) case _ => None } } /** * Compute the aggregate metrics requested, and append the benefit metric * used for these computations to the returned list of results. 
* * @param overallMetrics The aggregate metrics to compute * @return The sequence of FairnessResults */ def computeOverallMetrics(overallMetrics: Map[String, String]): Seq[FairnessResult] = { val overallMetricsSeq = overallMetrics.flatMap { case (overallMetric, metricParam) => computeMetric(overallMetric, metricParam) }.toList FairnessResult( resultType = s"Benefit Map for $benefitType", resultValOpt = None, constituentVals = entries ) +: overallMetricsSeq } } object BenefitMap { /** * Compute a Benefit Map that captures a benefit value for each dimension * value, from a given set of model predictions and benefit function. * * @param predictions The model predictions to analyze * @param dimensionType The dimension type of interest * @param benefitMetric The benefit metric to compute for each dimension value * @return The computed benefit map */ def compute(predictions: Seq[ModelPrediction], dimensionType: String, benefitMetric: String): BenefitMap = { val benefitFn = StatsUtils.getMetricFn(benefitMetric) val benefitEntries: Map[DimensionValues, Double] = predictions.groupBy(_.dimensionValue) .map { case (dimVal, entries) => (Map(dimensionType -> dimVal), benefitFn(entries)) } BenefitMap(entries = benefitEntries, benefitType = benefitMetric) } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/types/CustomMetric.scala ================================================ package com.linkedin.lift.types /** * Abstract class that needs to be extended in case a custom metric needs to * be computed. The compute method needs to be overridden. */ abstract class CustomMetric { /** * Compute a user-defined metric given a sequence of model predictions. 
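 *
 * A sketch of a subclass (PositiveLabelRate is a hypothetical name used only
 * for illustration). StatsUtils.getMetricFn instantiates such a class
 * reflectively when given its fully qualified class name, so the class
 * needs a no-argument constructor:
 * {{{
 * class PositiveLabelRate extends CustomMetric {
 *   // Fraction of predictions whose label is positive; 0.0 for empty input
 *   override def compute(data: Seq[ModelPrediction]): Double =
 *     if (data.isEmpty) 0.0 else data.count(_.label > 0).toDouble / data.size
 * }
 * }}}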
* * @param data A sample of model predictions * @return The custom computed metric */ def compute(data: Seq[ModelPrediction]): Double } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/types/Distribution.scala ================================================ package com.linkedin.lift.types import com.linkedin.lift.types.Distribution.DimensionValues import org.apache.spark.sql.functions.{col, count} import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType} import org.apache.spark.sql.{DataFrame, Row, SparkSession} /** * Class representing a data distribution. It is a map from a set of dimension * entries to their corresponding value. We assume that this is a sparse * representation, ie., missing dimensions correspond to a value of zero. Note * that the class is not aware of the set of all possible dimension values - * it will return a value of zero for any key it doesn't contain. * * The Distribution can be a frequency distribution or a discrete probability * distribution. * * @param entries The map that represents the distribution. */ case class Distribution( entries: Map[DimensionValues, Double] ) { /** * Computes the sum of the distribution * * @return The sum of all the entries */ def sum: Double = entries.values.sum /** * Computes the max of the distribution * * @return The max of all the entries */ def max: Double = entries.values.max /** * Get the value corresponding to a given DimensionValue * * @param key DimensionValue of interest * @return value if present, else 0.0 */ def getValue(key: DimensionValues): Double = entries.getOrElse(key, 0.0) /** * Zips this distribution with another distribution * * @param other The other distribution to zip with * @return An iterable over the two distributions, with imputed values for * dimensions missing in either distribution. 
Since we are not aware * of the set of all dimension values, we cannot impute values for * dimensions missing in both distributions. */ def zip(other: Distribution): Seq[(DimensionValues, Double, Double)] = { (this.entries.keys ++ other.entries.keys).map { key => (key, this.getValue(key), other.getValue(key)) }.toSeq } /** * Computes marginal distribution with respect to the specified set of * dimensions * * @param groupByCols Dimensions to group by * @return The resultant marginal distribution */ def computeMarginal(groupByCols: Set[String]): Distribution = { val marginalDistributionEntries = entries.toSeq .map { case (dimVals, count) => val marginalDimensions = groupByCols.map { groupByCol => (groupByCol, dimVals.getOrElse(groupByCol, "")) }.toMap (marginalDimensions, count) } .groupBy(_._1) .map { case (marginalDimensions, countsGroup) => (marginalDimensions, countsGroup.map(_._2).sum)} Distribution(entries = marginalDistributionEntries) } /** * Convert the Distribution into a DataFrame * * @param spark The current Spark Session * @return A DataFrame with column names (dim1, ...., dimN, count). */ def toDF(spark: SparkSession): DataFrame = { val allKeys = entries.keySet.flatMap(_.keySet).toSeq // Build a schema corresponding to the distribution entries val schema = StructType( allKeys.map(StructField(_, StringType)) :+ StructField("count", DoubleType)) // Build an RDD with the distribution entries val entriesSeq = entries.map { case (dimVals, count) => val entries: Seq[Any] = allKeys.map(dimVals.getOrElse(_, "")) :+ count entries }.toSeq val rowData = entriesSeq.map(entry => Row(entry: _*)) val rdd = spark.sparkContext.parallelize(rowData) spark.createDataFrame(rdd, schema) } } object Distribution { type DimensionValues = Map[String, String] /** * Create a Distribution instance given a DataFrame and the fields to group on. 
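 *
 * Illustrative usage, assuming a DataFrame df with columns named "gender"
 * and "label" (hypothetical column names):
 * {{{
 * // Joint counts over (gender, label), then the marginal over gender
 * val joint = Distribution.compute(df, Set("gender", "label"))
 * val genderCounts = joint.computeMarginal(Set("gender"))
 * }}}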
* * @param df DataFrame to be grouped * @param groupByCols Dimensions to group by * @return The resultant distribution */ def compute(df: DataFrame, groupByCols: Set[String]): Distribution = { val groupBySqlCols = groupByCols.map(col).toSeq val distributionEntries = df.select(groupBySqlCols: _*) .groupBy(groupBySqlCols: _*) .agg(count("*")) .collect .toSeq .map { row => val rowSeq = row.toSeq.map { Option(_).fold("") { _.toString } } val groupingVals = rowSeq.take(groupByCols.size) val countVal = rowSeq.drop(groupByCols.size).head val dimensions = groupByCols.zip(groupingVals).toMap (dimensions, countVal.toDouble) } .toMap Distribution(entries = distributionEntries) } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/types/EOppCaseClasses.scala ================================================ package com.linkedin.lift.types case class ScoreWithLabelAndPosition(score: Double, label: Int, position: Int, attribute: Option[String] = None) case class ScoreWithAttribute(itemId: Int, score: Double, attribute: String, sessionId: Option[Int] = None, position: Option[Int] = None) case class ScoreWithLabelAndAttribute(itemId: Int, score: Double, label: Int, attribute: String, sessionId: Option[Int] = None, position: Option[Int] = None) ================================================ FILE: lift/src/main/scala/com/linkedin/lift/types/FairnessResult.scala ================================================ package com.linkedin.lift.types import com.linkedin.lift.types.Distribution.DimensionValues import org.apache.spark.sql.{DataFrame, SparkSession} /** * Captures the results of a generic fairness metric computation. * * @param resultType Description/title of the computed metric * @param parameters Any parameters that were used in the computation * @param resultValOpt The result of the computation. Some results involve a * single metric, in which case this field is used. It is * None otherwise. 
* @param constituentVals Values/results that the result is comprised of. Some * metrics produce a list of values, while others * produce a single overall value. In the latter case, * we attempt to capture the contributions of the * individual dimensions in this field. * @param additionalStats Any additional statistics related to the computation. */ case class FairnessResult( resultType: String, parameters: String = "", resultValOpt: Option[Double], constituentVals: Map[DimensionValues, Double], additionalStats: Map[String, Double] = Map() ) { /** * Convert a FairnessResult into a BenefitMap. This works by using the * constituent values of the FairnessResult as the benefit vector. * * @return The resultant BenefitMap */ def toBenefitMap: BenefitMap = { BenefitMap(entries = constituentVals, benefitType = resultType) } } object FairnessResult { // Avro schemas allow only String keys for Maps private case class AvroCompatibleResult( resultType: String, parameters: String, resultValOpt: Option[Double], constituentVals: Map[String, Double], additionalStats: Map[String, Double]) /** * Create an Avro-compatible DataFrame from a sequence of results. 
* * @param spark The Spark Session * @param results The results to be converted * @return A DataFrame containing the results */ def toDF(spark: SparkSession, results: Seq[FairnessResult]): DataFrame = { import spark.implicits._ results.toDS.map { result => AvroCompatibleResult( resultType = result.resultType, parameters = result.parameters, resultValOpt = result.resultValOpt, constituentVals = result.constituentVals.map { case (dimVals, value) => (dimVals.toString, value) }, additionalStats = result.additionalStats) }.toDF } } ================================================ FILE: lift/src/main/scala/com/linkedin/lift/types/ModelPrediction.scala ================================================ package com.linkedin.lift.types import org.apache.spark.sql.{DataFrame, Dataset, Encoders, Row} /** * Represents a single data point's ground-truth label, * the model's prediction (either a score, or a predicted class), * and the dimension value it corresponds to. For a method to work with * the permutation test, it needs to take a sequence of these case classes * as its input. * * @param label The ground-truth label of the data point. Values are in {0, 1} * @param prediction The model's prediction. Lies between [0, 1] * @param groupId The optional groupId for ranking * @param rank A value that indicates the rank of the prediction. If groupId is * empty, this would be the absolute rank. Otherwise, it is * the per-group rank. Starts from 1. * @param dimensionValue The dimension value the data point belongs to */ case class ModelPrediction( label: Double, prediction: Double, groupId: String = "", rank: Int = 0, dimensionValue: String) object ModelPrediction { /** * Retrieves the group ID from the specified field. 
* * @param row Input DataFrame's Row * @param groupIdField The group ID field * @return The group ID value if present, else an empty string */ def getGroupId(row: Row, groupIdField: String): String = { val allFields = row.schema.fieldNames val groupIdValOpt = if (allFields.contains(groupIdField)) { Some(row.getAs[CharSequence](groupIdField).toString) } else { None } groupIdValOpt.getOrElse("") } /** * Builds a Dataset[ModelPrediction] by extracting labels, predictions, * dimension values and group IDs. * * @param df The DataFrame to process * @param labelField The label field * @param scoreField The score field * @param groupIdField The group ID field * @param dimValField The dimension value field * @return The Dataset containing ModelPredictions */ def getModelPredictionDS(df: DataFrame, labelField: String, scoreField: String, groupIdField: String, dimValField: String): Dataset[ModelPrediction] = { df.map { row => val label = row.getAs[Any](labelField).toString.toDouble val prediction = row.getAs[Any](scoreField).toString.toDouble val groupIdVal = getGroupId(row, groupIdField) val dimVal = row.getAs[CharSequence](dimValField).toString ModelPrediction( label = label, prediction = prediction, groupId = groupIdVal, dimensionValue = dimVal) } (Encoders.product[ModelPrediction]) } /** * Generate Model Predictions from a given DataFrame.
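 *
 * Illustrative usage, assuming a DataFrame df with columns "label",
 * "predicted" and "gender" (hypothetical names). Passing a groupIdField
 * that is not a column of df, such as "", puts every row in one group, so
 * the resulting ranks are absolute:
 * {{{
 * val predictions = ModelPrediction.compute(df, "label", "predicted", "", "gender")
 * // Ranks start from 1, so the highest-scoring prediction has rank 1
 * val top = predictions.minBy(_.rank)
 * }}}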
* * @param df The DataFrame to process * @param labelField Column containing the labels * @param scoreField Column containing the model scores * @param groupIdField Grouping column name (usually meant for ranking metrics) * @param dimValField Column containing the dimension value of interest * @return A sequence of model predictions extracted from the DataFrame */ def compute(df: DataFrame, labelField: String, scoreField: String, groupIdField: String, dimValField: String): Seq[ModelPrediction] = { val modelPredictions = getModelPredictionDS(df, labelField, scoreField, groupIdField, dimValField) .collect .toSeq // Add ranking info modelPredictions.groupBy(_.groupId) .flatMap { case (_, predictions) => predictions.sortBy(-_.prediction) .zipWithIndex .map { case (prediction, rank) => prediction.copy(rank = rank + 1) } }.toList } } ================================================ FILE: lift/src/test/data/TrainingData.csv ================================================ [File too large to display: 30.9 MB] ================================================ FILE: lift/src/test/data/ValidationData.csv ================================================ [File too large to display: 30.9 MB] ================================================ FILE: lift/src/test/scala/com/linkedin/lift/eval/FairnessMetricsUtilsTest.scala ================================================ package com.linkedin.lift.eval import com.linkedin.lift.lib.testing.{TestUtils, TestValues} import com.linkedin.lift.lib.testing.TestValues.JoinedData import com.linkedin.lift.types.{Distribution, FairnessResult, ModelPrediction} import org.testng.Assert import org.testng.annotations.Test /** * Tests for FairnessMetricsUtils */ class FairnessMetricsUtilsTest { val predictions: Seq[ModelPrediction] = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "MALE"), 
ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 1, dimensionValue = "UNKNOWN"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "UNKNOWN"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "UNKNOWN"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "UNKNOWN")) @Test(description = "Project IDs, labels and scores") def testProjectIdLabelsAndScores(): Unit = { val projectedDF = FairnessMetricsUtils.projectIdLabelsAndScores(TestValues.df, "memberId", "label", "predicted", "") val projectedDFStrSeq = projectedDF.collect.toSeq.map(_.toString) val expectedStringSeq1 = Seq( "[12340,0,0]", "[12341,1,0]", "[12342,0,1]", "[12343,0,0]", "[12344,1,1]", "[12345,0,1]", "[12346,1,1]", "[12347,1,0]", "[12348,0,0]", "[12349,0,1]") Assert.assertEquals(projectedDFStrSeq, expectedStringSeq1) } @Test(description = "Compute permutation test metrics") def testComputePermutationTestMetrics(): Unit = { val actualResults = FairnessMetricsUtils.computePermutationTestMetrics( predictions, "gender", Seq("PRECISION", "RECALL"), 1000, 2) Assert.assertEquals(actualResults, Seq( FairnessResult(resultType = "PERMUTATION_TEST", resultValOpt = Some(0.16667), constituentVals = Map(Map("gender" -> "MALE") -> 0.5, Map("gender" -> "FEMALE") -> 0.33333), parameters = "Map(metric -> PRECISION, numTrials -> 1000, seed -> 2)", additionalStats = Map("pValue" -> 0.498, "stdError" -> 0.01581, "bootstrapStdDev" -> 0.49003251236869033, "testStatisticStdDev" -> 0.5265411200529941)), FairnessResult(resultType = "PERMUTATION_TEST", resultValOpt = Some(0.25), constituentVals = Map(Map("gender" -> "MALE") -> 0.5, Map("gender" -> "UNKNOWN") -> 
0.25), parameters = "Map(metric -> PRECISION, numTrials -> 1000, seed -> 2)", additionalStats = Map("pValue" -> 0.631, "stdError" -> 0.01526, "bootstrapStdDev" -> 0.45875449105441796, "testStatisticStdDev" -> 0.4308307912931494)), FairnessResult(resultType = "PERMUTATION_TEST", resultValOpt = Some(0.08333), constituentVals = Map(Map("gender" -> "FEMALE") -> 0.33333, Map("gender" -> "UNKNOWN") -> 0.25), parameters = "Map(metric -> PRECISION, numTrials -> 1000, seed -> 2)", additionalStats = Map("pValue" -> 1.0, "stdError" -> 0.0, "bootstrapStdDev" -> 0.38278033036295717, "testStatisticStdDev" -> 0.373106162115032)), FairnessResult(resultType = "PERMUTATION_TEST", resultValOpt = Some(-0.5), constituentVals = Map(Map("gender" -> "MALE") -> 0.5, Map("gender" -> "FEMALE") -> 1.0), parameters = "Map(metric -> RECALL, numTrials -> 1000, seed -> 2)", additionalStats = Map("pValue" -> 0.444, "stdError" -> 0.01571, "bootstrapStdDev" -> 0.6180776162116582, "testStatisticStdDev" -> 0.7040802954178795)), FairnessResult(resultType = "PERMUTATION_TEST", resultValOpt = Some(-0.5), constituentVals = Map(Map("gender" -> "MALE") -> 0.5, Map("gender" -> "UNKNOWN") -> 1.0), parameters = "Map(metric -> RECALL, numTrials -> 1000, seed -> 2)", additionalStats = Map("pValue" -> 0.417, "stdError" -> 0.01559, "bootstrapStdDev" -> 0.6473608956252646, "testStatisticStdDev" -> 0.6938742203424768)), FairnessResult(resultType = "PERMUTATION_TEST", resultValOpt = Some(0.0), constituentVals = Map(Map("gender" -> "FEMALE") -> 1.0, Map("gender" -> "UNKNOWN") -> 1.0), parameters = "Map(metric -> RECALL, numTrials -> 1000, seed -> 2)", additionalStats = Map("pValue" -> 0.426, "stdError" -> 0.01564, "bootstrapStdDev" -> 0.6958650934112338, "testStatisticStdDev" -> 0.653010277424806)))) } @Test(description = "Compute reference distributions") def testComputeReferenceDistributionOpt(): Unit = { val distribution = Distribution(Map( Map("gender" -> "MALE", "label" -> "1") -> 2.0, Map("gender" -> "MALE", 
"label" -> "0") -> 3.0, Map("gender" -> "FEMALE", "label" -> "1") -> 2.0, Map("gender" -> "FEMALE", "label" -> "0") -> 2.8, Map("gender" -> "UNKNOWN", "label" -> "1") -> 0.3, Map("gender" -> "UNKNOWN", "label" -> "0") -> 1.0)) Assert.assertEquals(FairnessMetricsUtils.computeReferenceDistributionOpt( distribution, "incorrect"), None) Assert.assertEquals(FairnessMetricsUtils.computeReferenceDistributionOpt( distribution, "UNIFORM"), Some(Distribution(Map( Map("gender" -> "MALE", "label" -> "1") -> 1.0/6.0, Map("gender" -> "MALE", "label" -> "0") -> 1.0/6.0, Map("gender" -> "FEMALE", "label" -> "1") -> 1.0/6.0, Map("gender" -> "FEMALE", "label" -> "0") -> 1.0/6.0, Map("gender" -> "UNKNOWN", "label" -> "1") -> 1.0/6.0, Map("gender" -> "UNKNOWN", "label" -> "0") -> 1.0/6.0)))) } @Test(description = "Compute dataset metrics - no reference distribution") def testComputeDatasetMetricsNoRefDistr(): Unit = { val distribution = Distribution(Map( Map("gender" -> "MALE", "label" -> "1.0") -> 0.3, Map("gender" -> "MALE", "label" -> "0.0") -> 0.2, Map("gender" -> "FEMALE", "label" -> "1.0") -> 0.1, Map("gender" -> "FEMALE", "label" -> "0.0") -> 0.2, Map("gender" -> "UNKNOWN", "label" -> "1.0") -> 0.1, Map("gender" -> "UNKNOWN", "label" -> "0.0") -> 0.1)) val args = MeasureDatasetFairnessMetricsCmdLineArgs( labelField = "label", protectedAttributeField = "gender", distanceMetrics = Seq("KL_DIVERGENCE", "DEMOGRAPHIC_PARITY", "EQUALIZED_ODDS"), overallMetrics = Map("THEIL_L_INDEX" -> "", "THEIL_T_INDEX" -> ""), benefitMetrics = Seq("SKEWS")) // Only distance metrics with no reference distribution are computed val actualMetrics = FairnessMetricsUtils.computeDatasetMetrics(distribution, None, args) Assert.assertEquals(actualMetrics, Seq(FairnessResult( resultType = "DEMOGRAPHIC_PARITY", resultValOpt = None, constituentVals = Map( Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.16667, Map("gender1" -> "FEMALE", "gender2" -> "MALE") -> 0.26667, Map("gender1" -> "UNKNOWN", 
"gender2" -> "MALE") -> 0.1), additionalStats = Map("FEMALE" -> 0.33333, "UNKNOWN" -> 0.5, "MALE" -> 0.6)))) } @Test(description = "Compute dataset metrics - with reference distribution") def testComputeDatasetMetricsWithRefDistr(): Unit = { val distribution = Distribution(Map( Map("gender" -> "MALE", "label" -> "1.0") -> 0.3, Map("gender" -> "MALE", "label" -> "0.0") -> 0.2, Map("gender" -> "FEMALE", "label" -> "1.0") -> 0.1, Map("gender" -> "FEMALE", "label" -> "0.0") -> 0.2, Map("gender" -> "UNKNOWN", "label" -> "1.0") -> 0.1, Map("gender" -> "UNKNOWN", "label" -> "0.0") -> 0.1)) val args = MeasureDatasetFairnessMetricsCmdLineArgs( labelField = "label", protectedAttributeField = "gender", distanceMetrics = Seq("KL_DIVERGENCE", "DEMOGRAPHIC_PARITY", "EQUALIZED_ODDS"), overallMetrics = Map("THEIL_L_INDEX" -> "", "THEIL_T_INDEX" -> ""), benefitMetrics = Seq("SKEWS")) // Dataset distance metrics val referenceDistr = Distribution(Map( Map("gender" -> "MALE", "label" -> "1.0") -> 0.16666, Map("gender" -> "MALE", "label" -> "0.0") -> 0.16666, Map("gender" -> "FEMALE", "label" -> "1.0") -> 0.16666, Map("gender" -> "FEMALE", "label" -> "0.0") -> 0.16666, Map("gender" -> "UNKNOWN", "label" -> "1.0") -> 0.16666, Map("gender" -> "UNKNOWN", "label" -> "0.0") -> 0.16666)) val actualMetrics = FairnessMetricsUtils.computeDatasetMetrics(distribution, Some(referenceDistr), args) Assert.assertEquals(actualMetrics, Seq( FairnessResult(resultType = "KL_DIVERGENCE", parameters = Distribution(Map( Map("gender" -> "FEMALE", "label" -> "1.0") -> 0.16666, Map("gender" -> "UNKNOWN", "label" -> "0.0") -> 0.16666, Map("gender" -> "UNKNOWN", "label" -> "1.0") -> 0.16666, Map("gender" -> "MALE", "label" -> "0.0") -> 0.16666, Map("gender" -> "FEMALE", "label" -> "0.0") -> 0.16666, Map("gender" -> "MALE", "label" -> "1.0") -> 0.16666)).toString, resultValOpt = Some(0.13852315605014068), constituentVals = Map()), FairnessResult(resultType = "DEMOGRAPHIC_PARITY", resultValOpt = None, 
constituentVals = Map( Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.16667, Map("gender1" -> "FEMALE", "gender2" -> "MALE") -> 0.26667, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE") -> 0.1), additionalStats = Map("FEMALE" -> 0.33333, "UNKNOWN" -> 0.5, "MALE" -> 0.6)), FairnessResult( resultType = "Benefit Map for SKEWS", resultValOpt = None, constituentVals = Map( Map("gender" -> "FEMALE", "label" -> "1.0") -> -0.058840500022933395, Map("gender" -> "UNKNOWN", "label" -> "0.0") -> -0.058840500022933395, Map("gender" -> "UNKNOWN", "label" -> "1.0") -> -0.058840500022933395, Map("gender" -> "MALE", "label" -> "0.0") -> 0.028170876966696262, Map("gender" -> "FEMALE", "label" -> "0.0") -> 0.028170876966696262, Map("gender" -> "MALE", "label" -> "1.0") -> 0.10821358464023273)), FairnessResult( resultType = "SKEWS: THEIL_L_INDEX", resultValOpt = Some(0.10948572991373717), constituentVals = Map()), FairnessResult( resultType = "SKEWS: THEIL_T_INDEX", resultValOpt = Some(0.10611973507347484), constituentVals = Map()))) } @Test(description = "Compute model metrics") def testComputeModelMetrics(): Unit = { val testData = Seq( JoinedData(memberId = 1, label = "0.0", predicted = "1.0", gender = "MALE"), JoinedData(memberId = 2, label = "0.0", predicted = "1.0", gender = "MALE"), JoinedData(memberId = 3, label = "0.0", predicted = "1.0", gender = "MALE"), JoinedData(memberId = 4, label = "0.0", predicted = "1.0", gender = "MALE"), JoinedData(memberId = 5, label = "1.0", predicted = "1.0", gender = "MALE"), JoinedData(memberId = 6, label = "1.0", predicted = "1.0", gender = "MALE"), JoinedData(memberId = 7, label = "0.0", predicted = "0.0", gender = "MALE"), JoinedData(memberId = 9, label = "0.0", predicted = "0.0", gender = "MALE"), JoinedData(memberId = 9, label = "1.0", predicted = "0.0", gender = "MALE"), JoinedData(memberId = 10, label = "1.0", predicted = "0.0", gender = "MALE"), JoinedData(memberId = 11, label = "0.0", predicted = "1.0", gender = "FEMALE"), 
JoinedData(memberId = 12, label = "0.0", predicted = "1.0", gender = "FEMALE"), JoinedData(memberId = 13, label = "0.0", predicted = "0.0", gender = "FEMALE"), JoinedData(memberId = 14, label = "0.0", predicted = "0.0", gender = "FEMALE"), JoinedData(memberId = 15, label = "1.0", predicted = "0.0", gender = "FEMALE"), JoinedData(memberId = 16, label = "1.0", predicted = "0.0", gender = "FEMALE"), JoinedData(memberId = 17, label = "1.0", predicted = "1.0", gender = "UNKNOWN"), JoinedData(memberId = 18, label = "1.0", predicted = "1.0", gender = "UNKNOWN"), JoinedData(memberId = 19, label = "0.0", predicted = "0.0", gender = "UNKNOWN"), JoinedData(memberId = 20, label = "1.0", predicted = "0.0", gender = "UNKNOWN")) val df = TestUtils.createDFFromProduct(TestValues.spark, testData) val args = MeasureModelFairnessMetricsCmdLineArgs( labelField = "label", scoreField = "predicted", protectedAttributeField = "gender", distanceMetrics = Seq("KL_DIVERGENCE", "DEMOGRAPHIC_PARITY", "EQUALIZED_ODDS"), overallMetrics = Map("THEIL_L_INDEX" -> "", "THEIL_T_INDEX" -> ""), distanceBenefitMetrics = Seq("SKEWS")) // Model distance metrics val referenceDistr = Distribution(Map( Map("gender" -> "MALE", "predicted" -> "1.0") -> 0.16666, Map("gender" -> "MALE", "predicted" -> "0.0") -> 0.16666, Map("gender" -> "FEMALE", "predicted" -> "1.0") -> 0.16666, Map("gender" -> "FEMALE", "predicted" -> "0.0") -> 0.16666, Map("gender" -> "UNKNOWN", "predicted" -> "1.0") -> 0.16666, Map("gender" -> "UNKNOWN", "predicted" -> "0.0") -> 0.16666)) val actualMetrics = FairnessMetricsUtils.computeModelMetrics(df, Some(referenceDistr), args) Assert.assertEquals(actualMetrics, Seq( FairnessResult(resultType = "KL_DIVERGENCE", parameters = Distribution(Map( Map("gender" -> "UNKNOWN", "predicted" -> "1.0") -> 0.16666, Map("gender" -> "MALE", "predicted" -> "1.0") -> 0.16666, Map("gender" -> "MALE", "predicted" -> "0.0") -> 0.16666, Map("gender" -> "FEMALE", "predicted" -> "0.0") -> 0.16666, Map("gender" -> 
"FEMALE", "predicted" -> "1.0") -> 0.16666, Map("gender" -> "UNKNOWN", "predicted" -> "0.0") -> 0.16666)).toString, resultValOpt = Some(0.13852315605014037), constituentVals = Map()), FairnessResult(resultType = "DEMOGRAPHIC_PARITY", resultValOpt = None, constituentVals = Map( Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.16667, Map("gender1" -> "FEMALE", "gender2" -> "MALE") -> 0.26667, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE") -> 0.1), additionalStats = Map("FEMALE" -> 0.33333, "UNKNOWN" -> 0.5, "MALE" -> 0.6)), FairnessResult(resultType = "EQUALIZED_ODDS", resultValOpt = None, constituentVals = Map( Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN", "label" -> "1.0") -> 0.66667, Map("gender1" -> "FEMALE", "gender2" -> "MALE", "label" -> "1.0") -> 0.5, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE", "label" -> "1.0") -> 0.16667, Map("gender1" -> "UNKNOWN", "gender2" -> "FEMALE", "label" -> "0.0") -> 0.5, Map("gender1" -> "MALE", "gender2" -> "FEMALE", "label" -> "0.0") -> 0.16667, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE", "label" -> "0.0") -> 0.66667), additionalStats = Map("1.0,UNKNOWN" -> 0.66667, "0.0,UNKNOWN" -> 0.0, "1.0,MALE" -> 0.5, "0.0,FEMALE" -> 0.5, "0.0,MALE" -> 0.66667, "1.0,FEMALE" -> 0.0)), FairnessResult( resultType = "Benefit Map for SKEWS", resultValOpt = None, constituentVals = Map( Map("predicted" -> "1.0", "gender" -> "UNKNOWN") -> -0.3677247801253174, Map("predicted" -> "1.0", "gender" -> "MALE") -> 0.47957308026188605, Map("predicted" -> "0.0", "gender" -> "MALE") -> 0.1431008436406731, Map("predicted" -> "0.0", "gender" -> "FEMALE") -> 0.1431008436406731, Map("predicted" -> "1.0", "gender" -> "FEMALE") -> -0.3677247801253174, Map("predicted" -> "0.0", "gender" -> "UNKNOWN") -> -0.3677247801253174)), FairnessResult( resultType = "SKEWS: THEIL_L_INDEX", resultValOpt = Some(0.10437214313750444), constituentVals = Map()), FairnessResult( resultType = "SKEWS: THEIL_T_INDEX", resultValOpt = Some(0.08957922977452398), 
constituentVals = Map()))) } @Test(description = "Compute probability DataFrame") def testComputeProbabilityDF(): Unit = { // DF with probabilities val actualDF1 = FairnessMetricsUtils.computeProbabilityDF(TestValues.df2, None, "label", "predicted", "gender", "PROB") Assert.assertEquals(actualDF1.collect.toSeq.map(_.mkString(",")), Seq("0.0,0.3,MALE", "1.0,0.4,MALE", "0.0,0.8,MALE", "0.0,0.1,MALE", "1.0,0.7,MALE", "0.0,0.6,UNKNOWN", "1.0,0.9,FEMALE", "1.0,0.3,FEMALE", "0.0,0.2,FEMALE", "0.0,0.8,FEMALE")) // DF with threshold val actualDF2 = FairnessMetricsUtils.computeProbabilityDF(TestValues.df2, Some(0.6), "label", "predicted", "gender", "RAW") Assert.assertEquals(actualDF2.collect.toSeq.map(_.mkString(",")), Seq("0.0,0,MALE", "1.0,0,MALE", "0.0,1,MALE", "0.0,0,MALE", "1.0,1,MALE", "0.0,1,UNKNOWN", "1.0,1,FEMALE", "1.0,0,FEMALE", "0.0,0,FEMALE", "0.0,1,FEMALE")) // DF with raw scores val actualDF3 = FairnessMetricsUtils.computeProbabilityDF(TestValues.df2, None, "label", "predicted", "gender", "RAW") Assert.assertEquals(actualDF3.collect.toSeq.map(_.mkString(",")), Seq( "0.0,0.574442516811659,MALE", "1.0,0.598687660112452,MALE", "0.0,0.6899744811276125,MALE", "0.0,0.52497918747894,MALE", "1.0,0.6681877721681662,MALE", "0.0,0.6456563062257954,UNKNOWN", "1.0,0.7109495026250039,FEMALE", "1.0,0.574442516811659,FEMALE", "0.0,0.549833997312478,FEMALE", "0.0,0.6899744811276125,FEMALE")) } } ================================================ FILE: lift/src/test/scala/com/linkedin/lift/lib/DivergenceUtilsTest.scala ================================================ package com.linkedin.lift.lib import com.linkedin.lift.lib.testing.TestValues import com.linkedin.lift.types.{Distribution, FairnessResult} import org.testng.Assert import org.testng.annotations.Test /** * Tests for DivergenceUtils */ class DivergenceUtilsTest { val EPS = 1e-12 @Test(description = "KL divergence - no overlap") def testKLDivergenceNoOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" 
-> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val actualKLDivergence12 = DivergenceUtils.computeKullbackLeiblerDivergence(testDist1, testDist2) val expectedKLDivergence12 = (1.0 / math.log(2.0)) * ((24.0 / 28.0 * math.log((24.0 / 28.0) / (1.0 / 34.0))) + (4.0 / 28.0 * math.log((4.0 / 28.0) / (1.0 / 34.0)))) Assert.assertTrue(math.abs(actualKLDivergence12 - expectedKLDivergence12) < EPS) val actualKLDivergence21 = DivergenceUtils.computeKullbackLeiblerDivergence(testDist2, testDist1) val expectedKLDivergence21 = (1.0 / math.log(2.0)) * ((20.0 / 30.0 * math.log((20.0 / 30.0) / (1.0 / 32.0))) + (10.0 / 30.0 * math.log((10.0 / 30.0) / (1.0 / 32.0)))) Assert.assertTrue(math.abs(actualKLDivergence21 - expectedKLDivergence21) < EPS) // Ensure that the difference is asymmetric (in this case) Assert.assertFalse(math.abs(actualKLDivergence12 - actualKLDivergence21) < EPS) } @Test(description = "KL divergence - with overlap") def testKLDivergenceWithOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "MALE", "age" -> "40") -> 12.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "FEMALE", "age" -> "40") -> 5.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val actualKLDivergence12 = DivergenceUtils.computeKullbackLeiblerDivergence(testDist1, testDist2) val expectedKLDivergence12 = (1.0 / math.log(2.0)) * ((24.0 / 40.0 * math.log((24.0 / 40.0) / (1.0 / 39.0))) + (12.0 / 40.0 * math.log((12.0 / 40.0) / (11.0 / 39.0))) + (4.0 / 40.0 * math.log((4.0 / 40.0) / (6.0 / 39.0)))) Assert.assertTrue(math.abs(actualKLDivergence12 - expectedKLDivergence12) < EPS) val actualKLDivergence21 = DivergenceUtils.computeKullbackLeiblerDivergence(testDist2, 
testDist1) val expectedKLDivergence21 = (1.0 / math.log(2.0)) * ((20.0 / 35.0 * math.log((20.0 / 35.0) / (1.0 / 44.0))) + (5.0 / 35.0 * math.log((5.0 / 35.0) / (5.0 / 44.0))) + (10.0 / 35.0 * math.log((10.0 / 35.0) / (13.0 / 44.0)))) Assert.assertTrue(math.abs(actualKLDivergence21 - expectedKLDivergence21) < EPS) // Ensure that the difference is asymmetric (in this case) Assert.assertFalse(math.abs(actualKLDivergence12 - actualKLDivergence21) < EPS) } @Test(description = "JS divergence - no overlap") def testJSDivergenceNoOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedJSDivergence = (0.5 / math.log(2.0)) * ((24.0 / 28.0 * math.log((24.0 / 28.0) / (12.0 / 28.0))) + (4.0 / 28.0 * math.log((4.0 / 28.0) / (2.0 / 28.0))) + (20.0 / 30.0 * math.log((20.0 / 30.0) / (10.0 / 30.0))) + (10.0 / 30.0 * math.log((10.0 / 30.0) / (5.0 / 30.0)))) val actualJSDivergence12 = DivergenceUtils.computeJensenShannonDivergence(testDist1, testDist2) Assert.assertTrue(math.abs(actualJSDivergence12 - expectedJSDivergence) < EPS) val actualJSDivergence21 = DivergenceUtils.computeJensenShannonDivergence(testDist2, testDist1) Assert.assertTrue(math.abs(actualJSDivergence21 - expectedJSDivergence) < EPS) } @Test(description = "JS divergence - with overlap") def testJSDivergenceWithOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "MALE", "age" -> "40") -> 12.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "FEMALE", "age" -> "40") -> 5.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedJSDivergence = (0.5 / math.log(2.0)) * ((24.0 / 40.0 * math.log((24.0 / 40.0) / (12.0 
/ 40.0))) + (12.0 / 40.0 * math.log((12.0 / 40.0) / (820.0 / 2800.0))) + (4.0 / 40.0 * math.log((4.0 / 40.0) / (340.0 / 2800.0))) + (20.0 / 35.0 * math.log((20.0 / 35.0) / (10.0 / 35.0))) + (5.0 / 35.0 * math.log((5.0 / 35.0) / (340.0 / 2800.0))) + (10.0 / 35.0 * math.log((10.0 / 35.0) / (820.0 / 2800.0)))) val actualJSDivergence12 = DivergenceUtils.computeJensenShannonDivergence(testDist1, testDist2) Assert.assertTrue(math.abs(actualJSDivergence12 - expectedJSDivergence) < EPS) val actualJSDivergence21 = DivergenceUtils.computeJensenShannonDivergence(testDist2, testDist1) Assert.assertTrue(math.abs(actualJSDivergence21 - expectedJSDivergence) < EPS) } @Test(description = "Total variation and infinity norm distances - no overlap") def testTotalVariationAndInfinityNormDistancesNoOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedTotalVariationDistance = 1.0 val expectedInfinityNormDistance = 24.0 / 28.0 val actualTotalVariationDistance12 = DivergenceUtils.computeTotalVariationDistance(testDist1, testDist2) Assert.assertTrue(math.abs(actualTotalVariationDistance12 - expectedTotalVariationDistance) < EPS) val actualTotalVariationDistance21 = DivergenceUtils.computeTotalVariationDistance(testDist2, testDist1) Assert.assertTrue(math.abs(actualTotalVariationDistance21 - expectedTotalVariationDistance) < EPS) val actualInfinityNormDistance12 = DivergenceUtils.computeInfinityNormDistance(testDist1, testDist2) Assert.assertTrue(math.abs(actualInfinityNormDistance12 - expectedInfinityNormDistance) < EPS) val actualInfinityNormDistance21 = DivergenceUtils.computeInfinityNormDistance(testDist2, testDist1) Assert.assertTrue(math.abs(actualInfinityNormDistance21 - expectedInfinityNormDistance) < EPS) } @Test(description = "Total 
variation and infinity norm distances - with overlap") def testTotalVariationAndInfinityNormDistancesWithOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "MALE", "age" -> "40") -> 12.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 25.0, Map("gender" -> "FEMALE", "age" -> "40") -> 5.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedTotalVariationDistance = 0.5 * (24.0 + 2.0 + 25.0 + 1.0) / 40.0 val expectedInfinityNormDistance = 25.0 / 40.0 val actualTotalVariationDistance12 = DivergenceUtils.computeTotalVariationDistance(testDist1, testDist2) Assert.assertTrue(math.abs(actualTotalVariationDistance12 - expectedTotalVariationDistance) < EPS) val actualTotalVariationDistance21 = DivergenceUtils.computeTotalVariationDistance(testDist2, testDist1) Assert.assertTrue(math.abs(actualTotalVariationDistance21 - expectedTotalVariationDistance) < EPS) val actualInfinityNormDistance12 = DivergenceUtils.computeInfinityNormDistance(testDist1, testDist2) Assert.assertTrue(math.abs(actualInfinityNormDistance12 - expectedInfinityNormDistance) < EPS) val actualInfinityNormDistance21 = DivergenceUtils.computeInfinityNormDistance(testDist2, testDist1) Assert.assertTrue(math.abs(actualInfinityNormDistance21 - expectedInfinityNormDistance) < EPS) } @Test(description = "Skew measures - no overlap") def testSkewMeasuresNoOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedSkew12GenderFemaleAge20 = math.log(1/21.0 * 34.0/32.0) val expectedSkew12GenderMaleAge20 = math.log(25.0 * 34.0/32.0) val expectedMinSkew12 = (Map("gender" -> "FEMALE", "age" -> "20"), expectedSkew12GenderFemaleAge20) 
val expectedMaxSkew12 = (Map("gender" -> "MALE", "age" -> "20"), expectedSkew12GenderMaleAge20) val expectedAllSkews12 = Map( Map("gender" -> "MALE", "age" -> "20") -> expectedSkew12GenderMaleAge20, Map("gender" -> "FEMALE", "age" -> "40") -> math.log(5.0 * 34.0/32.0), Map("gender" -> "FEMALE", "age" -> "20") -> expectedSkew12GenderFemaleAge20, Map("gender" -> "MALE", "age" -> "40") -> math.log(1/11.0 * 34.0/32.0)) val actualSkew12GenderFemaleAge20 = DivergenceUtils.computeSkew(testDist1, testDist2, Map("gender" -> "FEMALE", "age" -> "20")) Assert.assertTrue(math.abs(actualSkew12GenderFemaleAge20 - expectedSkew12GenderFemaleAge20) < EPS) val actualSkew12GenderMaleAge20 = DivergenceUtils.computeSkew(testDist1, testDist2, Map("gender" -> "MALE", "age" -> "20")) Assert.assertTrue(math.abs(actualSkew12GenderMaleAge20 - expectedSkew12GenderMaleAge20) < EPS) val actualMinSkew12 = DivergenceUtils.computeMinSkew(testDist1, testDist2) Assert.assertEquals(actualMinSkew12._1, expectedMinSkew12._1) Assert.assertTrue(math.abs(actualMinSkew12._2 - expectedMinSkew12._2) < EPS) val actualMaxSkew12 = DivergenceUtils.computeMaxSkew(testDist1, testDist2) Assert.assertEquals(actualMaxSkew12._1, expectedMaxSkew12._1) Assert.assertTrue(math.abs(actualMaxSkew12._2 - expectedMaxSkew12._2) < EPS) val actualAllSkews12 = DivergenceUtils.computeAllSkews(testDist1, testDist2) actualAllSkews12.foreach { case (dimensions, skew) => Assert.assertTrue(math.abs(skew - expectedAllSkews12.getOrElse(dimensions, 0.0)) < EPS) } } @Test(description = "Skew measures - with overlap") def testSkewMeasuresWithOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "MALE", "age" -> "40") -> 12.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 25.0, Map("gender" -> "FEMALE", "age" -> "40") -> 5.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val 
expectedSkew12GenderFemaleAge20 = math.log(1/26.0) val expectedSkew12GenderMaleAge20 = math.log(25.0) val expectedMinSkew12 = (Map("gender" -> "FEMALE", "age" -> "20"), expectedSkew12GenderFemaleAge20) val expectedMaxSkew12 = (Map("gender" -> "MALE", "age" -> "20"), expectedSkew12GenderMaleAge20) val expectedAllSkews12 = Map( Map("gender" -> "MALE", "age" -> "20") -> expectedSkew12GenderMaleAge20, Map("gender" -> "FEMALE", "age" -> "40") -> math.log(5.0/6.0), Map("gender" -> "FEMALE", "age" -> "20") -> expectedSkew12GenderFemaleAge20, Map("gender" -> "MALE", "age" -> "40") -> math.log(13.0/11.0)) val actualSkew12GenderFemaleAge20 = DivergenceUtils.computeSkew(testDist1, testDist2, Map("gender" -> "FEMALE", "age" -> "20")) Assert.assertTrue(math.abs(actualSkew12GenderFemaleAge20 - expectedSkew12GenderFemaleAge20) < EPS) val actualSkew12GenderMaleAge20 = DivergenceUtils.computeSkew(testDist1, testDist2, Map("gender" -> "MALE", "age" -> "20")) Assert.assertTrue(math.abs(actualSkew12GenderMaleAge20 - expectedSkew12GenderMaleAge20) < EPS) val actualMinSkew12 = DivergenceUtils.computeMinSkew(testDist1, testDist2) Assert.assertEquals(actualMinSkew12._1, expectedMinSkew12._1) Assert.assertTrue(math.abs(actualMinSkew12._2 - expectedMinSkew12._2) < EPS) val actualMaxSkew12 = DivergenceUtils.computeMaxSkew(testDist1, testDist2) Assert.assertEquals(actualMaxSkew12._1, expectedMaxSkew12._1) Assert.assertTrue(math.abs(actualMaxSkew12._2 - expectedMaxSkew12._2) < EPS) val actualAllSkews12 = DivergenceUtils.computeAllSkews(testDist1, testDist2) actualAllSkews12.foreach { case (dimensions, skew) => Assert.assertTrue(math.abs(skew - expectedAllSkews12.getOrElse(dimensions, 0.0)) < EPS) } } @Test(description = "Generalized counts distribution") def testComputeGeneralizedPredictionCountDistribution(): Unit = { val expectedDistr = Distribution(Map( Map("gender" -> "MALE", "label" -> "1.0", "predicted" -> "1.0") -> 1.1, Map("gender" -> "MALE", "label" -> "1.0", "predicted" -> "0.0") -> 
      0.9,
      Map("gender" -> "MALE", "label" -> "0.0", "predicted" -> "1.0") -> 1.2,
      Map("gender" -> "MALE", "label" -> "0.0", "predicted" -> "0.0") -> 1.8,
      Map("gender" -> "FEMALE", "label" -> "1.0", "predicted" -> "1.0") -> 1.2,
      Map("gender" -> "FEMALE", "label" -> "1.0", "predicted" -> "0.0") -> 0.8,
      Map("gender" -> "FEMALE", "label" -> "0.0", "predicted" -> "1.0") -> 1.0,
      Map("gender" -> "FEMALE", "label" -> "0.0", "predicted" -> "0.0") -> 1.0,
      Map("gender" -> "UNKNOWN", "label" -> "0.0", "predicted" -> "1.0") -> 0.6,
      Map("gender" -> "UNKNOWN", "label" -> "0.0", "predicted" -> "0.0") -> 0.4))

    val actualDistr = DivergenceUtils.computeGeneralizedPredictionCountDistribution(
      TestValues.df2, "label", "predicted", "gender")
    Assert.assertEquals(actualDistr.entries.size, expectedDistr.entries.size)
    expectedDistr.entries.foreach { case (dimVals, expectedCounts) =>
      Assert.assertTrue(math.abs(actualDistr.getValue(dimVals) - expectedCounts) < EPS)
    }
  }

  @Test(description = "Demographic Parity")
  def testComputeDemographicParity(): Unit = {
    val distribution = Distribution(Map(
      Map("gender" -> "MALE", "label" -> "1.0") -> 345,
      Map("gender" -> "MALE", "label" -> "0.0") -> 123,
      Map("gender" -> "FEMALE", "label" -> "1.0") -> 567,
      Map("gender" -> "FEMALE", "label" -> "0.0") -> 89,
      Map("gender" -> "UNKNOWN", "label" -> "1.0") -> 25,
      Map("gender" -> "UNKNOWN", "label" -> "0.0") -> 70))
    val actualResults = DivergenceUtils.computeDemographicParity(distribution, "label", "gender")
    val expectedResults = FairnessResult(
      resultType = "DEMOGRAPHIC_PARITY",
      resultValOpt = None,
      constituentVals = Map(
        Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.60117,
        Map("gender1" -> "FEMALE", "gender2" -> "MALE") -> 0.12715,
        Map("gender1" -> "UNKNOWN", "gender2" -> "MALE") -> 0.47402),
      additionalStats = Map("MALE" -> 0.73718, "FEMALE" -> 0.86433, "UNKNOWN" -> 0.26316))
    Assert.assertEquals(actualResults, expectedResults)

    // Test with 0/1 labels
    val distributionInt = Distribution(Map(
      Map("gender" ->
"MALE", "label" -> "1") -> 345, Map("gender" -> "MALE", "label" -> "0") -> 123, Map("gender" -> "FEMALE", "label" -> "1") -> 567, Map("gender" -> "FEMALE", "label" -> "0") -> 89, Map("gender" -> "UNKNOWN", "label" -> "1") -> 25, Map("gender" -> "UNKNOWN", "label" -> "0") -> 70)) val actualResultsInt = DivergenceUtils.computeDemographicParity(distributionInt, "label", "gender") Assert.assertEquals(actualResultsInt, expectedResults) } @Test(description = "Equalized Odds") def testComputeEqualizedOdds(): Unit = { val distribution = Distribution(Map( Map("gender" -> "MALE", "label" -> "1.0", "predicted" -> "0.0") -> 345, Map("gender" -> "MALE", "label" -> "1.0", "predicted" -> "1.0") -> 145, Map("gender" -> "MALE", "label" -> "0.0", "predicted" -> "0.0") -> 123, Map("gender" -> "MALE", "label" -> "0.0", "predicted" -> "1.0") -> 23, Map("gender" -> "FEMALE", "label" -> "1.0", "predicted" -> "0.0") -> 567, Map("gender" -> "FEMALE", "label" -> "1.0", "predicted" -> "1.0") -> 367, Map("gender" -> "FEMALE", "label" -> "0.0", "predicted" -> "0.0") -> 89, Map("gender" -> "FEMALE", "label" -> "0.0", "predicted" -> "1.0") -> 49, Map("gender" -> "UNKNOWN", "label" -> "1.0", "predicted" -> "0.0") -> 25, Map("gender" -> "UNKNOWN", "label" -> "1.0", "predicted" -> "1.0") -> 35, Map("gender" -> "UNKNOWN", "label" -> "0.0", "predicted" -> "0.0") -> 70, Map("gender" -> "UNKNOWN", "label" -> "0.0", "predicted" -> "1.0") -> 20)) val actualResults = DivergenceUtils.computeEqualizedOdds(distribution, "label", "predicted", "gender") val expectedResults = FairnessResult( resultType = "EQUALIZED_ODDS", resultValOpt = None, constituentVals = Map( Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN", "label" -> "1.0") -> 0.1904, Map("gender1" -> "FEMALE", "gender2" -> "MALE", "label" -> "1.0") -> 0.09701, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE", "label" -> "1.0") -> 0.28741, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE", "label" -> "0.0") -> 0.06469, Map("gender1" -> "UNKNOWN", 
"gender2" -> "FEMALE", "label" -> "0.0") -> 0.13285, Map("gender1" -> "MALE", "gender2" -> "FEMALE", "label" -> "0.0") -> 0.19754), additionalStats = Map( "1.0,MALE" -> 0.29592, "1.0,FEMALE" -> 0.39293, "1.0,UNKNOWN" -> 0.58333, "0.0,MALE" -> 0.15753, "0.0,FEMALE" -> 0.35507, "0.0,UNKNOWN" -> 0.22222)) Assert.assertEquals(actualResults, expectedResults) val distributionInt = Distribution(Map( Map("gender" -> "MALE", "label" -> "1", "predicted" -> "0") -> 345, Map("gender" -> "MALE", "label" -> "1", "predicted" -> "1") -> 145, Map("gender" -> "MALE", "label" -> "0", "predicted" -> "0") -> 123, Map("gender" -> "MALE", "label" -> "0", "predicted" -> "1") -> 23, Map("gender" -> "FEMALE", "label" -> "1", "predicted" -> "0") -> 567, Map("gender" -> "FEMALE", "label" -> "1", "predicted" -> "1") -> 367, Map("gender" -> "FEMALE", "label" -> "0", "predicted" -> "0") -> 89, Map("gender" -> "FEMALE", "label" -> "0", "predicted" -> "1") -> 49, Map("gender" -> "UNKNOWN", "label" -> "1", "predicted" -> "0") -> 25, Map("gender" -> "UNKNOWN", "label" -> "1", "predicted" -> "1") -> 35, Map("gender" -> "UNKNOWN", "label" -> "0", "predicted" -> "0") -> 70, Map("gender" -> "UNKNOWN", "label" -> "0", "predicted" -> "1") -> 20)) val actualResultsInt = DivergenceUtils.computeEqualizedOdds(distributionInt, "label", "predicted", "gender") val expectedResultsInt = FairnessResult( resultType = "EQUALIZED_ODDS", resultValOpt = None, constituentVals = Map( Map("gender1" -> "UNKNOWN", "gender2" -> "FEMALE", "label" -> "1") -> 0.1904, Map("gender1" -> "MALE", "gender2" -> "FEMALE", "label" -> "1") -> 0.09701, Map("gender1" -> "MALE", "gender2" -> "UNKNOWN", "label" -> "1") -> 0.28741, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE", "label" -> "0") -> 0.06469, Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN", "label" -> "0") -> 0.13285, Map("gender1" -> "FEMALE", "gender2" -> "MALE", "label" -> "0") -> 0.19754), additionalStats = Map( "1,MALE" -> 0.29592, "1,FEMALE" -> 0.39293, "1,UNKNOWN" -> 
0.58333, "0,MALE" -> 0.15753, "0,FEMALE" -> 0.35507, "0,UNKNOWN" -> 0.22222)) Assert.assertEquals(actualResultsInt, expectedResultsInt) } } ================================================ FILE: lift/src/test/scala/com/linkedin/lift/lib/PermutationTestUtilsTest.scala ================================================ package com.linkedin.lift.lib import com.linkedin.lift.types.{FairnessResult, ModelPrediction} import org.testng.Assert import org.testng.annotations.Test /** * Tests for PermutationTestUtils */ class PermutationTestUtilsTest { @Test(description = "Permutation test with precision. Expected results obtained using R code.") def testPermutationTestPrecision(): Unit = { val predictions1 = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE")) val actualResult1 = 
PermutationTestUtils.permutationTest(predictions1, "gender", "MALE", "FEMALE", "PRECISION", 2000, 1) val expectedResult1 = FairnessResult( resultType = "PERMUTATION_TEST", parameters = "Map(metric -> PRECISION, numTrials -> 2000, seed -> 1)", resultValOpt = Some(0.125), constituentVals = Map(Map("gender" -> "MALE") -> 0.125, Map("gender" -> "FEMALE") -> 0.0), additionalStats = Map("pValue" -> 0.438, "stdError" -> 0.01109, "bootstrapStdDev" -> 0.12188672941783991, "testStatisticStdDev" -> 0.1574454263804954)) Assert.assertEquals(actualResult1, expectedResult1) val predictions2 = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE")) val actualResult2 = PermutationTestUtils.permutationTest(predictions2, "gender", "FEMALE", "MALE", "PRECISION", 2000, 1) val expectedResult2 = FairnessResult( resultType = "PERMUTATION_TEST", parameters = "Map(metric -> PRECISION, numTrials -> 2000, seed -> 1)", resultValOpt = Some(0.25), constituentVals = Map(Map("gender" -> "MALE") -> 0.25, Map("gender" -> "FEMALE") -> 0.5), additionalStats = Map("pValue" -> 0.753, "stdError" -> 0.00964, "bootstrapStdDev" -> 0.4590352058557182, "testStatisticStdDev" -> 0.44861306534335205)) Assert.assertEquals(actualResult2, expectedResult2) val predictions3 = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 1, 
prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE")) val actualResult3 = PermutationTestUtils.permutationTest(predictions3, "gender", "MALE", "FEMALE", "PRECISION", 2000, 1) val expectedResult3 = FairnessResult( resultType = "PERMUTATION_TEST", parameters = "Map(metric -> PRECISION, numTrials -> 2000, seed -> 1)", resultValOpt = Some(0.0), constituentVals = Map(Map("gender" -> "MALE") -> 0.25, Map("gender" -> "FEMALE") -> 0.25), additionalStats = Map("pValue" -> 0.788, "stdError" -> 0.00914, "bootstrapStdDev" -> 0.32798838458036056, "testStatisticStdDev" -> 0.3334284273228113)) Assert.assertEquals(actualResult3, expectedResult3) } @Test(description = "Permutation test for ranking") def testPermutationTestRanking(): Unit = { val predictions = Seq( ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE", groupId = "1", rank = 1), ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE", groupId = "1", rank = 2), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE", groupId = "1", rank = 3), ModelPrediction(label = 1, prediction = 0, dimensionValue = "MALE", groupId = "2", rank = 1), ModelPrediction(label = 1, prediction = 1, dimensionValue = "FEMALE", groupId = "2", rank = 2), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE", groupId = "2", rank = 4), ModelPrediction(label = 0, prediction = 0, dimensionValue = "MALE", groupId = "2", rank = 7), ModelPrediction(label = 0, 
prediction = 1, dimensionValue = "FEMALE", groupId = "3", rank = 1), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE", groupId = "3", rank = 2), ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE", groupId = "3", rank = 3)) val actualResult3 = PermutationTestUtils.permutationTest(predictions, "gender", "MALE", "FEMALE", "PRECISION/1@5", 2000, 1) val expectedResult3 = FairnessResult( resultType = "PERMUTATION_TEST", parameters = "Map(metric -> PRECISION/1@5, numTrials -> 2000, seed -> 1)", resultValOpt = Some(0.33333), constituentVals = Map(Map("gender" -> "MALE") -> 0.66667, Map("gender" -> "FEMALE") -> 0.33333), additionalStats = Map("pValue" -> 0.317, "stdError" -> 0.0104, "bootstrapStdDev" -> 0.3387960612857448, "testStatisticStdDev" -> 0.40255115701974276)) Assert.assertEquals(actualResult3, expectedResult3) } } ================================================ FILE: lift/src/test/scala/com/linkedin/lift/lib/PositionBiasUtilsTest.scala ================================================ package com.linkedin.lift.lib import com.linkedin.lift.lib.PositionBiasUtils._ import com.linkedin.lift.lib.testing.TestUtils import com.linkedin.lift.lib.testing.TestValues.positionBiasData import org.apache.spark.mllib.random.RandomRDDs.normalRDD import org.apache.spark.sql.SparkSession import org.testng.Assert import org.testng.annotations.Test /** * Tests for PositionBiasUtils */ class PositionBiasUtilsTest { final val spark: SparkSession = TestUtils.createSparkSession() @Test(description = "Bandwidth computation based on Silverman's rule") def getBandwidthTest(): Unit = { import spark.implicits._ val df = normalRDD(spark.sparkContext, 10000L, 1, seed = 123).toDF("value") val bw = getBandwidth(df) Assert.assertEquals(bw, 1.06 * Math.pow(10000, -0.2), 0.01) } @Test(description = "Estimating the position bias at targetPosition with respect to basePosition") def estimateAdjacentPositionBiasTest(): Unit = { val estimate = 
      estimateAdjacentPositionBias(positionBiasData, 1e3, 2, 1)
    Assert.assertEquals(estimate, 0.80, 0.01)
  }

  @Test(description = "Position bias Estimation with respect to the top most position")
  def estimatePositionBiasTest(): Unit = {
    val estimate = estimatePositionBias(positionBiasData, 1e3, 3)
    Assert.assertEquals(estimate(1).positionBias, 0.80, 0.01)
    Assert.assertEquals(estimate(2).positionBias, 0.60, 0.01)
  }

  @Test(description = "Resampling data with weights corresponds to the inverse position bias")
  def debiasPositiveLabelScores(): Unit = {
    import spark.implicits._
    val debiasedPositiveLabelData =
      PositionBiasUtils.debiasPositiveLabelScores(positionBiasData, 1e3, 3, 5, 10, 1, 1234)

    val positiveLabelRatioInDebiasedData21 = debiasedPositiveLabelData.filter(
      $"position" === 2).count.toFloat / debiasedPositiveLabelData.filter($"position" === 1).count
    val positiveLabelRatioInData21 =
      positionBiasData.filter($"position" === 2 and $"label" === 1).count.toFloat /
        positionBiasData.filter($"position" === 1 and $"label" === 1).count
    // the ratio of positiveLabelRatioInData21 and positiveLabelRatioInDebiasedData21 should match
    // the position bias at position 2 with respect to position 1
    Assert.assertEquals(positiveLabelRatioInData21 / positiveLabelRatioInDebiasedData21, 0.80, 0.05)

    val positiveLabelRatioInDebiasedData31 = debiasedPositiveLabelData.filter(
      $"position" === 3).count.toFloat / debiasedPositiveLabelData.filter($"position" === 1).count
    val positiveLabelRatioInData31 =
      positionBiasData.filter($"position" === 3 and $"label" === 1).count.toFloat /
        positionBiasData.filter($"position" === 1 and $"label" === 1).count
    // the ratio of positiveLabelRatioInData31 and positiveLabelRatioInDebiasedData31 should match
    // the position bias at position 3 with respect to position 1
    Assert.assertEquals(positiveLabelRatioInData31 / positiveLabelRatioInDebiasedData31, 0.60, 0.05)
  }
}

================================================
FILE: lift/src/test/scala/com/linkedin/lift/lib/StatsUtilsTest.scala
================================================
package com.linkedin.lift.lib

import com.linkedin.lift.lib.StatsUtils.ConfusionMatrix
import com.linkedin.lift.lib.testing.TestValues
import com.linkedin.lift.types.ModelPrediction
import org.apache.spark.sql.functions.col
import org.testng.Assert
import org.testng.annotations.Test

/**
 * Tests for StatsUtils
 */
class StatsUtilsTest {

  @Test(description = "Round a double to specified digits of precision")
  def testRoundDouble(): Unit = {
    Assert.assertEquals(StatsUtils.roundDouble(0.123456), 0.12346)
    Assert.assertEquals(StatsUtils.roundDouble(0.123456, 4), 0.1235)
    Assert.assertEquals(StatsUtils.roundDouble(0.123456, 2), 0.12)
    Assert.assertEquals(StatsUtils.roundDouble(0.123456, 1), 0.1)
  }

  @Test(description = "Compute positive and negative sample percentages")
  def testComputePosNegSamplePercentages(): Unit = {
    val posDF = TestValues.df.filter(col("label") === "1")
    val negDF = TestValues.df.filter(col("label") === "0")

    // Sample 50% from each striation to ensure an overall 50% sample with the
    // same pos:neg ratio as the source
    val (posSamplePercentage1, negSamplePercentage1) =
      StatsUtils.computePosNegSamplePercentages(posDF, negDF, 5)
    Assert.assertEquals(posSamplePercentage1, 0.5)
    Assert.assertEquals(negSamplePercentage1, 0.5)

    // Sampling 1 out of 4 positives, and 4 out of 6 negatives will give us 0.8
    // percentage of negative labels and a total of 5 rows.
    val (posSamplePercentage2, negSamplePercentage2) =
      StatsUtils.computePosNegSamplePercentages(posDF, negDF, 5, 0.8)
    Assert.assertEquals(StatsUtils.roundDouble(posSamplePercentage2), 0.25)
    Assert.assertEquals(StatsUtils.roundDouble(negSamplePercentage2), 0.66667)

    // Requesting way too many samples should return 1.0
    val (posSamplePercentage3, negSamplePercentage3) =
      StatsUtils.computePosNegSamplePercentages(posDF, negDF, 100)
    Assert.assertEquals(posSamplePercentage3, 1.0)
    Assert.assertEquals(negSamplePercentage3, 1.0)
  }

  @Test(description = "Precision@K")
  def testComputePrecisionAtK(): Unit = {
    val pAt5Threshold1 = StatsUtils.computePrecisionAtK(1.0, 5)(_)
    val pAt5Threshold2 = StatsUtils.computePrecisionAtK(2.0, 5)(_)
    val pAt10Threshold1 = StatsUtils.computePrecisionAtK(1.0, 10)(_)
    val pAt10Threshold2 = StatsUtils.computePrecisionAtK(2.0, 10)(_)

    val predictions = Seq(
      ModelPrediction(label = 1, prediction = 1.0, dimensionValue = "", groupId = "1", rank = 1),
      ModelPrediction(label = 1, prediction = 0.8, dimensionValue = "", groupId = "1", rank = 2),
      ModelPrediction(label = 2, prediction = 0.8, dimensionValue = "", groupId = "1", rank = 3),
      ModelPrediction(label = 0, prediction = 0.7, dimensionValue = "", groupId = "1", rank = 4),
      ModelPrediction(label = 2, prediction = 0.9, dimensionValue = "", groupId = "2", rank = 1),
      ModelPrediction(label = 2, prediction = 0.2, dimensionValue = "", groupId = "2", rank = 2),
      ModelPrediction(label = 1, prediction = 0.3, dimensionValue = "", groupId = "2", rank = 3),
      ModelPrediction(label = 1, prediction = 1.0, dimensionValue = "", groupId = "2", rank = 4),
      ModelPrediction(label = 0, prediction = 0.6, dimensionValue = "", groupId = "2", rank = 5),
      ModelPrediction(label = 1, prediction = 0.6, dimensionValue = "", groupId = "2", rank = 6),
      ModelPrediction(label = 2, prediction = 0.6, dimensionValue = "", groupId = "2", rank = 7),
      ModelPrediction(label = 2, prediction = 0.6, dimensionValue = "", groupId = "2", rank = 8),
ModelPrediction(label = 1, prediction = 0.6, dimensionValue = "", groupId = "2", rank = 9), ModelPrediction(label = 0, prediction = 0.6, dimensionValue = "", groupId = "2", rank = 10), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = "", groupId = "2", rank = 11), ModelPrediction(label = 2, prediction = 0.6, dimensionValue = "", groupId = "3", rank = 1), ModelPrediction(label = 2, prediction = 0.6, dimensionValue = "", groupId = "3", rank = 2), ModelPrediction(label = 2, prediction = 0.6, dimensionValue = "", groupId = "3", rank = 3), ModelPrediction(label = 1, prediction = 0.6, dimensionValue = "", groupId = "3", rank = 4), ModelPrediction(label = 1, prediction = 0.6, dimensionValue = "", groupId = "3", rank = 5), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = "", groupId = "3", rank = 6)) Assert.assertEquals(pAt5Threshold1(predictions), 0.85) Assert.assertEquals(pAt5Threshold2(predictions), 0.4166666666666667) Assert.assertEquals(pAt10Threshold1(predictions), 0.7944444444444444) Assert.assertEquals(pAt10Threshold2(predictions), 0.3833333333333333) } @Test(description = "Standard Deviation") def testComputeStdDev(): Unit = { Assert.assertEquals(StatsUtils.computeStdDev(Seq()), 0.0) Assert.assertEquals(StatsUtils.computeStdDev(Seq(1.0)), 0.0) val testSeq1: Seq[Double] = Seq(1.0, 1.0, 1.0, 1.0) Assert.assertEquals(StatsUtils.computeStdDev(testSeq1), 0.0) val testSeq2: Seq[Double] = Seq(-2.0, -1.0, 0.0, 1.0, 2.0) Assert.assertEquals(StatsUtils.computeStdDev(testSeq2), 1.5811388300841898) val testSeq3: Seq[Double] = Seq(1.0, 1.2, 2.0, 1.3, -1.4, -2.3, -1.8, 4.4, 2.2, 5.8, -3.0, 0.0, 0.3, 0.1, -0.01, -4, -3, -2.0, 1.0, 4.1, -2.8, 3.3) Assert.assertEquals(StatsUtils.computeStdDev(testSeq3), 2.6658697808242033) } @Test(description = "Traditional confusion matrix") def testComputeTraditionalConfusionMatrix(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 
0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = "")) val actualConfMatrix = StatsUtils.computeGeneralizedConfusionMatrix(predictions) val expectedConfMatrix = ConfusionMatrix( truePositive = 1, falsePositive = 4, trueNegative = 2, falseNegative = 3) Assert.assertEquals(actualConfMatrix, expectedConfMatrix) } @Test(description = "Generalized confusion matrix") def testComputeGeneralizedConfusionMatrix(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 0.8, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.4, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.9, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.3, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1.0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.6, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = "")) val actualConfMatrix = StatsUtils.computeGeneralizedConfusionMatrix(predictions) val expectedConfMatrix = ConfusionMatrix( truePositive = 1.3, falsePositive = 3.7, trueNegative = 2.3, falseNegative = 2.7) Assert.assertEquals(actualConfMatrix, expectedConfMatrix) } @Test(description = "Compute precision") def testComputePrecision(): Unit = { val predictions = Seq( ModelPrediction(label = 1, 
prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = "")) val actualPrecision = StatsUtils.computePrecision(predictions) Assert.assertEquals(actualPrecision, 0.2) } @Test(description = "Compute FPR") def testComputeFalsePositiveRate(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = "")) val actualFPR = StatsUtils.computeFalsePositiveRate(predictions) Assert.assertEquals(StatsUtils.roundDouble(actualFPR), 0.66667) } @Test(description = "Compute FNR") def testComputeFalseNegativeRate(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label 
= 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = "")) val actualFNR = StatsUtils.computeFalseNegativeRate(predictions) Assert.assertEquals(actualFNR, 0.75) } @Test(description = "Compute Recall") def testComputeRecall(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = "")) val actualRecall = StatsUtils.computeRecall(predictions) Assert.assertEquals(actualRecall, 0.25) } @Test(description = "Compute TNR") def testComputeTrueNegativeRate(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1, dimensionValue = 
""), ModelPrediction(label = 0, prediction = 1, dimensionValue = "")) val actualTNR = StatsUtils.computeTrueNegativeRate(predictions) Assert.assertEquals(StatsUtils.roundDouble(actualTNR), 0.33333) } @Test(description = "computeROCCurve") def testComputeROCCurve(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 0.8, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.4, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.9, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1.0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = "")) val (fpr, tpr) = StatsUtils.computeROCCurve(predictions) val roundedFpr = fpr.map(StatsUtils.roundDouble(_)) val roundedTpr = tpr.map(StatsUtils.roundDouble(_)) Assert.assertEquals(roundedFpr, Seq(0.16667, 0.33333, 0.33333, 0.66667, 0.83333, 0.83333, 1.0, 1.0)) Assert.assertEquals(roundedTpr, Seq(0.0, 0.0, 0.25, 0.25, 0.25, 0.75, 0.75, 1.0)) } @Test(description = "computeAUC") def testComputeAUC(): Unit = { // Using the same predictions as above val predictions1 = Seq( ModelPrediction(label = 1, prediction = 0.8, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.0, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.4, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.9, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 0, prediction = 1.0, dimensionValue = ""), ModelPrediction(label 
= 0, prediction = 0.7, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = "")) val auc1 = StatsUtils.computeAUC(predictions1) Assert.assertEquals(StatsUtils.roundDouble(auc1), 0.25) // Predictions with a good classifier val predictions2 = Seq( ModelPrediction(label = 1, prediction = 1.0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.8, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.5, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.7, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.7, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.6, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.4, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.0, dimensionValue = "")) val auc2 = StatsUtils.computeAUC(predictions2) Assert.assertEquals(StatsUtils.roundDouble(auc2), 0.89583) // Predictions with a perfect classifier val predictions3 = Seq( ModelPrediction(label = 1, prediction = 1.0, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.8, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.7, dimensionValue = ""), ModelPrediction(label = 1, prediction = 0.7, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.6, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.6, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.4, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.2, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.1, dimensionValue = ""), ModelPrediction(label = 0, prediction = 0.0, dimensionValue = "")) val auc3 = StatsUtils.computeAUC(predictions3) Assert.assertEquals(StatsUtils.roundDouble(auc3), 1.0) } } ================================================ FILE: 
lift/src/test/scala/com/linkedin/lift/mitigation/EOppUtilsTest.scala ================================================ package com.linkedin.lift.mitigation import com.linkedin.lift.lib.PositionBiasUtils.debiasPositiveLabelScores import com.linkedin.lift.lib.testing.TestUtils import com.linkedin.lift.lib.testing.TestUtils.{applyPositionBias, loadCsvData} import com.linkedin.lift.mitigation.EOppUtils._ import com.linkedin.lift.types.{ScoreWithAttribute, ScoreWithLabelAndAttribute, ScoreWithLabelAndPosition} import org.apache.spark.mllib.random.RandomRDDs.uniformRDD import org.apache.spark.sql.types._ import org.apache.spark.sql.{Row, SparkSession} import org.scalatest.matchers.must.Matchers.contain import org.scalatest.matchers.should.Matchers.convertToAnyShouldWrapper import org.testng.Assert import org.testng.annotations.Test /** * Tests for EOppUtils */ class EOppUtilsTest { final val spark: SparkSession = TestUtils.createSparkSession() @Test(description = "Transforming a single score using a transformation function given as a scala map") def transformScoreTest: Unit = { val transformation = Map(1.0 -> 2.0, 2.0 -> 4.0, 3.0 -> 5.0, 6.0 -> 11.0, 7.0 -> 11.0) val sortedKeys = transformation.keys.toList.sorted Assert.assertEquals(transformScore(0.0, sortedKeys, transformation), 2.0, 0) Assert.assertEquals(transformScore(1.0, sortedKeys, transformation), 2.0, 0) Assert.assertEquals(transformScore(1.5, sortedKeys, transformation), 3.0, 0) Assert.assertEquals(transformScore(3.2, sortedKeys, transformation), 5.4, 0) Assert.assertEquals(transformScore(6.2, sortedKeys, transformation), 11.0, 0) Assert.assertEquals(transformScore(10, sortedKeys, transformation), 11.0, 0) } @Test(description = "Transform scores of a dataset based on the corresponding attribute") def applyTransformationTest(): Unit = { import spark.implicits._ val attributeList = List("0", "1") val transformations = Map(attributeList(0) -> Map(1.0 -> 2.0, 2.0 -> 4.0, 3.0 -> 5.0, 6.0 -> 11.0, 7.0 -> 11.0), 
attributeList(1) -> Map(1.0 -> 10.0, 3.0 -> 4.0, 5.0 -> 2.0)) val data = List( ScoreWithAttribute(0, 0.0, attributeList(0)), ScoreWithAttribute(1, 0.0, attributeList(1)), ScoreWithAttribute(2, 1.5, attributeList(0)), ScoreWithAttribute(3, 1.5, attributeList(1)), ScoreWithAttribute(4, 4.0, attributeList(0)), ScoreWithAttribute(5, 4.0, attributeList(1)), ScoreWithAttribute(6, 6.0, attributeList(0)), ScoreWithAttribute(7, 6.0, attributeList(1))).toDS val transformedData = applyTransformation(data, attributeList, transformations) val expectedOutput = List( ScoreWithAttribute(0, 2.0, attributeList(0)), ScoreWithAttribute(1, 10.0, attributeList(1)), ScoreWithAttribute(2, 3.0, attributeList(0)), ScoreWithAttribute(3, 8.5, attributeList(1)), ScoreWithAttribute(4, 7.0, attributeList(0)), ScoreWithAttribute(5, 3.0, attributeList(1)), ScoreWithAttribute(6, 11.0, attributeList(0)), ScoreWithAttribute(7, 2.0, attributeList(1))).toDS transformedData.collect() should contain theSameElementsAs expectedOutput.collect } @Test(description = "Computing the empirical CDF function") def cdfTransformationTest(): Unit = { val schema = StructType(Array(StructField("score", DoubleType))) val scoreRDD = uniformRDD(spark.sparkContext, 10000L, 5, 12).map(Row(_)) val data = spark.createDataFrame(scoreRDD, schema) val numQuantiles = 4 val probabilities = Array.range(0, numQuantiles + 1).map(x => x.toDouble / numQuantiles) val cdf = cdfTransformation(data, probabilities, 1e-6) val sortedKeys = cdf.keys.toList.sorted Assert.assertEquals(transformScore(0.0, sortedKeys, cdf), 0.0, 0.01) Assert.assertEquals(transformScore(0.25, sortedKeys, cdf), 0.25, 0.01) Assert.assertEquals(transformScore(0.5, sortedKeys, cdf), 0.5, 0.01) Assert.assertEquals(transformScore(1.0, sortedKeys, cdf), 1.0, 0.01) } @Test() def adjustScaleTest(): Unit = { import spark.implicits._ val attributeList = List("0", "1") val transformations = Map(attributeList(0) -> Map(1.0 -> 2.0, 2.0 -> 4.0, 3.0 -> 6.0), attributeList(1) -> 
Map(1.0 -> 2.0, 2.0 -> 4.0, 3.0 -> 6.0)) val data = List( ScoreWithAttribute(0, 1.0, attributeList(0)), ScoreWithAttribute(1, 1.0, attributeList(1)), ScoreWithAttribute(2, 2.0, attributeList(0)), ScoreWithAttribute(3, 2.0, attributeList(1)), ScoreWithAttribute(4, 3.0, attributeList(0)), ScoreWithAttribute(5, 3.0, attributeList(1))).toDS val adjustedTransformation = adjustScale(data, attributeList, transformations, 3, 1e-6) val transformedData = applyTransformation(data, attributeList, adjustedTransformation) transformedData.collect() should contain theSameElementsAs data.collect } //@Test() // it takes around 2-5 minutes to run def eOppTransformationTest(): Unit = { // Training data and validation data are generated using the models described in the simulation section of // https://arxiv.org/abs/2006.11350. Each dataset contains 1 million rows // (20k sessions with 50 randomly selected items from a population of 50k items) and 5 columns // (itemId, sessionId, score, label, attribute). Please see equality-of-opportunity.md for further details import spark.implicits._ val attributeList = List("0", "1") val dataSchema = StructType(Array( StructField("itemId", IntegerType), StructField("sessionId", IntegerType), StructField("score", DoubleType), StructField("label", IntegerType), StructField("attribute", StringType), StructField("position", IntegerType, true)) ) val trainingDataWithoutPositionBias = loadCsvData(spark, "src/test/data/TrainingData.csv", dataSchema, ",") .as[ScoreWithLabelAndAttribute] val trainingData = applyPositionBias(trainingDataWithoutPositionBias) trainingData.persist // Step 1: Learning position bias corrected EOpp transformation using the training data val debiasedTrainingData = debiasPositiveLabelScores(positionBiasEstimationCutOff = 20, data = trainingData.as[ScoreWithLabelAndPosition], repeatTimes = 10, inflationRate = 10, numPartitions = 10, seed = 123) val transformations = 
eOppTransformation(debiasedTrainingData.as[ScoreWithLabelAndAttribute], attributeList, numQuantiles = 1000, relativeTolerance = 1e-4, true) // Step 2: Applying the EOpp transformation on the validation data val validationDataWithoutPositionBias = loadCsvData(spark, "src/test/data/ValidationData.csv", dataSchema, ",") .as[ScoreWithLabelAndAttribute] val validationDataWithoutLabel = validationDataWithoutPositionBias .drop("label").as[ScoreWithAttribute] val transformedValidationData = applyTransformation(validationDataWithoutLabel, attributeList, transformations, 10) val joinedData = transformedValidationData .join(validationDataWithoutPositionBias.select($"itemId", $"sessionId", $"label"), Seq("itemId", "sessionId"), "inner") .as[ScoreWithLabelAndAttribute] val transformedValidationDataWithPositionBias = applyPositionBias(joinedData) .filter($"label" === 1) // Step 3: checking EOpp in the transformed validation data with position bias val numQuantiles = 1000 val relativeTolerance = 1e-4 val probabilities = Array.range(0, numQuantiles + 1).map(x => x.toDouble / numQuantiles) val attribute0Quantiles = transformedValidationDataWithPositionBias.filter($"attribute" === "0") .stat.approxQuantile("score", probabilities, relativeTolerance) val attribute1Quantiles = transformedValidationDataWithPositionBias.filter($"attribute" === "1") .stat.approxQuantile("score", probabilities, relativeTolerance) val wasserstein2DistanceEOpp = attribute0Quantiles.zip(attribute1Quantiles) .map(x => math.pow(x._1 - x._2, 2)).sum / numQuantiles Assert.assertEquals(wasserstein2DistanceEOpp, 0, 0.05) // Step 4: checking if the transformed score distribution is the same as the score distribution before //transformation val quantilesAfterTransformation = transformedValidationData .stat.approxQuantile("score", probabilities, relativeTolerance) val quantilesBeforeTransformation = validationDataWithoutLabel .stat.approxQuantile("score", probabilities, relativeTolerance) val 
wasserstein2DistanceRescaling = quantilesAfterTransformation.zip(quantilesBeforeTransformation) .map(x => math.pow(x._1 - x._2, 2)).sum / numQuantiles Assert.assertEquals(wasserstein2DistanceRescaling, 0, 0.05) } } ================================================ FILE: lift/src/test/scala/com/linkedin/lift/types/BenefitMapTest.scala ================================================ package com.linkedin.lift.types import org.testng.Assert import org.testng.annotations.Test /** * Tests for the BenefitMap class */ class BenefitMapTest { val EPS = 1e-12 val testBenefits: BenefitMap = BenefitMap(benefitType = "x", entries = Map( Map("gender" -> "MALE") -> 0.9, Map("gender" -> "FEMALE") -> 0.75, Map("gender" -> "UNKNOWN") -> 0.6)) val testBenefitsEqual: BenefitMap = BenefitMap(benefitType = "y", entries = Map( Map("gender" -> "MALE") -> 0.9, Map("gender" -> "FEMALE") -> 0.9, Map("gender" -> "UNKNOWN") -> 0.9)) @Test(description = "Benefits mean and variance") def testMean(): Unit = { Assert.assertEquals(testBenefits.mean, 0.75) Assert.assertTrue(math.abs(testBenefits.variance - 0.015) < EPS) Assert.assertEquals(testBenefitsEqual.mean, 0.9) Assert.assertTrue(math.abs(testBenefitsEqual.variance) < EPS) } @Test(description = "Inequality measures - unequal benefits") def testInequalityMeasuresUnequalBenefits(): Unit = { val actualGEI20 = testBenefits.computeGeneralizedEntropyIndex(2.0) val expectedGEI20 = 0.04 / 3 Assert.assertTrue(math.abs(actualGEI20 - expectedGEI20) < EPS) val actualGEI10 = testBenefits.computeGeneralizedEntropyIndex(1.0) val actualTheilT = testBenefits.computeTheilTIndex val expectedGEI10 = (1.2 * math.log(1.2) + 0.8 * math.log(0.8)) / 3 Assert.assertTrue(math.abs(actualGEI10 - expectedGEI10) < EPS) Assert.assertTrue(math.abs(actualTheilT - expectedGEI10) < EPS) val actualGEI00 = testBenefits.computeGeneralizedEntropyIndex(0) val actualTheilL = testBenefits.computeTheilLIndex val expectedGEI00 = - (math.log(1.2) + math.log(0.8)) / 3 
Assert.assertTrue(math.abs(actualGEI00 - expectedGEI00) < EPS) Assert.assertTrue(math.abs(actualTheilL - expectedGEI00) < EPS) val actualGEI05 = testBenefits.computeGeneralizedEntropyIndex(0.5) val expectedGEI05 = (2 - math.sqrt(1.2) - math.sqrt(0.8)) * 4 / 3 Assert.assertTrue(math.abs(actualGEI05 - expectedGEI05) < EPS) val actualAtkinson10 = testBenefits.computeAtkinsonIndex(1.0) val expectedAtkinson10 = 1 - math.exp(-expectedGEI00) Assert.assertTrue(math.abs(actualAtkinson10 - expectedAtkinson10) < EPS) val actualAtkinson00 = testBenefits.computeAtkinsonIndex(0) Assert.assertTrue(math.abs(actualAtkinson00) < EPS) val actualAtkinson05 = testBenefits.computeAtkinsonIndex(0.5) val expectedAtkinson05 = 1 - math.pow(math.sqrt(1.2) + math.sqrt(0.8) + 1, 2) / 9 Assert.assertTrue(math.abs(actualAtkinson05 - expectedAtkinson05) < EPS) val actualCOV = testBenefits.computeCoefficientOfVariation val expectedCOV = math.sqrt(0.015) / 0.75 Assert.assertTrue(math.abs(actualCOV - expectedCOV) < EPS) } @Test(description = "Inequality measures - equal benefits") def testInequalityMeasuresEqualBenefits(): Unit = { val actualGEI20 = testBenefitsEqual.computeGeneralizedEntropyIndex(2.0) Assert.assertTrue(math.abs(actualGEI20) < EPS) val actualGEI10 = testBenefitsEqual.computeGeneralizedEntropyIndex(1.0) val actualTheilT = testBenefitsEqual.computeTheilTIndex Assert.assertTrue(math.abs(actualGEI10) < EPS) Assert.assertTrue(math.abs(actualTheilT) < EPS) val actualGEI00 = testBenefitsEqual.computeGeneralizedEntropyIndex(0) val actualTheilL = testBenefitsEqual.computeTheilLIndex Assert.assertTrue(math.abs(actualGEI00) < EPS) Assert.assertTrue(math.abs(actualTheilL) < EPS) val actualGEI05 = testBenefitsEqual.computeGeneralizedEntropyIndex(0.5) Assert.assertTrue(math.abs(actualGEI05) < EPS) val actualAtkinson10 = testBenefitsEqual.computeAtkinsonIndex(1.0) Assert.assertTrue(math.abs(actualAtkinson10) < EPS) val actualAtkinson00 = testBenefitsEqual.computeAtkinsonIndex(0) 
Assert.assertTrue(math.abs(actualAtkinson00) < EPS) val actualAtkinson05 = testBenefitsEqual.computeAtkinsonIndex(0.5) Assert.assertTrue(math.abs(actualAtkinson05) < EPS) val actualCOV = testBenefitsEqual.computeCoefficientOfVariation Assert.assertTrue(math.abs(actualCOV) < EPS) } @Test(description = "Compute overall fairness metrics") def testComputeOverallMetrics(): Unit = { val actualResults = testBenefits.computeOverallMetrics(Map( "GENERALIZED_ENTROPY_INDEX" -> "0.5", "THEIL_T_INDEX" -> "", "THEIL_L_INDEX" -> "")) Assert.assertEquals(actualResults, Seq( FairnessResult(resultType = "Benefit Map for x", resultValOpt = None, constituentVals = Map(Map("gender" -> "UNKNOWN") -> 0.6, Map("gender" -> "MALE") -> 0.9, Map("gender" -> "FEMALE") -> 0.75)), FairnessResult(resultType = "x: GENERALIZED_ENTROPY_INDEX", parameters = "0.5", constituentVals = Map(), resultValOpt = Some(0.013503591986335994)), FairnessResult(resultType = "x: THEIL_T_INDEX", constituentVals = Map(), resultValOpt = Some(0.013423675700459214)), FairnessResult(resultType = "x: THEIL_L_INDEX", constituentVals = Map(), resultValOpt = Some(0.013607331506751752)))) } @Test(description = "BenefitMap computation") def testCompute(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 1, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "MALE"), ModelPrediction(label = 0, prediction = 0, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "FEMALE"), ModelPrediction(label = 1, prediction = 1, dimensionValue = "UNKNOWN"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "UNKNOWN"), ModelPrediction(label = 0, prediction = 1, 
dimensionValue = "UNKNOWN"), ModelPrediction(label = 0, prediction = 1, dimensionValue = "UNKNOWN")) val actualBenefitMap1 = BenefitMap.compute(predictions, "gender", "PRECISION") Assert.assertEquals(actualBenefitMap1, BenefitMap( benefitType = "PRECISION", entries = Map(Map("gender" -> "MALE") -> 0.5, Map("gender" -> "FEMALE") -> 0.0, Map("gender" -> "UNKNOWN") -> 0.25))) val actualBenefitMap2 = BenefitMap.compute(predictions, "gender", "com.linkedin.lift.lib.testing.TestCustomMetric") Assert.assertEquals(actualBenefitMap2, BenefitMap( benefitType = "com.linkedin.lift.lib.testing.TestCustomMetric", entries = Map(Map("gender" -> "MALE") -> 1.0, Map("gender" -> "FEMALE") -> 1.0, Map("gender" -> "UNKNOWN") -> 1.0))) } @Test(description = "BenefitMap computation for ranking metric") def testComputeRanking(): Unit = { val predictions = Seq( ModelPrediction(label = 1, prediction = 0.35, dimensionValue = "MALE", groupId = "1", rank = 1), ModelPrediction(label = 0, prediction = 0.25, dimensionValue = "MALE", groupId = "1", rank = 2), ModelPrediction(label = 1, prediction = 0.11, dimensionValue = "FEMALE", groupId = "1", rank = 3), ModelPrediction(label = 1, prediction = 0.88, dimensionValue = "MALE", groupId = "2", rank = 1), ModelPrediction(label = 0, prediction = 0.65, dimensionValue = "FEMALE", groupId = "2", rank = 2), ModelPrediction(label = 0, prediction = 0.22, dimensionValue = "MALE", groupId = "2", rank = 3), ModelPrediction(label = 1, prediction = 0.10, dimensionValue = "FEMALE", groupId = "2", rank = 4), ModelPrediction(label = 1, prediction = 0.11, dimensionValue = "MALE", groupId = "3", rank = 1)) val actualBenefitMap = BenefitMap.compute(predictions, "gender", "PRECISION/1@25") Assert.assertEquals(actualBenefitMap, BenefitMap( benefitType = "PRECISION/1@25", entries = Map(Map("gender" -> "MALE") -> 0.6666666666666666, Map("gender" -> "FEMALE") -> 0.75))) } } ================================================ FILE: 
lift/src/test/scala/com/linkedin/lift/types/DistributionTest.scala ================================================ package com.linkedin.lift.types import com.linkedin.lift.lib.testing.TestValues import org.testng.Assert import org.testng.annotations.Test /** * Tests for the Distribution class */ class DistributionTest { @Test(description = "Distribution sum") def testSum(): Unit = { val testDist = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) Assert.assertEquals(testDist.sum, 58.0) val testDistEmpty = Distribution(Map()) Assert.assertEquals(testDistEmpty.sum, 0.0) } @Test(description = "Zip two different distributions - no overlap") def testZipNoOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedZip12 = Seq( (Map("gender" -> "MALE", "age" -> "20"), 24.0, 0.0), (Map("gender" -> "FEMALE", "age" -> "40"), 4.0, 0.0), (Map("gender" -> "FEMALE", "age" -> "20"), 0.0, 20.0), (Map("gender" -> "MALE", "age" -> "40"), 0.0, 10.0)) Assert.assertEquals(testDist1.zip(testDist2), expectedZip12) val expectedZip21 = Seq( (Map("gender" -> "FEMALE", "age" -> "20"), 20.0, 0.0), (Map("gender" -> "MALE", "age" -> "40"), 10.0, 0.0), (Map("gender" -> "MALE", "age" -> "20"), 0.0, 24.0), (Map("gender" -> "FEMALE", "age" -> "40"), 0.0, 4.0)) Assert.assertEquals(testDist2.zip(testDist1), expectedZip21) } @Test(description = "Zip two different distributions - with overlap") def testZipWithOverlap(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "MALE", "age" -> "40") -> 12.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) 
val testDist2 = Distribution(Map( Map("gender" -> "FEMALE", "age" -> "20") -> 20.0, Map("gender" -> "FEMALE", "age" -> "40") -> 5.0, Map("gender" -> "MALE", "age" -> "40") -> 10.0)) val expectedZip12 = Seq( (Map("gender" -> "MALE", "age" -> "20"), 24.0, 0.0), (Map("gender" -> "MALE", "age" -> "40"), 12.0, 10.0), (Map("gender" -> "FEMALE", "age" -> "40"), 4.0, 5.0), (Map("gender" -> "FEMALE", "age" -> "20"), 0.0, 20.0)) Assert.assertEquals(testDist1.zip(testDist2), expectedZip12) val expectedZip21 = Seq( (Map("gender" -> "FEMALE", "age" -> "20"), 20.0, 0.0), (Map("gender" -> "FEMALE", "age" -> "40"), 5.0, 4.0), (Map("gender" -> "MALE", "age" -> "40"), 10.0, 12.0), (Map("gender" -> "MALE", "age" -> "20"), 0.0, 24.0)) Assert.assertEquals(testDist2.zip(testDist1), expectedZip21) } @Test(description = "Marginal distribution computation") def testComputeMarginal(): Unit = { val inputDistributionGenderLabel = Distribution(Map( Map("gender" -> "MALE", "label" -> "0") -> 10.0, Map("gender" -> "MALE", "label" -> "1") -> 3.0, Map("gender" -> "UNKNOWN", "label" -> "0") -> 4.0, Map("gender" -> "FEMALE", "label" -> "0") -> 5.0, Map("gender" -> "FEMALE", "label" -> "1") -> 2.0)) val expectedMarginalDistributionGender = Distribution(Map( Map("gender" -> "MALE") -> 13.0, Map("gender" -> "UNKNOWN") -> 4.0, Map("gender" -> "FEMALE") -> 7.0)) val expectedMarginalDistributionLabel = Distribution(Map( Map("label" -> "0") -> 19.0, Map("label" -> "1") -> 5.0)) val actualMarginalDistributionGender = inputDistributionGenderLabel.computeMarginal(Set("gender")) Assert.assertEquals(actualMarginalDistributionGender, expectedMarginalDistributionGender) val actualMarginalDistributionLabel = inputDistributionGenderLabel.computeMarginal(Set("label")) Assert.assertEquals(actualMarginalDistributionLabel, expectedMarginalDistributionLabel) // Ensure that the marginal distribution is identical to the original // distribution when all dimensions are included val actualMarginalDistributionGenderLabel = 
inputDistributionGenderLabel.computeMarginal(Set("gender", "label")) Assert.assertEquals(actualMarginalDistributionGenderLabel, inputDistributionGenderLabel) } @Test(description = "Distribution to DF conversion") def testToDF(): Unit = { val testDist1 = Distribution(Map( Map("gender" -> "MALE", "age" -> "20") -> 24.0, Map("gender" -> "MALE", "age" -> "40", "label" -> "1") -> 12.0, Map("gender" -> "FEMALE", "age" -> "40") -> 4.0)) val df = testDist1.toDF(TestValues.spark) // Ensure column names are correct Assert.assertEquals(df.schema.fieldNames.toSeq, Seq("gender", "age", "label", "count")) val actualDFSeq: Seq[Seq[Any]] = df.collect.toSeq.map { row => row.toSeq.map { Option(_).fold("") { _.toString } } } val expectedDFSeq: Seq[Seq[Any]] = Seq( Seq("MALE", "20", "", "24.0"), Seq("MALE", "40", "1", "12.0"), Seq("FEMALE", "40", "", "4.0")) // Ensure that datasets match Assert.assertEquals(actualDFSeq, expectedDFSeq) } @Test(description = "Distribution computation") def testCompute(): Unit = { val actualDistributionGender = Distribution.compute(TestValues.df, Set("gender")) val expectedDistributionGender = Distribution(Map( Map("gender" -> "MALE") -> 5, Map("gender" -> "FEMALE") -> 4, Map("gender" -> "UNKNOWN") -> 1)) Assert.assertEquals(actualDistributionGender, expectedDistributionGender) val actualDistributionLabel = Distribution.compute(TestValues.df, Set("label")) val expectedDistributionLabel = Distribution(Map( Map("label" -> "0") -> 6, Map("label" -> "1") -> 4)) Assert.assertEquals(actualDistributionLabel, expectedDistributionLabel) val actualDistributionGenderLabel = Distribution.compute(TestValues.df, Set("gender", "label")) val expectedDistributionGenderLabel = Distribution(Map( Map("gender" -> "MALE", "label" -> "0") -> 3.0, Map("gender" -> "MALE", "label" -> "1") -> 2.0, Map("gender" -> "UNKNOWN", "label" -> "0") -> 1.0, Map("gender" -> "FEMALE", "label" -> "0") -> 2.0, Map("gender" -> "FEMALE", "label" -> "1") -> 2.0)) 
Assert.assertEquals(actualDistributionGenderLabel, expectedDistributionGenderLabel) } } ================================================ FILE: lift/src/test/scala/com/linkedin/lift/types/FairnessResultTest.scala ================================================ package com.linkedin.lift.types import com.linkedin.lift.lib.testing.TestValues import org.testng.Assert import org.testng.annotations.Test /** * Tests for the FairnessResult class */ class FairnessResultTest { val EPS = 1e-12 @Test(description = "FairnessResult to BenefitMap translation") def testToBenefitMap(): Unit = { val fairnessResult = FairnessResult( resultType = "DEMOGRAPHIC_PARITY", resultValOpt = None, constituentVals = Map( Map("gender1" -> "MALE", "gender2" -> "FEMALE") -> 0.01, Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.03, Map("gender1" -> "MALE", "gender2" -> "UNKNOWN") -> 0.02), additionalStats = Map("MALE" -> 0.03, "FEMALE" -> 0.02, "UNKNOWN" -> 0.05)) val actualBenefitMap = fairnessResult.toBenefitMap Assert.assertEquals(actualBenefitMap, BenefitMap( benefitType = "DEMOGRAPHIC_PARITY", entries = Map( Map("gender1" -> "MALE", "gender2" -> "FEMALE") -> 0.01, Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.03, Map("gender1" -> "MALE", "gender2" -> "UNKNOWN") -> 0.02))) val actualGEI = (3.0 - math.pow(0.5, 0.5) - math.pow(1.5, 0.5) - math.pow(1.0, 0.5)) / 0.75 Assert.assertTrue(math.abs( actualBenefitMap.computeGeneralizedEntropyIndex(0.5) - actualGEI) < EPS) } @Test(description = "FairnessResults to DataFrame translation") def testToDF(): Unit = { val results = Seq( FairnessResult(resultType = "KL_DIVERGENCE", parameters = Distribution(Map( Map("gender" -> "FEMALE", "label" -> "1.0") -> 0.16666, Map("gender" -> "UNKNOWN", "label" -> "0.0") -> 0.16666, Map("gender" -> "UNKNOWN", "label" -> "1.0") -> 0.16666, Map("gender" -> "MALE", "label" -> "0.0") -> 0.16666, Map("gender" -> "FEMALE", "label" -> "0.0") -> 0.16666, Map("gender" -> "MALE", "label" -> "1.0") -> 
0.16666)).toString, resultValOpt = Some(0.13852315605014068), constituentVals = Map()), FairnessResult(resultType = "DEMOGRAPHIC_PARITY", resultValOpt = None, constituentVals = Map( Map("gender1" -> "FEMALE", "gender2" -> "UNKNOWN") -> 0.16667, Map("gender1" -> "FEMALE", "gender2" -> "MALE") -> 0.26667, Map("gender1" -> "UNKNOWN", "gender2" -> "MALE") -> 0.1), additionalStats = Map("FEMALE" -> 0.33333, "UNKNOWN" -> 0.5, "MALE" -> 0.6))) val actualDFSeq = FairnessResult.toDF(TestValues.spark, results) .collect .toSeq.map(_.toString) Assert.assertEquals(actualDFSeq, Seq("[KL_DIVERGENCE,Distribution(Map(Map(" + "gender -> FEMALE, label -> 1.0) -> 0.16666, Map(gender -> UNKNOWN, " + "label -> 0.0) -> 0.16666, Map(gender -> UNKNOWN, label -> 1.0) -> 0.16666, " + "Map(gender -> MALE, label -> 0.0) -> 0.16666, Map(gender -> FEMALE, " + "label -> 0.0) -> 0.16666, Map(gender -> MALE, label -> 1.0) -> 0.16666))," + "0.13852315605014068,Map(),Map()]", "[DEMOGRAPHIC_PARITY,,null,Map(Map(gender1 -> FEMALE, gender2 -> UNKNOWN) -> 0.16667, " + "Map(gender1 -> FEMALE, gender2 -> MALE) -> 0.26667, Map(gender1 -> UNKNOWN, " + "gender2 -> MALE) -> 0.1),Map(FEMALE -> 0.33333, UNKNOWN -> 0.5, MALE -> 0.6)]")) } } ================================================ FILE: lift/src/test/scala/com/linkedin/lift/types/ModelPredictionTest.scala ================================================ package com.linkedin.lift.types import com.linkedin.lift.lib.testing.TestValues import org.testng.Assert import org.testng.annotations.Test /** * Tests for the ModelPrediction class */ class ModelPredictionTest { @Test(description = "Compute ModelPrediction instances from a DF") def testCompute(): Unit = { val actualPredictions = ModelPrediction.compute(TestValues.df, "label", "predicted", "", "gender") val expectedPredictions = TestValues.testData .sortBy(- _.predicted.toDouble) .zipWithIndex .map { case (data, idx) => ModelPrediction( label = data.label.toDouble, prediction = data.predicted.toDouble, 
          rank = idx + 1,
          dimensionValue = data.gender)
      }
    Assert.assertEquals(actualPredictions, expectedPredictions)
  }

  @Test(description = "Compute ModelPrediction instances from a DF with groups")
  def testComputeWithGroups(): Unit = {
    val actualPredictions = ModelPrediction.compute(TestValues.df2, "label", "predicted", "qid", "gender")
    val expectedPredictions = TestValues.testData2
      .groupBy(_.qid)
      .flatMap { case (_, dataPts) =>
        dataPts.sortBy(- _.predicted.toDouble)
          .zipWithIndex.map { case (data, idx) =>
            ModelPrediction(
              label = data.label.toDouble,
              prediction = data.predicted.toDouble,
              groupId = data.qid,
              rank = idx + 1,
              dimensionValue = data.gender)
          }
      }
    Assert.assertEquals(actualPredictions, expectedPredictions)
  }
}

================================================
FILE: model-fairness.md
================================================
# Model-level Fairness Metrics

At a high level, these metrics require the score of the model and the corresponding protected attribute value. Some metrics also make use of the corresponding label.

Now, a model can produce a raw score or a probability. Our fairness metrics specifically deal with models that output probabilities that can be treated as ![P(\hat{Y}(X) = 1)](https://render.githubusercontent.com/render/math?math=P(%5Chat%7BY%7D(X)%20%3D%201)) (if the scores are raw scores, we pass them through a sigmoid link function to interpret them as probabilities). If your models do not output binary prediction probabilities, or these probabilities are not appropriate to be interpreted as shown, you will need to preprocess the scores before using the library. This can be done inline, in the Spark job that computes the model-related fairness metrics.

If your model is being used for binary classification (and not just for its scores), an optional threshold value can be provided, which will be used to binarize the predictions.
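
The score handling described here can be sketched in a few lines of plain Scala. This is only an illustration under stated assumptions, not LiFT's implementation: the object and method names are hypothetical, the `>=` comparison at the threshold is an assumption, and the fractional-count reading of "expected" TP/FP/TN/FN counts (used when no threshold is given) is one plausible interpretation.

```scala
// Hypothetical helper object; none of these names come from the LiFT API.
object ScorePreprocessingSketch {
  // Sigmoid link: maps a raw score on the real line to a probability in (0, 1).
  def sigmoid(rawScore: Double): Double = 1.0 / (1.0 + math.exp(-rawScore))

  // Binarize a probability. Treating a score equal to the threshold as a
  // positive prediction is an assumption of this sketch.
  def binarize(probability: Double, threshold: Double): Double =
    if (probability >= threshold) 1.0 else 0.0

  final case class ConfusionCounts(tp: Double, fp: Double, tn: Double, fn: Double)

  // Input: (label, probability) pairs with labels in {0.0, 1.0}.
  // With a threshold, each prediction contributes a whole count.
  // Without one, each probability contributes a fractional (expected) count.
  def confusionCounts(
      labelsAndProbs: Seq[(Double, Double)],
      thresholdOpt: Option[Double]): ConfusionCounts =
    labelsAndProbs.foldLeft(ConfusionCounts(0.0, 0.0, 0.0, 0.0)) {
      case (acc, (label, prob)) =>
        val p = thresholdOpt.fold(prob)(t => binarize(prob, t))
        if (label == 1.0) acc.copy(tp = acc.tp + p, fn = acc.fn + (1.0 - p))
        else acc.copy(fp = acc.fp + p, tn = acc.tn + (1.0 - p))
    }
}
```

For instance, `ScorePreprocessingSketch.confusionCounts(data, Some(0.5))` yields whole counts, while passing `None` keeps the probabilities as soft counts.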
If a threshold value is not specified, the probabilities ![P(\hat{Y}(X) = 1)](https://render.githubusercontent.com/render/math?math=P(%5Chat%7BY%7D(X)%20%3D%201)) are used to compute expected TP, FP, TN and FN counts as needed.

We provide here a list of the various metrics available for measuring fairness of ML models, as well as a short description of each of them.

1. **Metrics that compare against a given reference distribution:** These metrics involve computing some measure of distance or divergence from a given reference distribution provided by the user. The library supports only the `UNIFORM` distribution out of the box (all `score-protectedAttribute` combinations must have an equal number of records), but users may supply their own distribution (such as an a priori known gender distribution). These metrics are similar to those computed on the training dataset. The only difference is that we make use of the predictions/scores instead of the labels, i.e., ![\hat{Y}(X)](https://render.githubusercontent.com/render/math?math=%5Chat%7BY%7D(X)) instead of ![Y(X)](https://render.githubusercontent.com/render/math?math=Y(X)).

   For the most up-to-date documentation on the supported metrics, please look at the link [here](lift/src/main/scala/com/linkedin/lift/lib/DivergenceUtils.scala), and look for the `computeDistanceMetrics` method as the starting point. The following metrics fall under this category:

   1. **Skews:** Computes the logarithm of the ratio of the observed value to the expected value.
For example, if we are dealing with score-gender distributions, this metric computes ![\log\left(\frac{(0.0, MALE)_{obs}}{(0.0, MALE)_{exp}}\right), \log\left(\frac{(1.0, MALE)_{obs}}{(1.0, MALE)_{exp}}\right), \log\left(\frac{(0.0, FEMALE)_{obs}}{(0.0, FEMALE)_{exp}}\right), \log\left(\frac{(1.0, FEMALE)_{obs}}{(1.0, FEMALE)_{exp}}\right)](https://render.githubusercontent.com/render/math?math=%5Clog%5Cleft(%5Cfrac%7B(0.0%2C%20MALE)_%7Bobs%7D%7D%7B(0.0%2C%20MALE)_%7Bexp%7D%7D%5Cright)%2C%20%5Clog%5Cleft(%5Cfrac%7B(1.0%2C%20MALE)_%7Bobs%7D%7D%7B(1.0%2C%20MALE)_%7Bexp%7D%7D%5Cright)%2C%20%5Clog%5Cleft(%5Cfrac%7B(0.0%2C%20FEMALE)_%7Bobs%7D%7D%7B(0.0%2C%20FEMALE)_%7Bexp%7D%7D%5Cright)%2C%20%5Clog%5Cleft(%5Cfrac%7B(1.0%2C%20FEMALE)_%7Bobs%7D%7D%7B(1.0%2C%20FEMALE)_%7Bexp%7D%7D%5Cright)) 2. **Infinity Norm Distance:** Computes the Chebyshev Distance between the observed and reference distributions. It equals the maximum absolute difference between the two distributions. 3. **Total Variation Distance:** Computes the Total Variation Distance between the observed and reference distributions. It is equal to half the L1 distance between the two distributions. 4. **JS Divergence:** The Jensen-Shannon Divergence between the observed and reference distributions. Suppose that the average of these two distributions is given by M. Then, the JS Divergence is the average of the KL Divergences between the observed distribution and M, and the reference distribution and M. 5. **KL Divergence:** The Kullback-Leibler Divergence between the observed and reference distributions. It is the expectation (over the observed distribution) of the logarithmic differences between the observed and reference distributions. The latter is the Skew we measure above. 2. **Metrics computed on the observed distribution only:** These metrics compute some notion of distance or divergence between various segments of the observed distribution. 
For the most up-to-date documentation on the supported metrics, please look at the link [here](lift/src/main/scala/com/linkedin/lift/lib/DivergenceUtils.scala), and look for the `computeDistanceMetrics` method as the starting point. The following metrics fall under this category: 1. **Demographic Parity:** It measures the difference between the conditional expected value of the prediction (given one protected attribute value) and the conditional expected value of the prediction (given the other protected attribute value). This is measured for all pairs of protected attribute values. ![DP_{(g_1, g_2)} = E\[\hat{Y}(X)|G=g_1\] - E\[\hat{Y}(X)|G=g_2\] = P(\hat{Y}(X)=1|G=g_1) - P(\hat{Y}(X)=1|G=g_2)](https://render.githubusercontent.com/render/math?math=DP_%7B(g_1%2C%20g_2)%7D%20%3D%20E%5B%5Chat%7BY%7D(X)%7CG%3Dg_1%5D%20-%20E%5B%5Chat%7BY%7D(X)%7CG%3Dg_2%5D%20%3D%20P(%5Chat%7BY%7D(X)%3D1%7CG%3Dg_1)%20-%20P(%5Chat%7BY%7D(X)%3D1%7CG%3Dg_2)) This metric captures the idea that different protected groups should have similar acceptance rates. While this is desirable in an ideal scenario (and is related to the [80% Labor Law rule](https://en.wikipedia.org/wiki/Disparate_impact)), it might not always be appropriate. For example, various socio-economic factors might contribute towards having different acceptance rates for different groups. That is, the difference is not due to the protected group itself, but rather due to other meaningful, but correlated variables. Furthermore, even if we are dealing with a scenario where DP is desirable, it does not deal with model performance at all. We might as well have a second model predict '1' randomly for one group (with a probability equal to the acceptance rate of the other group) to achieve DP. Thus, attempting to optimize for DP directly might not be a good goal, but using it to inform decisions is nevertheless helpful. 2. 
**Equalized Odds:** It measures the difference between the conditional expected value of the prediction (given one protected attribute value and its label) and the conditional expected value of the prediction (given the other protected attribute value and its label). This is measured for all pairs of protected attribute values and label. ![EO_{(g_1, g_2, y)} = E\[\hat{Y}(X)|Y=y,G=g_1\] - E\[\hat{Y}(X)|Y=y,G=g_2\] = P(\hat{Y}(X)=1|Y=y,G=g_1) - P(\hat{Y}(X)=1|Y=y,G=g_2)](https://render.githubusercontent.com/render/math?math=EO_%7B(g_1%2C%20g_2%2C%20y)%7D%20%3D%20E%5B%5Chat%7BY%7D(X)%7CY%3Dy%2CG%3Dg_1%5D%20-%20E%5B%5Chat%7BY%7D(X)%7CY%3Dy%2CG%3Dg_2%5D%20%3D%20P(%5Chat%7BY%7D(X)%3D1%7CY%3Dy%2CG%3Dg_1)%20-%20P(%5Chat%7BY%7D(X)%3D1%7CY%3Dy%2CG%3Dg_2)) 3. **Statistical Tests for Fairness:** This deals with comparing a given model performance metric between two different protected groups. For example, comparing the AUC for men vs AUC for women. We need to be able to say if this difference is statistically significant, and we also need it to be metric-agnostic. We achieve this using Permutation Testing. Since this is a non-parametric statistical test, it can be slow, so users can control the sample size and the number of trials to run. The test provides a p-value and a measure of standard error (for the p-value) as well. We support AUC, Precision, Recall, TNR, FNR and FPR out-of-the-box (the full list can be found by visiting `StatsUtils.scala` and looking at the `getMetricFn` method), and also support any user-defined custom metrics (it needs to extend [CustomMetric.scala](lift/src/main/scala/com/linkedin/lift/types/CustomMetric.scala)). More details about the test itself can be found in the `permutationTest` method defined [here](lift/src/main/scala/com/linkedin/lift/lib/PermutationTestUtils.scala). To cite this work, please refer to the 'Citations' section of the [README](README.md). 4. 
**Aggregate Metrics:** These metrics are useful to obtain higher level (or second order) notions of inequality, when comparing multiple per-protected-attribute-value inequality metrics. For example, these could be used to say if one set of Skews measured is more equally distributed than another set of Skews. These lower-level metrics are called benefit vectors, and the aggregate metrics provide a notion of how uniformly these inequalities are distributed. Note that these metrics capture inequalities within the vector. Thus, going by this metric alone is not sufficient. For example, take a benefit vector that captures Demographic Parity differences between (MALE, FEMALE), (FEMALE, UNKNOWN), and (MALE, UNKNOWN). Suppose that the vector for one distribution is (a, 2a, 3a) and the other is (0.5a, 1.5a, 2a). Even though the individual differences are smaller in the second distribution (for each pair of protected attribute values), an aggregate metric will deem it to be more unfair than the former because the differences between the elements of its vector are more drastic (for the first one, the ratio is 1:2:3 while for the second it is 1:3:4). However, the latter has better Demographic Parity. Hence, there may be conflicting notions of fairness being measured, and it is up to the end user to identify which one they would like to focus on. We divide these into two: `distanceBenefitMetrics` and `performanceBenefitMetrics`. The former computes distance and divergence metrics (mentioned in 1 and 2) and uses these as the benefit vectors for aggregate metrics computation. The latter uses model performance metrics (such as AUC, TPR, FPR for different protected groups) as the benefit vector for aggregate metrics computation. There is no difference in the aggregate computation itself; this distinction is used by LiFT to just be more specific about what needs to be computed. 
The aggregate metrics can be computed for performance metrics supported out-of-the-box, as well as user-defined custom ones, as mentioned in 3. For the most up-to-date documentation on the supported metrics, please look at the link [here](lift/src/main/scala/com/linkedin/lift/types/BenefitMap.scala), and look for the `computeMetric` method as the starting point. The following aggregate metrics are available: 1. **Generalized Entropy Index:** Computes an average of the relative benefits based on some input parameters. 2. **Atkinson Index:** A derivative of the Generalized Entropy Index. Used more commonly in the field of economics. 3. **Theil's L Index:** The Generalized Entropy Index when its parameter is set to 0. It is more sensitive to differences at the lower end of the distribution (the benefit vector values). 4. **Theil's T Index:** The Generalized Entropy Index when its parameter is set to 1. It is more sensitive to differences at the higher end of the distribution (the benefit vector values). 5. **Coefficient of Variation:** A derivative of the Generalized Entropy Index. It computes the value of the standard deviation divided by the mean of the benefit vector. ================================================ FILE: settings.gradle ================================================ /* * This file was generated by the Gradle 'init' task. * * The settings file is used to specify which projects to include in your build. * * Detailed information about configuring a multi-project build in Gradle can be found * in the user manual at https://docs.gradle.org/5.6.2/userguide/multi_project_builds.html */ rootProject.name = 'lift' include 'lift' ================================================ FILE: version.properties ================================================ # Version of the produced binaries. # The version is inferred by shipkit-auto-version Gradle plugin (https://github.com/shipkit/shipkit-auto-version) version=0.3.*