Repository: scikit-learn-contrib/imbalanced-learn
Branch: master
Commit: 6cdb18e9b8fc
Files: 242
Total size: 1.1 MB

Directory structure:
gitextract_8wwn0p4o/

├── .circleci/
│   └── config.yml
├── .coveragerc
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── documentation-improvement.md
│   │   ├── feature_request.md
│   │   ├── other--blank-template-.md
│   │   ├── question.md
│   │   └── usage-question.md
│   ├── ISSUE_TEMPLATE.md
│   ├── PULL_REQUEST_TEMPLATE.md
│   ├── check-changelog.yml
│   ├── dependabot.yml
│   └── workflows/
│       ├── circleci-artifacts-redirector.yml
│       ├── linters.yml
│       └── tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── AUTHORS.rst
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.rst
├── build_tools/
│   └── circle/
│       ├── build_doc.sh
│       ├── checkout_merge_commit.sh
│       ├── linting.sh
│       └── push_doc.sh
├── conftest.py
├── doc/
│   ├── Makefile
│   ├── _static/
│   │   ├── css/
│   │   │   └── imbalanced-learn.css
│   │   ├── img/
│   │   │   └── logo.xcf
│   │   └── js/
│   │       └── copybutton.js
│   ├── _templates/
│   │   ├── class.rst
│   │   ├── function.rst
│   │   ├── numpydoc_docstring.rst
│   │   └── sidebar-search-bs.html
│   ├── about.rst
│   ├── bibtex/
│   │   └── refs.bib
│   ├── combine.rst
│   ├── common_pitfalls.rst
│   ├── conf.py
│   ├── datasets/
│   │   └── index.rst
│   ├── developers_utils.rst
│   ├── ensemble.rst
│   ├── index.rst
│   ├── install.rst
│   ├── introduction.rst
│   ├── make.bat
│   ├── metrics.rst
│   ├── miscellaneous.rst
│   ├── model_selection.rst
│   ├── over_sampling.rst
│   ├── references/
│   │   ├── combine.rst
│   │   ├── datasets.rst
│   │   ├── ensemble.rst
│   │   ├── index.rst
│   │   ├── keras.rst
│   │   ├── metrics.rst
│   │   ├── miscellaneous.rst
│   │   ├── model_selection.rst
│   │   ├── over_sampling.rst
│   │   ├── pipeline.rst
│   │   ├── tensorflow.rst
│   │   ├── under_sampling.rst
│   │   └── utils.rst
│   ├── sphinxext/
│   │   ├── LICENSE.txt
│   │   ├── MANIFEST.in
│   │   ├── README.txt
│   │   ├── github_link.py
│   │   └── sphinx_issues.py
│   ├── under_sampling.rst
│   ├── user_guide.rst
│   ├── whats_new/
│   │   ├── v0.1.rst
│   │   ├── v0.10.rst
│   │   ├── v0.11.rst
│   │   ├── v0.12.rst
│   │   ├── v0.13.rst
│   │   ├── v0.14.rst
│   │   ├── v0.15.rst
│   │   ├── v0.2.rst
│   │   ├── v0.3.rst
│   │   ├── v0.4.rst
│   │   ├── v0.5.rst
│   │   ├── v0.6.rst
│   │   ├── v0.7.rst
│   │   ├── v0.8.rst
│   │   └── v0.9.rst
│   ├── whats_new.rst
│   └── zzz_references.rst
├── examples/
│   ├── README.txt
│   ├── api/
│   │   ├── README.txt
│   │   └── plot_sampling_strategy_usage.py
│   ├── applications/
│   │   ├── README.txt
│   │   ├── plot_impact_imbalanced_classes.py
│   │   ├── plot_multi_class_under_sampling.py
│   │   ├── plot_outlier_rejections.py
│   │   ├── plot_over_sampling_benchmark_lfw.py
│   │   ├── plot_topic_classication.py
│   │   └── porto_seguro_keras_under_sampling.py
│   ├── combine/
│   │   ├── README.txt
│   │   └── plot_comparison_combine.py
│   ├── datasets/
│   │   ├── README.txt
│   │   └── plot_make_imbalance.py
│   ├── ensemble/
│   │   ├── README.txt
│   │   ├── plot_bagging_classifier.py
│   │   └── plot_comparison_ensemble_classifier.py
│   ├── evaluation/
│   │   ├── README.txt
│   │   ├── plot_classification_report.py
│   │   └── plot_metrics.py
│   ├── model_selection/
│   │   ├── README.txt
│   │   ├── plot_instance_hardness_cv.py
│   │   └── plot_validation_curve.py
│   ├── over-sampling/
│   │   ├── README.txt
│   │   ├── plot_comparison_over_sampling.py
│   │   ├── plot_illustration_generation_sample.py
│   │   └── plot_shrinkage_effect.py
│   ├── pipeline/
│   │   ├── README.txt
│   │   └── plot_pipeline_classification.py
│   └── under-sampling/
│       ├── README.txt
│       ├── plot_comparison_under_sampling.py
│       ├── plot_illustration_nearmiss.py
│       └── plot_illustration_tomek_links.py
├── imblearn/
│   ├── VERSION.txt
│   ├── __init__.py
│   ├── _version.py
│   ├── base.py
│   ├── combine/
│   │   ├── __init__.py
│   │   ├── _smote_enn.py
│   │   ├── _smote_tomek.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_smote_enn.py
│   │       └── test_smote_tomek.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── _imbalance.py
│   │   ├── _zenodo.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_imbalance.py
│   │       └── test_zenodo.py
│   ├── ensemble/
│   │   ├── __init__.py
│   │   ├── _bagging.py
│   │   ├── _common.py
│   │   ├── _easy_ensemble.py
│   │   ├── _forest.py
│   │   ├── _weight_boosting.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_bagging.py
│   │       ├── test_easy_ensemble.py
│   │       ├── test_forest.py
│   │       └── test_weight_boosting.py
│   ├── exceptions.py
│   ├── keras/
│   │   ├── __init__.py
│   │   ├── _generator.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       └── test_generator.py
│   ├── metrics/
│   │   ├── __init__.py
│   │   ├── _classification.py
│   │   ├── pairwise.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_classification.py
│   │       ├── test_pairwise.py
│   │       └── test_score_objects.py
│   ├── model_selection/
│   │   ├── __init__.py
│   │   ├── _split.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       └── test_split.py
│   ├── over_sampling/
│   │   ├── __init__.py
│   │   ├── _adasyn.py
│   │   ├── _random_over_sampler.py
│   │   ├── _smote/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── cluster.py
│   │   │   ├── filter.py
│   │   │   └── tests/
│   │   │       ├── __init__.py
│   │   │       ├── test_borderline_smote.py
│   │   │       ├── test_kmeans_smote.py
│   │   │       ├── test_smote.py
│   │   │       ├── test_smote_nc.py
│   │   │       ├── test_smoten.py
│   │   │       └── test_svm_smote.py
│   │   ├── base.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── test_adasyn.py
│   │       ├── test_common.py
│   │       └── test_random_over_sampler.py
│   ├── pipeline.py
│   ├── tensorflow/
│   │   ├── __init__.py
│   │   ├── _generator.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       └── test_generator.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── test_base.py
│   │   ├── test_common.py
│   │   ├── test_docstring_parameters.py
│   │   ├── test_exceptions.py
│   │   ├── test_pipeline.py
│   │   └── test_public_functions.py
│   ├── under_sampling/
│   │   ├── __init__.py
│   │   ├── _prototype_generation/
│   │   │   ├── __init__.py
│   │   │   ├── _cluster_centroids.py
│   │   │   └── tests/
│   │   │       ├── __init__.py
│   │   │       └── test_cluster_centroids.py
│   │   ├── _prototype_selection/
│   │   │   ├── __init__.py
│   │   │   ├── _condensed_nearest_neighbour.py
│   │   │   ├── _edited_nearest_neighbours.py
│   │   │   ├── _instance_hardness_threshold.py
│   │   │   ├── _nearmiss.py
│   │   │   ├── _neighbourhood_cleaning_rule.py
│   │   │   ├── _one_sided_selection.py
│   │   │   ├── _random_under_sampler.py
│   │   │   ├── _tomek_links.py
│   │   │   └── tests/
│   │   │       ├── __init__.py
│   │   │       ├── test_allknn.py
│   │   │       ├── test_condensed_nearest_neighbour.py
│   │   │       ├── test_edited_nearest_neighbours.py
│   │   │       ├── test_instance_hardness_threshold.py
│   │   │       ├── test_nearmiss.py
│   │   │       ├── test_neighbourhood_cleaning_rule.py
│   │   │       ├── test_one_sided_selection.py
│   │   │       ├── test_random_under_sampler.py
│   │   │       ├── test_repeated_edited_nearest_neighbours.py
│   │   │       └── test_tomek_links.py
│   │   └── base.py
│   └── utils/
│       ├── __init__.py
│       ├── _docstring.py
│       ├── _show_versions.py
│       ├── _tags.py
│       ├── _test_common/
│       │   ├── __init__.py
│       │   └── instance_generator.py
│       ├── _validation.py
│       ├── deprecation.py
│       ├── estimator_checks.py
│       ├── testing.py
│       └── tests/
│           ├── __init__.py
│           ├── test_deprecation.py
│           ├── test_docstring.py
│           ├── test_estimator_checks.py
│           ├── test_min_dependencies.py
│           ├── test_show_versions.py
│           ├── test_testing.py
│           └── test_validation.py
├── maint_tools/
│   └── test_docstring.py
├── pyproject.toml
└── references.bib

================================================
FILE CONTENTS
================================================

================================================
FILE: .circleci/config.yml
================================================
version: 2.1

jobs:
  python3:
    docker:
      - image: cimg/python:3.9
    environment:
      - OMP_NUM_THREADS: 1
    steps:
      - checkout
      - run: ./build_tools/circle/checkout_merge_commit.sh
      - run:
          command: ./build_tools/circle/build_doc.sh
          no_output_timeout: 30m
      - store_artifacts:
          path: doc/_build/html
          destination: doc
      - store_artifacts:
          path: ~/log.txt
          destination: log.txt
      - persist_to_workspace:
          root: doc/_build/html
          paths: .

  deploy:
    docker:
      - image: cimg/python:3.9
    environment:
      - USERNAME: "glemaitre"
      - ORGANIZATION: "imbalanced-learn"
      - DOC_REPO: "imbalanced-learn.github.io"
      - EMAIL: "g.lemaitre58@gmail.com"
    steps:
      - checkout
      - run: ./build_tools/circle/checkout_merge_commit.sh
      - attach_workspace:
          at: doc/_build/html
      - run: ls -ltrh doc/_build/html
      - deploy:
          command: |
            if [[ "${CIRCLE_BRANCH}" =~ ^master$|^[0-9]+\.[0-9]+\.X$ ]]; then
              bash ./build_tools/circle/push_doc.sh doc/_build/html
            fi

workflows:
  version: 2
  build-doc-and-deploy:
    jobs:
      - python3
      - deploy:
          requires:
            - python3


================================================
FILE: .coveragerc
================================================
[run]
branch = True

[report]
exclude_lines =
    if self.debug:
    pragma: no cover
    raise NotImplementedError
ignore_errors = True
omit =
    */tests/*
    **/setup.py


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us reproduce and correct the bug
title: "[BUG]"
labels: bug
assignees: ''

---

#### Describe the bug
A clear and concise description of what the bug is.

#### Steps/Code to Reproduce
<!--
Example:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["Help I have a bug" for i in range(1000)]

vectorizer = CountVectorizer(analyzer='word')
lda_features = vectorizer.fit_transform(docs)

lda_model = LatentDirichletAllocation(
    n_components=10,
    learning_method='online',
    evaluate_every=10,
    n_jobs=4,
)
model = lda_model.fit(lda_features)
```
If the code is too long, feel free to put it in a public gist and link
it in the issue: https://gist.github.com
-->

```
Sample code to reproduce the problem
```

#### Expected Results
<!-- Example: No error is thrown. Please paste or describe the expected results.-->

#### Actual Results
<!-- Please paste or specifically describe the actual output or traceback. -->

#### Versions
<!--
Please run the following snippet and paste the output below.
For scikit-learn >= 0.20:
import sklearn; sklearn.show_versions()
For scikit-learn < 0.20:
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)
import imblearn; print("Imbalanced-Learn", imblearn.__version__)
-->


<!-- Thanks for contributing! -->


================================================
FILE: .github/ISSUE_TEMPLATE/documentation-improvement.md
================================================
---
name: Documentation improvement
about: Create a report to help us improve the documentation
title: "[DOC]"
labels: Documentation, help wanted, good first issue
assignees: ''

---

#### Describe the issue linked to the documentation

Tell us about the confusion introduced in the documentation.

#### Suggest a potential alternative/fix

Tell us how we could improve the documentation in this regard.


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest a new algorithm, enhancement to an existing algorithm, etc.
title: "[ENH]"
labels: enhancement
assignees: ''

---

<!--
If you want to propose a new algorithm, please refer first to the scikit-learn inclusion criteria:
https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms
-->

#### Is your feature request related to a problem? Please describe

#### Describe the solution you'd like

#### Describe alternatives you've considered

#### Additional context


================================================
FILE: .github/ISSUE_TEMPLATE/other--blank-template-.md
================================================
---
name: Other (blank template)
about: For all other issues to reach the community...
title: ''
labels: ''
assignees: ''

---


================================================
FILE: .github/ISSUE_TEMPLATE/question.md
================================================
---
name: Question
about: If you have a usage question
title: ''
labels: ''
assignees: ''

---

**If your issue is a usage question, submit it here instead:**
- The imbalanced learn gitter: https://gitter.im/scikit-learn-contrib/imbalanced-learn


================================================
FILE: .github/ISSUE_TEMPLATE/usage-question.md
================================================
---
name: Usage question
about: If you have a usage question
title: "[SO]"
labels: question
assignees: ''

---

**If your issue is a usage question, submit it here instead:**
- **The imbalanced learn gitter: https://gitter.im/scikit-learn-contrib/imbalanced-learn**
- **StackOverflow with the imblearn (or imbalanced-learn) tag: https://stackoverflow.com/questions/tagged/imblearn**

We will automatically close this issue if it is not linked to a bug or an enhancement.


================================================
FILE: .github/ISSUE_TEMPLATE.md
================================================
<!--
If your issue is a usage question, submit it here instead:
- The imbalanced learn gitter: https://gitter.im/scikit-learn-contrib/imbalanced-learn
-->

<!-- Instructions For Filing a Bug: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/CONTRIBUTING.md#filing-bugs -->

#### Description
<!-- Example: Joblib Error thrown when calling fit on LatentDirichletAllocation with evaluate_every > 0-->

#### Steps/Code to Reproduce
<!--
Example:
```
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["Help I have a bug" for i in range(1000)]

vectorizer = CountVectorizer(analyzer='word')
lda_features = vectorizer.fit_transform(docs)

lda_model = LatentDirichletAllocation(
    n_components=10,
    learning_method='online',
    evaluate_every=10,
    n_jobs=4,
)
model = lda_model.fit(lda_features)
```
If the code is too long, feel free to put it in a public gist and link
it in the issue: https://gist.github.com
-->

#### Expected Results
<!-- Example: No error is thrown. Please paste or describe the expected results.-->

#### Actual Results
<!-- Please paste or specifically describe the actual output or traceback. -->

#### Versions
<!--
Please run the following snippet and paste the output below.
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)
import imblearn; print("Imbalanced-Learn", imblearn.__version__)
-->


<!-- Thanks for contributing! -->


================================================
FILE: .github/PULL_REQUEST_TEMPLATE.md
================================================
<!--
Thanks for contributing a pull request! Please ensure you have taken a look at
the contribution guidelines: https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/CONTRIBUTING.md#contributing-pull-requests
-->
#### Reference Issue
<!-- Example: Fixes #1234 -->


#### What does this implement/fix? Explain your changes.


#### Any other comments?


<!--
Please be aware that we are a loose team of volunteers so patience is
necessary; assistance handling other issues is very welcome. We value
all user contributions, no matter how minor they are. If we are slow to
review, either the pull request needs some benchmarking, tinkering,
convincing, etc. or more likely the reviewers are simply busy. In either
case, we ask for your understanding during the review process.

Thanks for contributing!
-->


================================================
FILE: .github/check-changelog.yml
================================================
name: Check Changelog
# This check makes sure that the changelog is properly updated
# when a PR introduces a change in a test file.
# To bypass this check, label the PR with "No Changelog Needed".
on:
  pull_request:
    types: [opened, edited, labeled, unlabeled, synchronize]

jobs:
  check:
    name: A reviewer will let you know if it is required or can be bypassed
    runs-on: ubuntu-latest
    if: ${{ contains(github.event.pull_request.labels.*.name, 'No Changelog Needed') == 0 }}
    steps:
      - name: Get PR number and milestone
        run: |
          echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
          echo "TAGGED_MILESTONE=${{ github.event.pull_request.milestone.title }}" >> $GITHUB_ENV
      - uses: actions/checkout@v4
        with:
          fetch-depth: '0'
      - name: Check the changelog entry
        run: |
          set -xe
          changed_files=$(git diff --name-only origin/main)
          # Changelog should be updated only if tests have been modified
          if [[ ! "$changed_files" =~ tests ]]
          then
            exit 0
          fi
          all_changelogs=$(cat ./doc/whats_new/v*.rst)
          if [[ "$all_changelogs" =~ :pr:\`$PR_NUMBER\` ]]
          then
            echo "Changelog has been updated."
            # If the pull request is milestoned, check the corresponding changelog
            if [[ -f ./doc/whats_new/v${TAGGED_MILESTONE:0:4}.rst ]]
            then
              expected_changelog=$(cat ./doc/whats_new/v${TAGGED_MILESTONE:0:4}.rst)
              if [[ "$expected_changelog" =~ :pr:\`$PR_NUMBER\` ]]
              then
                echo "Changelog and milestone correspond."
              else
                echo "Changelog and milestone do not correspond."
                echo "If you see this error make sure that the tagged milestone for the PR"
                echo "and the edited changelog filename properly match."
                exit 1
              fi
            fi
          else
            echo "A Changelog entry is missing."
            echo ""
            echo "Please add an entry to the changelog at 'doc/whats_new/v*.rst'"
            echo "to document your change assuming that the PR will be merged"
            echo "in time for the next release of imbalanced-learn."
            echo ""
            echo "Look at other entries in that file for inspiration and please"
            echo "reference this pull request using the ':pr:' directive and"
            echo "credit yourself (and other contributors if applicable) with"
            echo "the ':user:' directive."
            echo ""
            echo "If you see this error and there is already a changelog entry,"
            echo "check that the PR number is correct."
            echo ""
            echo "If you believe that this PR does not warrant a changelog"
            echo "entry, say so in a comment so that a maintainer will label"
            echo "the PR with 'No Changelog Needed' to bypass this check."
            exit 1
          fi


================================================
FILE: .github/dependabot.yml
================================================
version: 2
updates:
  # Maintain dependencies for GitHub Actions as recommended in SPEC8:
  # https://github.com/scientific-python/specs/pull/325
  # At the time of writing, release critical workflows such as
  # pypa/gh-action-pypi-publish should use hash-based versioning for security
  # reasons. This strategy may be generalized to all other github actions
  # in the future.
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions:
        patterns:
          - "*"
    reviewers:
      - "glemaitre"


================================================
FILE: .github/workflows/circleci-artifacts-redirector.yml
================================================
name: CircleCI artifacts redirector

on: [status]

# Restrict the permissions granted to the use of secrets.GITHUB_TOKEN in this
# github actions workflow:
# https://docs.github.com/en/actions/security-guides/automatic-token-authentication
permissions:
  statuses: write

jobs:
  circleci_artifacts_redirector_job:
    runs-on: ubuntu-latest
    # For testing this action on a fork, remove the "github.repository =="" condition.
    if: "github.repository == 'scikit-learn-contrib/imbalanced-learn' && github.event.context == 'ci/circleci: doc'"
    name: Run CircleCI artifacts redirector
    steps:
      - name: GitHub Action step
        uses: scientific-python/circleci-artifacts-redirector-action@v1
        with:
          repo-token: ${{ secrets.GITHUB_TOKEN }}
          api-token: ${{ secrets.CIRCLE_CI }}
          artifact-path: 0/doc/index.html
          circleci-jobs: doc
          job-title: Check the rendered docs here!


================================================
FILE: .github/workflows/linters.yml
================================================
name: Run code format checks

on:
  push:
    branches:
      - "main"
  pull_request:
    branches:
      - '*'

jobs:
  run-pre-commit-checks:
    name: Run pre-commit checks
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v6
      - uses: prefix-dev/setup-pixi@v0.9.3
        with:
          pixi-version: v0.51.0
          frozen: true

      - name: Run linters
        run: pixi run -e linters linters


================================================
FILE: .github/workflows/tests.yml
================================================
name: 'tests'

on:
  push:
    branches:
      - "main"
  pull_request:
    branches:
      - '*'

jobs:
  test:
    strategy:
      matrix:
        os: [windows-latest, ubuntu-latest, macos-latest]
        environment: [
            ci-py310-min-dependencies,
            ci-py310-min-optional-dependencies,
            ci-py310-min-keras,
            ci-py310-min-tensorflow,
            ci-py311-sklearn-1-4,
            ci-py311-sklearn-1-5,
            ci-py312-sklearn-1-6,
            ci-py311-latest-keras,
            ci-py311-latest-tensorflow,
            ci-py314-latest-dependencies,
            ci-py314-latest-optional-dependencies,
        ]
        exclude:
            - os: windows-latest
              environment: ci-py310-min-keras
            - os: windows-latest
              environment: ci-py310-min-tensorflow
            - os: windows-latest
              environment: ci-py311-latest-keras
            - os: windows-latest
              environment: ci-py311-latest-tensorflow
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v6
      - uses: prefix-dev/setup-pixi@v0.9.3
        with:
          pixi-version: v0.51.0
          environments: ${{ matrix.environment }}
          # We freeze the environment and manually bump the dependencies to their
          # latest versions from time to time.
          frozen: true

      - name: Run tests
        run: pixi run -e ${{ matrix.environment }} tests -n 3

      - name: Upload coverage reports to Codecov
        uses: codecov/codecov-action@v5.5.2
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          slug: scikit-learn-contrib/imbalanced-learn


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
Pipfile
Pipfile.lock

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# vim
*.swp

# emacs
*~

# Visual Studio
*.sln
*.pyproj
*.suo
*.vs
.vscode/

# PyCharm
.idea/

# Cython
*.pyc
*.pyo
__pycache__
*.so
*.o

*.egg
*.egg-info

Cython/Compiler/*.c
Cython/Plex/*.c
Cython/Runtime/refnanny.c
Cython/Tempita/*.c
Cython/*.c

Tools/*.elc

/TEST_TMP/
/build/
/wheelhouse*/
!tests/build/
/dist/
.gitrev
.coverage
*.orig
*.rej
*.dep
*.swp
*~

.ipynb_checkpoints
docs/build

tags
TAGS
MANIFEST

.tox

cythonize.dat

# build documentation
doc/_build/
doc/auto_examples/
doc/generated/
doc/references/generated/
doc/bibtex/auto
doc/min_dependency_table.rst

# MacOS
.DS_Store

# Pixi folder
.pixi/

# Generated files
doc/min_dependency_substitutions.rst
doc/sg_execution_times.rst


================================================
FILE: .pre-commit-config.yaml
================================================
repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
    -   id: check-yaml
    -   id: end-of-file-fixer
    -   id: trailing-whitespace
-   repo: https://github.com/astral-sh/ruff-pre-commit
    # Ruff version.
    rev: v0.4.8
    hooks:
    -   id: ruff
        args: ["--fix", "--output-format=full"]
-   repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
    -   id: black


================================================
FILE: AUTHORS.rst
================================================
History
-------

Development lead
~~~~~~~~~~~~~~~~

The project was started in August 2014 by Fernando Nogueira and initially focused on implementing SMOTE.
Together with Guillaume Lemaitre, Dayvid Victor, and Christos Aridas, additional under-sampling and over-sampling methods were implemented, and the API was substantially reworked to be fully compatible with scikit-learn_.

Contributors
------------

Refer to the GitHub contributors page_.

.. _scikit-learn: http://scikit-learn.org
.. _page: https://github.com/scikit-learn-contrib/imbalanced-learn/graphs/contributors


================================================
FILE: CONTRIBUTING.md
================================================
Contributing code
=================

This guide is adapted from [scikit-learn](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md).

How to contribute
-----------------

The preferred way to contribute to imbalanced-learn is to fork the
[main repository](https://github.com/scikit-learn-contrib/imbalanced-learn) on
GitHub:

1. Fork the [project repository](https://github.com/scikit-learn-contrib/imbalanced-learn):
   click on the 'Fork' button near the top of the page. This creates
   a copy of the code under your account on the GitHub server.

2. Clone this copy to your local disk:

        $ git clone git@github.com:YourLogin/imbalanced-learn.git
        $ cd imbalanced-learn

3. Create a branch to hold your changes:

        $ git checkout -b my-feature

   and start making changes. Never work in the ``master`` branch!

4. Work on this copy on your computer using Git to do the version
   control. When you're done editing, do:

        $ git add modified_files
        $ git commit

   to record your changes in Git, then push them to GitHub with:

        $ git push -u origin my-feature

Finally, go to the web page of your fork of the imbalanced-learn repo,
and click 'Pull request' to send your changes to the maintainers for
review. This will send an email to the committers.

(If any of the above seems like magic to you, then look up the
[Git documentation](https://git-scm.com/documentation) on the web.)

Contributing Pull Requests
--------------------------

It is recommended to check that your contribution complies with the
following rules before submitting a pull request:

-  Follow the
   [coding-guidelines](http://scikit-learn.org/dev/developers/contributing.html#coding-guidelines)
   as for scikit-learn.

-  When applicable, use the validation tools and other code in the
   `sklearn.utils` submodule.  A list of utility routines available
   for developers can be found in the
   [Utilities for Developers](http://scikit-learn.org/dev/developers/utilities.html#developers-utils)
   page.

-  If your pull request addresses an issue, please use the title to describe
   the issue and mention the issue number in the pull request description to
   ensure a link is created to the original issue.

-  All public methods should have informative docstrings with sample
   usage presented as doctests when appropriate.

-  Please prefix the title of your pull request with `[MRG]` if the
   contribution is complete and should be subjected to a detailed review.
   Incomplete contributions should be prefixed `[WIP]` to indicate a work
   in progress (and changed to `[MRG]` when it matures). WIPs may be useful
   to: indicate you are working on something to avoid duplicated work,
   request broad review of functionality or API, or seek collaborators.
   WIPs often benefit from the inclusion of a
   [task list](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments)
   in the PR description.

-  All tests pass when everything is rebuilt from scratch. Check with
   (from the top-level source folder):

        $ pytest imblearn

-  When adding additional functionality, provide at least one
   example script in the ``examples/`` folder. Have a look at other
   examples for reference. Examples should demonstrate why the new
   functionality is useful in practice and, if possible, compare it
   to other methods available in scikit-learn.

-  Documentation and high-coverage tests are necessary for enhancements
   to be accepted.

-  Include at least one paragraph of narrative documentation with links to
   references in the literature (with PDF links when possible) and to the
   example.
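
As a hedged illustration of the docstring and doctest guidelines above, the sketch below uses a hypothetical helper, `count_classes`, that is not part of imbalanced-learn. It follows the numpydoc layout (`Parameters`, `Returns`, `Examples`) used by scikit-learn and embeds a doctest that `python -m doctest your_module.py` or `pytest --doctest-modules` can execute.

```python
from collections import Counter


def count_classes(y):
    """Count the number of samples in each class.

    Hypothetical helper, used only to illustrate the docstring style.

    Parameters
    ----------
    y : iterable of hashable
        Target labels.

    Returns
    -------
    counts : dict
        Mapping from each class label to its number of samples,
        ordered by class label.

    Examples
    --------
    >>> count_classes([0, 1, 1, 0, 1])
    {0: 2, 1: 3}
    """
    counts = Counter(y)
    # Order by label so that the doctest output is deterministic.
    return {label: counts[label] for label in sorted(counts)}


if __name__ == "__main__":
    import doctest

    # Run the examples embedded in this module's docstrings;
    # nothing is printed when all of them pass.
    doctest.testmod()
```

Keeping the returned mapping ordered makes the expected output in the `Examples` section stable, which is what lets it double as a test.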

You can also check for common programming errors with the following
tools:

-  Code with good unittest coverage (at least 80%), check with:

        $ pip install pytest pytest-cov
        $ pytest --cov=imblearn imblearn

-  No pyflakes warnings, check with:

        $ pip install pyflakes
        $ pyflakes path/to/module.py

-  No PEP8 warnings, check with:

        $ pip install pycodestyle
        $ pycodestyle path/to/module.py

-  autopep8 can automatically fix many simple PEP8 violations:

        $ pip install autopep8
        $ autopep8 --in-place path/to/module.py
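
For the coverage requirement above, pytest collects plain functions whose names start with `test_` and reports failing `assert` statements, so a test module needs no special imports. The sketch below is hypothetical: it tests a toy `random_oversample` function written only for this illustration. It is not imbalanced-learn's `RandomOverSampler` (which lives in `imblearn/over_sampling/_random_over_sampler.py`), and it only shows the shape of small, focused tests.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Toy random over-sampler (hypothetical, for illustration only).

    Duplicates randomly chosen samples of each minority class until every
    class has as many samples as the majority class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    n_max = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        # Indices of the samples belonging to this class.
        idx = [i for i, target in enumerate(y) if target == label]
        for _ in range(n_max - n):
            i = rng.choice(idx)
            X_res.append(X[i])
            y_res.append(y[i])
    return X_res, y_res


def test_random_oversample_balances_classes():
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 0, 1]
    X_res, y_res = random_oversample(X, y)
    # Every class now has as many samples as the majority class.
    assert Counter(y_res) == {0: 3, 1: 3}
    assert len(X_res) == len(y_res)


def test_random_oversample_only_duplicates_existing_samples():
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 0, 1]
    X_res, _ = random_oversample(X, y)
    # No synthetic sample is created, only copies of the input samples.
    assert all(sample in X for sample in X_res)


def test_random_oversample_is_reproducible():
    X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    y = [0, 0, 0, 1, 2]
    # The same seed must give the same resampled data.
    assert random_oversample(X, y, seed=42) == random_oversample(X, y, seed=42)
```

Each test checks a single property (balance, no synthetic samples, reproducibility), so a failure points directly at the behaviour that broke.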

Filing bugs
-----------
We use GitHub issues to track all bugs and feature requests; feel free to
open an issue if you have found a bug or wish to see a feature implemented.

It is recommended to check that your issue complies with the
following rules before submitting:

-  Verify that your issue is not being currently addressed by other
   [issues](https://github.com/scikit-learn-contrib/imbalanced-learn/issues)
   or [pull requests](https://github.com/scikit-learn-contrib/imbalanced-learn/pulls).

-  Please ensure all code snippets and error messages are formatted in
   appropriate code blocks.
   See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks).

-  Please include your operating system type and version number, as well
   as your Python, scikit-learn, numpy, and scipy versions. This information
   can be found by running the following code snippet:

   ```python
   import platform; print(platform.platform())
   import sys; print("Python", sys.version)
   import numpy; print("NumPy", numpy.__version__)
   import scipy; print("SciPy", scipy.__version__)
   import sklearn; print("Scikit-Learn", sklearn.__version__)
   import imblearn; print("Imbalanced-Learn", imblearn.__version__)
   ```

-  Please be specific about what estimators and/or functions are involved
   and the shape of the data, as appropriate; please include a
   [reproducible](https://stackoverflow.com/help/mcve) code snippet
   or link to a [gist](https://gist.github.com). If an exception is raised,
   please provide the traceback.

Documentation
-------------

We are glad to accept any sort of documentation: function docstrings,
reStructuredText documents (like this one), tutorials, etc.
reStructuredText documents live in the source code repository under the
doc/ directory.

You can edit the documentation using any text editor and then generate
the HTML output by typing ``make html`` from the doc/ directory.
Alternatively, ``make`` can be used to quickly generate the
documentation without the example gallery. The resulting HTML files will
be placed in _build/html/ and are viewable in a web browser. See the
README file in the doc/ directory for more information.

For building the documentation, you will need
[sphinx](http://sphinx-doc.org),
[matplotlib](https://matplotlib.org), and
[pillow](https://pillow.readthedocs.io).

When you are writing documentation, it is important to strike a good
balance between mathematical and algorithmic details, and to give
intuition to the reader on what the algorithm does. It is best to always
start with a small paragraph with a hand-waving explanation of what the
method does to the data and a figure (coming from an example)
illustrating it.


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2014-2020 The imbalanced-learn developers.
All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: MANIFEST.in
================================================

recursive-include doc *
recursive-include examples *
include AUTHORS.rst
include CONTRIBUTING.md
include LICENSE
include README.rst


================================================
FILE: README.rst
================================================
.. -*- mode: rst -*-

.. _scikit-learn: http://scikit-learn.org/stable/

.. _scikit-learn-contrib: https://github.com/scikit-learn-contrib

|GitHubActions|_ |Codecov|_ |CircleCI|_ |PythonVersion|_ |Pypi|_ |Gitter|_ |Black|_

.. |GitHubActions| image:: https://github.com/scikit-learn-contrib/imbalanced-learn/actions/workflows/tests.yml/badge.svg
.. _GitHubActions: https://github.com/scikit-learn-contrib/imbalanced-learn/actions/workflows/tests.yml

.. |Codecov| image:: https://codecov.io/gh/scikit-learn-contrib/imbalanced-learn/branch/master/graph/badge.svg
.. _Codecov: https://codecov.io/gh/scikit-learn-contrib/imbalanced-learn

.. |CircleCI| image:: https://circleci.com/gh/scikit-learn-contrib/imbalanced-learn.svg?style=shield
.. _CircleCI: https://circleci.com/gh/scikit-learn-contrib/imbalanced-learn/tree/master

.. |PythonVersion| image:: https://img.shields.io/pypi/pyversions/imbalanced-learn.svg
.. _PythonVersion: https://img.shields.io/pypi/pyversions/imbalanced-learn.svg

.. |Pypi| image:: https://badge.fury.io/py/imbalanced-learn.svg
.. _Pypi: https://badge.fury.io/py/imbalanced-learn

.. |Gitter| image:: https://badges.gitter.im/scikit-learn-contrib/imbalanced-learn.svg
.. _Gitter: https://gitter.im/scikit-learn-contrib/imbalanced-learn?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |Black| image:: https://img.shields.io/badge/code%20style-black-000000.svg
.. _Black: https://github.com/psf/black

.. |PythonMinVersion| replace:: 3.10
.. |NumPyMinVersion| replace:: 1.25.2
.. |SciPyMinVersion| replace:: 1.11.4
.. |ScikitLearnMinVersion| replace:: 1.4.2
.. |MatplotlibMinVersion| replace:: 3.7.3
.. |PandasMinVersion| replace:: 2.0.3
.. |TensorflowMinVersion| replace:: 2.16.1
.. |KerasMinVersion| replace:: 3.3.3
.. |SeabornMinVersion| replace:: 0.12.2
.. |PytestMinVersion| replace:: 7.2.2

imbalanced-learn
================

imbalanced-learn is a Python package offering a number of re-sampling techniques
commonly used with datasets showing strong between-class imbalance.
It is compatible with scikit-learn_ and is part of scikit-learn-contrib_
projects.

Documentation
-------------

Installation documentation, API documentation, and examples can be found in the
documentation_.

.. _documentation: https://imbalanced-learn.org/stable/

Installation
------------

Dependencies
~~~~~~~~~~~~

`imbalanced-learn` requires the following dependencies:

- Python (>= |PythonMinVersion|)
- NumPy (>= |NumPyMinVersion|)
- SciPy (>= |SciPyMinVersion|)
- Scikit-learn (>= |ScikitLearnMinVersion|)
- Pytest (>= |PytestMinVersion|)

Additionally, `imbalanced-learn` supports the following optional dependencies:

- Pandas (>= |PandasMinVersion|) for dealing with dataframes
- Tensorflow (>= |TensorflowMinVersion|) for dealing with TensorFlow models
- Keras (>= |KerasMinVersion|) for dealing with Keras models

The examples require the following additional dependencies:

- Matplotlib (>= |MatplotlibMinVersion|)
- Seaborn (>= |SeabornMinVersion|)

Installation
~~~~~~~~~~~~

From PyPI or conda-forge repositories
.....................................

imbalanced-learn is available on PyPI and can be installed
via `pip`::

  pip install -U imbalanced-learn

The package is also released on the Anaconda Cloud platform via conda-forge::

  conda install -c conda-forge imbalanced-learn

From source available on GitHub
...............................

If you prefer, you can clone the repository and install it from source. Use the
following commands to get a copy from GitHub and install all dependencies::

  git clone https://github.com/scikit-learn-contrib/imbalanced-learn.git
  cd imbalanced-learn
  pip install .

You can also install it in developer (editable) mode with::

  pip install --no-build-isolation --editable .

If you wish to make pull requests on GitHub, we advise you to install
pre-commit::

  pip install pre-commit
  pre-commit install

Testing
~~~~~~~

After installation, you can use `pytest` to run the test suite::

  pytest --cov=imblearn imblearn

Development
-----------

The development of this scikit-learn-contrib project is in line with that
of the scikit-learn community. Therefore, you can refer to their
`Development Guide
<http://scikit-learn.org/stable/developers>`_.

Endorsement of the Scientific Python Specification
--------------------------------------------------

We endorse good practices from the Scientific Python Ecosystem Coordination (SPEC).
The full list of recommendations is available `here`_.

Below is the list of recommendations that we endorse for the imbalanced-learn project.

|SPEC 0 — Minimum Supported Dependencies|

.. |SPEC 0 — Minimum Supported Dependencies| image:: https://img.shields.io/badge/SPEC-0-green?labelColor=%23004811&color=%235CA038
   :target: https://scientific-python.org/specs/spec-0000/

.. _here: https://scientific-python.org/specs/

About
-----

If you use imbalanced-learn in a scientific publication, we would appreciate
citations to the following paper::

  @article{JMLR:v18:16-365,
  author  = {Guillaume  Lema{{\^i}}tre and Fernando Nogueira and Christos K. Aridas},
  title   = {Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {17},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-365}
  }

Most classification algorithms will only perform optimally when the number of
samples of each class is roughly the same. Highly skewed datasets, where the
minority is heavily outnumbered by one or more classes, have proven to be a
challenge while at the same time becoming more and more common.

One way of addressing this issue is by re-sampling the dataset so as to offset this
imbalance, with the hope of arriving at a more robust and fair decision boundary
than you would otherwise.

You can refer to the `imbalanced-learn`_ documentation to find details about
the implemented algorithms.

.. _imbalanced-learn: https://imbalanced-learn.org/stable/user_guide.html


================================================
FILE: build_tools/circle/build_doc.sh
================================================
#!/usr/bin/env bash
set -x
set -e

# deactivate the CircleCI virtualenv; pixi sets up the environment instead
if [[ `type -t deactivate` ]]; then
    deactivate
fi

# Install pixi
curl -fsSL https://pixi.sh/install.sh | bash
export PATH=/home/circleci/.pixi/bin:$PATH

# pipefail is required to propagate the exit code of the build through tee
set -o pipefail && pixi run --frozen -e docs build-docs 2>&1 | tee ~/log.txt
set +o pipefail


================================================
FILE: build_tools/circle/checkout_merge_commit.sh
================================================
#!/bin/bash

# Add `master` branch to the update list.
# Otherwise CircleCI will give us a cached one.
FETCH_REFS="+master:master"

# Update PR refs for testing.
if [[ -n "${CIRCLE_PR_NUMBER}" ]]
then
    FETCH_REFS="${FETCH_REFS} +refs/pull/${CIRCLE_PR_NUMBER}/head:pr/${CIRCLE_PR_NUMBER}/head"
    FETCH_REFS="${FETCH_REFS} +refs/pull/${CIRCLE_PR_NUMBER}/merge:pr/${CIRCLE_PR_NUMBER}/merge"
fi

# Retrieve the refs.
git fetch -u origin ${FETCH_REFS}

# Checkout the PR merge ref.
if [[ -n "${CIRCLE_PR_NUMBER}" ]]
then
    git checkout -qf "pr/${CIRCLE_PR_NUMBER}/merge" || (
        echo Could not fetch merge commit. >&2
        echo There may be conflicts in merging PR \#${CIRCLE_PR_NUMBER} with master. >&2;
        exit 1)
fi

# Check for merge conflicts.
if [[ -n "${CIRCLE_PR_NUMBER}" ]]
then
    git branch --merged | grep master > /dev/null
    git branch --merged | grep "pr/${CIRCLE_PR_NUMBER}/head" > /dev/null
fi


================================================
FILE: build_tools/circle/linting.sh
================================================
#!/bin/bash

# This script is used in CircleCI to check that PRs do not add obvious
# flake8 violations. It relies on two things:
#   - find common ancestor between branch and
#     scikit-learn-contrib/imbalanced-learn remote
#   - run flake8 --diff on the diff between the branch and the common
#     ancestor
#
# Additional features:
#   - the line numbers in Travis match the local branch on the PR
#     author machine.
#   - ./build_tools/circle/linting.sh can be run locally for quick
#     turn-around

set -e
# pipefail is necessary to propagate exit codes
set -o pipefail

PROJECT=scikit-learn-contrib/imbalanced-learn
PROJECT_URL=https://github.com/$PROJECT.git

# Find the remote with the project name (upstream in most cases)
REMOTE=$(git remote -v | grep $PROJECT | cut -f1 | head -1 || echo '')

# Add a temporary remote if needed. For example this is necessary when
# Travis is configured to run in a fork. In this case 'origin' is the
# fork and not the reference repo we want to diff against.
if [[ -z "$REMOTE" ]]; then
    TMP_REMOTE=tmp_reference_upstream
    REMOTE=$TMP_REMOTE
    git remote add $REMOTE $PROJECT_URL
fi

echo "Remotes:"
echo '--------------------------------------------------------------------------------'
git remote --verbose

# Travis does the git clone with a limited depth (50 at the time of
# writing). This may not be enough to find the common ancestor with
# $REMOTE/master so we unshallow the git checkout
if [[ -a .git/shallow ]]; then
    echo -e '\nTrying to unshallow the repo:'
    echo '--------------------------------------------------------------------------------'
    git fetch --unshallow
fi

if [[ "$TRAVIS" == "true" ]]; then
    if [[ "$TRAVIS_PULL_REQUEST" == "false" ]]
    then
        # In main repo, using TRAVIS_COMMIT_RANGE to test the commits
        # that were pushed into a branch
        if [[ "$PROJECT" == "$TRAVIS_REPO_SLUG" ]]; then
            if [[ -z "$TRAVIS_COMMIT_RANGE" ]]; then
                echo "New branch, no commit range from Travis so passing this test by convention"
                exit 0
            fi
            COMMIT_RANGE=$TRAVIS_COMMIT_RANGE
        fi
    else
        # We want to fetch the code as it is in the PR branch and not
        # the result of the merge into master. This way line numbers
        # reported by Travis will match with the local code.
        LOCAL_BRANCH_REF=travis_pr_$TRAVIS_PULL_REQUEST
        # In Travis the PR target is always origin
        git fetch origin pull/$TRAVIS_PULL_REQUEST/head:refs/$LOCAL_BRANCH_REF
    fi
fi

# If not using the commit range from Travis we need to find the common
# ancestor between $LOCAL_BRANCH_REF and $REMOTE/master
if [[ -z "$COMMIT_RANGE" ]]; then
    if [[ -z "$LOCAL_BRANCH_REF" ]]; then
        LOCAL_BRANCH_REF=$(git rev-parse --abbrev-ref HEAD)
    fi
    echo -e "\nLast 2 commits in $LOCAL_BRANCH_REF:"
    echo '--------------------------------------------------------------------------------'
    git --no-pager log -2 $LOCAL_BRANCH_REF

    REMOTE_MASTER_REF="$REMOTE/master"
    # Make sure that $REMOTE_MASTER_REF is a valid reference
    echo -e "\nFetching $REMOTE_MASTER_REF"
    echo '--------------------------------------------------------------------------------'
    git fetch $REMOTE master:refs/remotes/$REMOTE_MASTER_REF
    LOCAL_BRANCH_SHORT_HASH=$(git rev-parse --short $LOCAL_BRANCH_REF)
    REMOTE_MASTER_SHORT_HASH=$(git rev-parse --short $REMOTE_MASTER_REF)

    COMMIT=$(git merge-base $LOCAL_BRANCH_REF $REMOTE_MASTER_REF) || \
        echo "No common ancestor found for $(git show $LOCAL_BRANCH_REF -q) and $(git show $REMOTE_MASTER_REF -q)"

    if [ -z "$COMMIT" ]; then
        exit 1
    fi

    COMMIT_SHORT_HASH=$(git rev-parse --short $COMMIT)

    echo -e "\nCommon ancestor between $LOCAL_BRANCH_REF ($LOCAL_BRANCH_SHORT_HASH)"\
         "and $REMOTE_MASTER_REF ($REMOTE_MASTER_SHORT_HASH) is $COMMIT_SHORT_HASH:"
    echo '--------------------------------------------------------------------------------'
    git --no-pager show --no-patch $COMMIT_SHORT_HASH

    COMMIT_RANGE="$COMMIT_SHORT_HASH..$LOCAL_BRANCH_SHORT_HASH"

    if [[ -n "$TMP_REMOTE" ]]; then
        git remote remove $TMP_REMOTE
    fi

else
    echo "Got the commit range from Travis: $COMMIT_RANGE"
fi

echo -e '\nRunning flake8 on the diff in the range' "$COMMIT_RANGE" \
     "($(git rev-list $COMMIT_RANGE | wc -l) commit(s)):"
echo '--------------------------------------------------------------------------------'

# We ignore files from sklearn/externals. Unfortunately there is no
# way to do it with flake8 directly (the --exclude does not seem to
# work with --diff). We could use the exclude magic in the git pathspec
# ':!sklearn/externals' but it is only available on git 1.9 and Travis
# uses git 1.8.
# We need the following command to exit with 0 hence the echo in case
# there is no match
MODIFIED_FILES="$(git diff --name-only $COMMIT_RANGE | grep -v 'sklearn/externals' | \
                     grep -v 'doc/sphinxext' || echo "no_match")"

check_files() {
    files="$1"
    shift
    options="$*"
    if [ -n "$files" ]; then
        # Conservative approach: diff without context (--unified=0) so that code
        # that was not changed does not create failures
        git diff --unified=0 $COMMIT_RANGE -- $files | flake8 --diff --max-line-length=88 --show-source $options
    fi
}

if [[ "$MODIFIED_FILES" == "no_match" ]]; then
    echo "No file outside sklearn/externals and doc/sphinxext has been modified"
else

    check_files "$(echo "$MODIFIED_FILES" | grep -v ^examples)"
    check_files "$(echo "$MODIFIED_FILES" | grep ^examples)" \
        --config ./setup.cfg
fi
echo -e "No problem detected by flake8\n"

# For docstrings and warnings of deprecated attributes to be rendered
# properly, the property decorator must come before the deprecated decorator
# (else they are treated as functions)

# do not error when grep -B1 "@property" finds nothing
set +e
bad_deprecation_property_order=`git grep -A 10 "@property"  -- "*.py" | awk '/@property/,/def /' | grep -B1 "@deprecated"`

if [ ! -z "$bad_deprecation_property_order" ]
then
    echo "property decorator should come before deprecated decorator"
    echo "found the following occurrences:"
    echo "$bad_deprecation_property_order"
    exit 1
fi


================================================
FILE: build_tools/circle/push_doc.sh
================================================
#!/bin/bash
# This script is meant to be called in the "deploy" step defined in
# .circleci/config.yml. See https://circleci.com/docs/ for more details.
# The behavior of the script is controlled by environment variables defined
# in .circleci/config.yml.

GENERATED_DOC_DIR=$1

if [[ -z "$GENERATED_DOC_DIR" ]]; then
    echo "Need to pass directory of the generated doc as argument"
    echo "Usage: $0 <generated_doc_dir>"
    exit 1
fi

# Absolute path needed because we use cd further down in this script
GENERATED_DOC_DIR=$(readlink -f $GENERATED_DOC_DIR)

if [ "$CIRCLE_BRANCH" = "master" ]
then
    dir=dev
else
    # Strip off .X
    dir="${CIRCLE_BRANCH::-2}"
fi

MSG="Pushing the docs to $dir/ for branch: $CIRCLE_BRANCH, commit $CIRCLE_SHA1"

cd $HOME
if [ ! -d $DOC_REPO ];
then git clone --depth 1 --no-checkout -b master "git@github.com:"$ORGANIZATION"/"$DOC_REPO".git";
fi
cd $DOC_REPO
git config core.sparseCheckout true
echo $dir > .git/info/sparse-checkout
git checkout master
git reset --hard origin/master
git rm -rf $dir/ && rm -rf $dir/
cp -R $GENERATED_DOC_DIR $dir
touch $dir/.nojekyll
git config --global user.email $EMAIL
git config --global user.name $USERNAME
git config --global push.default matching
git add -f $dir/
git commit -m "$MSG" $dir
git push origin master

echo $MSG


================================================
FILE: conftest.py
================================================
# This file is here so that when running from the root folder
# ./imblearn is added to sys.path by pytest.
# See https://docs.pytest.org/en/latest/pythonpath.html for more details.
# For example, this allows building extensions in place and running pytest on
# doc/over_sampling.rst using imblearn from the local folder
# rather than the one from site-packages.

import os

import numpy as np
import pytest
from sklearn.utils.fixes import parse_version

# use legacy numpy print options to avoid failures due to the NumPy >= 2
# scalar representation
if parse_version(np.__version__) >= parse_version("2.0.0"):
    np.set_printoptions(legacy="1.25")


def pytest_runtest_setup(item):
    fname = item.fspath.strpath
    if (
        fname.endswith(os.path.join("keras", "_generator.py"))
        or fname.endswith(os.path.join("tensorflow", "_generator.py"))
        or fname.endswith("miscellaneous.rst")
    ):
        try:
            import tensorflow  # noqa
        except ImportError:
            pytest.skip("The tensorflow package is not installed.")


================================================
FILE: doc/Makefile
================================================
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    = -v
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudo-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*
	-rm -rf auto_examples/
	-rm -rf generated/*
	-rm -rf modules/generated/*

html:
	# These two lines make the build a bit more lengthy, and the
	# embedding of images more robust
	rm -rf $(BUILDDIR)/html/_images
	#rm -rf _build/doctrees/
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	touch $(BUILDDIR)/html/.nojekyll
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/imbalanced-learn.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/imbalanced-learn.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/imbalanced-learn"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/imbalanced-learn"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."


================================================
FILE: doc/_static/css/imbalanced-learn.css
================================================
@import url("theme.css");

.highlight a {
  text-decoration: underline;
}

.deprecated p {
  padding: 10px 7px 10px 10px;
  color: #b94a48;
  background-color: #f3e5e5;
  border: 1px solid #eed3d7;
}

.deprecated p span.versionmodified {
  font-weight: bold;
}

.wy-nav-content {
  max-width: 1200px !important;
}

/* Override some aspects of the pydata-sphinx-theme */

/* Main index page overview cards */

.intro-card {
  padding: 30px 10px 20px 10px;
}

.intro-card .sd-card-img-top {
  margin: 10px;
  height: 52px;
  background: none !important;
}

.intro-card .sd-card-title {
  color: var(--pst-color-primary);
  font-size: var(--pst-font-size-h5);
  padding: 1rem 0rem 0.5rem 0rem;
}

.intro-card .sd-card-footer {
  border: none !important;
}

.intro-card .sd-card-footer p.sd-card-text {
  max-width: 220px;
  margin-left: auto;
  margin-right: auto;
}

.intro-card .sd-btn-secondary {
  background-color: #6c757d !important;
  border-color: #6c757d !important;
}

.intro-card .sd-btn-secondary:hover {
  background-color: #5a6268 !important;
  border-color: #545b62 !important;
}

.card, .card img {
  background-color: var(--pst-color-background);
}


================================================
FILE: doc/_static/js/copybutton.js
================================================
$(document).ready(function() {
    /* Add a [>>>] button on the top-right corner of code samples to hide
     * the >>> and ... prompts and the output and thus make the code
     * copyable. */
    var div = $('.highlight-python .highlight,' +
                '.highlight-python3 .highlight,' +
                '.highlight-pycon .highlight,' +
		'.highlight-default .highlight')
    var pre = div.find('pre');

    // get the styles from the current theme
    pre.parent().parent().css('position', 'relative');
    var hide_text = 'Hide the prompts and output';
    var show_text = 'Show the prompts and output';
    var border_width = pre.css('border-top-width');
    var border_style = pre.css('border-top-style');
    var border_color = pre.css('border-top-color');
    var button_styles = {
        'cursor':'pointer', 'position': 'absolute', 'top': '0', 'right': '0',
        'border-color': border_color, 'border-style': border_style,
        'border-width': border_width, 'color': border_color, 'font-size': '75%',
        'font-family': 'monospace', 'padding-left': '0.2em', 'padding-right': '0.2em',
        'border-radius': '0 3px 0 0'
    }

    // create and add the button to all the code blocks that contain >>>
    div.each(function(index) {
        var jthis = $(this);
        if (jthis.find('.gp').length > 0) {
            var button = $('<span class="copybutton">&gt;&gt;&gt;</span>');
            button.css(button_styles)
            button.attr('title', hide_text);
            button.data('hidden', 'false');
            jthis.prepend(button);
        }
        // tracebacks (.gt) contain bare text elements that need to be
        // wrapped in a span to work with .nextUntil() (see later)
        jthis.find('pre:has(.gt)').contents().filter(function() {
            return ((this.nodeType == 3) && (this.data.trim().length > 0));
        }).wrap('<span>');
    });

    // define the behavior of the button when it's clicked
    $('.copybutton').click(function(e){
        e.preventDefault();
        var button = $(this);
        if (button.data('hidden') === 'false') {
            // hide the code output
            button.parent().find('.go, .gp, .gt').hide();
            button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'hidden');
            button.css('text-decoration', 'line-through');
            button.attr('title', show_text);
            button.data('hidden', 'true');
        } else {
            // show the code output
            button.parent().find('.go, .gp, .gt').show();
            button.next('pre').find('.gt').nextUntil('.gp, .go').css('visibility', 'visible');
            button.css('text-decoration', 'none');
            button.attr('title', hide_text);
            button.data('hidden', 'false');
        }
    });
});


================================================
FILE: doc/_templates/class.rst
================================================
{{objname}}
{{ underline }}==============

.. currentmodule:: {{ module }}

.. autoclass:: {{ objname }}

   {% block methods %}

   {% if methods %}
   .. rubric:: Methods

   .. autosummary::
   {% for item in methods %}
      {% if '__init__' not in item %}
        ~{{ name }}.{{ item }}
      {% endif %}
   {%- endfor %}
   {% endif %}
   {% endblock %}

.. include:: {{module}}.{{objname}}.examples

.. raw:: html

    <div style='clear:both'></div>


================================================
FILE: doc/_templates/function.rst
================================================
{{objname}}
{{ underline }}====================

.. currentmodule:: {{ module }}

.. autofunction:: {{ objname }}

.. include:: {{module}}.{{objname}}.examples

.. raw:: html

    <div style='clear:both'></div>


================================================
FILE: doc/_templates/numpydoc_docstring.rst
================================================
{{index}}
{{summary}}
{{extended_summary}}
{{parameters}}
{{returns}}
{{yields}}
{{other_parameters}}
{{attributes}}
{{raises}}
{{warns}}
{{warnings}}
{{see_also}}
{{notes}}
{{references}}
{{examples}}
{{methods}}


================================================
FILE: doc/_templates/sidebar-search-bs.html
================================================
<div class="navbar-brand-box">
  <a class="navbar-brand-box text-wrap" href="{{ pathto('index') }}">
    {% if logo %}
    <img
      src="{{ pathto('_static/' + logo, 1) }}"
      class="logo"
      style="width: 60%"
      alt="logo"
    />
    {% endif %} {% if docstitle %}
    <h4 class="site-logo" id="site-title">{{ docstitle }}</h4>
    {% endif %}
  </a>
</div>


================================================
FILE: doc/about.rst
================================================
About us
========

.. include:: ../AUTHORS.rst

.. _citing-imbalanced-learn:

Citing imbalanced-learn
-----------------------

If you use imbalanced-learn in a scientific publication, we would appreciate
citations to the following paper::

  @article{JMLR:v18:16-365,
  author  = {Guillaume  Lema{{\^i}}tre and Fernando Nogueira and Christos K. Aridas},
  title   = {Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {17},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-365.html}
  }


================================================
FILE: doc/bibtex/refs.bib
================================================
@inproceedings{mani2003knn,
  title={kNN approach to unbalanced data distributions: a case study involving information extraction},
  author={Mani, Inderjeet and Zhang, I},
  booktitle={Proceedings of workshop on learning from imbalanced datasets},
  volume={126},
  year={2003}
}


@article{batista2004study,
  title={A study of the behavior of several methods for balancing machine learning training data},
  author={Batista, Gustavo EAPA and Prati, Ronaldo C and Monard, Maria Carolina},
  journal={ACM SIGKDD explorations newsletter},
  volume={6},
  number={1},
  pages={20--29},
  year={2004},
  publisher={ACM}
}

@inproceedings{batista2003balancing,
  title={Balancing Training Data for Automated Annotation of Keywords: a Case Study.},
  author={Batista, Gustavo EAPA and Bazzan, Ana LC and Monard, Maria Carolina},
  booktitle={WOB},
  pages={10--18},
  year={2003}
}

@article{chen2004using,
  title={Using random forest to learn imbalanced data},
  author={Chen, Chao and Liaw, Andy and Breiman, Leo and others},
  journal={University of California, Berkeley},
  volume={110},
  number={1-12},
  pages={24},
  year={2004}
}

@article{liu2008exploratory,
  title={Exploratory undersampling for class-imbalance learning},
  author={Liu, Xu-Ying and Wu, Jianxin and Zhou, Zhi-Hua},
  journal={IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)},
  volume={39},
  number={2},
  pages={539--550},
  year={2008},
  publisher={IEEE}
}

@article{seiffert2009rusboost,
  title={RUSBoost: A hybrid approach to alleviating class imbalance},
  author={Seiffert, Chris and Khoshgoftaar, Taghi M and Van Hulse, Jason and Napolitano, Amri},
  journal={IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans},
  volume={40},
  number={1},
  pages={185--197},
  year={2009},
  publisher={IEEE}
}

@inproceedings{kubat1997addressing,
  title={Addressing the curse of imbalanced training sets: one-sided selection},
  author={Kubat, Miroslav and Matwin, Stan and others},
  booktitle={Icml},
  volume={97},
  pages={179--186},
  year={1997},
  organization={Nashville, USA}
}

@article{barandela2003strategies,
  title={Strategies for learning in class imbalance problems},
  author={Barandela, Ricardo and S{\'a}nchez, Jos{\'e} Salvador and Garc{\'\i}a, V and Rangel, Edgar},
  journal={Pattern Recognition},
  volume={36},
  number={3},
  pages={849--851},
  year={2003},
  publisher={Elsevier Science Publishing Company, Inc.}
}

@article{garcia2012effectiveness,
  title={On the effectiveness of preprocessing methods when dealing with different levels of class imbalance},
  author={Garc{\'\i}a, Vicente and S{\'a}nchez, Jos{\'e} Salvador and Mollineda, Ram{\'o}n Alberto},
  journal={Knowledge-Based Systems},
  volume={25},
  number={1},
  pages={13--21},
  year={2012},
  publisher={Elsevier}
}

@inproceedings{he2008adasyn,
  title={ADASYN: Adaptive synthetic sampling approach for imbalanced learning},
  author={He, Haibo and Bai, Yang and Garcia, Edwardo A and Li, Shutao},
  booktitle={2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)},
  pages={1322--1328},
  year={2008},
  organization={IEEE}
}

@article{chawla2002smote,
  title={SMOTE: synthetic minority over-sampling technique},
  author={Chawla, Nitesh V and Bowyer, Kevin W and Hall, Lawrence O and Kegelmeyer, W Philip},
  journal={Journal of artificial intelligence research},
  volume={16},
  pages={321--357},
  year={2002}
}

@inproceedings{han2005borderline,
  title={Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning},
  author={Han, Hui and Wang, Wen-Yuan and Mao, Bing-Huan},
  booktitle={International conference on intelligent computing},
  pages={878--887},
  year={2005},
  organization={Springer}
}

@inproceedings{nguyen2009borderline,
  title={Borderline over-sampling for imbalanced data classification},
  author={Nguyen, Hien M and Cooper, Eric W and Kamei, Katsuari},
  booktitle={Proceedings: Fifth International Workshop on Computational Intelligence \& Applications},
  volume={2009},
  number={1},
  pages={24--29},
  year={2009},
  organization={IEEE SMC Hiroshima Chapter}
}

@article{last2017oversampling,
  title={Oversampling for Imbalanced Learning Based on K-Means and SMOTE},
  author={Last, Felix and Douzas, Georgios and Bacao, Fernando},
  journal={arXiv preprint arXiv:1711.00837},
  year={2017}
}

@article{tomek1976two,
  title={Two modifications of CNN},
  author={Tomek, Ivan},
  journal={IEEE Trans. Systems, Man and Cybernetics},
  volume={6},
  pages={769--772},
  year={1976}
}

@article{wilson1972asymptotic,
  title={Asymptotic properties of nearest neighbor rules using edited data},
  author={Wilson, Dennis L},
  journal={IEEE Transactions on Systems, Man, and Cybernetics},
  number={3},
  pages={408--421},
  year={1972},
  publisher={IEEE}
}

@article{tomek1976experiment,
  title={An experiment with the edited nearest-neighbor rule},
  author={Tomek, Ivan},
  journal={IEEE Transactions on systems, Man, and Cybernetics},
  volume={6},
  number={6},
  pages={448--452},
  year={1976}
}

@article{hart1968condensed,
  title={The condensed nearest neighbor rule (Corresp.)},
  author={Hart, Peter},
  journal={IEEE transactions on information theory},
  volume={14},
  number={3},
  pages={515--516},
  year={1968},
  publisher={IEEE}
}

@inproceedings{laurikkala2001improving,
  title={Improving identification of difficult small classes by balancing class distribution},
  author={Laurikkala, Jorma},
  booktitle={Conference on Artificial Intelligence in Medicine in Europe},
  pages={63--66},
  year={2001},
  organization={Springer}
}

@article{smith2014instance,
  title={An instance level analysis of data complexity},
  author={Smith, Michael R and Martinez, Tony and Giraud-Carrier, Christophe},
  journal={Machine learning},
  volume={95},
  number={2},
  pages={225--256},
  year={2014},
  publisher={Springer}
}

@article{torelli2014rose,
  author = {Menardi, Giovanna and Torelli, Nicola},
  title={Training and assessing classification rules with imbalanced data},
  journal={Data Mining and Knowledge Discovery},
  volume={28},
  pages={92--122},
  year={2014},
  publisher={Springer},
  number = {1},
  issn = {1573-756X},
  url = {https://doi.org/10.1007/s10618-012-0295-5},
  doi = {10.1007/s10618-012-0295-5}
}

@article{esuli2009ordinal,
  author = {A. Esuli and S. Baccianella and F. Sebastiani},
  title = {Evaluation Measures for Ordinal Regression},
  journal = {Intelligent Systems Design and Applications, International Conference on},
  year = {2009},
  volume = {1},
  pages = {283--287},
  keywords = {ordinal regression;ordinal classification;evaluation measures;class imbalance;product reviews},
  doi = {10.1109/ISDA.2009.230},
  url = {https://doi.ieeecomputersociety.org/10.1109/ISDA.2009.230},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
  month = {dec}
}

@article{stanfill1986toward,
  title={Toward memory-based reasoning},
  author={Stanfill, Craig and Waltz, David},
  journal={Communications of the ACM},
  volume={29},
  number={12},
  pages={1213--1228},
  year={1986},
  publisher={ACM New York, NY, USA}
}

@article{wilson1997improved,
  title={Improved heterogeneous distance functions},
  author={Wilson, D Randall and Martinez, Tony R},
  journal={Journal of artificial intelligence research},
  volume={6},
  pages={1--34},
  year={1997}
}

@inproceedings{wang2009diversity,
  title={Diversity analysis on imbalanced data sets by using ensemble models},
  author={Wang, Shuo and Yao, Xin},
  booktitle={2009 IEEE symposium on computational intelligence and data mining},
  pages={324--331},
  year={2009},
  organization={IEEE}
}

@article{hido2009roughly,
  title={Roughly balanced bagging for imbalanced data},
  author={Hido, Shohei and Kashima, Hisashi and Takahashi, Yutaka},
  journal={Statistical Analysis and Data Mining: The ASA Data Science Journal},
  volume={2},
  number={5-6},
  pages={412--426},
  year={2009},
  publisher={Wiley Online Library}
}

@article{maclin1997empirical,
  title={An empirical evaluation of bagging and boosting},
  author={Maclin, Richard and Opitz, David},
  journal={AAAI/IAAI},
  volume={1997},
  pages={546--551},
  year={1997}
}


================================================
FILE: doc/combine.rst
================================================
.. _combine:

=======================================
Combination of over- and under-sampling
=======================================

.. currentmodule:: imblearn.over_sampling

We previously presented :class:`SMOTE` and showed that this method can generate
noisy samples by interpolating new points between marginal outliers and
inliers. This issue can be mitigated by cleaning the space resulting
from over-sampling.

.. currentmodule:: imblearn.combine

In this regard, Tomek's links and edited nearest neighbours are the two cleaning
methods that have been added to the pipeline after applying SMOTE over-sampling
to obtain a cleaner space. The two ready-to-use classes that imbalanced-learn
implements for combining over- and under-sampling methods are: (i)
:class:`SMOTETomek` :cite:`batista2003balancing` and (ii) :class:`SMOTEENN`
:cite:`batista2004study`.

These two classes can be used like any other sampler, with parameters identical
to those of the samplers they combine::

  >>> from collections import Counter
  >>> from sklearn.datasets import make_classification
  >>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
  ...                            n_redundant=0, n_repeated=0, n_classes=3,
  ...                            n_clusters_per_class=1,
  ...                            weights=[0.01, 0.05, 0.94],
  ...                            class_sep=0.8, random_state=0)
  >>> print(sorted(Counter(y).items()))
  [(0, 64), (1, 262), (2, 4674)]
  >>> from imblearn.combine import SMOTEENN
  >>> smote_enn = SMOTEENN(random_state=0)
  >>> X_resampled, y_resampled = smote_enn.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 4060), (1, 4381), (2, 3502)]
  >>> from imblearn.combine import SMOTETomek
  >>> smote_tomek = SMOTETomek(random_state=0)
  >>> X_resampled, y_resampled = smote_tomek.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 4499), (1, 4566), (2, 4413)]

We can also see in the example below that :class:`SMOTEENN` tends to clean more
noisy samples than :class:`SMOTETomek`.

.. image:: ./auto_examples/combine/images/sphx_glr_plot_comparison_combine_001.png
   :target: ./auto_examples/combine/plot_comparison_combine.html
   :scale: 60
   :align: center

.. topic:: Examples

  * :ref:`sphx_glr_auto_examples_combine_plot_comparison_combine.py`


================================================
FILE: doc/common_pitfalls.rst
================================================
.. _common_pitfalls:

=========================================
Common pitfalls and recommended practices
=========================================

This section is a complement to the scikit-learn documentation on
`common pitfalls <https://scikit-learn.org/dev/common_pitfalls.html>`_.
Here, we highlight how misusing resampling leads to **data leakage**. Due to
this leakage, the reported performance of a model will be over-optimistic.

Data leakage
============

As mentioned in the scikit-learn documentation, data leakage occurs when
information that would not be available at prediction time is used when
building the model.

In the resampling setting, a common pitfall is to resample the **entire**
dataset before splitting it into train and test partitions. Resampling both
the train and the test partitions after splitting is an equivalent mistake.

Such processing leads to two issues:

* the model will not be tested on a dataset with a class distribution similar
  to the real use case. By resampling the entire dataset, both the training
  and testing sets will potentially be balanced, whereas the model should be
  tested on the naturally imbalanced dataset to evaluate its potential bias;
* the resampling procedure might use information about samples in the dataset
  to either generate or select some of the samples. Therefore, we might use
  information from samples that will later be used as testing samples, which
  is the typical data leakage issue.

We will demonstrate the wrong and right ways to resample and emphasize the
tools one should use to avoid falling into the trap.

We will use the adult census dataset. For the sake of simplicity, we will only
use the numerical features. We also make the dataset more imbalanced to
amplify the effect of the wrong practices::

  >>> from sklearn.datasets import fetch_openml
  >>> from imblearn.datasets import make_imbalance
  >>> X, y = fetch_openml(
  ...     data_id=1119, as_frame=True, return_X_y=True
  ... )
  >>> X = X.select_dtypes(include="number")
  >>> X, y = make_imbalance(
  ...     X, y, sampling_strategy={">50K": 300}, random_state=1
  ... )

Let's first check the balancing ratio on this dataset::

  >>> from collections import Counter
  >>> {key: value / len(y) for key, value in Counter(y).items()}
  {'<=50K': 0.988..., '>50K': 0.011...}

To later highlight some of the issues, we keep aside a left-out set that is not
used during the cross-validated evaluation of the model::

  >>> from sklearn.model_selection import train_test_split
  >>> X, X_left_out, y, y_left_out = train_test_split(
  ...     X, y, stratify=y, random_state=0
  ... )

We will use a :class:`sklearn.ensemble.HistGradientBoostingClassifier` as a
baseline classifier. First, we will train and check the performance of this
classifier, without any preprocessing to alleviate the bias toward the majority
class. We evaluate the generalization performance of the classifier via
cross-validation::

  >>> from sklearn.ensemble import HistGradientBoostingClassifier
  >>> from sklearn.model_selection import cross_validate
  >>> model = HistGradientBoostingClassifier(random_state=0)
  >>> cv_results = cross_validate(
  ...     model, X, y, scoring="balanced_accuracy",
  ...     return_train_score=True, return_estimator=True,
  ...     n_jobs=-1
  ... )
  >>> print(
  ...     f"Balanced accuracy mean +/- std. dev.: "
  ...     f"{cv_results['test_score'].mean():.3f} +/- "
  ...     f"{cv_results['test_score'].std():.3f}"
  ... )
  Balanced accuracy mean +/- std. dev.: 0.609 +/- 0.024

We see that the classifier does not give good performance in terms of balanced
accuracy mainly due to the class imbalance issue.

During cross-validation, we stored the classifier fitted on each fold. We
will show that evaluating these classifiers on the left-out data gives
statistical performance close to the cross-validated one::

  >>> import numpy as np
  >>> from sklearn.metrics import balanced_accuracy_score
  >>> scores = []
  >>> for fold_id, cv_model in enumerate(cv_results["estimator"]):
  ...     scores.append(
  ...         balanced_accuracy_score(
  ...             y_left_out, cv_model.predict(X_left_out)
  ...         )
  ...     )
  >>> print(
  ...     f"Balanced accuracy mean +/- std. dev.: "
  ...     f"{np.mean(scores):.3f} +/- {np.std(scores):.3f}"
  ... )
  Balanced accuracy mean +/- std. dev.: 0.628 +/- 0.009

Let's now show the **wrong** way to apply resampling to alleviate the class
imbalance issue. We will use a sampler to balance the **entire** dataset and
check the statistical performance of our classifier via cross-validation::

  >>> from imblearn.under_sampling import RandomUnderSampler
  >>> sampler = RandomUnderSampler(random_state=0)
  >>> X_resampled, y_resampled = sampler.fit_resample(X, y)
  >>> model = HistGradientBoostingClassifier(random_state=0)
  >>> cv_results = cross_validate(
  ...     model, X_resampled, y_resampled, scoring="balanced_accuracy",
  ...     return_train_score=True, return_estimator=True,
  ...     n_jobs=-1
  ... )
  >>> print(
  ...     f"Balanced accuracy mean +/- std. dev.: "
  ...     f"{cv_results['test_score'].mean():.3f} +/- "
  ...     f"{cv_results['test_score'].std():.3f}"
  ... )
  Balanced accuracy mean +/- std. dev.: 0.724 +/- 0.042

The cross-validation performance looks good, but evaluating the classifiers
on the left-out data shows a different picture::

  >>> scores = []
  >>> for fold_id, cv_model in enumerate(cv_results["estimator"]):
  ...     scores.append(
  ...         balanced_accuracy_score(
  ...             y_left_out, cv_model.predict(X_left_out)
  ...         )
  ...     )
  >>> print(
  ...     f"Balanced accuracy mean +/- std. dev.: "
  ...     f"{np.mean(scores):.3f} +/- {np.std(scores):.3f}"
  ... )
  Balanced accuracy mean +/- std. dev.: 0.698 +/- 0.014

We see that the performance on the left-out data is now worse than the
cross-validated performance: the data leakage gave over-optimistic results
for the reasons stated earlier in this section.

We will now illustrate the correct pattern. As in scikit-learn, using a
:class:`~imblearn.pipeline.Pipeline` avoids any data leakage: the sampler is
only applied when the pipeline is fitted, so during cross-validation only the
training folds are resampled and no manual resampling step is required::

  >>> from imblearn.pipeline import make_pipeline
  >>> model = make_pipeline(
  ...     RandomUnderSampler(random_state=0),
  ...     HistGradientBoostingClassifier(random_state=0)
  ... )
  >>> cv_results = cross_validate(
  ...     model, X, y, scoring="balanced_accuracy",
  ...     return_train_score=True, return_estimator=True,
  ...     n_jobs=-1
  ... )
  >>> print(
  ...     f"Balanced accuracy mean +/- std. dev.: "
  ...     f"{cv_results['test_score'].mean():.3f} +/- "
  ...     f"{cv_results['test_score'].std():.3f}"
  ... )
  Balanced accuracy mean +/- std. dev.: 0.732 +/- 0.019

We observe that we get good statistical performance as well. Now, let's check
the performance of the models from each cross-validation fold on the left-out
data to ensure that it is similar::

  >>> scores = []
  >>> for fold_id, cv_model in enumerate(cv_results["estimator"]):
  ...     scores.append(
  ...         balanced_accuracy_score(
  ...             y_left_out, cv_model.predict(X_left_out)
  ...         )
  ...     )
  >>> print(
  ...     f"Balanced accuracy mean +/- std. dev.: "
  ...     f"{np.mean(scores):.3f} +/- {np.std(scores):.3f}"
  ... )
  Balanced accuracy mean +/- std. dev.: 0.727 +/- 0.008

We see that the statistical performance is very close to the cross-validation
estimate, without any sign of over-optimistic results.


================================================
FILE: doc/conf.py
================================================
#
# imbalanced-learn documentation build configuration file, created by
# sphinx-quickstart on Mon Jan 18 14:44:12 2016.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import os
import sys
from datetime import datetime
from io import StringIO
from pathlib import Path

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
sys.path.insert(0, os.path.abspath("sphinxext"))
from github_link import make_linkcode_resolve  # noqa

# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "sphinx.ext.doctest",
    "sphinx.ext.intersphinx",
    "sphinx.ext.linkcode",
    "sphinxcontrib.bibtex",
    "numpydoc",
    "sphinx_issues",
    "sphinx_gallery.gen_gallery",
    "sphinx_copybutton",
    "sphinx_design",
]

# Specify how to identify the prompt when copying code snippets
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# The suffix of source filenames.
source_suffix = ".rst"

# The master toctree document.
master_doc = "index"

# General information about the project.
project = "imbalanced-learn"
copyright = f"2014-{datetime.now().year}, The imbalanced-learn developers"

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
from imblearn import __version__  # noqa

version = __version__
# The full version, including alpha/beta/rc tags.
release = __version__

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ["_build", "_templates"]

# The reST default role (used for this markup: `text`) to use for all
# documents.
default_role = "literal"

# If true, '()' will be appended to :func: etc. cross-reference text.
add_function_parentheses = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"

# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages.  See the documentation for
# a list of builtin themes.
html_theme = "pydata_sphinx_theme"
html_title = f"Version {version}"
html_favicon = "_static/img/favicon.ico"
html_logo = "_static/img/logo_wide.png"
html_style = "css/imbalanced-learn.css"
html_css_files = [
    "css/imbalanced-learn.css",
]
html_sidebars = {
    "changelog": [],
}

html_theme_options = {
    "external_links": [],
    "github_url": "https://github.com/scikit-learn-contrib/imbalanced-learn",
    "use_edit_page_button": True,
    "show_toc_level": 1,
    # "navbar_align": "right",  # For testing that the navbar items align properly
    "logo": {
        "image_dark": (
            "https://imbalanced-learn.org/stable/_static/img/logo_wide_dark.png"
        )
    },
}

html_context = {
    "github_user": "scikit-learn-contrib",
    "github_repo": "imbalanced-learn",
    "github_version": "master",
    "doc_path": "doc",
}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# Output file base name for HTML help builder.
htmlhelp_basename = "imbalanced-learndoc"

# -- Options for autodoc ------------------------------------------------------

autodoc_default_options = {
    "members": True,
    "inherited-members": True,
}

# generate autosummary even if no references
autosummary_generate = True

# -- Options for numpydoc -----------------------------------------------------

# Do not list class members in the numpydoc output; without this setting,
# Sphinx emits spurious warnings, see https://github.com/numpy/numpydoc/issues/69
numpydoc_show_class_members = False

# -- Options for sphinxcontrib-bibtex -----------------------------------------

# bibtex file
bibtex_bibfiles = ["bibtex/refs.bib"]

# -- Options for intersphinx --------------------------------------------------

# intersphinx configuration
intersphinx_mapping = {
    "python": (f"https://docs.python.org/{sys.version_info.major}", None),
    "numpy": ("https://numpy.org/doc/stable", None),
    "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
    "matplotlib": ("https://matplotlib.org/", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
    "joblib": ("https://joblib.readthedocs.io/en/latest/", None),
    "seaborn": ("https://seaborn.pydata.org/", None),
}

# -- Options for sphinx-gallery -----------------------------------------------

# Generate the plot for the gallery
plot_gallery = True

# sphinx-gallery configuration
sphinx_gallery_conf = {
    "doc_module": "imblearn",
    "backreferences_dir": os.path.join("references/generated"),
    "show_memory": True,
    "reference_url": {"imblearn": None},
}

# -- Options for github link for what's new -----------------------------------

# Config for sphinx_issues
issues_uri = "https://github.com/scikit-learn-contrib/imbalanced-learn/issues/{issue}"
issues_github_path = "scikit-learn-contrib/imbalanced-learn"
issues_user_uri = "https://github.com/{user}"

# The following is used by sphinx.ext.linkcode to provide links to github
linkcode_resolve = make_linkcode_resolve(
    "imblearn",
    (
        "https://github.com/scikit-learn-contrib/"
        "imbalanced-learn/blob/{revision}/"
        "{package}/{path}#L{lineno}"
    ),
)

# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    # 'papersize': 'letterpaper',
    # The font size ('10pt', '11pt' or '12pt').
    # 'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    # 'preamble': '',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
#  author, documentclass [howto, manual, or own class]).
latex_documents = [
    (
        "index",
        "imbalanced-learn.tex",
        "imbalanced-learn Documentation",
        "The imbalanced-learn developers",
        "manual",
    ),
]

# -- Options for manual page output ---------------------------------------

# If false, no module index is generated.
# latex_domain_indices = True


# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    (
        "index",
        "imbalanced-learn",
        "imbalanced-learn Documentation",
        ["The imbalanced-learn developers"],
        1,
    )
]

# If true, show URL addresses after external links.
# man_show_urls = False

# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    (
        "index",
        "imbalanced-learn",
        "imbalanced-learn Documentation",
        "The imbalanced-learn developers",
        "imbalanced-learn",
        "Toolbox for imbalanced dataset in machine learning.",
        "Miscellaneous",
    ),
]

# -- Dependencies generation ----------------------------------------------


def generate_min_dependency_table(app):
    """Generate min dependency table for docs."""
    from imblearn._min_dependencies import dependent_packages

    # get length of header
    package_header_len = max(len(package) for package in dependent_packages) + 4
    version_header_len = len("Minimum Version") + 4
    tags_header_len = max(len(tags) for _, tags in dependent_packages.values()) + 4

    output = StringIO()
    output.write(
        " ".join(
            ["=" * package_header_len, "=" * version_header_len, "=" * tags_header_len]
        )
    )
    output.write("\n")
    dependency_title = "Dependency"
    version_title = "Minimum Version"
    tags_title = "Purpose"

    output.write(
        f"{dependency_title:<{package_header_len}} "
        f"{version_title:<{version_header_len}} "
        f"{tags_title}\n"
    )

    output.write(
        " ".join(
            ["=" * package_header_len, "=" * version_header_len, "=" * tags_header_len]
        )
    )
    output.write("\n")

    for package, (version, tags) in dependent_packages.items():
        output.write(
            f"{package:<{package_header_len}} {version:<{version_header_len}} {tags}\n"
        )

    output.write(
        " ".join(
            ["=" * package_header_len, "=" * version_header_len, "=" * tags_header_len]
        )
    )
    output.write("\n")
    output = output.getvalue()

    with (Path(".") / "min_dependency_table.rst").open("w") as f:
        f.write(output)


def generate_min_dependency_substitutions(app):
    """Generate min dependency substitutions for docs."""
    from imblearn._min_dependencies import dependent_packages

    output = StringIO()

    for package, (version, _) in dependent_packages.items():
        package = package.capitalize()
        output.write(f".. |{package}MinVersion| replace:: {version}")
        output.write("\n")

    output = output.getvalue()

    with (Path(".") / "min_dependency_substitutions.rst").open("w") as f:
        f.write(output)


# -- Additional temporary hacks -----------------------------------------------


def setup(app):
    app.connect("builder-inited", generate_min_dependency_table)
    app.connect("builder-inited", generate_min_dependency_substitutions)


================================================
FILE: doc/datasets/index.rst
================================================
.. _datasets:

=========================
Dataset loading utilities
=========================

.. currentmodule:: imblearn.datasets

The :mod:`imblearn.datasets` package complements the
:mod:`sklearn.datasets` package. It provides both (i) a set of
imbalanced datasets to perform systematic benchmarks and (ii) a utility to
create an imbalanced dataset from an original balanced dataset.

.. _zenodo:

Imbalanced datasets for benchmark
=================================

:func:`fetch_datasets` fetches 27 binarized, imbalanced datasets. The following
datasets are available (*Ratio* is the majority-to-minority imbalance ratio,
*#S* the number of samples and *#F* the number of features):

    +--+--------------+-------------------------------+-------+---------+-----+
    |ID|Name          | Repository & Target           | Ratio | #S      | #F  |
    +==+==============+===============================+=======+=========+=====+
    |1 |ecoli         | UCI, target: imU              | 8.6:1 | 336     | 7   |
    +--+--------------+-------------------------------+-------+---------+-----+
    |2 |optical_digits| UCI, target: 8                | 9.1:1 | 5,620   | 64  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |3 |satimage      | UCI, target: 4                | 9.3:1 | 6,435   | 36  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |4 |pen_digits    | UCI, target: 5                | 9.4:1 | 10,992  | 16  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |5 |abalone       | UCI, target: 7                | 9.7:1 | 4,177   | 10  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |6 |sick_euthyroid| UCI, target: sick euthyroid   | 9.8:1 | 3,163   | 42  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |7 |spectrometer  | UCI, target: >=44             | 11:1  | 531     | 93  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |8 |car_eval_34   | UCI, target: good, v good     | 12:1  | 1,728   | 21  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |9 |isolet        | UCI, target: A, B             | 12:1  | 7,797   | 617 |
    +--+--------------+-------------------------------+-------+---------+-----+
    |10|us_crime      | UCI, target: >0.65            | 12:1  | 1,994   | 100 |
    +--+--------------+-------------------------------+-------+---------+-----+
    |11|yeast_ml8     | LIBSVM, target: 8             | 13:1  | 2,417   | 103 |
    +--+--------------+-------------------------------+-------+---------+-----+
    |12|scene         | LIBSVM, target: >one label    | 13:1  | 2,407   | 294 |
    +--+--------------+-------------------------------+-------+---------+-----+
    |13|libras_move   | UCI, target: 1                | 14:1  | 360     | 90  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |14|thyroid_sick  | UCI, target: sick             | 15:1  | 3,772   | 52  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |15|coil_2000     | KDD, CoIL, target: minority   | 16:1  | 9,822   | 85  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |16|arrhythmia    | UCI, target: 06               | 17:1  | 452     | 278 |
    +--+--------------+-------------------------------+-------+---------+-----+
    |17|solar_flare_m0| UCI, target: M->0             | 19:1  | 1,389   | 32  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |18|oil           | UCI, target: minority         | 22:1  | 937     | 49  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |19|car_eval_4    | UCI, target: vgood            | 26:1  | 1,728   | 21  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |20|wine_quality  | UCI, wine, target: <=4        | 26:1  | 4,898   | 11  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |21|letter_img    | UCI, target: Z                | 26:1  | 20,000  | 16  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |22|yeast_me2     | UCI, target: ME2              | 28:1  | 1,484   | 8   |
    +--+--------------+-------------------------------+-------+---------+-----+
    |23|webpage       | LIBSVM, w7a, target: minority | 33:1  | 34,780  | 300 |
    +--+--------------+-------------------------------+-------+---------+-----+
    |24|ozone_level   | UCI, ozone, data              | 34:1  | 2,536   | 72  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |25|mammography   | UCI, target: minority         | 42:1  | 11,183  | 6   |
    +--+--------------+-------------------------------+-------+---------+-----+
    |26|protein_homo  | KDD CUP 2004, minority        | 111:1 | 145,751 | 74  |
    +--+--------------+-------------------------------+-------+---------+-----+
    |27|abalone_19    | UCI, target: 19               | 130:1 | 4,177   | 10  |
    +--+--------------+-------------------------------+-------+---------+-----+


A specific data set can be selected as::

  >>> from collections import Counter
  >>> from imblearn.datasets import fetch_datasets
  >>> ecoli = fetch_datasets()['ecoli']
  >>> ecoli.data.shape
  (336, 7)
  >>> print(sorted(Counter(ecoli.target).items()))
  [(-1, 301), (1, 35)]
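
The *Ratio* column of the table is the number of majority samples divided by
the number of minority samples; for ``ecoli``, 301 / 35 = 8.6. A minimal sketch
computing it from the class counts shown above (``imbalance_ratio`` is a
hypothetical helper, not part of :mod:`imblearn.datasets`):

```python
from collections import Counter


def imbalance_ratio(target):
    """Hypothetical helper: (#majority samples) / (#minority samples)."""
    counts = Counter(target)
    return max(counts.values()) / min(counts.values())


# Class counts reported above for ``ecoli``: 301 samples of -1, 35 samples of 1.
ecoli_target = [-1] * 301 + [1] * 35
print(f"{imbalance_ratio(ecoli_target):.1f}:1")  # 8.6:1
```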

.. _make_imbalanced:

Imbalanced generator
====================

:func:`make_imbalance` turns an original dataset into an imbalanced
dataset. This behaviour is driven by the parameter ``sampling_strategy``, which
behaves similarly to that of the other resampling algorithms.
``sampling_strategy`` can be given as a dictionary where the key corresponds to
the class and the value is the number of samples in the class::

  >>> from sklearn.datasets import load_iris
  >>> from imblearn.datasets import make_imbalance
  >>> iris = load_iris()
  >>> sampling_strategy = {0: 20, 1: 30, 2: 40}
  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target,
  ...                               sampling_strategy=sampling_strategy)
  >>> sorted(Counter(y_imb).items())
  [(0, 20), (1, 30), (2, 40)]

Note that all samples of a class are passed through if the class is not
mentioned in the dictionary::

  >>> sampling_strategy = {0: 10}
  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target,
  ...                               sampling_strategy=sampling_strategy)
  >>> sorted(Counter(y_imb).items())
  [(0, 10), (1, 50), (2, 50)]
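
The dictionary semantics can be illustrated with a plain-Python sketch: each
listed class is randomly reduced to the requested number of samples and
unlisted classes are kept whole. ``make_imbalance_sketch`` is a hypothetical
helper working on lists, not the implementation of :func:`make_imbalance`:

```python
import random
from collections import Counter, defaultdict


def make_imbalance_sketch(X, y, sampling_strategy, random_state=None):
    """Hypothetical helper: keep ``sampling_strategy[c]`` randomly chosen samples
    of every class ``c`` listed in the dict; other classes pass through."""
    rng = random.Random(random_state)
    indices_per_class = defaultdict(list)
    for idx, label in enumerate(y):
        indices_per_class[label].append(idx)

    selected = []
    for label, indices in indices_per_class.items():
        n_keep = sampling_strategy.get(label, len(indices))
        if n_keep > len(indices):
            # Like make_imbalance, this sketch only removes samples.
            raise ValueError(
                f"class {label!r} has {len(indices)} samples, cannot keep {n_keep}"
            )
        selected.extend(rng.sample(indices, n_keep))
    selected.sort()  # preserve the original ordering of the kept samples
    return [X[i] for i in selected], [y[i] for i in selected]


# Iris-like targets: 50 samples for each of the classes 0, 1 and 2.
y = [0] * 50 + [1] * 50 + [2] * 50
X = [[float(i)] for i in range(len(y))]
_, y_imb = make_imbalance_sketch(X, y, {0: 10}, random_state=0)
print(sorted(Counter(y_imb).items()))  # [(0, 10), (1, 50), (2, 50)]
```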

Instead of a dictionary, a function can be defined and passed directly to
``sampling_strategy``::

  >>> def ratio_multiplier(y):
  ...     multiplier = {0: 0.5, 1: 0.7, 2: 0.95}
  ...     target_stats = Counter(y)
  ...     for key, value in target_stats.items():
  ...         target_stats[key] = int(value * multiplier[key])
  ...     return target_stats
  >>> X_imb, y_imb = make_imbalance(iris.data, iris.target,
  ...                               sampling_strategy=ratio_multiplier)
  >>> sorted(Counter(y_imb).items())
  [(0, 25), (1, 35), (2, 47)]

It also works with pandas dataframes::

  >>> from sklearn.datasets import fetch_openml
  >>> df, y = fetch_openml(
  ...     'iris', version=1, return_X_y=True, as_frame=True)
  >>> df_resampled, y_resampled = make_imbalance(
  ...     df, y, sampling_strategy={'Iris-setosa': 10, 'Iris-versicolor': 20},
  ...     random_state=42)
  >>> df_resampled.head()
      sepallength  sepalwidth  petallength  petalwidth
  13          4.3         3.0          1.1         0.1
  39          5.1         3.4          1.5         0.2
  30          4.8         3.1          1.6         0.2
  45          4.8         3.0          1.4         0.3
  17          5.1         3.5          1.4         0.3
  >>> Counter(y_resampled)
  Counter({'Iris-virginica': 50, 'Iris-versicolor': 20, 'Iris-setosa': 10})

See :ref:`sphx_glr_auto_examples_datasets_plot_make_imbalance.py` and
:ref:`sphx_glr_auto_examples_api_plot_sampling_strategy_usage.py`.


================================================
FILE: doc/developers_utils.rst
================================================
.. _developers-utils:

===================
Developer guideline
===================

Developer utilities
-------------------

Imbalanced-learn contains a number of utilities to help with development. They are
located in :mod:`imblearn.utils` and its submodules, and fall into the categories
described below.

.. warning ::

   These utilities are meant to be used internally within the imbalanced-learn
   package. They are not guaranteed to be stable between versions of
   imbalanced-learn. Backports, in particular, will be removed as the
   imbalanced-learn dependencies evolve.


Validation Tools
~~~~~~~~~~~~~~~~

.. currentmodule:: imblearn.utils

These are tools used to check and validate input. When you write a function
which accepts arrays, matrices, or sparse matrices as arguments, the following
should be used when applicable.

- :func:`check_neighbors_object`: Check that an object is consistent with a
  nearest-neighbors estimator.
- :func:`check_target_type`: Check that the target type is supported by the
  samplers.
- :func:`check_sampling_strategy`: Check that the sampling target is consistent
  with the sampling type and return a dictionary mapping each targeted class to
  its corresponding number of samples.
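
The dictionary returned by :func:`check_sampling_strategy` means different
things depending on the sampling type. As we understand the documented
``sampling_strategy='auto'`` behaviour, over-samplers receive the number of
samples to *generate* per class, while under-samplers receive the number of
samples to *keep*. The hypothetical helper below re-implements only that rule on
a list of labels; it is an illustration, not the library code:

```python
from collections import Counter


def auto_sampling_targets(y, sampling_type):
    """Hypothetical re-implementation of the ``'auto'`` rule on a list of labels."""
    counts = Counter(y)
    if sampling_type == "over-sampling":
        # 'auto' means 'not majority': every other class is topped up to the
        # majority size, so the values are the numbers of samples to generate.
        majority = max(counts, key=counts.get)
        n_majority = counts[majority]
        return {c: n_majority - n for c, n in counts.items() if c != majority}
    if sampling_type == "under-sampling":
        # 'auto' means 'not minority': every other class is reduced to the
        # minority size, so the values are the numbers of samples to keep.
        minority = min(counts, key=counts.get)
        n_minority = counts[minority]
        return {c: n_minority for c in counts if c != minority}
    raise ValueError(f"unsupported sampling_type: {sampling_type!r}")


y = [0] * 10 + [1] * 50 + [2] * 100
print(auto_sampling_targets(y, "over-sampling"))   # {0: 90, 1: 50}
print(auto_sampling_targets(y, "under-sampling"))  # {1: 10, 2: 10}
```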


Deprecation
~~~~~~~~~~~

.. currentmodule:: imblearn.utils.deprecation

.. warning ::
   Apart from :func:`deprecate_parameter`, the rest of this section is taken from
   scikit-learn. Please refer to their original documentation.

If any publicly accessible method, function, attribute or parameter
is renamed, we still support the old one for two releases and issue
a deprecation warning when it is called/passed/accessed.
E.g., if the function ``zero_one`` is renamed to ``zero_one_loss``,
we add the decorator ``deprecated`` (from ``sklearn.utils``)
to ``zero_one`` and call ``zero_one_loss`` from that function::

    from sklearn.utils import deprecated

    def zero_one_loss(y_true, y_pred, normalize=True):
        # actual implementation
        pass

    @deprecated("Function 'zero_one' was renamed to 'zero_one_loss' "
                "in version 0.13 and will be removed in release 0.15. "
                "Default behavior is changed from 'normalize=False' to "
                "'normalize=True'")
    def zero_one(y_true, y_pred, normalize=False):
        return zero_one_loss(y_true, y_pred, normalize)

If an attribute is to be deprecated,
use the decorator ``deprecated`` on a property.
E.g., renaming an attribute ``labels_`` to ``classes_`` can be done as::

    @property
    @deprecated("Attribute labels_ was deprecated in version 0.13 and "
                "will be removed in 0.15. Use 'classes_' instead")
    def labels_(self):
        return self.classes_

If a parameter has to be deprecated, use ``FutureWarning`` appropriately.
In the following example, ``k`` is deprecated and renamed to ``n_clusters``::

    import warnings

    def example_function(n_clusters=8, k=None):
        if k is not None:
            warnings.warn("'k' was renamed to n_clusters in version 0.13 and "
                          "will be removed in 0.15.", FutureWarning)
            n_clusters = k
        # ... the rest of the function only uses ``n_clusters``
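
A unit test should also check that the old parameter still works and emits the
warning. A minimal sketch using only the standard library, with a hypothetical,
self-contained variant of ``example_function`` that emits ``FutureWarning`` and
returns the value it uses (with pytest, ``pytest.warns(FutureWarning)`` serves
the same purpose):

```python
import warnings


def example_function(n_clusters=8, k=None):
    # Hypothetical variant of the example above: ``k`` is the deprecated
    # alias of ``n_clusters`` and the function returns the value it uses.
    if k is not None:
        warnings.warn(
            "'k' was renamed to n_clusters in version 0.13 and "
            "will be removed in 0.15.",
            FutureWarning,
        )
        n_clusters = k
    return n_clusters


def test_k_is_deprecated():
    # The old parameter still works but emits a FutureWarning.
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")  # do not suppress repeated warnings
        assert example_function(k=3) == 3
    assert any(issubclass(w.category, FutureWarning) for w in record)

    # The new parameter does not warn.
    with warnings.catch_warnings(record=True) as record:
        warnings.simplefilter("always")
        assert example_function(n_clusters=3) == 3
    assert not record


test_k_is_deprecated()
```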

As in these examples, the warning message should always give both the
version in which the deprecation happened and the version in which the
old behavior will be removed. If the deprecation happened in version
0.x-dev, the message should say deprecation occurred in version 0.x and
the removal will be in 0.(x+2). For example, if the deprecation happened
in version 0.18-dev, the message should say it happened in version 0.18
and the old behavior will be removed in version 0.20.
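
The two-release rule can be written as a small hypothetical helper (not part of
scikit-learn or imbalanced-learn), assuming versions of the form ``0.x``,
optionally followed by a ``-dev`` or ``.devN`` suffix:

```python
def removal_version(deprecation_version: str) -> str:
    """Version in which a feature deprecated in ``0.x`` (or ``0.x-dev``) is removed."""
    # Drop a "-dev" suffix, e.g. "0.18-dev" -> "0.18".
    base = deprecation_version.split("-")[0]
    # Keep the first two components only, e.g. "0.18.dev0" -> ("0", "18").
    major, minor = base.split(".")[:2]
    # Deprecated behaviour is supported for two releases.
    return f"{major}.{int(minor) + 2}"


print(removal_version("0.18-dev"))  # 0.20
print(removal_version("0.13"))      # 0.15
```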

In addition, a deprecation note should be added in the docstring, recalling the
same information as the deprecation warning as explained above. Use the
``.. deprecated::`` directive::

  .. deprecated:: 0.13
     ``k`` was renamed to ``n_clusters`` in version 0.13 and will be removed
     in 0.15.

On top of the functionality provided by scikit-learn, imbalanced-learn
provides :func:`deprecate_parameter`, which is used to deprecate a sampler's
parameter (attribute) in favour of another one.

Making a release
----------------
This section documents the steps that are necessary to make a new
imbalanced-learn release.

Major release
~~~~~~~~~~~~~

* Update the release note `whats_new/v0.<version number>.rst` by giving a date
  and removing the status "Under development" from the title.
* Run `bumpversion release`. It will remove the `dev0` tag.
* Commit the change `git commit -am "bumpversion 0.<version number>.0"`
  (e.g., `git commit -am "bumpversion 0.5.0"`).
* Create a branch for this version
  (e.g., `git checkout -b 0.<version number>.X`).
* Push the new branch into the upstream remote imbalanced-learn repository.
* Change the `symlink` in the
  `imbalanced-learn website repository <https://github.com/imbalanced-learn/imbalanced-learn.github.io>`_
  such that stable points to the latest release version,
  i.e., `0.<version number>`. To do this, clone the repository and
  run `unlink stable`, followed by `ln -s 0.<version number> stable`. To check
  that this was performed correctly, ensure that stable has the new version
  number using `ls -l`.
* Return to your imbalanced-learn repository, in the branch
  `0.<version number>.X`.
* Create the source distribution and wheel: `python setup.py sdist` and
  `python setup.py bdist_wheel`.
* Upload these files to PyPI using `twine upload dist/*`.
* Switch to the `master` branch and run `bumpversion minor`, commit and push on
  upstream. We are officially at `0.<version number + 1>.0.dev0`.
* Create a GitHub release by clicking on "Draft a new release" on the GitHub
  releases page of the repository.
  "Tag version" should be the latest version number (e.g., `0.<version>.0`),
  "Target" should be the branch for the release
  (e.g., `0.<version number>.X`) and "Release title" should be
  "Version <version number>". Add the notes from the release notes there.
* Add a new `v0.<version number + 1>.rst` file in `doc/whats_new/` and
  `.. include::` this new file in `doc/whats_new.rst`. Mark the version as the
  version under development.
* Finally, go to the `conda-forge feedstock <https://github.com/conda-forge/imbalanced-learn-feedstock>`_:
  a new PR will be created once the feedstock synchronizes with the
  PyPI repository. Merge this PR so that the binary for `conda` becomes
  available.

Bug fix release
~~~~~~~~~~~~~~~

* Find the hash(es) of the bug fix commit(s) you wish to backport using
  `git log`.
* Check out the branch for the latest release, e.g.,
  `git checkout 0.<version number>.X`.
* Append the bug fix commit(s) to the branch using `git cherry-pick <hash>`.
  Alternatively, you can use interactive rebasing from the `master` branch.
* Bump the version number with `bumpversion patch`. This will bump the patch
  version, for example from `0.X.0` to `0.X.1.dev0`.
* Mark the current version as a release version (as opposed to a `dev` version)
  with `bumpversion release --allow-dirty`. It will bump the version, for
  example from `0.X.1.dev0` to `0.X.1`.
* Commit the changes with `git commit -am 'bumpversion <new version>'`.
* Push the changes to the release branch in upstream, e.g.
  `git push <upstream remote> <release branch>`.
* Use the same process as in a major release to upload on PyPI and conda-forge.


================================================
FILE: doc/ensemble.rst
================================================
.. _ensemble:

====================
Ensemble of samplers
====================

.. currentmodule:: imblearn.ensemble

.. _ensemble_meta_estimators:

Classifier including inner balancing samplers
=============================================

.. _bagging:

Bagging classifier
------------------

In ensemble classifiers, bagging methods build several estimators on different
randomly selected subsets of data. In scikit-learn, this classifier is named
:class:`~sklearn.ensemble.BaggingClassifier`. However, this classifier does not
allow balancing each subset of data. Therefore, when trained on an imbalanced
dataset, this classifier will favor the majority classes::

  >>> from sklearn.datasets import make_classification
  >>> X, y = make_classification(n_samples=10000, n_features=2, n_informative=2,
  ...                            n_redundant=0, n_repeated=0, n_classes=3,
  ...                            n_clusters_per_class=1,
  ...                            weights=[0.01, 0.05, 0.94], class_sep=0.8,
  ...                            random_state=0)
  >>> from sklearn.model_selection import train_test_split
  >>> from sklearn.metrics import balanced_accuracy_score
  >>> from sklearn.ensemble import BaggingClassifier
  >>> from sklearn.tree import DecisionTreeClassifier
  >>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  >>> bc = BaggingClassifier(DecisionTreeClassifier(), random_state=0)
  >>> bc.fit(X_train, y_train)
  BaggingClassifier(...)
  >>> y_pred = bc.predict(X_test)
  >>> balanced_accuracy_score(y_test, y_pred)
  0.77...

In :class:`BalancedBaggingClassifier`, each bootstrap sample is further
resampled to achieve the desired `sampling_strategy`. Otherwise,
:class:`BalancedBaggingClassifier` takes the same parameters as the
scikit-learn :class:`~sklearn.ensemble.BaggingClassifier`. The sampling itself
is controlled either by the parameter `sampler` or, to use the
:class:`~imblearn.under_sampling.RandomUnderSampler`, by the two parameters
`sampling_strategy` and `replacement`::

  >>> from imblearn.ensemble import BalancedBaggingClassifier
  >>> bbc = BalancedBaggingClassifier(DecisionTreeClassifier(),
  ...                                 sampling_strategy='auto',
  ...                                 replacement=False,
  ...                                 random_state=0)
  >>> bbc.fit(X_train, y_train)
  BalancedBaggingClassifier(...)
  >>> y_pred = bbc.predict(X_test)
  >>> balanced_accuracy_score(y_test, y_pred)
  0.8...
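
Conceptually, with the default
:class:`~imblearn.under_sampling.RandomUnderSampler` and
``sampling_strategy='auto'``, every base estimator is fit on a bootstrap sample
in which all classes have been randomly under-sampled to the size of the
smallest one. The stdlib-only sketch below mimics that two-step procedure on a
list of labels; ``balanced_bootstrap_indices`` is a hypothetical helper and a
simplification, not the library's implementation:

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap_indices(y, rng):
    """Hypothetical helper: bootstrap ``y``, then randomly under-sample each
    class present in the bootstrap down to the size of its smallest class."""
    n_samples = len(y)
    # 1) Ordinary bagging bootstrap: draw n_samples indices with replacement.
    bootstrap = [rng.randrange(n_samples) for _ in range(n_samples)]
    # 2) Group the drawn indices by class label (a class absent from the
    #    bootstrap is simply missing here).
    per_class = defaultdict(list)
    for idx in bootstrap:
        per_class[y[idx]].append(idx)
    # 3) Random under-sampling (without replacement) to the minority count.
    n_minority = min(len(indices) for indices in per_class.values())
    balanced = []
    for indices in per_class.values():
        balanced.extend(rng.sample(indices, n_minority))
    return balanced


# The class proportions used above: 1%, 5% and 94%.
y = [0] * 10 + [1] * 50 + [2] * 940
rng = random.Random(0)
for _ in range(3):  # one balanced sample per (hypothetical) base estimator
    indices = balanced_bootstrap_indices(y, rng)
    print(Counter(y[i] for i in indices))
```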

Changing the `sampler` gives rise to different known implementations
:cite:`maclin1997empirical`, :cite:`hido2009roughly`,
:cite:`wang2009diversity`. The following example shows these
different methods in practice:
:ref:`sphx_glr_auto_examples_ensemble_plot_bagging_classifier.py`

.. _forest:

Forest of randomized trees
--------------------------

:class:`BalancedRandomForestClassifier` is another ensemble method in which
each tree of the forest will be provided a balanced bootstrap sample
:cite:`chen2004using`. This class provides all functionality of the
:class:`~sklearn.ensemble.RandomForestClassifier`::

  >>> from imblearn.ensemble import BalancedRandomForestClassifier
  >>> brf = BalancedRandomForestClassifier(
  ...     n_estimators=100, random_state=0, sampling_strategy="all", replacement=True,
  ...     bootstrap=False,
  ... )
  >>> brf.fit(X_train, y_train)
  BalancedRandomForestClassifier(...)
  >>> y_pred = brf.predict(X_test)
  >>> balanced_accuracy_score(y_test, y_pred)
  0.8...

.. _boosting:

Boosting
--------

Several methods taking advantage of boosting have been designed.

:class:`RUSBoostClassifier` randomly under-samples the dataset before performing
a boosting iteration :cite:`seiffert2009rusboost`::

  >>> from imblearn.ensemble import RUSBoostClassifier
  >>> rusboost = RUSBoostClassifier(n_estimators=200, random_state=0)
  >>> rusboost.fit(X_train, y_train)
  RUSBoostClassifier(...)
  >>> y_pred = rusboost.predict(X_test)
  >>> balanced_accuracy_score(y_test, y_pred)
  0...

A specific method which uses :class:`~sklearn.ensemble.AdaBoostClassifier` as
learners in the bagging classifier is called "EasyEnsemble". The
:class:`EasyEnsembleClassifier` allows bagging AdaBoost learners which are
trained on balanced bootstrap samples :cite:`liu2008exploratory`. Similarly to
the :class:`BalancedBaggingClassifier` API, one can construct the ensemble as::

  >>> from imblearn.ensemble import EasyEnsembleClassifier
  >>> eec = EasyEnsembleClassifier(random_state=0)
  >>> eec.fit(X_train, y_train)
  EasyEnsembleClassifier(...)
  >>> y_pred = eec.predict(X_test)
  >>> balanced_accuracy_score(y_test, y_pred)
  0.6...

.. topic:: Examples

  * :ref:`sphx_glr_auto_examples_ensemble_plot_comparison_ensemble_classifier.py`


================================================
FILE: doc/index.rst
================================================
.. project-template documentation master file, created by
   sphinx-quickstart on Mon Jan 18 14:44:12 2016.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

:notoc:

##############################
imbalanced-learn documentation
##############################

**Date**: |today| **Version**: |version|

**Useful links**:
`Binary Installers <https://pypi.org/project/imbalanced-learn>`__ |
`Source Repository <https://github.com/scikit-learn-contrib/imbalanced-learn>`__ |
`Issues & Ideas <https://github.com/scikit-learn-contrib/imbalanced-learn/issues>`__ |
`Q&A Support <https://gitter.im/scikit-learn-contrib/imbalanced-learn>`__

Imbalanced-learn (imported as :mod:`imblearn`) is an open source, MIT-licensed
library that relies on scikit-learn (imported as :mod:`sklearn`) and provides
tools for classification with imbalanced classes.

.. grid:: 1 2 2 2
    :gutter: 4
    :padding: 2 2 0 0
    :class-container: sd-text-center

    .. grid-item-card:: Getting started
        :img-top: _static/index_getting_started.svg
        :class-card: intro-card
        :shadow: md

        Check out the getting started guides to install `imbalanced-learn`.
        Some extra information to get started with a new contribution is also provided.

        +++

        .. button-ref:: getting_started
            :ref-type: ref
            :click-parent:
            :color: secondary
            :expand:

            To the installation guideline

    .. grid-item-card::  User guide
        :img-top: _static/index_user_guide.svg
        :class-card: intro-card
        :shadow: md

        The user guide provides in-depth information on the key concepts of
        `imbalanced-learn` with useful background information and explanation.

        +++

        .. button-ref:: user_guide
            :ref-type: ref
            :click-parent:
            :color: secondary
            :expand:

            To the user guide

    .. grid-item-card::  API reference
        :img-top: _static/index_api.svg
        :class-card: intro-card
        :shadow: md

        The reference guide contains a detailed description of
        the `imbalanced-learn` API, including the parameters of each method.

        +++

        .. button-ref:: api
            :ref-type: ref
            :click-parent:
            :color: secondary
            :expand:

            To the reference guide

    .. grid-item-card::  Examples
        :img-top: _static/index_examples.svg
        :class-card: intro-card
        :shadow: md

        The gallery of examples is a good place to see `imbalanced-learn` in action.
        Select an example and dive in.

        +++

        .. button-ref:: general_examples
            :ref-type: ref
            :click-parent:
            :color: secondary
            :expand:

            To the gallery of examples


.. toctree::
    :maxdepth: 3
    :hidden:
    :titlesonly:

    install
    user_guide
    references/index
    auto_examples/index
    whats_new
    about


================================================
FILE: doc/install.rst
================================================
.. _getting_started:

###############
Getting Started
###############

Prerequisites
=============

.. |PythonMinVersion| replace:: 3.10
.. |NumPyMinVersion| replace:: 1.25.2
.. |SciPyMinVersion| replace:: 1.11.4
.. |ScikitLearnMinVersion| replace:: 1.4.2
.. |MatplotlibMinVersion| replace:: 3.7.3
.. |PandasMinVersion| replace:: 2.0.3
.. |TensorflowMinVersion| replace:: 2.16.1
.. |KerasMinVersion| replace:: 3.3.3
.. |SeabornMinVersion| replace:: 0.12.2
.. |PytestMinVersion| replace:: 7.2.2

`imbalanced-learn` requires the following dependencies:

- Python (>= |PythonMinVersion|)
- NumPy (>= |NumPyMinVersion|)
- SciPy (>= |SciPyMinVersion|)
- Scikit-learn (>= |ScikitLearnMinVersion|)
- Pytest (>= |PytestMinVersion|)

Additionally, `imbalanced-learn` supports the following optional dependencies:

- Pandas (>= |PandasMinVersion|) for dealing with dataframes
- Tensorflow (>= |TensorflowMinVersion|) for dealing with TensorFlow models
- Keras (>= |KerasMinVersion|) for dealing with Keras models

The examples require the following additional dependencies:

- Matplotlib (>= |MatplotlibMinVersion|)
- Seaborn (>= |SeabornMinVersion|)

Install
=======

From PyPI or conda-forge repositories
-------------------------------------

imbalanced-learn is currently available on PyPI and you can
install it via `pip`::

  pip install imbalanced-learn

The package is also released on conda-forge and you can install
it with `conda` (or `mamba`)::

  conda install -c conda-forge imbalanced-learn

Intel optimizations via scikit-learn-intelex
--------------------------------------------

Imbalanced-learn relies entirely on scikit-learn algorithms. Intel provides an
optimized version of scikit-learn for Intel hardware, called scikit-learn-intelex.
Installing scikit-learn-intelex and patching scikit-learn will activate the
Intel optimizations.

You can refer to the following
`blog post <https://medium.com/intel-analytics-software/why-pay-more-for-machine-learning-893683bd78e4>`_
for some benchmarks.

Refer to the following documentation for instructions:

- `Installation guide <https://intel.github.io/scikit-learn-intelex/installation.html>`_.
- `Patching guide <https://intel.github.io/scikit-learn-intelex/what-is-patching.html>`_.

From source available on GitHub
-------------------------------

If you prefer, you can clone the repository and install it with `pip`. Use the
following commands to get a copy from GitHub and install all dependencies::

  git clone https://github.com/scikit-learn-contrib/imbalanced-learn.git
  cd imbalanced-learn
  pip install .

You can also install it in developer (editable) mode with::

  pip install --no-build-isolation --editable .

If you wish to make pull-requests on GitHub, we advise you to install
pre-commit::

  pip install pre-commit
  pre-commit install

Test and coverage
=================

To test the code before installing it::

  $ make test

To check the test coverage of your version::

  $ make coverage

You can also use `pytest`::

  $ pytest imblearn -v

Contribute
==========

You can contribute to this code through pull requests on GitHub_. Please make
sure that your code comes with unit tests to ensure full coverage and that it
passes continuous integration.

.. _GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn/pulls


================================================
FILE: doc/introduction.rst
================================================
.. _introduction:

============
Introduction
============

.. _api_imblearn:

API's of imbalanced-learn samplers
----------------------------------

The available samplers follow the
`scikit-learn API <https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics>`_
using the base estimator
and add a resampling functionality via the ``fit_resample`` method:

:Estimator:

    The base object, implements a ``fit`` method to learn from data::

      estimator = obj.fit(data, targets)

:Resampler:

    To resample a dataset, each sampler implements a ``fit_resample`` method::

      data_resampled, targets_resampled = obj.fit_resample(data, targets)
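
To make the ``fit_resample`` contract concrete, the sketch below implements a
toy sampler on plain Python lists. ``ToyRandomUnderSampler`` is hypothetical and
is not :class:`~imblearn.under_sampling.RandomUnderSampler`; it only mimics the
interface: constructor parameters stored as attributes, and ``fit_resample``
returning the resampled data and targets together:

```python
import random
from collections import Counter, defaultdict


class ToyRandomUnderSampler:
    """Hypothetical sampler illustrating the ``fit_resample`` contract.

    It keeps, for every class, as many randomly chosen samples as the
    minority class has. It only mimics imblearn's interface.
    """

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples")
        rng = random.Random(self.random_state)
        indices_per_class = defaultdict(list)
        for idx, label in enumerate(y):
            indices_per_class[label].append(idx)
        n_minority = min(len(indices) for indices in indices_per_class.values())
        # Fitted attribute (trailing underscore, scikit-learn convention):
        # number of samples kept for each class.
        self.sampling_strategy_ = {label: n_minority for label in indices_per_class}
        selected = []
        for indices in indices_per_class.values():
            selected.extend(rng.sample(indices, n_minority))
        selected.sort()  # keep the original order of the retained samples
        return [X[i] for i in selected], [y[i] for i in selected]


X = [[i, 2 * i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_resampled, y_resampled = ToyRandomUnderSampler(random_state=0).fit_resample(X, y)
print(sorted(Counter(y_resampled).items()))  # [(0, 2), (1, 2)]
```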

Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:

* `data`, 2-dimensional array-like structures, such as:

  * Python's list of lists :class:`list`,
  * Numpy arrays :class:`numpy.ndarray`,
  * Pandas dataframes :class:`pandas.DataFrame`,
  * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;

* `targets`, 1-dimensional array-like structures, such as:

  * Numpy arrays :class:`numpy.ndarray`,
  * Pandas series :class:`pandas.Series`.

The outputs are of the following types:

* `data_resampled`, 2-dimensional array-like structures, such as:

  * Numpy arrays :class:`numpy.ndarray`,
  * Pandas dataframes :class:`pandas.DataFrame`,
  * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;

* `targets_resampled`, 1-dimensional array-like structures, such as:

  * Numpy arrays :class:`numpy.ndarray`,
  * Pandas series :class:`pandas.Series`.

.. topic:: Pandas in/out

   Unlike scikit-learn, imbalanced-learn provides support for pandas in/out.
   Therefore, providing a dataframe as input will also output a dataframe.

.. topic:: Sparse input

   For sparse input the data is **converted to the Compressed Sparse Rows
   representation** (see ``scipy.sparse.csr_matrix``) before being fed to the
   sampler. To avoid unnecessary memory copies, it is recommended to choose the
   CSR representation upstream.

.. _problem_statement:

Problem statement regarding imbalanced data sets
------------------------------------------------

The learning and prediction phases of machine learning algorithms
can be impacted by the issue of **imbalanced datasets**. This imbalance
refers to the difference in the number of samples across different classes.
We demonstrate the effect of training a `Logistic Regression classifier
<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_
on datasets with varying levels of class imbalance, obtained by adjusting the
proportion of samples generated for each class.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center

As expected, the decision function of the Logistic Regression classifier varies significantly
depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
tends to favour the class with the larger number of samples, usually referred to as the
**majority class**.
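
A back-of-the-envelope illustration of why favouring the majority class is
misleading: on a 95:5 dataset, a classifier that always predicts the majority
class reaches 95% accuracy while never detecting the minority class. Balanced
accuracy, the average of the per-class recalls (the definition used by
:func:`sklearn.metrics.balanced_accuracy_score`), exposes this. A plain-Python
sketch with hypothetical helper names:

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly predicted samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average of the per-class recalls."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# 95 majority samples (class 0) and 5 minority samples (class 1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * len(y_true)
print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```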


================================================
FILE: doc/make.bat
================================================
@ECHO OFF

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set BUILDDIR=_build
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
set I18NSPHINXOPTS=%SPHINXOPTS% .
if NOT "%PAPER%" == "" (
	set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
	set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
)

if "%1" == "" goto help

if "%1" == "help" (
	:help
	echo.Please use `make ^<target^>` where ^<target^> is one of
	echo.  html       to make standalone HTML files
	echo.  dirhtml    to make HTML files named index.html in directories
	echo.  singlehtml to make a single large HTML file
	echo.  pickle     to make pickle files
	echo.  json       to make JSON files
	echo.  htmlhelp   to make HTML files and a HTML help project
	echo.  qthelp     to make HTML files and a qthelp project
	echo.  devhelp    to make HTML files and a Devhelp project
	echo.  epub       to make an epub
	echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter
	echo.  text       to make text files
	echo.  man        to make manual pages
	echo.  texinfo    to make Texinfo files
	echo.  gettext    to make PO message catalogs
	echo.  changes    to make an overview of all changed/added/deprecated items
	echo.  xml        to make Docutils-native XML files
	echo.  pseudoxml  to make pseudo-XML files for display purposes
	echo.  linkcheck  to check all external links for integrity
	echo.  doctest    to run all doctests embedded in the documentation if enabled
	goto end
)

if "%1" == "clean" (
	for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
	del /q /s %BUILDDIR%\*
	goto end
)


%SPHINXBUILD% 2> nul
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.http://sphinx-doc.org/
	exit /b 1
)

if "%1" == "html" (
	%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The HTML pages are in %BUILDDIR%/html.
	goto end
)

if "%1" == "dirhtml" (
	%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
	goto end
)

if "%1" == "singlehtml" (
	%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
	goto end
)

if "%1" == "pickle" (
	%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished; now you can process the pickle files.
	goto end
)

if "%1" == "json" (
	%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished; now you can process the JSON files.
	goto end
)

if "%1" == "htmlhelp" (
	%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished; now you can run HTML Help Workshop with the ^
.hhp project file in %BUILDDIR%/htmlhelp.
	goto end
)

if "%1" == "qthelp" (
	%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished; now you can run "qcollectiongenerator" with the ^
.qhcp project file in %BUILDDIR%/qthelp, like this:
	echo.^> qcollectiongenerator %BUILDDIR%\qthelp\imbalanced-learn.qhcp
	echo.To view the help file:
	echo.^> assistant -collectionFile %BUILDDIR%\qthelp\imbalanced-learn.qhc
	goto end
)

if "%1" == "devhelp" (
	%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished.
	goto end
)

if "%1" == "epub" (
	%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The epub file is in %BUILDDIR%/epub.
	goto end
)

if "%1" == "latex" (
	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
	goto end
)

if "%1" == "latexpdf" (
	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
	pushd %BUILDDIR%\latex
	make all-pdf
	popd
	echo.
	echo.Build finished; the PDF files are in %BUILDDIR%/latex.
	goto end
)

if "%1" == "latexpdfja" (
	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
	pushd %BUILDDIR%\latex
	make all-pdf-ja
	popd
	echo.
	echo.Build finished; the PDF files are in %BUILDDIR%/latex.
	goto end
)

if "%1" == "text" (
	%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The text files are in %BUILDDIR%/text.
	goto end
)

if "%1" == "man" (
	%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The manual pages are in %BUILDDIR%/man.
	goto end
)

if "%1" == "texinfo" (
	%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
	goto end
)

if "%1" == "gettext" (
	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
	goto end
)

if "%1" == "changes" (
	%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
	if errorlevel 1 exit /b 1
	echo.
	echo.The overview file is in %BUILDDIR%/changes.
	goto end
)

if "%1" == "linkcheck" (
	%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
	if errorlevel 1 exit /b 1
	echo.
	echo.Link check complete; look for any errors in the above output ^
or in %BUILDDIR%/linkcheck/output.txt.
	goto end
)

if "%1" == "doctest" (
	%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
	if errorlevel 1 exit /b 1
	echo.
	echo.Testing of doctests in the sources finished, look at the ^
results in %BUILDDIR%/doctest/output.txt.
	goto end
)

if "%1" == "xml" (
	%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The XML files are in %BUILDDIR%/xml.
	goto end
)

if "%1" == "pseudoxml" (
	%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
	if errorlevel 1 exit /b 1
	echo.
	echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
	goto end
)

:end


================================================
FILE: doc/metrics.rst
================================================
.. _metrics:

=======
Metrics
=======

.. currentmodule:: imblearn.metrics

Classification metrics
----------------------

scikit-learn provides ``sklearn.metrics.balanced_accuracy_score`` (added in
version 0.20) as a metric dedicated to imbalanced datasets. The module
:mod:`imblearn.metrics` offers a couple of other metrics which are used in the
literature to evaluate the quality of classifiers trained on imbalanced data.

.. _sensitivity_specificity:

Sensitivity and specificity metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sensitivity and specificity are metrics which are well known in medical
imaging. Sensitivity (also called true positive rate or recall) is the
proportion of the positive samples which are correctly classified, while
specificity (also called true negative rate) is the proportion of the negative
samples which are correctly classified. Therefore, depending on the field of
application, either the sensitivity/specificity or the precision/recall pair of
metrics is used.

Currently, only the `precision and recall metrics
<http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html>`_
are implemented in scikit-learn. :func:`sensitivity_specificity_support`,
:func:`sensitivity_score`, and :func:`specificity_score` make it possible to
use the sensitivity/specificity pair as well.

.. _imbalanced_metrics:

Additional metrics specific to imbalanced datasets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :func:`geometric_mean_score`
:cite:`barandela2003strategies,kubat1997addressing` is the n-th root of the
product of the class-wise sensitivities, where n is the number of classes. This
measure tries to maximize the accuracy on each of the classes while keeping
these accuracies balanced.

The :func:`make_index_balanced_accuracy` :cite:`garcia2012effectiveness` can
wrap any metric and give more importance to a specific class using the
parameter ``alpha``.
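
The index balanced accuracy (IBA) of García et al. can be written as :math:`IBA_{\alpha}(M) = (1 + \alpha \cdot Dom) \cdot M`. Here the dominance :math:`Dom = TPR - TNR` is the difference between sensitivity and specificity, and :math:`M` is the wrapped metric. The sketch below uses a hand-written, binary-only helper (the name ``index_balanced_accuracy`` is hypothetical and not part of imbalanced-learn) that applies this formula to the squared geometric mean. It illustrates the formula only and does not reproduce how :func:`make_index_balanced_accuracy` is implemented.

```python
import numpy as np

def index_balanced_accuracy(y_true, y_pred, alpha=0.1, pos_label=1):
    """Hypothetical helper: IBA applied to the squared geometric mean (binary case)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == pos_label
    tpr = np.mean(y_pred[pos] == pos_label)    # sensitivity
    tnr = np.mean(y_pred[~pos] != pos_label)   # specificity
    dominance = tpr - tnr
    squared_gmean = tpr * tnr                  # (sqrt(tpr * tnr)) ** 2
    return (1 + alpha * dominance) * squared_gmean

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
for alpha in (0.0, 0.1, 0.5):
    print(alpha, index_balanced_accuracy(y_true, y_pred, alpha=alpha))
```

With ``alpha=0`` the score reduces to the squared geometric mean (0.375 here). A positive ``alpha`` rewards classifiers whose sensitivity exceeds their specificity. In this example the dominance is negative (0.5 - 0.75), so the score decreases as ``alpha`` grows.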

.. _macro_averaged_mean_absolute_error:

Macro-Averaged Mean Absolute Error (MA-MAE)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ordinal classification is used when the classes are ordered, for example
levels of functionality or movie ratings.

The :func:`macro_averaged_mean_absolute_error` :cite:`esuli2009ordinal` is used
for imbalanced ordinal classification. The mean absolute error is computed for
each class and averaged over classes, giving an equal weight to each class.

.. _classification_report:

Summary of important metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :func:`classification_report_imbalanced` computes a set of metrics per
class and summarizes them in a table. The parameter `output_dict` controls
whether the report is returned as a string or as a Python dictionary. This
dictionary can be reused, for instance, to create a pandas DataFrame.

The bottom row (i.e. "avg/total") contains the average of each column,
weighted by the support (i.e. column "sup").

Note that the weighted average of the class recalls is equal to the
classification accuracy.
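
The identity holds because the recall of class :math:`c` is :math:`TP_c / n_c`. Weighting it by :math:`n_c / N` and summing gives :math:`\sum_c TP_c / N`, which is the accuracy. The following check uses scikit-learn only, on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 2, 1, 0, 2, 2])

# Recall of class c is TP_c / n_c; weighting by n_c / N leaves sum(TP_c) / N.
weighted_recall = recall_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)
print(weighted_recall, accuracy)
```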

.. _pairwise_metrics:

Pairwise metrics
----------------

The :mod:`imblearn.metrics.pairwise` submodule implements pairwise distances
that are not available in scikit-learn but are used in some of the methods in
imbalanced-learn.

.. _vdm:

Value Difference Metric
~~~~~~~~~~~~~~~~~~~~~~~

The class :class:`~imblearn.metrics.pairwise.ValueDifferenceMetric`
implements the Value Difference Metric proposed in
:cite:`stanfill1986toward`. This measure is used to compute the proximity
of two samples composed only of categorical values.

Given a single feature, categories with a similar correlation with the target
vector will be considered closer. Let's give an example to illustrate this
behaviour as given in :cite:`wilson1997improved`. `X` will be represented by a
single feature, a color, and the target indicates whether or not a sample is
an apple::

    >>> import numpy as np
    >>> X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10).reshape(-1, 1)
    >>> y = ["apple"] * 8 + ["not apple"] * 5 + ["apple"] * 7 + ["not apple"] * 9 + ["apple"]

In this dataset, the categories "red" and "green" have a similar relationship
with the target `y` (both are mostly apples) and should therefore be closer to
each other than to the category "blue". We show this behaviour below. Be aware
that `X` needs to be encoded to work with numerical values::

    >>> from sklearn.preprocessing import OrdinalEncoder
    >>> encoder = OrdinalEncoder(dtype=np.int32)
    >>> X_encoded = encoder.fit_transform(X)

Now, we can compute the distance between three different samples representing
the different categories::

    >>> from imblearn.metrics.pairwise import ValueDifferenceMetric
    >>> vdm = ValueDifferenceMetric().fit(X_encoded, y)
    >>> X_test = np.array(["green", "red", "blue"]).reshape(-1, 1)
    >>> X_test_encoded = encoder.transform(X_test)
    >>> vdm.pairwise(X_test_encoded)
    array([[0.  ,  0.04,  1.96],
           [0.04,  0.  ,  1.44],
           [1.96,  1.44,  0.  ]])

We see that the minimum distance occurs when the categories "red" and "green"
are compared. Whenever "blue" is involved, the distance is much larger.

**Mathematical formulation**

The distance between feature values of two samples is defined as:

.. math::
    \delta(x, y) = \sum_{c=1}^{C} |p(c|x_{f}) - p(c|y_{f})|^{k} \ ,

where :math:`x` and :math:`y` are two samples and :math:`f` a given
feature, :math:`C` is the number of classes, :math:`p(c|x_{f})` is the
conditional probability that the output class is :math:`c` given that
the feature :math:`f` takes the value :math:`x_{f}`, and :math:`k` is an
exponent usually set to 1 or 2.

The distance for the feature vectors :math:`X` and :math:`Y` is
subsequently defined as:

.. math::
    \Delta(X, Y) = \sum_{f=1}^{F} \delta(X_{f}, Y_{f})^{r} \ ,

where :math:`F` is the number of features and :math:`r` an exponent usually
set to 1 or 2.
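
The two formulas can be implemented directly with NumPy. The sketch below is a from-scratch illustration, not the implementation of :class:`~imblearn.metrics.pairwise.ValueDifferenceMetric`. It uses the hypothetical helper name ``vdm_pairwise`` and sets :math:`k = 1` and :math:`r = 2`, the values that reproduce the distance matrix of the apple example shown above. The conditional probabilities :math:`p(c|x_{f})` are estimated as class frequencies among the training samples that share a category. Because the categories are compared directly as strings, no ordinal encoding is needed here.

```python
import numpy as np

def vdm_pairwise(X_train, y_train, X_query, k=1, r=2):
    """Hypothetical helper: Value Difference Metric between all rows of X_query."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    X_query = np.asarray(X_query)
    classes = np.unique(y_train)
    n_query, n_features = X_query.shape
    dist = np.zeros((n_query, n_query))
    for f in range(n_features):
        column = X_train[:, f]
        # p(c | value): class frequencies among training samples with that value.
        table = {value: np.array([np.mean(y_train[column == value] == c)
                                  for c in classes])
                 for value in np.unique(column)}
        proba = np.array([table[value] for value in X_query[:, f]])
        # delta = sum over classes of |p(c|x_f) - p(c|y_f)| ** k
        delta = (np.abs(proba[:, None, :] - proba[None, :, :]) ** k).sum(axis=-1)
        dist += delta ** r   # Delta = sum over features of delta ** r
    return dist

X = np.array(["green"] * 10 + ["red"] * 10 + ["blue"] * 10).reshape(-1, 1)
y = np.array(["apple"] * 8 + ["not apple"] * 5 + ["apple"] * 7
             + ["not apple"] * 9 + ["apple"])
X_test = np.array(["green", "red", "blue"]).reshape(-1, 1)
D = vdm_pairwise(X, y, X_test)
print(np.round(D, 2))
```

For instance, "green" has class probabilities (0.8, 0.2) and "red" has (0.7, 0.3), so :math:`\delta = 0.1 + 0.1 = 0.2` and :math:`\Delta = 0.2^2 = 0.04`.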


================================================
FILE: doc/miscellaneous.rst
================================================
.. _miscellaneous:

======================
Miscellaneous samplers
======================

.. currentmodule:: imblearn

.. _function_sampler:

Custom samplers
---------------

A fully customized sampler, :class:`FunctionSampler`, is available in
imbalanced-learn so that you can quickly prototype your own sampler by defining
a single function. Additional parameters can be passed using the parameter
``kw_args``, which accepts a dictionary. The following example illustrates how
to retain the first 10 elements of the arrays ``X`` and ``y``::

  >>> import numpy as np
  >>> from imblearn import FunctionSampler
  >>> from sklearn.datasets import make_classification
  >>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
  ...                            n_redundant=0, n_repeated=0, n_classes=3,
  ...                            n_clusters_per_class=1,
  ...                            weights=[0.01, 0.05, 0.94],
  ...                            class_sep=0.8, random_state=0)
  >>> def func(X, y):
  ...   return X[:10], y[:10]
  >>> sampler = FunctionSampler(func=func)
  >>> X_res, y_res = sampler.fit_resample(X, y)
  >>> np.all(X_res == X[:10])
  True
  >>> np.all(y_res == y[:10])
  True

In addition, the parameter ``validate`` controls input checking. For instance,
setting ``validate=False`` allows passing any type of target ``y``, for example
to resample data with a regression target::

  >>> from sklearn.datasets import make_regression
  >>> X_reg, y_reg = make_regression(n_samples=100, random_state=42)
  >>> rng = np.random.RandomState(42)
  >>> def dummy_sampler(X, y):
  ...     indices = rng.choice(np.arange(X.shape[0]), size=10)
  ...     return X[indices], y[indices]
  >>> sampler = FunctionSampler(func=dummy_sampler, validate=False)
  >>> X_res, y_res = sampler.fit_resample(X_reg, y_reg)
  >>> y_res
  array([  41.49112498, -142.78526195,   85.55095317,  141.43321419,
           75.46571114,  -67.49177372,  159.72700509, -169.80498923,
          211.95889757,  211.95889757])

The use of such a sampler to implement an outlier rejection estimator, which
can easily be used within a :class:`~imblearn.pipeline.Pipeline`, is
illustrated in
:ref:`sphx_glr_auto_examples_applications_plot_outlier_rejections.py`.

.. _generators:

Custom generators
-----------------

Imbalanced-learn provides specific generators for TensorFlow and Keras which
will generate balanced mini-batches.

.. _tensorflow_generator:

TensorFlow generator
~~~~~~~~~~~~~~~~~~~~

The :func:`~imblearn.tensorflow.balanced_batch_generator` generates balanced
mini-batches using an imbalanced-learn sampler which returns indices.

Let's first generate some data::

  >>> n_features, n_classes = 10, 2
  >>> X, y = make_classification(
  ...     n_samples=10_000, n_features=n_features, n_informative=2,
  ...     n_redundant=0, n_repeated=0, n_classes=n_classes,
  ...     n_clusters_per_class=1, weights=[0.1, 0.9],
  ...     class_sep=0.8, random_state=0
  ... )
  >>> X = X.astype(np.float32)

Then, we can create the generator that will yield mini-batches that will be
balanced::

  >>> from imblearn.under_sampling import RandomUnderSampler
  >>> from imblearn.tensorflow import balanced_batch_generator
  >>> training_generator, steps_per_epoch = balanced_batch_generator(
  ...     X,
  ...     y,
  ...     sample_weight=None,
  ...     sampler=RandomUnderSampler(),
  ...     batch_size=32,
  ...     random_state=42,
  ... )

The ``training_generator`` and ``steps_per_epoch`` are used during the training
of a TensorFlow model. We will illustrate how to use this generator. First, we
can define a logistic regression model which will be optimized by gradient
descent::

  >>> import tensorflow as tf
  >>> # initialize the weights and intercept
  >>> normal_initializer = tf.random_normal_initializer(mean=0, stddev=0.01)
  >>> coef = tf.Variable(normal_initializer(
  ...     shape=[n_features, n_classes]), dtype="float32"
  ... )
  >>> intercept = tf.Variable(
  ...     normal_initializer(shape=[n_classes]), dtype="float32"
  ... )
  >>> # define the model
  >>> def logistic_regression(X):
  ...     return tf.nn.softmax(tf.matmul(X, coef) + intercept)
  >>> # define the loss function
  >>> def cross_entropy(y_true, y_pred):
  ...     y_true = tf.one_hot(y_true, depth=n_classes)
  ...     y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)
  ...     # sum over the classes, then average over the samples of the batch
  ...     return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred), axis=1))
  >>> # define our metric
  >>> def balanced_accuracy(y_true, y_pred):
  ...     cm = tf.math.confusion_matrix(tf.cast(y_true, tf.int64), tf.argmax(y_pred, 1))
  ...     per_class = np.diag(cm) / tf.math.reduce_sum(cm, axis=1)
  ...     return np.mean(per_class)
  >>> # define the optimizer
  >>> optimizer = tf.optimizers.SGD(learning_rate=0.01)
  >>> # define the optimization step
  >>> def run_optimization(X, y):
  ...     with tf.GradientTape() as g:
  ...         y_pred = logistic_regression(X)
  ...         loss = cross_entropy(y, y_pred)
  ...     gradients = g.gradient(loss, [coef, intercept])
  ...     optimizer.apply_gradients(zip(gradients, [coef, intercept]))

Once initialized, the model is trained by iterating on balanced mini-batches of
data and minimizing the loss previously defined::

  >>> epochs = 10
  >>> for e in range(epochs):
  ...     y_pred = logistic_regression(X)
  ...     loss = cross_entropy(y, y_pred)
  ...     bal_acc = balanced_accuracy(y, y_pred)
  ...     print(f"epoch: {e}, loss: {loss:.3f}, balanced accuracy: {bal_acc:.3f}")
  ...     for i in range(steps_per_epoch):
  ...         X_batch, y_batch = next(training_generator)
  ...         run_optimization(X_batch, y_batch)
  epoch: 0, ...

.. _keras_generator:

Keras generator
~~~~~~~~~~~~~~~

Keras provides a higher-level API in which a model can be defined and trained
by calling its ``fit`` method. To illustrate, we will define a logistic
regression model::

  >>> from tensorflow import keras
  >>> y = keras.utils.to_categorical(y, n_classes)
  >>> model = keras.Sequential()
  >>> model.add(
  ...     keras.layers.Dense(
  ...         y.shape[1], input_dim=X.shape[1], activation='softmax'
  ...     )
  ... )
  >>> model.compile(
  ...     optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy']
  ... )

:func:`~imblearn.keras.balanced_batch_generator` creates a generator of
balanced mini-batches and returns it together with the number of mini-batches
to be generated per epoch::

  >>> from imblearn.keras import balanced_batch_generator
  >>> training_generator, steps_per_epoch = balanced_batch_generator(
  ...     X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42
  ... )

Then, ``fit`` can be called by passing the generator and the number of steps per epoch::

  >>> callback_history = model.fit(
  ...     training_generator,
  ...     steps_per_epoch=steps_per_epoch,
  ...     epochs=10,
  ...     verbose=1,
  ... )
  Epoch 1/10 ...

The second possibility is to use
:class:`~imblearn.keras.BalancedBatchGenerator`. An instance of this class can
be passed directly to ``fit``::

  >>> from imblearn.keras import BalancedBatchGenerator
  >>> training_generator = BalancedBatchGenerator(
  ...     X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42
  ... )
  >>> callback_history = model.fit(
  ...     training_generator,
  ...     steps_per_epoch=steps_per_epoch,
  ...     epochs=10,
  ...     verbose=1,
  ... )
  Epoch 1/10 ...

.. topic:: References

  * :ref:`sphx_glr_auto_examples_applications_porto_seguro_keras_under_sampling.py`


================================================
FILE: doc/model_selection.rst
================================================
.. _cross_validation:

================
Cross validation
================

.. currentmodule:: imblearn.model_selection


.. _instance_hardness_threshold_cv:

The term instance hardness is used in the literature to express the difficulty of
correctly classifying an instance. An instance for which the predicted probability
of the true class is low has a large instance hardness. The way these
hard-to-classify instances are distributed over the train and test sets in cross
validation has a significant effect on the test set performance metrics. The
:class:`~imblearn.model_selection.InstanceHardnessCV` splitter distributes samples
with large instance hardness equally over the folds, resulting in more robust cross
validation.

We will discuss instance hardness in this document and explain how to use the
:class:`~imblearn.model_selection.InstanceHardnessCV` splitter.

Instance hardness and average precision
=======================================

Instance hardness is defined as 1 minus the probability that the model assigns to
the true class of a sample:

.. math::

   H(x) = 1 - P(y|x)

In this equation :math:`H(x)` is the instance hardness for a sample with features
:math:`x` and true label :math:`y`, and :math:`P(y|x)` is the predicted probability
of that label given the features. If the true label is 0 and the model gives a
`predict_proba` output of [0.9, 0.1], the probability of the true class (0) is 0.9
and the instance hardness is `1-0.9=0.1`.
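
Following the view stated at the start of this section, where a low predicted probability of the true class means a high hardness, the sketch below estimates instance hardness from out-of-fold probabilities. The use of ``cross_val_predict`` is an assumption made for illustration: it ensures that no sample is scored by a model trained on it. The blob data mirror the data set built later on this page.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Worked example from the text: true label 0, predict_proba output [0.9, 0.1].
proba_example = np.array([[0.9, 0.1]])
print(1 - proba_example[0, 0])   # 0.1, up to floating-point rounding

X, y = make_blobs(n_samples=[950, 50], centers=((-3, 0), (3, 0)), random_state=10)
# Out-of-fold probabilities for every sample.
proba = cross_val_predict(LogisticRegression(), X, y, cv=5, method="predict_proba")
# Columns follow the sorted class labels (0, 1), so y can index them directly.
hardness = 1 - proba[np.arange(len(y)), y]
print(hardness.max())
```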

Samples with large instance hardness have a significant effect on the area under
the precision-recall curve, or average precision. In particular, samples with label 0
and large instance hardness (i.e. the model predicts label 1) strongly reduce the
average precision, as these points affect the left part of the precision-recall
curve, where the area is largest: the precision is lowered in the range of low
recall and high thresholds. When doing cross validation, e.g. for hyperparameter
tuning or recursive feature elimination, the random gathering of these points in
some folds introduces variance in the CV results, which deteriorates the robustness
of the cross validation task. The :class:`~imblearn.model_selection.InstanceHardnessCV`
splitter aims to distribute the samples with large instance hardness over the
folds in order to reduce this undesired variance. Note that this splitter should be
used to make model *selection* tasks, like hyperparameter tuning and feature
selection, more robust, but not for model *performance estimation*, for which you
also want to know the variance of performance to be expected in production.
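
One simple way to spread hard samples over the folds is to sort the samples by class and then by hardness, and to deal them out to the folds in turn. The sketch below illustrates this idea with a hypothetical helper ``hardness_aware_folds`` and random stand-in hardness values. The splitter in imbalanced-learn may differ in its details. The resulting fold ids can be handed to scikit-learn through ``PredefinedSplit``.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

def hardness_aware_folds(y, hardness, n_splits=5):
    """Hypothetical helper: within each class, samples ranked by hardness
    are dealt over the folds in turn."""
    order = np.lexsort((hardness, y))      # primary key: class, secondary: hardness
    folds = np.empty(len(y), dtype=int)
    folds[order] = np.arange(len(y)) % n_splits
    return folds

rng = np.random.RandomState(0)
y = np.array([0] * 50 + [1] * 10)
hardness = rng.rand(len(y))                # stand-in for estimated hardness values
folds = hardness_aware_folds(y, hardness)

# The five hardest samples of class 0 land in five different folds.
hardest = np.flatnonzero(y == 0)[np.argsort(hardness[y == 0])[-5:]]
print(np.sort(folds[hardest]))             # [0 1 2 3 4]

cv = PredefinedSplit(test_fold=folds)      # usable as ``cv=`` in cross_validate
print(cv.get_n_splits())                   # 5
```

Every fold receives the same number of samples from each class, so the hard samples cannot accumulate in a single test fold.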


Create imbalanced dataset with samples with large instance hardness
===================================================================

Let's start by creating a dataset to work with. We create a dataset with 5% class
imbalance using scikit-learn's :func:`~sklearn.datasets.make_blobs` function.

  >>> import numpy as np
  >>> from matplotlib import pyplot as plt
  >>> from sklearn.datasets import make_blobs
  >>> from imblearn.datasets import make_imbalance
  >>> random_state = 10
  >>> X, y = make_blobs(n_samples=[950, 50], centers=((-3, 0), (3, 0)),
  ...                   random_state=random_state)
  >>> plt.scatter(X[:, 0], X[:, 1], c=y)
  >>> plt.show()

.. image:: ./auto_examples/model_selection/images/sphx_glr_plot_instance_hardness_cv_001.png
   :target: ./auto_examples/model_selection/plot_instance_hardness_cv.html
   :align: center

Now we add some samples with large instance hardness:

  >>> X_hard, y_hard = make_blobs(n_samples=10, centers=((3, 0), (-3, 0)),
  ...                             cluster_std=1,
  ...                             random_state=random_state)
  >>> X = np.vstack((X, X_hard))
  >>> y = np.hstack((y, y_hard))
  >>> plt.scatter(X[:, 0], X[:, 1], c=y)
  >>> plt.show()

.. image:: ./auto_examples/model_selection/images/sphx_glr_plot_instance_hardness_cv_002.png
   :target: ./auto_examples/model_selection/plot_instance_hardness_cv.html
   :align: center

Assess cross validation performance variance using `InstanceHardnessCV` splitter
================================================================================

Then we take a :class:`~sklearn.linear_model.LogisticRegression` and assess the
cross validation performance using a :class:`~sklearn.model_selection.StratifiedKFold`
cv splitter and the :func:`~sklearn.model_selection.cross_validate` function.

  >>> from sklearn.linear_model import LogisticRegression
  >>> from sklearn.model_selection import StratifiedKFold, cross_validate
  >>> clf = LogisticRegression(random_state=random_state)
  >>> skf_cv = StratifiedKFold(n_splits=5, shuffle=True,
  ...                          random_state=random_state)
  >>> skf_result = cross_validate(clf, X, y, cv=skf_cv, scoring="average_precision")

Now, we do the same using an :class:`~imblearn.model_selection.InstanceHardnessCV`
splitter. We provide our classifier to the splitter to calculate instance hardness
and distribute samples with large instance hardness equally over the folds.

  >>> from imblearn.model_selection import InstanceHardnessCV
  >>> ih_cv = InstanceHardnessCV(estimator=clf, n_splits=5,
  ...                            random_state=random_state)
  >>> ih_result = cross_validate(clf, X, y, cv=ih_cv, scoring="average_precision")

When we plot the test scores for both cv splitters, we see that the variance using the
:class:`~imblearn.model_selection.InstanceHardnessCV` splitter is lower than for the
:class:`~sklearn.model_selection.StratifiedKFold` splitter.

  >>> plt.boxplot([skf_result['test_score'], ih_result['test_score']],
  ...               tick_labels=["StratifiedKFold", "InstanceHardnessCV"],
  ...               vert=False)
  >>> plt.xlabel('Average precision')
  >>> plt.tight_layout()

.. image:: ./auto_examples/model_selection/images/sphx_glr_plot_instance_hardness_cv_003.png
   :target: ./auto_examples/model_selection/plot_instance_hardness_cv.html
   :align: center

Be aware that the most important role of a cross-validation splitter is to simulate
the conditions that one will encounter in production. Therefore, if difficult samples
are likely to occur in production, one should use a cross-validation splitter that
emulates this situation. In our case, the
:class:`~sklearn.model_selection.StratifiedKFold` splitter did not distribute
the difficult samples evenly over the folds, which was likely a problem for our use case.


================================================
FILE: doc/over_sampling.rst
================================================
.. _over-sampling:

=============
Over-sampling
=============

.. currentmodule:: imblearn.over_sampling

A practical guide
=================

You can refer to
:ref:`sphx_glr_auto_examples_over-sampling_plot_comparison_over_sampling.py`.

.. _random_over_sampler:

Naive random over-sampling
--------------------------

One way to fight class imbalance is to generate new samples in the classes which
are under-represented. The most naive strategy is to generate new samples by
randomly sampling, with replacement, the currently available samples. The
:class:`RandomOverSampler` offers such a scheme::

   >>> from sklearn.datasets import make_classification
   >>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
   ...                            n_redundant=0, n_repeated=0, n_classes=3,
   ...                            n_clusters_per_class=1,
   ...                            weights=[0.01, 0.05, 0.94],
   ...                            class_sep=0.8, random_state=0)
   >>> from imblearn.over_sampling import RandomOverSampler
   >>> ros = RandomOverSampler(random_state=0)
   >>> X_resampled, y_resampled = ros.fit_resample(X, y)
   >>> from collections import Counter
   >>> print(sorted(Counter(y_resampled).items()))
   [(0, 4674), (1, 4674), (2, 4674)]

The augmented data set should be used instead of the original data set to train
a classifier::

  >>> from sklearn.linear_model import LogisticRegression
  >>> clf = LogisticRegression()
  >>> clf.fit(X_resampled, y_resampled)
  LogisticRegression(...)

In the figure below, we compare the decision functions of a classifier trained
using the over-sampled data set and the original data set.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_002.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center

As a result, the majority class does not take over the other classes during the
training process. Consequently, all classes are represented by the decision
function.

In addition, :class:`RandomOverSampler` can resample heterogeneous data
(e.g. containing some strings)::

  >>> import numpy as np
  >>> X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
  ...                     dtype=object)
  >>> y_hetero = np.array([0, 0, 1])
  >>> X_resampled, y_resampled = ros.fit_resample(X_hetero, y_hetero)
  >>> print(X_resampled)
  [['xxx' 1 1.0]
   ['yyy' 2 2.0]
   ['zzz' 3 3.0]
   ['zzz' 3 3.0]]
  >>> print(y_resampled)
  [0 0 1 1]

It also works with pandas DataFrames::

  >>> from sklearn.datasets import fetch_openml
  >>> df_adult, y_adult = fetch_openml(
  ...     'adult', version=2, as_frame=True, return_X_y=True)
  >>> df_adult.head()  # doctest: +SKIP
  >>> df_resampled, y_resampled = ros.fit_resample(df_adult, y_adult)
  >>> df_resampled.head()  # doctest: +SKIP

If repeating samples is an issue, the parameter `shrinkage` allows creating a
smoothed bootstrap. However, the original data needs to be numerical. The
`shrinkage` parameter controls the dispersion of the newly generated samples.
The figure below illustrates that the new samples no longer overlap once a
smoothed bootstrap is used. This way of generating a smoothed bootstrap is
also known as Random Over-Sampling Examples
(ROSE) :cite:`torelli2014rose`.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_003.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center

.. _smote_adasyn:

From random over-sampling to SMOTE and ADASYN
---------------------------------------------

Apart from the random sampling with replacement, there are two popular methods
to over-sample minority classes: (i) the Synthetic Minority Oversampling
Technique (SMOTE) :cite:`chawla2002smote` and (ii) the Adaptive Synthetic
(ADASYN) :cite:`he2008adasyn` sampling method. These algorithms can be used in
the same manner::

  >>> from imblearn.over_sampling import SMOTE, ADASYN
  >>> X_resampled, y_resampled = SMOTE().fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 4674), (1, 4674), (2, 4674)]
  >>> clf_smote = LogisticRegression().fit(X_resampled, y_resampled)
  >>> X_resampled, y_resampled = ADASYN().fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 4673), (1, 4662), (2, 4674)]
  >>> clf_adasyn = LogisticRegression().fit(X_resampled, y_resampled)

The figure below illustrates the main differences between the
over-sampling methods.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_004.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center

Ill-posed examples
------------------

While the :class:`RandomOverSampler` over-samples by duplicating some of
the original samples of the minority class, :class:`SMOTE` and :class:`ADASYN`
generate new samples by interpolation. However, the samples used to
interpolate/generate new synthetic samples differ. In fact, :class:`ADASYN`
focuses on generating samples next to the original samples which are wrongly
classified using a k-Nearest Neighbors classifier, while the basic
implementation of :class:`SMOTE` does not make any distinction between easy and
hard samples to classify using the nearest neighbors rule. Therefore, the
decision function found during training will differ among the algorithms.
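
The interpolation step can be sketched in a few lines of NumPy. A new sample is built as :math:`x_{new} = x_i + \lambda (x_{zi} - x_i)`, where :math:`x_i` is a minority sample, :math:`x_{zi}` is one of its k nearest minority neighbours and :math:`\lambda` is drawn uniformly. The sketch below is a simplified illustration under these assumptions: the hypothetical helper ``smote_sketch`` uses uniform seed selection, brute-force Euclidean neighbours and a single class, as in basic SMOTE. It is not the imbalanced-learn implementation. ADASYN would instead draw more seeds near hard-to-classify samples.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, random_state=0):
    """Hypothetical helper: SMOTE-style generation for a single minority class.

    Each new sample lies on the segment between a random minority sample
    (the "seed") and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(random_state)
    # Pairwise squared Euclidean distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbours per sample
    seeds = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[seeds, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    X_new = X_min[seeds] + lam * (X_min[chosen] - X_min[seeds])
    return X_new, seeds, chosen, lam

X_min = np.random.default_rng(42).normal(size=(20, 2))   # toy minority class
X_new, seeds, chosen, lam = smote_sketch(X_min, n_new=30)
print(X_new.shape)
```

Since every synthetic point lies between two existing minority samples, SMOTE can connect inliers with outliers when the nearest neighbours are far apart.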

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_005.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :align: center

The sampling particularities of these two algorithms can lead to some peculiar
behavior as shown below.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_006.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center

SMOTE variants
--------------

SMOTE might connect inliers and outliers while ADASYN might focus solely on
outliers which, in both cases, might lead to a sub-optimal decision
function. In this regard, SMOTE offers three additional options to generate
samples. Those methods focus on samples near the border of the optimal
decision function and will generate samples in the opposite direction of the
nearest neighbors class. Those variants are presented in the figure below.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_007.png
   :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
   :scale: 60
   :align: center


The :class:`BorderlineSMOTE` :cite:`han2005borderline`,
:class:`SVMSMOTE` :cite:`nguyen2009borderline`, and
:class:`KMeansSMOTE` :cite:`last2017oversampling` offer variants of the
SMOTE algorithm::

  >>> from imblearn.over_sampling import BorderlineSMOTE
  >>> X_resampled, y_resampled = BorderlineSMOTE().fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 4674), (1, 4674), (2, 4674)]

When dealing with mixed data types such as continuous and categorical
features, none of the presented methods (apart from :class:`RandomOverSampler`)
can deal with the categorical features. :class:`SMOTENC`
:cite:`chawla2002smote` is an extension of the :class:`SMOTE` algorithm in
which categorical data are treated differently::

  >>> # create a synthetic data set with continuous and categorical features
  >>> rng = np.random.RandomState(42)
  >>> n_samples = 50
  >>> X = np.empty((n_samples, 3), dtype=object)
  >>> X[:, 0] = rng.choice(['A', 'B', 'C'], size=n_samples).astype(object)
  >>> X[:, 1] = rng.randn(n_samples)
  >>> X[:, 2] = rng.randint(3, size=n_samples)
  >>> y = np.array([0] * 20 + [1] * 30)
  >>> print(sorted(Counter(y).items()))
  [(0, 20), (1, 30)]

In this data set, the first and last features are considered categorical.
One needs to provide this information to :class:`SMOTENC` via the parameter
``categorical_features``, either by passing the indices, the feature names
when `X` is a pandas DataFrame, or a boolean mask marking these features, or
by relying on `dtype` inference if the columns use
:class:`pandas.CategoricalDtype`::

  >>> from imblearn.over_sampling import SMOTENC
  >>> smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
  >>> X_resampled, y_resampled = smote_nc.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 30), (1, 30)]
  >>> print(X_resampled[-5:])
  [['A' 0.19... 2]
   ['B' -0.36... 2]
   ['B' 0.87... 2]
   ['B' 0.37... 2]
   ['B' 0.33... 2]]

The samples generated in the first and last columns therefore belong to the
categories originally present in the data; no interpolation is performed on
these features.

However, :class:`SMOTENC` only works when the data is a mix of numerical and
categorical features. If the data is made only of categorical features, one
can use the :class:`SMOTEN` variant :cite:`chawla2002smote`. The algorithm
changes in two ways:

* the nearest neighbors search does not rely on the Euclidean distance.
  Instead, the value difference metric (VDM), also implemented in the class
  :class:`~imblearn.metrics.pairwise.ValueDifferenceMetric`, is used.
* a new sample is generated where each feature value corresponds to the most
  common category seen in the neighboring samples belonging to the same class.
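
For a single categorical feature, the VDM compares two categories through the
way each of them is distributed over the classes. The sketch below assumes the
common per-feature form
:math:`\delta(v_1, v_2) = \sum_c |P(c \mid v_1) - P(c \mid v_2)|^k` with
:math:`k = 1`; the helper ``vdm_single_feature`` is hypothetical and does not
reproduce every option of the library class. It uses the same colors and
labels as the apple example that follows.

```python
from collections import Counter

import numpy as np


def vdm_single_feature(values, labels, v1, v2, k=1):
    """Hypothetical helper: value difference between categories v1 and v2.

    Computes sum over classes c of |P(c | v1) - P(c | v2)| ** k, where the
    conditional probabilities are estimated from co-occurrence counts.
    """
    values = np.asarray(values, dtype=object)
    labels = np.asarray(labels, dtype=object)
    classes = np.unique(labels)

    def cond_proba(v):
        # Class distribution among the samples taking the category v
        mask = values == v
        counts = Counter(labels[mask])
        return np.array([counts[c] / mask.sum() for c in classes])

    return float(np.sum(np.abs(cond_proba(v1) - cond_proba(v2)) ** k))


colors = ["green"] * 5 + ["red"] * 10 + ["blue"] * 7
fruit = (["apple"] * 5 + ["not apple"] * 3 + ["apple"] * 7
         + ["not apple"] * 5 + ["apple"] * 2)

print(round(vdm_single_feature(colors, fruit, "green", "red"), 4))   # 0.6
print(round(vdm_single_feature(colors, fruit, "red", "blue"), 4))    # 0.8286
print(round(vdm_single_feature(colors, fruit, "green", "blue"), 4))  # 1.4286
```

With these data, "red" is closer to "green" than "blue" is, because red
samples are mostly apples, like the green ones.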

Let's take the following example::

   >>> import numpy as np
   >>> X = np.array(["green"] * 5 + ["red"] * 10 + ["blue"] * 7,
   ...              dtype=object).reshape(-1, 1)
   >>> y = np.array(["apple"] * 5 + ["not apple"] * 3 + ["apple"] * 7 +
   ...              ["not apple"] * 5 + ["apple"] * 2, dtype=object)

We generate a dataset associating a color with being an apple or not an apple.
We strongly associate "green" and "red" with being an apple. Since the minority
class is "not apple", we expect the newly generated data to belong to the
category "blue"::

   >>> from imblearn.over_sampling import SMOTEN
   >>> sampler = SMOTEN(random_state=0)
   >>> X_res, y_res = sampler.fit_resample(X, y)
   >>> X_res[y.size:]
   array([['blue'],
           ['blue'],
           ['blue'],
           ['blue'],
           ['blue'],
           ['blue']], dtype=object)
   >>> y_res[y.size:]
   array(['not apple', 'not apple', 'not apple', 'not apple', 'not apple',
          'not apple'], dtype=object)

Mathematical formulation
========================

Sample generation
-----------------

Both :class:`SMOTE` and :class:`ADASYN` use the same algorithm to generate new
samples. Considering a sample :math:`x_i`, a new sample :math:`x_{new}` will be
generated considering its k nearest neighbors (corresponding to
``k_neighbors``). For instance, the 3 nearest neighbors are included in the
blue circle as illustrated in the figure below. Then, one of these nearest
neighbors, :math:`x_{zi}`, is selected and a sample is generated as follows:

.. math::

   x_{new} = x_i + \lambda \times (x_{zi} - x_i)

where :math:`\lambda` is a random number in the range :math:`[0, 1]`. This
interpolation will create a sample on the line segment between :math:`x_{i}`
and :math:`x_{zi}`, as illustrated in the image below:

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_illustration_generation_sample_001.png
   :target: ./auto_examples/over-sampling/plot_illustration_generation_sample.html
   :scale: 60
   :align: center
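
The generation step can be sketched with NumPy as follows. This is a minimal
illustration, not the library's implementation: the hypothetical helper
``smote_sample`` searches the ``k`` nearest neighbors of :math:`x_i` among the
minority samples with a brute-force Euclidean distance, picks :math:`x_{zi}`
at random, and draws :math:`\lambda` with ``Generator.uniform``, which samples
from the half-open interval :math:`[0, 1)`.

```python
import numpy as np


def smote_sample(X_min, i, k, rng):
    """Hypothetical helper: one SMOTE-style sample generated from X_min[i].

    The k nearest neighbors of X_min[i] are searched among the minority
    samples only (Euclidean distance, the sample itself excluded).
    """
    dist = np.linalg.norm(X_min - X_min[i], axis=1)
    dist[i] = np.inf                       # x_i is not its own neighbor
    neighbors = np.argsort(dist)[:k]       # indices of the k nearest neighbors
    z = rng.choice(neighbors)              # pick x_zi at random
    lam = rng.uniform(0.0, 1.0)            # lambda in [0, 1)
    return X_min[i] + lam * (X_min[z] - X_min[i])


rng = np.random.default_rng(0)
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
x_new = smote_sample(X_min, i=0, k=2, rng=rng)
print(x_new)
```

Because :math:`\lambda` lies between 0 and 1, the new sample always lies on the
segment joining :math:`x_i` and the selected neighbor.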

SMOTE-NC slightly changes the way a new sample is generated by treating the
categorical features specifically: the categories of a newly generated sample
are decided by picking the most frequent category among the nearest neighbors
used during the generation.

.. warning::
   Be aware that SMOTE-NC is not designed to work with only categorical data.
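
A minimal sketch of this rule, assuming the :math:`k` nearest neighbors of
:math:`x_i` have already been found (the distance SMOTE-NC uses over mixed
features is not reproduced here). The helper ``smotenc_sample`` is
hypothetical: it interpolates the continuous features as in SMOTE and takes,
column by column, the most frequent category among the neighbors.

```python
from collections import Counter

import numpy as np


def smotenc_sample(x_num, neighbors_num, neighbors_cat, rng):
    """Hypothetical helper: one SMOTE-NC style sample.

    x_num         -- continuous features of the seed sample x_i
    neighbors_num -- continuous features of its k nearest neighbors
    neighbors_cat -- categorical features of the same k neighbors
    """
    # Continuous part: usual SMOTE interpolation towards one random neighbor
    z = rng.integers(len(neighbors_num))
    lam = rng.uniform(0.0, 1.0)
    new_num = x_num + lam * (neighbors_num[z] - x_num)
    # Categorical part: most frequent category among the neighbors, per column
    # (ties go to the first category encountered; the library may differ)
    new_cat = [Counter(col).most_common(1)[0][0] for col in zip(*neighbors_cat)]
    return new_num, new_cat


rng = np.random.default_rng(0)
x_num = np.array([0.5])
neighbors_num = np.array([[0.1], [0.9], [0.4]])
neighbors_cat = [["A", 2], ["B", 2], ["B", 0]]
print(smotenc_sample(x_num, neighbors_num, neighbors_cat, rng))
```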

The other SMOTE variants and ADASYN differ from each other in how they select
the samples :math:`x_i` before generating the new samples.

The **regular** SMOTE algorithm (see :class:`SMOTE`) does not impose any rule:
any available :math:`x_i` can be randomly picked.

The **borderline** SMOTE (see :class:`BorderlineSMOTE` with the parameters
``kind='borderline-1'`` and ``kind='borderline-2'``) will classify each sample
:math:`x_i` as (i) noise (i.e. all nearest neighbors are from a different
class than the one of :math:`x_i`), (ii) in danger (i.e. at least half, but
not all, of the nearest neighbors are from a different class than
:math:`x_i`), or (iii) safe (i.e. fewer than half of the nearest neighbors are
from a different class than :math:`x_i`). **Borderline-1** and
**Borderline-2** SMOTE will use the samples *in danger* to generate new
samples. In **Borderline-1** SMOTE, :math:`x_{zi}` will belong to the same
class as the sample :math:`x_i`. On the contrary, **Borderline-2** SMOTE will
consider :math:`x_{zi}` which can be from any class.
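
A minimal sketch of the noise / danger / safe labelling, assuming a
brute-force Euclidean neighbor search over all samples and the thresholds of
the borderline SMOTE paper (danger when at least half, but not all, of the
``m`` neighbors belong to another class). The helper ``borderline_status`` and
its ``m`` argument (playing the role of ``m_neighbors``) are hypothetical.

```python
import numpy as np


def borderline_status(X, y, minority_label, m):
    """Hypothetical helper: label each minority sample as noise/danger/safe."""
    status = {}
    for i in np.flatnonzero(y == minority_label):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                       # a sample is not its own neighbor
        nn = np.argsort(dist)[:m]              # its m nearest neighbors
        n_other = np.sum(y[nn] != minority_label)
        if n_other == m:
            status[int(i)] = "noise"           # only other-class neighbors
        elif n_other >= m / 2:
            status[int(i)] = "danger"          # used as seeds by Borderline-1/2
        else:
            status[int(i)] = "safe"
    return status


X = np.array([[0.0], [1.0], [3.0], [4.0], [5.5], [20.0], [21.0], [22.0], [24.0]])
y = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0])
print(borderline_status(X, y, minority_label=1, m=3))
# {0: 'safe', 1: 'safe', 2: 'danger', 6: 'noise'}
```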

**SVM** SMOTE (see :class:`SVMSMOTE`) uses an SVM classifier to find support
vectors and generates samples considering them. Note that the ``C`` parameter
of the SVM classifier allows selecting more or fewer support vectors.
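
To see how ``C`` affects this selection, the sketch below fits scikit-learn's
``SVC`` on a toy imbalanced data set and keeps the minority samples among its
support vectors (``SVC.support_``). It only illustrates the support-vector
selection: the default RBF kernel, the toy data and the helper name are
assumptions, and the way :class:`SVMSMOTE` then generates samples from these
seeds is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC


def minority_support_vectors(X, y, minority_label, C=1.0):
    """Hypothetical helper: indices of minority samples that are support vectors."""
    svm = SVC(C=C).fit(X, y)                    # default RBF kernel
    support = svm.support_                      # indices of all support vectors
    return support[y[support] == minority_label]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 2)),   # majority class 0
               rng.normal(2.0, 1.0, size=(10, 2))])  # minority class 1
y = np.array([0] * 40 + [1] * 10)

for C in (0.1, 1.0, 100.0):
    idx = minority_support_vectors(X, y, minority_label=1, C=C)
    print(f"C={C}: {len(idx)} minority support vectors")
```

Smaller values of ``C`` soften the margin and typically increase the number of
support vectors.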

For both borderline and SVM SMOTE, a neighborhood is defined using the
parameter ``m_neighbors`` to decide if a sample is in danger, safe, or noise.

**KMeans** SMOTE (see :class:`KMeansSMOTE`) applies a KMeans clustering
before SMOTE. The clustering groups samples together, and new samples are
generated depending on the cluster density.

ADASYN works similarly to the regular SMOTE. However, the number of samples
generated for each :math:`x_i` is proportional to the number of samples which
are not from the same class as :math:`x_i` in a given neighborhood.
Therefore, more samples will be generated in areas where the nearest neighbor
rule is not respected. The parameter ``n_neighbors`` of :class:`ADASYN` is
equivalent to ``k_neighbors`` in :class:`SMOTE`.
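
The allocation rule can be sketched as follows, assuming the formulation of
the original ADASYN paper: each minority seed gets a weight equal to the
fraction of its nearest neighbors belonging to another class, the weights are
normalized to sum to one, and the total number of samples to generate is split
accordingly. The helper name and the rounding with ``np.rint`` are
illustrative assumptions, so the rounded counts may not always sum exactly to
the requested total.

```python
import numpy as np


def adasyn_allocation(X, y, minority_label, n_neighbors, n_to_generate):
    """Hypothetical helper: number of samples to generate from each minority seed."""
    minority_idx = np.flatnonzero(y == minority_label)
    ratios = []
    for i in minority_idx:
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                                  # exclude x_i itself
        nn = np.argsort(dist)[:n_neighbors]
        ratios.append(np.mean(y[nn] != minority_label))   # share of other classes
    ratios = np.asarray(ratios)
    weights = ratios / ratios.sum()                       # assumes some ratio > 0
    return minority_idx, np.rint(weights * n_to_generate).astype(int)


X = np.array([[0.0], [1.0], [3.0], [4.0], [5.5], [20.0], [21.0], [22.0], [24.0]])
y = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0])
seeds, counts = adasyn_allocation(X, y, minority_label=1, n_neighbors=3,
                                  n_to_generate=7)
print(seeds, counts)  # [0 1 2 6] [1 1 2 3]
```

The isolated minority sample at 21.0, surrounded only by majority samples,
receives the largest share, which illustrates how ADASYN can concentrate on
outliers.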

Multi-class management
----------------------

All algorithms can be used for multi-class as well as binary classification.
:class:`RandomOverSampler` does not require any inter-class information during
the sample generation. Therefore, each targeted class is resampled
independently. On the contrary, both :class:`ADASYN` and :class:`SMOTE` need
information regarding the neighborhood of each sample used for sample
generation. They use a one-vs-rest approach: each targeted class is selected
in turn and the necessary statistics are computed against the rest of the
data set, grouped into a single class.
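
The one-vs-rest bookkeeping can be sketched as below. It assumes a strategy
where every class except the majority is over-sampled up to the majority count
(the helper ``one_vs_rest_plan`` and this strategy choice are illustrative
assumptions). For each targeted class, it also builds the binary target
(targeted class versus the rest) against which neighborhood statistics would
be computed.

```python
from collections import Counter

import numpy as np


def one_vs_rest_plan(y):
    """Hypothetical helper: for each non-majority class, the number of samples
    to generate and the binary target (class vs. rest) used for statistics."""
    counts = Counter(y.tolist())
    majority = max(counts, key=counts.get)
    plan = {}
    for cls, n in counts.items():
        if cls == majority:
            continue                            # majority class is left untouched
        y_binary = (y == cls).astype(int)       # 1 for the targeted class, 0 for the rest
        plan[cls] = (counts[majority] - n, y_binary)
    return plan


y = np.array([0, 0, 0, 0, 1, 1, 2])
for cls, (n_new, y_binary) in one_vs_rest_plan(y).items():
    print(cls, n_new, y_binary)
```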


================================================
FILE: doc/references/combine.rst
================================================
.. _combine_ref:

Combination of over- and under-sampling methods
===============================================

.. automodule:: imblearn.combine
   :no-members:
   :no-inherited-members:

.. currentmodule:: imblearn.combine

.. autosummary::
   :toctree: generated/
   :template: class.rst

   SMOTEENN
   SMOTETomek


================================================
FILE: doc/references/datasets.rst
================================================
.. _datasets_ref:

Datasets
========

.. automodule:: imblearn.datasets
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn.datasets

.. autosummary::
   :toctree: generated/
   :template: function.rst

   make_imbalance
   fetch_datasets


================================================
FILE: doc/references/ensemble.rst
================================================
.. _ensemble_ref:

Ensemble methods
================

.. automodule:: imblearn.ensemble
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn.ensemble

Boosting algorithms
-------------------

.. autosummary::
   :toctree: generated/
   :template: class.rst

   EasyEnsembleClassifier
   RUSBoostClassifier

Bagging algorithms
------------------

.. autosummary::
   :toctree: generated/
   :template: class.rst

   BalancedBaggingClassifier
   BalancedRandomForestClassifier


================================================
FILE: doc/references/index.rst
================================================
.. _api:

#############
API reference
#############

This is the full API documentation of the `imbalanced-learn` toolbox.

.. toctree::
   :maxdepth: 3

   under_sampling
   over_sampling
   combine
   ensemble
   keras
   tensorflow
   miscellaneous
   pipeline
   metrics
   model_selection
   datasets
   utils


================================================
FILE: doc/references/keras.rst
================================================
.. _keras_ref:

Batch generator for Keras
=========================

.. automodule:: imblearn.keras
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn

.. autosummary::
   :toctree: generated/
   :template: class.rst

   keras.BalancedBatchGenerator

.. autosummary::
   :toctree: generated/
   :template: function.rst

   keras.balanced_batch_generator


================================================
FILE: doc/references/metrics.rst
================================================
.. _metrics_ref:

Metrics
=======

.. automodule:: imblearn.metrics
   :no-members:
   :no-inherited-members:

Classification metrics
----------------------
See the :ref:`metrics` section of the user guide for further details.

.. currentmodule:: imblearn.metrics

.. autosummary::
   :toctree: generated/
   :template: function.rst

   classification_report_imbalanced
   sensitivity_specificity_support
   sensitivity_score
   specificity_score
   geometric_mean_score
   macro_averaged_mean_absolute_error
   make_index_balanced_accuracy

Pairwise metrics
----------------
See the :ref:`pairwise_metrics` section of the user guide for further details.

.. automodule:: imblearn.metrics.pairwise
   :no-members:
   :no-inherited-members:

.. currentmodule:: imblearn.metrics.pairwise

.. autosummary::
   :toctree: generated/
   :template: class.rst

   ValueDifferenceMetric


================================================
FILE: doc/references/miscellaneous.rst
================================================
.. _misc_ref:

Miscellaneous
=============

imbalanced-learn provides some fast-prototyping tools.

.. currentmodule:: imblearn

.. autosummary::
   :toctree: generated/
   :template: class.rst

   FunctionSampler


================================================
FILE: doc/references/model_selection.rst
================================================
.. _model_selection_ref:

Model selection methods
=======================

.. automodule:: imblearn.model_selection
    :no-members:
    :no-inherited-members:

Cross-validation splitters
--------------------------

.. automodule:: imblearn.model_selection._split
   :no-members:
   :no-inherited-members:

.. currentmodule:: imblearn.model_selection

.. autosummary::
   :toctree: generated/
   :template: class.rst

   InstanceHardnessCV


================================================
FILE: doc/references/over_sampling.rst
================================================
.. _over_sampling_ref:

Over-sampling methods
=====================

.. automodule:: imblearn.over_sampling
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn.over_sampling

Basic over-sampling
-------------------

.. autosummary::
   :toctree: generated/
   :template: class.rst

   RandomOverSampler

SMOTE algorithms
----------------

.. autosummary::
   :toctree: generated/
   :template: class.rst

   SMOTE
   SMOTENC
   SMOTEN
   ADASYN
   BorderlineSMOTE
   KMeansSMOTE
   SVMSMOTE


================================================
FILE: doc/references/pipeline.rst
================================================
.. _pipeline_ref:

Pipeline
========

.. automodule:: imblearn.pipeline
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn.pipeline

.. autosummary::
   :toctree: generated/
   :template: class.rst

   Pipeline

.. autosummary::
   :toctree: generated/
   :template: function.rst

   make_pipeline


================================================
FILE: doc/references/tensorflow.rst
================================================
.. _tensorflow_ref:

Batch generator for TensorFlow
==============================

.. automodule:: imblearn.tensorflow
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn

.. autosummary::
   :toctree: generated/
   :template: function.rst

   tensorflow.balanced_batch_generator


================================================
FILE: doc/references/under_sampling.rst
================================================
.. _under_sampling_ref:

Under-sampling methods
======================

.. automodule:: imblearn.under_sampling
    :no-members:
    :no-inherited-members:

Prototype generation
--------------------

.. automodule:: imblearn.under_sampling._prototype_generation
   :no-members:
   :no-inherited-members:

.. currentmodule:: imblearn.under_sampling

.. autosummary::
   :toctree: generated/
   :template: class.rst

   ClusterCentroids

Prototype selection
-------------------

.. automodule:: imblearn.under_sampling._prototype_selection
   :no-members:
   :no-inherited-members:

.. currentmodule:: imblearn.under_sampling

.. autosummary::
   :toctree: generated/
   :template: class.rst

   CondensedNearestNeighbour
   EditedNearestNeighbours
   RepeatedEditedNearestNeighbours
   AllKNN
   InstanceHardnessThreshold
   NearMiss
   NeighbourhoodCleaningRule
   OneSidedSelection
   RandomUnderSampler
   TomekLinks


================================================
FILE: doc/references/utils.rst
================================================
Utilities
=========

.. automodule:: imblearn.utils
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn.utils

Validation checks used in samplers
----------------------------------

.. autosummary::
   :toctree: generated/
   :template: function.rst

   estimator_checks.parametrize_with_checks
   check_neighbors_object
   check_sampling_strategy
   check_target_type

Testing compatibility of your own sampler
-----------------------------------------

.. automodule:: imblearn.utils.estimator_checks
    :no-members:
    :no-inherited-members:

.. currentmodule:: imblearn.utils.estimator_checks

.. autosummary::
   :toctree: generated/
   :template: function.rst

   parametrize_with_checks


================================================
FILE: doc/sphinxext/LICENSE.txt
================================================
-------------------------------------------------------------------------------
    The files
    - numpydoc.py
    - autosummary.py
    - autosummary_generate.py
    - docscrape.py
    - docscrape_sphinx.py
    - phantom_import.py
    have the following license:

Copyright (C) 2008 Stefan van der Walt <stefan@mentat.za.net>, Pauli Virtanen <pav@iki.fi>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in
    the documentation and/or other materials provided with the
    distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

-------------------------------------------------------------------------------
    The files
    - compiler_unparse.py
    - comment_eater.py
    - traitsdoc.py
    have the following license:

This software is OSI Certified Open Source Software.
OSI Certified is a certification mark of the Open Source Initiative.

Copyright (c) 2006, Enthought, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

 * Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
 * Neither the name of Enthought, Inc. nor the names of its contributors may
   be used to endorse or promote products derived from this software without
   specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


-------------------------------------------------------------------------------
    The files
    - only_directives.py
    - plot_directive.py
    originate from Matplotlib (http://matplotlib.sf.net/) which has
    the following license:

Copyright (c) 2002-2008 John D. Hunter; All Rights Reserved.

1. This LICENSE AGREEMENT is between John D. Hunter (“JDH”), and the Individual or Organization (“Licensee”) accessing and otherwise using matplotlib software in source or binary form and its associated documentation.

2. Subject to the terms and conditions of this License Agreement, JDH hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use matplotlib 0.98.3 alone or in any derivative version, provided, however, that JDH’s License Agreement and JDH’s notice of copyright, i.e., “Copyright (c) 2002-2008 John D. Hunter; All Rights Reserved” are retained in matplotlib 0.98.3 alone or in any derivative version prepared by Licensee.

3. In the event Licensee prepares a derivative work that is based on or incorporates matplotlib 0.98.3 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to matplotlib 0.98.3.

4. JDH is making matplotlib 0.98.3 available to Licensee on an “AS IS” basis. JDH MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, JDH MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF MATPLOTLIB 0.98.3 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.

5. JDH SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF MATPLOTLIB 0.98.3 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING MATPLOTLIB 0.98.3, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.

6. This License Agreement will automatically terminate upon a material breach of its terms and conditions.

7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between JDH and Licensee. This License Agreement does not grant permission to use JDH trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party.

8. By copying, installing or otherwise using matplotlib 0.98.3, Licensee agrees to be bound by the terms and conditions of this License Agreement.


================================================
FILE: doc/sphinxext/MANIFEST.in
================================================
recursive-include tests *.py
include *.txt


================================================
FILE: doc/sphinxext/README.txt
================================================
=====================================
numpydoc -- Numpy's Sphinx extensions
=====================================

Numpy's documentation uses several custom extensions to Sphinx.  These
are shipped in this ``numpydoc`` package, in case you want to make use
of them in third-party projects.

The following extensions are available:

  - ``numpydoc``: support for the Numpy docstring format in Sphinx; it also
    adds the code description directives ``np-function``, ``np-cfunction``,
    etc., that support the Numpy docstring syntax.

  - ``numpydoc.traitsdoc``: For gathering documentation about Traits attributes.

  - ``numpydoc.plot_directive``: Adaptation of Matplotlib's ``plot::``
    directive. Note that this implementation may still undergo severe
    changes or eventually be deprecated.

  - ``numpydoc.only_directives``: (DEPRECATED)

  - ``numpydoc.autosummary``: (DEPRECATED) An ``autosummary::`` directive.
    Available in Sphinx 0.6.2 and (to-be) 1.0 as ``sphinx.ext.autosummary``,
    and the Sphinx 1.0 version is recommended over the one included in
    Numpydoc.


numpydoc
========

Numpydoc inserts a hook into Sphinx's autodoc that converts docstrings
following the Numpy/Scipy format to a form palatable to Sphinx.

Options
-------

The following options can be set in conf.py:

- numpydoc_use_plots: bool

  Whether to produce ``plot::`` directives for Examples sections that
  contain ``import matplotlib``.

- numpydoc_show_class_members: bool

  Whether to show all members of a class in the Methods and Attributes
  sections automatically.

- numpydoc_edit_link: bool  (DEPRECATED -- edit your HTML template instead)

  Whether to insert an edit link after docstrings.


================================================
FILE: doc/sphinxext/github_link.py
================================================
import inspect
import os
import subprocess
import sys
from functools import partial
from operator import attrgetter

REVISION_CMD = "git rev-parse --short HEAD"


def _get_git_revision():
    try:
        revision = subprocess.check_output(REVISION_CMD.split()).strip()
    except (subprocess.CalledProcessError, OSError):
        print("Failed to execute git to get revision")
        return None
    return revision.decode("utf-8")


def _linkcode_resolve(domain, info, package, url_fmt, revision):
    """Determine a link to online source for a class/method/function

    This is called by sphinx.ext.linkcode

    An example with a long-untouched module that everyone has
    >>> _linkcode_resolve('py', {'module': 'tty',
    ...                          'fullname': 'setraw'},
    ...                   package='tty',
    ...                   url_fmt='https://hg.python.org/cpython/file/'
    ...                           '{revision}/Lib/{package}/{path}#L{lineno}',
    ...                   revision='xxxx')
    'https://hg.python.org/cpython/file/xxxx/Lib/tty/tty.py#L18'
    """

    if revision is None:
        return
    if domain not in ("py", "pyx"):
        return
    if not info.get("module") or not info.get("fullname"):
        return

    class_name = info["fullname"].split(".")[0]
    module = __import__(info["module"], fromlist=[class_name])
    obj = attrgetter(info["fullname"])(module)

    # Unwrap the object to get the correct source
    # file in case that is wrapped by a decorator
    obj = inspect.unwrap(obj)

    try:
        fn = inspect.getsourcefile(obj)
    except Exception:
        fn = None
    if not fn:
        try:
            fn = inspect.getsourcefile(sys.modules[obj.__module__])
        except Exception:
            fn = None
    if not fn:
        return

    fn = os.path.relpath(fn, start=os.path.dirname(__import__(package).__file__))
    try:
        lineno = inspect.getsourcelines(obj)[1]
    except Exception:
        lineno = ""
    return url_fmt.format(revision=revision, package=package, path=fn, lineno=lineno)


def make_linkcode_resolve(package, url_fmt):
    """Returns a linkcode_resolve function for the given URL format

    revision is a git commit reference (hash or name)

    package is the name of the root module of the package

    url_fmt is along the lines of ('https://github.com/USER/PROJECT/'
                                   'blob/{revision}/{package}/'
                                   '{path}#L{lineno}')
    """
    revision = _get_git_revision()
    return partial(
        _linkcode_resolve, revision=revision, package=package, url_fmt=url_fmt
    )


================================================
FILE: doc/sphinxext/sphinx_issues.py
================================================
"""A Sphinx extension for linking to your project's issue tracker.

Copyright 2014 Steven Loria

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
"""
import re

from docutils import nodes, utils
from sphinx.util.nodes import split_explicit_title

__version__ = "1.2.0"
__author__ = "Steven Loria"
__license__ = "MIT"


def user_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Sphinx role for linking to a user profile. Defaults to linking to
    GitHub profiles, but the profile URIs can be configured via the
    ``issues_user_uri`` config value.
    Examples: ::
        :user:`sloria`
    Anchor text also works: ::
        :user:`Steven Loria <sloria>`
    """
    options = options or {}
    content = content or []
    has_explicit_title, title, target = split_explicit_title(text)

    target = utils.unescape(target).strip()
    title = utils.unescape(title).strip()
    config = inliner.document.settings.env.app.config
    if config.issues_user_uri:
        ref = config.issues_user_uri.format(user=target)
    else:
        ref = f"https://github.com/{target}"
    if has_explicit_title:
        text = title
    else:
        text = f"@{target}"

    link = nodes.reference(text=text, refuri=ref, **options)
    return [link], []


def cve_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Sphinx role for linking to a CVE on https://cve.mitre.org.
    Examples: ::
        :cve:`CVE-2018-17175`
    """
    options = options or {}
    content = content or []
    has_explicit_title, title, target = split_explicit_title(text)

    target = utils.unescape(target).strip()
    title = utils.unescape(title).strip()
    ref = f"https://cve.mitre.org/cgi-bin/cvename.cgi?name={target}"
    text = title if has_explicit_title else target
    link = nodes.reference(text=text, refuri=ref, **options)
    return [link], []


class IssueRole:
    EXTERNAL_REPO_REGEX = re.compile(r"^(\w+)/(.+)([#@])([\w]+)$")

    def __init__(
        self,
        uri_config_option,
        format_kwarg,
        github_uri_template,
        format_text=None,
    ):
        self.uri_config_option = uri_config_option
        self.format_kwarg = format_kwarg
        self.github_uri_template = github_uri_template
        self.format_text = format_text or self.default_format_text

    @staticmethod
    def default_format_text(issue_no):
        return f"#{issue_no}"

    def make_node(self, name, issue_no, config, options=None):
        name_map = {"pr": "pull", "issue": "issues", "commit": "commit"}
        options = options or {}
        repo_match = self.EXTERNAL_REPO_REGEX.match(issue_no)
        if repo_match:  # External repo
            username, repo, symbol, issue = repo_match.groups()
            if name not in name_map:
                raise ValueError(f"External repo linking not supported for :{name}:")
            path = name_map.get(name)
            ref = "https://github.com/{issues_github_path}/{path}/{n}".format(
                issues_github_path=f"{username}/{repo}",
                path=path,
                n=issue,
            )
            formatted_issue = self.format_text(issue).lstrip("#")
            text = f"{username}/{repo}{symbol}{formatted_issue}"
            link = nodes.reference(text=text, refuri=ref, **options)
            return link

        if issue_no not in ("-", "0"):
            uri_template = getattr(config, self.uri_config_option, None)
            if uri_template:
                ref = uri_template.format(**{self.format_kwarg: issue_no})
            elif config.issues_github_path:
                ref = self.github_uri_template.format(
                    issues_github_path=config.issues_github_path, n=issue_no
                )
            else:
                raise ValueError(
                    f"Neither {self.uri_config_option} nor issues_github_path is set"
                )
            issue_text = self.format_text(issue_no)
            link = nodes.reference(text=issue_text, refuri=ref, **options)
        else:
            link = None
        return link

    def __call__(
        self, name, rawtext, text, lineno, inliner, options=None, content=None
    ):
        options = options or {}
        content = content or []
        issue_nos = [each.strip() for each in utils.unescape(text).split(",")]
        config = inliner.document.settings.env.app.config
        ret = []
        for i, issue_no in enumerate(issue_nos):
            node = self.make_node(name, issue_no, config, options=options)
            ret.append(node)
            if i != len(issue_nos) - 1:
                sep = nodes.raw(text=", ", format="html")
                ret.append(sep)
        return ret, []


"""Sphinx role for linking to an issue. Must have
`issues_uri` or `issues_github_path` configured in ``conf.py``.
Examples: ::
    :issue:`123`
    :issue:`42,45`
    :issue:`sloria/konch#123`
"""
issue_role = IssueRole(
    uri_config_option="issues_uri",
    format_kwarg="issue",
    github_uri_template="https://github.com/{issues_github_path}/issues/{n}",
)

"""Sphinx role for linking to a pull request. Must have
`issues_pr_uri` or `issues_github_path` configured in ``conf.py``.
Examples: ::
    :pr:`123`
    :pr:`42,45`
    :pr:`sloria/konch#43`
"""
pr_role = IssueRole(
    uri_config_option="issues_pr_uri",
    format_kwarg="pr",
    github_uri_template="https://github.com/{issues_github_path}/pull/{n}",
)


def format_commit_text(sha):
    return sha[:7]


"""Sphinx role for linking to a commit. Must have
`issues_commit_uri` or `issues_github_path` configured in ``conf.py``.
Examples: ::
    :commit:`123abc456def`
    :commit:`sloria/konch@123abc456def`
"""
commit_role = IssueRole(
    uri_config_option="issues_commit_uri",
    format_kwarg="commit",
    github_uri_template="https://github.com/{issues_github_path}/commit/{n}",
    format_text=format_commit_text,
)


def setup(app):
    # Format template for issues URI
    # e.g. 'https://github.com/sloria/marshmallow/issues/{issue}'
    app.add_config_value("issues_uri", default=None, rebuild="html")
    # Format template for PR URI (formatted with the ``pr`` keyword)
    # e.g. 'https://github.com/sloria/marshmallow/pull/{pr}'
    app.add_config_value("issues_pr_uri", default=None, rebuild="html")
    # Format template for commit URI
    # e.g. 'https://github.com/sloria/marshmallow/commit/{commit}'
    app.add_config_value("issues_commit_uri", default=None, rebuild="html")
    # Shortcut for GitHub, e.g. 'sloria/marshmallow'
    app.add_config_value("issues_github_path", default=None, rebuild="html")
    # Format template for user profile URI
    # e.g. 'https://github.com/{user}'
    app.add_config_value("issues_user_uri", default=None, rebuild="html")
    app.add_role("issue", issue_role)
    app.add_role("pr", pr_role)
    app.add_role("user", user_role)
    app.add_role("commit", commit_role)
    app.add_role("cve", cve_role)
    return {
        "version": __version__,
        "parallel_read_safe": True,
        "parallel_write_safe": True,
    }


================================================
FILE: doc/under_sampling.rst
================================================
.. _under-sampling:

==============
Under-sampling
==============

.. currentmodule:: imblearn.under_sampling

One way of handling imbalanced datasets is to reduce the number of observations from
all classes but the minority class. The minority class is the one with the fewest
observations. The best-known algorithm in this group is random
undersampling, where samples from the targeted classes are removed at random.

Many other algorithms can also reduce the number of observations in the dataset.
Based on their undersampling strategy, these algorithms can be grouped into:

- Prototype generation methods.
- Prototype selection methods.

And within the latter, we find:

- Controlled undersampling
- Cleaning methods

We will discuss the different algorithms throughout this document.

See also
:ref:`sphx_glr_auto_examples_under-sampling_plot_comparison_under_sampling.py`.

.. _cluster_centroids:

Prototype generation
====================

Given an original data set :math:`S`, prototype generation algorithms will
generate a new set :math:`S'` where :math:`|S'| < |S|` and :math:`S' \not\subset
S`. In other words, prototype generation techniques will reduce the number of
samples in the targeted classes but the remaining samples are generated --- and
not selected --- from the original set.

:class:`ClusterCentroids` makes use of K-means to reduce the number of
samples. Each targeted class is therefore represented by the centroids found by
K-means instead of by its original samples::

  >>> from collections import Counter
  >>> from sklearn.datasets import make_classification
  >>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
  ...                            n_redundant=0, n_repeated=0, n_classes=3,
  ...                            n_clusters_per_class=1,
  ...                            weights=[0.01, 0.05, 0.94],
  ...                            class_sep=0.8, random_state=0)
  >>> print(sorted(Counter(y).items()))
  [(0, 64), (1, 262), (2, 4674)]
  >>> from imblearn.under_sampling import ClusterCentroids
  >>> cc = ClusterCentroids(random_state=0)
  >>> X_resampled, y_resampled = cc.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 64), (1, 64), (2, 64)]

The figure below illustrates such under-sampling.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_comparison_under_sampling_001.png
   :target: ./auto_examples/under-sampling/plot_comparison_under_sampling.html
   :scale: 60
   :align: center

:class:`ClusterCentroids` offers an efficient way to represent a data cluster
with a reduced number of samples. Keep in mind that this method requires your
data to be grouped into clusters. In addition, the number of centroids
should be set such that the under-sampled clusters are representative of the
original ones.
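
To make the idea concrete, here is a minimal NumPy-only sketch of centroid-based
prototype generation. It is not the library's implementation: it hand-rolls a few
Lloyd iterations instead of using scikit-learn's K-means estimator, leaves the minority
class untouched, and summarises every other class with as many centroids as the
minority class has samples. The names ``kmeans_centroids`` and
``cluster_centroids_sketch`` are hypothetical.

```python
import numpy as np


def kmeans_centroids(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means that returns only the cluster centroids (sketch)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with distinct, randomly chosen samples.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each sample to its closest centroid (squared Euclidean distance).
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = sq_dist.argmin(axis=1)
        # Move each centroid to the mean of its members; empty clusters stay put.
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids


def cluster_centroids_sketch(X, y, seed=0):
    """Summarise every non-minority class by k-means centroids (hypothetical helper)."""
    classes, counts = np.unique(y, return_counts=True)
    n_target = counts.min()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        X_cls = X[y == cls]
        if count > n_target:
            # Generate n_target prototypes instead of selecting original samples.
            X_cls = kmeans_centroids(X_cls, n_target, seed=seed)
        X_parts.append(X_cls)
        y_parts.append(np.full(len(X_cls), cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)), rng.normal(3, 1, size=(10, 2))])
y = np.array([0] * 100 + [1] * 10)
X_res, y_res = cluster_centroids_sketch(X, y)
print(np.bincount(y_res))  # [10 10]
```

Because centroids are averages of several samples, the rows returned for the majority
class are generally new points rather than original observations.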

.. warning::

   :class:`ClusterCentroids` supports sparse matrices. However, the generated samples
   are not specifically sparse. Therefore, even if the resulting matrix is sparse,
   the algorithm will be inefficient in this regard.

Prototype selection
===================

Prototype selection algorithms will select samples from the original set :math:`S`,
generating a dataset :math:`S'`, where :math:`|S'| < |S|` and :math:`S' \subset S`. In
other words, :math:`S'` is a subset of :math:`S`.
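
The distinction between the two families can be checked directly. The sketch below
(plain NumPy; the helper ``is_subset_of_rows`` is hypothetical) keeps a random subset
of rows, which is prototype selection, and averages disjoint pairs of rows, which is a
simple form of prototype generation, and then tests whether each result is contained
in the original set.

```python
import numpy as np


def is_subset_of_rows(X_new, X):
    """Return True if every row of X_new appears verbatim as a row of X."""
    original_rows = {tuple(row) for row in X}
    return all(tuple(row) in original_rows for row in X_new)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Prototype selection: keep 10 of the original rows, so S' is a subset of S.
X_selected = X[rng.choice(len(X), size=10, replace=False)]

# Prototype generation: average rows (0, 1), (2, 3), ... which creates new points.
X_generated = X[:20].reshape(10, 2, 2).mean(axis=1)

print(is_subset_of_rows(X_selected, X))   # True
print(is_subset_of_rows(X_generated, X))  # False
```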

Prototype selection algorithms can be divided into two groups: (i) controlled
under-sampling techniques and (ii) cleaning under-sampling techniques.

Controlled under-sampling methods reduce the number of observations in the majority
class or classes to an arbitrary number of samples specified by the user. Typically,
they reduce the number of observations to the number of samples observed in the
minority class.

In contrast, cleaning under-sampling techniques "clean" the feature space by removing
either "noisy" or "too easy to classify" observations, depending on the method. The
final number of observations in each class varies with the cleaning method and cannot be
specified by the user.

.. _controlled_under_sampling:

Controlled under-sampling techniques
------------------------------------

Controlled under-sampling techniques reduce the number of observations from the
targeted classes to a number specified by the user.

Random under-sampling
^^^^^^^^^^^^^^^^^^^^^

:class:`RandomUnderSampler` is a fast and easy way to balance the data by
randomly selecting a subset of data for the targeted classes::

  >>> from imblearn.under_sampling import RandomUnderSampler
  >>> rus = RandomUnderSampler(random_state=0)
  >>> X_resampled, y_resampled = rus.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 64), (1, 64), (2, 64)]

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_comparison_under_sampling_002.png
   :target: ./auto_examples/under-sampling/plot_comparison_under_sampling.html
   :scale: 60
   :align: center
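
Conceptually, random under-sampling draws, for every targeted class, as many indices as
the minority class contains. The following NumPy sketch illustrates this under the
assumption of the default behaviour shown above (every class except the minority is
reduced to the minority count). ``random_under_sample`` is a hypothetical helper, not
part of imbalanced-learn; it also returns the kept indices for inspection.

```python
import numpy as np


def random_under_sample(X, y, replacement=False, seed=0):
    """Keep, for every class, as many random samples as the minority class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    kept = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if count == n_min:
            # The minority class is left untouched.
            kept.append(idx)
        else:
            # Draw n_min indices of this class at random.
            kept.append(rng.choice(idx, size=n_min, replace=replacement))
    kept = np.concatenate(kept)
    return X[kept], y[kept], kept


y = np.array([0] * 5 + [1] * 20 + [2] * 50)
X = np.arange(2 * len(y)).reshape(-1, 2)
X_res, y_res, kept = random_under_sample(X, y)
print(np.bincount(y_res))  # [5 5 5]
```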

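:class:`RandomUnderSampler` allows bootstrapping the data, i.e. sampling with
replacement, by setting ``replacement`` to ``True``. When there are multiple classes,
each targeted class is under-sampled independently. With replacement, the same sample
can be drawn several times, so the resampled data contains fewer distinct rows::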

  >>> import numpy as np
  >>> print(np.vstack([tuple(row) for row in X_resampled]).shape)
  (192, 2)
  >>> rus = RandomUnderSampler(random_state=0, replacement=True)
  >>> X_resampled, y_resampled = rus.fit_resample(X, y)
  >>> print(np.vstack(np.unique([tuple(row) for row in X_resampled], axis=0)).shape)
  (181, 2)
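
The drop in distinct rows comes from indices drawn more than once. The NumPy sketch
below, on a hypothetical class of 20 samples, shows the same effect with
``numpy.random.Generator.choice``: drawing without replacement yields distinct indices,
whereas a bootstrap draw of the same size repeats some indices and misses others.

```python
import numpy as np

rng = np.random.default_rng(0)
class_idx = np.arange(20)  # indices of a small, hypothetical class

# Without replacement, every index appears at most once.
no_repl = rng.choice(class_idx, size=20, replace=False)
# With replacement (bootstrap), the same index can be drawn several times,
# so some original samples are duplicated and others are left out.
boot = rng.choice(class_idx, size=20, replace=True)

print(len(np.unique(no_repl)))  # 20
print(len(np.unique(boot)))     # fewer than 20
```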

:class:`RandomUnderSampler` handles heterogeneous data types, i.e. numerical,
categorical, dates, etc.::

  >>> X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
  ...                     dtype=object)
  >>> y_hetero = np.array([0, 0, 1])
  >>> X_resampled, y_resampled = rus.fit_resample(X_hetero, y_hetero)
  >>> print(X_resampled)
  [['xxx' 1 1.0]
   ['zzz' 3 3.0]]
  >>> print(y_resampled)
  [0 1]

:class:`RandomUnderSampler` also supports pandas dataframes as input for
undersampling::

  >>> from sklearn.datasets import fetch_openml
  >>> df_adult, y_adult = fetch_openml(
  ...     'adult', version=2, as_frame=True, return_X_y=True)
  >>> df_adult.head()  # doctest: +SKIP
  >>> df_resampled, y_resampled = rus.fit_resample(df_adult, y_adult)
  >>> df_resampled.head()  # doctest: +SKIP

:class:`NearMiss` uses heuristic rules to select samples
:cite:`mani2003knn`. :class:`NearMiss` implements three different heuristics,
which can be selected with the parameter ``version``::

  >>> from imblearn.under_sampling import NearMiss
  >>> nm1 = NearMiss(version=1)
  >>> X_resampled_nm1, y_resampled = nm1.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 64), (1, 64), (2, 64)]

As detailed in the next section, the :class:`NearMiss` heuristic rules are
based on a nearest-neighbors algorithm. Therefore, the parameters ``n_neighbors``
and ``n_neighbors_ver3`` accept either an integer or an estimator derived from
scikit-learn's ``KNeighborsMixin``. The former parameter is used to compute the
average distance to the neighbors, while the latter is used for the pre-selection
of the samples of interest.

Mathematical formulation
^^^^^^^^^^^^^^^^^^^^^^^^

Let *positive samples* be the samples belonging to the targeted class to be
under-sampled. *Negative samples* refer to the samples from the minority class
(i.e., the most under-represented class).

NearMiss-1 selects the positive samples for which the average distance
to the :math:`N` closest samples of the negative class is the smallest.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_illustration_nearmiss_001.png
   :target: ./auto_examples/under-sampling/plot_illustration_nearmiss.html
   :scale: 60
   :align: center
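
A brute-force NumPy sketch of the NearMiss-1 rule follows, assuming Euclidean distance
and that the positive class is reduced to the size of the negative class. The helper
names are hypothetical, and the library's own implementation, which relies on a
nearest-neighbors estimator, may differ in details such as tie-breaking.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


def nearmiss1_select(X_pos, X_neg, n_neighbors=3, n_keep=None):
    """Indices of the positive samples kept by the NearMiss-1 rule (sketch)."""
    n_keep = len(X_neg) if n_keep is None else n_keep
    # Distances from each positive sample to the negatives, closest first.
    dist = np.sort(pairwise_dist(X_pos, X_neg), axis=1)
    # Average distance to the N closest negative samples.
    avg = dist[:, :n_neighbors].mean(axis=1)
    # Keep the positive samples with the smallest average distance.
    return np.argsort(avg, kind="stable")[:n_keep]


X_neg = np.array([[0.0, 0.0], [10.0, 0.0]])
X_pos = np.array([[1.0, 0.0], [5.0, 0.0], [8.0, 0.0]])
print(nearmiss1_select(X_pos, X_neg, n_neighbors=1))  # [0 2]
```

The two positive samples sitting right next to a negative sample are kept, while the
one halfway between the negatives is discarded.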

NearMiss-2 selects the positive samples for which the average distance to the
:math:`N` farthest samples of the negative class is the smallest.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_illustration_nearmiss_002.png
   :target: ./auto_examples/under-sampling/plot_illustration_nearmiss.html
   :scale: 60
   :align: center

NearMiss-3 is a two-step algorithm. First, for each negative sample, its
:math:`M` nearest neighbors are kept. Then, the positive samples selected
are those for which the average distance to the :math:`N` nearest neighbors
is the largest.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_illustration_nearmiss_003.png
   :target: ./auto_examples/under-sampling/plot_illustration_nearmiss.html
   :scale: 60
   :align: center
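
The two steps can be sketched in NumPy as follows. This follows the textual
description above rather than the library's code: ``M`` and ``N`` map to the
hypothetical arguments ``n_neighbors_ver3`` and ``n_neighbors``, distances are
Euclidean, and if fewer candidates than ``n_keep`` survive step 1, fewer samples are
returned.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


def nearmiss3_select(X_pos, X_neg, n_neighbors=3, n_neighbors_ver3=3, n_keep=None):
    """Indices of the positive samples kept by the NearMiss-3 rule (sketch)."""
    n_keep = len(X_neg) if n_keep is None else n_keep
    # Step 1: for every negative sample, short-list its M nearest positive samples.
    d_neg_pos = pairwise_dist(X_neg, X_pos)
    nearest = np.argsort(d_neg_pos, axis=1, kind="stable")[:, :n_neighbors_ver3]
    candidates = np.unique(nearest)
    # Step 2: among the candidates, keep those whose average distance to their
    # N nearest negative samples is the largest.
    d_cand_neg = np.sort(pairwise_dist(X_pos[candidates], X_neg), axis=1)
    avg = d_cand_neg[:, :n_neighbors].mean(axis=1)
    order = np.argsort(-avg, kind="stable")
    return candidates[order[:n_keep]]


X_neg = np.array([[0.0, 0.0], [10.0, 0.0]])
X_pos = np.array([[1.0, 0.0], [2.0, 0.0], [5.0, 0.0], [8.0, 0.0], [30.0, 0.0]])
print(nearmiss3_select(X_pos, X_neg, n_neighbors=1, n_neighbors_ver3=1))  # [3 0]
```

The far-away point ``[30, 0]`` is never short-listed in step 1, which illustrates why
the pre-selection makes this variant less sensitive to outliers.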

In the next example, the different :class:`NearMiss` variants are applied to the
previous toy example. The decision functions obtained in each case are
different.

When under-sampling a specific class, NearMiss-1 can be affected by the presence
of noise. Indeed, samples of the targeted class will be selected around the noisy
samples, as is the case for the yellow class in the illustration below. In the
normal case, however, samples next to the boundaries will be selected. NearMiss-2
does not have this effect since it focuses on the farthest rather than on the
nearest samples. Noise may nevertheless alter its sampling as well, mainly in the
presence of marginal outliers. NearMiss-3 is probably the version least affected
by noise, due to its first-step sample selection.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_comparison_under_sampling_003.png
   :target: ./auto_examples/under-sampling/plot_comparison_under_sampling.html
   :scale: 60
   :align: center

Cleaning under-sampling techniques
----------------------------------

Cleaning under-sampling methods "clean" the feature space by removing
either "noisy" observations or observations that are "too easy to classify", depending
on the method. The final number of observations in each targeted class varies with the
cleaning method and cannot be specified by the user.

.. _tomek_links:

Tomek's links
^^^^^^^^^^^^^

A Tomek's link exists when two samples from different classes are closest neighbors to
each other.

Mathematically, a Tomek's link between two samples from different classes :math:`x`
and :math:`y` is defined such that for any sample :math:`z`:

.. math::

   d(x, y) < d(x, z) \text{ and } d(x, y) < d(y, z)

where :math:`d(.)` is the distance between the two samples.

:class:`TomekLinks` detects and removes Tomek's links :cite:`tomek1976two`. The
underlying idea is that Tomek's links are noisy or hard to classify observations and
would not help the algorithm find a suitable discrimination boundary.
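
Ignoring distance ties, the definition above is equivalent to saying that :math:`x`
and :math:`y` are each other's nearest neighbour and carry different labels. A
brute-force NumPy sketch of that check (the helper ``tomek_links`` is hypothetical
and uses Euclidean distance) could look like this:

```python
import numpy as np


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, forming Tomek's links (sketch)."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    nn = dist.argmin(axis=1)        # index of each sample's nearest neighbour
    links = []
    for i, j in enumerate(nn):
        # Mutual nearest neighbours with different labels form a link.
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links


X = np.array([[0.0, 0.0], [0.4, 0.0], [3.0, 0.0], [3.5, 0.0], [6.0, 0.0]])
y = np.array([0, 1, 0, 0, 0])
print(tomek_links(X, y))  # [(0, 1)]
```

Samples 2 and 3 are also mutual nearest neighbours, but they share a label, so they do
not form a link.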

In the following figure, a Tomek's link between an observation of class :math:`+` and
class :math:`-` is highlighted in green:

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_001.png
   :target: ./auto_examples/under-sampling/plot_illustration_tomek_links.html
   :scale: 60
   :align: center

When :class:`TomekLinks` finds a Tomek's link, it can remove either the sample from the
majority class or both samples. The parameter ``sampling_strategy`` controls which samples
from the link will be removed. By default (i.e., ``sampling_strategy='auto'``), it
removes the sample from the majority class. Both samples, i.e. the one from the majority
class and the one from the minority class, can be removed by setting ``sampling_strategy``
to ``'all'``.

The following figure illustrates this behaviour: on the left, only the sample from the
majority class is removed, whereas on the right, the entire Tomek's link is removed.

.. image:: ./auto_examples/under-sampling/images/sphx_glr_plot_illustration_tomek_links_002.png
   :target: ./auto_examples/under-sampling/plot_illustration_tomek_links.html
   :scale: 60
   :align: center
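
The effect of the two settings can be reproduced with a small NumPy sketch. It assumes
Euclidean distance and interprets ``"auto"`` as "drop every link member that is not
from the minority class". ``remove_tomek_links`` and its ``strategy`` argument are
hypothetical stand-ins for the estimator and its ``sampling_strategy`` parameter.

```python
import numpy as np


def remove_tomek_links(X, y, strategy="auto"):
    """Drop Tomek's-link members: non-minority side only ("auto") or both ("all")."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    nn = dist.argmin(axis=1)
    idx = np.arange(len(y))
    # A sample belongs to a link if it and its nearest neighbour are mutual
    # nearest neighbours with different labels.
    in_link = (nn[nn] == idx) & (y[nn] != y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    if strategy == "auto":
        to_remove = in_link & (y != minority)  # keep the minority member
    elif strategy == "all":
        to_remove = in_link                    # remove the whole link
    else:
        raise ValueError("strategy must be 'auto' or 'all'")
    return X[~to_remove], y[~to_remove]


X = np.array([[0.0, 0.0], [0.4, 0.0], [3.0, 0.0], [3.5, 0.0], [6.0, 0.0]])
y = np.array([0, 1, 0, 0, 0])
print(remove_tomek_links(X, y, "auto")[1])  # [1 0 0 0]
print(remove_tomek_links(X, y, "all")[1])   # [0 0 0]
```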

.. _edited_nearest_neighbors:

Editing data using nearest neighbours
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Edited nearest neighbours
~~~~~~~~~~~~~~~~~~~~~~~~~

The edited nearest neighbours methodology uses K-Nearest Neighbours to identify the
neighbours of the targeted class samples, and then removes observations if any or most
of their neighbours are from a different class :cite:`wilson1972asymptotic`.

:class:`EditedNearestNeighbours` carries out the following steps:

1. Fit a K-nearest neighbours model on the entire dataset.
2. Find the K closest neighbours of each observation (only for the targeted classes).
3. Remove an observation if any or most of its neighbours belong to a different class.
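
The three steps can be sketched with brute-force distances in NumPy. This is an
illustration under stated assumptions, not the library's code: Euclidean distance
replaces a fitted K-nearest neighbours estimator, only non-minority classes are
edited, ``K = 3``, and ties in the ``"mode"`` vote go to the smallest label. The
function name is hypothetical.

```python
import numpy as np


def edited_nearest_neighbours(X, y, n_neighbors=3, kind_sel="all"):
    """Sketch of ENN cleaning; the minority class is never edited."""
    # Step 1: pairwise distances on the full dataset stand in for a fitted K-NN.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each sample from its own neighbourhood
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    keep = np.ones(len(y), dtype=bool)
    for i in range(len(y)):
        if y[i] == minority:
            continue  # only the targeted (non-minority) classes are edited
        # Step 2: labels of the K closest neighbours of sample i.
        neigh_labels = y[np.argsort(dist[i], kind="stable")[:n_neighbors]]
        # Step 3: keep the sample only if its neighbourhood agrees with it.
        if kind_sel == "all":
            keep[i] = np.all(neigh_labels == y[i])
        elif kind_sel == "mode":
            values, freq = np.unique(neigh_labels, return_counts=True)
            keep[i] = values[freq.argmax()] == y[i]
        else:
            raise ValueError("kind_sel must be 'all' or 'mode'")
    return X[keep], y[keep]


X = np.array([[v, 0.0] for v in [-4.2, -3.0, 0.0, 1.1, 2.3, 3.6, 5.0, 2.9, 7.0]])
y = np.array([0] * 7 + [1] * 2)
_, y_all = edited_nearest_neighbours(X, y, kind_sel="all")
_, y_mode = edited_nearest_neighbours(X, y, kind_sel="mode")
print(np.bincount(y_all), np.bincount(y_mode))  # [2 2] [6 2]
```

On this toy data, requiring all neighbours to agree removes every majority sample
that has a minority sample among its three neighbours, whereas the majority vote only
removes the sample whose neighbourhood is mostly minority.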

The corresponding code is::

  >>> sorted(Counter(y).items())
  [(0, 64), (1, 262), (2, 4674)]
  >>> from imblearn.under_sampling import EditedNearestNeighbours
  >>> enn = EditedNearestNeighbours()
  >>> X_resampled, y_resampled = enn.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 64), (1, 213), (2, 4568)]


To paraphrase step 3, :class:`EditedNearestNeighbours` retains an observation from a
targeted class when **most** or **all** of its neighbours belong to the same class as
the observation. This behaviour is controlled by setting ``kind_sel='mode'`` or
``kind_sel='all'``, respectively. Hence, ``kind_sel='all'`` is less conservative than
``kind_sel='mode'`` and results in the removal of more samples::

  >>> enn = EditedNearestNeighbours(kind_sel="all")
  >>> X_resampled, y_resampled = enn.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 64), (1, 213), (2, 4568)]
  >>> enn = EditedNearestNei


SYMBOL INDEX (786 symbols across 104 files)

FILE: conftest.py
  function pytest_runtest_setup (line 20) | def pytest_runtest_setup(item):

FILE: doc/conf.py
  function generate_min_dependency_table (line 263) | def generate_min_dependency_table(app):
  function generate_min_dependency_substitutions (line 313) | def generate_min_dependency_substitutions(app):
  function setup (line 333) | def setup(app):

FILE: doc/sphinxext/github_link.py
  function _get_git_revision (line 11) | def _get_git_revision():
  function _linkcode_resolve (line 20) | def _linkcode_resolve(domain, info, package, url_fmt, revision):
  function make_linkcode_resolve (line 70) | def make_linkcode_resolve(package, url_fmt):

FILE: doc/sphinxext/sphinx_issues.py
  function user_role (line 31) | def user_role(name, rawtext, text, lineno, inliner, options=None, conten...
  function cve_role (line 60) | def cve_role(name, rawtext, text, lineno, inliner, options=None, content...
  class IssueRole (line 77) | class IssueRole:
    method __init__ (line 80) | def __init__(
    method default_format_text (line 93) | def default_format_text(issue_no):
    method make_node (line 96) | def make_node(self, name, issue_no, config, options=None):
    method __call__ (line 133) | def __call__(
  function format_commit_text (line 177) | def format_commit_text(sha):
  function setup (line 195) | def setup(app):

FILE: examples/api/plot_sampling_strategy_usage.py
  function ratio_multiplier (line 179) | def ratio_multiplier(y):

FILE: examples/applications/plot_outlier_rejections.py
  function plot_scatter (line 31) | def plot_scatter(X, y, title):
  function outlier_rejection (line 90) | def outlier_rejection(X, y):

FILE: examples/applications/porto_seguro_keras_under_sampling.py
  function convert_float64 (line 58) | def convert_float64(X):
  function make_model (line 105) | def make_model(n_features):
  function timeit (line 137) | def timeit(f):
  function fit_predict_imbalanced_model (line 160) | def fit_predict_imbalanced_model(X_train, y_train, X_test, y_test):
  function fit_predict_balanced_model (line 180) | def fit_predict_balanced_model(X_train, y_train, X_test, y_test):

FILE: examples/combine/plot_comparison_combine.py
  function plot_resampling (line 59) | def plot_resampling(X, y, sampler, ax):
  function plot_decision_function (line 76) | def plot_decision_function(X, y, clf, ax):

FILE: examples/datasets/plot_make_imbalance.py
  function ratio_func (line 60) | def ratio_func(y, multiplier, minority_class):

FILE: examples/ensemble/plot_bagging_classifier.py
  function roughly_balanced_bagging (line 129) | def roughly_balanced_bagging(X, y, replace=False):

FILE: examples/over-sampling/plot_comparison_over_sampling.py
  function create_dataset (line 31) | def create_dataset(
  function plot_resampling (line 58) | def plot_resampling(X, y, sampler, ax, title=None):
  function plot_decision_function (line 76) | def plot_decision_function(X, y, clf, ax, title=None):

FILE: examples/under-sampling/plot_comparison_under_sampling.py
  function create_dataset (line 30) | def create_dataset(
  function plot_resampling (line 57) | def plot_resampling(X, y, sampler, ax, title=None):
  function plot_decision_function (line 75) | def plot_decision_function(X, y, clf, ax, title=None):

FILE: examples/under-sampling/plot_illustration_nearmiss.py
  function make_plot_despine (line 26) | def make_plot_despine(ax):

FILE: examples/under-sampling/plot_illustration_tomek_links.py
  function make_plot_despine (line 26) | def make_plot_despine(ax):

FILE: imblearn/__init__.py
  class LazyLoader (line 73) | class LazyLoader(types.ModuleType):
    method __init__ (line 81) | def __init__(self, local_name, parent_module_globals, name, warning=No...
    method _load (line 88) | def _load(self):
    method __getattr__ (line 101) | def __getattr__(self, item):
    method __dir__ (line 105) | def __dir__(self):

FILE: imblearn/base.py
  class SamplerMixin (line 36) | class SamplerMixin(metaclass=ABCMeta):
    method fit (line 46) | def fit(self, X, y, **params):
    method fit_resample (line 75) | def fit_resample(self, X, y, **params):
    method _fit_resample (line 117) | def _fit_resample(self, X, y, **params):
  class BaseSampler (line 145) | class BaseSampler(SamplerMixin, OneToOneFeatureMixin, BaseEstimator):
    method __init__ (line 152) | def __init__(self, sampling_strategy="auto"):
    method _check_X_y (line 155) | def _check_X_y(self, X, y, accept_sparse=None):
    method fit (line 162) | def fit(self, X, y, **params):
    method fit_resample (line 183) | def fit_resample(self, X, y, **params):
    method _more_tags (line 206) | def _more_tags(self):
    method __sklearn_tags__ (line 209) | def __sklearn_tags__(self):
  function _identity (line 229) | def _identity(X, y):
  function is_sampler (line 233) | def is_sampler(estimator):
  class FunctionSampler (line 255) | class FunctionSampler(BaseSampler):
    method __init__ (line 353) | def __init__(self, *, func=None, accept_sparse=True, kw_args=None, val...
    method fit (line 360) | def fit(self, X, y):
    method fit_resample (line 391) | def fit_resample(self, X, y):
    method _fit_resample (line 435) | def _fit_resample(self, X, y):

FILE: imblearn/combine/_smote_enn.py
  class SMOTEENN (line 25) | class SMOTEENN(BaseSampler):
    method __init__ (line 120) | def __init__(
    method _validate_estimator (line 136) | def _validate_estimator(self):
    method _fit_resample (line 153) | def _fit_resample(self, X, y):

FILE: imblearn/combine/_smote_tomek.py
  class SMOTETomek (line 26) | class SMOTETomek(BaseSampler):
    method __init__ (line 118) | def __init__(
    method _validate_estimator (line 134) | def _validate_estimator(self):
    method _fit_resample (line 150) | def _fit_resample(self, X, y):

FILE: imblearn/combine/tests/test_smote_enn.py
  function test_sample_regular (line 42) | def test_sample_regular():
  function test_sample_regular_pass_smote_enn (line 62) | def test_sample_regular_pass_smote_enn():
  function test_sample_regular_half (line 86) | def test_sample_regular_half():
  function test_validate_estimator_init (line 104) | def test_validate_estimator_init():
  function test_validate_estimator_default (line 125) | def test_validate_estimator_default():
  function test_parallelisation (line 144) | def test_parallelisation():

FILE: imblearn/combine/tests/test_smote_tomek.py
  function test_sample_regular (line 42) | def test_sample_regular():
  function test_sample_regular_half (line 70) | def test_sample_regular_half():
  function test_validate_estimator_init (line 96) | def test_validate_estimator_init():
  function test_validate_estimator_default (line 126) | def test_validate_estimator_default():
  function test_parallelisation (line 154) | def test_parallelisation():

FILE: imblearn/datasets/_imbalance.py
  function make_imbalance (line 27) | def make_imbalance(

FILE: imblearn/datasets/_zenodo.py
  function fetch_datasets (line 111) | def fetch_datasets(

FILE: imblearn/datasets/tests/test_imbalance.py
  function iris (line 16) | def iris():
  function test_make_imbalance_error (line 27) | def test_make_imbalance_error(iris, sampling_strategy, err_msg):
  function test_make_imbalance_error_single_class (line 35) | def test_make_imbalance_error_single_class(iris):
  function test_make_imbalance_dict (line 49) | def test_make_imbalance_dict(iris, sampling_strategy, expected_counts):
  function test_make_imbalanced_iris (line 69) | def test_make_imbalanced_iris(as_frame, sampling_strategy, expected_coun...

FILE: imblearn/datasets/tests/test_zenodo.py
  function fetch (line 45) | def fetch(*args, **kwargs):
  function test_fetch (line 50) | def test_fetch():
  function test_fetch_filter (line 68) | def test_fetch_filter():
  function test_fetch_error (line 96) | def test_fetch_error(filter_data, err_msg):

FILE: imblearn/ensemble/_bagging.py
  class BalancedBaggingClassifier (line 29) | class BalancedBaggingClassifier(BaggingClassifier):
    method __init__ (line 239) | def __init__(
    method _validate_y (line 274) | def _validate_y(self, y):
    method _validate_estimator (line 292) | def _validate_estimator(self, default=DecisionTreeClassifier()):
    method fit (line 308) | def fit(self, X, y):
    method _fit (line 330) | def _fit(self, X, y, max_samples=None, max_depth=None, sample_weight=N...
    method base_estimator_ (line 346) | def base_estimator_(self):
    method _more_tags (line 353) | def _more_tags(self):
    method __sklearn_tags__ (line 364) | def __sklearn_tags__(self):

FILE: imblearn/ensemble/_common.py
  function _estimator_has (line 13) | def _estimator_has(attr):

FILE: imblearn/ensemble/_easy_ensemble.py
  class EasyEnsembleClassifier (line 34) | class EasyEnsembleClassifier(BaggingClassifier):
    method __init__ (line 184) | def __init__(
    method _validate_y (line 212) | def _validate_y(self, y):
    method _validate_estimator (line 227) | def _validate_estimator(self, default=None):
    method fit (line 244) | def fit(self, X, y):
    method _fit (line 266) | def _fit(self, X, y, max_samples=None, max_depth=None, sample_weight=N...
    method base_estimator_ (line 273) | def base_estimator_(self):
    method _get_estimator (line 280) | def _get_estimator(self):
    method _more_tags (line 288) | def _more_tags(self):
    method __sklearn_tags__ (line 291) | def __sklearn_tags__(self):

FILE: imblearn/ensemble/_forest.py
  function _local_parallel_build_trees (line 44) | def _local_parallel_build_trees(
  class BalancedRandomForestClassifier (line 92) | class BalancedRandomForestClassifier(RandomForestClassifier):
    method __init__ (line 427) | def __init__(
    method _validate_estimator (line 479) | def _validate_estimator(self, default=DecisionTreeClassifier()):
    method _make_sampler_estimator (line 492) | def _make_sampler_estimator(self, random_state=None):
    method fit (line 508) | def fit(self, X, y, sample_weight=None):
    method _set_oob_score_and_attributes (line 722) | def _set_oob_score_and_attributes(self, X, y):
    method _compute_oob_predictions (line 742) | def _compute_oob_predictions(self, X, y):
    method _more_tags (line 815) | def _more_tags(self):
    method __sklearn_tags__ (line 818) | def __sklearn_tags__(self):

FILE: imblearn/ensemble/_weight_boosting.py
  class RUSBoostClassifier (line 30) | class RUSBoostClassifier(AdaBoostClassifier):
    method __init__ (line 185) | def __init__(
    method fit (line 207) | def fit(self, X, y, sample_weight=None):
    method _validate_estimator (line 235) | def _validate_estimator(self):
    method _make_sampler_estimator (line 266) | def _make_sampler_estimator(self, append=True, random_state=None):
    method _boost_real (line 288) | def _boost_real(self, iboost, X, y, sample_weight, random_state):
    method _boost_discrete (line 349) | def _boost_discrete(self, iboost, X, y, sample_weight, random_state):
    method _boost (line 401) | def _boost(self, iboost, X, y, sample_weight, random_state):

FILE: imblearn/ensemble/tests/test_bagging.py
  function test_balanced_bagging_classifier (line 57) | def test_balanced_bagging_classifier(estimator, params):
  function test_bootstrap_samples (line 77) | def test_bootstrap_samples():
  function test_bootstrap_features (line 113) | def test_bootstrap_features():
  function test_probability (line 146) | def test_probability():
  function test_oob_score_classification (line 191) | def test_oob_score_classification():
  function test_single_estimator (line 226) | def test_single_estimator():
  function test_gridsearch (line 252) | def test_gridsearch():
  function test_estimator (line 269) | def test_estimator():
  function test_bagging_with_pipeline (line 298) | def test_bagging_with_pipeline():
  function test_warm_start (line 312) | def test_warm_start(random_state=42):
  function test_warm_start_smaller_n_estimators (line 340) | def test_warm_start_smaller_n_estimators():
  function test_warm_start_equal_n_estimators (line 350) | def test_warm_start_equal_n_estimators():
  function test_warm_start_equivalence (line 368) | def test_warm_start_equivalence():
  function test_warm_start_with_oob_score_fails (line 391) | def test_warm_start_with_oob_score_fails():
  function test_oob_score_removed_on_warm_start (line 399) | def test_oob_score_removed_on_warm_start():
  function test_oob_score_consistency (line 412) | def test_oob_score_consistency():
  function test_estimators_samples (line 426) | def test_estimators_samples():
  function test_max_samples_consistency (line 469) | def test_max_samples_consistency():
  class CountDecisionTreeClassifier (line 484) | class CountDecisionTreeClassifier(DecisionTreeClassifier):
    method fit (line 488) | def fit(self, X, y, sample_weight=None):
  function test_balanced_bagging_classifier_samplers (line 507) | def test_balanced_bagging_classifier_samplers(sampler, n_samples_bootstr...
  function test_balanced_bagging_classifier_with_function_sampler (line 533) | def test_balanced_bagging_classifier_with_function_sampler(replace):

FILE: imblearn/ensemble/tests/test_easy_ensemble.py
  function test_easy_ensemble_classifier (line 48) | def test_easy_ensemble_classifier(n_estimators, estimator):
  function test_estimator (line 75) | def test_estimator():
  function test_bagging_with_pipeline (line 98) | def test_bagging_with_pipeline():
  function test_warm_start (line 112) | def test_warm_start(random_state=42):
  function test_warm_start_smaller_n_estimators (line 140) | def test_warm_start_smaller_n_estimators():
  function test_warm_start_equal_n_estimators (line 150) | def test_warm_start_equal_n_estimators():
  function test_warm_start_equivalence (line 168) | def test_warm_start_equivalence():
  function test_easy_ensemble_classifier_single_estimator (line 187) | def test_easy_ensemble_classifier_single_estimator():
  function test_easy_ensemble_classifier_grid_search (line 205) | def test_easy_ensemble_classifier_grid_search():

FILE: imblearn/ensemble/tests/test_forest.py
  function imbalanced_dataset (line 13) | def imbalanced_dataset():
  function test_balanced_random_forest_error_warning_warm_start (line 28) | def test_balanced_random_forest_error_warning_warm_start(imbalanced_data...
  function test_balanced_random_forest (line 45) | def test_balanced_random_forest(imbalanced_dataset):
  function test_balanced_random_forest_attributes (line 62) | def test_balanced_random_forest_attributes(imbalanced_dataset):
  function test_balanced_random_forest_sample_weight (line 91) | def test_balanced_random_forest_sample_weight(imbalanced_dataset):
  function test_balanced_random_forest_oob (line 106) | def test_balanced_random_forest_oob(imbalanced_dataset):
  function test_balanced_random_forest_grid_search (line 139) | def test_balanced_random_forest_grid_search(imbalanced_dataset):
  function test_little_tree_with_small_max_samples (line 147) | def test_little_tree_with_small_max_samples():
  function test_balanced_random_forest_pruning (line 183) | def test_balanced_random_forest_pruning(imbalanced_dataset):
  function test_balanced_random_forest_oob_binomial (line 201) | def test_balanced_random_forest_oob_binomial(ratio):
  function test_missing_values_is_resilient (line 224) | def test_missing_values_is_resilient():
  function test_missing_value_is_predictive (line 273) | def test_missing_value_is_predictive():

FILE: imblearn/ensemble/tests/test_weight_boosting.py
  function imbalanced_dataset (line 11) | def imbalanced_dataset():
  function test_rusboost (line 26) | def test_rusboost(imbalanced_dataset):
  function test_rusboost_sample_weight (line 68) | def test_rusboost_sample_weight(imbalanced_dataset):
  function test_rusboost_algorithm (line 88) | def test_rusboost_algorithm(imbalanced_dataset, algorithm):

FILE: imblearn/exceptions.py
  function raise_isinstance_error (line 10) | def raise_isinstance_error(variable_name, possible_type, variable):

FILE: imblearn/keras/_generator.py
  function import_keras (line 10) | def import_keras():
  class BalancedBatchGenerator (line 64) | class BalancedBatchGenerator(*ParentClass):  # type: ignore
    method __init__ (line 144) | def __init__(
    method _sample (line 166) | def _sample(self):
    method __len__ (line 179) | def __len__(self):
    method __getitem__ (line 182) | def __getitem__(self, index):
  function balanced_batch_generator (line 206) | def balanced_batch_generator(
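
`BalancedBatchGenerator` and `balanced_batch_generator` feed Keras with mini-batches drawn from a resampled copy of the training data. The sketch below assumes the resampling step is plain random under-sampling down to the smallest class, followed by shuffling and slicing into batches. `balanced_batches` is hypothetical. It ignores sparse input and `sample_weight`, both of which the listed tests also cover.

```python
import random
from collections import Counter


def balanced_batches(X, y, batch_size=4, random_state=0):
    """Yield (X_batch, y_batch) from a randomly under-sampled, class-balanced copy of the data."""
    rng = random.Random(random_state)
    n_min = min(Counter(y).values())
    idx = []
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        idx.extend(rng.sample(members, n_min))  # random under-sampling without replacement
    rng.shuffle(idx)
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        yield [X[i] for i in batch], [y[i] for i in batch]


X = list(range(12))
y = [0] * 9 + [1] * 3
batches = list(balanced_batches(X, y, batch_size=4))
```

With 9 samples of class 0 and 3 of class 1, each pass keeps 3 samples per class, so it yields a batch of 4 and a final batch of 2.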

FILE: imblearn/keras/tests/test_generator.py
  function data (line 22) | def data():
  function _build_keras_model (line 32) | def _build_keras_model(n_classes, n_features):
  function test_balanced_batch_generator_class_no_return_indices (line 41) | def test_balanced_batch_generator_class_no_return_indices(data):
  function test_balanced_batch_generator_class (line 58) | def test_balanced_batch_generator_class(data, sampler, sample_weight):
  function test_balanced_batch_generator_class_sparse (line 73) | def test_balanced_batch_generator_class_sparse(data, keep_sparse):
  function test_balanced_batch_generator_function_no_return_indices (line 90) | def test_balanced_batch_generator_function_no_return_indices(data):
  function test_balanced_batch_generator_function (line 110) | def test_balanced_batch_generator_function(data, sampler, sample_weight):
  function test_balanced_batch_generator_function_sparse (line 130) | def test_balanced_batch_generator_function_sparse(data, keep_sparse):

FILE: imblearn/metrics/_classification.py
  function sensitivity_specificity_support (line 48) | def sensitivity_specificity_support(
  function sensitivity_score (line 315) | def sensitivity_score(
  function specificity_score (line 431) | def specificity_score(
  function geometric_mean_score (line 550) | def geometric_mean_score(
  function make_index_balanced_accuracy (line 743) | def make_index_balanced_accuracy(*, alpha=0.1, squared=True):
  function classification_report_imbalanced (line 865) | def classification_report_imbalanced(
  function macro_averaged_mean_absolute_error (line 1080) | def macro_averaged_mean_absolute_error(y_true, y_pred, *, sample_weight=...
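
For a binary problem, the quantities behind these metrics are:

- sensitivity: TP / (TP + FN)
- specificity: TN / (TN + FP)
- geometric mean: the square root of sensitivity times specificity
- index balanced accuracy, as built by `make_index_balanced_accuracy(alpha=0.1, squared=True)`: as we understand it, a squared score weighted by 1 + alpha * (sensitivity - specificity)
- macro-averaged MAE: the mean of the per-class mean absolute errors

The helpers below are a from-scratch sketch for binary labels only. They are not imblearn's functions and omit multiclass averaging, `sample_weight` and zero-division handling.

```python
import math


def binary_rates(y_true, y_pred, pos_label=1):
    """Return (sensitivity, specificity) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    tn = sum(t != pos_label and p != pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def geometric_mean(y_true, y_pred, pos_label=1):
    sens, spec = binary_rates(y_true, y_pred, pos_label)
    return math.sqrt(sens * spec)


def index_balanced_accuracy(y_true, y_pred, alpha=0.1, pos_label=1):
    """(1 + alpha * (sensitivity - specificity)) * G-mean**2, where G-mean**2 = sens * spec."""
    sens, spec = binary_rates(y_true, y_pred, pos_label)
    return (1 + alpha * (sens - spec)) * sens * spec


def macro_averaged_mae(y_true, y_pred):
    """Mean absolute error computed within each true class, then averaged over classes."""
    per_class = []
    for c in sorted(set(y_true)):
        errs = [abs(t - p) for t, p in zip(y_true, y_pred) if t == c]
        per_class.append(sum(errs) / len(errs))
    return sum(per_class) / len(per_class)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(binary_rates(y_true, y_pred))  # (0.5, 0.75)
```

On `[1, 1, 1, 1, 2]` vs `[1, 1, 1, 1, 1]` the plain MAE is 0.2, but the macro-averaged MAE is 0.5. The single error on the rare class weighs as much as the whole majority class.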

FILE: imblearn/metrics/pairwise.py
  class ValueDifferenceMetric (line 19) | class ValueDifferenceMetric(BaseEstimator):
    method __init__ (line 128) | def __init__(self, *, n_categories="auto", k=1, r=2):
    method fit (line 134) | def fit(self, X, y):
    method pairwise (line 194) | def pairwise(self, X, Y=None):
    method _more_tags (line 234) | def _more_tags(self):
    method __sklearn_tags__ (line 239) | def __sklearn_tags__(self):
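
`ValueDifferenceMetric` compares categorical samples through the class-conditional probabilities of their categories. The sketch assumes the following form, using the defaults `k=1` and `r=2` from the `__init__` signature listed above:

- per-feature term: delta_f = sum over classes c of |P(c | a_f) - P(c | b_f)|^k
- total distance: sum over features f of delta_f^r

This is an assumption; check the class docstring for the exact exponents. `vdm_fit` and `vdm_distance` are hypothetical helpers, not the class's `fit` and `pairwise` methods.

```python
from collections import Counter, defaultdict


def vdm_fit(X, y):
    """For each feature, map every category to its list of P(class | category)."""
    classes = sorted(set(y))
    probs = []
    for f in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[f]][label] += 1
        probs.append({
            cat: [cnt[c] / sum(cnt.values()) for c in classes]
            for cat, cnt in counts.items()
        })
    return probs


def vdm_distance(probs, a, b, k=1, r=2):
    """Sum over features of (sum over classes |P(c|a_f) - P(c|b_f)|**k) ** r."""
    total = 0.0
    for f, table in enumerate(probs):
        delta = sum(abs(pa - pb) ** k for pa, pb in zip(table[a[f]], table[b[f]]))
        total += delta ** r
    return total


X = [["a"], ["a"], ["b"], ["b"], ["b"]]
y = [0, 1, 0, 0, 1]
probs = vdm_fit(X, y)
```

Here P(class | "a") = [0.5, 0.5] and P(class | "b") = [2/3, 1/3]. That gives delta = 1/6 + 1/6 = 1/3 and a distance of (1/3)^2 = 1/9.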

FILE: imblearn/metrics/tests/test_classification.py
  function make_prediction (line 45) | def make_prediction(dataset=None, binary=False):
  function test_sensitivity_specificity_score_binary (line 93) | def test_sensitivity_specificity_score_binary():
  function test_sensitivity_specificity_f_binary_single_class (line 118) | def test_sensitivity_specificity_f_binary_single_class(
  function test_sensitivity_specificity_extra_labels (line 134) | def test_sensitivity_specificity_extra_labels(average, expected_specific...
  function test_sensitivity_specificity_ignored_labels (line 142) | def test_sensitivity_specificity_ignored_labels():
  function test_sensitivity_specificity_error_multilabels (line 163) | def test_sensitivity_specificity_error_multilabels():
  function test_sensitivity_specificity_support_errors (line 173) | def test_sensitivity_specificity_support_errors():
  function test_sensitivity_specificity_unused_pos_label (line 185) | def test_sensitivity_specificity_unused_pos_label():
  function test_geometric_mean_support_binary (line 194) | def test_geometric_mean_support_binary():
  function test_geometric_mean_multiclass (line 222) | def test_geometric_mean_multiclass(y_true, y_pred, correction, expected_...
  function test_geometric_mean_average (line 237) | def test_geometric_mean_average(y_true, y_pred, average, expected_gmean):
  function test_geometric_mean_sample_weight (line 262) | def test_geometric_mean_sample_weight(
  function test_geometric_mean_score_prediction (line 284) | def test_geometric_mean_score_prediction(average, expected_gmean):
  function test_iba_geo_mean_binary (line 291) | def test_iba_geo_mean_binary():
  function _format_report (line 302) | def _format_report(report):
  function test_classification_report_imbalanced_multiclass (line 306) | def test_classification_report_imbalanced_multiclass():
  function test_classification_report_imbalanced_multiclass_with_digits (line 337) | def test_classification_report_imbalanced_multiclass_with_digits():
  function test_classification_report_imbalanced_multiclass_with_string_label (line 369) | def test_classification_report_imbalanced_multiclass_with_string_label():
  function test_classification_report_imbalanced_multiclass_with_unicode_label (line 396) | def test_classification_report_imbalanced_multiclass_with_unicode_label():
  function test_classification_report_imbalanced_multiclass_with_long_string_label (line 413) | def test_classification_report_imbalanced_multiclass_with_long_string_la...
  function test_iba_sklearn_metrics (line 440) | def test_iba_sklearn_metrics(score, expected_score):
  function test_iba_error_y_score_prob_error (line 452) | def test_iba_error_y_score_prob_error(score_loss):
  function test_classification_report_imbalanced_dict_with_target_names (line 460) | def test_classification_report_imbalanced_dict_with_target_names():
  function test_classification_report_imbalanced_dict_without_target_names (line 492) | def test_classification_report_imbalanced_dict_without_target_names():
  function test_macro_averaged_mean_absolute_error (line 531) | def test_macro_averaged_mean_absolute_error(y_true, y_pred, expected_ma_...
  function test_macro_averaged_mean_absolute_error_sample_weight (line 536) | def test_macro_averaged_mean_absolute_error_sample_weight():

FILE: imblearn/metrics/tests/test_pairwise.py
  function data (line 16) | def data():
  function test_value_difference_metric (line 34) | def test_value_difference_metric(data, dtype, k, r, y_type, encode_label):
  function test_value_difference_metric_property (line 62) | def test_value_difference_metric_property(dtype, k, r, y_type, encode_la...
  function test_value_difference_metric_categories (line 111) | def test_value_difference_metric_categories(data):
  function test_value_difference_metric_categories_error (line 128) | def test_value_difference_metric_categories_error(data):
  function test_value_difference_metric_missing_categories (line 143) | def test_value_difference_metric_missing_categories(data):
  function test_value_difference_value_unfitted (line 163) | def test_value_difference_value_unfitted(data):

FILE: imblearn/metrics/tests/test_score_objects.py
  function data (line 23) | def data():
  function test_scorer_common_average (line 38) | def test_scorer_common_average(data, score, expected_score, average):
  function test_scorer_default_average (line 66) | def test_scorer_default_average(data, score, average, expected_score):

FILE: imblearn/model_selection/_split.py
  class InstanceHardnessCV (line 11) | class InstanceHardnessCV(BaseCrossValidator):
    method __init__ (line 49) | def __init__(self, estimator, *, n_splits=5, pos_label=None):
    method split (line 54) | def split(self, X, y, groups=None):
    method get_n_splits (line 102) | def get_n_splits(self, X=None, y=None, groups=None):

FILE: imblearn/model_selection/tests/test_split.py
  function data (line 13) | def data():
  function test_groups_parameter_warning (line 25) | def test_groups_parameter_warning(data):
  function test_error_on_multiclass (line 35) | def test_error_on_multiclass():
  function test_default_params (line 43) | def test_default_params(data):
  function test_target_string_labels (line 54) | def test_target_string_labels(data, dtype_target):
  function test_target_string_pos_label (line 71) | def test_target_string_pos_label(data, dtype_target):
  function test_n_splits (line 96) | def test_n_splits(n_splits):

FILE: imblearn/over_sampling/_adasyn.py
  class ADASYN (line 23) | class ADASYN(BaseOverSampler):
    method __init__ (line 123) | def __init__(
    method _validate_estimator (line 134) | def _validate_estimator(self):
    method _fit_resample (line 140) | def _fit_resample(self, X, y):
    method _more_tags (line 208) | def _more_tags(self):
    method __sklearn_tags__ (line 213) | def __sklearn_tags__(self):
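
ADASYN spends more of its synthetic-sample budget on minority samples whose neighbourhood is dominated by other classes. The sketch works in three steps:

1. For each minority sample, compute the fraction of other-class samples among its k nearest neighbours.
2. Normalise those fractions so they sum to 1.
3. Scale them by the number of samples to generate and round.

`adasyn_allocation` is hypothetical. It only computes how many points each minority sample receives and skips the interpolation step. Because of rounding, the allocations may not sum exactly to the requested total.

```python
import math


def adasyn_allocation(X, y, minority, n_to_generate, k=5):
    """Split n_to_generate synthetic samples across minority samples by neighbour hardness."""
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    ratios = []
    for i in minority_idx:
        # k nearest neighbours of X[i] in the whole dataset, excluding itself
        ranked = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
        neighbours = [j for _, j in ranked[:k]]
        ratios.append(sum(y[j] != minority for j in neighbours) / k)
    total = sum(ratios)
    if total == 0:
        raise ValueError("no other-class samples among the neighbours")
    return {i: round(r / total * n_to_generate) for i, r in zip(minority_idx, ratios)}


X = [[0.0], [1.0], [2.0], [3.0], [2.5], [10.0]]
y = [0, 0, 0, 0, 1, 1]
print(adasyn_allocation(X, y, minority=1, n_to_generate=6, k=2))  # {4: 4, 5: 2}
```

Sample 4 sits among majority points (ratio 1.0), while sample 5 has one minority neighbour (ratio 0.5). Sample 4 therefore receives twice as many synthetic points.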

FILE: imblearn/over_sampling/_random_over_sampler.py
  class RandomOverSampler (line 27) | class RandomOverSampler(BaseOverSampler):
    method __init__ (line 146) | def __init__(
    method _check_X_y (line 157) | def _check_X_y(self, X, y):
    method _fit_resample (line 163) | def _fit_resample(self, X, y):
    method _more_tags (line 252) | def _more_tags(self):
    method __sklearn_tags__ (line 262) | def __sklearn_tags__(self):
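
In its basic form, `RandomOverSampler` duplicates randomly chosen samples of the smaller classes, with replacement, until each reaches the size of the largest class. The sketch below covers only that default behaviour. The smoothed bootstrap (the `shrinkage` option exercised by the tests listed above) is not reproduced. `random_over_sample` is a hypothetical name.

```python
import random
from collections import Counter


def random_over_sample(X, y, random_state=0):
    """Append randomly drawn duplicates until every class matches the largest one."""
    rng = random.Random(random_state)
    counts = Counter(y)
    n_max = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, t in enumerate(y) if t == label]
        extra = rng.choices(idx, k=n_max - n)  # sampling with replacement
        X_res.extend(X[i] for i in extra)
        y_res.extend(label for _ in extra)
    return X_res, y_res


X = [[i] for i in range(8)]
y = [0] * 6 + [1] * 2
X_res, y_res = random_over_sample(X, y)
print(Counter(y_res))
```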

FILE: imblearn/over_sampling/_smote/base.py
  class BaseSMOTE (line 40) | class BaseSMOTE(BaseOverSampler):
    method __init__ (line 51) | def __init__(
    method _validate_estimator (line 61) | def _validate_estimator(self):
    method _make_samples (line 69) | def _make_samples(
    method _generate_samples (line 124) | def _generate_samples(
    method _in_danger_noise (line 191) | def _in_danger_noise(self, nn_estimator, samples, target_class, y, kin...
  class SMOTE (line 242) | class SMOTE(BaseSMOTE):
    method __init__ (line 335) | def __init__(
    method _fit_resample (line 348) | def _fit_resample(self, X, y):
  class SMOTENC (line 381) | class SMOTENC(SMOTE):
    method __init__ (line 533) | def __init__(
    method _check_X_y (line 550) | def _check_X_y(self, X, y):
    method _validate_column_types (line 559) | def _validate_column_types(self, X):
    method _validate_estimator (line 582) | def _validate_estimator(self):
    method _fit_resample (line 595) | def _fit_resample(self, X, y):
    method _generate_samples (line 707) | def _generate_samples(self, X, nn_data, nn_num, rows, cols, steps, y_t...
    method _more_tags (line 752) | def _more_tags(self):
    method __sklearn_tags__ (line 755) | def __sklearn_tags__(self):
  class SMOTEN (line 766) | class SMOTEN(SMOTE):
    method __init__ (line 874) | def __init__(
    method _check_X_y (line 889) | def _check_X_y(self, X, y):
    method _validate_estimator (line 902) | def _validate_estimator(self):
    method _make_samples (line 907) | def _make_samples(self, X_class, klass, y_dtype, nn_indices, n_samples):
    method _fit_resample (line 923) | def _fit_resample(self, X, y):
    method _more_tags (line 981) | def _more_tags(self):
    method __sklearn_tags__ (line 984) | def __sklearn_tags__(self):
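
In `BaseSMOTE`, `_make_samples` and `_generate_samples` create new points by interpolating between a minority sample x and one of its minority-class nearest neighbours: new = x + u * (neighbour - x), with u drawn uniformly from [0, 1). The stdlib-only sketch below illustrates that interpolation with a brute-force neighbour search. `smote_sketch` is hypothetical. It does not cover SMOTENC's handling of mixed categorical features or SMOTEN's purely categorical variant.

```python
import math
import random


def smote_sketch(X_min, n_samples, k=5, random_state=0):
    """Interpolate n_samples new points between minority samples and their k nearest minority neighbours."""
    rng = random.Random(random_state)
    k = min(k, len(X_min) - 1)
    neighbours = []
    for i, x in enumerate(X_min):
        ranked = sorted((math.dist(x, z), j) for j, z in enumerate(X_min) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    synthetic = []
    for _ in range(n_samples):
        i = rng.randrange(len(X_min))  # pick a minority sample
        j = rng.choice(neighbours[i])  # pick one of its neighbours
        step = rng.random()  # uniform in [0, 1)
        synthetic.append([a + step * (b - a) for a, b in zip(X_min[i], X_min[j])])
    return synthetic


X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = smote_sketch(X_min, 10, k=2)
```

With k=2 on the unit square, each corner's neighbours are its two adjacent corners, never the diagonal. Every synthetic point therefore lands on an edge of the square.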

FILE: imblearn/over_sampling/_smote/cluster.py
  class KMeansSMOTE (line 30) | class KMeansSMOTE(BaseSMOTE):
    method __init__ (line 161) | def __init__(
    method _validate_estimator (line 182) | def _validate_estimator(self):
    method _find_cluster_sparsity (line 200) | def _find_cluster_sparsity(self, X):
    method _fit_resample (line 218) | def _fit_resample(self, X, y):

FILE: imblearn/over_sampling/_smote/filter.py
  class BorderlineSMOTE (line 28) | class BorderlineSMOTE(BaseSMOTE):
    method __init__ (line 159) | def __init__(
    method _validate_estimator (line 176) | def _validate_estimator(self):
    method _fit_resample (line 182) | def _fit_resample(self, X, y):
  class SVMSMOTE (line 235) | class SVMSMOTE(BaseSMOTE):
    method __init__ (line 372) | def __init__(
    method _validate_estimator (line 391) | def _validate_estimator(self):
    method _fit_resample (line 402) | def _fit_resample(self, X, y):
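
`BorderlineSMOTE` relies on `_in_danger_noise` (defined in `BaseSMOTE`) to choose which minority samples to over-sample. The sketch assumes the usual Borderline-SMOTE rule over the m nearest neighbours of each minority sample:

- noise: all m neighbours belong to other classes.
- danger: at least half, but not all, of the neighbours belong to other classes.
- safe: every other case.

`borderline_categories` is a hypothetical helper illustrating that rule.

```python
import math


def borderline_categories(X, y, minority, m=5):
    """Label each minority sample 'noise', 'danger' or 'safe' from its m nearest neighbours."""
    result = {}
    for i, x in enumerate(X):
        if y[i] != minority:
            continue
        ranked = sorted((math.dist(x, z), j) for j, z in enumerate(X) if j != i)
        n_other = sum(y[j] != minority for _, j in ranked[:m])
        if n_other == m:
            result[i] = "noise"
        elif n_other >= m / 2:
            result[i] = "danger"
        else:
            result[i] = "safe"
    return result


X = [[0.0], [1.0], [2.0], [3.0], [1.5], [10.0], [11.0], [12.0], [4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(borderline_categories(X, y, minority=1, m=3))
```

Only "danger" samples are used as seeds for interpolation in Borderline-SMOTE. "Noise" samples are skipped because interpolating from them would push synthetic points deep into the other class.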

FILE: imblearn/over_sampling/_smote/tests/test_borderline_smote.py
  function test_borderline_smote_no_in_danger_samples (line 12) | def test_borderline_smote_no_in_danger_samples(kind):
  function test_borderline_smote_kind (line 36) | def test_borderline_smote_kind():
  function test_borderline_smote_in_danger (line 75) | def test_borderline_smote_in_danger():

FILE: imblearn/over_sampling/_smote/tests/test_kmeans_smote.py
  function data (line 12) | def data():
  function test_kmeans_smote (line 42) | def test_kmeans_smote(data):
  function test_sample_kmeans_custom (line 73) | def test_sample_kmeans_custom(data, k_neighbors, kmeans_estimator):
  function test_sample_kmeans_not_enough_clusters (line 89) | def test_sample_kmeans_not_enough_clusters(data):
  function test_sample_kmeans_density_estimation (line 98) | def test_sample_kmeans_density_estimation(density_exponent, cluster_bala...

FILE: imblearn/over_sampling/_smote/tests/test_smote.py
  function test_sample_regular (line 41) | def test_sample_regular():
  function test_sample_regular_half (line 79) | def test_sample_regular_half():
  function test_sample_regular_with_nn (line 113) | def test_sample_regular_with_nn():

FILE: imblearn/over_sampling/_smote/tests/test_smote_nc.py
  function data_heterogneous_ordered (line 19) | def data_heterogneous_ordered():
  function data_heterogneous_unordered (line 33) | def data_heterogneous_unordered():
  function data_heterogneous_masked (line 47) | def data_heterogneous_masked():
  function data_heterogneous_unordered_multiclass (line 61) | def data_heterogneous_unordered_multiclass():
  function data_sparse (line 75) | def data_sparse(format):
  function test_smotenc_error (line 89) | def test_smotenc_error():
  function test_smotenc (line 107) | def test_smotenc(data):
  function test_smotenc_check_target_type (line 130) | def test_smotenc_check_target_type():
  function test_smotenc_samplers_one_label (line 143) | def test_smotenc_samplers_one_label():
  function test_smotenc_fit (line 151) | def test_smotenc_fit():
  function test_smotenc_fit_resample (line 160) | def test_smotenc_fit_resample():
  function test_smotenc_fit_resample_sampling_strategy (line 170) | def test_smotenc_fit_resample_sampling_strategy():
  function test_smotenc_pandas (line 180) | def test_smotenc_pandas():
  function test_smotenc_preserve_dtype (line 193) | def test_smotenc_preserve_dtype():
  function test_smotenc_raising_error_all_categorical (line 211) | def test_smotenc_raising_error_all_categorical(categorical_features):
  function test_smote_nc_with_null_median_std (line 225) | def test_smote_nc_with_null_median_std():
  function test_smotenc_categorical_encoder (line 264) | def test_smotenc_categorical_encoder():
  function test_smotenc_bool_categorical (line 280) | def test_smotenc_bool_categorical():
  function test_smotenc_categorical_features_str (line 314) | def test_smotenc_categorical_features_str():
  function test_smotenc_categorical_features_auto (line 339) | def test_smotenc_categorical_features_auto():
  function test_smote_nc_categorical_features_auto_error (line 366) | def test_smote_nc_categorical_features_auto_error():

FILE: imblearn/over_sampling/_smote/tests/test_smoten.py
  function data (line 11) | def data():
  function test_smoten (line 25) | def test_smoten(data):
  function test_smoten_resampling (line 36) | def test_smoten_resampling():
  function test_smoten_sparse_input (line 62) | def test_smoten_sparse_input(data, sparse_format):
  function test_smoten_categorical_encoder (line 79) | def test_smoten_categorical_encoder(data):

FILE: imblearn/over_sampling/_smote/tests/test_svm_smote.py
  function data (line 13) | def data():
  function test_svm_smote (line 42) | def test_svm_smote(data):
  function test_svm_smote_not_svm (line 58) | def test_svm_smote_not_svm(data):
  function test_svm_smote_all_noise (line 67) | def test_svm_smote_all_noise(data):

FILE: imblearn/over_sampling/base.py
  class BaseOverSampler (line 16) | class BaseOverSampler(BaseSampler):

FILE: imblearn/over_sampling/tests/test_adasyn.py
  function test_ada_init (line 41) | def test_ada_init():
  function test_ada_fit_resample (line 47) | def test_ada_fit_resample():
  function test_ada_fit_resample_nn_obj (line 85) | def test_ada_fit_resample_nn_obj():

FILE: imblearn/over_sampling/tests/test_common.py
  function numerical_data (line 20) | def numerical_data():
  function categorical_data (line 29) | def categorical_data():
  function heterogeneous_data (line 44) | def heterogeneous_data():
  function test_smote_m_neighbors (line 57) | def test_smote_m_neighbors(numerical_data, smote):
  function test_numerical_smote_custom_nn (line 83) | def test_numerical_smote_custom_nn(numerical_data, smote, neighbor_estim...
  function test_categorical_smote_k_custom_nn (line 94) | def test_categorical_smote_k_custom_nn(categorical_data):
  function test_heterogeneous_smote_k_custom_nn (line 103) | def test_heterogeneous_smote_k_custom_nn(heterogeneous_data):
  function test_numerical_smote_extra_custom_nn (line 119) | def test_numerical_smote_extra_custom_nn(numerical_data, smote):

FILE: imblearn/over_sampling/tests/test_random_over_sampler.py
  function data (line 24) | def data():
  function test_ros_init (line 43) | def test_ros_init():
  function test_ros_fit_resample (line 53) | def test_ros_fit_resample(X_type, data, params):
  function test_ros_fit_resample_half (line 92) | def test_ros_fit_resample_half(data, params):
  function test_multiclass_fit_resample (line 124) | def test_multiclass_fit_resample(data, params):
  function test_random_over_sampling_heterogeneous_data (line 143) | def test_random_over_sampling_heterogeneous_data():
  function test_random_over_sampling_nan_inf (line 159) | def test_random_over_sampling_nan_inf(data):
  function test_random_over_sampling_heterogeneous_data_smoothed_bootstrap (line 180) | def test_random_over_sampling_heterogeneous_data_smoothed_bootstrap():
  function test_random_over_sampler_smoothed_bootstrap (line 194) | def test_random_over_sampler_smoothed_bootstrap(X_type, data):
  function test_random_over_sampler_equivalence_shrinkage (line 208) | def test_random_over_sampler_equivalence_shrinkage(data):
  function test_random_over_sampler_shrinkage_behaviour (line 228) | def test_random_over_sampler_shrinkage_behaviour(data):
  function test_random_over_sampler_shrinkage_error (line 253) | def test_random_over_sampler_shrinkage_error(data, shrinkage, err_msg):
  function test_random_over_sampler_strings (line 264) | def test_random_over_sampler_strings(sampling_strategy):
  function test_random_over_sampling_datetime (line 278) | def test_random_over_sampling_datetime():
  function test_random_over_sampler_full_nat (line 291) | def test_random_over_sampler_full_nat():

FILE: imblearn/pipeline.py
  function _raise_or_warn_if_not_fitted (line 47) | def _raise_or_warn_if_not_fitted(estimator):
  function _cached_transform (line 73) | def _cached_transform(
  class Pipeline (line 111) | class Pipeline(pipeline.Pipeline):
    method __init__ (line 243) | def __init__(self, steps, *, transform_input=None, memory=None, verbos...
    method _validate_steps (line 251) | def _validate_steps(self):
    method _iter (line 301) | def _iter(self, with_final=True, filter_passthrough=True, filter_resam...
    method _get_metadata_for_step (line 314) | def _get_metadata_for_step(self, *, step_idx, step_params, all_params):
    method _fit (line 394) | def _fit(self, X, y=None, routed_params=None, raw_params=None):
    method fit (line 455) | def fit(self, X, y=None, **params):
    method _can_fit_transform (line 525) | def _can_fit_transform(self):
    method fit_transform (line 537) | def fit_transform(self, X, y=None, **params):
    method predict (line 602) | def predict(self, X, **params):
    method _can_fit_resample (line 665) | def _can_fit_resample(self):
    method fit_resample (line 675) | def fit_resample(self, X, y=None, **params):
    method fit_predict (line 738) | def fit_predict(self, X, y=None, **params):
    method predict_proba (line 801) | def predict_proba(self, X, **params):
    method decision_function (line 860) | def decision_function(self, X, **params):
    method score_samples (line 908) | def score_samples(self, X):
    method predict_log_proba (line 935) | def predict_log_proba(self, X, **params):
    method _can_transform (line 993) | def _can_transform(self):
    method transform (line 999) | def transform(self, X, **params):
    method _can_inverse_transform (line 1043) | def _can_inverse_transform(self):
    method inverse_transform (line 1047) | def inverse_transform(self, Xt, **params):
    method score (line 1091) | def score(self, X, y=None, sample_weight=None, **params):
    method get_metadata_routing (line 1152) | def get_metadata_routing(self):
    method _check_method_params (line 1239) | def _check_method_params(self, method, props, **kwargs):
    method __sklearn_is_fitted__ (line 1268) | def __sklearn_is_fitted__(self):
    method __sklearn_tags__ (line 1298) | def __sklearn_tags__(self):
  function _fit_resample_one (line 1330) | def _fit_resample_one(sampler, X, y, message_clsname="", message=None, p...
  function _transform_one (line 1337) | def _transform_one(transformer, X, y, weight, params=None):
  function _fit_transform_one (line 1366) | def _fit_transform_one(
  function make_pipeline (line 1398) | def make_pipeline(*steps, memory=None, transform_input=None, verbose=Fal...
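
imblearn's `Pipeline` accepts samplers as intermediate steps. The `filter_resample` flag on `_iter` suggests those steps are skipped outside of fitting. If so, resampling changes only the data the final estimator is trained on, and prediction still runs on every input row. The toy `MiniPipeline` below is a hypothetical, heavily simplified illustration of that idea. None of its classes belong to imblearn or scikit-learn.

```python
class DropOddRows:
    """Hypothetical sampler: keeps only even-indexed rows when fitting."""

    def fit_resample(self, X, y):
        return X[::2], y[::2]


class Doubler:
    """Hypothetical transformer: multiplies every value by 2."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class MeanPredictor:
    """Hypothetical final estimator: always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


class MiniPipeline:
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                X, y = step.fit_resample(X, y)  # resampling happens at fit time only
            else:
                X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                continue  # samplers are skipped when predicting
            X = step.transform(X)
        return self.steps[-1].predict(X)


pipe = MiniPipeline([DropOddRows(), Doubler(), MeanPredictor()])
pipe.fit([[1], [2], [3], [4]], [0, 10, 20, 30])
print(pipe.predict([[5], [6], [7]]))  # one prediction per input row
```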

FILE: imblearn/tensorflow/_generator.py
  function balanced_batch_generator (line 13) | def balanced_batch_generator(

FILE: imblearn/tensorflow/tests/test_generator.py
  function data (line 16) | def data():
  function check_balanced_batch_generator_tf_1_X_X (line 23) | def check_balanced_batch_generator_tf_1_X_X(dataset, sampler):
  function check_balanced_batch_generator_tf_2_X_X_compat_1_X_X (line 84) | def check_balanced_batch_generator_tf_2_X_X_compat_1_X_X(dataset, sampler):
  function test_balanced_batch_generator (line 148) | def test_balanced_batch_generator(data, sampler):
  function test_balanced_batch_generator_function_sparse (line 156) | def test_balanced_batch_generator_function_sparse(data, keep_sparse):

FILE: imblearn/tests/test_base.py
  function test_function_sampler_reject_sparse (line 26) | def test_function_sampler_reject_sparse():
  function test_function_sampler_identity (line 40) | def test_function_sampler_identity(X, y):
  function test_function_sampler_func (line 50) | def test_function_sampler_func(X, y):
  function test_function_sampler_func_kwargs (line 63) | def test_function_sampler_func_kwargs(X, y):
  function test_function_sampler_validate (line 79) | def test_function_sampler_validate():
  function test_function_resampler_fit (line 95) | def test_function_resampler_fit():

FILE: imblearn/tests/test_common.py
  function test_all_estimator_no_base_class (line 35) | def test_all_estimator_no_base_class(name, Estimator):
  function test_estimators_compatibility_sklearn (line 44) | def test_estimators_compatibility_sklearn(estimator, check, request):
  function test_estimators_imblearn (line 52) | def test_estimators_imblearn(estimator, check, request):
  function test_check_param_validation (line 69) | def test_check_param_validation(estimator):
  function test_strategy_as_ordered_dict (line 76) | def test_strategy_as_ordered_dict(Sampler):
  function test_pandas_column_name_consistency (line 94) | def test_pandas_column_name_consistency(estimator):

FILE: imblearn/tests/test_docstring_parameters.py
  function test_docstring_parameters (line 65) | def test_docstring_parameters():
  function test_tabs (line 141) | def test_tabs():
  function test_fit_docstring_attributes (line 159) | def test_fit_docstring_attributes(estimator):
  function _get_all_fitted_attributes (line 222) | def _get_all_fitted_attributes(estimator):

FILE: imblearn/tests/test_exceptions.py
  function test_raise_isinstance_error (line 11) | def test_raise_isinstance_error():

FILE: imblearn/tests/test_pipeline.py
  class NoFit (line 58) | class NoFit:
    method __init__ (line 61) | def __init__(self, a=None, b=None):
    method __sklearn_tags__ (line 65) | def __sklearn_tags__(self):
  class NoTrans (line 69) | class NoTrans(NoFit):
    method fit (line 70) | def fit(self, X, y):
    method get_params (line 73) | def get_params(self, deep=False):
    method set_params (line 76) | def set_params(self, **params):
  class NoInvTransf (line 81) | class NoInvTransf(NoTrans):
    method transform (line 82) | def transform(self, X, y=None):
  class Transf (line 86) | class Transf(NoInvTransf):
    method transform (line 87) | def transform(self, X, y=None):
    method inverse_transform (line 90) | def inverse_transform(self, X):
  class TransfFitParams (line 94) | class TransfFitParams(Transf):
    method fit (line 95) | def fit(self, X, y, **fit_params):
  class Mult (line 100) | class Mult(BaseEstimator):
    method __init__ (line 101) | def __init__(self, mult=1):
    method __sklearn_is_fitted__ (line 104) | def __sklearn_is_fitted__(self):
    method fit (line 107) | def fit(self, X, y):
    method transform (line 110) | def transform(self, X):
    method inverse_transform (line 113) | def inverse_transform(self, X):
    method predict (line 116) | def predict(self, X):
    method score (line 121) | def score(self, X, y=None):
  class FitParamT (line 125) | class FitParamT(BaseEstimator):
    method __init__ (line 128) | def __init__(self):
    method fit (line 131) | def fit(self, X, y, should_succeed=False):
    method predict (line 136) | def predict(self, X):
    method fit_predict (line 139) | def fit_predict(self, X, y, should_succeed=False):
    method score (line 143) | def score(self, X, y=None, sample_weight=None):
  class DummyTransf (line 149) | class DummyTransf(Transf):
    method fit (line 152) | def fit(self, X, y):
  class DummyEstimatorParams (line 160) | class DummyEstimatorParams(BaseEstimator):
    method __sklearn_is_fitted__ (line 163) | def __sklearn_is_fitted__(self):
    method fit (line 166) | def fit(self, X, y):
    method predict (line 169) | def predict(self, X, got_attribute=False):
  class DummySampler (line 174) | class DummySampler(NoTrans):
    method fit_resample (line 177) | def fit_resample(self, X, y):
  class FitTransformSample (line 185) | class FitTransformSample(NoTrans):
    method __sklearn_is_fitted__ (line 188) | def __sklearn_is_fitted__(self):
    method fit (line 191) | def fit(self, X, y, should_succeed=False):
    method fit_resample (line 194) | def fit_resample(self, X, y=None):
    method fit_transform (line 197) | def fit_transform(self, X, y=None):
    method transform (line 200) | def transform(self, X, y=None):
  function test_pipeline_init_tuple (line 204) | def test_pipeline_init_tuple():
  function test_pipeline_init (line 215) | def test_pipeline_init():
  function test_pipeline_methods_anova (line 285) | def test_pipeline_methods_anova():
  function test_pipeline_fit_params (line 301) | def test_pipeline_fit_params():
  function test_pipeline_sample_weight_supported (line 315) | def test_pipeline_sample_weight_supported():
  function test_pipeline_sample_weight_unsupported (line 326) | def test_pipeline_sample_weight_unsupported():
  function test_pipeline_raise_set_params_error (line 337) | def test_pipeline_raise_set_params_error():
  function test_pipeline_methods_pca_svm (line 348) | def test_pipeline_methods_pca_svm():
  function test_pipeline_methods_preprocessing_svm (line 364) | def test_pipeline_methods_preprocessing_svm():
  function test_fit_predict_on_pipeline (line 400) | def test_fit_predict_on_pipeline():
  function test_fit_predict_on_pipeline_without_fit_predict (line 423) | def test_fit_predict_on_pipeline_without_fit_predict():
  function test_fit_predict_with_intermediate_fit_params (line 434) | def test_fit_predict_with_intermediate_fit_params():
  function test_pipeline_transform (line 446) | def test_pipeline_transform():
  function test_pipeline_fit_transform (line 466) | def test_pipeline_fit_transform():
  function test_set_pipeline_steps (line 480) | def test_set_pipeline_steps():
  function test_pipeline_correctly_adjusts_steps (line 509) | def test_pipeline_correctly_adjusts_steps(passthrough):
  function test_set_pipeline_step_passthrough (line 525) | def test_set_pipeline_step_passthrough(passthrough):
  function test_pipeline_ducktyping (line 604) | def test_pipeline_ducktyping():
  function test_make_pipeline (line 632) | def test_make_pipeline():
  function test_classes_property (line 647) | def test_classes_property():
  function test_pipeline_memory_transformer (line 667) | def test_pipeline_memory_transformer():
  function test_pipeline_memory_sampler (line 733) | def test_pipeline_memory_sampler():
  function test_pipeline_methods_pca_rus_svm (line 808) | def test_pipeline_methods_pca_rus_svm():
  function test_pipeline_methods_rus_pca_svm (line 835) | def test_pipeline_methods_rus_pca_svm():
  function test_pipeline_sample (line 862) | def test_pipeline_sample():
  function test_pipeline_sample_transform (line 901) | def test_pipeline_sample_transform():
  function test_pipeline_none_classifier (line 925) | def test_pipeline_none_classifier():
  function test_pipeline_none_sampler_classifier (line 948) | def test_pipeline_none_sampler_classifier():
  function test_pipeline_sampler_none_classifier (line 972) | def test_pipeline_sampler_none_classifier():
  function test_pipeline_none_sampler_sample (line 996) | def test_pipeline_none_sampler_sample():
  function test_pipeline_none_transformer (line 1016) | def test_pipeline_none_transformer():
  function test_pipeline_methods_anova_rus (line 1040) | def test_pipeline_methods_anova_rus():
  function test_pipeline_with_step_that_implements_both_sample_and_transform (line 1066) | def test_pipeline_with_step_that_implements_both_sample_and_transform():
  function test_pipeline_with_step_that_it_is_pipeline (line 1087) | def test_pipeline_with_step_that_it_is_pipeline():
  function test_pipeline_fit_then_sample_with_sampler_last_estimator (line 1111) | def test_pipeline_fit_then_sample_with_sampler_last_estimator():
  function test_pipeline_fit_then_sample_3_samplers_with_sampler_last_estimator (line 1136) | def test_pipeline_fit_then_sample_3_samplers_with_sampler_last_estimator():
  function test_make_pipeline_memory (line 1161) | def test_make_pipeline_memory():
  function test_predict_with_predict_params (line 1173) | def test_predict_with_predict_params():
  function test_resampler_last_stage_passthrough (line 1182) | def test_resampler_last_stage_passthrough():
  function test_pipeline_score_samples_pca_lof_binary (line 1201) | def test_pipeline_score_samples_pca_lof_binary():
  function test_score_samples_on_pipeline_without_score_samples (line 1230) | def test_score_samples_on_pipeline_without_score_samples():
  function test_pipeline_param_error (line 1244) | def test_pipeline_param_error():
  function test_verbose (line 1317) | def test_verbose(est, method, pattern, capsys):
  function test_pipeline_score_samples_pca_lof_multiclass (line 1332) | def test_pipeline_score_samples_pca_lof_multiclass():
  function test_pipeline_param_validation (line 1351) | def test_pipeline_param_validation():
  function test_pipeline_with_set_output (line 1358) | def test_pipeline_with_set_output():
  function test_pipeline_warns_not_fitted (line 1393) | def test_pipeline_warns_not_fitted(method):
  function test_transform_input_explicit_value_check (line 1442) | def test_transform_input_explicit_value_check():
  function test_transform_input_no_slep6 (line 1475) | def test_transform_input_no_slep6():
  function test_transform_input_sklearn_version (line 1489) | def test_transform_input_sklearn_version():
  function test_metadata_routing_with_sampler (line 1505) | def test_metadata_routing_with_sampler():

FILE: imblearn/tests/test_public_functions.py
  function test_function_param_validation (line 29) | def test_function_param_validation(func_module):

FILE: imblearn/under_sampling/_prototype_generation/_cluster_centroids.py
  class ClusterCentroids (line 28) | class ClusterCentroids(BaseUnderSampler):
    method __init__ (line 124) | def __init__(
    method _validate_estimator (line 137) | def _validate_estimator(self):
    method _generate_sample (line 149) | def _generate_sample(self, X, y, centroids, target_class):
    method _fit_resample (line 164) | def _fit_resample(self, X, y):
    method _more_tags (line 204) | def _more_tags(self):
    method __sklearn_tags__ (line 207) | def __sklearn_tags__(self):

FILE: imblearn/under_sampling/_prototype_generation/tests/test_cluster_centroids.py
  function test_fit_resample_check_voting (line 37) | def test_fit_resample_check_voting(X, expected_voting):
  function test_fit_resample_auto (line 44) | def test_fit_resample_auto():
  function test_fit_resample_half (line 53) | def test_fit_resample_half():
  function test_multiclass_fit_resample (line 62) | def test_multiclass_fit_resample():
  function test_fit_resample_object (line 74) | def test_fit_resample_object():
  function test_fit_hard_voting (line 88) | def test_fit_hard_voting():
  function test_cluster_centroids_hard_target_class (line 107) | def test_cluster_centroids_hard_target_class():
  function test_cluster_centroids_custom_clusterer (line 141) | def test_cluster_centroids_custom_clusterer():

FILE: imblearn/under_sampling/_prototype_selection/_condensed_nearest_neighbour.py
  class CondensedNearestNeighbour (line 28) | class CondensedNearestNeighbour(BaseCleaningSampler):
    method __init__ (line 132) | def __init__(
    method _validate_estimator (line 147) | def _validate_estimator(self):
    method _fit_resample (line 160) | def _fit_resample(self, X, y):
    method _more_tags (line 241) | def _more_tags(self):
    method __sklearn_tags__ (line 244) | def __sklearn_tags__(self):

FILE: imblearn/under_sampling/_prototype_selection/_edited_nearest_neighbours.py
  class EditedNearestNeighbours (line 28) | class EditedNearestNeighbours(BaseCleaningSampler):
    method __init__ (line 136) | def __init__(
    method _validate_estimator (line 149) | def _validate_estimator(self):
    method _fit_resample (line 156) | def _fit_resample(self, X, y):
    method _more_tags (line 192) | def _more_tags(self):
    method __sklearn_tags__ (line 195) | def __sklearn_tags__(self):
  class RepeatedEditedNearestNeighbours (line 205) | class RepeatedEditedNearestNeighbours(BaseCleaningSampler):
    method __init__ (line 327) | def __init__(
    method _validate_estimator (line 342) | def _validate_estimator(self):
    method _fit_resample (line 355) | def _fit_resample(self, X, y):
    method _more_tags (line 418) | def _more_tags(self):
    method __sklearn_tags__ (line 421) | def __sklearn_tags__(self):
  class AllKNN (line 431) | class AllKNN(BaseCleaningSampler):
    method __init__ (line 552) | def __init__(
    method _validate_estimator (line 567) | def _validate_estimator(self):
    method _fit_resample (line 580) | def _fit_resample(self, X, y):
    method _more_tags (line 632) | def _more_tags(self):
    method __sklearn_tags__ (line 635) | def __sklearn_tags__(self):
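
`EditedNearestNeighbours` removes samples whose neighbourhood disagrees with their label. Its `kind_sel` selection has an `"all"` variant and a `"mode"` variant, which `test_enn_check_kind_selection` exercises. Below is a minimal brute-force NumPy sketch of that editing rule. It assumes Euclidean distance and excludes each sample from its own neighbourhood. The function name `edited_nearest_neighbours_mask` and its `classes_to_clean` argument are hypothetical and not imbalanced-learn API; this is not the library's implementation.

```python
import numpy as np


def edited_nearest_neighbours_mask(X, y, n_neighbors=3, kind_sel="all",
                                   classes_to_clean=None):
    """Boolean mask of samples kept by an ENN-style editing rule (sketch).

    kind_sel="all":  keep a sample only if all its neighbours share its label.
    kind_sel="mode": keep it if the most common neighbour label is its own.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if classes_to_clean is None:
        classes_to_clean = set(np.unique(y).tolist())
    diff = X[:, None, :] - X[None, :, :]           # pairwise differences
    dist = np.einsum("ijk,ijk->ij", diff, diff)    # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    # Indices of the n_neighbors closest other samples, row by row.
    nn_index = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    nn_labels = y[nn_index]

    keep = np.ones(len(y), dtype=bool)
    for i, label in enumerate(y):
        if label not in classes_to_clean:
            continue  # classes not targeted for cleaning are always kept
        if kind_sel == "all":
            keep[i] = bool(np.all(nn_labels[i] == label))
        elif kind_sel == "mode":
            values, counts = np.unique(nn_labels[i], return_counts=True)
            keep[i] = values[counts.argmax()] == label
        else:
            raise ValueError("kind_sel must be 'all' or 'mode'")
    return keep
```

On a toy set containing one class-0 point inside the class-1 cluster, `"all"` also drops the class-1 points that border it. `"mode"` drops only the stray point.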

FILE: imblearn/under_sampling/_prototype_selection/_instance_hardness_threshold.py
  class InstanceHardnessThreshold (line 30) | class InstanceHardnessThreshold(BaseUnderSampler):
    method __init__ (line 123) | def __init__(
    method _validate_estimator (line 138) | def _validate_estimator(self, random_state):
    method _fit_resample (line 156) | def _fit_resample(self, X, y):
    method _more_tags (line 203) | def _more_tags(self):
    method __sklearn_tags__ (line 206) | def __sklearn_tags__(self):

FILE: imblearn/under_sampling/_prototype_selection/_nearmiss.py
  class NearMiss (line 24) | class NearMiss(BaseUnderSampler):
    method __init__ (line 132) | def __init__(
    method _selection_dist_based (line 147) | def _selection_dist_based(
    method _validate_estimator (line 216) | def _validate_estimator(self):
    method _fit_resample (line 228) | def _fit_resample(self, X, y):
    method _more_tags (line 309) | def _more_tags(self):
    method __sklearn_tags__ (line 319) | def __sklearn_tags__(self):
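
`NearMiss` is a distance-based under-sampler. In its version-1 heuristic, it keeps the majority samples whose mean distance to their closest minority samples is smallest. Below is a minimal binary-case NumPy sketch of that heuristic. It assumes Euclidean distance and keeps as many majority samples as there are minority samples. The name `near_miss_1` is hypothetical and does not reflect how `NearMiss._selection_dist_based` is written.

```python
import numpy as np


def near_miss_1(X, y, majority_class, minority_class, n_neighbors=3):
    """Indices of majority samples kept by a NearMiss-1 style rule (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    maj_idx = np.flatnonzero(y == majority_class)
    min_idx = np.flatnonzero(y == minority_class)
    # Euclidean distances from each majority sample to each minority sample.
    diff = X[maj_idx][:, None, :] - X[min_idx][None, :, :]
    dist = np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))
    k = min(n_neighbors, len(min_idx))
    # Mean distance to the k closest minority samples.
    mean_dist = np.sort(dist, axis=1)[:, :k].mean(axis=1)
    # Keep the majority samples that are closest, on average, to the minority class.
    order = np.argsort(mean_dist, kind="stable")
    return np.sort(maj_idx[order[: len(min_idx)]])
```

The selected majority indices can be concatenated with the minority indices to build the resampled set.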

FILE: imblearn/under_sampling/_prototype_selection/_neighbourhood_cleaning_rule.py
  class NeighbourhoodCleaningRule (line 30) | class NeighbourhoodCleaningRule(BaseCleaningSampler):
    method __init__ (line 142) | def __init__(
    method _validate_estimator (line 157) | def _validate_estimator(self):
    method _fit_resample (line 181) | def _fit_resample(self, X, y):
    method _more_tags (line 233) | def _more_tags(self):
    method __sklearn_tags__ (line 236) | def __sklearn_tags__(self):

FILE: imblearn/under_sampling/_prototype_selection/_one_sided_selection.py
  class OneSidedSelection (line 27) | class OneSidedSelection(BaseCleaningSampler):
    method __init__ (line 125) | def __init__(
    method _validate_estimator (line 140) | def _validate_estimator(self):
    method _fit_resample (line 153) | def _fit_resample(self, X, y):
    method _more_tags (line 207) | def _more_tags(self):
    method __sklearn_tags__ (line 210) | def __sklearn_tags__(self):

FILE: imblearn/under_sampling/_prototype_selection/_random_under_sampler.py
  class RandomUnderSampler (line 21) | class RandomUnderSampler(BaseUnderSampler):
    method __init__ (line 93) | def __init__(
    method _check_X_y (line 100) | def _check_X_y(self, X, y):
    method _fit_resample (line 106) | def _fit_resample(self, X, y):
    method _more_tags (line 134) | def _more_tags(self):
    method __sklearn_tags__ (line 144) | def __sklearn_tags__(self):
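
`RandomUnderSampler` balances classes by drawing a random subset of the over-represented classes. Below is a minimal NumPy sketch of that idea. It assumes every class is reduced, without replacement, to the minority-class count. The helper `random_under_sample` is hypothetical: it is not the library's `_fit_resample`, and it ignores options such as other sampling strategies or bootstrapping.

```python
import numpy as np


def random_under_sample(X, y, random_state=None):
    """Randomly drop samples so every class matches the minority count (sketch).

    Returns the resampled X, y and the (sorted) indices of the kept samples.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    kept = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        # Draw without replacement; for the minority class this keeps every sample.
        kept.append(rng.choice(idx, size=n_min, replace=False))
    sample_indices = np.sort(np.concatenate(kept))
    return X[sample_indices], y[sample_indices], sample_indices
```

Returning the kept indices mirrors the spirit of the `sample_indices_` attribute that `check_samplers_sample_indices` looks for. Here it is simply a third return value.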

FILE: imblearn/under_sampling/_prototype_selection/_tomek_links.py
  class TomekLinks (line 23) | class TomekLinks(BaseCleaningSampler):
    method __init__ (line 98) | def __init__(self, *, sampling_strategy="auto", n_jobs=None):
    method is_tomek (line 103) | def is_tomek(y, nn_index, class_type):
    method _fit_resample (line 145) | def _fit_resample(self, X, y):
    method _more_tags (line 159) | def _more_tags(self):
    method __sklearn_tags__ (line 162) | def __sklearn_tags__(self):
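
`TomekLinks.is_tomek(y, nn_index, class_type)` flags samples that belong to a Tomek link. A Tomek link is a pair of samples that are each other's nearest neighbour but carry different labels. Below is a minimal NumPy sketch of that rule. It assumes brute-force Euclidean 1-nearest-neighbour search. The helpers `nearest_neighbor_index` and `tomek_link_mask` are hypothetical names, not imbalanced-learn API.

```python
import numpy as np


def nearest_neighbor_index(X):
    """For each row of X, return the index of its nearest *other* row (Euclidean)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]           # pairwise differences
    dist = np.einsum("ijk,ijk->ij", diff, diff)    # squared distances
    np.fill_diagonal(dist, np.inf)                 # a sample is not its own neighbour
    return dist.argmin(axis=1)


def tomek_link_mask(y, nn_index, classes_to_clean):
    """Mark samples of `classes_to_clean` that sit in a Tomek link (sketch)."""
    y = np.asarray(y)
    links = np.zeros(len(y), dtype=bool)
    for i, j in enumerate(nn_index):
        if y[i] not in classes_to_clean:
            continue
        # Different labels and mutual nearest neighbours -> Tomek link.
        if y[j] != y[i] and nn_index[j] == i:
            links[i] = True
    return links
```

Cleaning then amounts to `X[~mask], y[~mask]`. Restricting `classes_to_clean` to the majority class removes only the majority member of each link.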

FILE: imblearn/under_sampling/_prototype_selection/tests/test_allknn.py
  function test_allknn_fit_resample (line 105) | def test_allknn_fit_resample():
  function test_all_knn_allow_minority (line 175) | def test_all_knn_allow_minority():
  function test_allknn_fit_resample_mode (line 196) | def test_allknn_fit_resample_mode():
  function test_allknn_fit_resample_with_nn_object (line 274) | def test_allknn_fit_resample_with_nn_object():
  function test_alknn_not_good_object (line 353) | def test_alknn_not_good_object():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_condensed_nearest_neighbour.py
  function test_cnn_init (line 42) | def test_cnn_init():
  function test_cnn_fit_resample (line 49) | def test_cnn_fit_resample():
  function test_cnn_fit_resample_with_object (line 73) | def test_cnn_fit_resample_with_object(n_neighbors):
  function test_condensed_nearest_neighbour_multiclass (line 101) | def test_condensed_nearest_neighbour_multiclass():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_edited_nearest_neighbours.py
  function test_enn_init (line 40) | def test_enn_init():
  function test_enn_fit_resample (line 48) | def test_enn_fit_resample():
  function test_enn_fit_resample_mode (line 68) | def test_enn_fit_resample_mode():
  function test_enn_fit_resample_with_nn_object (line 95) | def test_enn_fit_resample_with_nn_object():
  function test_enn_check_kind_selection (line 123) | def test_enn_check_kind_selection():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_instance_hardness_threshold.py
  function test_iht_init (line 38) | def test_iht_init():
  function test_iht_fit_resample (line 50) | def test_iht_fit_resample():
  function test_iht_fit_resample_half (line 57) | def test_iht_fit_resample_half():
  function test_iht_fit_resample_class_obj (line 69) | def test_iht_fit_resample_class_obj():
  function test_iht_reproducibility (line 77) | def test_iht_reproducibility():
  function test_iht_fit_resample_default_estimator (line 91) | def test_iht_fit_resample_default_estimator():
  function test_iht_estimator_pipeline (line 99) | def test_iht_estimator_pipeline():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_nearmiss.py
  function test_nm_fit_resample_auto (line 36) | def test_nm_fit_resample_auto():
  function test_nm_fit_resample_float_sampling_strategy (line 91) | def test_nm_fit_resample_float_sampling_strategy():
  function test_nm_fit_resample_nn_obj (line 153) | def test_nm_fit_resample_nn_obj():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_neighbourhood_cleaning_rule.py
  function data (line 17) | def data():
  function test_ncr_threshold_cleaning (line 31) | def test_ncr_threshold_cleaning(data):
  function test_ncr_n_neighbors (line 61) | def test_ncr_n_neighbors(data):

FILE: imblearn/under_sampling/_prototype_selection/tests/test_one_sided_selection.py
  function test_oss_init (line 37) | def test_oss_init():
  function test_oss_fit_resample (line 45) | def test_oss_fit_resample():
  function test_oss_with_object (line 71) | def test_oss_with_object(n_neighbors):
  function test_one_sided_selection_multiclass (line 101) | def test_one_sided_selection_multiclass():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_random_under_sampler.py
  function test_rus_fit_resample (line 35) | def test_rus_fit_resample(as_frame):
  function test_rus_fit_resample_half (line 64) | def test_rus_fit_resample_half():
  function test_multiclass_fit_resample (line 91) | def test_multiclass_fit_resample():
  function test_random_under_sampling_heterogeneous_data (line 103) | def test_random_under_sampling_heterogeneous_data():
  function test_random_under_sampling_nan_inf (line 116) | def test_random_under_sampling_nan_inf():
  function test_random_under_sampler_strings (line 139) | def test_random_under_sampler_strings(sampling_strategy):
  function test_random_under_sampling_datetime (line 153) | def test_random_under_sampling_datetime():
  function test_random_under_sampler_full_nat (line 166) | def test_random_under_sampler_full_nat():

FILE: imblearn/under_sampling/_prototype_selection/tests/test_repeated_edited_nearest_neighbours.py
  function test_renn_init (line 103) | def test_renn_init():
  function test_renn_iter_wrong (line 111) | def test_renn_iter_wrong():
  function test_renn_fit_resample (line 118) | def test_renn_fit_resample():
  function test_renn_fit_resample_mode_object (line 160) | def test_renn_fit_resample_mode_object():
  function test_renn_fit_resample_mode (line 245) | def test_renn_fit_resample_mode():
  function test_renn_iter_attribute (line 335) | def test_renn_iter_attribute(max_iter, n_iter):

FILE: imblearn/under_sampling/_prototype_selection/tests/test_tomek_links.py
  function test_tl_init (line 40) | def test_tl_init():
  function test_tl_fit_resample (line 45) | def test_tl_fit_resample():
  function test_tomek_links_strings (line 78) | def test_tomek_links_strings(sampling_strategy):

FILE: imblearn/under_sampling/base.py
  class BaseUnderSampler (line 15) | class BaseUnderSampler(BaseSampler):
  class BaseCleaningSampler (line 74) | class BaseCleaningSampler(BaseSampler):

FILE: imblearn/utils/_docstring.py
  class Substitution (line 7) | class Substitution:
    method __init__ (line 15) | def __init__(self, *args, **kwargs):
    method __call__ (line 21) | def __call__(self, obj):
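
Going by its signature (`__init__(self, *args, **kwargs)` and `__call__(self, obj)`), `Substitution` is a decorator that fills placeholders in a docstring. The tests `test_docstring_inject` and `test_docstring_with_python_OO` suggest it must also cope with docstrings stripped by `python -OO`. Below is a minimal sketch of such a decorator. It assumes `str.format`-style placeholders. The class name `DocSubstitution` is hypothetical, and this is not the library's code.

```python
class DocSubstitution:
    """Decorator that fills ``{placeholder}`` fields in a docstring (sketch)."""

    def __init__(self, *args, **kwargs):
        if args and kwargs:
            raise AssertionError("Only positional or keyword args are allowed")
        self.params = args or kwargs

    def __call__(self, obj):
        # Under `python -OO` docstrings are stripped, so __doc__ may be None.
        if obj.__doc__:
            if isinstance(self.params, tuple):
                obj.__doc__ = obj.__doc__.format(*self.params)
            else:
                obj.__doc__ = obj.__doc__.format(**self.params)
        return obj
```

For example, decorating a function whose docstring is `"x : {kind}"` with `@DocSubstitution(kind="int")` leaves `"x : int"` as its `__doc__`.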

FILE: imblearn/utils/_show_versions.py
  function _get_deps_info (line 14) | def _get_deps_info():
  function show_versions (line 49) | def show_versions(github=False):

FILE: imblearn/utils/_tags.py
  function _dataclass_args (line 15) | def _dataclass_args():
  class InputTags (line 20) | class InputTags(SklearnInputTags):
  class SamplerTags (line 82) | class SamplerTags:
  class Tags (line 96) | class Tags:
  function get_tags (line 160) | def get_tags(estimator):
  function _to_new_tags (line 190) | def _to_new_tags(old_tags, estimator=None):

FILE: imblearn/utils/_test_common/instance_generator.py
  function _tested_estimators (line 101) | def _tested_estimators(type_filter=None):
  function _construct_instances (line 107) | def _construct_instances(Estimator):
  function _get_check_estimator_ids (line 130) | def _get_check_estimator_ids(obj):
  function _yield_instances_for_check (line 166) | def _yield_instances_for_check(check, estimator_orig):
  function _get_expected_failed_checks (line 240) | def _get_expected_failed_checks(estimator):

FILE: imblearn/utils/_validation.py
  class ArraysTransformer (line 32) | class ArraysTransformer:
    method __init__ (line 35) | def __init__(self, X, y):
    method transform (line 39) | def transform(self, X, y):
    method _gets_props (line 50) | def _gets_props(self, array):
    method _transfrom_one (line 58) | def _transfrom_one(self, array, props):
  function _is_neighbors_object (line 96) | def _is_neighbors_object(estimator):
  function check_neighbors_object (line 116) | def check_neighbors_object(nn_name, nn_object, additional_neighbor=0):
  function _count_class_sample (line 146) | def _count_class_sample(y):
  function check_target_type (line 151) | def check_target_type(y, indicate_one_vs_all=False):
  function _sampling_strategy_all (line 189) | def _sampling_strategy_all(y, sampling_type):
  function _sampling_strategy_majority (line 206) | def _sampling_strategy_majority(y, sampling_type):
  function _sampling_strategy_not_majority (line 227) | def _sampling_strategy_not_majority(y, sampling_type):
  function _sampling_strategy_not_minority (line 253) | def _sampling_strategy_not_minority(y, sampling_type):
  function _sampling_strategy_minority (line 279) | def _sampling_strategy_minority(y, sampling_type):
  function _sampling_strategy_auto (line 301) | def _sampling_strategy_auto(y, sampling_type):
  function _sampling_strategy_dict (line 310) | def _sampling_strategy_dict(sampling_strategy, y, sampling_type):
  function _sampling_strategy_list (line 364) | def _sampling_strategy_list(sampling_strategy, y, sampling_type):
  function _sampling_strategy_float (line 389) | def _sampling_strategy_float(sampling_strategy, y, sampling_type):
  function check_sampling_strategy (line 438) | def check_sampling_strategy(sampling_strategy, y, sampling_type, **kwargs):
  function _deprecate_positional_args (line 589) | def _deprecate_positional_args(f):
  function _check_X (line 633) | def _check_X(X):
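
The private `_sampling_strategy_*` helpers resolve a user-facing `sampling_strategy` into per-class targets. As we understand imbalanced-learn's convention, those targets differ by sampler type. For over-sampling, each target is the number of samples to generate. For under-sampling, each target is the number of samples to keep. Below is a minimal sketch of the `"auto"` case only, built on plain class counts. The name `auto_sampling_strategy` is hypothetical. Tie-breaking, clean-sampling, and float, dict or callable strategies are deliberately left out and should be checked against the library documentation.

```python
from collections import Counter


def auto_sampling_strategy(y, sampling_type):
    """Resolve an 'auto'-style strategy into {class: n_samples} targets (sketch).

    over-sampling:  every class except the majority, mapped to the number of
                    samples to generate so it matches the majority count.
    under-sampling: every class except the minority, mapped to the number of
                    samples to keep so it matches the minority count.
    """
    counts = Counter(y)
    if sampling_type == "over-sampling":
        majority, n_max = max(counts.items(), key=lambda kv: kv[1])
        return {cls: n_max - n for cls, n in counts.items() if cls != majority}
    if sampling_type == "under-sampling":
        minority, n_min = min(counts.items(), key=lambda kv: kv[1])
        return {cls: n_min for cls in counts if cls != minority}
    raise ValueError(f"unsupported sampling_type: {sampling_type!r}")
```

With class counts of 6, 3 and 1, over-sampling asks for 3 and 5 new samples of the two smaller classes. Under-sampling keeps 1 sample of each of the two larger classes.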

FILE: imblearn/utils/deprecation.py
  function deprecate_parameter (line 9) | def deprecate_parameter(sampler, version_deprecation, param_deprecated, ...

FILE: imblearn/utils/estimator_checks.py
  function sample_dataset_generator (line 50) | def sample_dataset_generator():
  function _set_checking_parameters (line 61) | def _set_checking_parameters(estimator):
  function _yield_sampler_checks (line 76) | def _yield_sampler_checks(sampler):
  function _yield_classifier_checks (line 109) | def _yield_classifier_checks(classifier):
  function _yield_all_checks (line 114) | def _yield_all_checks(estimator, legacy=True):
  function _check_name (line 134) | def _check_name(check):
  function _maybe_mark (line 140) | def _maybe_mark(estimator, check, expected_failed_checks=None, mark=None...
  function _should_be_skipped_or_marked (line 179) | def _should_be_skipped_or_marked(
  function estimator_checks_generator (line 211) | def estimator_checks_generator(
  function parametrize_with_checks (line 257) | def parametrize_with_checks(estimators, *, legacy=True, expected_failed_...
  function check_target_type (line 349) | def check_target_type(name, estimator_orig):
  function check_samplers_one_label (line 365) | def check_samplers_one_label(name, sampler_orig):
  function check_samplers_fit (line 386) | def check_samplers_fit(name, sampler_orig):
  function check_samplers_fit_resample (line 397) | def check_samplers_fit_resample(name, sampler_orig):
  function check_samplers_sampling_strategy_fit_resample (line 426) | def check_samplers_sampling_strategy_fit_resample(name, sampler_orig):
  function check_samplers_sparse (line 448) | def check_samplers_sparse(name, sampler_orig):
  function check_samplers_pandas_sparse (line 462) | def check_samplers_pandas_sparse(name, sampler_orig):
  function check_samplers_pandas (line 494) | def check_samplers_pandas(name, sampler_orig):
  function check_samplers_list (line 526) | def check_samplers_list(name, sampler_orig):
  function check_samplers_multiclass_ova (line 543) | def check_samplers_multiclass_ova(name, sampler_orig):
  function check_samplers_2d_target (line 555) | def check_samplers_2d_target(name, sampler_orig):
  function check_samplers_preserve_dtype (line 563) | def check_samplers_preserve_dtype(name, sampler_orig):
  function check_samplers_sample_indices (line 574) | def check_samplers_sample_indices(name, sampler_orig):
  function check_samplers_string (line 585) | def check_samplers_string(name, sampler_orig):
  function check_samplers_nan (line 600) | def check_samplers_nan(name, sampler_orig):
  function check_classifier_on_multilabel_or_multioutput_targets (line 615) | def check_classifier_on_multilabel_or_multioutput_targets(name, estimato...
  function check_classifiers_with_encoded_labels (line 623) | def check_classifiers_with_encoded_labels(name, classifier_orig):
  function check_param_validation (line 651) | def check_param_validation(name, estimator_orig):
  function check_dataframe_column_names_consistency (line 726) | def check_dataframe_column_names_consistency(name, estimator_orig):
  function check_sampler_get_feature_names_out (line 868) | def check_sampler_get_feature_names_out(name, sampler_orig):
  function check_sampler_get_feature_names_out_pandas (line 913) | def check_sampler_get_feature_names_out_pandas(name, sampler_orig):

FILE: imblearn/utils/testing.py
  function all_estimators (line 20) | def all_estimators(
  class _CustomNearestNeighbors (line 113) | class _CustomNearestNeighbors(BaseEstimator):
    method __init__ (line 119) | def __init__(self, n_neighbors=1, metric="euclidean"):
    method fit (line 123) | def fit(self, X, y=None):
    method kneighbors (line 128) | def kneighbors(self, X, n_neighbors=None, return_distance=True):
    method kneighbors_graph (line 136) | def kneighbors_graph(X=None, n_neighbors=None, mode="connectivity"):
  class _CustomClusterer (line 142) | class _CustomClusterer(BaseEstimator):
    method __init__ (line 145) | def __init__(self, n_clusters=1, expose_cluster_centers=True):
    method fit (line 149) | def fit(self, X, y=None):
    method predict (line 154) | def predict(self, X):

FILE: imblearn/utils/tests/test_deprecation.py
  class Sampler (line 11) | class Sampler:
    method __init__ (line 12) | def __init__(self):
  function test_deprecate_parameter (line 17) | def test_deprecate_parameter():

FILE: imblearn/utils/tests/test_docstring.py
  function _dedent_docstring (line 15) | def _dedent_docstring(docstring):
  function func (line 33) | def func(param_1, param_2):
  class cls (line 55) | class cls:
    method __init__ (line 65) | def __init__(self, param_1, param_2):
  function test_docstring_inject (line 78) | def test_docstring_inject(obj, obj_docstring):
  function test_docstring_template (line 83) | def test_docstring_template():
  function test_docstring_with_python_OO (line 88) | def test_docstring_with_python_OO():

FILE: imblearn/utils/tests/test_estimator_checks.py
  class BaseBadSampler (line 21) | class BaseBadSampler(BaseEstimator):
    method fit (line 26) | def fit(self, X, y):
    method fit_resample (line 29) | def fit_resample(self, X, y):
  class SamplerSingleClass (line 35) | class SamplerSingleClass(BaseSampler):
    method fit_resample (line 40) | def fit_resample(self, X, y):
    method _fit_resample (line 43) | def _fit_resample(self, X, y):
  class NotFittedSampler (line 47) | class NotFittedSampler(BaseBadSampler):
    method fit (line 50) | def fit(self, X, y):
  class NoAcceptingSparseSampler (line 55) | class NoAcceptingSparseSampler(BaseBadSampler):
    method fit (line 58) | def fit(self, X, y):
  class NotPreservingDtypeSampler (line 64) | class NotPreservingDtypeSampler(BaseSampler):
    method _fit_resample (line 69) | def _fit_resample(self, X, y):
  class IndicesSampler (line 73) | class IndicesSampler(BaseOverSampler):
    method _check_X_y (line 74) | def _check_X_y(self, X, y):
    method _fit_resample (line 86) | def _fit_resample(self, X, y):
  function test_check_samplers_string (line 92) | def test_check_samplers_string():
  function test_check_samplers_nan (line 97) | def test_check_samplers_nan():
  function _test_single_check (line 111) | def _test_single_check(Estimator, check):
  function test_all_checks (line 119) | def test_all_checks():

FILE: imblearn/utils/tests/test_min_dependencies.py
  function test_min_dependencies_readme (line 19) | def test_min_dependencies_readme():

FILE: imblearn/utils/tests/test_show_versions.py
  function test_get_deps_info (line 8) | def test_get_deps_info():
  function test_show_versions_default (line 21) | def test_show_versions_default(capsys):
  function test_show_versions_github (line 40) | def test_show_versions_github(capsys):

FILE: imblearn/utils/tests/test_testing.py
  function test_all_estimators (line 14) | def test_all_estimators():
  function test_custom_nearest_neighbors (line 30) | def test_custom_nearest_neighbors():

FILE: imblearn/utils/tests/test_validation.py
  function test_check_neighbors_object (line 31) | def test_check_neighbors_object():
  function test_check_target_type (line 56) | def test_check_target_type(target, output_target):
  function test_check_target_type_ova (line 69) | def test_check_target_type_ova(target, output_target, is_ova):
  function test_check_sampling_strategy_warning (line 77) | def test_check_sampling_strategy_warning():
  function test_check_sampling_strategy_float_error (line 106) | def test_check_sampling_strategy_float_error(ratio, y, type, err_msg):
  function test_check_sampling_strategy_error (line 111) | def test_check_sampling_strategy_error():
  function test_check_sampling_strategy_error_wrong_string (line 131) | def test_check_sampling_strategy_error_wrong_string(
  function test_sampling_strategy_class_target_unknown (line 149) | def test_sampling_strategy_class_target_unknown(sampling_strategy, sampl...
  function test_sampling_strategy_dict_error (line 155) | def test_sampling_strategy_dict_error():
  function test_sampling_strategy_float_error_not_in_range (line 181) | def test_sampling_strategy_float_error_not_in_range(sampling_strategy):
  function test_sampling_strategy_float_error_not_binary (line 187) | def test_sampling_strategy_float_error_not_binary():
  function test_sampling_strategy_list_error_not_clean_sampling (line 195) | def test_sampling_strategy_list_error_not_clean_sampling(sampling_method):
  function _sampling_strategy_func (line 202) | def _sampling_strategy_func(y):
  function test_check_sampling_strategy (line 250) | def test_check_sampling_strategy(
  function test_sampling_strategy_callable_args (line 259) | def test_sampling_strategy_callable_args():
  function test_sampling_strategy_check_order (line 291) | def test_sampling_strategy_check_order(
  function test_arrays_transformer_plain_list (line 301) | def test_arrays_transformer_plain_list():
  function test_arrays_transformer_numpy (line 311) | def test_arrays_transformer_numpy():
  function test_arrays_transformer_pandas (line 321) | def test_arrays_transformer_pandas():
  function test_deprecate_positional_args_warns_for_function (line 351) | def test_deprecate_positional_args_warns_for_function():
  function test_is_neighbors_object (line 381) | def test_is_neighbors_object(estimator, is_neighbor_estimator):

FILE: maint_tools/test_docstring.py
  function get_all_methods (line 65) | def get_all_methods():
  function _is_checked_function (line 84) | def _is_checked_function(item):
  function get_all_functions_names (line 98) | def get_all_functions_names():
  function filter_errors (line 125) | def filter_errors(errors, method, Estimator=None):
  function repr_errors (line 164) | def repr_errors(res, estimator=None, method: str | None = None) -> str:
  function test_function_docstring (line 218) | def test_function_docstring(function_name, request):
  function test_docstring (line 236) | def test_docstring(Estimator, method, request):


Condensed preview: 242 files, each showing path, character count, and a content snippet. The full structured content totals 1,247K chars.

[
  {
    "path": ".circleci/config.yml",
    "chars": 1292,
    "preview": "version: 2.1\n\njobs:\n  python3:\n    docker:\n      - image: cimg/python:3.9\n    environment:\n      - OMP_NUM_THREADS: 1\n  "
  },
  {
    "path": ".coveragerc",
    "chars": 174,
    "preview": "[run]\nbranch = True\n\n[report]\nexclude_lines =\n    if self.debug:\n    pragma: no cover\n    raise NotImplementedError\nigno"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 1588,
    "preview": "---\nname: Bug report\nabout: Create a report to help us reproduce and correct the bug\ntitle: \"[BUG]\"\nlabels: bug\nassignee"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/documentation-improvement.md",
    "chars": 403,
    "preview": "---\nname: Documentation improvement\nabout: Create a report to help us improve the documentation\ntitle: \"[DOC]\"\nlabels: D"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 535,
    "preview": "---\nname: Feature request\nabout: Suggest an new algorithm, enhancement to an existing algorithm, etc.\ntitle: \"[ENH]\"\nlab"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/other--blank-template-.md",
    "chars": 127,
    "preview": "---\nname: Other (blank template)\nabout: For all other issues to reach the community...\ntitle: ''\nlabels: ''\nassignees: '"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/question.md",
    "chars": 248,
    "preview": "---\nname: Question\nabout: If you have a usage question\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n**\nIf your issue is a us"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/usage-question.md",
    "chars": 479,
    "preview": "---\nname: Usage question\nabout: If you have a usage question\ntitle: \"[SO]\"\nlabels: question\nassignees: ''\n\n---\n\n** If yo"
  },
  {
    "path": ".github/ISSUE_TEMPLATE.md",
    "chars": 1657,
    "preview": "<!--\nIf your issue is a usage question, submit it here instead:\n- The imbalanced learn gitter: https://gitter.im/scikit-"
  },
  {
    "path": ".github/PULL_REQUEST_TEMPLATE.md",
    "chars": 819,
    "preview": "<!--\nThanks for contributing a pull request! Please ensure you have taken a look at\nthe contribution guidelines: https:/"
  },
  {
    "path": ".github/check-changelog.yml",
    "chars": 3046,
    "preview": "name: Check Changelog\n# This check makes sure that the changelog is properly updated\n# when a PR introduces a change in "
  },
  {
    "path": ".github/dependabot.yml",
    "chars": 574,
    "preview": "version: 2\nupdates:\n  # Maintain dependencies for GitHub Actions as recommended in SPEC8:\n  # https://github.com/scienti"
  },
  {
    "path": ".github/workflows/circleci-artifacts-redirector.yml",
    "chars": 938,
    "preview": "name: CircleCI artifacts redirector\n\non: [status]\n\n# Restrict the permissions granted to the use of secrets.GITHUB_TOKEN"
  },
  {
    "path": ".github/workflows/linters.yml",
    "chars": 428,
    "preview": "name: Run code format checks\n\non:\n  push:\n    branches:\n      - \"main\"\n  pull_request:\n    branches:\n      - '*'\n\njobs:\n"
  },
  {
    "path": ".github/workflows/tests.yml",
    "chars": 1664,
    "preview": "name: 'tests'\n\non:\n  push:\n    branches:\n      - \"main\"\n  pull_request:\n    branches:\n      - '*'\n\njobs:\n  test:\n    str"
  },
  {
    "path": ".gitignore",
    "chars": 1437,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\n"
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 427,
    "preview": "repos:\n-   repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: v4.3.0\n    hooks:\n    -   id: check-yaml\n    - "
  },
  {
    "path": "AUTHORS.rst",
    "chars": 563,
    "preview": "History\n-------\n\nDevelopment lead\n~~~~~~~~~~~~~~~~\n\nThe project started in August 2014 by Fernando Nogueira and focused "
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 6992,
    "preview": "Contributing code\n=================\n\nThis guide is adapted from [scikit-learn](https://github.com/scikit-learn/scikit-le"
  },
  {
    "path": "LICENSE",
    "chars": 1125,
    "preview": "The MIT License (MIT)\n\nCopyright (c) 2014-2020 The imbalanced-learn developers.\nAll rights reserved.\n\nPermission is here"
  },
  {
    "path": "MANIFEST.in",
    "chars": 133,
    "preview": "\nrecursive-include doc *\nrecursive-include examples *\ninclude AUTHORS.rst\ninclude CONTRIBUTING.md\ninclude LICENSE\ninclud"
  },
  {
    "path": "README.rst",
    "chars": 6064,
    "preview": ".. -*- mode: rst -*-\n\n.. _scikit-learn: http://scikit-learn.org/stable/\n\n.. _scikit-learn-contrib: https://github.com/sc"
  },
  {
    "path": "build_tools/circle/build_doc.sh",
    "chars": 405,
    "preview": "#!/usr/bin/env bash\nset -x\nset -e\n\n# deactivate circleci virtualenv and setup a miniconda env instead\nif [[ `type -t dea"
  },
  {
    "path": "build_tools/circle/checkout_merge_commit.sh",
    "chars": 929,
    "preview": "#!/bin/bash\n\n# Add `master` branch to the update list.\n# Otherwise CircleCI will give us a cached one.\nFETCH_REFS=\"+mast"
  },
  {
    "path": "build_tools/circle/linting.sh",
    "chars": 6322,
    "preview": "#!/bin/bash\n\n# This script is used in CircleCI to check that PRs do not add obvious\n# flake8 violations. It relies on tw"
  },
  {
    "path": "build_tools/circle/push_doc.sh",
    "chars": 1330,
    "preview": "#!/bin/bash\n# This script is meant to be called in the \"deploy\" step defined in\n# circle.yml. See https://circleci.com/d"
  },
  {
    "path": "conftest.py",
    "chars": 1054,
    "preview": "# This file is here so that when running from the root folder\n# ./imblearn is added to sys.path by pytest.\n# See https:/"
  },
  {
    "path": "doc/Makefile",
    "chars": 7074,
    "preview": "# Makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line.\nSPHINXOPTS    = -v\nSPHINXBUI"
  },
  {
    "path": "doc/_static/css/imbalanced-learn.css",
    "chars": 1163,
    "preview": "@import url(\"theme.css\");\n\n.highlight a {\n  text-decoration: underline;\n}\n\n.deprecated p {\n  padding: 10px 7px 10px 10px"
  },
  {
    "path": "doc/_static/js/copybutton.js",
    "chars": 2803,
    "preview": "$(document).ready(function() {\n    /* Add a [>>>] button on the top-right corner of code samples to hide\n     * the >>> "
  },
  {
    "path": "doc/_templates/class.rst",
    "chars": 457,
    "preview": "{{objname}}\n{{ underline }}==============\n\n.. currentmodule:: {{ module }}\n\n.. autoclass:: {{ objname }}\n\n   {% block me"
  },
  {
    "path": "doc/_templates/function.rst",
    "chars": 211,
    "preview": "{{objname}}\n{{ underline }}====================\n\n.. currentmodule:: {{ module }}\n\n.. autofunction:: {{ objname }}\n\n.. in"
  },
  {
    "path": "doc/_templates/numpydoc_docstring.rst",
    "chars": 214,
    "preview": "{{index}}\n{{summary}}\n{{extended_summary}}\n{{parameters}}\n{{returns}}\n{{yields}}\n{{other_parameters}}\n{{attributes}}\n{{r"
  },
  {
    "path": "doc/_templates/sidebar-search-bs.html",
    "chars": 371,
    "preview": "<div class=\"navbar-brand-box\">\n  <a class=\"navbar-brand-box text-wrap\" href=\"{{ pathto('index') }}\">\n    {% if logo %}\n "
  },
  {
    "path": "doc/about.rst",
    "chars": 651,
    "preview": "About us\n========\n\n.. include:: ../AUTHORS.rst\n\n.. _citing-imbalanced-learn:\n\nCiting imbalanced-learn\n------------------"
  },
  {
    "path": "doc/bibtex/refs.bib",
    "chars": 8315,
    "preview": "@inproceedings{mani2003knn,\n  title={kNN approach to unbalanced data distributions: a case study involving information e"
  },
  {
    "path": "doc/combine.rst",
    "chars": 2337,
    "preview": ".. _combine:\n\n=======================================\nCombination of over- and under-sampling\n=========================="
  },
  {
    "path": "doc/common_pitfalls.rst",
    "chars": 7714,
    "preview": ".. _common_pitfalls:\n\n=========================================\nCommon pitfalls and recommended practices\n=============="
  },
  {
    "path": "doc/conf.py",
    "chars": 10278,
    "preview": "#\n# imbalanced-learn documentation build configuration file, created by\n# sphinx-quickstart on Mon Jan 18 14:44:12 2016."
  },
  {
    "path": "doc/datasets/index.rst",
    "chars": 8070,
    "preview": ".. _datasets:\n\n=========================\nDataset loading utilities\n=========================\n\n.. currentmodule:: imblear"
  },
  {
    "path": "doc/developers_utils.rst",
    "chars": 7353,
    "preview": ".. _developers-utils:\n\n===================\nDeveloper guideline\n===================\n\nDeveloper utilities\n----------------"
  },
  {
    "path": "doc/ensemble.rst",
    "chars": 4830,
    "preview": ".. _ensemble:\n\n====================\nEnsemble of samplers\n====================\n\n.. currentmodule:: imblearn.ensemble\n\n.. "
  },
  {
    "path": "doc/index.rst",
    "chars": 3073,
    "preview": ".. project-template documentation master file, created by\n   sphinx-quickstart on Mon Jan 18 14:44:12 2016.\n   You can a"
  },
  {
    "path": "doc/install.rst",
    "chars": 3350,
    "preview": ".. _getting_started:\n\n###############\nGetting Started\n###############\n\nPrerequisites\n=============\n\n.. |PythonMinVersion"
  },
  {
    "path": "doc/introduction.rst",
    "chars": 3196,
    "preview": ".. _introduction:\n\n============\nIntroduction\n============\n\n.. _api_imblearn:\n\nAPI's of imbalanced-learn samplers\n-------"
  },
  {
    "path": "doc/make.bat",
    "chars": 6721,
    "preview": "@ECHO OFF\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sphinx-build\r\n)\r\n"
  },
  {
    "path": "doc/metrics.rst",
    "chars": 5768,
    "preview": ".. _metrics:\n\n=======\nMetrics\n=======\n\n.. currentmodule:: imblearn.metrics\n\nClassification metrics\n---------------------"
  },
  {
    "path": "doc/miscellaneous.rst",
    "chars": 7505,
    "preview": ".. _miscellaneous:\n\n======================\nMiscellaneous samplers\n======================\n\n.. currentmodule:: imblearn\n\n."
  },
  {
    "path": "doc/model_selection.rst",
    "chars": 6264,
    "preview": ".. _cross_validation:\n\n================\nCross validation\n================\n\n.. currentmodule:: imblearn.model_selection\n\n"
  },
  {
    "path": "doc/over_sampling.rst",
    "chars": 14548,
    "preview": ".. _over-sampling:\n\n=============\nOver-sampling\n=============\n\n.. currentmodule:: imblearn.over_sampling\n\nA practical gu"
  },
  {
    "path": "doc/references/combine.rst",
    "chars": 320,
    "preview": ".. _combine_ref:\n\nCombination of over- and under-sampling methods\n===============================================\n\n.. au"
  },
  {
    "path": "doc/references/datasets.rst",
    "chars": 260,
    "preview": ".. _datasets_ref:\n\nDatasets\n========\n\n.. automodule:: imblearn.datasets\n    :no-members:\n    :no-inherited-members:\n\n.. "
  },
  {
    "path": "doc/references/ensemble.rst",
    "chars": 495,
    "preview": ".. _ensemble_ref:\n\nEnsemble methods\n================\n\n.. automodule:: imblearn.ensemble\n    :no-members:\n    :no-inherit"
  },
  {
    "path": "doc/references/index.rst",
    "chars": 315,
    "preview": ".. _api:\n\n#############\nAPI reference\n#############\n\nThis is the full API documentation of the `imbalanced-learn` toolbo"
  },
  {
    "path": "doc/references/keras.rst",
    "chars": 376,
    "preview": ".. _keras_ref:\n\nBatch generator for Keras\n=========================\n\n.. automodule:: imblearn.keras\n    :no-members:\n   "
  },
  {
    "path": "doc/references/metrics.rst",
    "chars": 878,
    "preview": ".. _metrics_ref:\n\nMetrics\n=======\n\n.. automodule:: imblearn.metrics\n   :no-members:\n   :no-inherited-members:\n\nClassific"
  },
  {
    "path": "doc/references/miscellaneous.rst",
    "chars": 213,
    "preview": ".. _misc_ref:\n\nMiscellaneous\n=============\n\nImbalance-learn provides some fast-prototyping tools.\n\n.. currentmodule:: im"
  },
  {
    "path": "doc/references/model_selection.rst",
    "chars": 440,
    "preview": ".. _model_selection_ref:\n\nModel selection methods\n=======================\n\n.. automodule:: imblearn.model_selection\n    "
  },
  {
    "path": "doc/references/over_sampling.rst",
    "chars": 512,
    "preview": ".. _over_sampling_ref:\n\nOver-sampling methods\n=====================\n\n.. automodule:: imblearn.over_sampling\n    :no-memb"
  },
  {
    "path": "doc/references/pipeline.rst",
    "chars": 320,
    "preview": ".. _pipeline_ref:\n\nPipeline\n========\n\n.. automodule:: imblearn.pipeline\n    :no-members:\n    :no-inherited-members:\n\n.. "
  },
  {
    "path": "doc/references/tensorflow.rst",
    "chars": 302,
    "preview": ".. _tensorflow_ref:\n\nBatch generator for TensorFlow\n==============================\n\n.. automodule:: imblearn.tensorflow\n"
  },
  {
    "path": "doc/references/under_sampling.rst",
    "chars": 919,
    "preview": ".. _under_sampling_ref:\n\nUnder-sampling methods\n======================\n\n.. automodule:: imblearn.under_sampling\n    :no-"
  },
  {
    "path": "doc/references/utils.rst",
    "chars": 717,
    "preview": "Utilities\n=========\n\n.. automodule:: imblearn.utils\n    :no-members:\n    :no-inherited-members:\n\n.. currentmodule:: imbl"
  },
  {
    "path": "doc/sphinxext/LICENSE.txt",
    "chars": 6014,
    "preview": "-------------------------------------------------------------------------------\n    The files\n    - numpydoc.py\n    - au"
  },
  {
    "path": "doc/sphinxext/MANIFEST.in",
    "chars": 43,
    "preview": "recursive-include tests *.py\ninclude *.txt\n"
  },
  {
    "path": "doc/sphinxext/README.txt",
    "chars": 1696,
    "preview": "=====================================\nnumpydoc -- Numpy's Sphinx extensions\n=====================================\n\nNumpy"
  },
  {
    "path": "doc/sphinxext/github_link.py",
    "chars": 2645,
    "preview": "import inspect\nimport os\nimport subprocess\nimport sys\nfrom functools import partial\nfrom operator import attrgetter\n\nREV"
  },
  {
    "path": "doc/sphinxext/sphinx_issues.py",
    "chars": 8028,
    "preview": "\"\"\"A Sphinx extension for linking to your project's issue tracker.\n\nCopyright 2014 Steven Loria\n\nPermission is hereby gr"
  },
  {
    "path": "doc/under_sampling.rst",
    "chars": 22471,
    "preview": ".. _under-sampling:\n\n==============\nUnder-sampling\n==============\n\n.. currentmodule:: imblearn.under_sampling\n\nOne way o"
  },
  {
    "path": "doc/user_guide.rst",
    "chars": 525,
    "preview": ".. title:: User guide: contents\n\n.. _user_guide:\n\n==========\nUser Guide\n==========\n\n.. Ensure that the references will b"
  },
  {
    "path": "doc/whats_new/v0.1.rst",
    "chars": 1170,
    "preview": ".. _changes_0_1:\n\nVersion 0.1\n===========\n\n**December 26, 2016**\n\nChangelog\n---------\n\nAPI\n~~~\n\n- First release of the s"
  },
  {
    "path": "doc/whats_new/v0.10.rst",
    "chars": 2097,
    "preview": ".. _changes_0_10:\n\nVersion 0.10.1\n==============\n\n**December 28, 2022**\n\nChangelog\n---------\n\nBug fixes\n.........\n\n- Fix"
  },
  {
    "path": "doc/whats_new/v0.11.rst",
    "chars": 2890,
    "preview": ".. _changes_0_11:\n\nVersion 0.11.0\n==============\n\n**July 8, 2023**\n\nChangelog\n---------\n\nBug fixes\n.........\n\n- Fix a bu"
  },
  {
    "path": "doc/whats_new/v0.12.rst",
    "chars": 4441,
    "preview": ".. _changes_0_12:\n\nVersion 0.12.4\n==============\n\n**October 4, 2024**\n\nChangelog\n---------\n\nCompatibility\n.............\n"
  },
  {
    "path": "doc/whats_new/v0.13.rst",
    "chars": 915,
    "preview": ".. _changes_0_13:\n\nVersion 0.13.0\n==============\n\n**December 20, 2024**\n\nChangelog\n---------\n\nBug fixes\n.........\n\n- Fix"
  },
  {
    "path": "doc/whats_new/v0.14.rst",
    "chars": 732,
    "preview": ".. _changes_0_14:\n\nVersion 0.14.1\n==============\n\n**December 21, 2025**\n\nChangelog\n---------\n\nMaintenance\n...........\n\n-"
  },
  {
    "path": "doc/whats_new/v0.15.rst",
    "chars": 223,
    "preview": ".. _changes_0_15:\n\nVersion 0.15.dev0 (In development)\n==================================\n\n**TBD**\n\nChangelog\n---------\n\n"
  },
  {
    "path": "doc/whats_new/v0.2.rst",
    "chars": 6068,
    "preview": ".. _changes_0_2:\n\nVersion 0.2\n===========\n\n**January 1, 2017**\n\nChangelog\n---------\n\nBug fixes\n~~~~~~~~~\n\n- Fixed a bug "
  },
  {
    "path": "doc/whats_new/v0.3.rst",
    "chars": 3388,
    "preview": ".. _changes_0_3:\n\nVersion 0.3\n===========\n\n**February 22, 2018**\n\nChangelog\n---------\n\nTesting\n~~~~~~~\n- Pytest is used "
  },
  {
    "path": "doc/whats_new/v0.4.rst",
    "chars": 8625,
    "preview": ".. _changes_0_4:\n\nVersion 0.4.2\n=============\n\n**October 21, 2018**\n\nChangelog\n---------\n\nBug fixes\n.........\n\n- Fix a b"
  },
  {
    "path": "doc/whats_new/v0.5.rst",
    "chars": 2893,
    "preview": ".. _changes_0_5:\n\nVersion 0.5.0\n=============\n\n**June 28, 2019**\n\nChangelog\n---------\n\nChanged models\n..............\n\nTh"
  },
  {
    "path": "doc/whats_new/v0.6.rst",
    "chars": 4967,
    "preview": ".. _changes_0_6_2:\n\nVersion 0.6.2\n==============\n\n**February 16, 2020**\n\nThis is a bug-fix release to resolve some issue"
  },
  {
    "path": "doc/whats_new/v0.7.rst",
    "chars": 2436,
    "preview": ".. _changes_0_7:\n\nVersion 0.7.0\n=============\n\n**June 9, 2020**\n\nChangelog\n---------\n\nMaintenance\n...........\n\n- Ensure "
  },
  {
    "path": "doc/whats_new/v0.8.rst",
    "chars": 2675,
    "preview": ".. _changes_0_8:\n\nVersion 0.8.1\n=============\n\n**September 29, 2020**\n\nChangelog\n---------\n\nMaintenance\n...........\n\n- M"
  },
  {
    "path": "doc/whats_new/v0.9.rst",
    "chars": 402,
    "preview": ".. _changes_0_9:\n\nVersion 0.9.1\n=============\n\n**May 16, 2022**\n\nChangelog\n---------\n\nThis release provides fixes that m"
  },
  {
    "path": "doc/whats_new.rst",
    "chars": 578,
    "preview": ".. currentmodule:: imblearn\n\n===============\nRelease history\n===============\n\n.. include:: whats_new/v0.15.rst\n\n.. inclu"
  },
  {
    "path": "doc/zzz_references.rst",
    "chars": 68,
    "preview": "==========\nReferences\n==========\n\n.. bibliography:: bibtex/refs.bib\n"
  },
  {
    "path": "examples/README.txt",
    "chars": 120,
    "preview": ".. _general_examples:\n\nExamples\n--------\n\nGeneral-purpose and introductory examples for the `imbalanced-learn` toolbox.\n"
  },
  {
    "path": "examples/api/README.txt",
    "chars": 176,
    "preview": ".. _api_usage:\n\nExamples showing API imbalanced-learn usage\n-------------------------------------------\n\nExamples that s"
  },
  {
    "path": "examples/api/plot_sampling_strategy_usage.py",
    "chars": 6100,
    "preview": "\"\"\"\n====================================================\nHow to use ``sampling_strategy`` in imbalanced-learn\n=========="
  },
  {
    "path": "examples/applications/README.txt",
    "chars": 139,
    "preview": ".. _realword_examples:\n\nExamples based on real world datasets\n-------------------------------------\n\nExamples which use "
  },
  {
    "path": "examples/applications/plot_impact_imbalanced_classes.py",
    "chars": 12376,
    "preview": "\"\"\"\n==========================================================\nFitting model on imbalanced datasets and how to fight bia"
  },
  {
    "path": "examples/applications/plot_multi_class_under_sampling.py",
    "chars": 1494,
    "preview": "\"\"\"\n=============================================\nMulticlass classification with under-sampling\n========================"
  },
  {
    "path": "examples/applications/plot_outlier_rejections.py",
    "chars": 4281,
    "preview": "\"\"\"\n===============================================================\nCustomized sampler to implement an outlier rejection"
  },
  {
    "path": "examples/applications/plot_over_sampling_benchmark_lfw.py",
    "chars": 4758,
    "preview": "\"\"\"\n==========================================================\nBenchmark over-sampling methods in a face recognition tas"
  },
  {
    "path": "examples/applications/plot_topic_classication.py",
    "chars": 3358,
    "preview": "\"\"\"\n=================================================\nExample of topic classification in text documents\n================"
  },
  {
    "path": "examples/applications/porto_seguro_keras_under_sampling.py",
    "chars": 8753,
    "preview": "\"\"\"\n==========================================================\nPorto Seguro: balancing samples in mini-batches with Kera"
  },
  {
    "path": "examples/combine/README.txt",
    "chars": 278,
    "preview": ".. _combine_examples:\n\nExamples using combine class methods\n====================================\n\nCombine methods mixed "
  },
  {
    "path": "examples/combine/plot_comparison_combine.py",
    "chars": 3820,
    "preview": "\"\"\"\n==================================================\nCompare sampler combining over- and under-sampling\n=============="
  },
  {
    "path": "examples/datasets/README.txt",
    "chars": 122,
    "preview": ".. _dataset_examples:\n\nDataset examples\n-----------------------\n\nExamples concerning the :mod:`imblearn.datasets` module"
  },
  {
    "path": "examples/datasets/plot_make_imbalance.py",
    "chars": 2474,
    "preview": "\"\"\"\n============================\nCreate an imbalanced dataset\n============================\n\nAn illustration of the :func"
  },
  {
    "path": "examples/ensemble/README.txt",
    "chars": 361,
    "preview": ".. _ensemble_examples:\n\nExample using ensemble class methods\n====================================\n\nUnder-sampling method"
  },
  {
    "path": "examples/ensemble/plot_bagging_classifier.py",
    "chars": 6020,
    "preview": "\"\"\"\n=================================\nBagging classifiers using sampler\n=================================\n\nIn this examp"
  },
  {
    "path": "examples/ensemble/plot_comparison_ensemble_classifier.py",
    "chars": 7338,
    "preview": "\"\"\"\n=============================================\nCompare ensemble classifiers using resampling\n========================"
  },
  {
    "path": "examples/evaluation/README.txt",
    "chars": 146,
    "preview": ".. _evaluation_examples:\n\nEvaluation examples\n-------------------\n\nExamples illustrating how classification using imbala"
  },
  {
    "path": "examples/evaluation/plot_classification_report.py",
    "chars": 1584,
    "preview": "\"\"\"\n=============================================\nEvaluate classification by compiling a report\n========================"
  },
  {
    "path": "examples/evaluation/plot_metrics.py",
    "chars": 2900,
    "preview": "\"\"\"\n=======================================\nMetrics specific to imbalanced learning\n===================================="
  },
  {
    "path": "examples/model_selection/README.txt",
    "chars": 120,
    "preview": ".. _model_selection_examples:\n\nModel Selection\n---------------\n\nExamples related to the selection of balancing methods.\n"
  },
  {
    "path": "examples/model_selection/plot_instance_hardness_cv.py",
    "chars": 3704,
    "preview": "\"\"\"\n====================================================\nDistribute hard-to-classify datapoints over CV folds\n=========="
  },
  {
    "path": "examples/model_selection/plot_validation_curve.py",
    "chars": 3157,
    "preview": "\"\"\"\n==========================\nPlotting Validation Curves\n==========================\n\nIn this example the impact of the "
  },
  {
    "path": "examples/over-sampling/README.txt",
    "chars": 255,
    "preview": ".. _over_sampling_examples:\n\nExample using over-sampling class methods\n=========================================\n\nData b"
  },
  {
    "path": "examples/over-sampling/plot_comparison_over_sampling.py",
    "chars": 10987,
    "preview": "\"\"\"\n==============================\nCompare over-sampling samplers\n==============================\n\nThe following example "
  },
  {
    "path": "examples/over-sampling/plot_illustration_generation_sample.py",
    "chars": 2010,
    "preview": "\"\"\"\n============================================\nSample generator used in SMOTE-like samplers\n=========================="
  },
  {
    "path": "examples/over-sampling/plot_shrinkage_effect.py",
    "chars": 3956,
    "preview": "\"\"\"\n======================================================\nEffect of the shrinkage factor in random over-sampling\n======"
  },
  {
    "path": "examples/pipeline/README.txt",
    "chars": 156,
    "preview": ".. _pipeline_examples:\n\nPipeline examples\n=================\n\nExample of how to use the a pipeline to include under-sampl"
  },
  {
    "path": "examples/pipeline/plot_pipeline_classification.py",
    "chars": 2006,
    "preview": "\"\"\"\n====================================\nUsage of pipeline embedding samplers\n====================================\n\nAn e"
  },
  {
    "path": "examples/under-sampling/README.txt",
    "chars": 330,
    "preview": ".. _under_sampling_examples:\n\nExample using under-sampling class methods\n==========================================\n\nUnd"
  },
  {
    "path": "examples/under-sampling/plot_comparison_under_sampling.py",
    "chars": 9707,
    "preview": "\"\"\"\n===============================\nCompare under-sampling samplers\n===============================\n\nThe following examp"
  },
  {
    "path": "examples/under-sampling/plot_illustration_nearmiss.py",
    "chars": 5767,
    "preview": "\"\"\"\n============================\nSample selection in NearMiss\n============================\n\nThis example illustrates the"
  },
  {
    "path": "examples/under-sampling/plot_illustration_tomek_links.py",
    "chars": 3180,
    "preview": "\"\"\"\n==============================================\nIllustration of the definition of a Tomek link\n======================"
  },
  {
    "path": "imblearn/VERSION.txt",
    "chars": 10,
    "preview": "0.15.dev0\n"
  },
  {
    "path": "imblearn/__init__.py",
    "chars": 4100,
    "preview": "\"\"\"Toolbox for imbalanced dataset in machine learning.\n\n``imbalanced-learn`` is a set of python methods to deal with imb"
  },
  {
    "path": "imblearn/_version.py",
    "chars": 719,
    "preview": "\"\"\"\n``imbalanced-learn`` is a set of python methods to deal with imbalanced\ndatset in machine learning and pattern recog"
  },
  {
    "path": "imblearn/base.py",
    "chars": 13589,
    "preview": "\"\"\"Base class for sampling\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# Licen"
  },
  {
    "path": "imblearn/combine/__init__.py",
    "chars": 241,
    "preview": "\"\"\"The :mod:`imblearn.combine` provides methods which combine\nover-sampling and under-sampling.\n\"\"\"\n\nfrom imblearn.combi"
  },
  {
    "path": "imblearn/combine/_smote_enn.py",
    "chars": 5096,
    "preview": "\"\"\"Class to perform over-sampling using SMOTE and cleaning using ENN.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gm"
  },
  {
    "path": "imblearn/combine/_smote_tomek.py",
    "chars": 4954,
    "preview": "\"\"\"Class to perform over-sampling using SMOTE and cleaning using Tomek\nlinks.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemai"
  },
  {
    "path": "imblearn/combine/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/combine/tests/test_smote_enn.py",
    "chars": 4729,
    "preview": "\"\"\"Test the module SMOTE ENN.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# Lic"
  },
  {
    "path": "imblearn/combine/tests/test_smote_tomek.py",
    "chars": 5505,
    "preview": "\"\"\"Test the module SMOTE ENN.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# Lic"
  },
  {
    "path": "imblearn/datasets/__init__.py",
    "chars": 241,
    "preview": "\"\"\"\nThe :mod:`imblearn.datasets` provides methods to generate\nimbalanced data.\n\"\"\"\n\nfrom imblearn.datasets._imbalance im"
  },
  {
    "path": "imblearn/datasets/_imbalance.py",
    "chars": 4148,
    "preview": "\"\"\"Transform a dataset into an imbalanced dataset.\"\"\"\n\n# Authors: Dayvid Oliveira\n#          Guillaume Lemaitre <g.lemai"
  },
  {
    "path": "imblearn/datasets/_zenodo.py",
    "chars": 13176,
    "preview": "\"\"\"Collection of imbalanced datasets.\n\nThis collection of datasets has been proposed in [1]_. The\ncharacteristics of the"
  },
  {
    "path": "imblearn/datasets/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/datasets/tests/test_imbalance.py",
    "chars": 2518,
    "preview": "\"\"\"Test the module easy ensemble.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n#"
  },
  {
    "path": "imblearn/datasets/tests/test_zenodo.py",
    "chars": 2773,
    "preview": "\"\"\"Test the datasets loader.\n\nSkipped if datasets is not already downloaded to data_home.\n\"\"\"\n# Authors: Guillaume Lemai"
  },
  {
    "path": "imblearn/ensemble/__init__.py",
    "chars": 533,
    "preview": "\"\"\"\nThe :mod:`imblearn.ensemble` module include methods generating\nunder-sampled subsets combined inside an ensemble.\n\"\""
  },
  {
    "path": "imblearn/ensemble/_bagging.py",
    "chars": 13016,
    "preview": "\"\"\"Bagging classifier trained on balanced bootstrap samples.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n"
  },
  {
    "path": "imblearn/ensemble/_common.py",
    "chars": 3249,
    "preview": "from numbers import Integral, Real\n\nfrom sklearn.tree._criterion import Criterion\nfrom sklearn.utils._param_validation i"
  },
  {
    "path": "imblearn/ensemble/_easy_ensemble.py",
    "chars": 9918,
    "preview": "\"\"\"Class to perform under-sampling using easy ensemble.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#    "
  },
  {
    "path": "imblearn/ensemble/_forest.py",
    "chars": 32041,
    "preview": "\"\"\"Forest classifiers trained on balanced boostrasp samples.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n"
  },
  {
    "path": "imblearn/ensemble/_weight_boosting.py",
    "chars": 15248,
    "preview": "import copy\nimport numbers\nimport warnings\nfrom copy import deepcopy\n\nimport numpy as np\nfrom sklearn.base import clone\n"
  },
  {
    "path": "imblearn/ensemble/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/ensemble/tests/test_bagging.py",
    "chars": 18756,
    "preview": "\"\"\"Test the module ensemble classifiers.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos A"
  },
  {
    "path": "imblearn/ensemble/tests/test_easy_ensemble.py",
    "chars": 7033,
    "preview": "\"\"\"Test the module easy ensemble.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n#"
  },
  {
    "path": "imblearn/ensemble/tests/test_forest.py",
    "chars": 10041,
    "preview": "import numpy as np\nimport pytest\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import Gr"
  },
  {
    "path": "imblearn/ensemble/tests/test_weight_boosting.py",
    "chars": 3195,
    "preview": "import numpy as np\nimport pytest\nfrom sklearn.datasets import make_classification\nfrom sklearn.model_selection import tr"
  },
  {
    "path": "imblearn/exceptions.py",
    "chars": 785,
    "preview": "\"\"\"\nThe :mod:`imblearn.exceptions` module includes all custom warnings and error\nclasses and functions used across imbal"
  },
  {
    "path": "imblearn/keras/__init__.py",
    "chars": 247,
    "preview": "\"\"\"The :mod:`imblearn.keras` provides utilities to deal with imbalanced dataset\nin keras.\"\"\"\n\nfrom imblearn.keras._gener"
  },
  {
    "path": "imblearn/keras/_generator.py",
    "chars": 10284,
    "preview": "\"\"\"Implement generators for ``keras`` which will balance the data.\"\"\"\n\n\n# This is a trick to avoid an error during tests"
  },
  {
    "path": "imblearn/keras/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/keras/tests/test_generator.py",
    "chars": 4306,
    "preview": "import numpy as np\nimport pytest\nfrom scipy import sparse\nfrom sklearn.cluster import KMeans\nfrom sklearn.datasets impor"
  },
  {
    "path": "imblearn/metrics/__init__.py",
    "chars": 658,
    "preview": "\"\"\"\nThe :mod:`imblearn.metrics` module includes score functions, performance\nmetrics and pairwise metrics and distance c"
  },
  {
    "path": "imblearn/metrics/_classification.py",
    "chars": 40155,
    "preview": "\"\"\"Metrics to assess performance on a classification task given class\npredictions. The available metrics are complementa"
  },
  {
    "path": "imblearn/metrics/pairwise.py",
    "chars": 8975,
    "preview": "\"\"\"Metrics to perform pairwise computation.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# License: MIT\n\ni"
  },
  {
    "path": "imblearn/metrics/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/metrics/tests/test_classification.py",
    "chars": 17788,
    "preview": "\"\"\"Testing the metric for classification with imbalanced dataset\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.co"
  },
  {
    "path": "imblearn/metrics/tests/test_pairwise.py",
    "chars": 6395,
    "preview": "\"\"\"Test for the metrics that perform pairwise distance computation.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmai"
  },
  {
    "path": "imblearn/metrics/tests/test_score_objects.py",
    "chars": 2091,
    "preview": "\"\"\"Test for score\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# License: MIT\n\ni"
  },
  {
    "path": "imblearn/model_selection/__init__.py",
    "chars": 209,
    "preview": "\"\"\"\nThe :mod:`imblearn.model_selection` provides methods to split the dataset into\ntraining and test sets.\n\"\"\"\n\nfrom imb"
  },
  {
    "path": "imblearn/model_selection/_split.py",
    "chars": 4402,
    "preview": "import warnings\n\nimport numpy as np\nfrom sklearn.base import clone\nfrom sklearn.model_selection import LeaveOneGroupOut,"
  },
  {
    "path": "imblearn/model_selection/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/model_selection/tests/test_split.py",
    "chars": 3344,
    "preview": "import numpy as np\nimport pytest\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import Logis"
  },
  {
    "path": "imblearn/over_sampling/__init__.py",
    "chars": 506,
    "preview": "\"\"\"\nThe :mod:`imblearn.over_sampling` provides a set of method to\nperform over-sampling.\n\"\"\"\n\nfrom imblearn.over_samplin"
  },
  {
    "path": "imblearn/over_sampling/_adasyn.py",
    "chars": 7851,
    "preview": "\"\"\"Class to perform over-sampling using ADASYN.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Ch"
  },
  {
    "path": "imblearn/over_sampling/_random_over_sampler.py",
    "chars": 9855,
    "preview": "\"\"\"Class to perform random over-sampling.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos"
  },
  {
    "path": "imblearn/over_sampling/_smote/__init__.py",
    "chars": 322,
    "preview": "from imblearn.over_sampling._smote.base import SMOTE, SMOTEN, SMOTENC\nfrom imblearn.over_sampling._smote.cluster import "
  },
  {
    "path": "imblearn/over_sampling/_smote/base.py",
    "chars": 36814,
    "preview": "\"\"\"Base class and original SMOTE methods for over-sampling\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# "
  },
  {
    "path": "imblearn/over_sampling/_smote/cluster.py",
    "chars": 11154,
    "preview": "\"\"\"SMOTE variant employing some clustering before the generation.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail."
  },
  {
    "path": "imblearn/over_sampling/_smote/filter.py",
    "chars": 18469,
    "preview": "\"\"\"SMOTE variant applying some filtering before the generation process.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@"
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/test_borderline_smote.py",
    "chars": 3490,
    "preview": "from collections import Counter\n\nimport pytest\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_mode"
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/test_kmeans_smote.py",
    "chars": 3632,
    "preview": "import numpy as np\nimport pytest\nfrom sklearn.cluster import KMeans, MiniBatchKMeans\nfrom sklearn.datasets import make_c"
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/test_smote.py",
    "chars": 5046,
    "preview": "\"\"\"Test the module SMOTE.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# License"
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/test_smote_nc.py",
    "chars": 13426,
    "preview": "\"\"\"Test the module SMOTENC.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n#      "
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/test_smoten.py",
    "chars": 3241,
    "preview": "import numpy as np\nimport pytest\nfrom sklearn.exceptions import DataConversionWarning\nfrom sklearn.preprocessing import "
  },
  {
    "path": "imblearn/over_sampling/_smote/tests/test_svm_smote.py",
    "chars": 2860,
    "preview": "import numpy as np\nimport pytest\nfrom sklearn.datasets import make_classification\nfrom sklearn.linear_model import Logis"
  },
  {
    "path": "imblearn/over_sampling/base.py",
    "chars": 2549,
    "preview": "\"\"\"\nBase class for the over-sampling method.\n\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Chris"
  },
  {
    "path": "imblearn/over_sampling/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/over_sampling/tests/test_adasyn.py",
    "chars": 3969,
    "preview": "\"\"\"Test the module under sampler.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n#"
  },
  {
    "path": "imblearn/over_sampling/tests/test_common.py",
    "chars": 3662,
    "preview": "from collections import Counter\n\nimport numpy as np\nimport pytest\nfrom sklearn.cluster import MiniBatchKMeans\n\nfrom imbl"
  },
  {
    "path": "imblearn/over_sampling/tests/test_random_over_sampler.py",
    "chars": 10010,
    "preview": "\"\"\"Test the module under sampler.\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n#"
  },
  {
    "path": "imblearn/pipeline.py",
    "chars": 58645,
    "preview": "\"\"\"\nThe :mod:`imblearn.pipeline` module implements utilities to build a\ncomposite estimator, as a chain of transforms, s"
  },
  {
    "path": "imblearn/tensorflow/__init__.py",
    "chars": 212,
    "preview": "\"\"\"The :mod:`imblearn.tensorflow` provides utilities to deal with imbalanced\ndataset in tensorflow.\"\"\"\n\nfrom imblearn.te"
  },
  {
    "path": "imblearn/tensorflow/_generator.py",
    "chars": 3315,
    "preview": "\"\"\"Implement generators for ``tensorflow`` which will balance the data.\"\"\"\n\nfrom scipy.sparse import issparse\nfrom sklea"
  },
  {
    "path": "imblearn/tensorflow/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/tensorflow/tests/test_generator.py",
    "chars": 5428,
    "preview": "import numpy as np\nimport pytest\nfrom scipy import sparse\nfrom sklearn.datasets import load_iris\nfrom sklearn.utils.fixe"
  },
  {
    "path": "imblearn/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/tests/test_base.py",
    "chars": 3397,
    "preview": "\"\"\"Test for miscellaneous samplers objects.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# License: MIT\n\ni"
  },
  {
    "path": "imblearn/tests/test_common.py",
    "chars": 3427,
    "preview": "\"\"\"Common tests\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# License: MIT\n\nim"
  },
  {
    "path": "imblearn/tests/test_docstring_parameters.py",
    "chars": 8284,
    "preview": "# Authors: Alexandre Gramfort <alexandre.gramfort@inria.fr>\n#          Raghav RV <rvraghav93@gmail.com>\n# License: BSD 3"
  },
  {
    "path": "imblearn/tests/test_exceptions.py",
    "chars": 375,
    "preview": "\"\"\"Test for the exceptions modules\"\"\"\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n"
  },
  {
    "path": "imblearn/tests/test_pipeline.py",
    "chars": 48804,
    "preview": "\"\"\"\nTest the pipeline module.\n\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n#          Christos Aridas\n# L"
  },
  {
    "path": "imblearn/tests/test_public_functions.py",
    "chars": 4036,
    "preview": "\"\"\"This is a copy of sklearn/tests/test_public_functions.py. It can be\nremoved when we support scikit-learn >= 1.2.\n\"\"\"\n"
  },
  {
    "path": "imblearn/under_sampling/__init__.py",
    "chars": 779,
    "preview": "\"\"\"\nThe :mod:`imblearn.under_sampling` provides methods to under-sample\na dataset.\n\"\"\"\n\nfrom imblearn.under_sampling._pr"
  },
  {
    "path": "imblearn/under_sampling/_prototype_generation/__init__.py",
    "chars": 286,
    "preview": "\"\"\"\nThe :mod:`imblearn.under_sampling.prototype_generation` submodule contains\nmethods that generate new samples in orde"
  },
  {
    "path": "imblearn/under_sampling/_prototype_generation/_cluster_centroids.py",
    "chars": 7711,
    "preview": "\"\"\"Class to perform under-sampling by generating centroids based on\nclustering.\"\"\"\n\n# Authors: Guillaume Lemaitre <g.lem"
  },
  {
    "path": "imblearn/under_sampling/_prototype_generation/tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "imblearn/under_sampling/_prototype_generation/tests/test_cluster_centroids.py",
    "chars": 5289,
    "preview": "\"\"\"Test the module cluster centroids.\"\"\"\nfrom collections import Counter\n\nimport numpy as np\nimport pytest\nfrom scipy im"
  },
  {
    "path": "imblearn/under_sampling/_prototype_selection/__init__.py",
    "chars": 1325,
    "preview": "\"\"\"\nThe :mod:`imblearn.under_sampling.prototype_selection` submodule contains\nmethods that select samples in order to ba"
  }
]

// ... and 42 more files not shown in this preview
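Among the files listed above, `imblearn/over_sampling/_random_over_sampler.py` is described as a "Class to perform random over-sampling". As a rough illustration of that idea only, the sketch below uses the standard library and assumes the common definition of random over-sampling: every class is topped up to the majority-class count by duplicating its own samples, drawn uniformly at random with replacement. The function `random_over_sample` is hypothetical. It is not the library's implementation, and the real `RandomOverSampler` offers more control, for example through its `sampling_strategy` parameter.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Hypothetical helper: balance classes by duplicating existing samples.

    Each class is topped up to the size of the largest class by drawing
    extra copies of its own samples uniformly at random with replacement.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    # Start from the original data; duplicates are appended after it.
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # Indices of the samples belonging to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw (target - count) extra indices; zero for the majority class.
        for i in rng.choices(idx, k=target - count):
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res


X = [[0.0], [0.1], [0.2], [0.3], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_over_sample(X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
```

Duplicating existing rows adds no new information to the minority class, which is why the same package also ships interpolation-based generators such as the SMOTE and ADASYN modules under `imblearn/over_sampling/`.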

About this extraction

This page contains the full source code of the scikit-learn-contrib/imbalanced-learn GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 242 files (1.1 MB), approximately 306.1k tokens, and a symbol index with 786 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
