Repository: joshlk/k-means-constrained
Branch: master
Commit: 5465da91605b
Files: 91
Total size: 1.8 MB
Directory structure:
gitextract_2ogob1oo/
├── .bumpversion.cfg
├── .github/
│ ├── ISSUE_TEMPLATE/
│ │ └── bug_report.md
│ └── workflows/
│ └── build_wheels.yml
├── .gitignore
├── CITATION.cff
├── CLAUDE.md
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.md
├── README_dev.md
├── docs/
│ ├── .buildinfo
│ ├── .doctrees/
│ │ ├── environment.pickle
│ │ ├── index.doctree
│ │ └── modules.doctree
│ ├── .nojekyll
│ ├── _modules/
│ │ ├── index.html
│ │ ├── k_means_constrained/
│ │ │ ├── k_means_constrained_.html
│ │ │ ├── sklearn_cluster/
│ │ │ │ └── k_means_.html
│ │ │ └── sklearn_import/
│ │ │ ├── base.html
│ │ │ └── cluster/
│ │ │ └── k_means_.html
│ │ └── sklearn/
│ │ └── base.html
│ ├── _sources/
│ │ ├── index.rst.txt
│ │ └── modules.rst.txt
│ ├── _static/
│ │ ├── alabaster.css
│ │ ├── basic.css
│ │ ├── css/
│ │ │ ├── badge_only.css
│ │ │ └── theme.css
│ │ ├── custom.css
│ │ ├── doctools.js
│ │ ├── documentation_options.js
│ │ ├── fonts/
│ │ │ └── FontAwesome.otf
│ │ ├── jquery-3.4.1.js
│ │ ├── jquery-3.5.1.js
│ │ ├── jquery.js
│ │ ├── js/
│ │ │ ├── badge_only.js
│ │ │ └── theme.js
│ │ ├── language_data.js
│ │ ├── pygments.css
│ │ ├── searchtools.js
│ │ ├── underscore-1.12.0.js
│ │ ├── underscore-1.3.1.js
│ │ └── underscore.js
│ ├── genindex.html
│ ├── index.html
│ ├── modules.html
│ ├── objects.inv
│ ├── py-modindex.html
│ ├── search.html
│ └── searchindex.js
├── docs_source/
│ ├── Makefile
│ ├── README.md
│ ├── conf.py
│ ├── index.rst
│ └── make.bat
├── etc/
│ ├── benchmark.ipynb
│ ├── benchmark_k_means.py
│ ├── benchmark_k_means_constrained.py
│ └── cython_benchmark.ipynb
├── k_means_constrained/
│ ├── __init__.py
│ ├── k_means_constrained_.py
│ └── sklearn_import/
│ ├── README
│ ├── __init__.py
│ ├── base.py
│ ├── cluster/
│ │ ├── __init__.py
│ │ ├── _k_means.pyx
│ │ └── k_means_.py
│ ├── exceptions.py
│ ├── externals/
│ │ ├── __init__.py
│ │ └── funcsigs.py
│ ├── fixes.py
│ ├── funcsigs.py
│ ├── metrics/
│ │ ├── __init__.py
│ │ ├── pairwise.py
│ │ └── pairwise_fast.pyx
│ ├── preprocessing/
│ │ ├── __init__.py
│ │ └── data.py
│ └── utils/
│ ├── __init__.py
│ ├── extmath.py
│ ├── fixes.py
│ ├── sparsefuncs.py
│ ├── sparsefuncs_fast.pyx
│ └── validation.py
├── pyproject.toml
├── requirements-dev.txt
├── requirements.txt
├── setup.cfg
├── setup.py
├── tests/
│ ├── test_k_means_constrained_.py
│ └── test_kmeans_constrained_from_sklearn.py
└── tox.ini
================================================
FILE CONTENTS
================================================
================================================
FILE: .bumpversion.cfg
================================================
[bumpversion]
current_version = 0.9.0
commit = True
tag = True
[bumpversion:file:setup.cfg]
[bumpversion:file:k_means_constrained/__init__.py]
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: "[BUG]"
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**Minimum working example**
Code and minimum data to reproduce the error. The example should be copy and pastable to reproduce the problem.
**Versions:**
- Python:
- Operating system: [Windows/MacOS/Linux]
- k-means-constrained:
- numpy:
- scipy:
- ortools:
- joblib:
- cython (if installed):
================================================
FILE: .github/workflows/build_wheels.yml
================================================
name: Build & Test

on:
  schedule:
    - cron: '0 1 * * 4' # Runs every Thursday at 1 AM (UTC)
  pull_request:
  push:
    branches:
      - master
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  build_wheels:
    name: ${{ matrix.os }}-${{ matrix.python }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        # macos-15-intel is an intel runner, macos-latest is apple silicon
        # windows-11-arm is currently not supported by the dependencies
        os: [ubuntu-latest, ubuntu-24.04-arm, windows-latest, macos-15-intel, macos-latest]
        python: [cp310, cp311, cp312, cp313, cp314]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - name: Build and test wheels
        uses: pypa/cibuildwheel@v3.3.1
        env:
          # Build
          # NOTE: build dependencies are defined in pyproject.toml (including numpy version)
          CIBW_BUILD_FRONTEND: "build"
          CIBW_ARCHS: native # Build only for the native architecture of the build machine
          CIBW_BUILD: "${{ matrix.python }}*" # Build only for the specified Python version
          CIBW_SKIP: "*musllinux* cp3*t-*" # Don't build musllinux (ortools) or free-threaded wheels
          # Test
          CIBW_BEFORE_TEST: "python -m pip install --upgrade pip && python -m pip install -r requirements-dev.txt && python -m pip list"
          CIBW_TEST_COMMAND: "pytest {project}/tests"
      - uses: actions/upload-artifact@v4
        with:
          path: ./wheelhouse/*.whl
          name: ${{ matrix.os }}-${{ matrix.python }}
  merge:
    runs-on: ubuntu-latest
    needs: build_wheels
    steps:
      - uses: actions/upload-artifact/merge@v4
        with:
          name: wheels
          delete-merged: true
================================================
FILE: .gitignore
================================================
# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
X_docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
### JetBrains template
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/dictionaries
.idea/**/shelf
# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
# Gradle
.idea/**/gradle.xml
.idea/**/libraries
# CMake
cmake-build-debug/
cmake-build-release/
# Mongo Explorer plugin
.idea/**/mongoSettings.xml
# File-based project format
*.iws
# IntelliJ
out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Cursive Clojure plugin
.idea/replstate.xml
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
# Editor-based Rest Client
.idea/httpRequests
### Example user template template
### Example user template
# IntelliJ project files
.idea
*.iml
out
gen
/k_means_constrained/mincostflow_vectorized_.c
/k_means_constrained/sklearn_import/cluster/_k_means.c
/docs_source/_build/
k-means-env/*
/k_means_constrained/sklearn_import/metrics/pairwise_fast.c
/k_means_constrained/sklearn_import/utils/sparsefuncs_fast.c
.DS_Store
.DS_Store/*
*.code-workspace
artifact/*
================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Levy-Kramer"
    given-names: "Josh"
    orcid: "https://orcid.org/0000-0002-4350-6197"
title: "k-means-constrained"
date-released: 2018-04-23
url: "https://github.com/joshlk/k-means-constrained"
================================================
FILE: CLAUDE.md
================================================
# CLAUDE.md
This file provides guidance for AI assistants working with the k-means-constrained codebase.
## Project Overview
**k-means-constrained** is a Python library implementing K-means clustering with minimum and/or maximum cluster size constraints. It extends scikit-learn's KMeans API by formulating the constrained assignment step (E-step) as a Minimum Cost Flow (MCF) network optimization problem, solved using Google OR-Tools' `SimpleMinCostFlow`.
- **Author:** Josh Levy-Kramer
- **License:** BSD 3-Clause
- **Version:** 0.9.0
- **Python support:** 3.10, 3.11, 3.12, 3.13, 3.14
## Repository Structure
```
k_means_constrained/ # Main package
├── __init__.py # Exports KMeansConstrained, defines __version__
├── k_means_constrained_.py # Core algorithm implementation
└── sklearn_import/ # Vendored scikit-learn code (modified)
├── base.py # BaseEstimator, ClusterMixin, TransformerMixin
├── exceptions.py
├── cluster/
│ ├── _k_means.pyx # Cython: M-step center computation
│ └── k_means_.py # KMeans base class, k-means++ init
├── metrics/
│ ├── pairwise.py # Distance computations
│ └── pairwise_fast.pyx # Cython: optimized pairwise distances
├── utils/
│ ├── extmath.py # row_norms, squared_norm
│ ├── validation.py # Input validation (check_array, etc.)
│ └── sparsefuncs_fast.pyx # Cython: sparse matrix operations
└── preprocessing/
tests/
├── test_k_means_constrained_.py # Core algorithm tests
└── test_kmeans_constrained_from_sklearn.py # Sklearn-adapted tests
etc/ # Benchmarks and notebooks
docs_source/ # Sphinx documentation source
docs/ # Built HTML documentation
.github/workflows/build_wheels.yml # CI/CD pipeline
```
## Build & Development Commands
### Prerequisites
Requires Cython and numpy at build time. Install all dev dependencies:
```sh
pip install -r requirements.txt
pip install -r requirements-dev.txt
```
### Key Commands
| Command | Purpose |
|---|---|
| `make compile` | Build Cython extensions in-place (required before running tests locally) |
| `pytest` | Run all tests |
| `pytest tests/test_k_means_constrained_.py` | Run core tests only |
| `make build` | Build the package |
| `make dist` | Build wheel and sdist |
| `make clean` | Remove build artifacts and caches |
| `make docs` | Build Sphinx HTML documentation |
### Typical Development Workflow
1. `make compile` — build Cython extensions in-place
2. Edit Python or Cython source files
3. `make compile` again if `.pyx` files were changed
4. `pytest` — run tests
## Architecture & Key Concepts
### Algorithm Flow
1. **Initialization:** k-means++ or random center selection (in `sklearn_import/cluster/k_means_.py:_k_init`)
2. **E-step (constrained):** `_labels_constrained()` builds an MCF graph from distance matrix and solves it via `ortools.SimpleMinCostFlow` to assign points to clusters respecting size_min/size_max
3. **M-step (standard):** `_centers_dense()` / `_centers_sparse()` in `_k_means.pyx` recomputes cluster centers
4. **Iterate** until convergence or max iterations
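Conceptually, the graph built in step 2 can be sketched in plain Python as below. This is an illustrative mock-up of the idea behind `minimum_cost_flow_problem_graph()`, not the package's actual implementation; the helper name `build_mcf_graph`, the node layout and the cost scaling factor are all assumptions.

```python
# Hypothetical sketch of the MCF graph used by the constrained E-step.
# Node layout: 0..n-1 are data points, n..n+c-1 are clusters, and n+c is
# an artificial sink that absorbs flow above each cluster's minimum size.

def build_mcf_graph(X, centers, size_min, size_max, scale=1000):
    """Return (arcs, supplies) for a min-cost-flow formulation of the
    constrained assignment step. Each arc is (tail, head, capacity, cost)."""
    n, c = len(X), len(centers)
    sink = n + c
    arcs = []

    # Each point sends exactly 1 unit of flow to one cluster; the arc cost
    # is the squared distance (scaled to an integer, as MCF solvers require).
    for i, x in enumerate(X):
        for j, mu in enumerate(centers):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, mu))
            arcs.append((i, n + j, 1, int(scale * sq_dist)))

    # Flow beyond each cluster's minimum size drains to the sink, capped so
    # that no cluster exceeds size_max.
    for j in range(c):
        arcs.append((n + j, sink, size_max - size_min, 0))

    # Supplies must sum to zero: points supply +1 each, each cluster demands
    # size_min, and the sink takes whatever is left over.
    supplies = [1] * n + [-size_min] * c + [-(n - c * size_min)]
    return arcs, supplies
```

An MCF solver (the package uses OR-Tools' `SimpleMinCostFlow`) then finds the integral minimum-cost flow, and the saturated point-to-cluster arcs give the constrained labels.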
### Key Functions in `k_means_constrained_.py`
- `KMeansConstrained` — main API class, sklearn-compatible estimator
- `k_means_constrained()` — top-level function handling multiple random inits
- `kmeans_constrained_single()` — single run of the constrained E-M loop
- `_labels_constrained()` — constrained E-step using min-cost flow
- `minimum_cost_flow_problem_graph()` — builds MCF graph (nodes, arcs, costs, capacities)
- `solve_min_cost_flow_graph()` — solves the MCF problem via OR-Tools
### Vendored sklearn Code
The `sklearn_import/` directory contains code copied and adapted from scikit-learn. This is not a dependency on sklearn at runtime — it's vendored to avoid version coupling. Changes to these files should be minimal and well-documented.
## Cython Extensions
Three Cython `.pyx` files compile to C extensions:
| Extension | Source | Purpose |
|---|---|---|
| `cluster._k_means` | `_k_means.pyx` | Compute cluster centers (M-step) |
| `metrics.pairwise_fast` | `pairwise_fast.pyx` | Optimized sparse distance computation |
| `utils.sparsefuncs_fast` | `sparsefuncs_fast.pyx` | Sparse CSR row norms and stats |
Compilation is controlled by the `CYTHONIZE` environment variable (defaults to `1`). Set `CYTHONIZE=0` to skip Cythonization and use pre-compiled `.c`/`.cpp` files.
Cython compiler directives: `language_level=3`, `embedsignature=True`. Extensions use `boundscheck(False)`, `wraparound(False)`, `cdivision(True)` for performance.
## Testing
- **Framework:** pytest
- **Test files:** `tests/test_k_means_constrained_.py` (core algorithm), `tests/test_kmeans_constrained_from_sklearn.py` (sklearn compatibility)
- **CI matrix:** Ubuntu (x64+ARM), Windows, macOS (Intel+Apple Silicon) x Python 3.10-3.14
- **CI tool:** `cibuildwheel` v3.3.1 — builds and tests wheels across platforms
- **CI triggers:** push to master, PRs, weekly schedule (Thursday 1 AM UTC), manual dispatch
- **Note:** musllinux is skipped (ortools compatibility)
## Dependencies
### Runtime (`requirements.txt`)
- `ortools >= 9.15.6755` — Google OR-Tools for min-cost flow
- `scipy >= 1.14.1` — sparse matrices, distance functions
- `numpy >= 2.1.1` — array operations
- `six` — Python 2/3 compatibility (legacy)
- `joblib` — parallel execution of multiple inits
### Build (`pyproject.toml`)
- `setuptools`, `wheel`, `cython >= 3.0.11`, `numpy >= 2.0, < 3`
### Dev (`requirements-dev.txt`)
- `pytest`, `pandas`, `scikit-learn >= 1.5.2`, `sphinx`, `bump2version`, `twine`
## Versioning & Release
### Version locations
The version string appears in three files that must stay in sync:
| File | Format |
|---|---|
| `setup.cfg` | `version = X.Y.Z` |
| `k_means_constrained/__init__.py` | `__version__ = 'X.Y.Z'` |
| `.bumpversion.cfg` | `current_version = X.Y.Z` |
### Bumping the version
Use `bump2version` (config in `.bumpversion.cfg`) which updates all three files and creates a git commit + tag automatically:
```sh
bump2version patch # 0.9.0 → 0.9.1
bump2version minor # 0.9.0 → 0.10.0
bump2version major # 0.9.0 → 1.0.0
```
If bumping manually (without `bump2version`), update all three files listed above.
### Changelog
The changelog lives in `README.md` under the `# Change log` heading. When bumping the version, add a new entry at the top of the list following the existing format:
```
* vX.Y.Z (YYYY-MM-DD) Brief description of changes.
```
### Release workflow (from `README_dev.md`)
1. Build and test locally (`make compile && pytest`)
2. Push to GitHub (triggers CI wheel builds)
3. Add changelog entry in `README.md`, bump version (`bump2version patch|minor|major`), push again
4. Download CI artifacts (`make download-dists ID=$BUILD_ID`)
5. Upload to test PyPI (`make test-pypi`), verify install
6. Upload to real PyPI (`make pypi-upload`)
## Code Conventions
- **Naming:** PascalCase for classes, snake_case for functions, leading underscore for internal/private
- **Docstrings:** NumPy style (Parameters, Returns, Notes, Examples sections)
- **Imports:** `import numpy as np`, `import scipy.sparse as sp`
- **Type hints:** Not heavily used; Cython files use `cdef`/`ctypedef` typed declarations
- **Error handling:** `ValueError` for constraint violations, `NotImplementedError` for unsupported sparse operations
- **Style checking:** flake8 (configured in `tox.ini`, excludes `.tox`, `*.egg`, `build`, `data`)
## Important Caveats
- Sparse matrix input is not fully supported — some code paths raise `NotImplementedError`
- Performance: O(n^4 log n) when n is of the same order as c (the number of clusters), vs O(n^2) for standard k-means. Not suitable for very large datasets.
- The `tox.ini` is outdated (references Python 3.8/3.9). Use `pytest` directly or rely on CI.
- Always run `make compile` after modifying `.pyx` files before testing.
================================================
FILE: LICENSE
================================================
BSD 3-Clause License
Copyright (c) 2022, Josh Levy-Kramer & Outra Limited
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: MANIFEST.in
================================================
include README*.md
include requirements*.txt
include LICENSE
include pyproject.toml
global-include *.pyx
global-include *.pyd
================================================
FILE: Makefile
================================================
.PHONY: build dist redist install dist-no-cython install-from-source clean venv-create venv-activate venv-delete check-dist test-pypi pypi-upload compile docs source-dists download-dists

build:
	python setup.py build

dist:
	python setup.py build bdist_wheel sdist

dist-no-cython:
	CYTHONIZE=0 python setup.py build bdist_wheel

compile:
	python setup.py build build_ext --inplace

redist: clean dist

install:
	pip install .

install-from-source: dist
	pip install dist/k-means-constrained-0.5.0.tar.gz

clean:
	$(RM) -r build dist src/*.egg-info artifact
	$(RM) -r .pytest_cache
	find . -name __pycache__ -exec rm -r {} +
	#git clean -fdX

venv-create:
	conda create -n k-means-constrained python=3.10
	conda activate k-means-constrained
	pip install -r requirements.txt
	pip install -r requirements-dev.txt

venv-activate:
	# Doesn't work. Need to execute manually
	conda activate k-means-constrained

venv-delete:
	conda env remove -n k-means-constrained

docs:
	sphinx-build -b html docs_source docs

source-dists:
	rm -r dist
	python setup.py sdist --formats=gztar

download-dists:
	# e.g. `make download-dists ID=8`
	# ID is run id (get from url. Not Job ID)
	# Need gh installed. `brew install gh`
	rm -r wheels || true
	gh run download $(ID)

check-dist:
	twine check wheels/*

test-pypi:
	# Get API key from password manager
	twine upload --repository-url https://test.pypi.org/legacy/ wheels/*

pypi-upload:
	# Get API key from password manager
	twine upload wheels/*
================================================
FILE: README.md
================================================
[](https://pypi.org/project/k-means-constrained/)

[](https://github.com/joshlk/k-means-constrained/actions/workflows/build_wheels.yml)
[**Documentation**](https://joshlk.github.io/k-means-constrained/)
# k-means-constrained
K-means clustering implementation whereby a minimum and/or maximum size for each
cluster can be specified.
This K-means implementation modifies the cluster assignment step (the E-step in EM)
by formulating it as a Minimum Cost Flow (MCF) linear network
optimisation problem. The problem is then solved with a cost-scaling
push-relabel algorithm using [Google OR-Tools'
`SimpleMinCostFlow`](https://developers.google.com/optimization/flow/mincostflow),
a fast C++ implementation.
This package is inspired by [Bradley et al.](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2000-65.pdf).
The original Minimum Cost Flow (MCF) network proposed by Bradley et al.
has been modified so that maximum cluster sizes can also be specified along
with minimum cluster sizes.
The code is based on [scikit-learn's `KMeans`](https://scikit-learn.org/0.19/modules/generated/sklearn.cluster.KMeans.html)
and implements the same [API with modifications](https://joshlk.github.io/k-means-constrained/).
Ref:
1. [Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering."
Microsoft Research, Redmond (2000): 1-8.](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2000-65.pdf)
2. [Google's SimpleMinCostFlow C++ implementation](https://github.com/google/or-tools/blob/master/ortools/graph/min_cost_flow.h)
# Installation
You can install k-means-constrained from PyPI:
```
pip install k-means-constrained
```
It is supported on Python 3.10, 3.11, 3.12, 3.13 and 3.14. Previous versions of k-means-constrained support older versions of Python and Numpy.
# Example
More details can be found in the [API documentation](https://joshlk.github.io/k-means-constrained/).
```python
>>> from k_means_constrained import KMeansConstrained
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
... [4, 2], [4, 4], [4, 0]])
>>> clf = KMeansConstrained(
... n_clusters=2,
... size_min=2,
... size_max=5,
... random_state=0
... )
>>> clf.fit_predict(X)
array([0, 0, 0, 1, 1, 1], dtype=int32)
>>> clf.cluster_centers_
array([[ 1., 2.],
[ 4., 2.]])
>>> clf.labels_
array([0, 0, 0, 1, 1, 1], dtype=int32)
```
<details>
<summary>Code only</summary>
```
from k_means_constrained import KMeansConstrained
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 4], [4, 0]])
clf = KMeansConstrained(
n_clusters=2,
size_min=2,
size_max=5,
random_state=0
)
clf.fit_predict(X)
clf.cluster_centers_
clf.labels_
```
</details>
# Time complexity and runtime
k-means-constrained is a more complex algorithm than vanilla k-means, so it takes longer to execute and scales worse.
Given a number of data points $n$ and clusters $c$, the time complexity of:
* k-means: $\mathcal{O}(nc)$
* k-means-constrained<sup>1</sup>: $\mathcal{O}((n^3c+n^2c^2+nc^3)\log(n+c))$
This assumes a constant number of algorithm iterations and data-point features/dimensions.
If you consider the case where $n$ is of the same order as $c$ ($n \sim c$) then:
* k-means: $\mathcal{O}(n^2)$
* k-means-constrained<sup>1</sup>: $\mathcal{O}(n^4\log(n))$
Below is a runtime comparison between k-means and k-means-constrained in which the number of iterations, initializations, multi-process pool size and dimension size are fixed. The number of clusters is also always one-tenth the number of data points ($n=10c$). As the time complexities above show, the runtime is independent of the minimum and maximum cluster sizes, and so no size constraints are included below.
<p align="center">
<img src="https://raw.githubusercontent.com/joshlk/k-means-constrained/master/etc/execution_time.png" alt="Data-points vs execution time for k-means vs k-means-constrained. Data-points=10*clusters. No min/max constraints" width="50%" height="50%">
</p>
<details>
<summary>System details</summary>
* OS: Linux-5.15.0-75-generic-x86_64-with-glibc2.35
* CPU: AMD EPYC 7763 64-Core Processor
* CPU cores: 120
* k-means-constrained version: 0.7.3
* numpy version: 1.24.2
* scipy version: 1.11.1
* ortools version: 9.6.2534
* joblib version: 1.3.1
* sklearn version: 1.3.0
</details>
---
<sup>1</sup>: [OR-Tools states](https://developers.google.com/optimization/reference/graph/min_cost_flow) the time complexity of their cost-scaling push-relabel algorithm for the min-cost flow problem as $\mathcal{O}(n^2m\log(nC))$ where $n$ is the number of nodes, $m$ is the number of edges and $C$ is the maximum absolute edge cost.
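Substituting the size of the assignment graph into that bound recovers the complexity quoted above. A sketch, assuming the E-step graph has one node per point and per cluster plus a sink ($N = n + c + 1$ nodes, $m = nc + c$ arcs) and treating the maximum cost $C$ as bounded:

```latex
% Graph size for the constrained E-step:
%   nodes: N = n + c + 1   (points, clusters, one sink)
%   arcs:  m = nc + c      (point-to-cluster and cluster-to-sink)
N^2 m \log(N C)
  \approx (n + c)^2 \, nc \, \log(n + c)
  = \left(n^3 c + 2 n^2 c^2 + n c^3\right) \log(n + c)
  = \mathcal{O}\!\left((n^3 c + n^2 c^2 + n c^3) \log(n + c)\right)
```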
# Change log
* v0.9.0 (2026-01-27) Added Python 3.14 support. Bumped ortools to >= 9.15.6755.
* v0.8.0 (2025-11-26) Fixed IndexError due to imprecision in _k_init centroid selection. Ported fix from scikit-learn: [scikit-learn#11756](https://github.com/scikit-learn/scikit-learn/pull/11756)
* v0.7.6 (2025-06-30) Add Python v3.13 and Linux ARM support.
* v0.7.5 fix comment in README on Python version that is supported
* v0.7.4 compatible with Numpy v2.1.1+. Added Python 3.12 support and dropped Python 3.8 and 3.9 support (due to Numpy). Linux ARM support has been dropped because we use GitHub runners to build the package, and ARM machines were emulated using QEMU, which produced numerical errors. GitHub should natively support Ubuntu ARM images soon, and then we can re-build them.
* v0.7.3 compatible with Numpy v1.23.0 to 1.26.4
# Citations
If you use this software in your research, please use the following citation:
```
@software{Levy-Kramer_k-means-constrained_2018,
author = {Levy-Kramer, Josh},
month = apr,
title = {{k-means-constrained}},
url = {https://github.com/joshlk/k-means-constrained},
year = {2018}
}
```
================================================
FILE: README_dev.md
================================================
# Build and test
Notes:
* Numpy build version is in `pyproject.toml` while the runtime version is in `requirements.txt`
* Check which Python versions a new version of Numpy is compatible with. Also check ortools as this is slower to update.
* Change the Python versions in the GitHub action. Also change the badge in the README and the comment in the installation section.
* You might need to increase the cibuildwheel version in the GitHub action to be able to use new Python versions
* Check the cibuildwheel example if you need to change runner image versions (e.g. MacOS, Windows or Ubuntu).
Change as little as possible so as not to run into other errors:
https://github.com/pypa/cibuildwheel/blob/main/examples/github-with-qemu.yml
* Add changes to the change log
Steps:
1. Build and test locally:
To build Cython extensions in source:
```shell script
make compile
```
To test:
```shell script
pytest
```
2. Push changes to GitHub to build it for all platforms (if you get errors check notes above)
3. Add changes to change log and bump version (major, minor or patch). Push changes (so dist have new version):
```shell script
bump2version patch
git push # Must push as otherwise wheels have the wrong version
```
4. Download distributions (artifacts)
```shell script
make download-dists ID=$BUILD_ID
```
5. Upload to test PyPI (you can get the PyPI API token from the password manager)
```shell script
make check-dist
make test-pypi
```
6. Activate virtual env (might need to `make venv-create`)
```shell script
source k-means-env/bin/activate
```
7. Test install (in the virtual env. **Remember to cd out of the k-means-constrained folder**):
```shell script
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple k-means-constrained
```
8. Then push to real PyPI:
```shell script
make pypi-upload
```
================================================
FILE: docs/.buildinfo
================================================
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 382d35e4d15790d94b7a36ad2e0a5f4f
tags: 645f666f9bcd5a90fca523b33c5a78b7
================================================
FILE: docs/.nojekyll
================================================
================================================
FILE: docs/_modules/index.html
================================================
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Overview: module code — k-means-constrained 0.5.1 documentation</title>
<link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="../" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/doctools.js"></script>
<script type="text/javascript" src="../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../index.html" class="icon icon-home"> k-means-constrained
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<!-- Local TOC -->
<div class="local-toc"></div>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../index.html">k-means-constrained</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../index.html" class="icon icon-home"></a> »</li>
<li>Overview: module code</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<h1>All modules for which code is available</h1>
<ul><li><a href="k_means_constrained/k_means_constrained_.html">k_means_constrained.k_means_constrained_</a></li>
<li><a href="k_means_constrained/sklearn_import/base.html">k_means_constrained.sklearn_import.base</a></li>
<li><a href="k_means_constrained/sklearn_import/cluster/k_means_.html">k_means_constrained.sklearn_import.cluster.k_means_</a></li>
</ul>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2020, Josh Levy-Kramer. Documentation derived from Scikit-Learn.
</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>
================================================
FILE: docs/_modules/k_means_constrained/k_means_constrained_.html
================================================
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>k_means_constrained.k_means_constrained_ — k-means-constrained 0.5.1 documentation</title>
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script>
<script src="../../_static/doctools.js"></script>
<script type="text/javascript" src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> k-means-constrained
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<!-- Local TOC -->
<div class="local-toc"></div>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">k-means-constrained</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a> »</li>
<li><a href="../index.html">Module code</a> »</li>
<li>k_means_constrained.k_means_constrained_</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<h1>Source code for k_means_constrained.k_means_constrained_</h1><div class="highlight"><pre>
<span></span><span class="sd">"""k-means-constrained"""</span>
<span class="c1"># Authors: Josh Levy-Kramer <josh@levykramer.co.uk></span>
<span class="c1"># Gael Varoquaux <gael.varoquaux@normalesup.org></span>
<span class="c1"># Thomas Rueckstiess <ruecksti@in.tum.de></span>
<span class="c1"># James Bergstra <james.bergstra@umontreal.ca></span>
<span class="c1"># Jan Schlueter <scikit-learn@jan-schlueter.de></span>
<span class="c1"># Nelle Varoquaux</span>
<span class="c1"># Peter Prettenhofer <peter.prettenhofer@gmail.com></span>
<span class="c1"># Olivier Grisel <olivier.grisel@ensta.org></span>
<span class="c1"># Mathieu Blondel <mathieu@mblondel.org></span>
<span class="c1"># Robert Layton <robertlayton@gmail.com></span>
<span class="c1"># License: BSD 3 clause</span>
<span class="kn">import</span> <span class="nn">warnings</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">scipy.sparse</span> <span class="k">as</span> <span class="nn">sp</span>
<span class="kn">from</span> <span class="nn">.sklearn_import.metrics.pairwise</span> <span class="kn">import</span> <span class="n">euclidean_distances</span>
<span class="kn">from</span> <span class="nn">.sklearn_import.utils.extmath</span> <span class="kn">import</span> <span class="n">row_norms</span><span class="p">,</span> <span class="n">squared_norm</span><span class="p">,</span> <span class="n">cartesian</span>
<span class="kn">from</span> <span class="nn">.sklearn_import.utils.validation</span> <span class="kn">import</span> <span class="n">check_array</span><span class="p">,</span> <span class="n">check_random_state</span><span class="p">,</span> <span class="n">as_float_array</span><span class="p">,</span> <span class="n">check_is_fitted</span>
<span class="kn">from</span> <span class="nn">joblib</span> <span class="kn">import</span> <span class="n">Parallel</span>
<span class="kn">from</span> <span class="nn">joblib</span> <span class="kn">import</span> <span class="n">delayed</span>
<span class="c1"># Internal scikit learn methods imported into this project</span>
<span class="kn">from</span> <span class="nn">k_means_constrained.sklearn_import.cluster._k_means</span> <span class="kn">import</span> <span class="n">_centers_dense</span><span class="p">,</span> <span class="n">_centers_sparse</span>
<span class="kn">from</span> <span class="nn">k_means_constrained.sklearn_import.cluster.k_means_</span> <span class="kn">import</span> <span class="n">_validate_center_shape</span><span class="p">,</span> <span class="n">_tolerance</span><span class="p">,</span> <span class="n">KMeans</span><span class="p">,</span> \
<span class="n">_init_centroids</span>
<span class="kn">from</span> <span class="nn">k_means_constrained.mincostflow_vectorized</span> <span class="kn">import</span> <span class="n">SimpleMinCostFlowVectorized</span>
<span class="k">def</span> <span class="nf">k_means_constrained</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">size_min</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="s1">'k-means++'</span><span class="p">,</span>
<span class="n">n_init</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
<span class="n">tol</span><span class="o">=</span><span class="mf">1e-4</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">copy_x</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">return_n_iter</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
<span class="sd">"""K-Means clustering with minimum and maximum cluster size constraints.</span>
<span class="sd"> Read more in the :ref:`User Guide <k_means>`.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array-like, shape (n_samples, n_features)</span>
<span class="sd"> The observations to cluster.</span>
<span class="sd"> size_min : int, optional, default: None</span>
<span class="sd"> Constrain the label assignment so that each cluster has a minimum</span>
<span class="sd"> size of size_min. If None, no constraints will be applied.</span>
<span class="sd"> size_max : int, optional, default: None</span>
<span class="sd"> Constrain the label assignment so that each cluster has a maximum</span>
<span class="sd"> size of size_max. If None, no constraints will be applied.</span>
<span class="sd"> n_clusters : int</span>
<span class="sd"> The number of clusters to form as well as the number of</span>
<span class="sd"> centroids to generate.</span>
<span class="sd"> init : {'k-means++', 'random', or ndarray, or a callable}, optional</span>
<span class="sd"> Method for initialization, default to 'k-means++':</span>
<span class="sd"> 'k-means++' : selects initial cluster centers for k-means</span>
<span class="sd"> clustering in a smart way to speed up convergence. See section</span>
<span class="sd"> Notes in k_init for more details.</span>
<span class="sd"> 'random': generate k centroids from a Gaussian with mean and</span>
<span class="sd"> variance estimated from the data.</span>
<span class="sd"> If an ndarray is passed, it should be of shape (n_clusters, n_features)</span>
<span class="sd"> and gives the initial centers.</span>
<span class="sd"> If a callable is passed, it should take arguments X, k and</span>
<span class="sd"> a random state and return an initialization.</span>
<span class="sd"> n_init : int, optional, default: 10</span>
<span class="sd"> Number of times the k-means algorithm will be run with different</span>
<span class="sd"> centroid seeds. The final results will be the best output of</span>
<span class="sd"> n_init consecutive runs in terms of inertia.</span>
<span class="sd"> max_iter : int, optional, default 300</span>
<span class="sd"> Maximum number of iterations of the k-means algorithm to run.</span>
<span class="sd"> verbose : boolean, optional</span>
<span class="sd"> Verbosity mode.</span>
<span class="sd"> tol : float, optional</span>
<span class="sd"> Relative tolerance with regards to Frobenius norm of the difference</span>
<span class="sd"> in the cluster centers of two consecutive iterations to declare</span>
<span class="sd"> convergence.</span>
<span class="sd"> random_state : int, RandomState instance or None, optional, default: None</span>
<span class="sd"> If int, random_state is the seed used by the random number generator;</span>
<span class="sd"> If RandomState instance, random_state is the random number generator;</span>
<span class="sd"> If None, the random number generator is the RandomState instance used</span>
<span class="sd"> by `np.random`.</span>
<span class="sd"> copy_x : boolean, optional</span>
<span class="sd"> When pre-computing distances it is more numerically accurate to center</span>
<span class="sd"> the data first. If copy_x is True, then the original data is not</span>
<span class="sd"> modified. If False, the original data is modified, and put back before</span>
<span class="sd"> the function returns, but small numerical differences may be introduced</span>
<span class="sd"> by subtracting and then adding the data mean.</span>
<span class="sd"> n_jobs : int</span>
<span class="sd"> The number of jobs to use for the computation. This works by computing</span>
<span class="sd"> each of the n_init runs in parallel.</span>
<span class="sd"> If -1 all CPUs are used. If 1 is given, no parallel computing code is</span>
<span class="sd"> used at all, which is useful for debugging. For n_jobs below -1,</span>
<span class="sd"> (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one</span>
<span class="sd"> are used.</span>
<span class="sd"> return_n_iter : bool, optional</span>
<span class="sd"> Whether or not to return the number of iterations.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> centroid : float ndarray with shape (k, n_features)</span>
<span class="sd"> Centroids found at the last iteration of k-means.</span>
<span class="sd"> label : integer ndarray with shape (n_samples,)</span>
<span class="sd"> label[i] is the code or index of the centroid the</span>
<span class="sd"> i'th observation is closest to.</span>
<span class="sd"> inertia : float</span>
<span class="sd"> The final value of the inertia criterion (sum of squared distances to</span>
<span class="sd"> the closest centroid for all observations in the training set).</span>
<span class="sd"> best_n_iter : int</span>
<span class="sd"> Number of iterations corresponding to the best results.</span>
<span class="sd"> Returned only if `return_n_iter` is set to True.</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="s2">"Not implemented for sparse X"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">n_init</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"Invalid number of initializations."</span>
<span class="s2">" n_init=</span><span class="si">%d</span><span class="s2"> must be bigger than zero."</span> <span class="o">%</span> <span class="n">n_init</span><span class="p">)</span>
<span class="n">random_state</span> <span class="o">=</span> <span class="n">check_random_state</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
<span class="k">if</span> <span class="n">max_iter</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'Number of iterations should be a positive number,'</span>
<span class="s1">' got </span><span class="si">%d</span><span class="s1"> instead'</span> <span class="o">%</span> <span class="n">max_iter</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">as_float_array</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">copy</span><span class="o">=</span><span class="n">copy_x</span><span class="p">)</span>
<span class="n">tol</span> <span class="o">=</span> <span class="n">_tolerance</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">tol</span><span class="p">)</span>
<span class="c1"># Validate init array</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="s1">'__array__'</span><span class="p">):</span>
<span class="n">init</span> <span class="o">=</span> <span class="n">check_array</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">copy</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">_validate_center_shape</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">init</span><span class="p">)</span>
<span class="k">if</span> <span class="n">n_init</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">warnings</span><span class="o">.</span><span class="n">warn</span><span class="p">(</span>
<span class="s1">'Explicit initial center position passed: '</span>
<span class="s1">'performing only one init in k-means instead of n_init=</span><span class="si">%d</span><span class="s1">'</span>
<span class="o">%</span> <span class="n">n_init</span><span class="p">,</span> <span class="ne">RuntimeWarning</span><span class="p">,</span> <span class="n">stacklevel</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">n_init</span> <span class="o">=</span> <span class="mi">1</span>
<span class="c1"># subtract mean of X for more accurate distance computations</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">X_mean</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="c1"># The copy was already done above</span>
<span class="n">X</span> <span class="o">-=</span> <span class="n">X_mean</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="s1">'__array__'</span><span class="p">):</span>
<span class="n">init</span> <span class="o">-=</span> <span class="n">X_mean</span>
<span class="c1"># precompute squared norms of data points</span>
<span class="n">x_squared_norms</span> <span class="o">=</span> <span class="n">row_norms</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">squared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">best_labels</span><span class="p">,</span> <span class="n">best_inertia</span><span class="p">,</span> <span class="n">best_centers</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">n_jobs</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="c1"># For a single thread, less memory is needed if we just store one set</span>
<span class="c1"># of the best results (as opposed to one set per run per thread).</span>
<span class="k">for</span> <span class="n">it</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_init</span><span class="p">):</span>
<span class="c1"># run a k-means once</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">n_iter_</span> <span class="o">=</span> <span class="n">kmeans_constrained_single</span><span class="p">(</span>
<span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span>
<span class="n">size_min</span><span class="o">=</span><span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="n">size_max</span><span class="p">,</span>
<span class="n">max_iter</span><span class="o">=</span><span class="n">max_iter</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="n">init</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="n">tol</span><span class="p">,</span>
<span class="n">x_squared_norms</span><span class="o">=</span><span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
<span class="c1"># determine if these results are the best so far</span>
<span class="k">if</span> <span class="n">best_inertia</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">inertia</span> <span class="o"><</span> <span class="n">best_inertia</span><span class="p">:</span>
<span class="n">best_labels</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">best_centers</span> <span class="o">=</span> <span class="n">centers</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">best_inertia</span> <span class="o">=</span> <span class="n">inertia</span>
<span class="n">best_n_iter</span> <span class="o">=</span> <span class="n">n_iter_</span>
<span class="k">else</span><span class="p">:</span>
<span class="c1"># parallelisation of k-means runs</span>
<span class="n">seeds</span> <span class="o">=</span> <span class="n">random_state</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">iinfo</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span><span class="o">.</span><span class="n">max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">n_init</span><span class="p">)</span>
<span class="n">results</span> <span class="o">=</span> <span class="n">Parallel</span><span class="p">(</span><span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)(</span>
<span class="n">delayed</span><span class="p">(</span><span class="n">kmeans_constrained_single</span><span class="p">)(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span>
<span class="n">size_min</span><span class="o">=</span><span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="n">size_max</span><span class="p">,</span>
<span class="n">max_iter</span><span class="o">=</span><span class="n">max_iter</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="n">init</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="n">tol</span><span class="p">,</span>
<span class="n">x_squared_norms</span><span class="o">=</span><span class="n">x_squared_norms</span><span class="p">,</span>
<span class="c1"># Change seed to ensure variety</span>
<span class="n">random_state</span><span class="o">=</span><span class="n">seed</span><span class="p">)</span>
<span class="k">for</span> <span class="n">seed</span> <span class="ow">in</span> <span class="n">seeds</span><span class="p">)</span>
<span class="c1"># Get results with the lowest inertia</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">n_iters</span> <span class="o">=</span> <span class="nb">zip</span><span class="p">(</span><span class="o">*</span><span class="n">results</span><span class="p">)</span>
<span class="n">best</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">inertia</span><span class="p">)</span>
<span class="n">best_labels</span> <span class="o">=</span> <span class="n">labels</span><span class="p">[</span><span class="n">best</span><span class="p">]</span>
<span class="n">best_inertia</span> <span class="o">=</span> <span class="n">inertia</span><span class="p">[</span><span class="n">best</span><span class="p">]</span>
<span class="n">best_centers</span> <span class="o">=</span> <span class="n">centers</span><span class="p">[</span><span class="n">best</span><span class="p">]</span>
<span class="n">best_n_iter</span> <span class="o">=</span> <span class="n">n_iters</span><span class="p">[</span><span class="n">best</span><span class="p">]</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">copy_x</span><span class="p">:</span>
<span class="n">X</span> <span class="o">+=</span> <span class="n">X_mean</span>
<span class="n">best_centers</span> <span class="o">+=</span> <span class="n">X_mean</span>
<span class="k">if</span> <span class="n">return_n_iter</span><span class="p">:</span>
<span class="k">return</span> <span class="n">best_centers</span><span class="p">,</span> <span class="n">best_labels</span><span class="p">,</span> <span class="n">best_inertia</span><span class="p">,</span> <span class="n">best_n_iter</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">best_centers</span><span class="p">,</span> <span class="n">best_labels</span><span class="p">,</span> <span class="n">best_inertia</span>
<span class="k">def</span> <span class="nf">kmeans_constrained_single</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">size_min</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">max_iter</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="s1">'k-means++'</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="mf">1e-4</span><span class="p">):</span>
<span class="sd">"""A single run of k-means constrained, assumes preparation completed prior.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array-like of floats, shape (n_samples, n_features)</span>
<span class="sd"> The observations to cluster.</span>
<span class="sd"> size_min : int, optional, default: None</span>
<span class="sd"> Constrain the label assignment so that each cluster has a minimum</span>
<span class="sd"> size of size_min. If None, no constraints will be applied.</span>
<span class="sd"> size_max : int, optional, default: None</span>
<span class="sd"> Constrain the label assignment so that each cluster has a maximum</span>
<span class="sd"> size of size_max. If None, no constraints will be applied.</span>
<span class="sd"> n_clusters : int</span>
<span class="sd"> The number of clusters to form as well as the number of</span>
<span class="sd"> centroids to generate.</span>
<span class="sd"> max_iter : int, optional, default 300</span>
<span class="sd"> Maximum number of iterations of the k-means algorithm to run.</span>
<span class="sd"> init : {'k-means++', 'random', or ndarray, or a callable}, optional</span>
<span class="sd"> Method for initialization, default to 'k-means++':</span>
<span class="sd"> 'k-means++' : selects initial cluster centers for k-means</span>
<span class="sd"> clustering in a smart way to speed up convergence. See section</span>
<span class="sd"> Notes in k_init for more details.</span>
<span class="sd"> 'random': generate k centroids from a Gaussian with mean and</span>
<span class="sd"> variance estimated from the data.</span>
<span class="sd"> If an ndarray is passed, it should be of shape (k, p) and gives</span>
<span class="sd"> the initial centers.</span>
<span class="sd"> If a callable is passed, it should take arguments X, k and</span>
<span class="sd"> a random state and return an initialization.</span>
<span class="sd"> tol : float, optional</span>
<span class="sd"> Relative tolerance with regards to Frobenius norm of the difference</span>
<span class="sd"> in the cluster centers of two consecutive iterations to declare</span>
<span class="sd"> convergence.</span>
<span class="sd"> verbose : boolean, optional</span>
<span class="sd"> Verbosity mode.</span>
<span class="sd"> x_squared_norms : array</span>
<span class="sd"> Precomputed x_squared_norms.</span>
<span class="sd"> random_state : int, RandomState instance or None, optional, default: None</span>
<span class="sd"> If int, random_state is the seed used by the random number generator;</span>
<span class="sd"> If RandomState instance, random_state is the random number generator;</span>
<span class="sd"> If None, the random number generator is the RandomState instance used</span>
<span class="sd"> by `np.random`.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> centroid : float ndarray with shape (k, n_features)</span>
<span class="sd"> Centroids found at the last iteration of k-means.</span>
<span class="sd"> label : integer ndarray with shape (n_samples,)</span>
<span class="sd"> label[i] is the code or index of the centroid the</span>
<span class="sd"> i'th observation is closest to.</span>
<span class="sd"> inertia : float</span>
<span class="sd"> The final value of the inertia criterion (sum of squared distances to</span>
<span class="sd"> the closest centroid for all observations in the training set).</span>
<span class="sd"> n_iter : int</span>
<span class="sd"> Number of iterations run.</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="s2">"Not implemented for sparse X"</span><span class="p">)</span>
<span class="n">random_state</span> <span class="o">=</span> <span class="n">check_random_state</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
<span class="n">n_samples</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">best_labels</span><span class="p">,</span> <span class="n">best_inertia</span><span class="p">,</span> <span class="n">best_centers</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
<span class="c1"># init</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">_init_centroids</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">init</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="o">=</span><span class="n">x_squared_norms</span><span class="p">)</span>
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Initialization complete"</span><span class="p">)</span>
<span class="c1"># Allocate memory to store the distances for each sample to its</span>
<span class="c1"># closest center, for reallocation in case of ties</span>
<span class="n">distances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n_samples</span><span class="p">,),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="c1"># Determine min and max sizes if none given</span>
<span class="k">if</span> <span class="n">size_min</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">size_min</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">if</span> <span class="n">size_max</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">size_max</span> <span class="o">=</span> <span class="n">n_samples</span> <span class="c1"># Number of data points</span>
<span class="c1"># Check size min and max</span>
<span class="k">if</span> <span class="ow">not</span> <span class="p">((</span><span class="n">size_min</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="n">size_min</span> <span class="o"><=</span> <span class="n">n_samples</span><span class="p">)</span>
<span class="ow">and</span> <span class="p">(</span><span class="n">size_max</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="n">size_max</span> <span class="o"><=</span> <span class="n">n_samples</span><span class="p">)):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"size_min and size_max must be non-negative numbers no larger "</span>
<span class="s2">"than the number of data points, or `None`"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">size_max</span> <span class="o"><</span> <span class="n">size_min</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"size_max must be greater than or equal to size_min"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">size_min</span> <span class="o">*</span> <span class="n">n_clusters</span> <span class="o">></span> <span class="n">n_samples</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"The product of size_min and n_clusters cannot exceed the number of samples (X)"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">size_max</span> <span class="o">*</span> <span class="n">n_clusters</span> <span class="o"><</span> <span class="n">n_samples</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"The product of size_max and n_clusters must be larger than or equal to the number of samples (X)"</span><span class="p">)</span>
<span class="c1"># iterations</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">max_iter</span><span class="p">):</span>
<span class="n">centers_old</span> <span class="o">=</span> <span class="n">centers</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="c1"># labels assignment is also called the E-step of EM</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span> <span class="o">=</span> \
<span class="n">_labels_constrained</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="p">,</span> <span class="n">distances</span><span class="o">=</span><span class="n">distances</span><span class="p">)</span>
<span class="c1"># computation of the means is also called the M-step of EM</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">_centers_sparse</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">_centers_dense</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Iteration </span><span class="si">%2d</span><span class="s2">, inertia </span><span class="si">%.3f</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">inertia</span><span class="p">))</span>
<span class="k">if</span> <span class="n">best_inertia</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">inertia</span> <span class="o"><</span> <span class="n">best_inertia</span><span class="p">:</span>
<span class="n">best_labels</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">best_centers</span> <span class="o">=</span> <span class="n">centers</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">best_inertia</span> <span class="o">=</span> <span class="n">inertia</span>
<span class="n">center_shift_total</span> <span class="o">=</span> <span class="n">squared_norm</span><span class="p">(</span><span class="n">centers_old</span> <span class="o">-</span> <span class="n">centers</span><span class="p">)</span>
<span class="k">if</span> <span class="n">center_shift_total</span> <span class="o"><=</span> <span class="n">tol</span><span class="p">:</span>
<span class="k">if</span> <span class="n">verbose</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Converged at iteration </span><span class="si">%d</span><span class="s2">: "</span>
<span class="s2">"center shift </span><span class="si">%e</span><span class="s2"> within tolerance </span><span class="si">%e</span><span class="s2">"</span>
<span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">center_shift_total</span><span class="p">,</span> <span class="n">tol</span><span class="p">))</span>
<span class="k">break</span>
<span class="k">if</span> <span class="n">center_shift_total</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="c1"># rerun E-step in case of non-convergence so that predicted labels</span>
<span class="c1"># match cluster centers</span>
<span class="n">best_labels</span><span class="p">,</span> <span class="n">best_inertia</span> <span class="o">=</span> \
<span class="n">_labels_constrained</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="p">,</span> <span class="n">distances</span><span class="o">=</span><span class="n">distances</span><span class="p">)</span>
<span class="k">return</span> <span class="n">best_labels</span><span class="p">,</span> <span class="n">best_inertia</span><span class="p">,</span> <span class="n">best_centers</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">def</span> <span class="nf">_labels_constrained</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="p">,</span> <span class="n">distances</span><span class="p">):</span>
<span class="sd">"""Compute labels using the min and max cluster size constraint</span>
<span class="sd"> This will overwrite the 'distances' array in-place.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : numpy array, shape (n_sample, n_features)</span>
<span class="sd"> Input data.</span>
<span class="sd"> size_min : int</span>
<span class="sd"> Minimum size for each cluster</span>
<span class="sd"> size_max : int</span>
<span class="sd"> Maximum size for each cluster</span>
<span class="sd"> centers : numpy array, shape (n_clusters, n_features)</span>
<span class="sd"> Cluster centers which data is assigned to.</span>
<span class="sd"> distances : numpy array, shape (n_samples,)</span>
<span class="sd"> Pre-allocated array in which distances are stored.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : numpy array, dtype=np.int32, shape (n_samples,)</span>
<span class="sd"> Indices of clusters that samples are assigned to.</span>
<span class="sd"> inertia : float</span>
<span class="sd"> Sum of squared distances of samples to their closest cluster center.</span>
<span class="sd"> """</span>
<span class="n">C</span> <span class="o">=</span> <span class="n">centers</span>
<span class="c1"># Distances to each centre C. (the `distances` parameter is the distance to the closest centre)</span>
<span class="c1"># The original k-means uses squared distances, but this is equivalent for constrained k-means</span>
<span class="n">D</span> <span class="o">=</span> <span class="n">euclidean_distances</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">squared</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="n">edges</span><span class="p">,</span> <span class="n">costs</span><span class="p">,</span> <span class="n">capacities</span><span class="p">,</span> <span class="n">supplies</span><span class="p">,</span> <span class="n">n_C</span><span class="p">,</span> <span class="n">n_X</span> <span class="o">=</span> <span class="n">minimum_cost_flow_problem_graph</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="p">)</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">solve_min_cost_flow_graph</span><span class="p">(</span><span class="n">edges</span><span class="p">,</span> <span class="n">costs</span><span class="p">,</span> <span class="n">capacities</span><span class="p">,</span> <span class="n">supplies</span><span class="p">,</span> <span class="n">n_C</span><span class="p">,</span> <span class="n">n_X</span><span class="p">)</span>
<span class="c1"># cython k-means M step code assumes int32 inputs</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span>
<span class="c1"># Change distances in-place</span>
<span class="n">distances</span><span class="p">[:]</span> <span class="o">=</span> <span class="n">D</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">D</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">labels</span><span class="p">]</span> <span class="o">**</span> <span class="mi">2</span> <span class="c1"># Square for M step of EM</span>
<span class="n">inertia</span> <span class="o">=</span> <span class="n">distances</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="k">return</span> <span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span>
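As a minimal sketch of the final step above (pure NumPy, with a hypothetical distance matrix standing in for the `euclidean_distances` output and hard-coded labels standing in for the flow-solver result), the in-place distance update and inertia reduce to a fancy-indexed gather followed by a squared sum:

```python
import numpy as np

# Hypothetical 4-sample, 2-cluster distance matrix (rows: samples, cols: centers)
D = np.array([[1.0, 3.0],
              [2.0, 0.5],
              [4.0, 1.0],
              [0.2, 5.0]])
labels = np.array([0, 1, 1, 0], dtype=np.int32)  # e.g. as returned by the flow solver

# Gather each sample's distance to its assigned center, squared for the M-step
distances = D[np.arange(D.shape[0]), labels] ** 2  # [1.0, 0.25, 1.0, 0.04]
inertia = distances.sum()                          # ~2.29
```

Note the squaring happens only here: the flow graph itself is built from unsquared distances, while inertia is reported on the usual squared scale.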
<span class="k">def</span> <span class="nf">minimum_cost_flow_problem_graph</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="p">):</span>
<span class="c1"># Setup minimum cost flow formulation graph</span>
<span class="c1"># Vertices indexes:</span>
<span class="c1"># X-nodes: [0, n(x)-1], C-nodes: [n(X), n(X)+n(C)-1], C-dummy nodes:[n(X)+n(C), n(X)+2*n(C)-1],</span>
<span class="c1"># Artificial node: [n(X)+2*n(C)]</span>
<span class="c1"># Create indices of nodes</span>
<span class="n">n_X</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n_C</span> <span class="o">=</span> <span class="n">C</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">X_ix</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n_X</span><span class="p">)</span>
<span class="n">C_dummy_ix</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">X_ix</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">X_ix</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">+</span> <span class="n">n_C</span><span class="p">)</span>
<span class="n">C_ix</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">C_dummy_ix</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">C_dummy_ix</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">+</span> <span class="n">n_C</span><span class="p">)</span>
<span class="n">art_ix</span> <span class="o">=</span> <span class="n">C_ix</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span>
<span class="c1"># Edges</span>
<span class="n">edges_X_C_dummy</span> <span class="o">=</span> <span class="n">cartesian</span><span class="p">([</span><span class="n">X_ix</span><span class="p">,</span> <span class="n">C_dummy_ix</span><span class="p">])</span> <span class="c1"># All X's connect to all C dummy nodes (C')</span>
<span class="n">edges_C_dummy_C</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">stack</span><span class="p">([</span><span class="n">C_dummy_ix</span><span class="p">,</span> <span class="n">C_ix</span><span class="p">],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Each C' connects to a corresponding C (centroid)</span>
<span class="n">edges_C_art</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">stack</span><span class="p">([</span><span class="n">C_ix</span><span class="p">,</span> <span class="n">art_ix</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_C</span><span class="p">)],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># All C connect to artificial node</span>
<span class="n">edges</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">edges_X_C_dummy</span><span class="p">,</span> <span class="n">edges_C_dummy_C</span><span class="p">,</span> <span class="n">edges_C_art</span><span class="p">])</span>
<span class="c1"># Costs</span>
<span class="n">costs_X_C_dummy</span> <span class="o">=</span> <span class="n">D</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">D</span><span class="o">.</span><span class="n">size</span><span class="p">)</span>
<span class="n">costs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span><span class="n">costs_X_C_dummy</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">edges</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">costs_X_C_dummy</span><span class="p">))])</span>
<span class="c1"># Capacities - can set for max-k</span>
<span class="n">capacities_C_dummy_C</span> <span class="o">=</span> <span class="n">size_max</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_C</span><span class="p">)</span>
<span class="n">cap_non</span> <span class="o">=</span> <span class="n">n_X</span> <span class="c1"># Equal to the total supply and therefore won't restrict flow</span>
<span class="n">capacities</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span>
<span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">edges_X_C_dummy</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span>
<span class="n">capacities_C_dummy_C</span><span class="p">,</span>
<span class="n">cap_non</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_C</span><span class="p">)</span>
<span class="p">])</span>
<span class="c1"># Sources and sinks</span>
<span class="n">supplies_X</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_X</span><span class="p">)</span>
<span class="n">supplies_C</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="o">*</span> <span class="n">size_min</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_C</span><span class="p">)</span> <span class="c1"># Demand node</span>
<span class="n">supplies_art</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="o">*</span> <span class="p">(</span><span class="n">n_X</span> <span class="o">-</span> <span class="n">n_C</span> <span class="o">*</span> <span class="n">size_min</span><span class="p">)</span> <span class="c1"># Demand node</span>
<span class="n">supplies</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">concatenate</span><span class="p">([</span>
<span class="n">supplies_X</span><span class="p">,</span>
<span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n_C</span><span class="p">),</span> <span class="c1"># C_dummies</span>
<span class="n">supplies_C</span><span class="p">,</span>
<span class="p">[</span><span class="n">supplies_art</span><span class="p">]</span>
<span class="p">])</span>
<span class="c1"># All arrays must be of int dtype for `SimpleMinCostFlow`</span>
<span class="n">edges</span> <span class="o">=</span> <span class="n">edges</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'int32'</span><span class="p">)</span>
<span class="n">costs</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">around</span><span class="p">(</span><span class="n">costs</span> <span class="o">*</span> <span class="mi">1000</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'int32'</span><span class="p">)</span> <span class="c1"># Multiply by 1000 to retain three decimal places of precision when rounding to integers</span>
<span class="n">capacities</span> <span class="o">=</span> <span class="n">capacities</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'int32'</span><span class="p">)</span>
<span class="n">supplies</span> <span class="o">=</span> <span class="n">supplies</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s1">'int32'</span><span class="p">)</span>
<span class="k">return</span> <span class="n">edges</span><span class="p">,</span> <span class="n">costs</span><span class="p">,</span> <span class="n">capacities</span><span class="p">,</span> <span class="n">supplies</span><span class="p">,</span> <span class="n">n_C</span><span class="p">,</span> <span class="n">n_X</span>
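The node layout and supply scheme built above can be sketched in pure NumPy (no OR-Tools dependency) on a hypothetical tiny instance. The key invariant is that a min-cost-flow problem is only feasible if supplies and demands balance, i.e. sum to zero:

```python
import numpy as np

# Hypothetical instance: 6 samples, 2 clusters, size_min=2 (size_max unused here)
n_X, n_C, size_min = 6, 2, 2

# Node index layout mirrors the comment in the source:
# X-nodes, then C-dummy nodes, then C-nodes, then one artificial node
X_ix = np.arange(n_X)                          # 0..5
C_dummy_ix = np.arange(n_X, n_X + n_C)         # 6..7
C_ix = np.arange(n_X + n_C, n_X + 2 * n_C)     # 8..9
art_ix = n_X + 2 * n_C                         # 10

# Supplies: each sample supplies 1 unit; each C-node demands size_min units
# (forcing the per-cluster minimum); the artificial node absorbs the rest
supplies = np.concatenate([
    np.ones(n_X),                # sources (one unit per sample)
    np.zeros(n_C),               # C-dummy pass-through nodes
    -size_min * np.ones(n_C),    # per-cluster minimum demand
    [-(n_X - n_C * size_min)],   # artificial node takes the remainder
])

assert supplies.sum() == 0  # balanced: 6 - 2*2 - 2 = 0
```

This is why `supplies_art` is defined as `-(n_X - n_C * size_min)`: whatever flow the minimum-size demands do not consume must drain somewhere for the problem to balance.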
<span class="k">def</span> <span class="nf">solve_min_cost_flow_graph</span><span class="p">(</span><span class="n">edges</span><span class="p">,</span> <span class="n">costs</span><span class="p">,</span> <span class="n">capacities</span><span class="p">,</span> <span class="n">supplies</span><span class="p">,</span> <span class="n">n_C</span><span class="p">,</span> <span class="n">n_X</span><span class="p">):</span>
<span class="c1"># Instantiate a SimpleMinCostFlow solver.</span>
<span class="n">min_cost_flow</span> <span class="o">=</span> <span class="n">SimpleMinCostFlowVectorized</span><span class="p">()</span>
<span class="k">if</span> <span class="p">(</span><span class="n">edges</span><span class="o">.</span><span class="n">dtype</span> <span class="o">!=</span> <span class="s1">'int32'</span><span class="p">)</span> <span class="ow">or</span> <span class="p">(</span><span class="n">costs</span><span class="o">.</span><span class="n">dtype</span> <span class="o">!=</span> <span class="s1">'int32'</span><span class="p">)</span> \
<span class="ow">or</span> <span class="p">(</span><span class="n">capacities</span><span class="o">.</span><span class="n">dtype</span> <span class="o">!=</span> <span class="s1">'int32'</span><span class="p">)</span> <span class="ow">or</span> <span class="p">(</span><span class="n">supplies</span><span class="o">.</span><span class="n">dtype</span> <span class="o">!=</span> <span class="s1">'int32'</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"`edges`, `costs`, `capacities`, `supplies` must all be int dtype"</span><span class="p">)</span>
<span class="n">N_edges</span> <span class="o">=</span> <span class="n">edges</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">N_nodes</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">supplies</span><span class="p">)</span>
<span class="c1"># Add each edge with associated capacities and cost</span>
<span class="n">min_cost_flow</span><span class="o">.</span><span class="n">AddArcWithCapacityAndUnitCostVectorized</span><span class="p">(</span><span class="n">edges</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">edges</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">capacities</span><span class="p">,</span> <span class="n">costs</span><span class="p">)</span>
<span class="c1"># Add node supplies</span>
<span class="n">min_cost_flow</span><span class="o">.</span><span class="n">SetNodeSupplyVectorized</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">N_nodes</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'int32'</span><span class="p">),</span> <span class="n">supplies</span><span class="p">)</span>
<span class="c1"># Solve the minimum cost flow problem</span>
<span class="k">if</span> <span class="n">min_cost_flow</span><span class="o">.</span><span class="n">Solve</span><span class="p">()</span> <span class="o">!=</span> <span class="n">min_cost_flow</span><span class="o">.</span><span class="n">OPTIMAL</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'There was an issue with the min cost flow input.'</span><span class="p">)</span>
<span class="c1"># Assignment</span>
<span class="n">labels_M</span> <span class="o">=</span> <span class="n">min_cost_flow</span><span class="o">.</span><span class="n">FlowVectorized</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">n_X</span> <span class="o">*</span> <span class="n">n_C</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s1">'int32'</span><span class="p">))</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">n_X</span><span class="p">,</span> <span class="n">n_C</span><span class="p">)</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">labels_M</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="n">labels</span>
<div class="viewcode-block" id="KMeansConstrained"><a class="viewcode-back" href="../../index.html#k_means_constrained.KMeansConstrained">[docs]</a><span class="k">class</span> <span class="nc">KMeansConstrained</span><span class="p">(</span><span class="n">KMeans</span><span class="p">):</span>
<span class="sd">"""K-Means clustering with minimum and maximum cluster size constraints</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> n_clusters : int, optional, default: 8</span>
<span class="sd"> The number of clusters to form as well as the number of</span>
<span class="sd"> centroids to generate.</span>
<span class="sd"> size_min : int, optional, default: None</span>
<span class="sd"> Constrain the label assignment so that each cluster has a minimum</span>
<span class="sd"> size of size_min. If None, no constraint will be applied</span>
<span class="sd"> size_max : int, optional, default: None</span>
<span class="sd"> Constrain the label assignment so that each cluster has a maximum</span>
<span class="sd"> size of size_max. If None, no constraint will be applied</span>
<span class="sd"> init : {'k-means++', 'random' or an ndarray}</span>
<span class="sd"> Method for initialization, defaults to 'k-means++':</span>
<span class="sd"> 'k-means++' : selects initial cluster centers for k-means</span>
<span class="sd"> clustering in a smart way to speed up convergence. See section</span>
<span class="sd"> Notes in k_init for more details.</span>
<span class="sd"> 'random': choose k observations (rows) at random from data for</span>
<span class="sd"> the initial centroids.</span>
<span class="sd"> If an ndarray is passed, it should be of shape (n_clusters, n_features)</span>
<span class="sd"> and gives the initial centers.</span>
<span class="sd"> n_init : int, default: 10</span>
<span class="sd"> Number of times the k-means algorithm will be run with different</span>
<span class="sd"> centroid seeds. The final results will be the best output of</span>
<span class="sd"> n_init consecutive runs in terms of inertia.</span>
<span class="sd"> max_iter : int, default: 300</span>
<span class="sd"> Maximum number of iterations of the k-means algorithm for a</span>
<span class="sd"> single run.</span>
<span class="sd"> tol : float, default: 1e-4</span>
<span class="sd"> Relative tolerance with regard to the Frobenius norm of the difference</span>
<span class="sd"> in the cluster centers of two consecutive iterations to declare</span>
<span class="sd"> convergence.</span>
<span class="sd"> verbose : int, default 0</span>
<span class="sd"> Verbosity mode.</span>
<span class="sd"> random_state : int, RandomState instance or None, optional, default: None</span>
<span class="sd"> If int, random_state is the seed used by the random number generator;</span>
<span class="sd"> If RandomState instance, random_state is the random number generator;</span>
<span class="sd"> If None, the random number generator is the RandomState instance used</span>
<span class="sd"> by `np.random`.</span>
<span class="sd"> copy_x : boolean, default True</span>
<span class="sd"> When pre-computing distances it is more numerically accurate to center</span>
<span class="sd"> the data first. If copy_x is True, then the original data is not</span>
<span class="sd"> modified. If False, the original data is modified, and put back before</span>
<span class="sd"> the function returns, but small numerical differences may be introduced</span>
<span class="sd"> by subtracting and then adding the data mean.</span>
<span class="sd"> n_jobs : int</span>
<span class="sd"> The number of jobs to use for the computation. This works by computing</span>
<span class="sd"> each of the n_init runs in parallel.</span>
<span class="sd"> If -1 all CPUs are used. If 1 is given, no parallel computing code is</span>
<span class="sd"> used at all, which is useful for debugging. For n_jobs below -1,</span>
<span class="sd"> (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one</span>
<span class="sd"> are used.</span>
<span class="sd"> Attributes</span>
<span class="sd"> ----------</span>
<span class="sd"> cluster_centers_ : array, [n_clusters, n_features]</span>
<span class="sd"> Coordinates of cluster centers</span>
<span class="sd"> labels_ :</span>
<span class="sd"> Labels of each point</span>
<span class="sd"> inertia_ : float</span>
<span class="sd"> Sum of squared distances of samples to their closest cluster center.</span>
<span class="sd"> Examples</span>
<span class="sd"> --------</span>
<span class="sd"> >>> from k_means_constrained import KMeansConstrained</span>
<span class="sd"> >>> import numpy as np</span>
<span class="sd"> >>> X = np.array([[1, 2], [1, 4], [1, 0],</span>
<span class="sd"> ... [4, 2], [4, 4], [4, 0]])</span>
<span class="sd"> >>> clf = KMeansConstrained(</span>
<span class="sd"> ... n_clusters=2,</span>
<span class="sd"> ... size_min=2,</span>
<span class="sd"> ... size_max=5,</span>
<span class="sd"> ... random_state=0</span>
<span class="sd"> ... )</span>
<span class="sd"> >>> clf.fit_predict(X)</span>
<span class="sd"> array([0, 0, 0, 1, 1, 1], dtype=int32)</span>
<span class="sd"> >>> clf.cluster_centers_</span>
<span class="sd"> array([[ 1., 2.],</span>
<span class="sd"> [ 4., 2.]])</span>
<span class="sd"> >>> clf.labels_</span>
<span class="sd"> array([0, 0, 0, 1, 1, 1], dtype=int32)</span>
<span class="sd"> Notes</span>
<span class="sd"> ------</span>
<span class="sd"> K-means problem constrained with a minimum and/or maximum size for each cluster.</span>
<span class="sd"> The constrained assignment is formulated as a Minimum Cost Flow (MCF) linear network optimisation</span>
<span class="sd"> problem. This is then solved using a cost-scaling push-relabel algorithm. The implementation used is</span>
<span class="sd"> Google's Operations Research tools' `SimpleMinCostFlow`.</span>
<span class="sd"> Ref:</span>
<span class="sd"> 1. Bradley, P. S., K. P. Bennett, and Ayhan Demiriz. "Constrained k-means clustering."</span>
<span class="sd"> Microsoft Research, Redmond (2000): 1-8.</span>
<span class="sd"> 2. Google's SimpleMinCostFlow implementation:</span>
<span class="sd"> https://github.com/google/or-tools/blob/master/ortools/graph/min_cost_flow.h</span>
<span class="sd"> """</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_clusters</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">size_min</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="s1">'k-means++'</span><span class="p">,</span> <span class="n">n_init</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="mf">1e-4</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">copy_x</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">size_min</span> <span class="o">=</span> <span class="n">size_min</span>
<span class="bp">self</span><span class="o">.</span><span class="n">size_max</span> <span class="o">=</span> <span class="n">size_max</span>
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">n_clusters</span><span class="o">=</span><span class="n">n_clusters</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="n">init</span><span class="p">,</span> <span class="n">n_init</span><span class="o">=</span><span class="n">n_init</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="n">max_iter</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="n">tol</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="n">verbose</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">,</span> <span class="n">copy_x</span><span class="o">=</span><span class="n">copy_x</span><span class="p">,</span> <span class="n">n_jobs</span><span class="o">=</span><span class="n">n_jobs</span><span class="p">)</span>
<div class="viewcode-block" id="KMeansConstrained.fit"><a class="viewcode-back" href="../../index.html#k_means_constrained.KMeansConstrained.fit">[docs]</a> <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Compute k-means clustering with the given constraints.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array-like, shape=(n_samples, n_features)</span>
<span class="sd"> Training instances to cluster.</span>
<span class="sd"> y : Ignored</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="s2">"Not implemented for sparse X"</span><span class="p">)</span>
<span class="n">random_state</span> <span class="o">=</span> <span class="n">check_random_state</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">random_state</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_check_fit_data</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">cluster_centers_</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">labels_</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">inertia_</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_iter_</span> <span class="o">=</span> \
<span class="n">k_means_constrained</span><span class="p">(</span>
<span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_clusters</span><span class="p">,</span>
<span class="n">size_min</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">size_max</span><span class="p">,</span>
<span class="n">init</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">init</span><span class="p">,</span>
<span class="n">n_init</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_init</span><span class="p">,</span> <span class="n">max_iter</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">max_iter</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">verbose</span><span class="p">,</span>
<span class="n">tol</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">tol</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">,</span> <span class="n">copy_x</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">copy_x</span><span class="p">,</span>
<span class="n">n_jobs</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span><span class="p">,</span>
<span class="n">return_n_iter</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span></div>
<div class="viewcode-block" id="KMeansConstrained.predict"><a class="viewcode-back" href="../../index.html#k_means_constrained.KMeansConstrained.predict">[docs]</a> <span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">size_min</span><span class="o">=</span><span class="s1">'init'</span><span class="p">,</span> <span class="n">size_max</span><span class="o">=</span><span class="s1">'init'</span><span class="p">):</span>
<span class="sd">"""</span>
<span class="sd"> Predict the closest cluster each sample in X belongs to given the provided constraints.</span>
<span class="sd"> The constraints can be temporarily overridden when determining which cluster each data point is assigned to.</span>
<span class="sd"> Only computes the assignment step. It does not re-fit the cluster positions.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array-like, shape = [n_samples, n_features]</span>
<span class="sd"> New data to predict.</span>
<span class="sd"> size_min : int, optional, default: size_min provided with initialisation</span>
<span class="sd"> Constrain the label assignment so that each cluster has a minimum</span>
<span class="sd"> size of size_min. If None, no constraints will be applied.</span>
<span class="sd"> If 'init' the value provided during initialisation of the</span>
<span class="sd"> class will be used.</span>
<span class="sd"> size_max : int, optional, default: size_max provided with initialisation</span>
<span class="sd"> Constrain the label assignment so that each cluster has a maximum</span>
<span class="sd"> size of size_max. If None, no constraints will be applied.</span>
<span class="sd"> If 'init' the value provided during initialisation of the</span>
<span class="sd"> class will be used.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : array, shape [n_samples,]</span>
<span class="sd"> Index of the cluster each sample belongs to.</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">(</span><span class="s2">"Not implemented for sparse X"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">size_min</span> <span class="o">==</span> <span class="s1">'init'</span><span class="p">:</span>
<span class="n">size_min</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">size_min</span>
<span class="k">if</span> <span class="n">size_max</span> <span class="o">==</span> <span class="s1">'init'</span><span class="p">:</span>
<span class="n">size_max</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">size_max</span>
<span class="n">n_clusters</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_clusters</span>
<span class="n">n_samples</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">check_is_fitted</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'cluster_centers_'</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_check_test_data</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="c1"># Allocate memory to store the distances for each sample to its</span>
<span class="c1"># closer center for reallocation in case of ties</span>
<span class="n">distances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n_samples</span><span class="p">,),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="c1"># Determine min and max sizes if none given</span>
<span class="k">if</span> <span class="n">size_min</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">size_min</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">if</span> <span class="n">size_max</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">size_max</span> <span class="o">=</span> <span class="n">n_samples</span> <span class="c1"># Number of data points</span>
<span class="c1"># Check size min and max</span>
<span class="k">if</span> <span class="ow">not</span> <span class="p">((</span><span class="n">size_min</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="n">size_min</span> <span class="o"><=</span> <span class="n">n_samples</span><span class="p">)</span>
<span class="ow">and</span> <span class="p">(</span><span class="n">size_max</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span> <span class="ow">and</span> <span class="p">(</span><span class="n">size_max</span> <span class="o"><=</span> <span class="n">n_samples</span><span class="p">)):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"size_min and size_max must be non-negative and no greater "</span>
<span class="s2">"than the number of data points, or `None`"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">size_max</span> <span class="o"><</span> <span class="n">size_min</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"size_max must be greater than or equal to size_min"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">size_min</span> <span class="o">*</span> <span class="n">n_clusters</span> <span class="o">></span> <span class="n">n_samples</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"The product of size_min and n_clusters cannot exceed the number of samples (X)"</span><span class="p">)</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span> <span class="o">=</span> \
<span class="n">_labels_constrained</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">cluster_centers_</span><span class="p">,</span> <span class="n">size_min</span><span class="p">,</span> <span class="n">size_max</span><span class="p">,</span> <span class="n">distances</span><span class="o">=</span><span class="n">distances</span><span class="p">)</span>
<span class="k">return</span> <span class="n">labels</span></div>
<div class="viewcode-block" id="KMeansConstrained.fit_predict"><a class="viewcode-back" href="../../index.html#k_means_constrained.KMeansConstrained.fit_predict">[docs]</a> <span class="k">def</span> <span class="nf">fit_predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Compute cluster centers and predict cluster index for each sample.</span>
<span class="sd"> Equivalent to calling fit(X) followed by predict(X), but more efficient.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array-like, shape = [n_samples, n_features]</span>
<span class="sd"> Training instances to cluster.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : array, shape [n_samples,]</span>
<span class="sd"> Index of the cluster each sample belongs to.</span>
<span class="sd"> """</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">labels_</span></div></div>
</pre></div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2020, Josh Levy-Kramer. Documentation derived from Scikit-Learn.
</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>
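The Notes section above describes the constrained assignment step as a Minimum Cost Flow problem solved with OR-Tools' `SimpleMinCostFlow`. A minimal sketch of the same idea, using only NumPy and SciPy rather than the library's MCF solver: enforcing `size_max` alone can be done by duplicating each cluster into `size_max` "slots" and solving the resulting rectangular assignment problem with `scipy.optimize.linear_sum_assignment`. The function `constrained_assign` below is a hypothetical illustration, not the package's implementation (which also handles `size_min` via the flow network's lower bounds).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def constrained_assign(X, centers, size_max):
    """Assign samples to fixed centers, with at most size_max per cluster."""
    # Cost of assigning each sample to each center: squared Euclidean distance.
    # Shape: (n_samples, n_clusters)
    cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Duplicate each cluster column into size_max "slots" so that no cluster
    # can receive more than size_max samples in the one-to-one assignment.
    slot_cost = np.repeat(cost, size_max, axis=1)
    rows, cols = linear_sum_assignment(slot_cost)  # globally optimal matching
    # Map each slot index back to its cluster index.
    labels = cols[np.argsort(rows)] // size_max
    return labels


# Same toy data as the class docstring's example, with fixed centers.
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
centers = np.array([[1.0, 2.0], [4.0, 2.0]])
labels = constrained_assign(X, centers, size_max=3)
```

Here each sample's nearest center is also the globally cheapest feasible assignment, so `labels` matches the unconstrained result while still respecting the cap of 3 samples per cluster.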
================================================
FILE: docs/_modules/k_means_constrained/sklearn_cluster/k_means_.html
================================================
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>k_means_constrained.sklearn_cluster.k_means_ — k-means-constrained 0.0.2 documentation</title>
<link rel="stylesheet" href="../../../_static/alabaster.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
<script type="text/javascript" src="../../../_static/jquery.js"></script>
<script type="text/javascript" src="../../../_static/underscore.js"></script>
<script type="text/javascript" src="../../../_static/doctools.js"></script>
<script type="text/javascript" src="../../../_static/language_data.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
<link rel="stylesheet" href="../../../_static/custom.css" type="text/css" />
<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />
</head><body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<h1>Source code for k_means_constrained.sklearn_cluster.k_means_</h1><div class="highlight"><pre>
<span></span><span class="sd">"""K-means clustering"""</span>
<span class="c1"># Authors: Gael Varoquaux <gael.varoquaux@normalesup.org></span>
<span class="c1"># Thomas Rueckstiess <ruecksti@in.tum.de></span>
<span class="c1"># James Bergstra <james.bergstra@umontreal.ca></span>
<span class="c1"># Jan Schlueter <scikit-learn@jan-schlueter.de></span>
<span class="c1"># Nelle Varoquaux</span>
<span class="c1"># Peter Prettenhofer <peter.prettenhofer@gmail.com></span>
<span class="c1"># Olivier Grisel <olivier.grisel@ensta.org></span>
<span class="c1"># Mathieu Blondel <mathieu@mblondel.org></span>
<span class="c1"># Robert Layton <robertlayton@gmail.com></span>
<span class="c1"># License: BSD 3 clause</span>
<span class="kn">import</span> <span class="nn">warnings</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">scipy.sparse</span> <span class="k">as</span> <span class="nn">sp</span>
<span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="k">import</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClusterMixin</span><span class="p">,</span> <span class="n">TransformerMixin</span>
<span class="kn">from</span> <span class="nn">sklearn.externals.six</span> <span class="k">import</span> <span class="n">string_types</span>
<span class="kn">from</span> <span class="nn">sklearn.metrics.pairwise</span> <span class="k">import</span> <span class="n">euclidean_distances</span>
<span class="kn">from</span> <span class="nn">sklearn.metrics.pairwise</span> <span class="k">import</span> <span class="n">pairwise_distances_argmin_min</span>
<span class="kn">from</span> <span class="nn">sklearn.utils</span> <span class="k">import</span> <span class="n">check_array</span>
<span class="kn">from</span> <span class="nn">sklearn.utils</span> <span class="k">import</span> <span class="n">check_random_state</span>
<span class="kn">from</span> <span class="nn">sklearn.utils.extmath</span> <span class="k">import</span> <span class="n">row_norms</span><span class="p">,</span> <span class="n">stable_cumsum</span>
<span class="kn">from</span> <span class="nn">sklearn.utils.sparsefuncs</span> <span class="k">import</span> <span class="n">mean_variance_axis</span>
<span class="kn">from</span> <span class="nn">sklearn.utils.validation</span> <span class="k">import</span> <span class="n">FLOAT_DTYPES</span>
<span class="kn">from</span> <span class="nn">sklearn.utils.validation</span> <span class="k">import</span> <span class="n">check_is_fitted</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="k">import</span> <span class="n">_k_means</span>
<span class="c1">###############################################################################</span>
<span class="c1"># Initialization heuristic</span>
<span class="k">def</span> <span class="nf">_k_init</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">random_state</span><span class="p">,</span> <span class="n">n_local_trials</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Init n_clusters seeds according to k-means++</span>
<span class="sd"> Parameters</span>
<span class="sd"> -----------</span>
<span class="sd"> X : array or sparse matrix, shape (n_samples, n_features)</span>
<span class="sd"> The data to pick seeds for. To avoid memory copy, the input data</span>
<span class="sd"> should be double precision (dtype=np.float64).</span>
<span class="sd"> n_clusters : integer</span>
<span class="sd"> The number of seeds to choose</span>
<span class="sd"> x_squared_norms : array, shape (n_samples,)</span>
<span class="sd"> Squared Euclidean norm of each data point.</span>
<span class="sd"> random_state : numpy.RandomState</span>
<span class="sd"> The generator used to initialize the centers.</span>
<span class="sd"> n_local_trials : integer, optional</span>
<span class="sd"> The number of seeding trials for each center (except the first),</span>
<span class="sd"> of which the one reducing inertia the most is greedily chosen.</span>
<span class="sd"> Set to None to make the number of trials depend logarithmically</span>
<span class="sd"> on the number of seeds (2+log(k)); this is the default.</span>
<span class="sd"> Notes</span>
<span class="sd"> -----</span>
<span class="sd"> Selects initial cluster centers for k-means clustering in a smart way</span>
<span class="sd"> to speed up convergence. See: Arthur, D. and Vassilvitskii, S.</span>
<span class="sd"> "k-means++: the advantages of careful seeding". ACM-SIAM Symposium</span>
<span class="sd"> on Discrete Algorithms. 2007.</span>
<span class="sd"> Version ported from http://www.stanford.edu/~darthur/kMeansppTest.zip,</span>
<span class="sd"> which is the implementation used in the aforementioned paper.</span>
<span class="sd"> """</span>
<span class="n">n_samples</span><span class="p">,</span> <span class="n">n_features</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">empty</span><span class="p">((</span><span class="n">n_clusters</span><span class="p">,</span> <span class="n">n_features</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">x_squared_norms</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">,</span> <span class="s1">'x_squared_norms None in _k_init'</span>
<span class="c1"># Set the number of local seeding trials if none is given</span>
<span class="k">if</span> <span class="n">n_local_trials</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="c1"># This is what Arthur/Vassilvitskii tried, but did not report</span>
<span class="c1"># specific results for other than mentioning in the conclusion</span>
<span class="c1"># that it helped.</span>
<span class="n">n_local_trials</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">+</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">n_clusters</span><span class="p">))</span>
<span class="c1"># Pick first center randomly</span>
<span class="n">center_id</span> <span class="o">=</span> <span class="n">random_state</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="n">n_samples</span><span class="p">)</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">centers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">center_id</span><span class="p">]</span><span class="o">.</span><span class="n">toarray</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">centers</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">center_id</span><span class="p">]</span>
<span class="c1"># Initialize list of closest distances and calculate current potential</span>
<span class="n">closest_dist_sq</span> <span class="o">=</span> <span class="n">euclidean_distances</span><span class="p">(</span>
<span class="n">centers</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">newaxis</span><span class="p">],</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y_norm_squared</span><span class="o">=</span><span class="n">x_squared_norms</span><span class="p">,</span>
<span class="n">squared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="n">current_pot</span> <span class="o">=</span> <span class="n">closest_dist_sq</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="c1"># Pick the remaining n_clusters-1 points</span>
<span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">n_clusters</span><span class="p">):</span>
<span class="c1"># Choose center candidates by sampling with probability proportional</span>
<span class="c1"># to the squared distance to the closest existing center</span>
<span class="n">rand_vals</span> <span class="o">=</span> <span class="n">random_state</span><span class="o">.</span><span class="n">random_sample</span><span class="p">(</span><span class="n">n_local_trials</span><span class="p">)</span> <span class="o">*</span> <span class="n">current_pot</span>
<span class="n">candidate_ids</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">searchsorted</span><span class="p">(</span><span class="n">stable_cumsum</span><span class="p">(</span><span class="n">closest_dist_sq</span><span class="p">),</span>
<span class="n">rand_vals</span><span class="p">)</span>
<span class="c1"># Compute distances to center candidates</span>
<span class="n">distance_to_candidates</span> <span class="o">=</span> <span class="n">euclidean_distances</span><span class="p">(</span>
<span class="n">X</span><span class="p">[</span><span class="n">candidate_ids</span><span class="p">],</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y_norm_squared</span><span class="o">=</span><span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">squared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="c1"># Decide which candidate is the best</span>
<span class="n">best_candidate</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">best_pot</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">best_dist_sq</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">for</span> <span class="n">trial</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_local_trials</span><span class="p">):</span>
<span class="c1"># Compute potential when including center candidate</span>
<span class="n">new_dist_sq</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">minimum</span><span class="p">(</span><span class="n">closest_dist_sq</span><span class="p">,</span>
<span class="n">distance_to_candidates</span><span class="p">[</span><span class="n">trial</span><span class="p">])</span>
<span class="n">new_pot</span> <span class="o">=</span> <span class="n">new_dist_sq</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="c1"># Store result if it is the best local trial so far</span>
<span class="k">if</span> <span class="p">(</span><span class="n">best_candidate</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">)</span> <span class="ow">or</span> <span class="p">(</span><span class="n">new_pot</span> <span class="o"><</span> <span class="n">best_pot</span><span class="p">):</span>
<span class="n">best_candidate</span> <span class="o">=</span> <span class="n">candidate_ids</span><span class="p">[</span><span class="n">trial</span><span class="p">]</span>
<span class="n">best_pot</span> <span class="o">=</span> <span class="n">new_pot</span>
<span class="n">best_dist_sq</span> <span class="o">=</span> <span class="n">new_dist_sq</span>
<span class="c1"># Permanently add best center candidate found in local tries</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">centers</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">best_candidate</span><span class="p">]</span><span class="o">.</span><span class="n">toarray</span><span class="p">()</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">centers</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">best_candidate</span><span class="p">]</span>
<span class="n">current_pot</span> <span class="o">=</span> <span class="n">best_pot</span>
<span class="n">closest_dist_sq</span> <span class="o">=</span> <span class="n">best_dist_sq</span>
<span class="k">return</span> <span class="n">centers</span>
<span class="c1">###############################################################################</span>
<span class="c1"># K-means batch estimation by EM (expectation maximization)</span>
<span class="k">def</span> <span class="nf">_validate_center_shape</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">n_centers</span><span class="p">,</span> <span class="n">centers</span><span class="p">):</span>
<span class="sd">"""Check if centers is compatible with X and n_centers"""</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">centers</span><span class="p">)</span> <span class="o">!=</span> <span class="n">n_centers</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'The shape of the initial centers (</span><span class="si">%s</span><span class="s1">) '</span>
<span class="s1">'does not match the number of clusters </span><span class="si">%i</span><span class="s1">'</span>
<span class="o">%</span> <span class="p">(</span><span class="n">centers</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">n_centers</span><span class="p">))</span>
<span class="k">if</span> <span class="n">centers</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span>
<span class="s2">"The number of features of the initial centers </span><span class="si">%s</span><span class="s2"> "</span>
<span class="s2">"does not match the number of features of the data </span><span class="si">%s</span><span class="s2">."</span>
<span class="o">%</span> <span class="p">(</span><span class="n">centers</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
<span class="k">def</span> <span class="nf">_tolerance</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">tol</span><span class="p">):</span>
<span class="sd">"""Return a tolerance which is independent of the dataset"""</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">variances</span> <span class="o">=</span> <span class="n">mean_variance_axis</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">variances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">variances</span><span class="p">)</span> <span class="o">*</span> <span class="n">tol</span>
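`_tolerance` above scales the user-supplied relative `tol` by the mean per-feature variance of the data, so the convergence threshold does not depend on the data's units. A minimal NumPy sketch of the dense branch (the name `tolerance` here is illustrative, not part of the module):

```python
import numpy as np

def tolerance(X, tol):
    # Mean per-feature variance scales the relative tolerance so the
    # convergence check is independent of the dataset's scale.
    return np.mean(np.var(X, axis=0)) * tol

# Each feature below has variance 1.0, so the scaled tolerance equals tol.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
print(tolerance(X, 1e-4))  # 0.0001
```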
<span class="k">def</span> <span class="nf">_labels_inertia_precompute_dense</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">distances</span><span class="p">):</span>
<span class="sd">"""Compute labels and inertia using a full distance matrix.</span>
<span class="sd"> This will overwrite the 'distances' array in-place.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : numpy array, shape (n_samples, n_features)</span>
<span class="sd"> Input data.</span>
<span class="sd"> x_squared_norms : numpy array, shape (n_samples,)</span>
<span class="sd"> Precomputed squared norms of X.</span>
<span class="sd"> centers : numpy array, shape (n_clusters, n_features)</span>
<span class="sd"> Cluster centers which data is assigned to.</span>
<span class="sd"> distances : numpy array, shape (n_samples,)</span>
<span class="sd"> Pre-allocated array in which distances are stored.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : numpy array, dtype=np.int32, shape (n_samples,)</span>
<span class="sd"> Indices of clusters that samples are assigned to.</span>
<span class="sd"> inertia : float</span>
<span class="sd"> Sum of distances of samples to their closest cluster center.</span>
<span class="sd"> """</span>
<span class="n">n_samples</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="c1"># Break up the nearest-neighbor distance computation into batches to prevent</span>
<span class="c1"># memory blowup in the case of a large number of samples and clusters.</span>
<span class="c1"># TODO: Once PR #7383 is merged use check_inputs=False in metric_kwargs.</span>
<span class="n">labels</span><span class="p">,</span> <span class="n">mindist</span> <span class="o">=</span> <span class="n">pairwise_distances_argmin_min</span><span class="p">(</span>
<span class="n">X</span><span class="o">=</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="o">=</span><span class="n">centers</span><span class="p">,</span> <span class="n">metric</span><span class="o">=</span><span class="s1">'euclidean'</span><span class="p">,</span> <span class="n">metric_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s1">'squared'</span><span class="p">:</span> <span class="kc">True</span><span class="p">})</span>
<span class="c1"># cython k-means code assumes int32 inputs</span>
<span class="n">labels</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span>
<span class="k">if</span> <span class="n">n_samples</span> <span class="o">==</span> <span class="n">distances</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]:</span>
<span class="c1"># distances will be changed in-place</span>
<span class="n">distances</span><span class="p">[:]</span> <span class="o">=</span> <span class="n">mindist</span>
<span class="n">inertia</span> <span class="o">=</span> <span class="n">mindist</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span>
<span class="k">return</span> <span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span>
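The precomputed-distances path above assigns each sample to its nearest center and sums the squared distances to get the inertia. A self-contained sketch of the same computation with an explicit full distance matrix (a simplification of the batched `pairwise_distances_argmin_min` call; the name `labels_inertia_dense` is illustrative):

```python
import numpy as np

def labels_inertia_dense(X, centers):
    # Full (n_samples, n_clusters) matrix of squared Euclidean distances.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1).astype(np.int32)  # nearest center per sample
    inertia = d2.min(axis=1).sum()               # sum of min squared distances
    return labels, inertia

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
labels, inertia = labels_inertia_dense(X, centers)
print(labels)  # [0 0 1]
```

Note the real implementation batches this computation to avoid materialising the full matrix when `n_samples * n_clusters` is large.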
<span class="k">def</span> <span class="nf">_labels_inertia</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span>
<span class="n">precompute_distances</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">distances</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""E step of the K-means EM algorithm.</span>
<span class="sd"> Compute the labels and the inertia of the given samples and centers.</span>
<span class="sd"> This will compute the distances in-place.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : float64 array-like or CSR sparse matrix, shape (n_samples, n_features)</span>
<span class="sd"> The input samples to assign to the labels.</span>
<span class="sd"> x_squared_norms : array, shape (n_samples,)</span>
<span class="sd"> Precomputed squared euclidean norm of each data point, to speed up</span>
<span class="sd"> computations.</span>
<span class="sd"> centers : float array, shape (k, n_features)</span>
<span class="sd"> The cluster centers.</span>
<span class="sd"> precompute_distances : boolean, default: True</span>
<span class="sd"> Precompute distances (faster but takes more memory).</span>
<span class="sd"> distances : float array, shape (n_samples,)</span>
<span class="sd"> Pre-allocated array to be filled in with each sample's distance</span>
<span class="sd"> to the closest center.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : int array, shape (n_samples,)</span>
<span class="sd"> The resulting cluster assignment for each sample.</span>
<span class="sd"> inertia : float</span>
<span class="sd"> Sum of distances of samples to their closest cluster center.</span>
<span class="sd"> """</span>
<span class="n">n_samples</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="c1"># set the default value of centers to -1 to be able to detect any anomaly</span>
<span class="c1"># easily</span>
<span class="n">labels</span> <span class="o">=</span> <span class="o">-</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">n_samples</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span>
<span class="k">if</span> <span class="n">distances</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">distances</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="c1"># distances will be changed in-place</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">inertia</span> <span class="o">=</span> <span class="n">_k_means</span><span class="o">.</span><span class="n">_assign_labels_csr</span><span class="p">(</span>
<span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">distances</span><span class="o">=</span><span class="n">distances</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">if</span> <span class="n">precompute_distances</span><span class="p">:</span>
<span class="k">return</span> <span class="n">_labels_inertia_precompute_dense</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span>
<span class="n">centers</span><span class="p">,</span> <span class="n">distances</span><span class="p">)</span>
<span class="n">inertia</span> <span class="o">=</span> <span class="n">_k_means</span><span class="o">.</span><span class="n">_assign_labels_array</span><span class="p">(</span>
<span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="n">centers</span><span class="p">,</span> <span class="n">labels</span><span class="p">,</span> <span class="n">distances</span><span class="o">=</span><span class="n">distances</span><span class="p">)</span>
<span class="k">return</span> <span class="n">labels</span><span class="p">,</span> <span class="n">inertia</span>
<span class="k">def</span> <span class="nf">_init_centroids</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">init</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
<span class="n">init_size</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Compute the initial centroids</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array, shape (n_samples, n_features)</span>
<span class="sd"> k : int</span>
<span class="sd"> number of centroids</span>
<span class="sd"> init : {'k-means++', 'random', ndarray or callable}, optional</span>
<span class="sd"> Method for initialization</span>
<span class="sd"> random_state : int, RandomState instance or None, optional, default: None</span>
<span class="sd"> If int, random_state is the seed used by the random number generator;</span>
<span class="sd"> If RandomState instance, random_state is the random number generator;</span>
<span class="sd"> If None, the random number generator is the RandomState instance used</span>
<span class="sd"> by `np.random`.</span>
<span class="sd"> x_squared_norms : array, shape (n_samples,), optional</span>
<span class="sd"> Squared euclidean norm of each data point. Pass it if you have it at</span>
<span class="sd"> hand already to avoid it being recomputed here. Default: None</span>
<span class="sd"> init_size : int, optional</span>
<span class="sd"> Number of samples to randomly sample for speeding up the</span>
<span class="sd"> initialization (sometimes at the expense of accuracy): the</span>
<span class="sd"> algorithm is initialized by running a batch KMeans on a</span>
<span class="sd"> random subset of the data. This needs to be larger than k.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> centers : array, shape(k, n_features)</span>
<span class="sd"> """</span>
<span class="n">random_state</span> <span class="o">=</span> <span class="n">check_random_state</span><span class="p">(</span><span class="n">random_state</span><span class="p">)</span>
<span class="n">n_samples</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">if</span> <span class="n">x_squared_norms</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="n">x_squared_norms</span> <span class="o">=</span> <span class="n">row_norms</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">squared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">if</span> <span class="n">init_size</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">init_size</span> <span class="o"><</span> <span class="n">n_samples</span><span class="p">:</span>
<span class="k">if</span> <span class="n">init_size</span> <span class="o"><</span> <span class="n">k</span><span class="p">:</span>
<span class="n">warnings</span><span class="o">.</span><span class="n">warn</span><span class="p">(</span>
<span class="s2">"init_size=</span><span class="si">%d</span><span class="s2"> should be larger than k=</span><span class="si">%d</span><span class="s2">. "</span>
<span class="s2">"Setting it to 3*k"</span> <span class="o">%</span> <span class="p">(</span><span class="n">init_size</span><span class="p">,</span> <span class="n">k</span><span class="p">),</span>
<span class="ne">RuntimeWarning</span><span class="p">,</span> <span class="n">stacklevel</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">init_size</span> <span class="o">=</span> <span class="mi">3</span> <span class="o">*</span> <span class="n">k</span>
<span class="n">init_indices</span> <span class="o">=</span> <span class="n">random_state</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n_samples</span><span class="p">,</span> <span class="n">init_size</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">init_indices</span><span class="p">]</span>
<span class="n">x_squared_norms</span> <span class="o">=</span> <span class="n">x_squared_norms</span><span class="p">[</span><span class="n">init_indices</span><span class="p">]</span>
<span class="n">n_samples</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">elif</span> <span class="n">n_samples</span> <span class="o"><</span> <span class="n">k</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span>
<span class="s2">"n_samples=</span><span class="si">%d</span><span class="s2"> should be >= k=</span><span class="si">%d</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">n_samples</span><span class="p">,</span> <span class="n">k</span><span class="p">))</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="n">string_types</span><span class="p">)</span> <span class="ow">and</span> <span class="n">init</span> <span class="o">==</span> <span class="s1">'k-means++'</span><span class="p">:</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">_k_init</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">,</span>
<span class="n">x_squared_norms</span><span class="o">=</span><span class="n">x_squared_norms</span><span class="p">)</span>
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="n">string_types</span><span class="p">)</span> <span class="ow">and</span> <span class="n">init</span> <span class="o">==</span> <span class="s1">'random'</span><span class="p">:</span>
<span class="n">seeds</span> <span class="o">=</span> <span class="n">random_state</span><span class="o">.</span><span class="n">permutation</span><span class="p">(</span><span class="n">n_samples</span><span class="p">)[:</span><span class="n">k</span><span class="p">]</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">seeds</span><span class="p">]</span>
<span class="k">elif</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="s1">'__array__'</span><span class="p">):</span>
<span class="c1"># ensure that the centers have the same dtype as X</span>
<span class="c1"># this is a requirement of fused types of cython</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="k">elif</span> <span class="n">callable</span><span class="p">(</span><span class="n">init</span><span class="p">):</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">init</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="n">random_state</span><span class="p">)</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">centers</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">X</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"the init parameter for the k-means should "</span>
<span class="s2">"be 'k-means++' or 'random' or an ndarray, "</span>
<span class="s2">"'</span><span class="si">%s</span><span class="s2">' (type '</span><span class="si">%s</span><span class="s2">') was passed."</span> <span class="o">%</span> <span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">init</span><span class="p">)))</span>
<span class="k">if</span> <span class="n">sp</span><span class="o">.</span><span class="n">issparse</span><span class="p">(</span><span class="n">centers</span><span class="p">):</span>
<span class="n">centers</span> <span class="o">=</span> <span class="n">centers</span><span class="o">.</span><span class="n">toarray</span><span class="p">()</span>
<span class="n">_validate_center_shape</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">centers</span><span class="p">)</span>
<span class="k">return</span> <span class="n">centers</span>
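Of the init strategies handled above, the `'random'` branch is the simplest: draw a permutation of the sample indices and take the first `k` rows as centers. A minimal sketch of just that branch (the name `init_centroids_random` is illustrative; the real function also handles `'k-means++'`, ndarrays and callables):

```python
import numpy as np

def init_centroids_random(X, k, random_state=None):
    # 'random' strategy: choose k distinct rows of X as initial centers.
    rng = np.random.RandomState(random_state)
    seeds = rng.permutation(X.shape[0])[:k]
    return X[seeds]

X = np.arange(20, dtype=np.float64).reshape(10, 2)
centers = init_centroids_random(X, 3, random_state=0)
print(centers.shape)  # (3, 2)
```

Using a permutation rather than `randint` guarantees the chosen rows are distinct, so no two initial centers coincide.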
<span class="k">class</span> <span class="nc">KMeans</span><span class="p">(</span><span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClusterMixin</span><span class="p">,</span> <span class="n">TransformerMixin</span><span class="p">):</span>
<span class="sd">"""K-Means clustering</span>
<span class="sd"> Read more in the :ref:`User Guide <k_means>`.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> n_clusters : int, optional, default: 8</span>
<span class="sd"> The number of clusters to form as well as the number of</span>
<span class="sd"> centroids to generate.</span>
<span class="sd"> init : {'k-means++', 'random' or an ndarray}</span>
<span class="sd"> Method for initialization, defaults to 'k-means++':</span>
<span class="sd"> 'k-means++' : selects initial cluster centers for k-means</span>
<span class="sd"> clustering in a smart way to speed up convergence. See section</span>
<span class="sd"> Notes in k_init for more details.</span>
<span class="sd"> 'random': choose k observations (rows) at random from data for</span>
<span class="sd"> the initial centroids.</span>
<span class="sd"> If an ndarray is passed, it should be of shape (n_clusters, n_features)</span>
<span class="sd"> and gives the initial centers.</span>
<span class="sd"> n_init : int, default: 10</span>
<span class="sd"> Number of times the k-means algorithm will be run with different</span>
<span class="sd"> centroid seeds. The final results will be the best output of</span>
<span class="sd"> n_init consecutive runs in terms of inertia.</span>
<span class="sd"> max_iter : int, default: 300</span>
<span class="sd"> Maximum number of iterations of the k-means algorithm for a</span>
<span class="sd"> single run.</span>
<span class="sd"> tol : float, default: 1e-4</span>
<span class="sd"> Relative tolerance with regard to inertia to declare convergence.</span>
<span class="sd"> precompute_distances : {'auto', True, False}</span>
<span class="sd"> Precompute distances (faster but takes more memory).</span>
<span class="sd"> 'auto' : do not precompute distances if n_samples * n_clusters > 12</span>
<span class="sd"> million. This corresponds to about 100MB overhead per job using</span>
<span class="sd"> double precision.</span>
<span class="sd"> True : always precompute distances</span>
<span class="sd"> False : never precompute distances</span>
<span class="sd"> verbose : int, default 0</span>
<span class="sd"> Verbosity mode.</span>
<span class="sd"> random_state : int, RandomState instance or None, optional, default: None</span>
<span class="sd"> If int, random_state is the seed used by the random number generator;</span>
<span class="sd"> If RandomState instance, random_state is the random number generator;</span>
<span class="sd"> If None, the random number generator is the RandomState instance used</span>
<span class="sd"> by `np.random`.</span>
<span class="sd"> copy_x : boolean, default True</span>
<span class="sd"> When pre-computing distances it is more numerically accurate to center</span>
<span class="sd"> the data first. If copy_x is True, then the original data is not</span>
<span class="sd"> modified. If False, the original data is modified, and put back before</span>
<span class="sd"> the function returns, but small numerical differences may be introduced</span>
<span class="sd"> by subtracting and then adding the data mean.</span>
<span class="sd"> n_jobs : int</span>
<span class="sd"> The number of jobs to use for the computation. This works by computing</span>
<span class="sd"> each of the n_init runs in parallel.</span>
<span class="sd"> If -1 all CPUs are used. If 1 is given, no parallel computing code is</span>
<span class="sd"> used at all, which is useful for debugging. For n_jobs below -1,</span>
<span class="sd"> (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one</span>
<span class="sd"> are used.</span>
<span class="sd"> algorithm : "auto", "full" or "elkan", default="auto"</span>
<span class="sd"> K-means algorithm to use. The classical EM-style algorithm is "full".</span>
<span class="sd"> The "elkan" variation is more efficient by using the triangle</span>
<span class="sd"> inequality, but currently doesn't support sparse data. "auto" chooses</span>
<span class="sd"> "elkan" for dense data and "full" for sparse data.</span>
<span class="sd"> Attributes</span>
<span class="sd"> ----------</span>
<span class="sd"> cluster_centers_ : array, [n_clusters, n_features]</span>
<span class="sd"> Coordinates of cluster centers</span>
<span class="sd"> labels_ : array, shape (n_samples,)</span>
<span class="sd"> Labels of each point</span>
<span class="sd"> inertia_ : float</span>
<span class="sd"> Sum of distances of samples to their closest cluster center.</span>
<span class="sd"> Examples</span>
<span class="sd"> --------</span>
<span class="sd"> >>> from sklearn.cluster import KMeans</span>
<span class="sd"> >>> import numpy as np</span>
<span class="sd"> >>> X = np.array([[1, 2], [1, 4], [1, 0],</span>
<span class="sd"> ... [4, 2], [4, 4], [4, 0]])</span>
<span class="sd"> >>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)</span>
<span class="sd"> >>> kmeans.labels_</span>
<span class="sd"> array([0, 0, 0, 1, 1, 1], dtype=int32)</span>
<span class="sd"> >>> kmeans.predict([[0, 0], [4, 4]])</span>
<span class="sd"> array([0, 1], dtype=int32)</span>
<span class="sd"> >>> kmeans.cluster_centers_</span>
<span class="sd"> array([[ 1., 2.],</span>
<span class="sd"> [ 4., 2.]])</span>
<span class="sd"> See also</span>
<span class="sd"> --------</span>
<span class="sd"> MiniBatchKMeans</span>
<span class="sd"> Alternative online implementation that does incremental updates</span>
<span class="sd"> of the centers positions using mini-batches.</span>
<span class="sd"> For large scale learning (say n_samples > 10k) MiniBatchKMeans is</span>
<span class="sd"> probably much faster than the default batch implementation.</span>
<span class="sd"> Notes</span>
<span class="sd"> ------</span>
<span class="sd"> The k-means problem is solved using Lloyd's algorithm.</span>
<span class="sd"> The average complexity is given by O(k n T), where n is the number of</span>
<span class="sd"> samples and T is the number of iterations.</span>
<span class="sd"> The worst case complexity is given by O(n^(k+2/p)) with</span>
<span class="sd"> n = n_samples, p = n_features. (D. Arthur and S. Vassilvitskii,</span>
<span class="sd"> 'How slow is the k-means method?' SoCG2006)</span>
<span class="sd"> In practice, the k-means algorithm is very fast (one of the fastest</span>
<span class="sd"> clustering algorithms available), but it can fall into local minima. That's why</span>
<span class="sd"> it can be useful to restart it several times.</span>
<span class="sd"> """</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_clusters</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">init</span><span class="o">=</span><span class="s1">'k-means++'</span><span class="p">,</span> <span class="n">n_init</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="n">max_iter</span><span class="o">=</span><span class="mi">300</span><span class="p">,</span> <span class="n">tol</span><span class="o">=</span><span class="mf">1e-4</span><span class="p">,</span> <span class="n">precompute_distances</span><span class="o">=</span><span class="s1">'auto'</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">random_state</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">copy_x</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
<span class="n">n_jobs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">algorithm</span><span class="o">=</span><span class="s1">'auto'</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">n_clusters</span> <span class="o">=</span> <span class="n">n_clusters</span>
<span class="bp">self</span><span class="o">.</span><span class="n">init</span> <span class="o">=</span> <span class="n">init</span>
<span class="bp">self</span><span class="o">.</span><span class="n">max_iter</span> <span class="o">=</span> <span class="n">max_iter</span>
<span class="bp">self</span><span class="o">.</span><span class="n">tol</span> <span class="o">=</span> <span class="n">tol</span>
<span class="bp">self</span><span class="o">.</span><span class="n">precompute_distances</span> <span class="o">=</span> <span class="n">precompute_distances</span>
<span class="bp">self</span><span class="o">.</span><span class="n">n_init</span> <span class="o">=</span> <span class="n">n_init</span>
<span class="bp">self</span><span class="o">.</span><span class="n">verbose</span> <span class="o">=</span> <span class="n">verbose</span>
<span class="bp">self</span><span class="o">.</span><span class="n">random_state</span> <span class="o">=</span> <span class="n">random_state</span>
<span class="bp">self</span><span class="o">.</span><span class="n">copy_x</span> <span class="o">=</span> <span class="n">copy_x</span>
<span class="bp">self</span><span class="o">.</span><span class="n">n_jobs</span> <span class="o">=</span> <span class="n">n_jobs</span>
<span class="bp">self</span><span class="o">.</span><span class="n">algorithm</span> <span class="o">=</span> <span class="n">algorithm</span>
<span class="k">def</span> <span class="nf">_check_fit_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="sd">"""Verify that the number of samples given is larger than k"""</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">check_array</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">accept_sparse</span><span class="o">=</span><span class="s1">'csr'</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">])</span>
<span class="k">if</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o"><</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_clusters</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"n_samples=</span><span class="si">%d</span><span class="s2"> should be >= n_clusters=</span><span class="si">%d</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span>
<span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_clusters</span><span class="p">))</span>
<span class="k">return</span> <span class="n">X</span>
<span class="k">def</span> <span class="nf">_check_test_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">check_array</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">accept_sparse</span><span class="o">=</span><span class="s1">'csr'</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">FLOAT_DTYPES</span><span class="p">)</span>
<span class="n">n_samples</span><span class="p">,</span> <span class="n">n_features</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span>
<span class="n">expected_n_features</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cluster_centers_</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">n_features</span> <span class="o">==</span> <span class="n">expected_n_features</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s2">"Incorrect number of features. "</span>
<span class="s2">"Got </span><span class="si">%d</span><span class="s2"> features, expected </span><span class="si">%d</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span>
<span class="n">n_features</span><span class="p">,</span> <span class="n">expected_n_features</span><span class="p">))</span>
<span class="k">return</span> <span class="n">X</span>
<span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Compute k-means clustering.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : array-like or sparse matrix, shape=(n_samples, n_features)</span>
<span class="sd"> Training instances to cluster.</span>
<span class="sd"> """</span>
<span class="c1"># Added to remove scikit-learn internal dependenceies</span>
<span class="k">raise</span> <span class="bp">NotImplemented</span>
<span class="k">def</span> <span class="nf">fit_predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Compute cluster centers and predict cluster index for each sample.</span>
<span class="sd"> Convenience method; equivalent to calling fit(X) followed by</span>
<span class="sd"> predict(X).</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : {array-like, sparse matrix}, shape = [n_samples, n_features]</span>
<span class="sd"> New data to transform.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : array, shape [n_samples,]</span>
<span class="sd"> Index of the cluster each sample belongs to.</span>
<span class="sd"> """</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">labels_</span>
<span class="k">def</span> <span class="nf">fit_transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Compute clustering and transform X to cluster-distance space.</span>
<span class="sd"> Equivalent to fit(X).transform(X), but more efficiently implemented.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : {array-like, sparse matrix}, shape = [n_samples, n_features]</span>
<span class="sd"> New data to transform.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> X_new : array, shape [n_samples, k]</span>
<span class="sd"> X transformed in the new space.</span>
<span class="sd"> """</span>
<span class="c1"># Currently, this just skips a copy of the data if it is not in</span>
<span class="c1"># np.array or CSR format already.</span>
<span class="c1"># XXX This skips _check_test_data, which may change the dtype;</span>
<span class="c1"># we should refactor the input validation.</span>
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_check_fit_data</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">_transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="sd">"""Transform X to a cluster-distance space.</span>
<span class="sd"> In the new space, each dimension is the distance to the cluster</span>
<span class="sd"> centers. Note that even if X is sparse, the array returned by</span>
<span class="sd"> `transform` will typically be dense.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : {array-like, sparse matrix}, shape = [n_samples, n_features]</span>
<span class="sd"> New data to transform.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> X_new : array, shape [n_samples, k]</span>
<span class="sd"> X transformed in the new space.</span>
<span class="sd"> """</span>
<span class="n">check_is_fitted</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'cluster_centers_'</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_check_test_data</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">_transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="sd">"""guts of transform method; no input validation"""</span>
<span class="k">return</span> <span class="n">euclidean_distances</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">cluster_centers_</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="sd">"""Predict the closest cluster each sample in X belongs to.</span>
<span class="sd"> In the vector quantization literature, `cluster_centers_` is called</span>
<span class="sd"> the code book and each value returned by `predict` is the index of</span>
<span class="sd"> the closest code in the code book.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : {array-like, sparse matrix}, shape = [n_samples, n_features]</span>
<span class="sd"> New data to predict.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> labels : array, shape [n_samples,]</span>
<span class="sd"> Index of the cluster each sample belongs to.</span>
<span class="sd"> """</span>
<span class="n">check_is_fitted</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'cluster_centers_'</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_check_test_data</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">x_squared_norms</span> <span class="o">=</span> <span class="n">row_norms</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">squared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">return</span> <span class="n">_labels_inertia</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">cluster_centers_</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">score</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="sd">"""Opposite of the value of X on the K-means objective.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> X : {array-like, sparse matrix}, shape = [n_samples, n_features]</span>
<span class="sd"> New data.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> score : float</span>
<span class="sd"> Opposite of the value of X on the K-means objective.</span>
<span class="sd"> """</span>
<span class="n">check_is_fitted</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="s1">'cluster_centers_'</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_check_test_data</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">x_squared_norms</span> <span class="o">=</span> <span class="n">row_norms</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">squared</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="k">return</span> <span class="o">-</span><span class="n">_labels_inertia</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">x_squared_norms</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">cluster_centers_</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
</pre></div>
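For reference, the `transform`/`predict`/`score` trio above reduces to computing Euclidean distances from each sample to every cluster centre, taking the arg-min for labels, and negating the summed squared distances for the score. A minimal NumPy sketch of that relationship, with made-up centres and data purely for illustration:

```python
import numpy as np

# Hypothetical fitted cluster centres (k=2, n_features=2) and input data
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[1.0, 1.0], [9.0, 9.0]])

# transform(): each row holds a sample's distance to every centre
distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

# predict(): index of the nearest centre, as _labels_inertia returns
labels = distances.argmin(axis=1)

# score(): negative inertia, i.e. minus the sum of squared distances
# from each sample to its assigned centre
score = -np.sum(distances.min(axis=1) ** 2)
```

Here both samples sit distance sqrt(2) from their nearest centre, so `labels` is `[0, 1]` and `score` is `-4.0`.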
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<h1 class="logo"><a href="../../../index.html">k-means-constrained</a></h1>
<h3>Navigation</h3>
<div class="relations">
<h3>Related Topics</h3>
<ul>
<li><a href="../../../index.html">Documentation overview</a><ul>
<li><a href="../../index.html">Module code</a><ul>
</ul></li>
</ul></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../../../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" />
<input type="submit" value="Go" />
</form>
</div>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
©2020, Josh Levy-Kramer.
|
Powered by <a href="http://sphinx-doc.org/">Sphinx 2.2.0</a>
& <a href="https://github.com/bitprophet/alabaster">Alabaster 0.7.12</a>
</div>
</body>
</html>
================================================
FILE: docs/_modules/k_means_constrained/sklearn_import/base.html
================================================
<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>k_means_constrained.sklearn_import.base — k-means-constrained 0.5.1 documentation</title>
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../../_static/css/theme.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="../../../" src="../../../_static/documentation_options.js"></script>
<script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
<script src="../../../_static/jquery.js"></script>
<script src="../../../_static/underscore.js"></script>
<script src="../../../_static/doctools.js"></script>
<script type="text/javascript" src="../../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../../genindex.html" />
<link rel="search" title="Search" href="../../../search.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../../index.html" class="icon icon-home"> k-means-constrained
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<!-- Local TOC -->
<div class="local-toc"></div>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../../index.html">k-means-constrained</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../../index.html" class="icon icon-home"></a> »</li>
<li><a href="../../index.html">Module code</a> »</li>
<li>k_means_constrained.sklearn_import.base</li>
<li class="wy-breadcrumbs-aside">
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<h1>Source code for k_means_constrained.sklearn_import.base</h1><div class="highlight"><pre>
<span></span><span class="kn">import</span> <span class="nn">warnings</span>
<span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">defaultdict</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">six</span>
<span class="kn">from</span> <span class="nn">k_means_constrained.sklearn_import</span> <span class="kn">import</span> <span class="n">__version__</span>
<span class="kn">from</span> <span class="nn">k_means_constrained.sklearn_import.funcsigs</span> <span class="kn">import</span> <span class="n">signature</span>
<span class="k">class</span> <span class="nc">BaseEstimator</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="sd">"""Base class for all estimators in scikit-learn</span>
<span class="sd"> Notes</span>
<span class="sd"> -----</span>
<span class="sd"> All estimators should specify all the parameters that can be set</span>
<span class="sd"> at the class level in their ``__init__`` as explicit keyword</span>
<span class="sd"> arguments (no ``*args`` or ``**kwargs``).</span>
<span class="sd"> """</span>
<span class="nd">@classmethod</span>
<span class="k">def</span> <span class="nf">_get_param_names</span><span class="p">(</span><span class="bp">cls</span><span class="p">):</span>
<span class="sd">"""Get parameter names for the estimator"""</span>
<span class="c1"># fetch the constructor or the original constructor before</span>
<span class="c1"># deprecation wrapping if any</span>
<span class="n">init</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">cls</span><span class="o">.</span><span class="fm">__init__</span><span class="p">,</span> <span class="s1">'deprecated_original'</span><span class="p">,</span> <span class="bp">cls</span><span class="o">.</span><span class="fm">__init__</span><span class="p">)</span>
<span class="k">if</span> <span class="n">init</span> <span class="ow">is</span> <span class="nb">object</span><span class="o">.</span><span class="fm">__init__</span><span class="p">:</span>
<span class="c1"># No explicit constructor to introspect</span>
<span class="k">return</span> <span class="p">[]</span>
<span class="c1"># introspect the constructor arguments to find the model parameters</span>
<span class="c1"># to represent</span>
<span class="n">init_signature</span> <span class="o">=</span> <span class="n">signature</span><span class="p">(</span><span class="n">init</span><span class="p">)</span>
<span class="c1"># Consider the constructor parameters excluding 'self'</span>
<span class="n">parameters</span> <span class="o">=</span> <span class="p">[</span><span class="n">p</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">init_signature</span><span class="o">.</span><span class="n">parameters</span><span class="o">.</span><span class="n">values</span><span class="p">()</span>
<span class="k">if</span> <span class="n">p</span><span class="o">.</span><span class="n">name</span> <span class="o">!=</span> <span class="s1">'self'</span> <span class="ow">and</span> <span class="n">p</span><span class="o">.</span><span class="n">kind</span> <span class="o">!=</span> <span class="n">p</span><span class="o">.</span><span class="n">VAR_KEYWORD</span><span class="p">]</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">:</span>
<span class="k">if</span> <span class="n">p</span><span class="o">.</span><span class="n">kind</span> <span class="o">==</span> <span class="n">p</span><span class="o">.</span><span class="n">VAR_POSITIONAL</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s2">"scikit-learn estimators should always "</span>
<span class="s2">"specify their parameters in the signature"</span>
<span class="s2">" of their __init__ (no varargs)."</span>
<span class="s2">" </span><span class="si">%s</span><span class="s2"> with constructor </span><span class="si">%s</span><span class="s2"> doesn't "</span>
<span class="s2">" follow this convention."</span>
<span class="o">%</span> <span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">init_signature</span><span class="p">))</span>
<span class="c1"># Extract and sort argument names excluding 'self'</span>
<span class="k">return</span> <span class="nb">sorted</span><span class="p">([</span><span class="n">p</span><span class="o">.</span><span class="n">name</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">parameters</span><span class="p">])</span>
<span class="k">def</span> <span class="nf">get_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
<span class="sd">"""Get parameters for this estimator.</span>
<span class="sd"> Parameters</span>
<span class="sd"> ----------</span>
<span class="sd"> deep : boolean, optional</span>
<span class="sd"> If True, will return the parameters for this estimator and</span>
<span class="sd"> contained subobjects that are estimators.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> params : mapping of string to any</span>
<span class="sd"> Parameter names mapped to their values.</span>
<span class="sd"> """</span>
<span class="n">out</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">_get_param_names</span><span class="p">():</span>
<span class="c1"># We need deprecation warnings to always be on in order to</span>
<span class="c1"># catch deprecated param values.</span>
<span class="c1"># This is set in utils/__init__.py but it gets overwritten</span>
<span class="c1"># when running under python3 somehow.</span>
<span class="n">warnings</span><span class="o">.</span><span class="n">simplefilter</span><span class="p">(</span><span class="s2">"always"</span><span class="p">,</span> <span class="ne">DeprecationWarning</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">with</span> <span class="n">warnings</span><span class="o">.</span><span class="n">catch_warnings</span><span class="p">(</span><span class="n">record</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="k">as</span> <span class="n">w</span><span class="p">:</span>
<span class="n">value</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">w</span><span class="p">)</span> <span class="ow">and</span> <span class="n">w</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">category</span> <span class="o">==</span> <span class="ne">DeprecationWarning</span><span class="p">:</span>
<span class="c1"># if the parameter is deprecated, don't show it</span>
<span class="k">continue</span>
<span class="k">finally</span><span class="p">:</span>
<span class="n">warnings</span><span class="o">.</span><span class="n">filters</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="c1"># XXX: should we rather test if instance of estimator?</span>
<span class="k">if</span> <span class="n">deep</span> <span class="ow">and</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="s1">'get_params'</span><span class="p">):</span>
<span class="n">deep_items</span> <span class="o">=</span> <span class="n">value</span><span class="o">.</span><span class="n">get_params</span><span class="p">()</span><span class="o">.</span><span class="n">items</span><span class="p">()</span>
<span class="n">out</span><span class="o">.</span><span class="n">update</span><span class="p">((</span><span class="n">key</span> <span class="o">+</span> <span class="s1">'__'</span> <span class="o">+</span> <span class="n">k</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">val</span> <span class="ow">in</span> <span class="n">deep_items</span><span class="p">)</span>
<span class="n">out</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">value</span>
<span class="k">return</span> <span class="n">out</span>
<span class="k">def</span> <span class="nf">set_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">params</span><span class="p">):</span>
<span class="sd">"""Set the parameters of this estimator.</span>
<span class="sd"> The method works on simple estimators as well as on nested objects</span>
<span class="sd"> (such as pipelines). The latter have parameters of the form</span>
<span class="sd"> ``<component>__<parameter>`` so that it's possible to update each</span>
<span class="sd"> component of a nested object.</span>
<span class="sd"> Returns</span>
<span class="sd"> -------</span>
<span class="sd"> self</span>
<span class="sd"> """</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">params</span><span class="p">:</span>
<span class="c1"># Simple optimization to gain speed (inspect is slow)</span>
<span class="k">return</span> <span class="bp">self</span>
<span class="n">valid_params</span> <span class="o">=</span> <span class="bp">self</span><span cla
function computeStyleTests (line 6249) | function computeStyleTests() {
function roundPixelMeasures (line 6293) | function roundPixelMeasures( measure ) {
function curCSS (line 6338) | function curCSS( elem, name, computed ) {
function addGetHookIf (line 6391) | function addGetHookIf( conditionFn, hookFn ) {
function vendorPropName (line 6416) | function vendorPropName( name ) {
function finalPropName (line 6431) | function finalPropName( name ) {
function setPositiveNumber (line 6457) | function setPositiveNumber( elem, value, subtract ) {
function boxModelAdjustment (line 6469) | function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, ...
function getWidthOrHeight (line 6537) | function getWidthOrHeight( elem, dimension, extra ) {
function Tween (line 6904) | function Tween( elem, options, prop, end, easing ) {
function schedule (line 7027) | function schedule() {
function createFxNow (line 7040) | function createFxNow() {
function genFx (line 7048) | function genFx( type, includeWidth ) {
function createTween (line 7068) | function createTween( value, prop, animation ) {
function defaultPrefilter (line 7082) | function defaultPrefilter( elem, props, opts ) {
function propFilter (line 7254) | function propFilter( props, specialEasing ) {
function Animation (line 7291) | function Animation( elem, properties, options ) {
function stripAndCollapse (line 8006) | function stripAndCollapse( value ) {
function getClass (line 8012) | function getClass( elem ) {
function classesToArray (line 8016) | function classesToArray( value ) {
function buildParams (line 8638) | function buildParams( prefix, obj, traditional, add ) {
function addToPrefiltersOrTransports (line 8792) | function addToPrefiltersOrTransports( structure ) {
function inspectPrefiltersOrTransports (line 8826) | function inspectPrefiltersOrTransports( structure, options, originalOpti...
function ajaxExtend (line 8855) | function ajaxExtend( target, src ) {
function ajaxHandleResponses (line 8875) | function ajaxHandleResponses( s, jqXHR, responses ) {
function ajaxConvert (line 8933) | function ajaxConvert( s, response, jqXHR, isSuccess ) {
function done (line 9448) | function done( status, nativeStatusText, responses, headers ) {
FILE: docs/_static/jquery-3.5.1.js
function DOMEval (line 103) | function DOMEval( code, node, doc ) {
function toType (line 133) | function toType( obj ) {
function isArrayLike (line 503) | function isArrayLike( obj ) {
function Sizzle (line 755) | function Sizzle( selector, context, results, seed ) {
function createCache (line 903) | function createCache() {
function markFunction (line 923) | function markFunction( fn ) {
function assert (line 932) | function assert( fn ) {
function addHandle (line 956) | function addHandle( attrs, handler ) {
function siblingCheck (line 971) | function siblingCheck( a, b ) {
function createInputPseudo (line 997) | function createInputPseudo( type ) {
function createButtonPseudo (line 1008) | function createButtonPseudo( type ) {
function createDisabledPseudo (line 1019) | function createDisabledPseudo( disabled ) {
function createPositionalPseudo (line 1075) | function createPositionalPseudo( fn ) {
function testContext (line 1098) | function testContext( context ) {
function setFilters (line 2309) | function setFilters() {}
function toSelector (line 2383) | function toSelector( tokens ) {
function addCombinator (line 2393) | function addCombinator( matcher, combinator, base ) {
function elementMatcher (line 2460) | function elementMatcher( matchers ) {
function multipleContexts (line 2474) | function multipleContexts( selector, contexts, results ) {
function condense (line 2483) | function condense( unmatched, map, filter, context, xml ) {
function setMatcher (line 2504) | function setMatcher( preFilter, selector, matcher, postFilter, postFinde...
function matcherFromTokens (line 2604) | function matcherFromTokens( tokens ) {
function matcherFromGroupMatchers (line 2667) | function matcherFromGroupMatchers( elementMatchers, setMatchers ) {
function nodeName (line 3025) | function nodeName( elem, name ) {
function winnow (line 3035) | function winnow( elements, qualifier, not ) {
function sibling (line 3330) | function sibling( cur, dir ) {
function createOptions (line 3423) | function createOptions( options ) {
function Identity (line 3648) | function Identity( v ) {
function Thrower (line 3651) | function Thrower( ex ) {
function adoptValue (line 3655) | function adoptValue( value, resolve, reject, noValue ) {
function resolve (line 3748) | function resolve( depth, deferred, handler, special ) {
function completed (line 4113) | function completed() {
function fcamelCase (line 4208) | function fcamelCase( _all, letter ) {
function camelCase (line 4215) | function camelCase( string ) {
function Data (line 4232) | function Data() {
function getData (line 4401) | function getData( data ) {
function dataAttr (line 4426) | function dataAttr( elem, key, data ) {
function adjustCSS (line 4738) | function adjustCSS( elem, prop, valueParts, tween ) {
function getDefaultDisplay (line 4806) | function getDefaultDisplay( elem ) {
function showHide (line 4829) | function showHide( elements, show ) {
function getAll (line 4961) | function getAll( context, tag ) {
function setGlobalEval (line 4986) | function setGlobalEval( elems, refElements ) {
function buildFragment (line 5002) | function buildFragment( elems, context, scripts, selection, ignored ) {
function returnTrue (line 5097) | function returnTrue() {
function returnFalse (line 5101) | function returnFalse() {
function expectSync (line 5111) | function expectSync( elem, type ) {
function safeActiveElement (line 5118) | function safeActiveElement() {
function on (line 5124) | function on( elem, types, selector, data, fn, one ) {
function leverageNative (line 5612) | function leverageNative( el, type, expectSync ) {
function manipulationTarget (line 5976) | function manipulationTarget( elem, content ) {
function disableScript (line 5987) | function disableScript( elem ) {
function restoreScript (line 5991) | function restoreScript( elem ) {
function cloneCopyEvent (line 6001) | function cloneCopyEvent( src, dest ) {
function fixInput (line 6034) | function fixInput( src, dest ) {
function domManip (line 6047) | function domManip( collection, args, callback, ignored ) {
function remove (line 6139) | function remove( elem, selector, keepData ) {
function computeStyleTests (line 6453) | function computeStyleTests() {
function roundPixelMeasures (line 6497) | function roundPixelMeasures( measure ) {
function curCSS (line 6571) | function curCSS( elem, name, computed ) {
function addGetHookIf (line 6624) | function addGetHookIf( conditionFn, hookFn ) {
function vendorPropName (line 6649) | function vendorPropName( name ) {
function finalPropName (line 6664) | function finalPropName( name ) {
function setPositiveNumber (line 6690) | function setPositiveNumber( _elem, value, subtract ) {
function boxModelAdjustment (line 6702) | function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, ...
function getWidthOrHeight (line 6770) | function getWidthOrHeight( elem, dimension, extra ) {
function Tween (line 7146) | function Tween( elem, options, prop, end, easing ) {
function schedule (line 7269) | function schedule() {
function createFxNow (line 7282) | function createFxNow() {
function genFx (line 7290) | function genFx( type, includeWidth ) {
function createTween (line 7310) | function createTween( value, prop, animation ) {
function defaultPrefilter (line 7324) | function defaultPrefilter( elem, props, opts ) {
function propFilter (line 7496) | function propFilter( props, specialEasing ) {
function Animation (line 7533) | function Animation( elem, properties, options ) {
function stripAndCollapse (line 8248) | function stripAndCollapse( value ) {
function getClass (line 8254) | function getClass( elem ) {
function classesToArray (line 8258) | function classesToArray( value ) {
function buildParams (line 8885) | function buildParams( prefix, obj, traditional, add ) {
function addToPrefiltersOrTransports (line 9039) | function addToPrefiltersOrTransports( structure ) {
function inspectPrefiltersOrTransports (line 9073) | function inspectPrefiltersOrTransports( structure, options, originalOpti...
function ajaxExtend (line 9102) | function ajaxExtend( target, src ) {
function ajaxHandleResponses (line 9122) | function ajaxHandleResponses( s, jqXHR, responses ) {
function ajaxConvert (line 9180) | function ajaxConvert( s, response, jqXHR, isSuccess ) {
function done (line 9696) | function done( status, nativeStatusText, responses, headers ) {
FILE: docs/_static/jquery.js
(minified single-line build of jQuery — 71 obfuscated one- and two-letter function definitions omitted; the readable equivalents are indexed above under jquery-3.4.1.js and jquery-3.5.1.js)
FILE: docs/_static/js/badge_only.js
function r (line 1) | function r(n){if(t[n])return t[n].exports;var o=t[n]={i:n,l:!1,exports:{...
FILE: docs/_static/js/theme.js
function t (line 1) | function t(i){if(e[i])return e[i].exports;var o=e[i]={i:i,l:!1,exports:{...
FILE: docs/_static/language_data.js
function splitQuery (line 278) | function splitQuery(query) {
FILE: docs/_static/searchtools.js
function splitQuery (line 47) | function splitQuery(query) {
function pulse (line 117) | function pulse() {
function displayNextItem (line 247) | function displayNextItem() {
FILE: docs/_static/underscore-1.12.0.js
function restArguments (line 64) | function restArguments(func, startIndex) {
function isObject (line 88) | function isObject(obj) {
function isNull (line 94) | function isNull(obj) {
function isUndefined (line 99) | function isUndefined(obj) {
function isBoolean (line 104) | function isBoolean(obj) {
function isElement (line 109) | function isElement(obj) {
function tagTester (line 114) | function tagTester(name) {
function ie10IsDataView (line 162) | function ie10IsDataView(obj) {
function has (line 173) | function has(obj, key) {
function isFinite$1 (line 192) | function isFinite$1(obj) {
function isNaN$1 (line 197) | function isNaN$1(obj) {
function constant (line 202) | function constant(value) {
function createSizePropertyCheck (line 209) | function createSizePropertyCheck(getSizeProperty) {
function shallowProperty (line 217) | function shallowProperty(key) {
function isTypedArray (line 232) | function isTypedArray(obj) {
function emulatedSet (line 248) | function emulatedSet(keys) {
function collectNonEnumProps (line 263) | function collectNonEnumProps(obj, keys) {
function keys (line 283) | function keys(obj) {
function isEmpty (line 295) | function isEmpty(obj) {
function isMatch (line 307) | function isMatch(object, attrs) {
function _ (line 321) | function _(obj) {
function toBufferView (line 344) | function toBufferView(bufferSource) {
function eq (line 356) | function eq(a, b, aStack, bStack) {
function deepEq (line 371) | function deepEq(a, b, aStack, bStack) {
function isEqual (line 476) | function isEqual(a, b) {
function allKeys (line 481) | function allKeys(obj) {
function ie11fingerprint (line 494) | function ie11fingerprint(methods) {
function values (line 533) | function values(obj) {
function pairs (line 545) | function pairs(obj) {
function invert (line 556) | function invert(obj) {
function functions (line 566) | function functions(obj) {
function createAssigner (line 575) | function createAssigner(keysFunc, defaults) {
function ctor (line 605) | function ctor() {
function baseCreate (line 610) | function baseCreate(prototype) {
function create (line 623) | function create(prototype, props) {
function clone (line 630) | function clone(obj) {
function tap (line 638) | function tap(obj, interceptor) {
function toPath (line 645) | function toPath(path) {
function toPath$1 (line 652) | function toPath$1(path) {
function deepGet (line 657) | function deepGet(obj, path) {
function get (line 670) | function get(object, path, defaultValue) {
function has$1 (line 678) | function has$1(obj, path) {
function identity (line 690) | function identity(value) {
function matcher (line 696) | function matcher(attrs) {
function property (line 705) | function property(path) {
function optimizeCb (line 715) | function optimizeCb(func, context, argCount) {
function baseIteratee (line 737) | function baseIteratee(value, context, argCount) {
function iteratee (line 747) | function iteratee(value, context) {
function cb (line 754) | function cb(value, context, argCount) {
function mapObject (line 761) | function mapObject(obj, iteratee, context) {
function noop (line 774) | function noop(){}
function propertyOf (line 777) | function propertyOf(obj) {
function times (line 785) | function times(n, iteratee, context) {
function random (line 793) | function random(min, max) {
function createEscaper (line 808) | function createEscaper(map) {
function escapeChar (line 867) | function escapeChar(match) {
function template (line 875) | function template(text, settings, oldSettings) {
function result (line 935) | function result(obj, path, fallback) {
function uniqueId (line 955) | function uniqueId(prefix) {
function chain (line 961) | function chain(obj) {
function executeBound (line 970) | function executeBound(sourceFunc, boundFunc, context, callingContext, ar...
function flatten (line 1015) | function flatten(input, depth, strict, output) {
function memoize (line 1056) | function memoize(func, hasher) {
function throttle (line 1084) | function throttle(func, wait, options) {
function debounce (line 1129) | function debounce(func, wait, immediate) {
function wrap (line 1166) | function wrap(func, wrapper) {
function negate (line 1171) | function negate(predicate) {
function compose (line 1179) | function compose() {
function after (line 1191) | function after(times, func) {
function before (line 1201) | function before(times, func) {
function findKey (line 1217) | function findKey(obj, predicate, context) {
function createPredicateIndexFinder (line 1227) | function createPredicateIndexFinder(dir) {
function sortedIndex (line 1247) | function sortedIndex(array, obj, iteratee, context) {
function createIndexFinder (line 1259) | function createIndexFinder(dir, predicateFind, sortedIndex) {
function find (line 1294) | function find(obj, predicate, context) {
function findWhere (line 1302) | function findWhere(obj, attrs) {
function each (line 1310) | function each(obj, iteratee, context) {
function map (line 1327) | function map(obj, iteratee, context) {
function createReduce (line 1340) | function createReduce(dir) {
function filter (line 1372) | function filter(obj, predicate, context) {
function reject (line 1382) | function reject(obj, predicate, context) {
function every (line 1387) | function every(obj, predicate, context) {
function some (line 1399) | function some(obj, predicate, context) {
function contains (line 1411) | function contains(obj, item, fromIndex, guard) {
function pluck (line 1441) | function pluck(obj, key) {
function where (line 1447) | function where(obj, attrs) {
function max (line 1452) | function max(obj, iteratee, context) {
function min (line 1477) | function min(obj, iteratee, context) {
function sample (line 1505) | function sample(obj, n, guard) {
function shuffle (line 1524) | function shuffle(obj) {
function sortBy (line 1529) | function sortBy(obj, iteratee, context) {
function group (line 1550) | function group(behavior, partition) {
function toArray (line 1589) | function toArray(obj) {
function size (line 1601) | function size(obj) {
function keyInObj (line 1608) | function keyInObj(value, key, obj) {
function initial (line 1650) | function initial(array, n, guard) {
function first (line 1656) | function first(array, n, guard) {
function rest (line 1665) | function rest(array, n, guard) {
function last (line 1671) | function last(array, n, guard) {
function compact (line 1678) | function compact(array) {
function flatten$1 (line 1684) | function flatten$1(array, depth) {
function uniq (line 1707) | function uniq(array, isSorted, iteratee, context) {
function intersection (line 1742) | function intersection(array) {
function unzip (line 1759) | function unzip(array) {
function object (line 1776) | function object(list, values) {
function range (line 1791) | function range(start, stop, step) {
function chunk (line 1812) | function chunk(array, count) {
function chainResult (line 1823) | function chainResult(instance, obj) {
function mixin (line 1828) | function mixin(obj) {
FILE: docs/_static/underscore-1.3.1.js
function eq (line 669) | function eq(a, b, stack) {
FILE: docs/_static/underscore.js
(minified single-line build of Underscore — 67 obfuscated function definitions omitted; the readable equivalent is indexed above under underscore-1.12.0.js)
FILE: k_means_constrained/k_means_constrained_.py
function k_means_constrained (line 32) | def k_means_constrained(X, n_clusters, size_min=None, size_max=None, ini...
function kmeans_constrained_single (line 220) | def kmeans_constrained_single(X, n_clusters, size_min=None, size_max=None,
function _labels_constrained (line 367) | def _labels_constrained(X, centers, size_min, size_max, distances):
function minimum_cost_flow_problem_graph (line 417) | def minimum_cost_flow_problem_graph(X, C, D, size_min, size_max):
function solve_min_cost_flow_graph (line 471) | def solve_min_cost_flow_graph(edges, costs, capacities, supplies, n_C, n...
class KMeansConstrained (line 499) | class KMeansConstrained(KMeans):
method __init__ (line 616) | def __init__(self, n_clusters=8, size_min=None, size_max=None, init='k...
method fit (line 625) | def fit(self, X, y=None):
method predict (line 653) | def predict(self, X, size_min='init', size_max='init'):
method fit_predict (line 723) | def fit_predict(self, X, y=None):
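This file is the heart of the package: `_labels_constrained` delegates the size-bounded assignment step to a minimum-cost flow problem built by `minimum_cost_flow_problem_graph` and solved by `solve_min_cost_flow_graph`. A minimal stdlib-only sketch of one plausible graph construction follows — the node layout, supply convention, and artificial-sink trick are illustrative assumptions, not the package's exact code:

```python
# Hypothetical sketch of posing size-constrained k-means assignment as
# minimum-cost flow:
#   - one node per data point (supply +1)
#   - one node per cluster, which must absorb at least size_min units
#   - an artificial sink absorbing the remaining n - k*size_min units
# Edge costs are point-to-centre distances; capacities enforce the bounds.

def build_mcf_graph(distances, size_min, size_max):
    """distances: n x k list of lists of point-to-centre costs."""
    n, k = len(distances), len(distances[0])
    edges, costs, capacities = [], [], []
    # point i -> cluster j: one unit of flow means "assign point i to cluster j"
    for i in range(n):
        for j in range(k):
            edges.append((i, n + j))
            costs.append(distances[i][j])
            capacities.append(1)
    sink = n + k
    # cluster j -> sink: only the slack above size_min may pass through,
    # so each cluster ends up with between size_min and size_max points
    for j in range(k):
        edges.append((n + j, sink))
        costs.append(0)
        capacities.append(size_max - size_min)
    # supplies: +1 per point, -size_min per cluster, remainder at the sink
    supplies = [1] * n + [-size_min] * k + [-(n - k * size_min)]
    return edges, costs, capacities, supplies
```

Feeding the resulting arrays to any min-cost-flow solver yields an optimal assignment: an arc (i, n + j) carrying flow means point i belongs to cluster j.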
FILE: k_means_constrained/sklearn_import/__init__.py
function get_config (line 4) | def get_config():
FILE: k_means_constrained/sklearn_import/base.py
class BaseEstimator (line 11) | class BaseEstimator(object):
method _get_param_names (line 22) | def _get_param_names(cls):
method get_params (line 48) | def get_params(self, deep=True):
method set_params (line 85) | def set_params(self, **params):
method __repr__ (line 121) | def __repr__(self):
method __getstate__ (line 126) | def __getstate__(self):
method __setstate__ (line 137) | def __setstate__(self, state):
class ClusterMixin (line 153) | class ClusterMixin(object):
method fit_predict (line 157) | def fit_predict(self, X, y=None):
class TransformerMixin (line 176) | class TransformerMixin(object):
method fit_transform (line 179) | def fit_transform(self, X, y=None, **fit_params):
function _pprint (line 209) | def _pprint(params, offset=0, printer=repr):
FILE: k_means_constrained/sklearn_import/cluster/k_means_.py
function _k_init (line 33) | def _k_init(X, n_clusters, x_squared_norms, random_state, n_local_trials...
function _validate_center_shape (line 139) | def _validate_center_shape(X, n_centers, centers):
function _tolerance (line 152) | def _tolerance(X, tol):
function _labels_inertia_precompute_dense (line 161) | def _labels_inertia_precompute_dense(X, x_squared_norms, centers, distan...
function _labels_inertia (line 205) | def _labels_inertia(X, x_squared_norms, centers,
function _init_centroids (line 258) | def _init_centroids(X, k, init, random_state=None, x_squared_norms=None,
class KMeans (line 339) | class KMeans(BaseEstimator, ClusterMixin, TransformerMixin):
method __init__ (line 471) | def __init__(self, n_clusters=8, init='k-means++', n_init=10,
method _check_fit_data (line 488) | def _check_fit_data(self, X):
method _check_test_data (line 496) | def _check_test_data(self, X):
method fit (line 507) | def fit(self, X, y=None):
method fit_predict (line 518) | def fit_predict(self, X, y=None):
method fit_transform (line 536) | def fit_transform(self, X, y=None):
method transform (line 558) | def transform(self, X):
method _transform (line 580) | def _transform(self, X):
method predict (line 584) | def predict(self, X):
method score (line 607) | def score(self, X, y=None):
FILE: k_means_constrained/sklearn_import/exceptions.py
class DataConversionWarning (line 1) | class DataConversionWarning(UserWarning):
class NotFittedError (line 19) | class NotFittedError(ValueError, AttributeError):
FILE: k_means_constrained/sklearn_import/externals/funcsigs.py
function formatannotation (line 28) | def formatannotation(annotation, base_module=None):
function _get_user_defined_method (line 36) | def _get_user_defined_method(cls, method_name, *nested):
function signature (line 52) | def signature(obj):
class _void (line 176) | class _void(object):
class _empty (line 180) | class _empty(object):
class _ParameterKind (line 184) | class _ParameterKind(int):
method __new__ (line 185) | def __new__(self, *args, **kwargs):
method __str__ (line 190) | def __str__(self):
method __repr__ (line 193) | def __repr__(self):
class Parameter (line 204) | class Parameter(object):
method __init__ (line 234) | def __init__(self, name, kind, default=_empty, annotation=_empty,
method name (line 264) | def name(self):
method default (line 268) | def default(self):
method annotation (line 272) | def annotation(self):
method kind (line 276) | def kind(self):
method replace (line 279) | def replace(self, name=_void, kind=_void, annotation=_void,
method __str__ (line 301) | def __str__(self):
method __repr__ (line 325) | def __repr__(self):
method __hash__ (line 329) | def __hash__(self):
method __eq__ (line 333) | def __eq__(self, other):
method __ne__ (line 340) | def __ne__(self, other):
class BoundArguments (line 344) | class BoundArguments(object):
method __init__ (line 361) | def __init__(self, signature, arguments):
method signature (line 366) | def signature(self):
method args (line 370) | def args(self):
method kwargs (line 398) | def kwargs(self):
method __hash__ (line 428) | def __hash__(self):
method __eq__ (line 432) | def __eq__(self, other):
method __ne__ (line 437) | def __ne__(self, other):
class Signature (line 441) | class Signature(object):
method __init__ (line 471) | def __init__(self, parameters=None, return_annotation=_empty,
method from_function (line 510) | def from_function(cls, func):
method parameters (line 583) | def parameters(self):
method return_annotation (line 590) | def return_annotation(self):
method replace (line 593) | def replace(self, parameters=_void, return_annotation=_void):
method __hash__ (line 608) | def __hash__(self):
method __eq__ (line 612) | def __eq__(self, other):
method __ne__ (line 642) | def __ne__(self, other):
method _bind (line 645) | def _bind(self, args, kwargs, partial=False):
method bind (line 773) | def bind(self, *args, **kwargs):
method bind_partial (line 780) | def bind_partial(self, *args, **kwargs):
method __str__ (line 787) | def __str__(self):
FILE: k_means_constrained/sklearn_import/fixes.py
function _parse_version (line 1) | def _parse_version(version_string):
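`_parse_version` in `fixes.py` converts a version string into a tuple that compares numerically. A hedged re-sketch of that idea (the real helper may differ in how it treats non-numeric components):

```python
def parse_version(version_string):
    """Split '1.14.1' into (1, 14, 1) so versions compare numerically;
    non-numeric parts (e.g. 'dev0') are kept as strings. Illustrative
    sketch, not the library's exact code."""
    parts = []
    for x in version_string.split('.'):
        try:
            parts.append(int(x))
        except ValueError:
            parts.append(x)
    return tuple(parts)
```

Tuple comparison then gives the right ordering, e.g. `parse_version('1.9.0') < parse_version('1.14.1')`, which plain string comparison would get wrong.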
FILE: k_means_constrained/sklearn_import/funcsigs.py
function signature (line 9) | def signature(obj):
FILE: k_means_constrained/sklearn_import/metrics/pairwise.py
function euclidean_distances (line 20) | def euclidean_distances(X, Y=None, Y_norm_squared=None, squared=False,
function pairwise_distances_argmin_min (line 117) | def pairwise_distances_argmin_min(X, Y, axis=1, metric="euclidean",
function check_pairwise_arrays (line 246) | def check_pairwise_arrays(X, Y, precomputed=False, dtype=None):
function manhattan_distances (line 316) | def manhattan_distances(X, Y=None, sum_over_features=True,
function cosine_distances (line 397) | def cosine_distances(X, Y=None):
function pairwise_distances (line 447) | def pairwise_distances(X, Y=None, metric="euclidean", n_jobs=1, **kwds):
function _return_float_dtype (line 556) | def _return_float_dtype(X, Y):
function _parallel_pairwise (line 580) | def _parallel_pairwise(X, Y, func, n_jobs, **kwds):
function _pairwise_callable (line 602) | def _pairwise_callable(X, Y, metric, **kwds):
function cosine_similarity (line 647) | def cosine_similarity(X, Y=None, dense_output=True):
FILE: k_means_constrained/sklearn_import/preprocessing/data.py
function normalize (line 12) | def normalize(X, norm='l2', axis=1, copy=True, return_norm=False):
function _handle_zeros_in_scale (line 110) | def _handle_zeros_in_scale(scale, copy=True):
FILE: k_means_constrained/sklearn_import/utils/__init__.py
function gen_batches (line 1) | def gen_batches(n, batch_size):
function gen_even_slices (line 26) | def gen_even_slices(n, n_packs, n_samples=None):
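`gen_batches` yields consecutive slices covering `0..n` in fixed-size chunks. A minimal sketch of the behaviour its docstring describes (assumed semantics: the final batch may be shorter when `batch_size` does not divide `n`):

```python
def gen_batches(n, batch_size):
    """Yield slice objects covering range(n) in chunks of batch_size;
    the final slice is shorter when batch_size does not divide n."""
    start = 0
    for _ in range(n // batch_size):
        end = start + batch_size
        yield slice(start, end)
        start = end
    if start < n:
        yield slice(start, n)
```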
FILE: k_means_constrained/sklearn_import/utils/extmath.py
function row_norms (line 10) | def row_norms(X, squared=False):
function squared_norm (line 30) | def squared_norm(x):
function cartesian (line 44) | def cartesian(arrays, out=None):
function stable_cumsum (line 93) | def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):
function safe_sparse_dot (line 122) | def safe_sparse_dot(a, b, dense_output=False):
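`cartesian` in `extmath.py` materialises the cartesian product of several 1-D arrays as rows. The same effect can be sketched with the standard library, using a list of tuples as a stand-in for the NumPy array the real helper returns:

```python
from itertools import product

def cartesian_rows(arrays):
    """Return every combination of one element per input sequence,
    one combination per row: a pure-Python stand-in for the NumPy
    `cartesian` helper listed above."""
    return [tuple(row) for row in product(*arrays)]
```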
FILE: k_means_constrained/sklearn_import/utils/fixes.py
function sparse_min_max (line 7) | def sparse_min_max(X, axis):
FILE: k_means_constrained/sklearn_import/utils/sparsefuncs.py
function mean_variance_axis (line 9) | def mean_variance_axis(X, axis):
function min_max_axis (line 46) | def min_max_axis(X, axis):
function _raise_typeerror (line 72) | def _raise_typeerror(X):
function _raise_error_wrong_axis (line 79) | def _raise_error_wrong_axis(axis):
FILE: k_means_constrained/sklearn_import/utils/validation.py
function check_array (line 14) | def check_array(array, accept_sparse=False, dtype="numeric", order=None,
function check_random_state (line 178) | def check_random_state(seed):
function as_float_array (line 199) | def as_float_array(X, copy=True, force_all_finite=True):
function _assert_all_finite (line 239) | def _assert_all_finite(X):
function _num_samples (line 253) | def _num_samples(x):
function _shape_repr (line 274) | def _shape_repr(shape):
function _ensure_sparse_format (line 308) | def _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy,
function check_is_fitted (line 388) | def check_is_fitted(estimator, attributes, msg=None, all_or_any=all):
FILE: setup.py
function get_include (line 15) | def get_include():
function no_cythonize (line 28) | def no_cythonize(extensions, **_ignore):
FILE: tests/test_k_means_constrained_.py
function sort_coordinates (line 13) | def sort_coordinates(array):
function test_minimum_cost_flow_problem_graph (line 18) | def test_minimum_cost_flow_problem_graph():
function test_solve_min_cost_flow_graph (line 47) | def test_solve_min_cost_flow_graph():
function test__labels_constrained (line 76) | def test__labels_constrained():
function test_KMeansConstrained (line 109) | def test_KMeansConstrained():
function test_KMeansConstrained_predict_method (line 139) | def test_KMeansConstrained_predict_method():
function test_spare_not_implemented (line 169) | def test_spare_not_implemented():
FILE: tests/test_kmeans_constrained_from_sklearn.py
function test_labels_assignment_and_inertia (line 25) | def test_labels_assignment_and_inertia():
function _check_fitted_model (line 48) | def _check_fitted_model(km):
function test_k_means_plus_plus_init (line 65) | def test_k_means_plus_plus_init():
function test_k_means_new_centers (line 71) | def test_k_means_new_centers():
function test_k_means_plus_plus_init_2_jobs (line 96) | def test_k_means_plus_plus_init_2_jobs():
function test_k_means_random_init (line 106) | def test_k_means_random_init():
function test_k_means_perfect_init (line 112) | def test_k_means_perfect_init():
function test_k_means_n_init (line 119) | def test_k_means_n_init():
function test_k_means_explicit_init_shape (line 129) | def test_k_means_explicit_init_shape():
function test_k_means_fortran_aligned_data (line 155) | def test_k_means_fortran_aligned_data():
function test_k_means_invalid_init (line 167) | def test_k_means_invalid_init():
function test_k_means_copyx (line 172) | def test_k_means_copyx():
function test_k_means_non_collapsed (line 183) | def test_k_means_non_collapsed():
function test_predict (line 203) | def test_predict():
function test_score (line 221) | def test_score():
function test_transform (line 230) | def test_transform():
function test_fit_transform (line 242) | def test_fit_transform():
function test_n_init (line 248) | def test_n_init():
function test_k_means_function (line 266) | def test_k_means_function():
function test_max_iter_error (line 295) | def test_max_iter_error():
function test_float_precision (line 302) | def test_float_precision():
function test_k_means_init_centers (line 338) | def test_k_means_init_centers():
function test_sparse_k_means_init_centers (line 352) | def test_sparse_k_means_init_centers():
function test_sparse_validate_centers (line 370) | def test_sparse_validate_centers():
Condensed preview — 91 files, each showing its path, character count, and a content snippet. Download the .json file or copy it to the clipboard for the full structured content (2,030K chars).
[
{
"path": ".bumpversion.cfg",
"chars": 145,
"preview": "[bumpversion]\ncurrent_version = 0.9.0\ncommit = True\ntag = True\n\n[bumpversion:file:setup.cfg]\n\n[bumpversion:file:k_means_"
},
{
"path": ".github/ISSUE_TEMPLATE/bug_report.md",
"chars": 479,
"preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: \"[BUG]\"\nlabels: ''\nassignees: ''\n\n---\n\n**Describe "
},
{
"path": ".github/workflows/build_wheels.yml",
"chars": 1886,
"preview": "name: Build & Test\n\non:\n schedule:\n - cron: '0 1 * * 4' # Runs every Thursday at 1 AM (UTC)\n pull_request:\n "
},
{
"path": ".gitignore",
"chars": 2726,
"preview": "# Created by .ignore support plugin (hsz.mobi)\n### Python template\n# Byte-compiled / optimized / DLL files\n__pycache__/\n"
},
{
"path": "CITATION.cff",
"chars": 299,
"preview": "cff-version: 1.2.0\nmessage: \"If you use this software, please cite it as below.\"\nauthors:\n- family-names: \"Levy-Kramer\"\n"
},
{
"path": "CLAUDE.md",
"chars": 8097,
"preview": "# CLAUDE.md\n\nThis file provides guidance for AI assistants working with the k-means-constrained codebase.\n\n## Project Ov"
},
{
"path": "LICENSE",
"chars": 1532,
"preview": "BSD 3-Clause License\n\nCopyright (c) 2022, Josh Levy-Kramer & Outra Limited\nAll rights reserved.\n\nRedistribution and use "
},
{
"path": "MANIFEST.in",
"chars": 125,
"preview": "include README*.md\ninclude requirements*.txt\ninclude LICENSE\ninclude pyproject.toml\nglobal-include *.pyx\nglobal-include "
},
{
"path": "Makefile",
"chars": 1424,
"preview": ".PHONY: build dist redist install dist-no-cython install-from-source clean venv-create venv-activate check-dist test-pyp"
},
{
"path": "README.md",
"chars": 6208,
"preview": "[](https://pypi.org/project/k-means-constrained/)\n"
},
{
"path": "docs/_static/alabaster.css",
"preview": ";\n\n/* -- page layout ----------------------------------------------------------- */\n\nbody {\n "
},
{
"path": "docs/_static/basic.css",
"chars": 14652,
"preview": "/*\n * basic.css\n * ~~~~~~~~~\n *\n * Sphinx stylesheet -- basic theme.\n *\n * :copyright: Copyright 2007-2021 by the Sphinx"
},
{
"path": "docs/_static/css/badge_only.css",
"chars": 3275,
"preview": ".fa:before{-webkit-font-smoothing:antialiased}.clearfix{*zoom:1}.clearfix:after,.clearfix:before{display:table;content:\""
},
{
"path": "docs/_static/css/theme.css",
"chars": 122337,
"preview": "html{box-sizing:border-box}*,:after,:before{box-sizing:inherit}article,aside,details,figcaption,figure,footer,header,hgr"
},
{
"path": "docs/_static/custom.css",
"chars": 42,
"preview": "/* This file intentionally left blank. */\n"
},
{
"path": "docs/_static/doctools.js",
"chars": 9592,
"preview": "/*\n * doctools.js\n * ~~~~~~~~~~~\n *\n * Sphinx JavaScript utilities for all documentation.\n *\n * :copyright: Copyright 20"
},
{
"path": "docs/_static/documentation_options.js",
"chars": 355,
"preview": "var DOCUMENTATION_OPTIONS = {\n URL_ROOT: document.getElementById(\"documentation_options\").getAttribute('data-url_root"
},
{
"path": "docs/_static/jquery-3.4.1.js",
"chars": 280364,
"preview": "/*!\n * jQuery JavaScript Library v3.4.1\n * https://jquery.com/\n *\n * Includes Sizzle.js\n * https://sizzlejs.com/\n *\n * C"
},
{
"path": "docs/_static/jquery-3.5.1.js",
"chars": 287630,
"preview": "/*!\n * jQuery JavaScript Library v3.5.1\n * https://jquery.com/\n *\n * Includes Sizzle.js\n * https://sizzlejs.com/\n *\n * C"
},
{
"path": "docs/_static/jquery.js",
"chars": 89476,
"preview": "/*! jQuery v3.5.1 | (c) JS Foundation and other contributors | jquery.org/license */\n!function(e,t){\"use strict\";\"object"
},
{
"path": "docs/_static/js/badge_only.js",
"chars": 934,
"preview": "!function(e){var t={};function r(n){if(t[n])return t[n].exports;var o=t[n]={i:n,l:!1,exports:{}};return e[n].call(o.expo"
},
{
"path": "docs/_static/js/theme.js",
"chars": 4916,
"preview": "!function(n){var e={};function t(i){if(e[i])return e[i].exports;var o=e[i]={i:i,l:!1,exports:{}};return n[i].call(o.expo"
},
{
"path": "docs/_static/language_data.js",
"chars": 10854,
"preview": "/*\n * language_data.js\n * ~~~~~~~~~~~~~~~~\n *\n * This script contains the language-specific data used by searchtools.js,"
},
{
"path": "docs/_static/pygments.css",
"chars": 4819,
"preview": "pre { line-height: 125%; }\ntd.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; paddin"
},
{
"path": "docs/_static/searchtools.js",
"chars": 16578,
"preview": "/*\n * searchtools.js\n * ~~~~~~~~~~~~~~~~\n *\n * Sphinx JavaScript utilities for the full-text search.\n *\n * :copyright: C"
},
{
"path": "docs/_static/underscore-1.12.0.js",
"chars": 67680,
"preview": "(function (global, factory) {\n typeof exports === 'object' && typeof module !== 'undefined' ? module.exports = factory("
},
{
"path": "docs/_static/underscore-1.3.1.js",
"chars": 35168,
"preview": "// Underscore.js 1.3.1\n// (c) 2009-2012 Jeremy Ashkenas, DocumentCloud Inc.\n// Underscore is freely distribu"
},
{
"path": "docs/_static/underscore.js",
"chars": 19358,
"preview": "!function(n,r){\"object\"==typeof exports&&\"undefined\"!=typeof module?module.exports=r():\"function\"==typeof define&&define"
},
{
"path": "docs/genindex.html",
"chars": 7173,
"preview": "\n\n<!DOCTYPE html>\n<html class=\"writer-html5\" lang=\"en\" >\n<head>\n <meta charset=\"utf-8\" />\n \n <meta name=\"viewport\" co"
},
{
"path": "docs/index.html",
"chars": 29947,
"preview": "\n\n<!DOCTYPE html>\n<html class=\"writer-html5\" lang=\"en\" >\n<head>\n <meta charset=\"utf-8\" />\n \n <meta name=\"viewport\" co"
},
{
"path": "docs/modules.html",
"chars": 4127,
"preview": "\n\n<!DOCTYPE html>\n<html class=\"writer-html5\" lang=\"en\" >\n<head>\n <meta charset=\"utf-8\">\n \n <meta name=\"viewport\" cont"
},
{
"path": "docs/objects.inv",
"chars": 269,
"preview": "# Sphinx inventory version 2\n# Project: k-means-constrained\n# Version: \n# The remainder of this file is compressed using"
},
{
"path": "docs/py-modindex.html",
"chars": 4524,
"preview": "\n\n<!DOCTYPE html>\n<html class=\"writer-html5\" lang=\"en\" >\n<head>\n <meta charset=\"utf-8\" />\n \n <meta name=\"viewport\" co"
},
{
"path": "docs/search.html",
"chars": 4451,
"preview": "\n\n<!DOCTYPE html>\n<html class=\"writer-html5\" lang=\"en\" >\n<head>\n <meta charset=\"utf-8\" />\n \n <meta name=\"viewport\" co"
},
{
"path": "docs/searchindex.js",
"chars": 3249,
"preview": "Search.setIndex({docnames:[\"index\"],envversion:{\"sphinx.domains.c\":2,\"sphinx.domains.changeset\":1,\"sphinx.domains.citati"
},
{
"path": "docs_source/Makefile",
"chars": 634,
"preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
},
{
"path": "docs_source/README.md",
"chars": 75,
"preview": "Build docs:\n```\nmake html\nmv _build/html ../docs\ntocuh ../docs/.nojekyll\n``"
},
{
"path": "docs_source/conf.py",
"chars": 2346,
"preview": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common op"
},
{
"path": "docs_source/index.rst",
"chars": 638,
"preview": ".. k-means-constrained documentation master file, created by\n sphinx-quickstart on Fri Mar 6 13:31:12 2020.\n You ca"
},
{
"path": "docs_source/make.bat",
"chars": 795,
"preview": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sp"
},
{
"path": "etc/benchmark.ipynb",
"chars": 289590,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [],\n \"source\": [\n "
},
{
"path": "etc/benchmark_k_means.py",
"chars": 2227,
"preview": "#!/usr/bin/env python3\n\nfrom argparse import ArgumentParser\nfrom sklearn.cluster import KMeans\nimport numpy as np\nimport"
},
{
"path": "etc/benchmark_k_means_constrained.py",
"chars": 2436,
"preview": "#!/usr/bin/env python3\n\nfrom argparse import ArgumentParser\nimport k_means_constrained\nimport numpy as np\nimport time\nim"
},
{
"path": "etc/cython_benchmark.ipynb",
"chars": 9591,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 20,\n \"metadata\": {},\n \"outputs\": [],\n \"source\": [\n"
},
{
"path": "k_means_constrained/__init__.py",
"chars": 109,
"preview": "\n__all__ = ['KMeansConstrained']\n__version__ = '0.9.0'\n\nfrom .k_means_constrained_ import KMeansConstrained\n\n"
},
{
"path": "k_means_constrained/k_means_constrained_.py",
"chars": 28568,
"preview": "\"\"\"k-means-constrained\"\"\"\n\n# Authors: Josh Levy-Kramer <josh@levykramer.co.uk>\n# Gael Varoquaux <gael.varoquaux"
},
{
"path": "k_means_constrained/sklearn_import/README",
"chars": 1701,
"preview": "Code taken from and slightly modified:\nhttps://github.com/scikit-learn/scikit-learn/tree/0.19.X/sklearn/cluster\n\nSubject"
},
{
"path": "k_means_constrained/sklearn_import/__init__.py",
"chars": 370,
"preview": "import os\n\n\ndef get_config():\n \"\"\"Retrieve current values for configuration set by :func:`set_config`\n\n Returns\n "
},
{
"path": "k_means_constrained/sklearn_import/base.py",
"chars": 9283,
"preview": "import warnings\nfrom collections import defaultdict\n\nimport numpy as np\nimport six\n\nfrom k_means_constrained.sklearn_imp"
},
{
"path": "k_means_constrained/sklearn_import/cluster/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "k_means_constrained/sklearn_import/cluster/_k_means.pyx",
"chars": 4851,
"preview": "# cython: profile=True\n# distutils: define_macros=NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION\n# Profiling is enabled by de"
},
{
"path": "k_means_constrained/sklearn_import/cluster/k_means_.py",
"chars": 22850,
"preview": "\"\"\"K-means clustering\"\"\"\n\n# Authors: Gael Varoquaux <gael.varoquaux@normalesup.org>\n# Thomas Rueckstiess <rueck"
},
{
"path": "k_means_constrained/sklearn_import/exceptions.py",
"chars": 1438,
"preview": "class DataConversionWarning(UserWarning):\n \"\"\"Warning used to notify implicit data conversions happening in the code."
},
{
"path": "k_means_constrained/sklearn_import/externals/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "k_means_constrained/sklearn_import/externals/funcsigs.py",
"chars": 29912,
"preview": "# Copyright 2001-2013 Python Software Foundation; All Rights Reserved\n\"\"\"Function signature objects for callables\n\nBack "
},
{
"path": "k_means_constrained/sklearn_import/fixes.py",
"chars": 270,
"preview": "def _parse_version(version_string):\n version = []\n for x in version_string.split('.'):\n try:\n ve"
},
{
"path": "k_means_constrained/sklearn_import/funcsigs.py",
"chars": 4986,
"preview": "import functools\nimport types\nfrom collections import OrderedDict\n\nfrom k_means_constrained.sklearn_import.externals.fun"
},
{
"path": "k_means_constrained/sklearn_import/metrics/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "k_means_constrained/sklearn_import/metrics/pairwise.py",
"chars": 25114,
"preview": "import itertools\nimport warnings\nfrom functools import partial\n\nimport numpy as np\nfrom scipy.sparse import issparse, cs"
},
{
"path": "k_means_constrained/sklearn_import/metrics/pairwise_fast.pyx",
"chars": 1699,
"preview": "#cython: boundscheck=False\n#cython: cdivision=True\n#cython: wraparound=False\n# distutils: define_macros=NPY_NO_DEPRECATE"
},
{
"path": "k_means_constrained/sklearn_import/preprocessing/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "k_means_constrained/sklearn_import/preprocessing/data.py",
"chars": 4289,
"preview": "import numpy as np\nfrom scipy import sparse\n\nfrom k_means_constrained.sklearn_import.utils.sparsefuncs_fast import inpla"
},
{
"path": "k_means_constrained/sklearn_import/utils/__init__.py",
"chars": 1979,
"preview": "def gen_batches(n, batch_size):\n \"\"\"Generator to create slices containing batch_size elements, from 0 to n.\n\n The "
},
{
"path": "k_means_constrained/sklearn_import/utils/extmath.py",
"chars": 4244,
"preview": "import warnings\n\nimport numpy as np\nfrom scipy.sparse import issparse, csr_matrix\nfrom k_means_constrained.sklearn_impor"
},
{
"path": "k_means_constrained/sklearn_import/utils/fixes.py",
"chars": 258,
"preview": "import numpy as np\nfrom k_means_constrained.sklearn_import.fixes import _parse_version\n\nnp_version = _parse_version(np._"
},
{
"path": "k_means_constrained/sklearn_import/utils/sparsefuncs.py",
"chars": 2185,
"preview": "from scipy import sparse as sp\nfrom k_means_constrained.sklearn_import.utils.fixes import sparse_min_max\n\nfrom .sparsefu"
},
{
"path": "k_means_constrained/sklearn_import/utils/sparsefuncs_fast.pyx",
"chars": 13058,
"preview": "# Authors: Mathieu Blondel\n# Olivier Grisel\n# Peter Prettenhofer\n# Lars Buitinck\n# G"
},
{
"path": "k_means_constrained/sklearn_import/utils/validation.py",
"chars": 16615,
"preview": "import numbers\nimport warnings\n\nimport numpy as np\nfrom scipy import sparse as sp\nfrom k_means_constrained.sklearn_impor"
},
{
"path": "pyproject.toml",
"chars": 85,
"preview": "[build-system]\nrequires = [\"setuptools\", \"wheel\", \"cython>=3.0.11\", \"numpy>=2.0,<3\"]\n"
},
{
"path": "requirements-dev.txt",
"chars": 138,
"preview": "-r requirements.txt\npytest>=5.1\npandas>=2.2.3\ntwine\nsphinx\nsphinx-rtd-theme\nnumpydoc\nbump2version\nnose\nscikit-learn>=1.5"
},
{
"path": "requirements.txt",
"chars": 63,
"preview": "ortools >= 9.15.6755\nscipy >= 1.14.1\nnumpy >= 2.1.1\nsix\njoblib\n"
},
{
"path": "setup.cfg",
"chars": 967,
"preview": "[metadata]\nname = k-means-constrained\nversion = 0.9.0\ndescription = K-Means clustering constrained with minimum and maxi"
},
{
"path": "setup.py",
"chars": 2140,
"preview": "#!/usr/bin/env python3\n\n\"\"\"\nBased on template: https://github.com/FedericoStra/cython-package-example\n\"\"\"\n\nfrom setuptoo"
},
{
"path": "tests/test_k_means_constrained_.py",
"chars": 8022,
"preview": "#!/usr/bin/env python\n\nimport numpy as np\nimport pandas as pd\nimport pytest\nfrom scipy.sparse import csc_matrix, isspars"
},
{
"path": "tests/test_kmeans_constrained_from_sklearn.py",
"chars": 13631,
"preview": "# Tests copied and modified from: https://github.com/scikit-learn/scikit-learn/blob/0.19.X/sklearn/cluster/tests/test_k_"
},
{
"path": "tox.ini",
"chars": 388,
"preview": "# Needed for setup.py to work correctly (no idea why)\n[tox]\nenvlist = py{38,39}\n\n[testenv]\nbasepython =\n py38: python"
}
]
// ... and 4 more files (download for full content)
About this extraction
This page contains the full source code of the joshlk/k-means-constrained GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 91 files (1.8 MB), approximately 653.5k tokens, and a symbol index with 572 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.