Showing preview only (573K chars total). Download the full file or copy to clipboard to get everything.
Repository: voxel51/fiftyone-brain
Branch: develop
Commit: 05fccee0ae1c
Files: 61
Total size: 549.0 KB
Directory structure:
gitextract_zu49vpz0/
├── .github/
│ ├── CODEOWNERS
│ ├── dependabot.yml
│ ├── pull_request_template.md
│ └── workflows/
│ └── build.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .prettierrc
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── RELEASING.md
├── STYLE_GUIDE.md
├── fiftyone/
│ ├── __init__.py
│ └── brain/
│ ├── __init__.py
│ ├── config.py
│ ├── internal/
│ │ ├── __init__.py
│ │ ├── core/
│ │ │ ├── __init__.py
│ │ │ ├── duplicates.py
│ │ │ ├── elasticsearch.py
│ │ │ ├── hardness.py
│ │ │ ├── lancedb.py
│ │ │ ├── leaky_splits.py
│ │ │ ├── milvus.py
│ │ │ ├── mistakenness.py
│ │ │ ├── mongodb.py
│ │ │ ├── mosaic.py
│ │ │ ├── pgvector.py
│ │ │ ├── pinecone.py
│ │ │ ├── qdrant.py
│ │ │ ├── redis.py
│ │ │ ├── representativeness.py
│ │ │ ├── sklearn.py
│ │ │ ├── uniqueness.py
│ │ │ ├── utils.py
│ │ │ └── visualization.py
│ │ └── models/
│ │ ├── .gitignore
│ │ ├── __init__.py
│ │ ├── manifest.json
│ │ ├── simple_resnet.py
│ │ └── torch.py
│ ├── similarity.py
│ └── visualization.py
├── install.bat
├── install.sh
├── pylintrc
├── pyproject.toml
├── pytest.ini
├── requirements/
│ ├── build.txt
│ ├── common.txt
│ ├── dev.txt
│ └── prod.txt
├── requirements.txt
├── setup.py
└── tests/
├── README.md
├── intensive/
│ ├── test_interface.py
│ ├── test_similarity.py
│ ├── test_uniqueness.py
│ └── test_visualization.py
├── models/
│ └── test_simple_resnet.py
└── test_uniqueness.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/CODEOWNERS
================================================
* @voxel51/developers
# Aloha!
.github/ @voxel51/aloha-shirts
pyproject.toml @voxel51/aloha-shirts
RELEASING.md @voxel51/aloha-shirts
setup.py @voxel51/aloha-shirts
================================================
FILE: .github/dependabot.yml
================================================
---
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"
day: "wednesday"
time: "14:00"
timezone: "UTC"
================================================
FILE: .github/pull_request_template.md
================================================
# Rationale
<!-- Explain why you are making this change. Describe the problem. -->
## Changes
<!-- Describe the changes. -->
## Testing
<!-- Describe the way the changes were tested. -->
<!-- Optional Sections:
## Screenshots
## To Do
## Notes
## Related
-->
<!-- Template for collapsed sections
<details>
<summary></summary>
</details>
-->
================================================
FILE: .github/workflows/build.yml
================================================
name: Build
on:
pull_request:
branches:
- develop
types: [opened, synchronize]
push:
branches:
- develop
tags:
- v*
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Clone fiftyone-brain
uses: actions/checkout@v6
with:
submodules: true
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: 3.9
- name: Check Python version
run: |
python --version
pip --version
- name: Install dependencies
run: |
pip install --upgrade pip setuptools wheel
pip install -r requirements/build.txt
- name: Set environment
env:
RELEASE_TAG: ${{ github.ref }}
run: |
if [[ $RELEASE_TAG =~ ^refs\/tags\/v.* ]]; then
echo "RELEASE_VERSION=$(echo '${{ github.ref }}' | sed 's/^refs\/tags\/v//')" >> $GITHUB_ENV
fi
- name: Build wheel
run: |
python setup.py sdist bdist_wheel
- name: Upload wheel
uses: actions/upload-artifact@v7
with:
name: dist
path: dist/
retention-days: 1
test:
needs: [build]
runs-on: ubuntu-latest
env:
FIFTYONE_DATASET_ZOO_DIR: ${{ github.workspace }}/.fiftyone
FIFTYONE_DO_NOT_TRACK: true
FIFTYONE_MODEL_ZOO_DIR: ${{ github.workspace }}/.fiftyone
permissions:
contents: read
id-token: write
strategy:
fail-fast: false
matrix:
python:
- "3.9"
- "3.10"
- "3.11"
steps:
- name: Clone fiftyone-brain
uses: actions/checkout@v6
with:
submodules: true
- name: Clone fiftyone
uses: actions/checkout@v6
with:
fetch-depth: 1
path: fiftyone-src
ref: develop
repository: voxel51/fiftyone
- name: Clone voxel51-eta
uses: actions/checkout@v6
if: ${{ !startsWith(github.ref, 'refs/heads/rel') && !startsWith(github.ref, 'refs/tags/') }}
with:
fetch-depth: 1
path: eta
ref: develop
repository: voxel51/eta
# ETA tests will create a storage client which,
# in it's __init__, tries to log in to GCP.
# See tests/tests_uniqueness.py
- name: Authenticate to Google Cloud
uses: google-github-actions/auth@v3
with:
project_id: ${{ secrets.REPO_GCP_PROJECT }}
service_account: ${{ secrets.REPO_GCP_SERVICE_ACCOUNT }}
workload_identity_provider: ${{ secrets.REPO_GOOGLE_WORKLOAD_IDP }}
- name: Set Up Cloud SDK
uses: google-github-actions/setup-gcloud@v3
- name: Set up Python ${{ matrix.python }}
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python }}
- name: Free Disk Space (Ubuntu) # standard runner's 14 GB available disk size isn't enough. Need at least 22 GB free.
uses: jlumbroso/free-disk-space@v1.3.1
- name: Install dependencies
run: |
pip install --upgrade pip setuptools wheel
- name: Download fiftyone-brain wheel
uses: actions/download-artifact@v8
with:
name: dist
path: dist/
- name: Install fiftyone
working-directory: fiftyone-src
run: |
python setup.py bdist_wheel
pip install voxel51-eta[storage] fiftyone-db
pip install ./dist/*.whl
- name: Install ETA from source
working-directory: eta
# Don't install from source if this is a release.
# Install from PyPI
if: ${{ !startsWith(github.ref, 'refs/heads/rel') && !startsWith(github.ref, 'refs/tags/') }}
run: |
echo "Installing ETA from source because github.ref = ${{ github.ref }} (not a release)"
python setup.py bdist_wheel
pip install ./dist/*.whl --force-reinstall
- name: Reinstall fiftyone-brain
run: |
pip install --force-reinstall --no-deps dist/*.whl
- name: Install test dependencies
run: |
pip install imageio pytest torch torchvision
- name: Cache Zoo
id: fiftyone-cache
uses: actions/cache@v5
with:
path: |
.fiftyone
key: zoo-${{ hashFiles('tests/**') }}
- name: Run tests
run: |
pytest --verbose tests/ --ignore tests/intensive/
publish:
needs: [build, test]
if: startsWith(github.ref, 'refs/tags/v')
runs-on: ubuntu-latest
environment: release # For trusted publishing. See below.
permissions:
contents: read
id-token: write
steps:
- name: Download wheels
uses: actions/download-artifact@v8
with:
name: dist
path: dist/
# Utilize
# [trusted publishers](https://docs.pypi.org/trusted-publishers/)
# This will use OIDC to publish the dists/ package to pypi.
# See
# [fiftyone-brain](https://pypi.org/manage/project/fiftyone-brain/settings/publishing/)
- name: Publish
uses: pypa/gh-action-pypi-publish@v1.14.0
================================================
FILE: .gitignore
================================================
__pycache__
.DS_store
.ipynb_checkpoints
*~
*.egg-info
*.py[cod]
*.pth
*.swp
.idea
.project
.pydevproject
build/
dist/
/fiftyone/brain/internal/models/cache/**/*
!/fiftyone/brain/internal/models/cache/manifest.json
*.pth
================================================
FILE: .pre-commit-config.yaml
================================================
repos:
- repo: https://github.com/asottile/blacken-docs
rev: v1.12.0
hooks:
- id: blacken-docs
additional_dependencies: [black==21.12b0]
args: ["-l 79"]
- repo: https://github.com/ambv/black
rev: 22.3.0
hooks:
- id: black
language_version: python3
args: ["-l 79"]
- repo: local
hooks:
- id: pylint
name: pylint
language: system
files: \.py$
entry: pylint
args: ["--errors-only"]
- repo: local
hooks:
- id: ipynb-strip
name: ipynb-strip
language: system
files: \.ipynb$
entry: jupyter nbconvert --clear-output --ClearOutputPreprocessor.enabled=True
args: ["--log-level=ERROR"]
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v2.6.2
hooks:
- id: prettier
language_version: system
================================================
FILE: .prettierrc
================================================
{
"overrides": [
{
"files": "*.md",
"options": {
"printWidth": 79,
"proseWrap": "always",
"tabWidth": 4
}
},
{
"files": "*.json",
"options": {
"tabWidth": 4
}
}
]
}
================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to FiftyOne Brain
All Brain contributions should follow the practices established in
[FiftyOne](https://github.com/voxel51/fiftyone/blob/develop/CONTRIBUTING.md).
## Adding new public methods to the Brain package
The `fiftyone.brain` package should expose all core user-functionality at the
base level. For example, for hardness, the user should be able to execute calls
in the following way:
```py
# Users should be able to do this
import fiftyone.brain as fob
fob.compute_hardness(...)
# And NOT have to do this
import fiftyone.brain.hardness as fobh
fobh.compute_hardness(...)
```
To achieve this, follow the existing pattern of declaring new public methods in
[`fiftyone/brain/__init__.py`](https://github.com/voxel51/fiftyone-brain/blob/develop/fiftyone/brain/__init__.py).
Be sure to include a detailed docstring for all methods in this file, as they
are pulled in by FiftyOne documentation builds and are made available in the
[public docs](https://docs.voxel51.com/api/fiftyone.brain.html).
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2017-2026, Voxel51, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: MANIFEST.in
================================================
global-include *
prune fiftyone/brain/internal/models/cache/
include fiftyone/brain/internal/models/cache/manifest.json
================================================
FILE: README.md
================================================
<div align="center">
<p align="center">
<img src="https://github.com/user-attachments/assets/17afdf93-289c-40f1-805c-06344f095cf6" height="55px">
**Open Source AI from [Voxel51](https://voxel51.com)**
<!-- prettier-ignore -->
<a href="https://voxel51.com/fiftyone">FiftyOne Website</a> •
<a href="https://voxel51.com/docs/fiftyone">FiftyOne Docs</a> •
<a href="https://docs.voxel51.com/brain.html">FiftyOne Brain Docs</a> •
<a href="https://voxel51.com/blog/">Blog</a> •
<a href="https://slack.voxel51.com">Community</a>
[](https://pypi.org/project/fiftyone-brain)
[](https://pypi.org/project/fiftyone-brain)
[](https://pepy.tech/project/fiftyone-brain)
[](LICENSE)
[](https://discord.gg/fiftyone-community)
[](https://huggingface.co/Voxel51)
[](https://voxel51.com/blog)
[](https://share.hsforms.com/1zpJ60ggaQtOoVeBqIZdaaA2ykyk)
[](https://www.linkedin.com/company/voxel51)
[](https://x.com/voxel51)
[](https://medium.com/voxel51)
</p>
</div>
---
FiftyOne Brain contains the open source AI/ML capabilities for the
[FiftyOne ecosystem](https://github.com/voxel51/fiftyone), enabling users to
automatically analyze and manipulate their datasets and models. FiftyOne Brain
includes features like visual similarity search, query by text, finding unique
and representative samples, finding media quality problems and annotation
mistakes, and more 🚀
## Documentation
Public documentation for the FiftyOne Brain is
[available here](https://docs.voxel51.com/user_guide/brain.html).
## Installation
The FiftyOne Brain is distributed via the `fiftyone-brain` package, and a
suitable version is automatically included with every `fiftyone` install:
```shell
pip install fiftyone
pip show fiftyone-brain
```
### Installing from source
If you wish to do a source install of the latest FiftyOne Brain version, simply
clone this repository:
```shell
git clone https://github.com/voxel51/fiftyone-brain
cd fiftyone-brain
```
and run the install script:
```shell
# Mac or Linux
bash install.sh
# Windows
.\install.bat
```
### Developer installation
If you are a developer contributing to this repository, you should perform a
developer installation using the `-d` flag of the install script:
```shell
# Mac or Linux
bash install.sh -d
# Windows
.\install.bat -d
```
Check out the [contribution guide](CONTRIBUTING.md) to get started.
## Uninstallation
```shell
pip uninstall fiftyone-brain
```
## Repository layout
- `fiftyone/brain/` definition of the `fiftyone.brain` namespace
- `requirements/` Python requirements for the project
- `tests/` tests for the various components of the Brain
## Citation
If you use the FiftyOne Brain in your research, please cite the project:
```bibtex
@article{moore2020fiftyone,
title={FiftyOne},
author={Moore, B. E. and Corso, J. J.},
journal={GitHub. Note: https://github.com/voxel51/fiftyone-brain},
year={2020}
}
```
================================================
FILE: RELEASING.md
================================================
# Releasing the Brain package
> [!NOTE]
> These steps are to be performed by authorized Voxel51 engineers.
The `fiftyone-brain` repository follows `Gitflow`.
Releases will be initiated when a teammate submits a
pull request from their respective `release/v*` branch to `main`.
We can see an example PR for
[version 0.21.4](https://github.com/voxel51/fiftyone-brain/pull/265).
Reviewers should always check that the version in the `setup.py`
matches the branch version.
The release engineer will merge the pull request once it is approved.
The PyPI uploads will be triggered when a release tag is pushed to the
repository:
1. Navigate to the
[releases page](https://github.com/voxel51/fiftyone-brain/pull/265).
1. Select `Draft a new release`.
1. Select `Create new tag` with the appropriate version and set the target to
`main`.
1. The tag format is `v<semantic-version>`.
For example, `v0.21.4`.
This should match the `setup.py` and release branch.
1. Select `Generate release notes`.
1. Select `Set as the latest release`.
1. Select `Publish release`.
This will create a new tag in the repository and will trigger the
[build/publish workflow](https://github.com/voxel51/fiftyone-brain/actions/workflows/build.yml).
This workflow will build the `.whl` artifacts and publish them to
[PyPI](https://pypi.org/project/fiftyone-brain/).
Once the build are finished, submit a PR from `main` to `develop` to complete
the `Gitflow` process.
================================================
FILE: STYLE_GUIDE.md
================================================
# FiftyOne Brain Style Guide
The Brain follows the same style guidelines as
[FiftyOne](https://github.com/voxel51/fiftyone/blob/develop/STYLE_GUIDE.md).
================================================
FILE: fiftyone/__init__.py
================================================
from pkgutil import extend_path
#
# This statement allows multiple `fiftyone.XXX` packages to be installed in the
# same environment and used simultaneously.
#
# https://docs.python.org/3/library/pkgutil.html#pkgutil.extend_path
#
__path__ = extend_path(__path__, __name__)
from fiftyone.__public__ import *
================================================
FILE: fiftyone/brain/__init__.py
================================================
"""
The brains behind FiftyOne: a powerful package for dataset curation, analysis,
and visualization.
See https://github.com/voxel51/fiftyone for more information.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import fiftyone.brain.config as _foc
from .similarity import (
Similarity,
SimilarityConfig,
SimilarityIndex,
)
from .visualization import (
Visualization,
VisualizationConfig,
VisualizationResults,
)
brain_config = _foc.load_brain_config()
def compute_hardness(
samples,
label_field,
hardness_field="hardness",
progress=None,
):
"""Adds a hardness field to each sample scoring the difficulty that the
specified label field observed in classifying the sample.
Hardness is a measure computed based on model prediction output (through
logits) that summarizes a measure of the uncertainty the model had with the
sample. This makes hardness quantitative and can be used to detect things
like hard samples, annotation errors during noisy training, and more.
All classifications must have their
:attr:`logits <fiftyone.core.labels.Classification.logits>` attributes
populated in order to use this method.
.. note::
Runs of this method can be referenced later via brain key
``hardness_field``.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
label_field: the :class:`fiftyone.core.labels.Classification` or
:class:`fiftyone.core.labels.Classifications` field to use from
each sample
hardness_field ("hardness"): the field name to use to store the
hardness value for each sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
"""
import fiftyone.brain.internal.core.hardness as fbh
return fbh.compute_hardness(samples, label_field, hardness_field, progress)
def compute_mistakenness(
samples,
pred_field,
label_field,
mistakenness_field="mistakenness",
missing_field="possible_missing",
spurious_field="possible_spurious",
use_logits=False,
copy_missing=False,
progress=None,
):
"""Computes the mistakenness (likelihood of being incorrect) of the labels
in ``label_field`` based on the predcted labels in ``pred_field``.
Mistakenness is measured based on either the ``confidence`` or ``logits``
of the predictions in ``pred_field``. This measure can be used to detect
things like annotation errors and unusually hard samples.
For classifications, a ``mistakenness_field`` field is populated on each
sample that quantifies the likelihood that the label in the ``label_field``
of that sample is incorrect.
For objects (detections, polylines, keypoints, etc), the mistakenness of
each object in ``label_field`` is computed, using
:meth:`fiftyone.core.collections.SampleCollection.evaluate_detections` to
locate corresponding objects in ``pred_field``. Three types of mistakes
are identified:
- **(Mistakes)** Objects in ``label_field`` with a match in
``pred_field`` are assigned a mistakenness value in their
``mistakenness_field`` that captures the likelihood that the class
label of the object in ``label_field`` is a mistake. A
``mistakenness_field + "_loc"`` field is also populated that captures
the likelihood that the object in ``label_field`` is a mistake due
to its localization (bounding box).
- **(Missing)** Objects in ``pred_field`` with no matches in
``label_field`` but which are likely to be correct will have their
``missing_field`` attribute set to True. In addition, if
``copy_missing`` is True, copies of these objects are *added* to the
ground truth ``label_field``.
- **(Spurious)** Objects in ``label_field`` with no matches in
``pred_field`` but which are likely to be incorrect will have their
``spurious_field`` attribute set to True.
In addition, for objects, the following sample-level fields are populated:
- **(Mistakes)** The ``mistakenness_field`` of each sample is populated
with the maximum mistakenness of the objects in ``label_field``
- **(Missing)** The ``missing_field`` of each sample is populated with
the number of missing objects that were deemed missing from
``label_field``.
- **(Spurious)** The ``spurious_field`` of each sample is populated with
the number of objects in ``label_field`` that were given deemed
spurious.
.. note::
Runs of this method can be referenced later via brain key
``mistakenness_field``.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
pred_field: the name of the predicted label field to use from each
sample. Can be of type
:class:`fiftyone.core.labels.Classification`,
:class:`fiftyone.core.labels.Classifications`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polylines`,
:class:`fiftyone.core.labels.Keypoints`, or
:class:`fiftyone.core.labels.TemporalDetections`
label_field: the name of the "ground truth" label field that you want
to test for mistakes with respect to the predictions in
``pred_field``. Must have the same type as ``pred_field``
mistakenness_field ("mistakenness"): the field name to use to store the
mistakenness value for each sample
missing_field ("possible_missing): the field in which to store
per-sample counts of potential missing objects
spurious_field ("possible_spurious): the field in which to store
per-sample counts of potential spurious objects
use_logits (False): whether to use logits (True) or confidence (False)
to compute mistakenness. Logits typically yield better results,
when they are available
copy_missing (False): whether to copy predicted objects that were
deemed to be missing into ``label_field``
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
"""
import fiftyone.brain.internal.core.mistakenness as fbm
return fbm.compute_mistakenness(
samples,
pred_field,
label_field,
mistakenness_field,
missing_field,
spurious_field,
use_logits,
copy_missing,
progress,
)
def compute_uniqueness(
samples,
uniqueness_field="uniqueness",
roi_field=None,
embeddings=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
):
"""Adds a uniqueness field to each sample scoring how unique it is with
respect to the rest of the samples.
This function only uses the pixel data and can therefore process labeled or
unlabeled samples.
If no ``embeddings``, ``similarity_index``, or ``model`` is provided, a
default model is used to generate embeddings.
.. note::
Runs of this method can be referenced later via brain key
``uniqueness_field``.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
uniqueness_field ("uniqueness"): the field name to use to store the
uniqueness value for each sample
roi_field (None): an optional :class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines` field defining a region of
interest within each image to use to compute uniqueness
embeddings (None): if no ``model`` is provided, this argument specifies
pre-computed embeddings to use, which can be any of the following:
- a ``num_samples x num_dims`` array of embeddings
- if ``roi_field`` is specified, a dict mapping sample IDs to
``num_patches x num_dims`` arrays of patch embeddings
- the name of a dataset field containing the embeddings to use
If a ``model`` is provided, this argument specifies the name of a
field in which to store the computed embeddings. In either case,
when working with patch embeddings, you can provide either the
fully-qualified path to the patch embeddings or just the name of
the label attribute in ``roi_field``
similarity_index (None): a
:class:`fiftyone.brain.similarity.SimilarityIndex` or the brain key
of a similarity index to use to load pre-computed embeddings
model (None): a :class:`fiftyone.core.models.Model` or the name of a
model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/models.html>`_
to use to generate embeddings. The model must expose embeddings
(``model.has_embeddings = True``)
model_kwargs (None): a dictionary of optional keyword arguments to pass
to the model's ``Config`` when a model name is provided
force_square (False): whether to minimally manipulate the patch
bounding boxes into squares prior to extraction. Only applicable
when a ``model`` and ``roi_field`` are specified
alpha (None): an optional expansion/contraction to apply to the patches
before extracting them, in ``[-1, inf)``. If provided, the length
and width of the box are expanded (or contracted, when
``alpha < 0``) by ``(100 * alpha)%``. For example, set
``alpha = 0.1`` to expand the boxes by 10%, and set
``alpha = -0.1`` to contract the boxes by 10%. Only applicable when
a ``model`` and ``roi_field`` are specified
batch_size (None): a batch size to use when computing embeddings. Only
applicable when a ``model`` is provided
num_workers (None): the number of workers to use when loading images.
Only applicable when a Torch-based model is being used to compute
embeddings
skip_failures (True): whether to gracefully continue without raising an
error if embeddings cannot be generated for a sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
"""
import fiftyone.brain.internal.core.uniqueness as fbu
return fbu.compute_uniqueness(
samples,
uniqueness_field,
roi_field,
embeddings,
similarity_index,
model,
model_kwargs,
force_square,
alpha,
batch_size,
num_workers,
skip_failures,
progress,
)
def compute_representativeness(
samples,
representativeness_field="representativeness",
method="cluster-center",
roi_field=None,
embeddings=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
):
"""Adds a representativeness field to each sample scoring how representative
of nearby samples it is.
This function only uses the pixel data and can therefore process labeled or
unlabeled samples.
If no ``embeddings``, ``similarity_index``, or ``model`` is provided, a
default model is used to generate embeddings.
.. note::
Runs of this method can be referenced later via brain key
``representativeness_field``.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
representativeness_field ("representativeness"): the field name to use
to store the representativeness value for each sample
method ("cluster-center"): the name of the method to use to compute the
representativeness. The supported values are
``["cluster-center", 'cluster-center-downweight']``.
``"cluster-center"` will make a sample's representativeness
proportional to it's proximity to cluster centers, while
``"cluster-center-downweight"`` will ensure more diversity in
representative samples
roi_field (None): an optional :class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines` field defining a region of
interest within each image to use to compute representativeness
embeddings (None): if no ``model`` is provided, this argument specifies
pre-computed embeddings to use, which can be any of the following:
- a ``num_samples x num_dims`` array of embeddings
- if ``roi_field`` is specified, a dict mapping sample IDs to
``num_patches x num_dims`` arrays of patch embeddings
- the name of a dataset field containing the embeddings to use
If a ``model`` is provided, this argument specifies the name of a
field in which to store the computed embeddings. In either case,
when working with patch embeddings, you can provide either the
fully-qualified path to the patch embeddings or just the name of
the label attribute in ``roi_field``
similarity_index (None): a
:class:`fiftyone.brain.similarity.SimilarityIndex` or the brain key
of a similarity index to use to load pre-computed embeddings
model (None): a :class:`fiftyone.core.models.Model` or the name of a
model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/models.html>`_
to use to generate embeddings. The model must expose embeddings
(``model.has_embeddings = True``)
model_kwargs (None): a dictionary of optional keyword arguments to pass
to the model's ``Config`` when a model name is provided
force_square (False): whether to minimally manipulate the patch
bounding boxes into squares prior to extraction. Only applicable
when a ``model`` and ``roi_field`` are specified
alpha (None): an optional expansion/contraction to apply to the patches
before extracting them, in ``[-1, inf)``. If provided, the length
and width of the box are expanded (or contracted, when
``alpha < 0``) by ``(100 * alpha)%``. For example, set
``alpha = 0.1`` to expand the boxes by 10%, and set
``alpha = -0.1`` to contract the boxes by 10%. Only applicable when
a ``model`` and ``roi_field`` are specified
batch_size (None): a batch size to use when computing embeddings. Only
applicable when a ``model`` is provided
num_workers (None): the number of workers to use when loading images.
Only applicable when a Torch-based model is being used to compute
embeddings
skip_failures (True): whether to gracefully continue without raising an
error if embeddings cannot be generated for a sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
"""
import fiftyone.brain.internal.core.representativeness as fbr
return fbr.compute_representativeness(
samples,
representativeness_field,
method,
roi_field,
embeddings,
similarity_index,
model,
model_kwargs,
force_square,
alpha,
batch_size,
num_workers,
skip_failures,
progress,
)
def compute_visualization(
samples,
patches_field=None,
embeddings=None,
points=None,
create_index=False,
points_field=None,
brain_key=None,
num_dims=2,
method=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
**kwargs,
):
"""Computes a low-dimensional representation of the samples' media or their
patches that can be interactively visualized.
The representation can be visualized by calling the
:meth:`visualize() <fiftyone.brain.visualization.VisualizationResults.visualize>`
method of the returned
:class:`fiftyone.brain.visualization.VisualizationResults` object.
If no ``embeddings``, ``similarity_index``, or ``model`` is provided, a
default model is used to generate embeddings.
You can use the ``method`` parameter to select the dimensionality reduction
method to use, and you can optionally customize the method by passing
additional parameters for the method's
:class:`fiftyone.brain.visualization.VisualizationConfig` class as
``kwargs``.
The builtin ``method`` values and their associated config classes are:
- ``"umap"``: :class:`fiftyone.brain.visualization.UMAPVisualizationConfig`
- ``"tsne"``: :class:`fiftyone.brain.visualization.TSNEVisualizationConfig`
- ``"pca"``: :class:`fiftyone.brain.visualization.PCAVisualizationConfig`
- ``"manual"``: :class:`fiftyone.brain.visualization.ManualVisualizationConfig`
You can pass ``create_index=True`` to create a spatial index of the
computed points on your dataset's samples. This is highly recommended for
large datasets as it enables efficient querying when lassoing points in
embeddings plots. By default, spatial indexes are created in a field with
name ``points_field=brain_key``, but you can customize this by manually
providing a ``points_field``.
You can also provide a ``points_field`` with ``create_index=False`` to
store the points on your dataset without explicitly creating a database
index. This will allow lasso callbacks to leverage point data rather than
relying on ID selection, but without the added benefit of a database index
to further optimize performance.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
patches_field (None): a sample field defining the image patches in each
sample that have been/will be embedded. Must be of type
:class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines`
embeddings (None): if no ``model`` is provided, this argument specifies
pre-computed embeddings to use, which can be any of the following:
- a dict mapping sample IDs to embedding vectors
- a ``num_samples x num_embedding_dims`` array of embeddings
corresponding to the samples in ``samples``
- if ``patches_field`` is specified, a dict mapping label IDs to
to embedding vectors
- if ``patches_field`` is specified, a dict mapping sample IDs
to ``num_patches x num_embedding_dims`` arrays of patch
embeddings
- the name of a dataset field containing the embeddings to use
If a ``model`` is provided, this argument specifies the name of a
field in which to store the computed embeddings. In either case,
when working with patch embeddings, you can provide either the
fully-qualified path to the patch embeddings or just the name of
the label attribute in ``patches_field``
points (None): a pre-computed low-dimensional representation to use. If
provided, no embeddings will be used/computed. Can be any of the
following:
- a dict mapping sample IDs to points vectors
- a ``num_samples x num_dims`` array of points corresponding to
the samples in ``samples``
- if ``patches_field`` is specified, a dict mapping label IDs to
points vectors
- if ``patches_field`` is specified, a ``num_patches x num_dims``
array of points whose rows correspond to the flattened list of
patches whose IDs are shown below::
# The list of patch IDs that the rows of `points` must match
_, id_field = samples._get_label_field_path(patches_field, "id")
patch_ids = samples.values(id_field, unwind=True)
create_index (False): whether to create a spatial index for the
computed points on your dataset
points_field (None): an optional field name in which to store the
spatial index. When ``create_index=True``, this defaults to
``points_field=brain_key``. When working with patches, you can
provide either the fully-qualified path to the points field or just
the name of the label attribute in ``patches_field``
brain_key (None): a brain key under which to store the results of this
method
num_dims (2): the dimension of the visualization space
method (None): the dimensionality reduction method to use. The
supported values are
``fiftyone.brain.brain_config.visualization_methods.keys()`` and
the default is
``fiftyone.brain.brain_config.default_visualization_method``
similarity_index (None): a
:class:`fiftyone.brain.similarity.SimilarityIndex` or the brain key
of a similarity index to use to load pre-computed embeddings
model (None): a :class:`fiftyone.core.models.Model` or the name of a
model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/index.html>`_
to use to generate embeddings. The model must expose embeddings
(``model.has_embeddings = True``)
model_kwargs (None): a dictionary of optional keyword arguments to pass
to the model's ``Config`` when a model name is provided
force_square (False): whether to minimally manipulate the patch
bounding boxes into squares prior to extraction. Only applicable
when a ``model`` and ``patches_field`` are specified
alpha (None): an optional expansion/contraction to apply to the patches
before extracting them, in ``[-1, inf)``. If provided, the length
and width of the box are expanded (or contracted, when
``alpha < 0``) by ``(100 * alpha)%``. For example, set
``alpha = 0.1`` to expand the boxes by 10%, and set
``alpha = -0.1`` to contract the boxes by 10%. Only applicable when
a ``model`` and ``patches_field`` are specified
batch_size (None): an optional batch size to use when computing
embeddings. Only applicable when a ``model`` is provided
num_workers (None): the number of workers to use when loading images.
Only applicable when a Torch-based model is being used to compute
embeddings
skip_failures (True): whether to gracefully continue without raising an
error if embeddings cannot be generated for a sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
**kwargs: optional keyword arguments for the constructor of the
:class:`fiftyone.brain.visualization.VisualizationConfig`
being used
Returns:
a :class:`fiftyone.brain.visualization.VisualizationResults`
"""
import fiftyone.brain.visualization as fbv
return fbv.compute_visualization(
samples,
patches_field,
embeddings,
points,
create_index,
points_field,
brain_key,
num_dims,
method,
similarity_index,
model,
model_kwargs,
force_square,
alpha,
batch_size,
num_workers,
skip_failures,
progress,
**kwargs,
)
def compute_similarity(
samples,
patches_field=None,
roi_field=None,
embeddings=None,
brain_key=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
backend=None,
**kwargs,
):
"""Uses embeddings to index the samples or their patches so that you can
query/sort by similarity.
Calling this method only creates the index. You can then call the methods
exposed on the retuned :class:`fiftyone.brain.similarity.SimilarityIndex`
object to perform the following operations:
- :meth:`sort_by_similarity() <fiftyone.brain.similarity.SimilarityIndex.sort_by_similarity>`:
Sort the samples in the collection by similarity to a specific example
or example(s)
All indexes support querying by image similarity by passing sample IDs to
:meth:`sort_by_similarity() <fiftyone.brain.similarity.SimilarityIndex.sort_by_similarity>`.
In addition, if you pass the name of a model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/index.html>`_
like ``model="clip-vit-base32-torch"`` that can embed prompts to this
method, then you can query the index by text similarity as well.
In addition, if the backend supports it, you can call the following
duplicate detection methods:
- :meth:`find_duplicates() <fiftyone.brain.similarity.DuplicatesMixin.find_duplicates>`:
Query the index to find all examples with near-duplicates in the
collection
- :meth:`find_unique() <fiftyone.brain.similarity.DuplicatesMixin.find_unique>`:
Query the index to select a subset of examples of a specified size that
are maximally unique with respect to each other
If no ``embeddings`` or ``model`` is provided, a default model is used to
generate embeddings.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
patches_field (None): a sample field defining the image patches in each
sample that have been/will be embedded. Must be of type
:class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines`
roi_field (None): an optional :class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines` field defining a region of
interest within each image to use to compute embeddings
embeddings (None): embeddings to feed the index. This argument's
behavior depends on whether a ``model`` is provided, as described
below.
If no ``model`` is provided, this argument specifies pre-computed
embeddings to use:
- a ``num_samples x num_dims`` array of embeddings
- if ``patches_field``/``roi_field`` is specified, a dict
mapping sample IDs to ``num_patches x num_dims`` arrays of
patch embeddings
- the name of a dataset field from which to load embeddings
- ``None``: use the default model to compute embeddings
- ``False``: **do not** compute embeddings right now
If a ``model`` is provided, this argument specifies where to store
the model's embeddings:
- the name of a field in which to store the computed embeddings
- ``False``: **do not** compute embeddings right now
In either case, when working with patch embeddings, you can provide
either the fully-qualified path to the patch embeddings or just the
name of the label attribute in ``patches_field``/``roi_field``
brain_key (None): a brain key under which to store the results of this
method
model (None): a :class:`fiftyone.core.models.Model` or the name of a
model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/index.html>`_
to use, or that was already used, to generate embeddings. The model
must expose embeddings (``model.has_embeddings = True``)
model_kwargs (None): a dictionary of optional keyword arguments to pass
to the model's ``Config`` when a model name is provided
force_square (False): whether to minimally manipulate the patch
bounding boxes into squares prior to extraction. Only applicable
when a ``model`` and ``patches_field``/``roi_field`` are specified
alpha (None): an optional expansion/contraction to apply to the patches
before extracting them, in ``[-1, inf)``. If provided, the length
and width of the box are expanded (or contracted, when
``alpha < 0``) by ``(100 * alpha)%``. For example, set
``alpha = 0.1`` to expand the boxes by 10%, and set
``alpha = -0.1`` to contract the boxes by 10%. Only applicable when
a ``model`` and ``patches_field``/``roi_field`` are specified
batch_size (None): an optional batch size to use when computing
embeddings. Only applicable when a ``model`` is provided
num_workers (None): the number of workers to use when loading images.
Only applicable when a Torch-based model is being used to compute
embeddings
skip_failures (True): whether to gracefully continue without raising an
error if embeddings cannot be generated for a sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
backend (None): the similarity backend to use. The supported values are
``fiftyone.brain.brain_config.similarity_backends.keys()`` and the
default is
``fiftyone.brain.brain_config.default_similarity_backend``
**kwargs: keyword arguments for the
:class:`fiftyone.brian.SimilarityConfig` subclass of the backend
being used
Returns:
a :class:`fiftyone.brain.similarity.SimilarityIndex`
"""
import fiftyone.brain.similarity as fbs
return fbs.compute_similarity(
samples,
patches_field,
roi_field,
embeddings,
brain_key,
model,
model_kwargs,
force_square,
alpha,
batch_size,
num_workers,
skip_failures,
progress,
backend,
**kwargs,
)
def compute_near_duplicates(
samples,
threshold=0.2,
roi_field=None,
embeddings=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
):
"""Detects potential duplicates in the given sample collection.
Calling this method only initializes the index. You can then call the
methods exposed on the returned object to perform the following operations:
- :meth:`duplicate_ids <fiftyone.brain.similarity.DuplicatesMixin.duplicate_ids>`:
A list of duplicate IDs
- :meth:`neighbors_map <fiftyone.brain.similarity.DuplicatesMixin.neighbors_map>`:
A dictionary mapping IDs to lists of ``(dup_id, dist)`` tuples
- :meth:`duplicates_view() <fiftyone.brain.similarity.DuplicatesMixin.duplicates_view>`:
Returns a view of all duplicates in the input collection
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
threshold (0.2): the similarity distance threshold to use when
detecting duplicates. Values in ``[0.1, 0.25]`` work well for the
default setup
roi_field (None): an optional :class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines` field defining a region of
interest within each image to use to compute embeddings
embeddings (None): if no ``model`` is provided, this argument specifies
pre-computed embeddings to use, which can be any of the following:
- a ``num_samples x num_dims`` array of embeddings
- if ``roi_field`` is specified, a dict mapping sample IDs to
``num_patches x num_dims`` arrays of patch embeddings
- the name of a dataset field containing the embeddings to use
If a ``model`` is provided, this argument specifies the name of a
field in which to store the computed embeddings. In either case,
when working with patch embeddings, you can provide either the
fully-qualified path to the patch embeddings or just the name of
the label attribute in ``roi_field``
similarity_index (None): a
:class:`fiftyone.brain.similarity.SimilarityIndex` or the brain key
of a similarity index to use to load pre-computed embeddings
model (None): a :class:`fiftyone.core.models.Model` or the name of a
model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/models.html>`_
to use to generate embeddings. The model must expose embeddings
(``model.has_embeddings = True``)
model_kwargs (None): a dictionary of optional keyword arguments to pass
to the model's ``Config`` when a model name is provided
force_square (False): whether to minimally manipulate the patch
bounding boxes into squares prior to extraction. Only applicable
when a ``model`` and ``roi_field`` are specified
alpha (None): an optional expansion/contraction to apply to the patches
before extracting them, in ``[-1, inf)``. If provided, the length
and width of the box are expanded (or contracted, when
``alpha < 0``) by ``(100 * alpha)%``. For example, set
``alpha = 0.1`` to expand the boxes by 10%, and set
``alpha = -0.1`` to contract the boxes by 10%. Only applicable when
a ``model`` and ``roi_field`` are specified
batch_size (None): a batch size to use when computing embeddings. Only
applicable when a ``model`` is provided
num_workers (None): the number of workers to use when loading images.
Only applicable when a Torch-based model is being used to compute
embeddings
skip_failures (True): whether to gracefully continue without raising an
error if embeddings cannot be generated for a sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
Returns:
a :class:`fiftyone.brain.similarity.SimilarityIndex`
"""
import fiftyone.brain.internal.core.duplicates as fbd
return fbd.compute_near_duplicates(
samples,
threshold=threshold,
roi_field=roi_field,
embeddings=embeddings,
similarity_index=similarity_index,
model=model,
model_kwargs=model_kwargs,
force_square=force_square,
alpha=alpha,
batch_size=batch_size,
num_workers=num_workers,
skip_failures=skip_failures,
progress=progress,
)
def compute_exact_duplicates(
samples,
num_workers=None,
skip_failures=True,
progress=None,
):
"""Detects duplicate media in a sample collection.
This method detects exact duplicates with the same filehash. Use
:meth:`compute_near_duplicates` to detect near-duplicates.
If duplicates are found, the first instance in ``samples`` will be the key
in the returned dictionary, while the subsequent duplicates will be the
values in the corresponding list.
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
num_workers (None): an optional number of processes to use
skip_failures (True): whether to gracefully ignore samples whose
filehash cannot be computed
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
Returns:
a dictionary mapping IDs of samples with exact duplicates to lists of
IDs of the duplicates for the corresponding sample
"""
import fiftyone.brain.internal.core.duplicates as fbd
return fbd.compute_exact_duplicates(
samples, num_workers, skip_failures, progress
)
def compute_leaky_splits(
samples,
splits,
threshold=0.2,
roi_field=None,
embeddings=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
):
"""Computes potential leaks between splits of the given sample collection.
Calling this method only initializes the index. You can then call the
methods exposed on the returned object to perform the following operations:
- :meth:`leaks_view() <fiftyone.brain.core.internal.leaky_splits.LeakySplitsIndex.leaks_view>`:
Returns a view of all leaks in the input collection
- :meth:`no_leaks_view() <fiftyone.brain.core.internal.leaky_splits.LeakySplitsIndex.no_leaks_view>`:
Returns the subset of the input collection without any leaks
- :meth:`leaks_for_sample() <fiftyone.brain.core.internal.leaky_splits.LeakySplitsIndex.leaks_for_sample>`:
Returns a view with leaks corresponding to the given sample
- :meth:`tag_leaks() <fiftyone.brain.core.internal.leaky_splits.LeakySplitsIndex.tag_leaks>`:
Tags leaks in the dataset as leaks
Args:
samples: a :class:`fiftyone.core.collections.SampleCollection`
splits: the dataset splits, specified in one of the following ways:
- a list of tag strings
- the name of a string/list field that encodes the split
memberships
- a dict mapping split names to
:class:`fiftyone.core.view.DatasetView` instances
threshold (0.2): the similarity distance threshold to use when
detecting leaks. Values in ``[0.1, 0.25]`` work well for the
default setup
roi_field (None): an optional :class:`fiftyone.core.labels.Detection`,
:class:`fiftyone.core.labels.Detections`,
:class:`fiftyone.core.labels.Polyline`, or
:class:`fiftyone.core.labels.Polylines` field defining a region of
interest within each image to use to compute leaks
embeddings (None): if no ``model`` is provided, this argument specifies
pre-computed embeddings to use, which can be any of the following:
- a ``num_samples x num_dims`` array of embeddings
- if ``roi_field`` is specified, a dict mapping sample IDs to
``num_patches x num_dims`` arrays of patch embeddings
- the name of a dataset field containing the embeddings to use
If a ``model`` is provided, this argument specifies the name of a
field in which to store the computed embeddings. In either case,
when working with patch embeddings, you can provide either the
fully-qualified path to the patch embeddings or just the name of
the label attribute in ``roi_field``
similarity_index (None): a
:class:`fiftyone.brain.similarity.SimilarityIndex` or the brain key
of a similarity index to use to load pre-computed embeddings
model (None): a :class:`fiftyone.core.models.Model` or the name of a
model from the
`FiftyOne Model Zoo <https://docs.voxel51.com/user_guide/model_zoo/models.html>`_
to use to generate embeddings. The model must expose embeddings
(``model.has_embeddings = True``)
model_kwargs (None): a dictionary of optional keyword arguments to pass
to the model's ``Config`` when a model name is provided
force_square (False): whether to minimally manipulate the patch
bounding boxes into squares prior to extraction. Only applicable
when a ``model`` and ``roi_field`` are specified
alpha (None): an optional expansion/contraction to apply to the patches
before extracting them, in ``[-1, inf)``. If provided, the length
and width of the box are expanded (or contracted, when
``alpha < 0``) by ``(100 * alpha)%``. For example, set
``alpha = 0.1`` to expand the boxes by 10%, and set
``alpha = -0.1`` to contract the boxes by 10%. Only applicable when
a ``model`` and ``roi_field`` are specified
batch_size (None): a batch size to use when computing embeddings. Only
applicable when a ``model`` is provided
num_workers (None): the number of workers to use when loading images.
Only applicable when a Torch-based model is being used to compute
embeddings
skip_failures (True): whether to gracefully continue without raising an
error if embeddings cannot be generated for a sample
progress (None): whether to render a progress bar (True/False), use the
default value ``fiftyone.config.show_progress_bars`` (None), or a
progress callback function to invoke instead
Returns:
a :class:`fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex`
"""
import fiftyone.brain.internal.core.leaky_splits as fbl
return fbl.compute_leaky_splits(
samples,
splits,
threshold=threshold,
roi_field=roi_field,
embeddings=embeddings,
similarity_index=similarity_index,
model=model,
model_kwargs=model_kwargs,
force_square=force_square,
alpha=alpha,
batch_size=batch_size,
num_workers=num_workers,
skip_failures=skip_failures,
progress=progress,
)
================================================
FILE: fiftyone/brain/config.py
================================================
"""
Brain config.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import os
from fiftyone.core.config import EnvConfig
class BrainConfig(EnvConfig):
"""FiftyOne brain configuration settings."""
_BUILTIN_SIMILARITY_BACKENDS = {
"sklearn": {
"config_cls": "fiftyone.brain.internal.core.sklearn.SklearnSimilarityConfig",
},
"pinecone": {
"config_cls": "fiftyone.brain.internal.core.pinecone.PineconeSimilarityConfig",
},
"qdrant": {
"config_cls": "fiftyone.brain.internal.core.qdrant.QdrantSimilarityConfig",
},
"milvus": {
"config_cls": "fiftyone.brain.internal.core.milvus.MilvusSimilarityConfig",
},
"lancedb": {
"config_cls": "fiftyone.brain.internal.core.lancedb.LanceDBSimilarityConfig",
},
"redis": {
"config_cls": "fiftyone.brain.internal.core.redis.RedisSimilarityConfig",
},
"mongodb": {
"config_cls": "fiftyone.brain.internal.core.mongodb.MongoDBSimilarityConfig",
},
"elasticsearch": {
"config_cls": "fiftyone.brain.internal.core.elasticsearch.ElasticsearchSimilarityConfig",
},
"pgvector": {
"config_cls": "fiftyone.brain.internal.core.pgvector.PgVectorSimilarityConfig",
},
"mosaic": {
"config_cls": "fiftyone.brain.internal.core.mosaic.MosaicSimilarityConfig",
},
}
_BUILTIN_VISUALIZATION_METHODS = {
"umap": {
"config_cls": "fiftyone.brain.visualization.UMAPVisualizationConfig",
},
"tsne": {
"config_cls": "fiftyone.brain.visualization.TSNEVisualizationConfig",
},
"pca": {
"config_cls": "fiftyone.brain.visualization.PCAVisualizationConfig",
},
"manual": {
"config_cls": "fiftyone.brain.visualization.ManualVisualizationConfig",
},
}
def __init__(self, d=None):
if d is None:
d = {}
self.default_similarity_backend = self.parse_string(
d,
"default_similarity_backend",
env_var="FIFTYONE_BRAIN_DEFAULT_SIMILARITY_BACKEND",
default="sklearn",
)
self.similarity_backends = self._parse_similarity_backends(d)
if self.default_similarity_backend not in self.similarity_backends:
self.default_similarity_backend = next(
iter(sorted(self.similarity_backends.keys())), None
)
self.default_visualization_method = self.parse_string(
d,
"default_visualization_method",
env_var="FIFTYONE_BRAIN_DEFAULT_VISUALIZATION_METHOD",
default="umap",
)
self.visualization_methods = self._parse_visualization_methods(d)
if self.default_visualization_method not in self.visualization_methods:
self.default_visualization_method = next(
iter(sorted(self.visualization_methods.keys())), None
)
def _parse_similarity_backends(self, d):
d = d.get("similarity_backends", {})
env_vars = dict(os.environ)
#
# `FIFTYONE_BRAIN_SIMILARITY_BACKENDS` can be used to declare which
# backends are exposed. This may exclude builtin backends and/or
# declare new backends
#
if "FIFTYONE_BRAIN_SIMILARITY_BACKENDS" in env_vars:
backends = env_vars["FIFTYONE_BRAIN_SIMILARITY_BACKENDS"].split(
","
)
# Special syntax to append rather than override default backends
if "*" in backends:
backends = set(b for b in backends if b != "*")
backends |= set(self._BUILTIN_SIMILARITY_BACKENDS.keys())
d = {backend: d.get(backend, {}) for backend in backends}
else:
backends = self._BUILTIN_SIMILARITY_BACKENDS.keys()
for backend in backends:
if backend not in d:
d[backend] = {}
#
# Extract parameters from any environment variables of the form
# `FIFTYONE_BRAIN_SIMILARITY_<BACKEND>_<PARAMETER>`
#
for backend, d_backend in d.items():
prefix = "FIFTYONE_BRAIN_SIMILARITY_%s_" % backend.upper()
for env_name, env_value in env_vars.items():
if env_name.startswith(prefix):
name = env_name[len(prefix) :].lower()
value = _parse_env_value(env_value)
d_backend[name] = value
#
# Set default parameters for builtin similarity backends
#
for backend, defaults in self._BUILTIN_SIMILARITY_BACKENDS.items():
if backend not in d:
continue
d_backend = d[backend]
for name, value in defaults.items():
if name not in d_backend:
d_backend[name] = value
return d
def _parse_visualization_methods(self, d):
d = d.get("visualization_methods", {})
env_vars = dict(os.environ)
#
# `FIFTYONE_BRAIN_VISUALIZATION_METHODS` can be used to declare which
# methods are exposed. This may exclude builtin methods and/or declare
# new methods
#
if "FIFTYONE_BRAIN_VISUALIZATION_METHODS" in env_vars:
methods = env_vars["FIFTYONE_BRAIN_VISUALIZATION_METHODS"].split(
","
)
# Special syntax to append rather than override default methods
if "*" in methods:
methods = set(m for m in methods if m != "*")
methods |= set(self._BUILTIN_VISUALIZATION_METHODS.keys())
d = {method: d.get(method, {}) for method in methods}
else:
methods = self._BUILTIN_VISUALIZATION_METHODS.keys()
for method in methods:
if method not in d:
d[method] = {}
#
# Extract parameters from any environment variables of the form
# `FIFTYONE_BRAIN_VISUALIZATION_<METHOD>_<PARAMETER>`
#
for method, d_method in d.items():
prefix = "FIFTYONE_BRAIN_VISUALIZATION_%s_" % method.upper()
for env_name, env_value in env_vars.items():
if env_name.startswith(prefix):
name = env_name[len(prefix) :].lower()
value = _parse_env_value(env_value)
d_method[name] = value
#
# Set default parameters for builtin visualization methods
#
for method, defaults in self._BUILTIN_VISUALIZATION_METHODS.items():
if method not in d:
continue
d_method = d[method]
for name, value in defaults.items():
if name not in d_method:
d_method[name] = value
return d
def locate_brain_config():
"""Returns the path to the :class:`BrainConfig` on disk.
The default location is ``~/.fiftyone/brain_config.json``, but you can
override this path by setting the ``FIFTYONE_BRAIN_CONFIG_PATH``
environment variable.
Note that a config file may not actually exist on disk.
Returns:
the path to the :class:`BrainConfig` on disk
"""
if "FIFTYONE_BRAIN_CONFIG_PATH" not in os.environ:
return os.path.join(
os.path.expanduser("~"), ".fiftyone", "brain_config.json"
)
return os.environ["FIFTYONE_BRAIN_CONFIG_PATH"]
def load_brain_config():
"""Loads the FiftyOne brain config.
Returns:
a :class:`BrainConfig` instance
"""
brain_config_path = locate_brain_config()
if os.path.isfile(brain_config_path):
return BrainConfig.from_json(brain_config_path)
return BrainConfig()
def _parse_env_value(value):
try:
return int(value)
except:
pass
try:
return float(value)
except:
pass
if value in ("True", "true"):
return True
if value in ("False", "false"):
return False
if value in ("None", ""):
return None
if "," in value:
return [_parse_env_value(v) for v in value.split(",")]
return value
================================================
FILE: fiftyone/brain/internal/__init__.py
================================================
"""
Internal FiftyOne Brain package.
Contains all non-public code powering the ``fiftyone.brain`` public namespace.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
================================================
FILE: fiftyone/brain/internal/core/__init__.py
================================================
"""
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
================================================
FILE: fiftyone/brain/internal/core/duplicates.py
================================================
"""
Duplicates methods.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
from collections import defaultdict
import itertools
import logging
import multiprocessing
import eta.core.utils as etau
import fiftyone.core.media as fom
import fiftyone.core.utils as fou
import fiftyone.core.validation as fov
import fiftyone.brain as fb
import fiftyone.brain.similarity as fbs
import fiftyone.brain.internal.core.utils as fbu
logger = logging.getLogger(__name__)
_DEFAULT_MODEL = "resnet18-imagenet-torch"
def compute_near_duplicates(
samples,
threshold=None,
roi_field=None,
embeddings=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
):
"""See ``fiftyone/brain/__init__.py``."""
fov.validate_collection(samples)
if etau.is_str(embeddings):
embeddings_field, embeddings_exist = fbu.parse_data_field(
samples,
embeddings,
data_type="embeddings",
)
embeddings = None
else:
embeddings_field = None
embeddings_exist = None
if etau.is_str(similarity_index):
similarity_index = samples.load_brain_results(similarity_index)
if (
model is None
and embeddings is None
and similarity_index is None
and not embeddings_exist
):
model = _DEFAULT_MODEL
if similarity_index is None:
similarity_index = fb.compute_similarity(
samples,
backend="sklearn",
roi_field=roi_field,
embeddings=embeddings_field or embeddings,
model=model,
model_kwargs=model_kwargs,
force_square=force_square,
alpha=alpha,
batch_size=batch_size,
num_workers=num_workers,
skip_failures=skip_failures,
progress=progress,
)
elif not isinstance(similarity_index, fbs.DuplicatesMixin):
raise ValueError(
"This method only supports similarity indexes that implement the "
"%s mixin" % fbs.DuplicatesMixin
)
similarity_index.find_duplicates(thresh=threshold)
return similarity_index
def compute_exact_duplicates(samples, num_workers, skip_failures, progress):
"""See ``fiftyone/brain/__init__.py``."""
fov.validate_collection(samples)
if num_workers is None:
if samples.media_type == fom.VIDEO:
num_workers = multiprocessing.cpu_count()
else:
num_workers = 1
logger.info("Computing filehashes...")
method = "md5" if samples.media_type == fom.VIDEO else None
if num_workers <= 1:
hashes = _compute_filehashes(samples, method, progress)
else:
hashes = _compute_filehashes_multi(
samples, method, num_workers, progress
)
num_missing = sum(h is None for h in hashes)
if num_missing > 0:
msg = "Failed to compute %d filehashes" % num_missing
if skip_failures:
logger.warning(msg)
else:
raise ValueError(msg)
neighbors_map = defaultdict(list)
observed_hashes = {}
for _id, _hash in hashes.items():
if _hash is None:
continue
if _hash in observed_hashes:
neighbors_map[observed_hashes[_hash]].append(_id)
else:
observed_hashes[_hash] = _id
return dict(neighbors_map)
def _compute_filehashes(samples, method, progress):
ids, filepaths = samples.values(["id", "filepath"])
with fou.ProgressBar(total=len(ids), progress=progress) as pb:
return {
_id: _compute_filehash(filepath, method)
for _id, filepath in pb(zip(ids, filepaths))
}
def _compute_filehashes_multi(samples, method, num_workers, progress):
ids, filepaths = samples.values(["id", "filepath"])
methods = itertools.repeat(method)
inputs = list(zip(ids, filepaths, methods))
with fou.ProgressBar(total=len(inputs), progress=progress) as pb:
with multiprocessing.Pool(processes=num_workers) as pool:
return {
k: v
for k, v in pb(
pool.imap_unordered(_do_compute_filehash, inputs)
)
}
def _compute_filehash(filepath, method):
try:
filehash = fou.compute_filehash(filepath, method=method)
except:
filehash = None
return filehash
def _do_compute_filehash(args):
_id, filepath, method = args
try:
filehash = fou.compute_filehash(filepath, method=method)
except:
filehash = None
return _id, filehash
================================================
FILE: fiftyone/brain/internal/core/elasticsearch.py
================================================
"""
Elastisearch similarity backend.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
import numpy as np
import eta.core.utils as etau
from fiftyone import ViewField as F
import fiftyone.core.utils as fou
import fiftyone.brain.internal.core.utils as fbu
from fiftyone.brain.similarity import (
SimilarityConfig,
Similarity,
SimilarityIndex,
)
es = fou.lazy_import("elasticsearch")
logger = logging.getLogger(__name__)
_SUPPORTED_METRICS = {
"cosine": "cosine",
"dotproduct": "dot_product",
"euclidean": "l2_norm",
"innerproduct": "max_inner_product",
}
class ElasticsearchSimilarityConfig(SimilarityConfig):
"""Configuration for a Elasticsearch similarity instance.
Args:
index_name (None): the name of the Elasticsearch index to use or
create. If none is provided, a new index will be created
metric ("cosine"): the embedding distance metric to use when creating a
new index. Supported values are
``("cosine", "dotproduct", "euclidean", "innerproduct")``
hosts (None): the full Elasticsearch server address(es) to use. Can be
a string or list of strings
cloud_id (None): the Cloud ID of an Elastic Cloud to connect to
username (None): a username to use
password (None): a password to use
api_key (None): an API key to use
ca_certs (None): a path to a CA certificate
bearer_auth (None): a bearer token to use
ssl_assert_fingerprint (None): a SHA256 fingerprint to use
verify_certs (None): whether to verify SSL certificates
**kwargs: keyword arguments for
:class:`fiftyone.brain.similarity.SimilarityConfig`
"""
def __init__(
self,
index_name=None,
metric="cosine",
hosts=None,
cloud_id=None,
username=None,
password=None,
api_key=None,
ca_certs=None,
bearer_auth=None,
ssl_assert_fingerprint=None,
verify_certs=None,
**kwargs,
):
if metric not in _SUPPORTED_METRICS:
raise ValueError(
"Unsupported metric '%s'. Supported values are %s"
% (metric, tuple(_SUPPORTED_METRICS.keys()))
)
super().__init__(**kwargs)
self.index_name = index_name
self.metric = metric
self._hosts = hosts
self._cloud_id = cloud_id
self._username = username
self._password = password
self._api_key = api_key
self._ca_certs = ca_certs
self._bearer_auth = bearer_auth
self._ssl_assert_fingerprint = ssl_assert_fingerprint
self._verify_certs = verify_certs
@property
def method(self):
return "elasticsearch"
@property
def hosts(self):
return self._hosts
@hosts.setter
def hosts(self, value):
self._hosts = value
@property
def cloud_id(self):
return self._cloud_id
@cloud_id.setter
def cloud_id(self, value):
self._cloud_id = value
@property
def username(self):
return self._username
@username.setter
def username(self, value):
self._username = value
@property
def password(self):
return self._password
@password.setter
def password(self, value):
self._password = value
@property
def api_key(self):
return self._api_key
@api_key.setter
def api_key(self, value):
self._api_key = value
@property
def ca_certs(self):
return self._ca_certs
@ca_certs.setter
def ca_certs(self, value):
self._ca_certs = value
@property
def bearer_auth(self):
return self._bearer_auth
@bearer_auth.setter
def bearer_auth(self, value):
self._bearer_auth = value
@property
def ssl_assert_fingerprint(self):
return self._ssl_assert_fingerprint
@ssl_assert_fingerprint.setter
def ssl_assert_fingerprint(self, value):
self._ssl_assert_fingerprint = value
@property
def verify_certs(self):
return self._verify_certs
@verify_certs.setter
def verify_certs(self, value):
self._verify_certs = value
@property
def max_k(self):
return 10000 # Elasticsearch limit
@property
def supports_least_similarity(self):
return False
@property
def supported_aggregations(self):
return ("mean",)
def load_credentials(
self,
hosts=None,
cloud_id=None,
username=None,
password=None,
api_key=None,
ca_certs=None,
bearer_auth=None,
ssl_assert_fingerprint=None,
verify_certs=None,
):
self._load_parameters(
hosts=hosts,
cloud_id=cloud_id,
username=username,
password=password,
api_key=api_key,
ca_certs=ca_certs,
bearer_auth=bearer_auth,
ssl_assert_fingerprint=ssl_assert_fingerprint,
verify_certs=verify_certs,
)
class ElasticsearchSimilarity(Similarity):
"""Elasticsearch similarity factory.
Args:
config: a :class:`ElasticsearchSimilarityConfig`
"""
def ensure_requirements(self):
fou.ensure_package("elasticsearch")
def ensure_usage_requirements(self):
fou.ensure_package("elasticsearch")
def initialize(self, samples, brain_key):
return ElasticsearchSimilarityIndex(
samples, self.config, brain_key, backend=self
)
class ElasticsearchSimilarityIndex(SimilarityIndex):
"""Class for interacting with Elasticsearch similarity indexes.
Args:
samples: the :class:`fiftyone.core.collections.SampleCollection` used
config: the :class:`ElasticsearchSimilarityConfig` used
brain_key: the brain key
backend (None): a :class:`ElasticsearchSimilarity` instance
"""
def __init__(self, samples, config, brain_key, backend=None):
super().__init__(samples, config, brain_key, backend=backend)
self._client = None
self._metric = None
self._initialize()
@property
def total_index_size(self):
try:
return self._client.count(index=self.config.index_name)["count"]
except:
return 0
@property
def client(self):
"""The ``elasticsearch.Elasticsearch`` instance for this index."""
return self._client
def _initialize(self):
kwargs = {}
for key in (
"hosts",
"cloud_id",
"username",
"password",
"api_key",
"ca_certs",
"bearer_auth",
"ssl_assert_fingerprint",
"verify_certs",
):
value = getattr(self.config, key, None)
if value is not None:
kwargs[key] = value
username = kwargs.pop("username", None)
password = kwargs.pop("password", None)
if username is not None and password is not None:
kwargs["basic_auth"] = (username, password)
try:
self._client = es.Elasticsearch(**kwargs)
except Exception as e:
raise ValueError(
"Failed to connect to Elasticsearch backend. Refer to "
"https://docs.voxel51.com/integrations/elasticsearch.html for more "
"information"
) from e
if self.config.index_name is None:
root = "fiftyone-" + fou.to_slug(self.samples._root_dataset.name)
index_name = fbu.get_unique_name(root, self._get_index_names())
self.config.index_name = index_name
self.save_config()
def _get_index_names(self):
return self._client.indices.get_alias().keys()
def _get_index_ids(self, batch_size=1000):
sample_ids = []
label_ids = []
for batch in range(0, self.total_index_size, batch_size):
response = self._client.search(
index=self.config.index_name,
body={
"fields": ["sample_id"],
"from": batch,
"query": {
"bool": {
"must": [
{"exists": {"field": "vector"}},
{"exists": {"field": "sample_id"}},
]
}
},
},
source=False,
size=batch_size,
)
for doc in response["hits"]["hits"]:
sample_id = doc["fields"]["sample_id"][0]
sample_or_label_id = doc["_id"]
sample_ids.append(sample_id)
label_ids.append(sample_or_label_id)
return sample_ids, label_ids
def _get_dimension(self):
if self.total_index_size == 0:
return None
if self.config.patches_field is not None:
embeddings, _, _ = self.get_embeddings(
label_ids=self._label_ids[:1]
)
else:
embeddings, _, _ = self.get_embeddings(
sample_ids=self._sample_ids[:1]
)
return embeddings.shape[1]
def _get_metric(self):
if self._metric is None:
try:
# We must ask ES rather than using `self.config.metric` because
# we may be working with a preexisting index
self._metric = self._client.indices.get_mapping(
index=self.config.index_name
)[self.config.index_name]["mappings"]["properties"]["vector"][
"similarity"
]
except:
logger.warning(
"Failed to infer similarity metric from index '%s'",
self.config.index_name,
)
return self._metric
def _index_exists(self):
if self.config.index_name is None:
return False
return self.config.index_name in self._get_index_names()
def _create_index(self, dimension):
metric = _SUPPORTED_METRICS[self.config.metric]
mappings = {
"properties": {
"vector": {
"type": "dense_vector",
"dims": dimension,
"index": "true",
"similarity": metric,
}
}
}
self._client.indices.create(
index=self.config.index_name, mappings=mappings
)
self._metric = metric
def _get_existing_ids(self, ids):
docs = [{"_index": self.config.index_name, "_id": i} for i in ids]
resp = self._client.mget(docs=docs)
return [d["_id"] for d in resp["docs"] if d["found"]]
def add_to_index(
self,
embeddings,
sample_ids,
label_ids=None,
overwrite=True,
allow_existing=True,
warn_existing=False,
reload=True,
batch_size=500,
):
if not self._index_exists():
self._create_index(embeddings.shape[1])
if label_ids is not None:
ids = label_ids
else:
ids = sample_ids
if warn_existing or not allow_existing or not overwrite:
existing_ids = self._get_existing_ids(ids)
num_existing = len(existing_ids)
if num_existing > 0:
if not allow_existing:
raise ValueError(
"Found %d IDs (eg %s) that already exist in the index"
% (num_existing, next(iter(existing_ids)))
)
if warn_existing:
if overwrite:
logger.warning(
"Overwriting %d IDs that already exist in the "
"index",
num_existing,
)
else:
logger.warning(
"Skipping %d IDs that already exist in the index",
num_existing,
)
else:
existing_ids = set()
if existing_ids and not overwrite:
del_inds = [i for i, _id in enumerate(ids) if _id in existing_ids]
embeddings = np.delete(embeddings, del_inds, axis=0)
sample_ids = np.delete(sample_ids, del_inds)
if label_ids is not None:
label_ids = np.delete(label_ids, del_inds)
if self._get_metric() == _SUPPORTED_METRICS["dotproduct"]:
embeddings /= np.linalg.norm(embeddings, axis=1)[:, np.newaxis]
embeddings = [e.tolist() for e in embeddings]
sample_ids = list(sample_ids)
if label_ids is not None:
ids = list(label_ids)
else:
ids = list(sample_ids)
for _embeddings, _ids, _sample_ids in zip(
fou.iter_batches(embeddings, batch_size),
fou.iter_batches(ids, batch_size),
fou.iter_batches(sample_ids, batch_size),
):
operations = []
for _e, _id, _sid in zip(_embeddings, _ids, _sample_ids):
operations.append(
{"index": {"_index": self.config.index_name, "_id": _id}}
)
operations.append({"sample_id": _sid, "vector": _e})
self._client.bulk(
index=self.config.index_name,
operations=operations,
refresh=True,
)
if reload:
self.reload()
def remove_from_index(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
reload=True,
):
if label_ids is not None:
ids = label_ids
else:
ids = sample_ids
if not allow_missing or warn_missing:
existing_ids = self._get_existing_ids(ids)
missing_ids = set(ids) - set(existing_ids)
num_missing = len(missing_ids)
if num_missing > 0:
if not allow_missing:
raise ValueError(
"Found %d IDs (eg %s) that are not present in the "
"index" % (num_missing, next(iter(missing_ids)))
)
if warn_missing:
logger.warning(
"Ignoring %d IDs that are not present in the index",
num_missing,
)
ids = existing_ids
operations = [
{"delete": {"_index": self.config.index_name, "_id": i}}
for i in ids
]
self._client.bulk(body=operations, refresh=True)
if reload:
self.reload()
def get_embeddings(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
):
if label_ids is not None:
if self.config.patches_field is None:
raise ValueError("This index does not support label IDs")
if sample_ids is not None:
logger.warning(
"Ignoring sample IDs when label IDs are provided"
)
if sample_ids is not None and self.config.patches_field is not None:
(
embeddings,
sample_ids,
label_ids,
missing_ids,
) = self._get_patch_embeddings_from_sample_ids(sample_ids)
elif self.config.patches_field is not None:
(
embeddings,
sample_ids,
label_ids,
missing_ids,
) = self._get_patch_embeddings_from_label_ids(label_ids)
else:
(
embeddings,
sample_ids,
label_ids,
missing_ids,
) = self._get_sample_embeddings(sample_ids)
num_missing_ids = len(missing_ids)
if num_missing_ids > 0:
if not allow_missing:
raise ValueError(
"Found %d IDs (eg %s) that do not exist in the index"
% (num_missing_ids, missing_ids[0])
)
if warn_missing:
logger.warning(
"Skipping %d IDs that do not exist in the index",
num_missing_ids,
)
embeddings = np.array(embeddings)
sample_ids = np.array(sample_ids)
if label_ids is not None:
label_ids = np.array(label_ids)
return embeddings, sample_ids, label_ids
def _parse_embeddings_response(self, response, label_id=True):
found_embeddings = []
found_sample_ids = []
found_label_ids = []
for r in response:
if r.get("found", True):
found_embeddings.append(r["_source"]["vector"])
if label_id:
found_sample_ids.append(r["_source"]["sample_id"])
found_label_ids.append(r["_id"])
else:
found_sample_ids.append(r["_id"])
return found_embeddings, found_sample_ids, found_label_ids
def _get_sample_embeddings(self, sample_ids, batch_size=1000):
found_embeddings = []
found_sample_ids = []
if sample_ids is None:
sample_ids, label_ids = self._get_index_ids(batch_size=batch_size)
for batch_ids in fou.iter_batches(sample_ids, batch_size):
response = self._client.mget(
index=self.config.index_name, ids=batch_ids, source=True
)
(
_found_embeddings,
_found_sample_ids,
_,
) = self._parse_embeddings_response(
response["docs"], label_id=False
)
found_embeddings += _found_embeddings
found_sample_ids += _found_sample_ids
missing_ids = list(set(sample_ids) - set(found_sample_ids))
return found_embeddings, found_sample_ids, None, missing_ids
def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=1000):
found_embeddings = []
found_sample_ids = []
found_label_ids = []
if label_ids is None:
sample_ids, label_ids = self._get_index_ids(batch_size=batch_size)
for batch_ids in fou.iter_batches(label_ids, batch_size):
response = self._client.mget(
index=self.config.index_name, ids=batch_ids, source=True
)
(
_found_embeddings,
_found_sample_ids,
_found_label_ids,
) = self._parse_embeddings_response(response["docs"])
found_embeddings += _found_embeddings
found_sample_ids += _found_sample_ids
found_label_ids += _found_label_ids
missing_ids = list(set(label_ids) - set(found_label_ids))
return found_embeddings, found_sample_ids, found_label_ids, missing_ids
def _get_patch_embeddings_from_sample_ids(
self, sample_ids, batch_size=100
):
found_embeddings = []
found_sample_ids = []
found_label_ids = []
if sample_ids is None:
sample_ids, label_ids = self._get_index_ids(batch_size=batch_size)
for batch_ids in fou.iter_batches(sample_ids, batch_size):
response = self._client.search(
index=self.config.index_name,
body={"query": {"terms": {"sample_id": sample_ids}}},
)
(
_found_embeddings,
_found_sample_ids,
_found_label_ids,
) = self._parse_embeddings_response(response["hits"]["hits"])
found_embeddings += _found_embeddings
found_sample_ids += _found_sample_ids
found_label_ids += _found_label_ids
missing_ids = list(set(sample_ids) - set(found_sample_ids))
return found_embeddings, found_sample_ids, found_label_ids, missing_ids
def cleanup(self):
self._client.indices.delete(
index=self.config.index_name, ignore_unavailable=True
)
def _kneighbors(
self,
query=None,
k=None,
reverse=False,
aggregation=None,
return_dists=False,
):
if query is None:
raise ValueError(
"Elasticsearch does not support full index neighbors"
)
if reverse is True:
raise ValueError(
"Elasticsearch does not support least similarity queries"
)
if aggregation not in (None, "mean"):
raise ValueError(
f"Elasticsearch does not support {aggregation} aggregation"
)
query = self._parse_neighbors_query(query)
if aggregation == "mean" and query.ndim == 2:
query = query.mean(axis=0)
single_query = query.ndim == 1
if single_query:
query = [query]
if self.has_view:
if self.config.patches_field is not None:
index_ids = self.current_label_ids
else:
index_ids = self.current_sample_ids
_filter = {"terms": {"_id": list(index_ids)}}
else:
_filter = None
sample_ids = []
label_ids = [] if self.config.patches_field is not None else None
dists = []
for q in query:
if self._get_metric() == _SUPPORTED_METRICS["dotproduct"]:
q /= np.linalg.norm(q)
knn = {
"field": "vector",
"query_vector": q.tolist(),
"k": k,
"num_candidates": 10 * k,
}
if _filter:
knn["filter"] = _filter
source = self.config.patches_field is not None
response = self._client.search(
index=self.config.index_name,
knn=knn,
size=k,
source=source,
)
if self.config.patches_field is not None:
sample_ids.append(
[
r["_source"]["sample_id"]
for r in response["hits"]["hits"]
]
)
label_ids.append([r["_id"] for r in response["hits"]["hits"]])
else:
sample_ids.append([r["_id"] for r in response["hits"]["hits"]])
if return_dists:
dists.append([r["_score"] for r in response["hits"]["hits"]])
if single_query:
sample_ids = sample_ids[0]
if label_ids is not None:
label_ids = label_ids[0]
if return_dists:
dists = dists[0]
if return_dists:
return sample_ids, label_ids, dists
return sample_ids, label_ids
def _parse_neighbors_query(self, query):
if etau.is_str(query):
query_ids = [query]
single_query = True
else:
query = np.asarray(query)
# Query by vector(s)
if np.issubdtype(query.dtype, np.number):
return query
query_ids = list(query)
single_query = False
# Query by ID(s)
response = self._client.mget(
index=self.config.index_name, ids=query_ids, source=True
)
query = np.array(
[r["_source"]["vector"] for r in response["docs"] if r["found"]]
)
if query.size == 0:
raise ValueError(
"Query IDs %s were not found in the index" % query_ids
)
if single_query:
query = query[0, :]
return query
@classmethod
def _from_dict(cls, d, samples, config, brain_key):
return cls(samples, config, brain_key)
================================================
FILE: fiftyone/brain/internal/core/hardness.py
================================================
"""
Hardness methods.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
import numpy as np
from scipy.special import softmax
from scipy.stats import entropy
import fiftyone.core.brain as fob
import fiftyone.core.labels as fol
import fiftyone.core.media as fom
import fiftyone.core.utils as fou
import fiftyone.core.validation as fov
logger = logging.getLogger(__name__)
_ALLOWED_TYPES = (fol.Classification, fol.Classifications)
def compute_hardness(samples, label_field, hardness_field, progress):
"""See ``fiftyone/brain/__init__.py``."""
#
# Algorithm
#
# Hardness is computed directly as the entropy of the logits
#
fov.validate_collection(samples)
fov.validate_collection_label_fields(samples, label_field, _ALLOWED_TYPES)
if samples.media_type == fom.VIDEO:
hardness_field, _ = samples._handle_frame_field(hardness_field)
config = HardnessConfig(label_field, hardness_field)
brain_key = hardness_field
brain_method = config.build()
brain_method.ensure_requirements()
brain_method.register_run(samples, brain_key, cleanup=False)
brain_method.register_samples(samples)
view = samples.select_fields(label_field)
processing_frames = samples._is_frame_field(label_field)
logger.info("Computing hardness...")
for sample in view.iter_samples(progress=progress):
if processing_frames:
images = sample.frames.values()
else:
images = [sample]
sample_hardness = []
for image in images:
hardness = brain_method.process_image(image)
if hardness is not None:
sample_hardness.append(hardness)
if processing_frames:
image[hardness_field] = hardness
if sample_hardness:
sample[hardness_field] = np.max(sample_hardness)
else:
sample[hardness_field] = None
sample.save()
brain_method.save_run_results(samples, brain_key, None)
logger.info("Hardness computation complete")
# @todo move to `fiftyone/brain/hardness.py`
class HardnessConfig(fob.BrainMethodConfig):
def __init__(self, label_field, hardness_field, **kwargs):
self.label_field = label_field
self.hardness_field = hardness_field
super().__init__(**kwargs)
@property
def type(self):
return "mistakenness"
@property
def method(self):
return "entropy"
class Hardness(fob.BrainMethod):
def __init__(self, config):
super().__init__(config)
self.label_field = None
def ensure_requirements(self):
pass
def register_samples(self, samples):
self.label_field, _ = samples._handle_frame_field(
self.config.label_field
)
def process_image(self, sample_or_frame):
label = _get_data(sample_or_frame, self.label_field)
if label is None:
return None
return entropy(softmax(np.asarray(label.logits)))
def get_fields(self, samples, brain_key):
label_field = self.config.label_field
hardness_field = self.config.hardness_field
fields = [label_field, hardness_field]
if samples._is_frame_field(label_field):
fields.append(samples._FRAMES_PREFIX + hardness_field)
return fields
def cleanup(self, samples, brain_key):
label_field = self.config.label_field
hardness_field = self.config.hardness_field
samples._dataset.delete_sample_fields(hardness_field, error_level=1)
if samples._is_frame_field(label_field):
samples._dataset.delete_frame_fields(hardness_field, error_level=1)
def _validate_run(self, samples, brain_key, existing_info):
self._validate_fields_match(brain_key, "hardness_field", existing_info)
def _get_data(sample, label_field):
label = sample[label_field]
if label is None:
return None
if label.logits is None:
raise ValueError(
"Sample '%s' field '%s' has no logits" % (sample.id, label_field)
)
return label
================================================
FILE: fiftyone/brain/internal/core/lancedb.py
================================================
"""
LanceDB similarity backend.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
import numpy as np
import eta.core.utils as etau
import fiftyone.core.storage as fos
import fiftyone.core.utils as fou
import fiftyone.brain.internal.core.utils as fbu
from fiftyone.brain.similarity import (
SimilarityConfig,
Similarity,
SimilarityIndex,
)
lancedb = fou.lazy_import("lancedb")
pa = fou.lazy_import("pyarrow")
_SUPPORTED_METRICS = {
"cosine": "cosine",
"euclidean": "l2",
}
logger = logging.getLogger(__name__)
class LanceDBSimilarityConfig(SimilarityConfig):
"""Configuration for a LanceDB similarity instance.
Args:
table_name (None): the name of the LanceDB table to use. If none is
provided, a new table will be created
metric ("cosine"): the embedding distance metric to use when creating a
new index. Supported values are ``("cosine", "euclidean")``
uri ("/tmp/lancedb"): the database URI to use
**kwargs: keyword arguments for :class:`SimilarityConfig`
"""
def __init__(
self,
table_name=None,
metric="cosine",
uri="/tmp/lancedb",
**kwargs,
):
if metric not in _SUPPORTED_METRICS:
raise ValueError(
"Unsupported metric '%s'. Supported values are %s"
% (metric, tuple(_SUPPORTED_METRICS.keys()))
)
super().__init__(**kwargs)
self.table_name = table_name
self.metric = metric
# store privately so these aren't serialized
self._uri = uri
@property
def method(self):
return "lancedb"
@property
def uri(self):
return self._uri
@uri.setter
def uri(self, value):
self._uri = value
@property
def max_k(self):
return None
@property
def supports_least_similarity(self):
return False
@property
def supported_aggregations(self):
return ("mean",)
def load_credentials(self, uri=None):
self._load_parameters(uri=uri)
class LanceDBSimilarity(Similarity):
"""LanceDB similarity factory.
Args:
config: a :class:`LanceDBSimilarityConfig`
"""
def ensure_requirements(self):
fou.ensure_package("lancedb")
def ensure_usage_requirements(self):
fou.ensure_package("lancedb")
def initialize(self, samples, brain_key):
return LanceDBSimilarityIndex(
samples, self.config, brain_key, backend=self
)
class LanceDBSimilarityIndex(SimilarityIndex):
"""Class for interacting with LanceDB similarity indexes.
Args:
samples: the :class:`fiftyone.core.collections.SampleCollection` used
config: the :class:`LanceDBSimilarityConfig` used
brain_key: the brain key
backend (None): a :class:`LanceDBSimilarity` instance
"""
def __init__(self, samples, config, brain_key, backend=None):
super().__init__(samples, config, brain_key, backend=backend)
self._table = None
self._db = None
self._initialize()
def _initialize(self):
try:
db = lancedb.connect(self.config.uri)
except Exception as e:
raise ValueError(
"Failed to connect to LanceDB backend at URI '%s'. Refer to "
"https://docs.voxel51.com/integrations/lancedb.html for more "
"information" % self.config.uri
) from e
table_names = db.table_names()
if self.config.table_name is None:
root = "fiftyone-" + fou.to_slug(self.samples._root_dataset.name)
table_name = fbu.get_unique_name(root, table_names)
self.config.table_name = table_name
self.save_config()
if self.config.table_name in table_names:
table = db.open_table(self.config.table_name)
else:
table = None
self._db = db
self._table = table
@property
def table(self):
"""The ``lancedb.LanceTable`` instance for this index."""
return self._table
@property
def total_index_size(self):
if self._table is None:
return 0
return len(self._table)
def add_to_index(
self,
embeddings,
sample_ids,
label_ids=None,
overwrite=True,
allow_existing=True,
warn_existing=False,
reload=True,
):
if self._table is None:
pa_table = pa.Table.from_arrays(
[[], [], []], names=["id", "sample_id", "vector"]
)
else:
pa_table = self._table.to_arrow()
if label_ids is not None:
ids = label_ids
else:
ids = sample_ids
if warn_existing or not allow_existing or not overwrite:
existing_ids = set(pa_table["id"].to_pylist()) & set(ids)
num_existing = len(existing_ids)
if num_existing > 0:
if not allow_existing:
raise ValueError(
"Found %d IDs (eg %s) that already exist in the index"
% (num_existing, next(iter(existing_ids)))
)
if warn_existing:
if overwrite:
logger.warning(
"Overwriting %d IDs that already exist in the "
"index",
num_existing,
)
else:
logger.warning(
"Skipping %d IDs that already exist in the index",
num_existing,
)
else:
existing_ids = set()
if existing_ids and not overwrite:
del_inds = [i for i, _id in enumerate(ids) if _id in existing_ids]
embeddings = np.delete(embeddings, del_inds, axis=0)
sample_ids = np.delete(sample_ids, del_inds)
if label_ids is not None:
label_ids = np.delete(label_ids, del_inds)
if label_ids is not None:
ids = list(label_ids)
else:
ids = list(sample_ids)
dim = embeddings.shape[1]
if self._table:
prev_embeddings = np.concatenate(
pa_table["vector"].to_numpy()
).reshape(-1, dim)
embeddings = np.concatenate([prev_embeddings, embeddings])
ids = pa_table["id"].to_pylist() + ids
sample_ids = pa_table["sample_id"].to_pylist() + sample_ids
embeddings = pa.array(embeddings.reshape(-1), type=pa.float32())
embeddings = pa.FixedSizeListArray.from_arrays(embeddings, dim)
sample_ids = list(sample_ids)
pa_table = pa.Table.from_arrays(
[ids, sample_ids, embeddings], names=["id", "sample_id", "vector"]
)
self._table = self._db.create_table(
self.config.table_name, pa_table, mode="overwrite"
)
if reload:
self.reload()
def remove_from_index(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
reload=True,
):
if label_ids is not None:
ids = label_ids
else:
ids = sample_ids
if not allow_missing or warn_missing:
existing_ids = list(self._index.fetch(ids).vectors.keys())
missing_ids = set(ids) - set(existing_ids)
num_missing = len(missing_ids)
if num_missing > 0:
if not allow_missing:
raise ValueError(
"Found %d IDs (eg %s) that are not present in the "
"index" % (num_missing, next(iter(missing_ids)))
)
if warn_missing:
logger.warning(
"Ignoring %d IDs that are not present in the index",
num_missing,
)
ids = existing_ids
df = self._table.to_pandas()
df = df[~df["id"].isin(ids)]
self._table = self._db.create_table(
self.config.table_name, df, mode="overwrite"
)
if reload:
self.reload()
def get_embeddings(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
):
if label_ids is not None:
if self.config.patches_field is None:
raise ValueError("This index does not support label IDs")
if sample_ids is not None:
logger.warning(
"Ignoring sample IDs when label IDs are provided"
)
df = self._table.to_pandas()
found_embeddings = []
found_sample_ids = []
found_label_ids = []
missing_ids = []
if sample_ids is not None and self.config.patches_field is not None:
df.set_index("sample_id", drop=False, inplace=True)
if not etau.is_container(sample_ids):
sample_ids = [sample_ids]
for sample_id in sample_ids:
if sample_id in df.index:
found_embeddings.append(df.loc[sample_id]["vector"])
found_sample_ids.append(sample_id)
found_label_ids.append(df.loc[sample_id]["id"])
else:
missing_ids.append(sample_id)
elif self.config.patches_field is not None:
df.set_index("id", drop=False, inplace=True)
if label_ids is None:
label_ids = list(df.index)
elif not etau.is_container(label_ids):
label_ids = [label_ids]
for label_id in label_ids:
if label_id in df.index:
found_embeddings.append(df.loc[label_id]["vector"])
found_sample_ids.append(df.loc[label_id]["sample_id"])
found_label_ids.append(label_id)
else:
missing_ids.append(label_id)
else:
df.set_index("id", drop=False, inplace=True)
if sample_ids is None:
sample_ids = list(df.index)
elif not etau.is_container(sample_ids):
sample_ids = [sample_ids]
for sample_id in sample_ids:
if sample_id in df.index:
found_embeddings.append(df.loc[sample_id]["vector"])
found_sample_ids.append(sample_id)
else:
missing_ids.append(sample_id)
num_missing_ids = len(missing_ids)
if num_missing_ids > 0:
if not allow_missing:
raise ValueError(
"Found %d IDs (eg %s) that do not exist in the index"
% (num_missing_ids, missing_ids[0])
)
if warn_missing:
logger.warning(
"Skipping %d IDs that do not exist in the index",
num_missing_ids,
)
embeddings = np.array(found_embeddings)
sample_ids = np.array(found_sample_ids)
if label_ids is not None:
label_ids = np.array(found_label_ids)
return embeddings, sample_ids, label_ids
def cleanup(self):
if self._db is None:
return
for tbl in (
self.config.table_name,
self.config.table_name + "_filter",
):
if tbl in self._db.table_names():
self._db.drop_table(tbl)
self._table = None
def _kneighbors(
self,
query=None,
k=None,
reverse=False,
aggregation=None,
return_dists=False,
):
if query is None:
raise ValueError("LanceDB does not support full index neighbors")
if reverse is True:
raise ValueError(
"LanceDB does not support least similarity queries"
)
if aggregation not in (None, "mean"):
raise ValueError(
f"LanceDB does not support {aggregation} aggregation"
)
if k is None:
k = self.index_size
query = self._parse_neighbors_query(query)
if aggregation == "mean" and query.ndim == 2:
query = query.mean(axis=0)
single_query = query.ndim == 1
if single_query:
query = [query]
table = self._table
if self.has_view:
if self.config.patches_field is not None:
index_ids = list(self.current_label_ids)
else:
index_ids = list(self.current_sample_ids)
df = table.to_pandas()
df = df[df["id"].isin(index_ids)]
table = self._db.create_table(
self.config.table_name + "_filter", df, mode="overwrite"
)
metric = _SUPPORTED_METRICS[self.config.metric]
sample_ids = []
label_ids = [] if self.config.patches_field is not None else None
dists = []
for q in query:
results = table.search(q).metric(metric).limit(k).to_df()
if self.config.patches_field is not None:
sample_ids.append(results.sample_id.tolist())
label_ids.append(results.id.tolist())
else:
sample_ids.append(results.id.tolist())
if return_dists:
dists.append(results._distance.tolist())
if single_query:
sample_ids = sample_ids[0]
if label_ids is not None:
label_ids = label_ids[0]
if return_dists:
dists = dists[0]
if return_dists:
return sample_ids, label_ids, dists
return sample_ids, label_ids
def _parse_neighbors_query(self, query):
if etau.is_str(query):
query_ids = [query]
single_query = True
else:
query = np.asarray(query)
# Query by vector(s)
if np.issubdtype(query.dtype, np.number):
return query
query_ids = list(query)
single_query = False
# Query by ID(s)
df = self._table.to_pandas()
df = df[df["id"].isin(query_ids)]
query = np.array([v for v in df["vector"]])
if query.size == 0:
raise ValueError(
"Query IDs %s were not found in the index" % query_ids
)
if single_query:
query = query[0, :]
return query
@classmethod
def _from_dict(cls, d, samples, config, brain_key):
return cls(samples, config, brain_key)
================================================
FILE: fiftyone/brain/internal/core/leaky_splits.py
================================================
"""
Finds leaks between splits.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
import eta.core.utils as etau
import fiftyone.core.brain as fob
import fiftyone.core.fields as fof
import fiftyone.core.validation as fov
import fiftyone.zoo as foz
from fiftyone import ViewField as F
import fiftyone.brain as fb
import fiftyone.brain.similarity as fbs
import fiftyone.brain.internal.core.utils as fbu
logger = logging.getLogger(__name__)
_DEFAULT_MODEL = "resnet18-imagenet-torch"
_DEFAULT_BATCH_SIZE = None
def compute_leaky_splits(
samples,
splits,
threshold=None,
roi_field=None,
embeddings=None,
similarity_index=None,
model=None,
model_kwargs=None,
force_square=False,
alpha=None,
batch_size=None,
num_workers=None,
skip_failures=True,
progress=None,
):
"""See ``fiftyone/brain/__init__.py``."""
fov.validate_collection(samples)
if etau.is_str(embeddings):
embeddings_field, embeddings_exist = fbu.parse_data_field(
samples,
embeddings,
data_type="embeddings",
)
embeddings = None
else:
embeddings_field = None
embeddings_exist = None
if etau.is_str(similarity_index):
similarity_index = samples.load_brain_results(similarity_index)
if (
model is None
and embeddings is None
and similarity_index is None
and not embeddings_exist
):
model = foz.load_zoo_model(_DEFAULT_MODEL)
if batch_size is None:
batch_size = _DEFAULT_BATCH_SIZE
config = LeakySplitsConfig(
splits=splits,
embeddings_field=embeddings_field,
similarity_index=similarity_index,
model=model,
model_kwargs=model_kwargs,
)
brain_method = config.build()
brain_method.ensure_requirements()
if similarity_index is None:
similarity_index = fb.compute_similarity(
samples,
backend="sklearn",
roi_field=roi_field,
embeddings=embeddings_field or embeddings,
model=model,
model_kwargs=model_kwargs,
force_square=force_square,
alpha=alpha,
batch_size=batch_size,
num_workers=num_workers,
skip_failures=skip_failures,
progress=progress,
)
elif not isinstance(similarity_index, fbs.DuplicatesMixin):
raise ValueError(
"This method only supports similarity indexes that implement the "
"%s mixin" % fbs.DuplicatesMixin
)
split_views = _to_split_views(samples, splits)
index = brain_method.initialize(samples, similarity_index, split_views)
if threshold is not None:
index.find_leaks(threshold)
return index
class LeakySplitsConfig(fob.BrainMethodConfig):
def __init__(
self,
splits=None,
embeddings_field=None,
similarity_index=None,
model=None,
model_kwargs=None,
**kwargs,
):
if isinstance(splits, dict):
splits = None
if similarity_index is not None and not etau.is_str(similarity_index):
similarity_index = similarity_index.key
if model is not None and not etau.is_str(model):
model = etau.get_class_name(model)
self.splits = splits
self.embeddings_field = embeddings_field
self.similarity_index = similarity_index
self.model = model
self.model_kwargs = model_kwargs
super().__init__(**kwargs)
@property
def type(self):
return "leakage"
@property
def method(self):
return "similarity"
class LeakySplits(fob.BrainMethod):
def initialize(self, samples, similarity_index, split_views):
return LeakySplitsIndex(
samples, self.config, similarity_index, split_views
)
def get_fields(self, samples, _):
fields = []
if self.config.embeddings_field is not None:
fields.append(self.config.embeddings_field)
return fields
class LeakySplitsIndex(fob.BrainResults):
def __init__(self, samples, config, similarity_index, split_views):
super().__init__(samples, config, None)
self._similarity_index = similarity_index
self._split_views = split_views
self._id2split = None
self._thresh = None
self._leak_ids = None
self._initialize()
@property
def split_views(self):
"""A dict mapping split names to views."""
return self._split_views
@property
def thresh(self):
"""The threshold used by the last call to :meth:`find_leaks`."""
return self._thresh
@property
def leak_ids(self):
"""The list of leaky sample IDs from the last call to
:meth:`find_leaks`.
"""
return self._leak_ids
def find_leaks(self, thresh):
"""Scans the index for leaks between splits.
Args:
thresh: the similarity distance threshold to use when detecting
potential leaks
"""
if thresh == self._thresh:
return
# Find duplicates
self._thresh = thresh
if self._similarity_index.thresh != self._thresh:
self._similarity_index.find_duplicates(self._thresh)
# Filter duplicates to just those with neighbors in different splits
leak_ids = []
neighbors_map = self._similarity_index.neighbors_map
for sample_id, neighbors in neighbors_map.items():
_leak_ids = []
sample_split = self._id2split.get(sample_id, None)
if sample_split is None:
continue
for n in neighbors:
neighbor_id = n[0]
neighbor_split = self._id2split.get(neighbor_id, None)
if neighbor_split is None:
continue
if neighbor_split != sample_split:
_leak_ids.append(neighbor_id)
if _leak_ids:
leak_ids.append(sample_id)
leak_ids.extend(_leak_ids)
self._leak_ids = leak_ids
def leaks_view(self):
"""Returns a view containg all potential leaks generated by the last
call to :meth:`find_leaks`.
Returns:
a :class:`fiftyone.core.view.DatasetView`
"""
if self._thresh is None:
raise ValueError("You must first call `find_leaks()`")
return self.samples.select(self._leak_ids, ordered=True)
def leaks_for_sample(self, sample_or_id):
"""Returns a view that contains all leaks related to the given sample.
The given sample is always first in the returned view, followed by any
related leaks.
Args:
sample_or_id: a :class:`fiftyone.core.sample.Sample` or sample ID
Returns:
a :class:`fiftyone.core.view.DatasetView`
"""
if self._thresh is None:
raise ValueError("You must first call `find_leaks()`")
if etau.is_str(sample_or_id):
sample_id = sample_or_id
else:
sample_id = sample_or_id.id
sample_split = self._id2split[sample_id]
neighbors_map = self._similarity_index.neighbors_map
leak_ids = []
if sample_id in neighbors_map.keys():
neighbors = neighbors_map[sample_id]
leak_ids = [
n[0] for n in neighbors if self._id2split[n[0]] != sample_split
]
else:
for unique_id, neighbors in neighbors_map.items():
if sample_id in [n[0] for n in neighbors]:
leak_ids = [
n[0]
for n in neighbors
if self._id2split[n[0]] != sample_split
]
leak_ids.append(unique_id)
break
return self.samples.select([sample_id] + leak_ids, ordered=True)
def no_leaks_view(self, view=None):
"""Returns a view with leaks excluded.
Args:
view (None): an optional :class:`fiftyone.core.view.DatasetView`
from which to exclude. By default, :meth:`samples` is used
"""
if self._thresh is None:
raise ValueError("You must first call `find_leaks()`")
if view is None:
view = self.samples
return view.exclude(self._leak_ids)
def tag_leaks(self, tag="leak"):
"""Tags all potential leaks in :meth:`leaks_view` with the given tag.
Args:
tag ("leak"): the tag string to apply
"""
self.leaks_view().tag_samples(tag)
def _initialize(self):
id2split = {}
split_ids = {}
for split_name, split_view in self.split_views.items():
sample_ids = set(split_view.values("id"))
split_ids[split_name] = sample_ids
id2split.update({sid: split_name for sid in sample_ids})
# Check for overlapping splits
split_names = list(split_ids.keys())
for idx, split1 in enumerate(split_names):
for split2 in split_names[idx + 1 :]:
overlap = split_ids[split1] & split_ids[split2]
if overlap:
logger.warning(
"The '%s' and '%s' splits contain %d overlapping samples."
"Use dataset.match_tags('%s').match_tags('%s') to "
"identify them",
split1,
split2,
len(overlap),
split1,
split2,
)
# Check for samples not in index
index_ids = self._similarity_index.sample_ids
if index_ids is not None:
index_ids = set(index_ids)
all_split_ids = set(id2split.keys())
missing_ids = all_split_ids - index_ids
if missing_ids:
logger.warning(
"The provided splits contain %d samples (eg '%s') that "
"are not present in the index",
len(missing_ids),
next(iter(missing_ids)),
)
self._id2split = id2split
def _to_split_views(samples, splits):
if etau.is_container(splits):
return {tag: samples.match_tags(tag) for tag in splits}
if isinstance(splits, str):
field = samples.get_field(splits)
if isinstance(field, fof.ListField):
return {
value: samples.exists(splits).match(F(splits).contains(value))
for value in samples.distinct(splits)
}
else:
return {
value: samples.match(F(splits) == value)
for value in samples.distinct(splits)
}
================================================
FILE: fiftyone/brain/internal/core/milvus.py
================================================
"""
Milvus similarity backend.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
import numpy as np
from uuid import uuid4
import eta.core.utils as etau
import fiftyone.core.utils as fou
from fiftyone.brain.similarity import (
SimilarityConfig,
Similarity,
SimilarityIndex,
)
import fiftyone.brain.internal.core.utils as fbu
pymilvus = fou.lazy_import("pymilvus")
logger = logging.getLogger(__name__)
_SUPPORTED_METRICS = {
"cosine": "COSINE",
"dotproduct": "IP",
"euclidean": "L2",
}
class MilvusSimilarityConfig(SimilarityConfig):
"""Configuration for the Milvus similarity backend.
Args:
collection_name (None): the name of a Milvus collection to use or
create. If none is provided, a new collection will be created
metric ("dotproduct"): the embedding distance metric to use when
creating a new index. Supported values are
``("cosine", "dotproduct", "euclidean")``
consistency_level ("Session"): the consistency level to use. Supported
values are ``("Session", "Strong", "Bounded", "Eventually")``
uri (None): a full Milvus server address to use, like
``"http://localhost:19530"``,
``"tcp:localhost:19530"``, or
``"https://ok.s3.south.com:19530"``
user (None): a username to use
password (None): a password to use
secure (None): whether to enable TLS (True)
token (None): a header token for RPC calls
db_name (None): a database name for the connection
client_key_path (None): a client.key path for TLS two-way
client_pem_path (None): a client.pem path for TLS two-way
ca_pem_path (None): a ca.pem path for TLS two-way
server_pem_path (None): a server.pem path for TLS one-way
server_name (None): the server name, for TLS
**kwargs: keyword arguments for
:class:`fiftyone.brain.similarity.SimilarityConfig`
"""
def __init__(
self,
collection_name=None,
metric="dotproduct",
consistency_level="Session",
uri=None,
user=None,
password=None,
secure=None,
token=None,
db_name=None,
client_key_path=None,
client_pem_path=None,
ca_pem_path=None,
server_pem_path=None,
server_name=None,
**kwargs,
):
if metric not in _SUPPORTED_METRICS:
raise ValueError(
"Unsupported metric '%s'. Supported values are %s"
% (metric, tuple(_SUPPORTED_METRICS.keys()))
)
super().__init__(**kwargs)
self.collection_name = collection_name
self.metric = metric
self.consistency_level = consistency_level
# store privately so these aren't serialized
self._uri = uri
self._user = user
self._password = password
self._secure = secure
self._token = token
self._db_name = db_name
self._client_key_path = client_key_path
self._client_pem_path = client_pem_path
self._ca_pem_path = ca_pem_path
self._server_pem_path = server_pem_path
self._server_name = server_name
@property
def method(self):
return "milvus"
@property
def uri(self):
return self._uri
@uri.setter
def uri(self, value):
self._uri = value
@property
def user(self):
return self._user
@user.setter
def user(self, value):
self._user = value
@property
def password(self):
return self._password
@password.setter
def password(self, value):
self._password = value
@property
def secure(self):
return self._secure
@secure.setter
def secure(self, value):
self._secure = value
@property
def token(self):
return self._token
@token.setter
def token(self, value):
self._token = value
@property
def db_name(self):
return self._db_name
@db_name.setter
def db_name(self, value):
self._db_name = value
@property
def client_key_path(self):
return self._client_key_path
@client_key_path.setter
def client_key_path(self, value):
self._client_key_path = value
@property
def client_pem_path(self):
return self._client_pem_path
@client_pem_path.setter
def client_pem_path(self, value):
self._client_pem_path = value
@property
def ca_pem_path(self):
return self._ca_pem_path
@ca_pem_path.setter
def ca_pem_path(self, value):
self._ca_pem_path = value
@property
def server_pem_path(self):
return self._server_pem_path
@server_pem_path.setter
def server_pem_path(self, value):
self._server_pem_path = value
@property
def server_name(self):
return self._server_name
@server_name.setter
def server_name(self, value):
self._server_name = value
@property
def max_k(self):
return 16384
@property
def supports_least_similarity(self):
return False
@property
def supported_aggregations(self):
return ("mean",)
@property
def index_params(self):
return {
"metric_type": _SUPPORTED_METRICS[self.metric],
"index_type": "HNSW",
"params": {"M": 8, "efConstruction": 64},
}
@property
def search_params(self):
return {
"HNSW": {
"metric_type": _SUPPORTED_METRICS[self.metric],
"params": {"ef": 10},
},
}
def load_credentials(
self,
uri=None,
user=None,
password=None,
secure=None,
token=None,
db_name=None,
client_key_path=None,
client_pem_path=None,
ca_pem_path=None,
server_pem_path=None,
server_name=None,
):
self._load_parameters(
uri=uri,
user=user,
password=password,
secure=secure,
token=token,
db_name=db_name,
client_key_path=client_key_path,
client_pem_path=client_pem_path,
ca_pem_path=ca_pem_path,
server_pem_path=server_pem_path,
server_name=server_name,
)
class MilvusSimilarity(Similarity):
"""Milvus similarity factory.
Args:
config: a :class:`MilvusSimilarityConfig`
"""
def ensure_requirements(self):
fou.ensure_package("pymilvus")
def ensure_usage_requirements(self):
fou.ensure_package("pymilvus")
def initialize(self, samples, brain_key):
return MilvusSimilarityIndex(
samples, self.config, brain_key, backend=self
)
class MilvusSimilarityIndex(SimilarityIndex):
"""Class for interacting with Milvus similarity indexes.
Args:
samples: the :class:`fiftyone.core.collections.SampleCollection` used
config: the :class:`MilvusSimilarityConfig` used
brain_key: the brain key
backend (None): a :class:`MilvusSimilarity` instance
"""
def __init__(self, samples, config, brain_key, backend=None):
super().__init__(samples, config, brain_key, backend=backend)
self._alias = None
self._collection = None
self._initialize()
def _initialize(self):
kwargs = {}
for key in (
"uri",
"user",
"password",
"secure",
"token",
"db_name",
"client_key_path",
"client_pem_path",
"ca_pem_path",
"server_pem_path",
"server_name",
):
value = getattr(self.config, key, None)
if value is not None:
kwargs[key] = value
alias = uuid4().hex if kwargs else "default"
try:
pymilvus.connections.connect(alias=alias, **kwargs)
except pymilvus.MilvusException as e:
raise ValueError(
"Failed to connect to Milvus backend at URI '%s'. Refer to "
"https://docs.voxel51.com/integrations/milvus.html for more "
"information" % self.config.uri
) from e
collection_names = pymilvus.utility.list_collections(using=alias)
if self.config.collection_name is None:
# Milvus only supports numbers, letters and underscores
root = "fiftyone-" + fou.to_slug(self.samples._root_dataset.name)
root = root.replace("-", "_")
collection_name = fbu.get_unique_name(root, collection_names)
collection_name = collection_name.replace("-", "_")
self.config.collection_name = collection_name
self.save_config()
if self.config.collection_name in collection_names:
collection = pymilvus.Collection(
self.config.collection_name, using=alias
)
collection.load()
else:
collection = None
self._alias = alias
self._collection = collection
def _create_collection(self, dimension):
schema = pymilvus.CollectionSchema(
[
pymilvus.FieldSchema(
"pk",
pymilvus.DataType.VARCHAR,
is_primary=True,
auto_id=False,
max_length=64000,
),
pymilvus.FieldSchema(
"vector", pymilvus.DataType.FLOAT_VECTOR, dim=dimension
),
pymilvus.FieldSchema(
"sample_id", pymilvus.DataType.VARCHAR, max_length=64000
),
]
)
collection = pymilvus.Collection(
self.config.collection_name,
schema,
consistency_level=self.config.consistency_level,
using=self._alias,
)
collection.create_index(
"vector", index_params=self.config.index_params
)
collection.load()
self._collection = collection
@property
def collection(self):
"""The ``pymilvus.Collection`` instance for this index."""
return self._collection
@property
def total_index_size(self):
if self._collection is None:
return 0
return self._collection.num_entities
def add_to_index(
self,
embeddings,
sample_ids,
label_ids=None,
overwrite=True,
allow_existing=True,
warn_existing=False,
reload=True,
batch_size=100,
):
if self._collection is None:
self._create_collection(embeddings.shape[1])
if label_ids is not None:
ids = label_ids
else:
ids = sample_ids
if warn_existing or not allow_existing or not overwrite:
existing_ids = self._get_existing_ids(ids)
num_existing = len(existing_ids)
if num_existing > 0:
if not allow_existing:
raise ValueError(
"Found %d IDs (eg %s) that already exist in the index"
% (num_existing, next(iter(existing_ids)))
)
if warn_existing:
if overwrite:
logger.warning(
"Overwriting %d IDs that already exist in the "
"index",
num_existing,
)
else:
logger.warning(
"Skipping %d IDs that already exist in the index",
num_existing,
)
else:
existing_ids = set()
if existing_ids and not overwrite:
del_inds = [i for i, _id in enumerate(ids) if _id in existing_ids]
embeddings = np.delete(embeddings, del_inds, axis=0)
sample_ids = np.delete(sample_ids, del_inds)
if label_ids is not None:
label_ids = np.delete(label_ids, del_inds)
elif existing_ids and overwrite:
self._delete_ids(existing_ids)
embeddings = [e.tolist() for e in embeddings]
sample_ids = list(sample_ids)
ids = list(ids)
for _embeddings, _ids, _sample_ids in zip(
fou.iter_batches(embeddings, batch_size),
fou.iter_batches(ids, batch_size),
fou.iter_batches(sample_ids, batch_size),
):
insert_data = [
list(_ids),
list(_embeddings),
list(_sample_ids),
]
self._collection.insert(insert_data)
self._collection.flush()
if reload:
self.reload()
def _get_existing_ids(self, ids):
ids = ['"' + str(entry) + '"' for entry in ids]
expr = f"""pk in [{','.join(ids)}]"""
return self._collection.query(expr)
def _delete_ids(self, ids):
ids = ['"' + str(entry) + '"' for entry in ids]
expr = f"""pk in [{','.join(ids)}]"""
self._collection.delete(expr)
self._collection.flush()
def _get_embeddings(self, ids):
ids = ['"' + str(entry) + '"' for entry in ids]
expr = f"""pk in [{','.join(ids)}]"""
return self._collection.query(
expr, output_fields=["pk", "sample_id", "vector"]
)
def remove_from_index(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
reload=True,
):
if label_ids is not None:
ids = label_ids
else:
ids = sample_ids
if not allow_missing or warn_missing:
existing_ids = self._get_existing_ids(ids)
missing_ids = set(ids) - set(existing_ids)
num_missing = len(missing_ids)
if num_missing > 0:
if not allow_missing:
raise ValueError(
"Found %d IDs (eg %s) that are not present in the "
"index" % (num_missing, next(iter(missing_ids)))
)
if warn_missing:
logger.warning(
"Ignoring %d IDs that are not present in the index",
num_missing,
)
ids = existing_ids
self._delete_ids(ids=ids)
if reload:
self.reload()
def get_embeddings(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
):
if label_ids is not None:
if self.config.patches_field is None:
raise ValueError("This index does not support label IDs")
if sample_ids is not None:
logger.warning(
"Ignoring sample IDs when label IDs are provided"
)
if sample_ids is not None and self.config.patches_field is not None:
(
embeddings,
sample_ids,
label_ids,
missing_ids,
) = self._get_patch_embeddings_from_sample_ids(sample_ids)
elif self.config.patches_field is not None:
(
embeddings,
sample_ids,
label_ids,
missing_ids,
) = self._get_patch_embeddings_from_label_ids(label_ids)
else:
(
embeddings,
sample_ids,
label_ids,
missing_ids,
) = self._get_sample_embeddings(sample_ids)
num_missing_ids = len(missing_ids)
if num_missing_ids > 0:
if not allow_missing:
raise ValueError(
"Found %d IDs (eg %s) that do not exist in the index"
% (num_missing_ids, missing_ids[0])
)
if warn_missing:
logger.warning(
"Skipping %d IDs that do not exist in the index",
num_missing_ids,
)
embeddings = np.array(embeddings)
sample_ids = np.array(sample_ids)
if label_ids is not None:
label_ids = np.array(label_ids)
return embeddings, sample_ids, label_ids
def cleanup(self):
pymilvus.utility.drop_collection(
self.config.collection_name, using=self._alias
)
self._collection = None
def _get_sample_embeddings(self, sample_ids, batch_size=1000):
found_embeddings = []
found_sample_ids = []
if sample_ids is None:
raise ValueError(
"Milvus does not support retrieving all vectors in an index"
)
for batch_ids in fou.iter_batches(sample_ids, batch_size):
response = self._get_embeddings(list(batch_ids))
for r in response:
found_embeddings.append(r["vector"])
found_sample_ids.append(r["sample_id"])
missing_ids = list(set(sample_ids) - set(found_sample_ids))
return found_embeddings, found_sample_ids, None, missing_ids
def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=1000):
found_embeddings = []
found_sample_ids = []
found_label_ids = []
if label_ids is None:
raise ValueError(
"Milvus does not support retrieving all vectors in an index"
)
for batch_ids in fou.iter_batches(label_ids, batch_size):
response = self._get_embeddings(list(batch_ids))
for r in response:
found_embeddings.append(r["vector"])
found_sample_ids.append(r["sample_id"])
found_label_ids.append(r["pk"])
missing_ids = list(set(label_ids) - set(found_label_ids))
return found_embeddings, found_sample_ids, found_label_ids, missing_ids
def _get_patch_embeddings_from_sample_ids(
self, sample_ids, batch_size=100
):
found_embeddings = []
found_sample_ids = []
found_label_ids = []
query_vector = [0.0] * self._get_dimension()
top_k = min(batch_size, self.config.max_k)
for batch_ids in fou.iter_batches(sample_ids, batch_size):
ids = ['"' + str(entry) + '"' for entry in batch_ids]
expr = f"""pk in [{','.join(ids)}]"""
response = self._collection.search(
data=[query_vector],
anns_field="vector",
param=self.config.search_params,
expr=expr,
limit=top_k,
)
ids = [x.id for x in response[0]]
response = self._get_embeddings(ids)
for r in response:
found_embeddings.append(r["vector"])
found_sample_ids.append(r["sample_id"])
found_label_ids.append(r["pk"])
missing_ids = list(set(sample_ids) - set(found_sample_ids))
return found_embeddings, found_sample_ids, found_label_ids, missing_ids
def _kneighbors(
self,
query=None,
k=None,
reverse=False,
aggregation=None,
return_dists=False,
):
if query is None:
raise ValueError("Milvus does not support full index neighbors")
if reverse is True:
raise ValueError(
"Milvus does not support least similarity queries"
)
if k is None or k > self.config.max_k:
raise ValueError("Milvus requires k<=%s" % self.config.max_k)
if aggregation not in (None, "mean"):
raise ValueError("Unsupported aggregation '%s'" % aggregation)
query = self._parse_neighbors_query(query)
if aggregation == "mean" and query.ndim == 2:
query = query.mean(axis=0)
single_query = query.ndim == 1
if single_query:
query = [query]
if self.has_view:
if self.config.patches_field is not None:
index_ids = self.current_label_ids
else:
index_ids = self.current_sample_ids
expr = ['"' + str(entry) + '"' for entry in index_ids]
expr = f"""pk in [{','.join(expr)}]"""
else:
expr = None
sample_ids = []
label_ids = [] if self.config.patches_field is not None else None
dists = []
for q in query:
if self.config.patches_field is not None:
output_fields = ["sample_id"]
else:
output_fields = None
response = self._collection.search(
data=[q.tolist()],
anns_field="vector",
limit=k,
expr=expr,
param=self.config.search_params,
output_fields=output_fields,
)
if self.config.patches_field is not None:
sample_ids.append(
[r.entity.get("sample_id") for r in response[0]]
)
label_ids.append([r.id for r in response[0]])
else:
sample_ids.append([r.id for r in response[0]])
if return_dists:
dists.append([r.score for r in response[0]])
if single_query:
sample_ids = sample_ids[0]
if label_ids is not None:
label_ids = label_ids[0]
if return_dists:
dists = dists[0]
if return_dists:
return sample_ids, label_ids, dists
return sample_ids, label_ids
def _parse_neighbors_query(self, query):
if etau.is_str(query):
query_ids = [query]
single_query = True
else:
query = np.asarray(query)
# Query by vector(s)
if np.issubdtype(query.dtype, np.number):
return query
query_ids = list(query)
single_query = False
# Query by ID(s)
response = self._get_embeddings(query_ids)
query = np.array([x["vector"] for x in response])
if query.size == 0:
raise ValueError(
"Query IDs %s were not found in the index" % query_ids
)
if single_query:
query = query[0, :]
return query
def _get_dimension(self):
if self._collection is None:
return None
for field in self._collection.describe()["fields"]:
if field["name"] == "vector":
return field["params"]["dim"]
@classmethod
def _from_dict(cls, d, samples, config, brain_key):
return cls(samples, config, brain_key)
================================================
FILE: fiftyone/brain/internal/core/mistakenness.py
================================================
"""
Mistakenness methods.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
from math import exp
import numpy as np
from scipy.special import softmax
from scipy.stats import entropy
from fiftyone import ViewField as F
import fiftyone.core.brain as fob
import fiftyone.core.labels as fol
import fiftyone.core.media as fom
import fiftyone.core.utils as fou
import fiftyone.core.validation as fov
logger = logging.getLogger(__name__)
_ALLOWED_TYPES = (
fol.Classification,
fol.Classifications,
fol.Detections,
fol.Polylines,
fol.Keypoints,
fol.TemporalDetections,
)
_MISSED_CONFIDENCE_THRESHOLD = 0.95
_DETECTION_IOU = 0.5
def compute_mistakenness(
samples,
pred_field,
label_field,
mistakenness_field,
missing_field,
spurious_field,
use_logits,
copy_missing,
progress,
):
"""See ``fiftyone/brain/__init__.py``."""
#
# Algorithm
#
# The chance of a mistake is related to how confident the model prediction
# was as well as whether or not the prediction is correct. A prediction
# that is highly confident and incorrect is likely to be a mistake. A
# prediction that is low confidence and incorrect is not likely to be a
# mistake.
#
# Let us compute a confidence measure based on negative entropy of logits:
# $c = -entropy(logits)$. This value is large when there is low uncertainty
# and small when there is high uncertainty. Let us define modulator, $m$,
# based on whether or not the answer is correct. $m = -1$ when the label is
# correct and $1$ otherwise. Then, mistakenness is computed as
# $(m * exp(c) + 1) / 2$ so that high confidence correct predictions result
# in low mistakenness, high confidence incorrect predictions result in high
# mistakenness, and low confidence predictions result in middling
# mistakenness.
#
fov.validate_collection_label_fields(
samples, (pred_field, label_field), _ALLOWED_TYPES, same_type=True
)
if samples.media_type == fom.VIDEO:
mistakenness_field, _ = samples._handle_frame_field(mistakenness_field)
missing_field, _ = samples._handle_frame_field(missing_field)
spurious_field, _ = samples._handle_frame_field(spurious_field)
is_objects = samples._is_label_field(
pred_field,
(fol.Detections, fol.Polylines, fol.Keypoints, fol.TemporalDetections),
)
if is_objects:
eval_key = _make_eval_key(samples, mistakenness_field)
config = DetectionMistakennessConfig(
pred_field,
label_field,
mistakenness_field,
missing_field,
spurious_field,
use_logits,
copy_missing,
eval_key,
)
else:
eval_key = None
config = ClassificationMistakennessConfig(
pred_field, label_field, mistakenness_field, use_logits
)
brain_key = mistakenness_field
brain_method = config.build()
brain_method.ensure_requirements()
brain_method.register_run(samples, brain_key, cleanup=False)
brain_method.register_samples(samples)
if is_objects:
samples.evaluate_detections(
pred_field,
gt_field=label_field,
eval_key=eval_key,
classwise=False,
iou=_DETECTION_IOU,
progress=progress,
)
view = samples.select_fields([label_field, pred_field])
processing_frames = samples._is_frame_field(label_field)
logger.info("Computing mistakenness...")
for sample in view.iter_samples(progress=progress):
if processing_frames:
images = sample.frames.values()
else:
images = [sample]
sample_mistakenness = []
num_missing = 0
num_spurious = 0
for image in images:
if is_objects:
(
img_mistakenness,
img_missing,
img_spurious,
) = brain_method.process_image(image, eval_key)
num_missing += img_missing
num_spurious += img_spurious
if processing_frames:
image[missing_field] = img_missing
image[spurious_field] = img_spurious
else:
img_mistakenness = brain_method.process_image(image)
if img_mistakenness is not None:
sample_mistakenness.append(img_mistakenness)
if processing_frames:
image[mistakenness_field] = img_mistakenness
if sample_mistakenness:
sample[mistakenness_field] = np.max(sample_mistakenness)
else:
sample[mistakenness_field] = None
if is_objects:
sample[missing_field] = num_missing
sample[spurious_field] = num_spurious
sample.save()
if eval_key is not None:
samples.delete_evaluation(eval_key)
brain_method.save_run_results(samples, brain_key, None)
logger.info("Mistakenness computation complete")
# @todo move to `fiftyone/brain/mistakenness.py`
# Don't do this hastily; `get_brain_info()` on existing datasets has this
# class's full path in it and may need migration
class MistakennessMethodConfig(fob.BrainMethodConfig):
def __init__(self, pred_field, label_field, mistakenness_field, **kwargs):
super().__init__(**kwargs)
self.pred_field = pred_field
self.label_field = label_field
self.mistakenness_field = mistakenness_field
@property
def type(self):
return "mistakenness"
class MistakennessMethod(fob.BrainMethod):
def __init__(self, config):
super().__init__(config)
self.pred_field = None
self.label_field = None
self.label_type = None
def ensure_requirements(self):
pass
def register_samples(self, samples):
self.pred_field, _ = samples._handle_frame_field(
self.config.pred_field
)
self.label_field, _ = samples._handle_frame_field(
self.config.label_field
)
self.label_type = samples._get_label_field_type(self.config.pred_field)
def _validate_run(self, samples, brain_key, existing_info):
self._validate_fields_match(brain_key, "pred_field", existing_info)
self._validate_fields_match(brain_key, "label_field", existing_info)
self._validate_fields_match(
brain_key, "mistakenness_field", existing_info
)
# @todo move to `fiftyone/brain/mistakenness.py`
# Don't do this hastily; `get_brain_info()` on existing datasets has this
# class's full path in it and may need migration
class ClassificationMistakennessConfig(MistakennessMethodConfig):
def __init__(
self, pred_field, label_field, mistakenness_field, use_logits, **kwargs
):
super().__init__(pred_field, label_field, mistakenness_field, **kwargs)
self.use_logits = use_logits
@property
def method(self):
return "classification"
class ClassificationMistakenness(MistakennessMethod):
def process_image(self, sample_or_frame):
use_logits = self.config.use_logits
pred_label, gt_label = _get_data(
sample_or_frame, self.pred_field, self.label_field, use_logits
)
if pred_label is None and gt_label is None:
return None
if pred_label is None or gt_label is None:
m = 1.0
elif isinstance(pred_label, fol.Classifications):
# For multilabel problems, all labels must match
pred_labels = set(c.label for c in pred_label.classifications)
gt_labels = set(c.label for c in gt_label.classifications)
m = float(pred_labels == gt_labels)
else:
m = float(pred_label.label == gt_label.label)
if pred_label is None:
mistakenness = 1.0
elif use_logits:
mistakenness = _compute_mistakenness_class(pred_label.logits, m)
else:
mistakenness = _compute_mistakenness_class_conf(
pred_label.confidence, m
)
return mistakenness
def get_fields(self, samples, brain_key):
pred_field = self.config.pred_field
label_field = self.config.label_field
mistakenness_field = self.config.mistakenness_field
fields = [pred_field, label_field, mistakenness_field]
if samples._is_frame_field(label_field):
fields.append(samples._FRAMES_PREFIX + mistakenness_field)
return fields
def cleanup(self, samples, brain_key):
label_field = self.config.label_field
mistakenness_field = self.config.mistakenness_field
samples._dataset.delete_sample_fields(
mistakenness_field, error_level=1
)
if samples._is_frame_field(label_field):
samples._dataset.delete_frame_fields(
mistakenness_field, error_level=1
)
# @todo move to `fiftyone/brain/mistakenness.py`
# Don't do this hastily; `get_brain_info()` on existing datasets has this
# class's full path in it and may need migration
class DetectionMistakennessConfig(MistakennessMethodConfig):
def __init__(
self,
pred_field,
label_field,
mistakenness_field,
missing_field,
spurious_field,
use_logits,
copy_missing,
eval_key,
**kwargs
):
super().__init__(pred_field, label_field, mistakenness_field, **kwargs)
self.missing_field = missing_field
self.spurious_field = spurious_field
self.use_logits = use_logits
self.copy_missing = copy_missing
self.eval_key = eval_key
@property
def method(self):
return "detection"
class DetectionMistakenness(MistakennessMethod):
def process_image(self, sample_or_frame, eval_key):
missing_field = self.config.missing_field
spurious_field = self.config.spurious_field
mistakenness_field = self.config.mistakenness_field
copy_missing = self.config.copy_missing
use_logits = self.config.use_logits
pred_label, gt_label = _get_data(
sample_or_frame, self.pred_field, self.label_field, use_logits
)
list_field = self.label_type._LABEL_LIST_FIELD
if pred_label is None:
pred_label = self.label_type()
if gt_label is None:
gt_label = self.label_type()
num_spurious = 0
num_missing = 0
missing_objects = {}
image_mistakenness = []
pred_map = {}
for pred_obj in pred_label[list_field]:
pred_map[pred_obj.id] = pred_obj
gt_id = pred_obj[eval_key + "_id"]
conf = pred_obj.confidence
if gt_id == "" and conf > _MISSED_CONFIDENCE_THRESHOLD:
# Unmached FP with high confidence are missing
pred_obj[missing_field] = True
num_missing += 1
missing_objects[pred_obj.id] = pred_obj
for gt_obj in gt_label[list_field]:
# Avoid adding the same unmatched FP predictions upon multiple runs
# of this method
if copy_missing and gt_obj.has_field(missing_field):
if gt_obj.id in missing_objects:
del missing_objects[gt_obj.id]
continue
pred_id = gt_obj[eval_key + "_id"]
if pred_id == "":
# FN may be spurious
gt_obj[spurious_field] = True
num_spurious += 1
else:
# For matched FP, compute mistakenness
iou = gt_obj[eval_key + "_iou"]
pred_obj = pred_map[pred_id]
m = float(gt_obj.label == pred_obj.label)
if use_logits:
mistakenness_class = _compute_mistakenness_class(
pred_obj.logits, m
)
mistakenness_loc = _compute_mistakenness_loc(
pred_obj.logits, iou
)
else:
mistakenness_class = _compute_mistakenness_class_conf(
pred_obj.confidence, m
)
mistakenness_loc = _compute_mistakenness_loc_conf(
pred_obj.confidence, iou
)
gt_obj[mistakenness_field] = mistakenness_class
gt_obj[mistakenness_field + "_loc"] = mistakenness_loc
image_mistakenness.append(mistakenness_class)
if copy_missing:
gt_label[list_field].extend(missing_objects.values())
sample_or_frame[self.label_field] = gt_label
if image_mistakenness:
mistakenness = np.max(image_mistakenness)
else:
mistakenness = -1
return mistakenness, num_missing, num_spurious
def get_fields(self, samples, brain_key):
pred_field = self.config.pred_field
label_field = self.config.label_field
mistakenness_field = self.config.mistakenness_field
missing_field = self.config.missing_field
spurious_field = self.config.spurious_field
label_type = samples._get_label_field_type(pred_field)
list_field = label_type._LABEL_LIST_FIELD
fields = [
mistakenness_field,
missing_field,
spurious_field,
"%s.%s.%s" % (label_field, list_field, mistakenness_field),
"%s.%s.%s_loc" % (label_field, list_field, mistakenness_field),
"%s.%s.%s" % (pred_field, list_field, missing_field),
"%s.%s.%s" % (label_field, list_field, spurious_field),
]
if samples._is_frame_field(pred_field):
fields.extend(
[
samples._FRAMES_PREFIX + mistakenness_field,
samples._FRAMES_PREFIX + missing_field,
samples._FRAMES_PREFIX + spurious_field,
]
)
return fields
def cleanup(self, samples, brain_key):
pred_field = self.config.pred_field
label_field = self.config.label_field
mistakenness_field = self.config.mistakenness_field
missing_field = self.config.missing_field
spurious_field = self.config.spurious_field
eval_key = self.config.eval_key
label_type = samples._get_label_field_type(pred_field)
list_field = label_type._LABEL_LIST_FIELD
pred_field, is_frame_field = samples._handle_frame_field(pred_field)
label_field, _ = samples._handle_frame_field(label_field)
fields = [
mistakenness_field,
missing_field,
spurious_field,
"%s.%s.%s" % (label_field, list_field, mistakenness_field),
"%s.%s.%s_loc" % (label_field, list_field, mistakenness_field),
"%s.%s.%s" % (pred_field, list_field, missing_field),
"%s.%s.%s" % (label_field, list_field, spurious_field),
]
if self.config.copy_missing:
# Remove objects that were added to `label_field`
samples._dataset.filter_labels(
self.config.label_field, F(missing_field).exists(False)
).save()
if is_frame_field:
samples._dataset.delete_sample_fields(
[mistakenness_field, spurious_field, missing_field],
error_level=1,
)
samples._dataset.delete_frame_fields(fields, error_level=1)
else:
samples._dataset.delete_sample_fields(fields, error_level=1)
if eval_key in samples.list_evaluations():
samples.delete_evaluation(eval_key)
def _validate_run(self, samples, brain_key, existing_info):
super()._validate_run(samples, brain_key, existing_info)
self._validate_fields_match(brain_key, "missing_field", existing_info)
self._validate_fields_match(brain_key, "spurious_field", existing_info)
self._validate_fields_match(brain_key, "copy_missing", existing_info)
def _make_eval_key(samples, brain_key):
existing_eval_keys = samples.list_evaluations()
eval_key = brain_key + "_eval"
if eval_key not in existing_eval_keys:
return eval_key
idx = 2
while eval_key + str(idx) in existing_eval_keys:
idx += 1
return eval_key + str(idx)
def _get_data(sample, pred_field, label_field, use_logits):
pred_label = sample[pred_field]
label = sample[label_field]
if pred_label is None:
return pred_label, label
if isinstance(pred_label, fol.Detections):
for det in pred_label.detections:
if det.confidence is None:
raise ValueError(
"Detection '%s' in sample '%s' field '%s' has no "
"confidence" % (det.id, sample.id, pred_field)
)
elif isinstance(pred_label, fol.Polylines):
for poly in pred_label.polylines:
if poly.confidence is None:
raise ValueError(
"Polyline '%s' in sample '%s' field '%s' has no "
"confidence" % (poly.id, sample.id, pred_field)
)
elif use_logits:
if pred_label.logits is None:
raise ValueError(
"Sample '%s' field '%s' has no logits"
% (sample.id, pred_field)
)
else:
if pred_label.confidence is None:
raise ValueError(
"Sample '%s' field '%s' has no confidence"
% (sample.id, pred_field)
)
return pred_label, label
def _compute_mistakenness_class(logits, m):
# constrain m to either 1 (incorrect) or -1 (correct)
m = m * -2.0 + 1.0
c = -1.0 * entropy(softmax(np.asarray(logits)))
mistakenness = (m * exp(c) + 1.0) / 2.0
return mistakenness
def _compute_mistakenness_loc(logits, iou):
# i = 0 for high iou, i = 1 for low iou
i = (1.0 / (1.0 - _DETECTION_IOU)) * (1.0 - iou)
# c = 0 for low confidence, c = 1 for high confidence
c = exp(-1.0 * entropy(softmax(np.asarray(logits))))
# mistakenness = i when c = i, mistakenness = 0.5 if c = 0
# mistakenness is higher with lower IoU and closer to 0 or 1 with higher
# confidence
mistakenness = (c * ((2.0 * i) - 1.0) + 1.0) / 2.0
return mistakenness
def _compute_mistakenness_class_conf(confidence, m):
# constrain m to either 1 (incorrect) or -1 (correct)
m = m * -2.0 + 1.0
mistakenness = (m * confidence + 1.0) / 2.0
return mistakenness
def _compute_mistakenness_loc_conf(confidence, iou):
# i = 0 for high iou, i = 1 for low iou
i = (1.0 / (1.0 - _DETECTION_IOU)) * (1.0 - iou)
# c = 0 for low confidence, c = 1 for high confidence
c = confidence
# mistakenness = i when c = i, mistakenness = 0.5 if c = 0
# mistakenness is higher with lower IoU and closer to 0 or 1 with higher
# confidence
mistakenness = (c * ((2.0 * i) - 1.0) + 1.0) / 2.0
return mistakenness
================================================
FILE: fiftyone/brain/internal/core/mongodb.py
================================================
"""
MongoDB similarity backend.
| Copyright 2017-2026, Voxel51, Inc.
| `voxel51.com <https://voxel51.com/>`_
|
"""
import logging
from bson import ObjectId
import numpy as np
from pymongo.errors import OperationFailure
import eta.core.utils as etau
from fiftyone import ViewField as F
import fiftyone.core.fields as fof
import fiftyone.core.media as fom
import fiftyone.core.utils as fou
import fiftyone.brain.internal.core.utils as fbu
from fiftyone.brain.similarity import (
SimilarityConfig,
Similarity,
SimilarityIndex,
)
logger = logging.getLogger(__name__)
_SUPPORTED_METRICS = {
"cosine": "cosine",
"dotproduct": "dotProduct",
"euclidean": "euclidean",
}
class MongoDBSimilarityConfig(SimilarityConfig):
"""Configuration for a MongoDB similarity instance.
Args:
index_name (None): the name of the MongoDB vector index to use or
create. If none is provided, a new index will be created
metric ("cosine"): the embedding distance metric to use when creating a
new index. Supported values are
``("cosine", "dotproduct", "euclidean")``
**kwargs: keyword arguments for
:class:`fiftyone.brain.similarity.SimilarityConfig`
"""
def __init__(self, index_name=None, metric="cosine", **kwargs):
if kwargs.get("embeddings_field") is None and index_name is None:
raise ValueError(
"You must provide either the name of a field to read/write "
"embeddings for this index by passing the `embeddings` "
"parameter, or you must provide the name of an existing "
"vector search index via the `index_name` parameter"
)
# @todo support this. Will likely require copying embeddings to a new
# collection as vector search indexes do not yet support array fields
if kwargs.get("patches_field") is not None:
raise ValueError(
"The MongoDB backend does not yet support patch embeddings"
)
if metric not in _SUPPORTED_METRICS:
raise ValueError(
"Unsupported metric '%s'. Supported values are %s"
% (metric, tuple(_SUPPORTED_METRICS.keys()))
)
super().__init__(**kwargs)
self.index_name = index_name
self.metric = metric
@property
def method(self):
return "mongodb"
@property
def max_k(self):
return 10000 # MongoDB limit
@property
def supports_least_similarity(self):
return False
@property
def supported_aggregations(self):
return ("mean",)
class MongoDBSimilarity(Similarity):
"""MongoDB similarity factory.
Args:
config: a :class:`MongoDBSimilarityConfig`
"""
def ensure_requirements(self):
#
# https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.create_search_index
#
# Could also validate that user is connected to an Atlas cluster here
# eg Atlas clusters generally have hostnames which end in "mongodb.net"
# https://stackoverflow.com/q/73180110
#
fou.ensure_package("pymongo>=4.7")
def ensure_usage_requirements(self):
fou.ensure_package("pymongo>=4.7")
def initialize(self, samples, brain_key):
return MongoDBSimilarityIndex(
samples, self.config, brain_key, backend=self
)
class MongoDBSimilarityIndex(SimilarityIndex):
"""Class for interacting with MongoDB similarity indexes.
Args:
samples: the :class:`fiftyone.core.collections.SampleCollection` used
config: the :class:`MongoDBSimilarityConfig` used
brain_key: the brain key
backend (None): a :class:`MongoDBSimilarity` instance
"""
def __init__(self, samples, config, brain_key, backend=None):
super().__init__(samples, config, brain_key, backend=backend)
self._dataset = samples._dataset
self._sample_ids = None
self._label_ids = None
self._index = None
self._initialize()
@property
def is_external(self):
return False
@property
def total_index_size(self):
if self._sample_ids is not None:
return len(self._sample_ids)
if self._dataset.media_type == fom.GROUP:
samples = self._dataset.select_group_slices(_allow_mixed=True)
else:
samples = self._dataset
patches_field = self.config.patches_field
embeddings_field = self.config.embeddings_field
if patches_field is not None:
_, embeddings_path = self._dataset._get_label_field_path(
patches_field, embeddings_field
)
samples = samples.filter_labels(
patches_field, F(embeddings_field).exists()
)
return samples.count(embeddings_path)
if samples.has_field(embeddings_field):
return samples.exists(embeddings_field).count()
return 0
def _initialize(self):
coll = self._dataset._sample_collection
try:
indexes = {
i["name"]: i
for i in coll.aggregate([{"$listSearchIndexes": {}}])
}
except OperationFailure:
# https://www.mongodb.com/docs/manual/release-notes/7.0/#atlas-search-index-management
# https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview
if self.config.index_name is None:
raise ValueError(
"You must be running MongoDB Atlas 7.0 or later in order "
"to use vector search indexes"
)
# Must assume index exists because we can't use pymongo to check...
self._index = True
return
if self.config.index_name is None:
root = self.config.embeddings_field
index_name = fbu.get_unique_name(root, list(indexes.keys()))
self.config.index_name = index_name
self.save_config()
elif self.config.embeddings_field is None:
info = indexes.get(self.config.index_name, None)
if info is None:
raise ValueError(
"Index '%s' does not exist" % self.config.index_name
)
self.config.embeddings_field = next(
iter(info["latestDefinition"]["mappings"]["fields"].keys())
)
self.save_config()
if self.config.index_name in indexes:
# Index already exists
self._index = True
elif self.total_index_size > 0:
# Embeddings already exist but the index hasn't been declared yet
dimension = self._get_dimension()
self._create_index(dimension)
else:
# Index will be created when add_to_index() is called
pass
def _get_dimension(self):
if self._dataset.media_type == fom.GROUP:
samples = self._dataset.select_group_slices(_allow_mixed=True)
else:
samples = self._dataset
patches_field = self.config.patches_field
embeddings_field = self.config.embeddings_field
if patches_field is not None:
_, embeddings_path = self._dataset._get_label_field_path(
patches_field, embeddings_field
)
view = samples.filter_labels(
patches_field, F(embeddings_field).exists()
).limit(1)
embeddings = view.values(embeddings_path, unwind=True)
else:
view = samples.exists(embeddings_field).limit(1)
embeddings = view.values(embeddings_field)
embedding = next(iter(embeddings), None)
if embedding is None:
return None
return len(embedding) # MongoDB requires list fields
def _create_index(self, dimension):
# https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage
# https://www.mongodb.com/docs/languages/python/pymongo-driver/current/indexes/atlas-search-index/
from pymongo.operations import SearchIndexModel
field = self._dataset.get_field(self.config.embeddings_field)
if field is not None and not isinstance(field, fof.ListField):
raise ValueError(
"MongoDB vector search indexes require embeddings to be "
"stored in list fields"
)
metric = _SUPPORTED_METRICS[self.config.metric]
fields = [
{
"type": "vector",
"numDimensions": dimension,
"path": self.config.embeddings_field,
"similarity": metric,
},
{
"type": "filter",
"path": "_id",
},
]
"""
if self._dataset.media_type == fom.GROUP:
fields.append(
{
"type": "filter",
"path": self._dataset.group_field + ".name",
}
)
"""
model = SearchIndexModel(
name=self.config.index_name,
type="vectorSearch", # requires pymongo>=4.7
definition={"fields": fields},
)
coll = self._dataset._sample_collection
coll.create_search_index(model=model)
self._index = True
@property
def ready(self):
"""Returns True/False whether the vector search index is ready to be
queried.
"""
if self._index is None:
return False
try:
coll = self._dataset._sample_collection
indexes = {
i["name"]: i
for i in coll.aggregate([{"$listSearchIndexes": {}}])
}
except OperationFailure:
# requires MongoDB Atlas 7.0 or later
return None
info = indexes.get(self.config.index_name, {})
return info.get("status", None) == "READY"
def add_to_index(
self,
embeddings,
sample_ids,
label_ids=None,
overwrite=True,
allow_existing=True,
warn_existing=False,
reload=True,
):
if self._index is None:
self._create_index(embeddings.shape[1])
sample_ids = np.asarray(sample_ids)
label_ids = np.asarray(label_ids) if label_ids is not None else None
if not overwrite or not allow_existing or warn_existing:
if self._sample_ids is not None:
_sample_ids, _label_ids = self._sample_ids, self._label_ids
else:
_sample_ids, _label_ids = self._parse_data(
self._dataset, self.config
)
index_sample_ids, index_label_ids, ii, _ = fbu.add_ids(
sample_ids,
label_ids,
_sample_ids,
_label_ids,
patches_field=self.config.patches_field,
overwrite=overwrite,
allow_existing=allow_existing,
warn_existing=warn_existing,
)
self._sample_ids = index_sample_ids
self._label_ids = index_label_ids
if ii.size == 0:
return
embeddings = embeddings[ii, :]
sample_ids = sample_ids[ii]
label_ids = label_ids[ii] if label_ids is not None else None
else:
index_sample_ids = None
index_label_ids = None
fbu.add_embeddings(
self._dataset,
embeddings.tolist(), # MongoDB requires list fields
sample_ids,
label_ids,
self.config.embeddings_field,
patches_field=self.config.patches_field,
)
if reload:
super().reload()
self._sample_ids = index_sample_ids
self._label_ids = index_label_ids
def remove_from_index(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
reload=True,
):
if not allow_missing or warn_missing:
if self._sample_ids is not None:
_sample_ids, _label_ids = self._sample_ids, self._label_ids
else:
_sample_ids, _label_ids = self._parse_data(
self._dataset, self.config
)
index_sample_ids, index_label_ids, rm_inds = fbu.remove_ids(
sample_ids,
label_ids,
_sample_ids,
_label_ids,
patches_field=self.config.patches_field,
allow_missing=allow_missing,
warn_missing=warn_missing,
)
self._sample_ids = index_sample_ids
self._label_ids = index_label_ids
if rm_inds.size == 0:
return
if self.config.patches_field is not None:
sample_ids = None
label_ids = _label_ids[rm_inds]
else:
sample_ids = _sample_ids[rm_inds]
label_ids = None
else:
index_sample_ids = None
index_label_ids = None
fbu.remove_embeddings(
self._dataset,
self.config.embeddings_field,
sample_ids=sample_ids,
label_ids=label_ids,
patches_field=self.config.patches_field,
)
if reload:
super().reload()
self._sample_ids = index_sample_ids
self._label_ids = index_label_ids
def get_embeddings(
self,
sample_ids=None,
label_ids=None,
allow_missing=True,
warn_missing=False,
):
if self._dataset.media_type == fom.GROUP:
samples = self._dataset.select_group_slices(_allow_mixed=True)
else:
samples = self._dataset
if sample_ids is not None:
samples = samples.select(sample_ids)
elif label_ids is not None:
if self.config.patches_field is None:
raise ValueError("This index does not support label IDs")
if sample_ids is not None:
logger.warning(
"Ignoring sample IDs when label IDs are provided"
)
samples = samples.select_labels(
ids=label_ids, fields=self.config.patches_field
)
_embeddings, _sample_ids, _label_ids = fbu.get_embeddings(
samples,
patches_field=self.config.patches_field,
embeddings_field=self.config.embeddings_field,
)
if label_ids is not None:
inds = _get_inds(
label_ids,
_label_ids,
"label",
allow_missing,
warn_missing,
)
embeddings = _embeddings[inds, :]
sample_ids = _sample_ids[inds]
label_ids = np.asarray(label_ids)
elif sample_ids is not None:
if etau.is_str(sample_ids):
sample_ids = [sample_ids]
if self.config.patches_field is not None:
sample_ids = set(sample_ids)
bools = [_id in sample_ids for _id in _sample_ids]
inds = np.nonzero(bools)[0]
else:
inds = _get_inds(
sample_ids,
_sample_ids,
"sample",
allow_missing,
warn_missing,
)
embeddings = _embeddings[inds, :]
sample_ids = _sample_ids[inds]
if self.config.patches_field is not None:
label_ids = _label_ids[inds]
else:
label_ids = None
else:
embeddings = _embeddings
sample_ids = _sample_ids
label_ids = _label_ids
return embeddings, sample_ids, label_ids
def reload(self):
self._sample_ids = None
self._label_ids = None
super().reload()
def cleanup(self):
if self._index is None:
return
try:
coll = self._dataset._sample_collection
coll.drop_search_index(self.config.index_name)
except OperationFailure:
# requires MongoDB Atlas 7.0 or later
pass
self._index = None
def _kneighbors(
self,
query=None,
k=None,
reverse=False,
aggregation=None,
return_dists=False,
):
if query is None:
raise ValueError("MongoDB does not support full index neighbors")
if reverse is True:
raise ValueError(
"MongoDB does not support least similarity queries"
)
if aggregation not in (None, "mean"):
raise ValueError(
f"MongoDB does not support {aggregation} aggregation"
)
if k is None:
k = min(self.index_size, self.config.max_k)
# https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage
num_candidates = min(10 * k, self.config.max_k)
query = self._parse_neighbors_query(query)
if aggregation == "mean" and query.ndim == 2:
query = query.mean(axis=0)
single_query = query.ndim == 1
if single_query:
query = [query]
if self.has_view:
index_ids = self.current_sample_ids
# if self.config.patches_field is not None:
# index_ids = self.current_label_ids
else:
index_ids = None
dataset = self._dataset
sample_ids = []
label_ids = None
# if self.config.patches_field is not None:
# label_ids = []
dists = []
for q in query:
search = {
"index": self.config.index_name,
"path": self.config.embeddings_field,
"limit": k,
"numCandidates": num_candidates,
"queryVector": q.tolist(),
}
if index_ids is not None:
search["filter"] = {
"_id": {"$in": [ObjectId(_id) for _id in index_ids]}
}
"""
elif dataset.media_type == fom.GROUP:
# $vectorSearch must be the first stage in all pipelines, so we
# have to incorporate slice selection as a $filter
name_fi
gitextract_zu49vpz0/
├── .github/
│ ├── CODEOWNERS
│ ├── dependabot.yml
│ ├── pull_request_template.md
│ └── workflows/
│ └── build.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .prettierrc
├── CONTRIBUTING.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── RELEASING.md
├── STYLE_GUIDE.md
├── fiftyone/
│ ├── __init__.py
│ └── brain/
│ ├── __init__.py
│ ├── config.py
│ ├── internal/
│ │ ├── __init__.py
│ │ ├── core/
│ │ │ ├── __init__.py
│ │ │ ├── duplicates.py
│ │ │ ├── elasticsearch.py
│ │ │ ├── hardness.py
│ │ │ ├── lancedb.py
│ │ │ ├── leaky_splits.py
│ │ │ ├── milvus.py
│ │ │ ├── mistakenness.py
│ │ │ ├── mongodb.py
│ │ │ ├── mosaic.py
│ │ │ ├── pgvector.py
│ │ │ ├── pinecone.py
│ │ │ ├── qdrant.py
│ │ │ ├── redis.py
│ │ │ ├── representativeness.py
│ │ │ ├── sklearn.py
│ │ │ ├── uniqueness.py
│ │ │ ├── utils.py
│ │ │ └── visualization.py
│ │ └── models/
│ │ ├── .gitignore
│ │ ├── __init__.py
│ │ ├── manifest.json
│ │ ├── simple_resnet.py
│ │ └── torch.py
│ ├── similarity.py
│ └── visualization.py
├── install.bat
├── install.sh
├── pylintrc
├── pyproject.toml
├── pytest.ini
├── requirements/
│ ├── build.txt
│ ├── common.txt
│ ├── dev.txt
│ └── prod.txt
├── requirements.txt
├── setup.py
└── tests/
├── README.md
├── intensive/
│ ├── test_interface.py
│ ├── test_similarity.py
│ ├── test_uniqueness.py
│ └── test_visualization.py
├── models/
│ └── test_simple_resnet.py
└── test_uniqueness.py
SYMBOL INDEX (774 symbols across 31 files)
FILE: fiftyone/brain/__init__.py
function compute_hardness (line 28) | def compute_hardness(
function compute_mistakenness (line 67) | def compute_mistakenness(
function compute_uniqueness (line 174) | def compute_uniqueness(
function compute_representativeness (line 275) | def compute_representativeness(
function compute_visualization (line 385) | def compute_visualization(
function compute_similarity (line 563) | def compute_similarity(
function compute_near_duplicates (line 710) | def compute_near_duplicates(
function compute_exact_duplicates (line 815) | def compute_exact_duplicates(
function compute_leaky_splits (line 850) | def compute_leaky_splits(
FILE: fiftyone/brain/config.py
class BrainConfig (line 13) | class BrainConfig(EnvConfig):
method __init__ (line 64) | def __init__(self, d=None):
method _parse_similarity_backends (line 94) | def _parse_similarity_backends(self, d):
method _parse_visualization_methods (line 149) | def _parse_visualization_methods(self, d):
function locate_brain_config (line 205) | def locate_brain_config():
function load_brain_config (line 225) | def load_brain_config():
function _parse_env_value (line 238) | def _parse_env_value(value):
FILE: fiftyone/brain/internal/core/duplicates.py
function compute_near_duplicates (line 29) | def compute_near_duplicates(
function compute_exact_duplicates (line 96) | def compute_exact_duplicates(samples, num_workers, skip_failures, progre...
function _compute_filehashes (line 141) | def _compute_filehashes(samples, method, progress):
function _compute_filehashes_multi (line 151) | def _compute_filehashes_multi(samples, method, num_workers, progress):
function _compute_filehash (line 168) | def _compute_filehash(filepath, method):
function _do_compute_filehash (line 177) | def _do_compute_filehash(args):
FILE: fiftyone/brain/internal/core/elasticsearch.py
class ElasticsearchSimilarityConfig (line 36) | class ElasticsearchSimilarityConfig(SimilarityConfig):
method __init__ (line 59) | def __init__(
method method (line 96) | def method(self):
method hosts (line 100) | def hosts(self):
method hosts (line 104) | def hosts(self, value):
method cloud_id (line 108) | def cloud_id(self):
method cloud_id (line 112) | def cloud_id(self, value):
method username (line 116) | def username(self):
method username (line 120) | def username(self, value):
method password (line 124) | def password(self):
method password (line 128) | def password(self, value):
method api_key (line 132) | def api_key(self):
method api_key (line 136) | def api_key(self, value):
method ca_certs (line 140) | def ca_certs(self):
method ca_certs (line 144) | def ca_certs(self, value):
method bearer_auth (line 148) | def bearer_auth(self):
method bearer_auth (line 152) | def bearer_auth(self, value):
method ssl_assert_fingerprint (line 156) | def ssl_assert_fingerprint(self):
method ssl_assert_fingerprint (line 160) | def ssl_assert_fingerprint(self, value):
method verify_certs (line 164) | def verify_certs(self):
method verify_certs (line 168) | def verify_certs(self, value):
method max_k (line 172) | def max_k(self):
method supports_least_similarity (line 176) | def supports_least_similarity(self):
method supported_aggregations (line 180) | def supported_aggregations(self):
method load_credentials (line 183) | def load_credentials(
class ElasticsearchSimilarity (line 208) | class ElasticsearchSimilarity(Similarity):
method ensure_requirements (line 215) | def ensure_requirements(self):
method ensure_usage_requirements (line 218) | def ensure_usage_requirements(self):
method initialize (line 221) | def initialize(self, samples, brain_key):
class ElasticsearchSimilarityIndex (line 227) | class ElasticsearchSimilarityIndex(SimilarityIndex):
method __init__ (line 237) | def __init__(self, samples, config, brain_key, backend=None):
method total_index_size (line 244) | def total_index_size(self):
method client (line 251) | def client(self):
method _initialize (line 255) | def _initialize(self):
method _get_index_names (line 294) | def _get_index_names(self):
method _get_index_ids (line 297) | def _get_index_ids(self, batch_size=1000):
method _get_dimension (line 326) | def _get_dimension(self):
method _get_metric (line 341) | def _get_metric(self):
method _index_exists (line 359) | def _index_exists(self):
method _create_index (line 365) | def _create_index(self, dimension):
method _get_existing_ids (line 382) | def _get_existing_ids(self, ids):
method add_to_index (line 387) | def add_to_index(
method remove_from_index (line 470) | def remove_from_index(
method get_embeddings (line 512) | def get_embeddings(
method _parse_embeddings_response (line 571) | def _parse_embeddings_response(self, response, label_id=True):
method _get_sample_embeddings (line 586) | def _get_sample_embeddings(self, sample_ids, batch_size=1000):
method _get_patch_embeddings_from_label_ids (line 612) | def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=1...
method _get_patch_embeddings_from_sample_ids (line 638) | def _get_patch_embeddings_from_sample_ids(
method cleanup (line 667) | def cleanup(self):
method _kneighbors (line 672) | def _kneighbors(
method _parse_neighbors_query (line 763) | def _parse_neighbors_query(self, query):
method _from_dict (line 796) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/hardness.py
function compute_hardness (line 27) | def compute_hardness(samples, label_field, hardness_field, progress):
class HardnessConfig (line 82) | class HardnessConfig(fob.BrainMethodConfig):
method __init__ (line 83) | def __init__(self, label_field, hardness_field, **kwargs):
method type (line 89) | def type(self):
method method (line 93) | def method(self):
class Hardness (line 97) | class Hardness(fob.BrainMethod):
method __init__ (line 98) | def __init__(self, config):
method ensure_requirements (line 102) | def ensure_requirements(self):
method register_samples (line 105) | def register_samples(self, samples):
method process_image (line 110) | def process_image(self, sample_or_frame):
method get_fields (line 118) | def get_fields(self, samples, brain_key):
method cleanup (line 129) | def cleanup(self, samples, brain_key):
method _validate_run (line 138) | def _validate_run(self, samples, brain_key, existing_info):
function _get_data (line 142) | def _get_data(sample, label_field):
FILE: fiftyone/brain/internal/core/lancedb.py
class LanceDBSimilarityConfig (line 35) | class LanceDBSimilarityConfig(SimilarityConfig):
method __init__ (line 47) | def __init__(
method method (line 69) | def method(self):
method uri (line 73) | def uri(self):
method uri (line 77) | def uri(self, value):
method max_k (line 81) | def max_k(self):
method supports_least_similarity (line 85) | def supports_least_similarity(self):
method supported_aggregations (line 89) | def supported_aggregations(self):
method load_credentials (line 92) | def load_credentials(self, uri=None):
class LanceDBSimilarity (line 96) | class LanceDBSimilarity(Similarity):
method ensure_requirements (line 103) | def ensure_requirements(self):
method ensure_usage_requirements (line 106) | def ensure_usage_requirements(self):
method initialize (line 109) | def initialize(self, samples, brain_key):
class LanceDBSimilarityIndex (line 115) | class LanceDBSimilarityIndex(SimilarityIndex):
method __init__ (line 125) | def __init__(self, samples, config, brain_key, backend=None):
method _initialize (line 131) | def _initialize(self):
method table (line 159) | def table(self):
method total_index_size (line 164) | def total_index_size(self):
method add_to_index (line 170) | def add_to_index(
method remove_from_index (line 253) | def remove_from_index(
method get_embeddings (line 295) | def get_embeddings(
method cleanup (line 382) | def cleanup(self):
method _kneighbors (line 395) | def _kneighbors(
method _parse_neighbors_query (line 470) | def _parse_neighbors_query(self, query):
method _from_dict (line 500) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/leaky_splits.py
function compute_leaky_splits (line 29) | def compute_leaky_splits(
class LeakySplitsConfig (line 115) | class LeakySplitsConfig(fob.BrainMethodConfig):
method __init__ (line 116) | def __init__(
method type (line 143) | def type(self):
method method (line 147) | def method(self):
class LeakySplits (line 151) | class LeakySplits(fob.BrainMethod):
method initialize (line 152) | def initialize(self, samples, similarity_index, split_views):
method get_fields (line 157) | def get_fields(self, samples, _):
class LeakySplitsIndex (line 165) | class LeakySplitsIndex(fob.BrainResults):
method __init__ (line 166) | def __init__(self, samples, config, similarity_index, split_views):
method split_views (line 178) | def split_views(self):
method thresh (line 183) | def thresh(self):
method leak_ids (line 188) | def leak_ids(self):
method find_leaks (line 194) | def find_leaks(self, thresh):
method leaks_view (line 234) | def leaks_view(self):
method leaks_for_sample (line 246) | def leaks_for_sample(self, sample_or_id):
method no_leaks_view (line 288) | def no_leaks_view(self, view=None):
method tag_leaks (line 303) | def tag_leaks(self, tag="leak"):
method _initialize (line 311) | def _initialize(self):
function _to_split_views (line 355) | def _to_split_views(samples, splits):
FILE: fiftyone/brain/internal/core/milvus.py
class MilvusSimilarityConfig (line 35) | class MilvusSimilarityConfig(SimilarityConfig):
method __init__ (line 64) | def __init__(
method method (line 108) | def method(self):
method uri (line 112) | def uri(self):
method uri (line 116) | def uri(self, value):
method user (line 120) | def user(self):
method user (line 124) | def user(self, value):
method password (line 128) | def password(self):
method password (line 132) | def password(self, value):
method secure (line 136) | def secure(self):
method secure (line 140) | def secure(self, value):
method token (line 144) | def token(self):
method token (line 148) | def token(self, value):
method db_name (line 152) | def db_name(self):
method db_name (line 156) | def db_name(self, value):
method client_key_path (line 160) | def client_key_path(self):
method client_key_path (line 164) | def client_key_path(self, value):
method client_pem_path (line 168) | def client_pem_path(self):
method client_pem_path (line 172) | def client_pem_path(self, value):
method ca_pem_path (line 176) | def ca_pem_path(self):
method ca_pem_path (line 180) | def ca_pem_path(self, value):
method server_pem_path (line 184) | def server_pem_path(self):
method server_pem_path (line 188) | def server_pem_path(self, value):
method server_name (line 192) | def server_name(self):
method server_name (line 196) | def server_name(self, value):
method max_k (line 200) | def max_k(self):
method supports_least_similarity (line 204) | def supports_least_similarity(self):
method supported_aggregations (line 208) | def supported_aggregations(self):
method index_params (line 212) | def index_params(self):
method search_params (line 220) | def search_params(self):
method load_credentials (line 228) | def load_credentials(
class MilvusSimilarity (line 257) | class MilvusSimilarity(Similarity):
method ensure_requirements (line 264) | def ensure_requirements(self):
method ensure_usage_requirements (line 267) | def ensure_usage_requirements(self):
method initialize (line 270) | def initialize(self, samples, brain_key):
class MilvusSimilarityIndex (line 276) | class MilvusSimilarityIndex(SimilarityIndex):
method __init__ (line 286) | def __init__(self, samples, config, brain_key, backend=None):
method _initialize (line 292) | def _initialize(self):
method _create_collection (line 346) | def _create_collection(self, dimension):
method collection (line 379) | def collection(self):
method total_index_size (line 384) | def total_index_size(self):
method add_to_index (line 390) | def add_to_index(
method _get_existing_ids (line 465) | def _get_existing_ids(self, ids):
method _delete_ids (line 470) | def _delete_ids(self, ids):
method _get_embeddings (line 476) | def _get_embeddings(self, ids):
method remove_from_index (line 483) | def remove_from_index(
method get_embeddings (line 521) | def get_embeddings(
method cleanup (line 580) | def cleanup(self):
method _get_sample_embeddings (line 586) | def _get_sample_embeddings(self, sample_ids, batch_size=1000):
method _get_patch_embeddings_from_label_ids (line 606) | def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=1...
method _get_patch_embeddings_from_sample_ids (line 628) | def _get_patch_embeddings_from_sample_ids(
method _kneighbors (line 659) | def _kneighbors(
method _parse_neighbors_query (line 741) | def _parse_neighbors_query(self, query):
method _get_dimension (line 769) | def _get_dimension(self):
method _from_dict (line 778) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/mistakenness.py
function compute_mistakenness (line 38) | def compute_mistakenness(
class MistakennessMethodConfig (line 175) | class MistakennessMethodConfig(fob.BrainMethodConfig):
method __init__ (line 176) | def __init__(self, pred_field, label_field, mistakenness_field, **kwar...
method type (line 183) | def type(self):
class MistakennessMethod (line 187) | class MistakennessMethod(fob.BrainMethod):
method __init__ (line 188) | def __init__(self, config):
method ensure_requirements (line 194) | def ensure_requirements(self):
method register_samples (line 197) | def register_samples(self, samples):
method _validate_run (line 206) | def _validate_run(self, samples, brain_key, existing_info):
class ClassificationMistakennessConfig (line 217) | class ClassificationMistakennessConfig(MistakennessMethodConfig):
method __init__ (line 218) | def __init__(
method method (line 225) | def method(self):
class ClassificationMistakenness (line 229) | class ClassificationMistakenness(MistakennessMethod):
method process_image (line 230) | def process_image(self, sample_or_frame):
method get_fields (line 261) | def get_fields(self, samples, brain_key):
method cleanup (line 273) | def cleanup(self, samples, brain_key):
class DetectionMistakennessConfig (line 290) | class DetectionMistakennessConfig(MistakennessMethodConfig):
method __init__ (line 291) | def __init__(
method method (line 311) | def method(self):
class DetectionMistakenness (line 315) | class DetectionMistakenness(MistakennessMethod):
method process_image (line 316) | def process_image(self, sample_or_frame, eval_key):
method get_fields (line 399) | def get_fields(self, samples, brain_key):
method cleanup (line 430) | def cleanup(self, samples, brain_key):
method _validate_run (line 472) | def _validate_run(self, samples, brain_key, existing_info):
function _make_eval_key (line 479) | def _make_eval_key(samples, brain_key):
function _get_data (line 492) | def _get_data(sample, pred_field, label_field, use_logits):
function _compute_mistakenness_class (line 529) | def _compute_mistakenness_class(logits, m):
function _compute_mistakenness_loc (line 539) | def _compute_mistakenness_loc(logits, iou):
function _compute_mistakenness_class_conf (line 554) | def _compute_mistakenness_class_conf(confidence, m):
function _compute_mistakenness_loc_conf (line 563) | def _compute_mistakenness_loc_conf(confidence, iou):
FILE: fiftyone/brain/internal/core/mongodb.py
class MongoDBSimilarityConfig (line 37) | class MongoDBSimilarityConfig(SimilarityConfig):
method __init__ (line 50) | def __init__(self, index_name=None, metric="cosine", **kwargs):
method method (line 78) | def method(self):
method max_k (line 82) | def max_k(self):
method supports_least_similarity (line 86) | def supports_least_similarity(self):
method supported_aggregations (line 90) | def supported_aggregations(self):
class MongoDBSimilarity (line 94) | class MongoDBSimilarity(Similarity):
method ensure_requirements (line 101) | def ensure_requirements(self):
method ensure_usage_requirements (line 111) | def ensure_usage_requirements(self):
method initialize (line 114) | def initialize(self, samples, brain_key):
class MongoDBSimilarityIndex (line 120) | class MongoDBSimilarityIndex(SimilarityIndex):
method __init__ (line 130) | def __init__(self, samples, config, brain_key, backend=None):
method is_external (line 140) | def is_external(self):
method total_index_size (line 144) | def total_index_size(self):
method _initialize (line 170) | def _initialize(self):
method _get_dimension (line 221) | def _get_dimension(self):
method _create_index (line 248) | def _create_index(self, dimension):
method ready (line 297) | def ready(self):
method add_to_index (line 317) | def add_to_index(
method remove_from_index (line 380) | def remove_from_index(
method get_embeddings (line 436) | def get_embeddings(
method reload (line 511) | def reload(self):
method cleanup (line 517) | def cleanup(self):
method _kneighbors (line 530) | def _kneighbors(
method _parse_neighbors_query (line 658) | def _parse_neighbors_query(self, query):
method _get_embeddings (line 692) | def _get_embeddings(self, query_ids):
method _parse_data (line 715) | def _parse_data(samples, config):
method _from_dict (line 729) | def _from_dict(cls, d, samples, config, brain_key):
function _get_inds (line 733) | def _get_inds(ids, index_ids, ftype, allow_missing, warn_missing):
FILE: fiftyone/brain/internal/core/mosaic.py
class MosaicSimilarityConfig (line 28) | class MosaicSimilarityConfig(SimilarityConfig):
method __init__ (line 44) | def __init__(
method method (line 70) | def method(self):
method workspace_url (line 74) | def workspace_url(self):
method workspace_url (line 78) | def workspace_url(self, workspace_url):
method service_principal_client_id (line 82) | def service_principal_client_id(self):
method service_principal_client_id (line 86) | def service_principal_client_id(self, service_principal_client_id):
method service_principal_client_secret (line 90) | def service_principal_client_secret(self):
method service_principal_client_secret (line 94) | def service_principal_client_secret(self, service_principal_client_sec...
method personal_access_token (line 98) | def personal_access_token(self):
method personal_access_token (line 102) | def personal_access_token(self, personal_access_token):
method max_k (line 106) | def max_k(self):
method supports_least_similarity (line 110) | def supports_least_similarity(self):
method supported_aggregations (line 114) | def supported_aggregations(self):
method load_credentials (line 117) | def load_credentials(
class MosaicSimilarity (line 132) | class MosaicSimilarity(Similarity):
method ensure_requirements (line 139) | def ensure_requirements(self):
method ensure_usage_requirements (line 142) | def ensure_usage_requirements(self):
method initialize (line 145) | def initialize(self, samples, brain_key):
class MosaicSimilarityIndex (line 151) | class MosaicSimilarityIndex(SimilarityIndex):
method __init__ (line 161) | def __init__(self, samples, config, brain_key, backend=None):
method _initialize (line 167) | def _initialize(self):
method _create_index (line 211) | def _create_index(self, dimension):
method client (line 226) | def client(self):
method total_index_size (line 231) | def total_index_size(self):
method add_to_index (line 236) | def add_to_index(
method _get_index_ids (line 304) | def _get_index_ids(self, batch_size=200):
method _get_values (line 320) | def _get_values(self, ids, batch_size=200):
method remove_from_index (line 341) | def remove_from_index(
method get_embeddings (line 379) | def get_embeddings(
method cleanup (line 439) | def cleanup(self):
method _get_sample_embeddings (line 447) | def _get_sample_embeddings(self, sample_ids, batch_size=200):
method _get_patch_embeddings_from_label_ids (line 476) | def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=2...
method _get_patch_embeddings_from_sample_ids (line 509) | def _get_patch_embeddings_from_sample_ids(
method _kneighbors (line 541) | def _kneighbors(
method _parse_neighbors_query (line 624) | def _parse_neighbors_query(self, query):
method _from_dict (line 652) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/pgvector.py
class PgVectorSimilarityConfig (line 38) | class PgVectorSimilarityConfig(SimilarityConfig):
method __init__ (line 61) | def __init__(
method method (line 96) | def method(self):
method connection_string (line 100) | def connection_string(self):
method connection_string (line 104) | def connection_string(self, connection_string):
method max_k (line 108) | def max_k(self):
method supports_least_similarity (line 112) | def supports_least_similarity(self):
method supported_aggregations (line 116) | def supported_aggregations(self):
method load_credentials (line 119) | def load_credentials(
class PgVectorSimilarity (line 126) | class PgVectorSimilarity(Similarity):
method ensure_requirements (line 133) | def ensure_requirements(self):
method ensure_usage_requirements (line 136) | def ensure_usage_requirements(self):
method initialize (line 139) | def initialize(self, samples, brain_key):
class PgVectorSimilarityIndex (line 145) | class PgVectorSimilarityIndex(SimilarityIndex):
method __init__ (line 155) | def __init__(self, samples, config, brain_key, backend=None):
method total_index_size (line 162) | def total_index_size(self):
method _initialize (line 174) | def _initialize(self):
method _get_table_names (line 214) | def _get_table_names(self):
method _get_index_names (line 220) | def _get_index_names(self, table_name):
method _create_table (line 226) | def _create_table(self, dimension):
method create_hnsw_index (line 244) | def create_hnsw_index(self):
method _get_index_ids (line 266) | def _get_index_ids(self, batch_size=1000):
method add_to_index (line 284) | def add_to_index(
method remove_from_index (line 375) | def remove_from_index(
method close_connections (line 426) | def close_connections(self):
method get_embeddings_by_id (line 432) | def get_embeddings_by_id(self, sample_ids=None, label_ids=None):
method get_embeddings (line 489) | def get_embeddings(
method _kneighbors (line 559) | def _kneighbors(
method _parse_neighbors_query (line 653) | def _parse_neighbors_query(self, query):
method cleanup (line 679) | def cleanup(self, drop_table=False):
method _from_dict (line 707) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/pinecone.py
class PineconeSimilarityConfig (line 30) | class PineconeSimilarityConfig(SimilarityConfig):
method __init__ (line 61) | def __init__(
method method (line 101) | def method(self):
method api_key (line 105) | def api_key(self):
method api_key (line 109) | def api_key(self, value):
method cloud (line 113) | def cloud(self):
method cloud (line 117) | def cloud(self, value):
method region (line 121) | def region(self):
method region (line 125) | def region(self, value):
method environment (line 129) | def environment(self):
method environment (line 133) | def environment(self, value):
method max_k (line 137) | def max_k(self):
method supports_least_similarity (line 141) | def supports_least_similarity(self):
method supported_aggregations (line 145) | def supported_aggregations(self):
method load_credentials (line 148) | def load_credentials(
class PineconeSimilarity (line 159) | class PineconeSimilarity(Similarity):
method ensure_requirements (line 166) | def ensure_requirements(self):
method ensure_usage_requirements (line 169) | def ensure_usage_requirements(self):
method initialize (line 172) | def initialize(self, samples, brain_key):
class PineconeSimilarityIndex (line 178) | class PineconeSimilarityIndex(SimilarityIndex):
method __init__ (line 188) | def __init__(self, samples, config, brain_key, backend=None):
method _initialize (line 194) | def _initialize(self):
method _create_index (line 221) | def _create_index(self, dimension):
method index (line 255) | def index(self):
method total_index_size (line 260) | def total_index_size(self):
method ready (line 267) | def ready(self):
method add_to_index (line 272) | def add_to_index(
method remove_from_index (line 352) | def remove_from_index(
method get_embeddings (line 390) | def get_embeddings(
method cleanup (line 449) | def cleanup(self):
method _get_existing_ids (line 453) | def _get_existing_ids(self, ids, batch_size=1000):
method _get_sample_embeddings (line 461) | def _get_sample_embeddings(self, sample_ids, batch_size=1000):
method _get_patch_embeddings_from_label_ids (line 481) | def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=1...
method _get_patch_embeddings_from_sample_ids (line 503) | def _get_patch_embeddings_from_sample_ids(
method _kneighbors (line 531) | def _kneighbors(
method _parse_neighbors_query (line 606) | def _parse_neighbors_query(self, query):
method _get_dimension (line 634) | def _get_dimension(self):
method _from_dict (line 641) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/qdrant.py
class QdrantSimilarityConfig (line 35) | class QdrantSimilarityConfig(SimilarityConfig):
method __init__ (line 64) | def __init__(
method method (line 104) | def method(self):
method url (line 108) | def url(self):
method url (line 112) | def url(self, value):
method api_key (line 116) | def api_key(self):
method api_key (line 120) | def api_key(self, value):
method grpc_port (line 124) | def grpc_port(self):
method grpc_port (line 128) | def grpc_port(self, value):
method prefer_grpc (line 132) | def prefer_grpc(self):
method prefer_grpc (line 136) | def prefer_grpc(self, value):
method max_k (line 140) | def max_k(self):
method supports_least_similarity (line 144) | def supports_least_similarity(self):
method supported_aggregations (line 148) | def supported_aggregations(self):
method load_credentials (line 151) | def load_credentials(
class QdrantSimilarity (line 162) | class QdrantSimilarity(Similarity):
method ensure_requirements (line 169) | def ensure_requirements(self):
method ensure_usage_requirements (line 172) | def ensure_usage_requirements(self):
method initialize (line 175) | def initialize(self, samples, brain_key):
class QdrantSimilarityIndex (line 181) | class QdrantSimilarityIndex(SimilarityIndex):
method __init__ (line 191) | def __init__(self, samples, config, brain_key, backend=None):
method _initialize (line 196) | def _initialize(self):
method _get_collection_names (line 232) | def _get_collection_names(self):
method _create_collection (line 235) | def _create_collection(self, dimension):
method _get_index_ids (line 273) | def _get_index_ids(self, batch_size=1000):
method total_index_size (line 291) | def total_index_size(self):
method client (line 298) | def client(self):
method add_to_index (line 302) | def add_to_index(
method remove_from_index (line 380) | def remove_from_index(
method get_embeddings (line 423) | def get_embeddings(
method cleanup (line 482) | def cleanup(self):
method _retrieve_points (line 485) | def _retrieve_points(self, qids, with_vectors=True, with_payload=True):
method _get_sample_embeddings (line 494) | def _get_sample_embeddings(self, sample_ids):
method _get_patch_embeddings_from_label_ids (line 509) | def _get_patch_embeddings_from_label_ids(self, label_ids):
method _get_patch_embeddings_from_sample_ids (line 526) | def _get_patch_embeddings_from_sample_ids(self, sample_ids):
method _kneighbors (line 550) | def _kneighbors(
method _parse_neighbors_query (line 638) | def _parse_neighbors_query(self, query):
method _to_qdrant_id (line 667) | def _to_qdrant_id(self, _id):
method _to_qdrant_ids (line 670) | def _to_qdrant_ids(self, ids):
method _to_fiftyone_id (line 673) | def _to_fiftyone_id(self, qid):
method _to_fiftyone_ids (line 676) | def _to_fiftyone_ids(self, qids):
method _from_dict (line 680) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/redis.py
class RedisSimilarityConfig (line 34) | class RedisSimilarityConfig(SimilarityConfig):
method __init__ (line 54) | def __init__(
method method (line 86) | def method(self):
method host (line 90) | def host(self):
method host (line 94) | def host(self, value):
method port (line 98) | def port(self):
method port (line 102) | def port(self, value):
method db (line 106) | def db(self):
method db (line 110) | def db(self, value):
method username (line 114) | def username(self):
method username (line 118) | def username(self, value):
method password (line 122) | def password(self):
method password (line 126) | def password(self, value):
method max_k (line 130) | def max_k(self):
method supports_least_similarity (line 134) | def supports_least_similarity(self):
method supported_aggregations (line 138) | def supported_aggregations(self):
method load_credentials (line 141) | def load_credentials(
class RedisSimilarity (line 149) | class RedisSimilarity(Similarity):
method ensure_requirements (line 156) | def ensure_requirements(self):
method ensure_usage_requirements (line 159) | def ensure_usage_requirements(self):
method initialize (line 162) | def initialize(self, samples, brain_key):
class RedisSimilarityIndex (line 168) | class RedisSimilarityIndex(SimilarityIndex):
method __init__ (line 178) | def __init__(self, samples, config, brain_key, backend=None):
method _initialize (line 184) | def _initialize(self):
method _create_index (line 218) | def _create_index(self, dimension):
method client (line 249) | def client(self):
method total_index_size (line 254) | def total_index_size(self):
method add_to_index (line 260) | def add_to_index(
method _get_existing_ids (line 328) | def _get_existing_ids(self, ids):
method _delete_ids (line 331) | def _delete_ids(self, ids):
method _get_values (line 335) | def _get_values(self, ids):
method remove_from_index (line 342) | def remove_from_index(
method get_embeddings (line 380) | def get_embeddings(
method cleanup (line 439) | def cleanup(self):
method _get_sample_embeddings (line 446) | def _get_sample_embeddings(self, sample_ids, batch_size=1000):
method _get_patch_embeddings_from_label_ids (line 464) | def _get_patch_embeddings_from_label_ids(self, label_ids, batch_size=1...
method _get_patch_embeddings_from_sample_ids (line 484) | def _get_patch_embeddings_from_sample_ids(
method _kneighbors (line 505) | def _kneighbors(
method _parse_neighbors_query (line 580) | def _parse_neighbors_query(self, query):
method _from_dict (line 609) | def _from_dict(cls, d, samples, config, brain_key):
FILE: fiftyone/brain/internal/core/representativeness.py
function compute_representativeness (line 39) | def compute_representativeness(
function _compute_representativeness (line 155) | def _compute_representativeness(embeddings, method="cluster-center"):
function _cluster_ranker (line 186) | def _cluster_ranker(
function _adjust_rankings (line 235) | def _adjust_rankings(embeddings, initial_ranking, ball_radius=0.5):
class RepresentativenessConfig (line 261) | class RepresentativenessConfig(fob.BrainMethodConfig):
method __init__ (line 262) | def __init__(
method type (line 289) | def type(self):
method method (line 293) | def method(self):
method _virtual_attributes (line 297) | def _virtual_attributes(cls):
class Representativeness (line 302) | class Representativeness(fob.BrainMethod):
method ensure_requirements (line 303) | def ensure_requirements(self):
method get_fields (line 306) | def get_fields(self, samples, brain_key):
method cleanup (line 316) | def cleanup(self, samples, brain_key):
method _validate_run (line 322) | def _validate_run(self, samples, brain_key, existing_info):
FILE: fiftyone/brain/internal/core/sklearn.py
class SklearnSimilarityConfig (line 40) | class SklearnSimilarityConfig(SimilarityConfig):
method __init__ (line 50) | def __init__(self, metric="cosine", **kwargs):
method method (line 55) | def method(self):
method max_k (line 59) | def max_k(self):
method supports_least_similarity (line 63) | def supports_least_similarity(self):
method supported_aggregations (line 67) | def supported_aggregations(self):
class SklearnSimilarity (line 71) | class SklearnSimilarity(Similarity):
method initialize (line 78) | def initialize(self, samples, brain_key):
class SklearnSimilarityIndex (line 84) | class SklearnSimilarityIndex(SimilarityIndex, DuplicatesMixin):
method __init__ (line 98) | def __init__(
method is_external (line 130) | def is_external(self):
method embeddings (line 134) | def embeddings(self):
method sample_ids (line 138) | def sample_ids(self):
method label_ids (line 142) | def label_ids(self):
method total_index_size (line 146) | def total_index_size(self):
method add_to_index (line 149) | def add_to_index(
method remove_from_index (line 214) | def remove_from_index(
method use_view (line 263) | def use_view(self, *args, **kwargs):
method get_embeddings (line 267) | def get_embeddings(
method reload (line 327) | def reload(self):
method cleanup (line 342) | def cleanup(self):
method attributes (line 345) | def attributes(self):
method _kneighbors (line 353) | def _kneighbors(
method _radius_neighbors (line 408) | def _radius_neighbors(self, query=None, thresh=None, return_dists=False):
method _kneighbors_aggregate (line 454) | def _kneighbors_aggregate(
method _parse_neighbors_query (line 525) | def _parse_neighbors_query(self, query):
method _get_ids_to_inds (line 584) | def _get_ids_to_inds(self, full=False):
method _get_neighbors (line 606) | def _get_neighbors(self, can_use_neighbors=True, can_use_dists=True):
method _format_output (line 618) | def _format_output(
method _parse_data (line 650) | def _parse_data(
method _from_dict (line 678) | def _from_dict(cls, d, samples, config, brain_key):
class NeighborsHelper (line 701) | class NeighborsHelper(object):
method __init__ (line 705) | def __init__(self, embeddings, metric):
method get_neighbors (line 716) | def get_neighbors(
method _same_keep_inds (line 753) | def _same_keep_inds(self, keep_inds):
method _build (line 757) | def _build(
method _build_dists (line 796) | def _build_dists(self, embeddings):
method _build_neighbors (line 810) | def _build_neighbors(self, embeddings):
function _get_inds (line 841) | def _get_inds(ids, index_ids, ftype, allow_missing, warn_missing):
function _nanargmin (line 876) | def _nanargmin(array, k=1):
FILE: fiftyone/brain/internal/core/uniqueness.py
function compute_uniqueness (line 38) | def compute_uniqueness(
function _compute_uniqueness (line 159) | def _compute_uniqueness(
class UniquenessConfig (line 198) | class UniquenessConfig(fob.BrainMethodConfig):
method __init__ (line 199) | def __init__(
method type (line 225) | def type(self):
method method (line 229) | def method(self):
class Uniqueness (line 233) | class Uniqueness(fob.BrainMethod):
method ensure_requirements (line 234) | def ensure_requirements(self):
method get_fields (line 237) | def get_fields(self, samples, brain_key):
method cleanup (line 247) | def cleanup(self, samples, brain_key):
method _validate_run (line 251) | def _validate_run(self, samples, brain_key, existing_info):
FILE: fiftyone/brain/internal/core/utils.py
function parse_data (line 30) | def parse_data(
function _parse_label_data (line 81) | def _parse_label_data(
function get_embeddings_from_index (line 146) | def get_embeddings_from_index(
function get_ids (line 181) | def get_ids(
function filter_ids (line 225) | def filter_ids(
function _get_patch_ids (line 277) | def _get_patch_ids(
function _apply_ref_sample_ids (line 305) | def _apply_ref_sample_ids(sample_ids, label_ids, ref_sample_ids):
function _flatten_list_ids (line 316) | def _flatten_list_ids(sample_ids, label_ids, handle_missing):
function _parse_ids (line 333) | def _parse_ids(ids, index_ids, ftype, allow_missing, warn_missing):
function skip_ids (line 395) | def skip_ids(samples, ids, patches_field=None, warn_existing=False):
function add_ids (line 422) | def add_ids(
function add_embeddings (line 507) | def add_embeddings(
function remove_ids (line 533) | def remove_ids(
function _find_ids (line 576) | def _find_ids(ids, index_ids, allow_missing, warn_missing, ftype):
function remove_embeddings (line 609) | def remove_embeddings(
function filter_values (line 639) | def filter_values(values, keep_inds, patches_field=None):
function get_values (line 664) | def get_values(samples, path_or_expr, ids, patches_field=None):
function parse_data_field (line 673) | def parse_data_field(
function get_embeddings (line 726) | def get_embeddings(
function get_unique_name (line 837) | def get_unique_name(name, ref_names_or_fcn, max_len=None):
function _get_unique_name (line 848) | def _get_unique_name(name, ref_names_or_fcn):
function _get_unique_name_from_list (line 855) | def _get_unique_name_from_list(name, ref_names):
function _get_unique_name_from_function (line 868) | def _get_unique_name_from_function(name, exists_fcn):
function _get_random_characters (line 879) | def _get_random_characters(n):
function _empty_embeddings (line 885) | def _empty_embeddings(patches_field):
function _has_embeddings_field (line 897) | def _has_embeddings_field(samples, embeddings_field, patches_field=None):
function _load_embeddings (line 911) | def _load_embeddings(samples, embeddings_field, patches_field=None):
function _validate_args (line 943) | def _validate_args(samples, patches_field=None, path_or_expr=None):
function _validate_samples_args (line 952) | def _validate_samples_args(samples, path_or_expr=None):
function _validate_patches_args (line 965) | def _validate_patches_args(samples, patches_field, path_or_expr=None):
function _handle_missing_embeddings (line 1003) | def _handle_missing_embeddings(embeddings, samples):
FILE: fiftyone/brain/internal/models/__init__.py
function list_models (line 27) | def list_models():
function list_downloaded_models (line 37) | def list_downloaded_models():
function is_model_downloaded (line 54) | def is_model_downloaded(name):
function download_model (line 69) | def download_model(name, overwrite=False):
function install_model_requirements (line 98) | def install_model_requirements(name, error_level=0):
function ensure_model_requirements (line 116) | def ensure_model_requirements(name, error_level=0):
function load_model (line 135) | def load_model(
function find_model (line 186) | def find_model(name):
function get_model (line 210) | def get_model(name):
function delete_model (line 222) | def delete_model(name):
class HasBrainModel (line 234) | class HasBrainModel(etal.HasPublishedModel):
method download_model_if_necessary (line 239) | def download_model_if_necessary(self):
method _get_model (line 250) | def _get_model(cls, model_name):
function _load_models_manifest (line 254) | def _load_models_manifest():
function _get_model_in_dir (line 258) | def _get_model_in_dir(name):
function _get_model (line 264) | def _get_model(name):
function _get_exact_model (line 271) | def _get_exact_model(name):
function _get_latest_model (line 279) | def _get_latest_model(base_name):
FILE: fiftyone/brain/internal/models/simple_resnet.py
function simple_resnet (line 19) | def simple_resnet(
class Network (line 58) | class Network(nn.Module):
method __init__ (line 59) | def __init__(self, net, input_layer=None, output_layer=None):
method nodes (line 67) | def nodes(self):
method forward (line 70) | def forward(self, inputs):
method half (line 86) | def half(self):
function has_inputs (line 96) | def has_inputs(node):
function build_graph (line 100) | def build_graph(net):
function pipeline (line 116) | def pipeline(net):
class Crop (line 123) | class Crop(namedtuple("Crop", ("h", "w"))):
method __call__ (line 124) | def __call__(self, x, x0, y0):
method options (line 127) | def options(self, shape):
method output_shape (line 135) | def output_shape(self, shape):
class FlipLR (line 140) | class FlipLR(namedtuple("FlipLR", ())):
method __call__ (line 141) | def __call__(self, x, choice):
method options (line 147) | def options(self, shape):
class Cutout (line 151) | class Cutout(namedtuple("Cutout", ("h", "w"))):
method __call__ (line 152) | def __call__(self, x, x0, y0):
method options (line 156) | def options(self, shape):
class PiecewiseLinear (line 165) | class PiecewiseLinear(namedtuple("PiecewiseLinear", ("knots", "vals"))):
method __call__ (line 166) | def __call__(self, t):
class Const (line 170) | class Const(namedtuple("Const", ["val"])):
method __call__ (line 171) | def __call__(self, x):
class Identity (line 175) | class Identity(namedtuple("Identity", [])):
method __call__ (line 176) | def __call__(self, x):
class Add (line 180) | class Add(namedtuple("Add", [])):
method __call__ (line 181) | def __call__(self, x, y):
class AddWeighted (line 185) | class AddWeighted(namedtuple("AddWeighted", ["wx", "wy"])):
method __call__ (line 186) | def __call__(self, x, y):
class Mul (line 190) | class Mul(nn.Module):
method __init__ (line 191) | def __init__(self, weight):
method __call__ (line 195) | def __call__(self, x):
class Flatten (line 199) | class Flatten(nn.Module):
method forward (line 200) | def forward(self, x):
class Concat (line 204) | class Concat(nn.Module):
method forward (line 205) | def forward(self, *xs):
class BatchNorm (line 209) | class BatchNorm(nn.BatchNorm2d):
method __init__ (line 210) | def __init__(
function conv_bn (line 231) | def conv_bn(c_in, c_out):
function residual (line 241) | def residual(c):
function path_iter (line 250) | def path_iter(nested_dict, pfx=()):
FILE: fiftyone/brain/internal/models/torch.py
class TorchImageModelConfig (line 15) | class TorchImageModelConfig(fout.TorchImageModelConfig, HasBrainModel):
method __init__ (line 26) | def __init__(self, d):
class TorchImageModel (line 31) | class TorchImageModel(fout.TorchImageModel):
method _download_model (line 39) | def _download_model(self, config):
method _load_state_dict (line 42) | def _load_state_dict(self, model, config):
FILE: fiftyone/brain/similarity.py
function compute_similarity (line 49) | def compute_similarity(
function _parse_config (line 192) | def _parse_config(name, **kwargs):
class SimilarityConfig (line 224) | class SimilarityConfig(fob.BrainMethodConfig):
method __init__ (line 241) | def __init__(
method type (line 267) | def type(self):
method method (line 271) | def method(self):
method max_k (line 276) | def max_k(self):
method supports_least_similarity (line 283) | def supports_least_similarity(self):
method supported_aggregations (line 290) | def supported_aggregations(self):
method load_credentials (line 300) | def load_credentials(self, **kwargs):
method _load_parameters (line 303) | def _load_parameters(self, **kwargs):
class Similarity (line 315) | class Similarity(fob.BrainMethod):
method initialize (line 322) | def initialize(self, samples, brain_key):
method get_fields (line 334) | def get_fields(self, samples, brain_key):
class SimilarityIndex (line 345) | class SimilarityIndex(fob.BrainResults):
method __init__ (line 355) | def __init__(self, samples, config, brain_key, backend=None):
method __enter__ (line 372) | def __enter__(self):
method __exit__ (line 376) | def __exit__(self, *args):
method config (line 385) | def config(self):
method supports_prompts (line 390) | def supports_prompts(self):
method is_external (line 398) | def is_external(self):
method sample_ids (line 406) | def sample_ids(self):
method label_ids (line 411) | def label_ids(self):
method total_index_size (line 418) | def total_index_size(self):
method has_view (line 427) | def has_view(self):
method view (line 459) | def view(self):
method current_sample_ids (line 469) | def current_sample_ids(self):
method current_label_ids (line 483) | def current_label_ids(self):
method _current_inds (line 498) | def _current_inds(self):
method index_size (line 506) | def index_size(self):
method missing_size (line 516) | def missing_size(self):
method add_to_index (line 527) | def add_to_index(
method remove_from_index (line 556) | def remove_from_index(
method get_embeddings (line 581) | def get_embeddings(
method use_view (line 613) | def use_view(
method _apply_view (line 673) | def _apply_view(self):
method _apply_view_if_necessary (line 700) | def _apply_view_if_necessary(self):
method clear_view (line 704) | def clear_view(self):
method reload (line 711) | def reload(self):
method cleanup (line 721) | def cleanup(self):
method values (line 725) | def values(self, path_or_expr):
method sort_by_similarity (line 756) | def sort_by_similarity(
method _parse_query (line 877) | def _parse_query(self, query):
method _kneighbors (line 918) | def _kneighbors(
method get_model (line 982) | def get_model(self):
method compute_embeddings (line 1001) | def compute_embeddings(
method _from_dict (line 1100) | def _from_dict(cls, d, samples, config, brain_key):
class DuplicatesMixin (line 1116) | class DuplicatesMixin(object):
method __init__ (line 1124) | def __init__(self):
method thresh (line 1131) | def thresh(self):
method unique_ids (line 1138) | def unique_ids(self):
method duplicate_ids (line 1145) | def duplicate_ids(self):
method neighbors_map (line 1152) | def neighbors_map(self):
method _radius_neighbors (line 1158) | def _radius_neighbors(self, query=None, thresh=None, return_dists=False):
method find_duplicates (line 1209) | def find_duplicates(self, thresh=None, fraction=None):
method find_unique (line 1285) | def find_unique(self, count):
method _remove_duplicates_count (line 1318) | def _remove_duplicates_count(self, num_keep, ids, init_thresh=None):
method _remove_duplicates_thresh (line 1373) | def _remove_duplicates_thresh(self, thresh, ids):
method plot_distances (line 1384) | def plot_distances(self, bins=100, log=False, backend="plotly", **kwar...
method duplicates_view (line 1421) | def duplicates_view(
method unique_view (line 1518) | def unique_view(self):
method visualize_duplicates (line 1545) | def visualize_duplicates(self, visualization, backend="plotly", **kwar...
method visualize_unique (line 1623) | def visualize_unique(self, visualization, backend="plotly", **kwargs):
function _unique_no_sort (line 1689) | def _unique_no_sort(values):
function _build_edges (line 1694) | def _build_edges(ids, neighbors_map):
function _plot_distances_plotly (line 1707) | def _plot_distances_plotly(dists, metric, thresh, bins, log, **kwargs):
function _plot_distances_mpl (line 1767) | def _plot_distances_mpl(
FILE: fiftyone/brain/visualization.py
function compute_visualization (line 39) | def compute_visualization(
function values (line 187) | def values(results, path_or_expr):
function visualize (line 200) | def visualize(
function _is_expr (line 251) | def _is_expr(arg):
function _parse_config (line 255) | def _parse_config(name, **kwargs):
function _get_dimension (line 289) | def _get_dimension(points):
function _generate_spatial_index (line 302) | def _generate_spatial_index(
class VisualizationResults (line 345) | class VisualizationResults(fob.BrainResults):
method __init__ (line 359) | def __init__(
method __enter__ (line 393) | def __enter__(self):
method __exit__ (line 397) | def __exit__(self, *args):
method config (line 402) | def config(self):
method index_size (line 407) | def index_size(self):
method total_index_size (line 416) | def total_index_size(self):
method missing_size (line 425) | def missing_size(self):
method current_points (line 440) | def current_points(self):
method current_sample_ids (line 449) | def current_sample_ids(self):
method current_label_ids (line 458) | def current_label_ids(self):
method view (line 468) | def view(self):
method has_spatial_index (line 478) | def has_spatial_index(self):
method use_view (line 486) | def use_view(
method clear_view (line 553) | def clear_view(self):
method values (line 560) | def values(self, path_or_expr):
method visualize (line 582) | def visualize(
method index_points (line 653) | def index_points(
method remove_index (line 701) | def remove_index(self):
method _from_dict (line 722) | def _from_dict(cls, d, samples, config, brain_key):
class VisualizationConfig (line 743) | class VisualizationConfig(fob.BrainMethodConfig):
method __init__ (line 762) | def __init__(
method type (line 789) | def type(self):
class Visualization (line 793) | class Visualization(fob.BrainMethod):
method fit (line 794) | def fit(self, embeddings):
method get_fields (line 797) | def get_fields(self, samples, brain_key):
method rename (line 806) | def rename(self, samples, key, new_key):
method cleanup (line 827) | def cleanup(self, samples, key):
class UMAPVisualizationConfig (line 841) | class UMAPVisualizationConfig(VisualizationConfig):
method __init__ (line 878) | def __init__(
method method (line 911) | def method(self):
class UMAPVisualization (line 915) | class UMAPVisualization(Visualization):
method ensure_requirements (line 916) | def ensure_requirements(self):
method fit (line 927) | def fit(self, embeddings):
class TSNEVisualizationConfig (line 939) | class TSNEVisualizationConfig(VisualizationConfig):
method __init__ (line 987) | def __init__(
method method (line 1026) | def method(self):
class TSNEVisualization (line 1030) | class TSNEVisualization(Visualization):
method fit (line 1031) | def fit(self, embeddings):
class PCAVisualizationConfig (line 1064) | class PCAVisualizationConfig(VisualizationConfig):
method __init__ (line 1090) | def __init__(
method method (line 1117) | def method(self):
class PCAVisualization (line 1121) | class PCAVisualization(Visualization):
method fit (line 1122) | def fit(self, embeddings):
class ManualVisualizationConfig (line 1131) | class ManualVisualizationConfig(VisualizationConfig):
method __init__ (line 1140) | def __init__(self, patches_field=None, num_dims=2, **kwargs):
method method (line 1146) | def method(self):
class ManualVisualization (line 1150) | class ManualVisualization(Visualization):
method fit (line 1151) | def fit(self, embeddings):
FILE: setup.py
function get_version (line 16) | def get_version():
FILE: tests/intensive/test_interface.py
function test_uniqueness (line 15) | def test_uniqueness():
function test_detection_mistakenness (line 27) | def test_detection_mistakenness():
function test_classification_mistakenness_confidence (line 57) | def test_classification_mistakenness_confidence():
function test_classification_mistakenness_logits (line 78) | def test_classification_mistakenness_logits():
function test_hardness (line 101) | def test_hardness():
FILE: tests/intensive/test_similarity.py
function get_custom_backends (line 148) | def get_custom_backends():
function test_brain_config (line 155) | def test_brain_config():
function test_image_similarity_backends (line 199) | def test_image_similarity_backends():
function test_patch_similarity_backends (line 317) | def test_patch_similarity_backends():
function test_qdrant_backend_config (line 446) | def test_qdrant_backend_config():
function test_images (line 489) | def test_images():
function test_images_subset (line 498) | def test_images_subset():
function test_images_missing (line 510) | def test_images_missing():
function test_images_embeddings (line 541) | def test_images_embeddings():
function test_patches (line 601) | def test_patches():
function test_patches_subset (line 612) | def test_patches_subset():
function test_patches_missing (line 631) | def test_patches_missing():
function test_patches_embeddings (line 676) | def test_patches_embeddings():
function _load_images_dataset (line 743) | def _load_images_dataset():
function _load_patches_dataset (line 752) | def _load_patches_dataset():
function _make_images_dataset (line 761) | def _make_images_dataset(name):
function _make_patches_dataset (line 780) | def _make_patches_dataset(name):
function _verify_total_index_size (line 806) | def _verify_total_index_size(index, expected_size, timeout=10, interval=1):
FILE: tests/intensive/test_uniqueness.py
function test_uniqueness (line 15) | def test_uniqueness():
function test_uniqueness_torch (line 19) | def test_uniqueness_torch():
function test_uniqueness_tf (line 24) | def test_uniqueness_tf():
function test_uniqueness_missing (line 29) | def test_uniqueness_missing():
function test_roi_uniqueness (line 48) | def test_roi_uniqueness():
function test_roi_uniqueness_torch (line 52) | def test_roi_uniqueness_torch():
function test_roi_uniqueness_tf (line 57) | def test_roi_uniqueness_tf():
function test_roi_uniqueness_missing (line 62) | def test_roi_uniqueness_missing():
function test_uniqueness_similarity_index (line 87) | def test_uniqueness_similarity_index():
function _run_uniqueness (line 126) | def _run_uniqueness(roi_field=None, model=None, batch_size=None):
FILE: tests/intensive/test_visualization.py
function test_mnist (line 23) | def test_mnist():
function test_images (line 48) | def test_images():
function test_images_subset (line 62) | def test_images_subset():
function test_images_missing (line 79) | def test_images_missing():
function test_patches (line 110) | def test_patches():
function test_patches_subset (line 126) | def test_patches_subset():
function test_patches_missing (line 153) | def test_patches_missing():
function test_points (line 198) | def test_points():
function test_similarity_index (line 245) | def test_similarity_index():
function test_points_field (line 280) | def test_points_field():
function test_points_field_patches (line 324) | def test_points_field_patches():
function test_index_points (line 370) | def test_index_points():
function test_index_points_patches (line 412) | def test_index_points_patches():
function _load_images_dataset (line 457) | def _load_images_dataset():
function _load_patches_dataset (line 466) | def _load_patches_dataset():
function _make_images_dataset (line 475) | def _make_images_dataset(name):
function _make_patches_dataset (line 499) | def _make_patches_dataset(name):
FILE: tests/models/test_simple_resnet.py
function _transpose (line 21) | def _transpose(x, source, target):
function _check_prediction (line 25) | def _check_prediction(actual, expected):
function test_simple_resnet (line 32) | def test_simple_resnet():
FILE: tests/test_uniqueness.py
function test_uniqueness (line 19) | def test_uniqueness():
function test_gray (line 30) | def test_gray():
Condensed preview — 61 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (589K chars).
[
{
"path": ".github/CODEOWNERS",
"chars": 184,
"preview": "* @voxel51/developers\n\n# Aloha!\n.github/ @voxel51/aloha-shirts\npyproject.toml @voxel51/aloha-shirts\nRELEASING.md"
},
{
"path": ".github/dependabot.yml",
"chars": 187,
"preview": "---\nversion: 2\nupdates:\n - package-ecosystem: \"github-actions\"\n directory: \"/\"\n schedule:\n interval: \"weekly"
},
{
"path": ".github/pull_request_template.md",
"chars": 349,
"preview": "# Rationale\n\n<!-- Explain why you are making this change. Describe the problem. -->\n\n## Changes\n\n<!-- Describe the chang"
},
{
"path": ".github/workflows/build.yml",
"chars": 5186,
"preview": "name: Build\n\non:\n pull_request:\n branches:\n - develop\n types: [opened, synchronize]\n push:\n branches:\n "
},
{
"path": ".gitignore",
"chars": 225,
"preview": "__pycache__\n.DS_store\n.ipynb_checkpoints\n\n*~\n*.egg-info\n*.py[cod]\n*.pth\n*.swp\n\n.idea\n.project\n.pydevproject\n\nbuild/\ndist"
},
{
"path": ".pre-commit-config.yaml",
"chars": 882,
"preview": "repos:\n - repo: https://github.com/asottile/blacken-docs\n rev: v1.12.0\n hooks:\n - id: blacken-docs\n a"
},
{
"path": ".prettierrc",
"chars": 253,
"preview": "{\n \"overrides\": [\n {\n \"files\": \"*.md\",\n \"options\": {\n \"printWidth\": 79,\n \"proseWrap\": \"alway"
},
{
"path": "CONTRIBUTING.md",
"chars": 1023,
"preview": "# Contributing to FiftyOne Brain\n\nAll Brain contributions should follow the practices established in\n[FiftyOne](https://"
},
{
"path": "LICENSE",
"chars": 10247,
"preview": "\nApache License\nVersion 2.0, January 2004\nhttp://www.apache.org/licenses/\n\nTERMS AND CONDITIONS FOR USE, REPRODUCTION, A"
},
{
"path": "MANIFEST.in",
"chars": 121,
"preview": "global-include *\n\nprune fiftyone/brain/internal/models/cache/\ninclude fiftyone/brain/internal/models/cache/manifest.json"
},
{
"path": "README.md",
"chars": 3737,
"preview": "<div align=\"center\">\n<p align=\"center\">\n\n<img src=\"https://github.com/user-attachments/assets/17afdf93-289c-40f1-805c-06"
},
{
"path": "RELEASING.md",
"chars": 1473,
"preview": "# Releasing the Brain package\n\n> [!NOTE]\n> These steps are to be performed by authorized Voxel51 engineers.\n\nThe `fiftyo"
},
{
"path": "STYLE_GUIDE.md",
"chars": 154,
"preview": "# FiftyOne Brain Style Guide\n\nThe Brain follows the same style guidelines as\n[FiftyOne](https://github.com/voxel51/fifty"
},
{
"path": "fiftyone/__init__.py",
"chars": 310,
"preview": "from pkgutil import extend_path\n\n#\n# This statement allows multiple `fiftyone.XXX` packages to be installed in the\n# sam"
},
{
"path": "fiftyone/brain/__init__.py",
"chars": 43583,
"preview": "\"\"\"\nThe brains behind FiftyOne: a powerful package for dataset curation, analysis,\nand visualization.\n\nSee https://githu"
},
{
"path": "fiftyone/brain/config.py",
"chars": 8336,
"preview": "\"\"\"\nBrain config.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport os\n\nfrom fi"
},
{
"path": "fiftyone/brain/internal/__init__.py",
"chars": 201,
"preview": "\"\"\"\nInternal FiftyOne Brain package.\n\nContains all non-public code powering the ``fiftyone.brain`` public namespace.\n\n| "
},
{
"path": "fiftyone/brain/internal/core/__init__.py",
"chars": 87,
"preview": "\"\"\"\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\n"
},
{
"path": "fiftyone/brain/internal/core/duplicates.py",
"chars": 4759,
"preview": "\"\"\"\nDuplicates methods.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nfrom collect"
},
{
"path": "fiftyone/brain/internal/core/elasticsearch.py",
"chars": 24250,
"preview": "\"\"\"\nElastisearch similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\""
},
{
"path": "fiftyone/brain/internal/core/hardness.py",
"chars": 4142,
"preview": "\"\"\"\nHardness methods.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport logging"
},
{
"path": "fiftyone/brain/internal/core/lancedb.py",
"chars": 14901,
"preview": "\"\"\"\nLanceDB similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpo"
},
{
"path": "fiftyone/brain/internal/core/leaky_splits.py",
"chars": 10985,
"preview": "\"\"\"\nFinds leaks between splits.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpo"
},
{
"path": "fiftyone/brain/internal/core/milvus.py",
"chars": 23042,
"preview": "\"\"\"\nMilvus similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpor"
},
{
"path": "fiftyone/brain/internal/core/mistakenness.py",
"chars": 19233,
"preview": "\"\"\"\nMistakenness methods.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport log"
},
{
"path": "fiftyone/brain/internal/core/mongodb.py",
"chars": 24190,
"preview": "\"\"\"\nMongoDB similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpo"
},
{
"path": "fiftyone/brain/internal/core/mosaic.py",
"chars": 21896,
"preview": "\"\"\"\nMosaic similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpor"
},
{
"path": "fiftyone/brain/internal/core/pgvector.py",
"chars": 22937,
"preview": "\"\"\"\nPGVector similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimp"
},
{
"path": "fiftyone/brain/internal/core/pinecone.py",
"chars": 19743,
"preview": "\"\"\"\nPiencone similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimp"
},
{
"path": "fiftyone/brain/internal/core/qdrant.py",
"chars": 21067,
"preview": "\"\"\"\nQdrant similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpor"
},
{
"path": "fiftyone/brain/internal/core/redis.py",
"chars": 17901,
"preview": "\"\"\"\nRedis similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport"
},
{
"path": "fiftyone/brain/internal/core/representativeness.py",
"chars": 9558,
"preview": "\"\"\"\nRepresentativeness methods.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpo"
},
{
"path": "fiftyone/brain/internal/core/sklearn.py",
"chars": 25745,
"preview": "\"\"\"\nSklearn similarity backend.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimpo"
},
{
"path": "fiftyone/brain/internal/core/uniqueness.py",
"chars": 7243,
"preview": "\"\"\"\nUniqueness methods.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport loggi"
},
{
"path": "fiftyone/brain/internal/core/utils.py",
"chars": 28620,
"preview": "\"\"\"\nUtilities.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport itertools\nimpo"
},
{
"path": "fiftyone/brain/internal/core/visualization.py",
"chars": 457,
"preview": "\"\"\"\nVisualization methods.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\n\n# For ba"
},
{
"path": "fiftyone/brain/internal/models/.gitignore",
"chars": 7,
"preview": "cache/\n"
},
{
"path": "fiftyone/brain/internal/models/__init__.py",
"chars": 8695,
"preview": "\"\"\"\nBrain models.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nfrom copy import d"
},
{
"path": "fiftyone/brain/internal/models/manifest.json",
"chars": 1309,
"preview": "{\n \"models\": [\n {\n \"base_name\": \"simple-resnet-cifar10\",\n \"base_filename\": \"simple-resne"
},
{
"path": "fiftyone/brain/internal/models/simple_resnet.py",
"chars": 6551,
"preview": "\"\"\"\nImplementation of a simple ResNet that is suitable only for smallish data.\n\nThe original implementation of this is f"
},
{
"path": "fiftyone/brain/internal/models/torch.py",
"chars": 1161,
"preview": "\"\"\"\nPyTorch utilities.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport fiftyo"
},
{
"path": "fiftyone/brain/similarity.py",
"chars": 61746,
"preview": "\"\"\"\nSimilarity interface.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nfrom colle"
},
{
"path": "fiftyone/brain/visualization.py",
"chars": 37385,
"preview": "\"\"\"\nVisualization interface.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nfrom co"
},
{
"path": "install.bat",
"chars": 836,
"preview": "@echo off\n:: Installs the `fiftyone-brain` package and its dependencies.\n::\n:: Usage:\n:: .\\install.bat\n::\n:: Copyright 2"
},
{
"path": "install.sh",
"chars": 904,
"preview": "#!/bin/sh\n# Installs the `fiftyone-brain` package and its dependencies.\n#\n# Usage:\n# sh install.sh\n#\n# Copyright 2017-"
},
{
"path": "pylintrc",
"chars": 13398,
"preview": "[MASTER]\n\n# Specify a configuration file.\n#rcfile=\n\n# Python code to execute, usually for sys.path manipulation such as\n"
},
{
"path": "pyproject.toml",
"chars": 84,
"preview": "[tool.black]\nline-length = 79\ninclude = '\\.pyi?$'\nexclude = '''\n/(\n | \\.git\n)/\n'''\n"
},
{
"path": "pytest.ini",
"chars": 512,
"preview": "[pytest]\npython_files = *test*.py\nfilterwarnings =\n ignore:dns.hash module will be removed in future versions:Depreca"
},
{
"path": "requirements/build.txt",
"chars": 38,
"preview": "-r common.txt\n\npytest==5.4.3\ntwine>=3\n"
},
{
"path": "requirements/common.txt",
"chars": 25,
"preview": "numpy\nscipy\nscikit-learn\n"
},
{
"path": "requirements/dev.txt",
"chars": 146,
"preview": "-r common.txt\n\nflickrapi==2.4.0\nimageio==2.8.0\nipython>=7.16.1\npandas\npre-commit==2.0.1\npylint==2.3.1\npytest==7.3.1\ntwin"
},
{
"path": "requirements/prod.txt",
"chars": 14,
"preview": "-r common.txt\n"
},
{
"path": "requirements.txt",
"chars": 25,
"preview": "-r requirements/prod.txt\n"
},
{
"path": "setup.py",
"chars": 2002,
"preview": "#!/usr/bin/env python\n\"\"\"\nInstalls `fiftyone-brain`.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel"
},
{
"path": "tests/README.md",
"chars": 949,
"preview": "# FiftyOne-Brain Tests\n\nThe brain currently uses both\n[unittest](https://docs.python.org/3/library/unittest.html) and\n[p"
},
{
"path": "tests/intensive/test_interface.py",
"chars": 3711,
"preview": "\"\"\"\nBrain interface tests.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport un"
},
{
"path": "tests/intensive/test_similarity.py",
"chars": 24348,
"preview": "\"\"\"\nSimilarity tests.\n\nUsage::\n\n # Optional: specific backends to test\n export SIMILARITY_BACKENDS=qdrant,pinecone"
},
{
"path": "tests/intensive/test_uniqueness.py",
"chars": 3812,
"preview": "\"\"\"\nUniqueness tests.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport unittes"
},
{
"path": "tests/intensive/test_visualization.py",
"chars": 13474,
"preview": "\"\"\"\nVisualization tests.\n\nAll of these tests are designed to be run manually via::\n\n pytest tests/intensive/test_visu"
},
{
"path": "tests/models/test_simple_resnet.py",
"chars": 2604,
"preview": "\"\"\"\nTests for :mod:`fiftyone.brain.internal.models.simple_resnet`.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com "
},
{
"path": "tests/test_uniqueness.py",
"chars": 1284,
"preview": "\"\"\"\nUniqueness tests.\n\n| Copyright 2017-2026, Voxel51, Inc.\n| `voxel51.com <https://voxel51.com/>`_\n|\n\"\"\"\nimport os\nimpo"
}
]
About this extraction
This page contains the full source code of the voxel51/fiftyone-brain GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 61 files (549.0 KB), approximately 124.4k tokens, and a symbol index with 774 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.