Repository: dholzmueller/pytabkit
Branch: main
Commit: c126ea51187c
Files: 157
Total size: 2.0 MB
Directory structure:
gitextract_xlrx7g0c/
├── .github/
│   └── workflows/
│       └── testing.yml
├── .gitignore
├── .readthedocs.yaml
├── LICENSE.txt
├── README.md
├── docs/
│   ├── Makefile
│   ├── make.bat
│   ├── requirements.txt
│   └── source/
│       ├── bench/
│       │   ├── 00_installation.md
│       │   ├── 01_running_the_benchmark.md
│       │   ├── 02_stored_data.md
│       │   ├── 03_code.md
│       │   ├── adding_models.md
│       │   ├── download_results.md
│       │   ├── refine_then_calibrate.md
│       │   └── using_the_scheduler.md
│       ├── conf.py
│       ├── index.rst
│       └── models/
│           ├── 00_overview.md
│           ├── 01_sklearn_interfaces.rst
│           ├── 02_hpo.md
│           ├── 03_training_implementation.md
│           ├── examples.md
│           ├── nn_classes.md
│           └── quantile_reg.md
├── examples/
│   └── tutorial_notebook.ipynb
├── original_requirements/
│   ├── conda_env_2024_06_25.yml
│   ├── conda_env_2024_10_28.yml
│   ├── conda_env_2025_01_15.yml
│   └── requirements_2024_06_25.txt
├── pyproject.toml
├── pytabkit/
│   ├── __about__.py
│   ├── __init__.py
│   ├── bench/
│   │   ├── __init__.py
│   │   ├── alg_wrappers/
│   │   │   ├── __init__.py
│   │   │   ├── general.py
│   │   │   └── interface_wrappers.py
│   │   ├── data/
│   │   │   ├── __init__.py
│   │   │   ├── common.py
│   │   │   ├── get_uci.py
│   │   │   ├── import_talent_benchmark.py
│   │   │   ├── import_tasks.py
│   │   │   ├── paths.py
│   │   │   ├── tasks.py
│   │   │   └── uci_file_ops.py
│   │   ├── eval/
│   │   │   ├── __init__.py
│   │   │   ├── analysis.py
│   │   │   ├── colors.py
│   │   │   ├── evaluation.py
│   │   │   ├── plotting.py
│   │   │   ├── runtimes.py
│   │   │   └── tables.py
│   │   ├── run/
│   │   │   ├── __init__.py
│   │   │   ├── results.py
│   │   │   └── task_execution.py
│   │   └── scheduling/
│   │       ├── __init__.py
│   │       ├── execution.py
│   │       ├── jobs.py
│   │       ├── resource_manager.py
│   │       ├── resources.py
│   │       └── schedulers.py
│   └── models/
│       ├── __init__.py
│       ├── alg_interfaces/
│       │   ├── __init__.py
│       │   ├── alg_interfaces.py
│       │   ├── autogluon_model_interfaces.py
│       │   ├── base.py
│       │   ├── calibration.py
│       │   ├── catboost_interfaces.py
│       │   ├── ensemble_interfaces.py
│       │   ├── lightgbm_interfaces.py
│       │   ├── nn_interfaces.py
│       │   ├── other_interfaces.py
│       │   ├── resource_computation.py
│       │   ├── resource_params.py
│       │   ├── rtdl_interfaces.py
│       │   ├── sub_split_interfaces.py
│       │   ├── tabm_interface.py
│       │   ├── tabr_interface.py
│       │   ├── xgboost_interfaces.py
│       │   └── xrfm_interfaces.py
│       ├── data/
│       │   ├── __init__.py
│       │   ├── conversion.py
│       │   ├── data.py
│       │   ├── nested_dict.py
│       │   └── splits.py
│       ├── hyper_opt/
│       │   ├── __init__.py
│       │   ├── coord_opt.py
│       │   └── hyper_optimizers.py
│       ├── nn_models/
│       │   ├── __init__.py
│       │   ├── activations.py
│       │   ├── base.py
│       │   ├── categorical.py
│       │   ├── models.py
│       │   ├── nn.py
│       │   ├── pipeline.py
│       │   ├── rtdl_num_embeddings.py
│       │   ├── rtdl_resnet.py
│       │   ├── tabm.py
│       │   ├── tabr.py
│       │   ├── tabr_context_freeze.py
│       │   └── tabr_lib.py
│       ├── optim/
│       │   ├── __init__.py
│       │   ├── adopt.py
│       │   ├── optimizers.py
│       │   └── scheduling_adam.py
│       ├── sklearn/
│       │   ├── __init__.py
│       │   ├── default_params.py
│       │   ├── sklearn_base.py
│       │   └── sklearn_interfaces.py
│       ├── torch_utils.py
│       ├── training/
│       │   ├── __init__.py
│       │   ├── auc_mu.py
│       │   ├── coord.py
│       │   ├── lightning_callbacks.py
│       │   ├── lightning_modules.py
│       │   ├── logging.py
│       │   ├── metrics.py
│       │   ├── nn_creator.py
│       │   └── scheduling.py
│       └── utils.py
├── scripts/
│   ├── analyze_hpo_best_params.py
│   ├── analyze_tasks.py
│   ├── check_missing_values.py
│   ├── copy_algs.py
│   ├── create_plots_and_tables.py
│   ├── create_probclass_plots.py
│   ├── create_xrfm_ablations_table.py
│   ├── custom_paths.py.default
│   ├── download_data.py
│   ├── estimate_resource_params.py
│   ├── get_sklearn_names.py
│   ├── make_plot_animation.py
│   ├── meta_hyperopt.py
│   ├── move_algs.py
│   ├── move_many_algs.py
│   ├── print_complete_results.py
│   ├── print_runtimes.py
│   ├── ray_slurm_launch.py
│   ├── ray_slurm_template.sh
│   ├── rename_alg.py
│   ├── rename_tag.py
│   ├── run_evaluation.py
│   ├── run_experiments.py
│   ├── run_experiments_unused.py
│   ├── run_probclass_experiments.py
│   ├── run_single_task.py
│   ├── run_slurm.py
│   ├── run_time_measurement.py
│   └── run_xrfm_large_ablations.py
└── tests/
    ├── __init__.py
    ├── test_bench.py
    ├── test_ensemble.py
    ├── test_metrics.py
    ├── test_rtdl_nns.py
    ├── test_sklearn_interfaces.py
    ├── test_tabr.py
    └── test_variants.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/workflows/testing.yml
================================================
name: 'test'

on:
  push:
    branches:
      - "main"
      - "dev"
  pull_request:
    branches:
      - '*'

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        os: [windows-latest, ubuntu-latest, macos-latest]
        python-version: ['3.9', '3.10', '3.11', '3.12'] # 3.13 fails on Windows because it doesn't find a ray version
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install uv
        uses: astral-sh/setup-uv@v3
        with:
          # Install a specific version of uv.
          version: "0.5.4"
      - name: Install hatch
        run: uv pip install --system hatch
      - name: Install swig
        run: uv pip install --system swig
      - name: Run tests
        run: hatch test # removed codecov upload in v1.7.3
================================================
FILE: .gitignore
================================================
*.pyc
*.pdf
*.zip
*.ckpt
experiments/*/
experiments/trace.json
!experiments/meta_hpo
!experiments/prototypes
public_export
dist
files
plots
lightning_logs
docs/build
docs/source/modules.rst
docs/source/pytabkit.*
.coverage*
.idea
catboost_info
tab_bench_data
rtdl_checkpoints
examples/.ipynb_checkpoints
scripts/custom_paths.py
================================================
FILE: .readthedocs.yaml
================================================
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.10"
    # You can also specify other tool versions:
    # nodejs: "20"
    # rust: "1.70"
    # golang: "1.20"
  jobs:
    pre_build:
      - sphinx-apidoc -o docs/source/ pytabkit

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/source/conf.py
  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
  # builder: "dirhtml"
  builder: "html"
  # Fail on all warnings to avoid broken references
  # fail_on_warning: true

# Optionally build your docs in additional formats such as PDF and ePub
# formats:
#   - pdf
#   - epub

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt
================================================
FILE: LICENSE.txt
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "{}"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright {yyyy} {name of copyright owner}
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
[Tutorial notebook (Colab)](https://colab.research.google.com/github/dholzmueller/pytabkit/blob/main/examples/tutorial_notebook.ipynb)
[Documentation](https://pytabkit.readthedocs.io/en/latest/)
[Tests](https://github.com/dholzmueller/pytabkit/actions/workflows/testing.yml)
[PyPI downloads](https://pypistats.org/packages/pytabkit)
# PyTabKit: Tabular ML models and benchmarking (NeurIPS 2024)
[Paper](https://arxiv.org/abs/2407.04491) | [Documentation](https://pytabkit.readthedocs.io) | [RealMLP-TD-S standalone implementation](https://github.com/dholzmueller/realmlp-td-s_standalone) | [Grinsztajn et al. benchmark code](https://github.com/LeoGrin/tabular-benchmark/tree/better_by_default) | [Data archive](https://doi.org/10.18419/darus-4555) |
|-------------------------------------------|--------------------------------------------------|---------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|-----------------------------------------------------|
PyTabKit provides **scikit-learn interfaces for modern tabular classification and regression methods**
benchmarked in our [paper](https://arxiv.org/abs/2407.04491), see below.
It also contains the code we used for **benchmarking** these methods
on our benchmarks.

## When (not) to use pytabkit
- **To get the best possible results**:
- Generally we recommend AutoGluon for the best possible results,
though it does not include all the models from pytabkit. AutoGluon 1.4
includes RealMLP (though not in a default configuration) and TabM (in the "extreme" preset for <= 30K samples).
- To get the best possible results from `pytabkit`,
we recommend using
`Ensemble_HPO_Classifier(n_cv=8, use_full_caruana_ensembling=True, use_tabarena_spaces=True, n_hpo_steps=50)`
with a `val_metric_name` corresponding to your target metric
(e.g., `class_error`, `cross_entropy`, `brier`, `1-auc_ovr`), or the corresponding `Regressor`.
(This might take very long to fit; see the sketch after this list.)
- For only a single model, we recommend using
`RealMLP_HPO_Classifier(n_cv=8, hpo_space_name='tabarena-new', use_caruana_ensembling=True, n_hyperopt_steps=50)`,
also with `val_metric_name` as above, or the corresponding `Regressor`.
- **Models**: [TabArena](https://github.com/AutoGluon/tabarena)
also includes some newer models like RealMLP and TabM
with more general preprocessing (missing numericals, text, etc.),
as well as very good boosted tree implementations.
`pytabkit` is currently still easier to use
and supports vectorized cross-validation for RealMLP,
which can significantly speed up the training.
- **Benchmarking**: While pytabkit can be useful for quick benchmarking during development,
for method evaluation we recommend [TabArena](https://github.com/AutoGluon/tabarena).
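For concreteness, here is a minimal sketch of the first recommendation above
(`X_train`, `y_train`, and `X_test` are placeholders for your own data; fitting may take very long):
```python
from pytabkit import Ensemble_HPO_Classifier

model = Ensemble_HPO_Classifier(
    n_cv=8,
    use_full_caruana_ensembling=True,
    use_tabarena_spaces=True,
    n_hpo_steps=50,
    val_metric_name='cross_entropy',  # match this to your target metric
)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)
```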
## Installation (new in 1.4.0: optional model dependencies)
```bash
pip install pytabkit[models]
```
- RealMLP (and TabM) can be used without the `[models]` part.
- For xRFM on GPU, faster kernels will be used if you install `kermac[cu12]` or `kermac[cu11]`
(depending on your CUDA version).
- If you want to use **TabR**, you have to manually install
[faiss](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md),
which is only available on **conda**.
- Please install torch separately if you want to control the version (CPU/GPU etc.)
- Use `pytabkit[models,autogluon,extra,hpo,bench,dev]` to install additional dependencies for the other models,
AutoGluon models, extra preprocessing,
hyperparameter optimization methods beyond random search (hyperopt/SMAC),
the benchmarking part, and testing/documentation. For the hpo part,
you might need to install *swig* (e.g. via pip) if the build of *pyrfr* fails.
See also the [documentation](https://pytabkit.readthedocs.io).
To run the data download for the meta-train benchmark, you need one of rar, unrar, or 7-zip
to be installed on the system.
## Using the ML models
Most of our machine learning models are directly available via scikit-learn interfaces.
For example, you can use RealMLP-TD for classification as follows:
```python
from pytabkit import RealMLP_TD_Classifier
model = RealMLP_TD_Classifier() # or TabR_S_D_Classifier, CatBoost_TD_Classifier, etc.
model.fit(X_train, y_train)
model.predict(X_test)
```
The code above will automatically select a GPU if available,
try to detect categorical columns in dataframes,
preprocess numerical variables and regression targets (no standardization required),
and use a training-validation split for early stopping.
All of this (and much more) can be configured through the constructor
and the parameters of the fit() method.
For example, it is possible to do bagging
(ensembling of models on 5-fold cross-validation)
simply by passing `n_cv=5` to the constructor.
Here is an example for some of the parameters that can be set explicitly:
```python
from pytabkit import RealMLP_TD_Classifier
model = RealMLP_TD_Classifier(device='cpu', random_state=0, n_cv=1, n_refit=0,
                              n_epochs=256, batch_size=256, hidden_sizes=[256] * 3,
                              val_metric_name='cross_entropy',
                              use_ls=False,  # for metrics like AUC / log-loss
                              lr=0.04, verbosity=2)
model.fit(X_train, y_train, X_val, y_val, cat_col_names=['Education'])
model.predict_proba(X_test)
```
See [this notebook](https://colab.research.google.com/github/dholzmueller/pytabkit/blob/main/examples/tutorial_notebook.ipynb)
for more examples. Missing numerical values are currently *not* allowed and need to be imputed beforehand.
### Available ML models
Our ML models are available in up to three variants, all with best-epoch selection:
- library defaults (D)
- our tuned defaults (TD)
- random search hyperparameter optimization (HPO),
sometimes also tree parzen estimator (HPO-TPE) or weighted ensembling (Ensemble)
We provide the following ML models:
- **RealMLP** (TD, HPO, Ensemble): Our new neural net models with tuned defaults (TD),
random search hyperparameter optimization (HPO), or Ensembling
- **XGB**, **LGBM**, **CatBoost** (D, TD, HPO, HPO-TPE): Interfaces for gradient-boosted
tree libraries XGBoost, LightGBM, CatBoost
- **MLP**, **ResNet**, **FTT** (D, HPO): Models
from [Revisiting Deep Learning Models for Tabular Data](https://proceedings.neurips.cc/paper_files/paper/2021/hash/9d86d83f925f2149e9edb0ac3b49229c-Abstract.html)
- **MLP-PLR** (D, HPO): MLP with numerical embeddings
from [On Embeddings for Numerical Features in Tabular Deep Learning](https://proceedings.neurips.cc/paper_files/paper/2022/hash/9e9f0ffc3d836836ca96cbf8fe14b105-Abstract-Conference.html)
- **TabR** (D, HPO): TabR model
from [TabR: Tabular Deep Learning Meets Nearest Neighbors](https://openreview.net/forum?id=rhgIgTSSxW)
- **TabM** (D, HPO): TabM model
from [TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling](https://arxiv.org/abs/2410.24210)
- **XRFM** (D, HPO): xRFM model from [here](https://arxiv.org/abs/2508.10053) ([original repo](https://github.com/dmbeaglehole/xRFM))
- **RealTabR** (D): Our new TabR variant with default parameters
- **Ensemble-TD**: Weighted ensemble of all TD models (RealMLP, XGB, LGBM, CatBoost)
## Post-hoc calibration and refinement stopping
For using post-hoc temperature scaling and refinement stopping from our
paper [Rethinking Early Stopping: Refine, Then Calibrate](https://arxiv.org/abs/2501.19195),
you can pass the following parameters to the scikit-learn interfaces:
```python
from pytabkit import RealMLP_TD_Classifier
clf = RealMLP_TD_Classifier(
    val_metric_name='ref-ll-ts',  # short for 'refinement_logloss_ts-mix_all'
    calibration_method='ts-mix',  # temperature scaling with laplace smoothing
    use_ls=False,  # recommended for cross-entropy loss
)
```
Other calibration methods and validation metrics
from [probmetrics](https://github.com/dholzmueller/probmetrics)
can be used as well.
For reproducing the results from this paper, we refer to the
[documentation](https://pytabkit.readthedocs.io/en/latest/bench/refine_then_calibrate.html).
## Benchmarking code
Our benchmarking code has functionality for
- dataset download
- running methods in parallel on single-node/multi-node/multi-GPU hardware,
with automatic scheduling that tries to respect RAM constraints
- analyzing/plotting results
For more details, we refer to the [documentation](https://pytabkit.readthedocs.io).
## Preprocessing code
While many preprocessing methods are implemented in this repository,
a standalone version of our robust scaling + smooth clipping
can be found [here](https://github.com/dholzmueller/realmlp-td-s_standalone/blob/main/preprocessing.py#L65C7-L65C37).
## Citation
If you use this repository for research purposes, please cite our [paper](https://arxiv.org/abs/2407.04491):
```
@inproceedings{holzmuller2024better,
title={Better by default: {S}trong pre-tuned {MLPs} and boosted trees on tabular data},
author={Holzm{\"u}ller, David and Grinsztajn, Leo and Steinwart, Ingo},
booktitle = {Neural {Information} {Processing} {Systems}},
year={2024}
}
```
## Contributors
- David Holzmüller (main developer)
- Léo Grinsztajn (deep learning baselines, plotting)
- Ingo Steinwart (UCI dataset download)
- Katharina Strecker (PyTorch-Lightning interface)
- Daniel Beaglehole (part of the xRFM implementation)
- Lennart Purucker (some features/fixes)
- Jérôme Dockès (deployment, continuous integration)
## Acknowledgements
Code from other repositories is acknowledged in code comments where possible.
In particular, we used code from https://github.com/yandex-research/rtdl
and sub-packages (Apache 2.0 license),
code from https://github.com/catboost/benchmarks/
(Apache 2.0 license),
and https://docs.ray.io/en/latest/cluster/vms/user-guides/community/slurm.html
(Apache 2.0 license).
## Releases (see git tags)
- v1.7.3:
- disabled RealMLP lightning log file creation that was accidentally introduced
in predict() in >=v1.7.0.
- removed pynvml dependency.
- v1.7.2:
- Added scikit-learn 1.8 compatibility.
- Removed debug print in RealMLP.
- fixed device memory estimation error in the scheduler when `CUDA_VISIBLE_DEVICES` was used.
- v1.7.1:
- LightGBM now processes the `extra_trees`, `max_cat_to_onehot`, and `min_data_per_group` parameters
used in the `'tabarena'` search space, which should improve results.
- Scikit-learn interfaces for RealMLP (TD, HPO) now support moving the model to a different device
(e.g., before saving). This can be achieved using, e.g., `model.to('cpu')` (which is in-place).
- Fixed an xRFM bug in handling binary categorical features.
- v1.7.0:
- added [xRFM](https://arxiv.org/abs/2508.10053) (D, HPO)
- added new `'tabarena-new'` search space for RealMLP-HPO, including per-fold ensembling (more expensive)
and tuning two more categorical hyperparameters
(with [better results](https://github.com/autogluon/tabarena/pull/195))
- reduced RealMLP pickle size by not storing the dataset ([#33](https://github.com/dholzmueller/pytabkit/issues/33))
- fixed gradient clipping for TabM
(it did nothing previously, see [#34](https://github.com/dholzmueller/pytabkit/issues/34)).
To ensure backward compatibility, it is set to None in the HPO search spaces now
(it was already None in the default parameters).
- removed debug print in TabM training loop
- v1.6.1:
- For `n_ens>1`, changed the default behavior for classification to averaging probabilities instead of logits.
This can be reverted by setting `ens_av_before_softmax=True`.
- Implemented time limit for HPO/ensemble methods through `time_limit_s` parameter.
- Support `torch>=2.6` and Python 3.13.
- v1.6.0:
- Added support for other training losses in TabM through the `train_metric_name` parameter,
for example, (multi)quantile regression via `train_metric_name='multi_pinball(0.05,0.95)'`.
- RealMLP-TD now adds the `n_ens` hyperparameter, which can be set to values >1
to train ensembles per train-validation split (called PackedEnsemble in the TabM paper).
This is especially useful when using holdout validation instead of cross-validation ensembles,
and to get more reliable validation predictions and scores for tuning/ensembling.
- fixed RealMLP TabArena search space (`hpo_space_name='tabarena'`) for classification
(allow no label smoothing through `use_ls=False` instead of `use_ls="auto"`).
- v1.5.2: fixed more device bugs for HPO and ensembling
- v1.5.1: fixed a device bug in TabM for GPU
- v1.5.0:
- added `n_repeats` parameter to scikit-learn interfaces for repeated cross-validation
- HPO sklearn interfaces (the ones using random search)
can now do weighted ensembling instead by setting `use_caruana_ensembling=True`.
Removed the `RealMLP_Ensemble_Classifier` and `RealMLP_Ensemble_Regressor` from v1.4.2
since they are now redundant through this feature.
- renamed `space` parameter of GBDT HPO interface
to `hpo_space_name` so now it also works with non-TPE versions.
- Added new [TabArena](https://tabarena.ai) search spaces for boosted trees (not TPE),
which should be almost equivalent to the ones from TabArena
except for the early stopping logic.
- TabM now supports `val_metric_name` for early stopping on different metrics.
- fixed issues #20 and #21 regarding HPO
- small updates for the ["Rethinking Early Stopping" paper](https://arxiv.org/abs/2501.19195)
- v1.4.2:
- fixed handling of custom `val_metric_name` in HPO models and `Ensemble_TD_Regressor`.
- if `tmp_folder` is specified in HPO models,
save each model to disk immediately instead of holding all of them in memory.
This can considerably reduce RAM/VRAM usage.
In this case, pickled HPO models will still rely on the models stored in the `tmp_folder`.
- We now provide `RealMLP_Ensemble_Classifier` and `RealMLP_Ensemble_Regressor`,
which will use weighted ensembling and usually perform better than HPO
(but have slower inference time). We recommend using the new `hpo_space_name='tabarena'`
for best results.
- v1.4.1:
- moved dill to optional dependencies
- updated TabM code to a newer version:
added option share_training_batches=False (old version: True),
exclude certain parameters from weight decay.
- added [documentation](https://pytabkit.readthedocs.io/en/latest/bench/using_the_scheduler.html) for using the scheduler with custom jobs.
- fixed bug in RealMLP refitting.
- updated process start method for scheduler to speed up benchmarking
- v1.4.0:
- moved some imports to the new `models` optional dependencies
to have a more light-weight RealMLP installation
- Added GPU support for CatBoost with help from
[Maximilian Schambach](https://github.com/MaxSchambach)
in #16 (not guaranteed to produce exactly the same results).
- Ensembling now saves models after training if a path is supplied, to reduce memory usage
- Added more search spaces
- fixed error in multiquantile output when the passed y was one-dimensional
instead of having shape `(n_samples, 1)`
- Added some examples to the documentation
- v1.3.0:
- Added multiquantile regression for RealMLP:
see the [documentation](https://pytabkit.readthedocs.io/en/latest/models/quantile_reg.html)
- More hyperparameters for RealMLP
- Added [TabICL](https://github.com/soda-inria/tabicl) wrapper
- Small fixes
- v1.2.1: avoid error for older skorch versions
- v1.2.0:
- Included post-hoc calibration and more metrics through
[probmetrics](https://github.com/dholzmueller/probmetrics).
- Added benchmarking code for [Rethinking Early Stopping: Refine, Then Calibrate](https://arxiv.org/abs/2501.19195).
- Updated format for saving predictions,
allow to stop on multiple metrics during the same training
in the benchmark.
- Better categorical handling,
avoiding an error for string and object columns,
not ignoring boolean columns by default but treating them as
categorical.
- Added Ensemble_HPO_Classifier and Ensemble_HPO_Regressor.
- v1.1.3:
- Fixed a bug where the categorical encoding was incorrect if categories
were missing in the training or validation set. The bug affected XGBoost
and potentially many other models except RealMLP.
- Scikit-learn interfaces now accept and auto-detect categorical datatypes
(category, string, object) in dataframes.
- v1.1.2:
- Some compatibility improvements for scikit-learn 1.6
(but disabled 1.6 since skorch is not compatible with it).
- Improved documentation for Pytorch-Lightning interface.
- Other small bugfixes and improvements.
- v1.1.1:
- Added parameters `weight_decay`, `tfms`,
and `gradient_clipping_norm` to TabM.
The updated default parameters now apply the RTDL quantile transform.
- v1.1.0:
- Included TabM
- Replaced `__` by `_` in parameter names for MLP, MLP-PLR, ResNet, and FTT,
to comply with scikit-learn interface requirements.
- Fixed non-determinism in NN baselines
by initializing the random state of quantile (and KDI)
preprocessing transforms.
- n_threads parameter is not ignored by NNs anymore.
- Changes by [Lennart Purucker](https://github.com/LennartPurucker):
Add time limit for RealMLP,
add support for `lightning` (but also still allowing `pytorch-lightning`),
making skorch a lazy import, removed msgpack\_numpy dependency.
- v1.0.0: Release for the NeurIPS version and arXiv v2+v3.
- More baselines (MLP-PLR, FT-Transformer, TabR-HPO, RF-HPO),
also some un-polished internal interfaces for other methods,
esp. the ones in AutoGluon.
- Updated benchmarking code (configurations, plots)
including the new version of the Grinsztajn et al. benchmark
- Updated fit() parameters in scikit-learn interfaces, etc.
- v0.0.1: First release for arXiv v1.
Code and data are archived at [DaRUS](https://doi.org/10.18419/darus-4255).
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/make.bat
================================================
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)
if "%1" == "" goto help
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd
================================================
FILE: docs/requirements.txt
================================================
adjustText>=1.0
autorank>=1.0
catboost>=1.2
dask[dataframe]>=2023
dill
fire
lightgbm>=4.1
matplotlib>=3.0
msgpack>=1.0
myst_parser>=3.0
numba>=0.59.0
numpy>=1.25
openml>=0.14
openpyxl>=3.0
pandas>=2.0
patool>=1.0
probmetrics>=0.0.1
psutil>=5.0
pytest-cov>=4.0
pytest>=7.0
pytorch_lightning>=2.0
pyyaml>=5.0
ray>=2.8
requests>=2.0
scikit-learn>=1.3
seaborn>=0.0.13
skorch>=0.15
sphinx>=7.0
sphinx_rtd_theme>=2.0
torch>=2.0
torchmetrics>=1.2.1
tqdm
tueplots>=0.0.12
xgboost>=2.0
xlrd>=2.0
xrfm>=0.4.3
================================================
FILE: docs/source/bench/00_installation.md
================================================
# Overview and Installation of the Benchmarking code
Our benchmarking code contains several features:
- Automatic dataset download
- Running models (parallelized) with automatic scheduling,
trying to respect RAM constraints
- Evaluation and plotting
## Installation
Our code has been tested with Python 3.9 and 3.10.
After cloning/forking the repo,
the required libraries can be installed as follows:
```commandline
# in the repo folder:
pip3 install -e .[extra,hpo,bench]
```
Note that the version requirements in our `pyproject.toml`
are somewhat restrictive to avoid problems; they can potentially be relaxed.
To more closely reproduce the installation we used for running the benchmarks,
we refer to the configuration files in the `original_requirements` folder:
- The pip-only requirements in `requirements_2024_06_25.txt`
were used to compute many of the older NN results (not TabR).
- The conda requirements in `conda_env_2024_06_25.yml`
and `conda_env_2024_10_28.yml` were used to compute GBDT-HPO results
and TabR results as well as a few newer NN results.
They can be installed as a new conda environment using
`conda env create -f conda_env_2024_10_28.yml`.
Note that the older of the two conda environments was very slow
for TabR on some datasets
since it uses an older torchmetrics version with slow implementations.
## Using Sphinx Documentation
Go to the repo root dir and run
```commandline
sphinx-apidoc -o docs/source/ pytabkit
sphinx-build -M html docs/source/ docs/build/
```
then open `docs/build/html/index.html`.
================================================
FILE: docs/source/bench/01_running_the_benchmark.md
================================================
# Running the benchmark
## Configuration of data paths
The paths for storing data and results are configured
through the `tab_bench.data.paths.Paths` class.
There are several options to configure which folders are used,
which will be automatically recognized by `Paths.from_env_variables()`:
- **Through environment variables**:
The base folder can be configured by setting the environment variable
`TAB_BENCH_DATA_BASE_FOLDER`.
Optionally, some sub-folders can be set separately
(e.g. for moving them to another partition). These are
`TAB_BENCH_DATA_TASKS_FOLDER`, `TAB_BENCH_DATA_RESULTS_FOLDER`,
`TAB_BENCH_DATA_RESULT_SUMMARIES_FOLDER`, `TAB_BENCH_DATA_UCI_DOWNLOAD_FOLDER`.
- **Through a Python file**: If `TAB_BENCH_DATA_BASE_FOLDER` is not set,
the code will try to get the base folder (as a string) from
`scripts.custom_paths.get_base_folder()`.
This can be implemented by copying `scripts/custom_paths.py.default` to `scripts/custom_paths.py`
(ignored by git) and adjusting the path therein; a minimal sketch is shown after this list.
- If neither of the two options above is used,
all data will be stored in `./tab_bench_data`.
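For the Python-file option, a minimal `scripts/custom_paths.py` could look like the following sketch
(the returned path is a placeholder to adjust):
```python
# scripts/custom_paths.py (ignored by git), based on scripts/custom_paths.py.default


def get_base_folder() -> str:
    # placeholder path, adjust to your storage location
    return '/path/to/tab_bench_data'
```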
## Download datasets
To download all datasets for the meta-train and meta-test benchmarks, run the following
(optionally specifying your desired OpenML cache directory):
```commandline
python3 scripts/download_data.py openml_cache_dir --import_meta_train --import_meta_test --import_grinsztajn_medium
```
To run methods on the benchmarks, there are two options:
## Run experiments with slurm
Our benchmarking code contains its own scheduling code that will start subprocesses
for each algorithm-dataset-split combination.
Therefore, it is in principle possible to run all experiments
through a single slurm job,
though experiments can be divided into smaller pieces by running them separately.
First, in `scripts/ray_slurm_template.sh`,
adjust the line `cd ~/git/pytabkit` to match your repository location.
Also, make sure that the data path is specified there
if you want to set it via an environment variable.
Run the following command (replacing some of the parameters with your own values) on the login node:
```commandline
python3 scripts/ray_slurm_launch.py --exp_name=my_exp_name --num_nodes=num_nodes --queue="queue_name" --time=24:00:00 --mail_user="my@address.edu" --log_folder=log_folder --command="python3 -u scripts/run_slurm.py"
```
This will submit a job to the configured queue that will run `scripts/run_slurm.py` and create logfiles.
Your experiments then have to be configured in `scripts/run_slurm.py`, see below.
Multi-node is supported: `ray` will start instances on each node
and our benchmarking code will schedule the individual experiments on the nodes.
## Run experiments without slurm
Run the file with the corresponding experiments directly.
For example, many of our experiment configurations
can be found in `scripts/run_experiments.py`.
One possible way to run the experiments detached from the shell with log-files is
````commandline
systemd-run --scope --user python3 -u scripts/run_experiments.py > ./out.log 2> ./err.log &
````
## Time measurements
For time measurements, simply run `scripts/run_time_measurement.py` (with or without slurm).
Results can be printed using `scripts/print_runtimes.py`
(but these are averaged total times, not averaged per 1K samples as in the paper).
## Evaluating the benchmark results
Aggregated algorithm results can be printed using
````commandline
python3 scripts/run_evaluation.py meta-train-class
````
where `meta-train-class` can be replaced by the name of any other task collection
(that is stored in the `task_collections` folder in the configured data directory),
or a single dataset such as `openml-class/Higgs`.
This script also has many more command line options, see the python file.
For example, one can print only those methods with a certain tag
using the `--tag` option,
print results on individual datasets, for different metrics, etc.
The parameters are the same as the ones of the following method:
```{eval-rst}
.. autofunction:: scripts.run_evaluation.show_eval
```
## Creating plots and tables
Plots and tables can be created using
````commandline
python3 scripts/create_plots_and_tables.py
````
The plots without missing value datasets require running
```commandline
python3 scripts/check_missing_values.py
```
once beforehand.
## Single-task experiments
You can also run a configuration on a single data set,
without saving the results, by adjusting and running `scripts/run_single_task.py`.
## Other utilities
- Use `scripts/analyze_tasks.py` to print some dataset statistics.
- You can rename a method using `python3 scripts/rename_alg.py old_name new_name`.
- We used some code in `scripts/meta_hyperopt.py` to optimize the default parameters for GBDTs.
- The code in `scripts/estimate_resource_params.py` has been used to get more precise estimates
for RAM usage etc. for running methods on the benchmark.
- `scripts/print_complete_results.py` can be used to check which methods have results available
on all splits for all tasks in a given collection.
================================================
FILE: docs/source/bench/02_stored_data.md
================================================
# Data format
Here, we describe how the main data is stored
inside the main data folder configured in the `tab_bench.data.paths.Paths` object
(see the documentation on running the benchmark).
As file formats, we mostly use `.yaml` (for small, human-readable files),
`.msgpack.gz` (for efficiently storing dicts, lists, etc.), and `.npy`
(standard format for storing numpy arrays).
## Algs folder
The following files are stored in `algs/<alg_name>`,
see `tab_bench.run.task_execution.TabBenchJobManager.add_jobs()`
for details on how they are stored:
- `tags.yaml` contains a list of tags,
which can be used to only load results for algs with certain tags.
- `extended_config.yaml` contains a dictionary with the wrapper parameters,
as well as the alg_name and the wrapper class name.
- `wrapper.pkl`: Optionally, a pickled version (using `dill`) of the wrapper.
(However, our code does not load these as pickle is an unsafe format.)
- `src`: A folder containing the source files at the time of execution, as a backup.
## Tasks folder
We store datasets (tasks) in folders `tasks/<source_name>/<task_name>`,
where source_name and task_name are derived from how the tasks are imported
(see also the `tab_bench.data.tasks.TaskDescription` class).
In each of these folders, we store the following files:
- `x_cont.npy`, `x_cat.npy`, `y.npy` store the three relevant tensors
for the DictDataset
(see the `tab_models` documentation).
- `task_info.yaml` stores the information of a `TaskInfo` object.
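For quick manual inspection, these files can also be read directly; the following is only a sketch
(the task folder is an example path inside the configured data directory,
and the exact contents of `task_info.yaml` depend on the `TaskInfo` class):
```python
from pathlib import Path

import numpy as np
import yaml

task_dir = Path('tab_bench_data/tasks/openml-reg/fifa')  # example task folder, adjust as needed
x_cont = np.load(task_dir / 'x_cont.npy')  # continuous features
x_cat = np.load(task_dir / 'x_cat.npy')  # categorical features
y = np.load(task_dir / 'y.npy')  # targets
with open(task_dir / 'task_info.yaml') as f:
    task_info = yaml.safe_load(f)
print(x_cont.shape, x_cat.shape, y.shape)
```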
## Task collections folder
In `task_collections/<coll_name>.yaml`,
we store the list of tasks that a task collection with name `coll_name` consists of.
## Results folder
We store the results of experiments in the folder
`results/<alg_name>/<source_name>/<task_name>/<k>-fold/<split_type>/<split_idx>`.
Here,
- alg_name is the name given to the method,
- source_name and task_name identify a task,
- k refers to the number of cross-validation folds (training-validation, not test),
- split_type is either `random-split` (usually the case)
or `default-split` (not used in our benchmark),
- split_idx is the index (starting from zero) of the trainval-test-split.
The results are stored in files `metrics.yaml` and `other.msgpack.gz`.
The former contains only the errors in different metrics,
the latter contains other things like predictions (if configured to be saved),
best stopping epoch, and possibly optimized hyperparameters.
These files are stored by `tab_bench.run.results.ResultManager`.
The involved dictionaries are generated by
`tab_models.alg_interfaces.alg_interfaces.AlgInterface.eval()`.
## Result summaries folder
Since loading the results directly can be slow,
we store accumulated versions of them in a more efficient format. Specifically,
`tab_bench.run.task_execution.TabBenchJobManager.run_jobs()` will call
`tab_bench.run.task_execution.results.save_summaries()`, which will generate files
`result_summaries/<alg_name>/<source_name>/<task_name>/<k>-fold/metrics.msgpack.gz`
that contain the metrics results for all splits.
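Since these summaries are gzip-compressed msgpack files, they can be loaded directly;
here is a minimal sketch (the path is a hypothetical example following the scheme above,
and the layout of the loaded dictionary is best explored interactively):
```python
import gzip

import msgpack

# hypothetical example path, adjust the alg/task names as needed
path = 'tab_bench_data/result_summaries/RealMLP-TD/openml-class/Higgs/1-fold/metrics.msgpack.gz'
with gzip.open(path, 'rb') as f:
    metrics = msgpack.unpackb(f.read(), strict_map_key=False)
print(type(metrics))  # inspect the nested structure from here
```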
## Other folders
- Plots and LaTeX tables will be saved in the `plots` folder.
- Results of estimating resource prediction parameters
are saved in the `resources` folder.
- Results of time measurements are saved in the `times` folder.
- Downloaded datasets from the UCI repository are saved in the `uci_download` folder.
They can be deleted after the data import in `download_data.py` is completed.
- The `tmp` folder can be used for storing temporary files.
When running experiments, methods can store intermediate results
in a temporary folder in their respective results folder.
================================================
FILE: docs/source/bench/03_code.md
================================================
# Code structure
## Algorithm wrappers
To run methods in `tab_bench`, one needs to
provide them as a subclass of `tab_bench.alg_wrappers.general.AlgWrapper`.
Generally, we use models from the `tab_models` library that implement
the `AlgInterface` from there, and wrap them lightly as an `AlgInterfaceWrapper`
in `tab_bench/alg_wrappers/interface_wrappers.py`,
see the numerous classes there for examples.
As in `tab_models`, we pass parameters to these models via `**kwargs`.
The scikit-learn interfaces in `tab_models` provide in their constructors
a list of the most important hyperparameters.
## Datasets
We represent our datasets using the `DictDataset` class from `tab_models`.
These datasets can be loaded as follows:
```python
from pytabkit.bench.data.paths import Paths
from pytabkit.bench.data.tasks import TaskDescription
paths = Paths.from_env_variables()
task_desc = TaskDescription('openml-reg', 'fifa')
task_info = task_desc.load_info(paths) # a TaskInfo object
task = task_info.load_task(paths)
ds = task.ds # this is the DictDataset object
```
We can convert `ds` to a Pandas DataFrame using `ds.to_df()`.
It is also possible to load a list of all TaskInfo objects
for an entire task collection:
```python
from pytabkit.bench.data.paths import Paths
from pytabkit.bench.data.tasks import TaskCollection
paths = Paths.from_env_variables()
task_infos = TaskCollection.from_name('meta-train-class', paths).load_infos(paths)
```
## Scheduling code
We implement general scheduling code in `tab_bench/scheduling`.
This code takes a list of jobs (subclasses of `AbstractJob`)
and runs them in parallel in a single-node or multi-node setup,
respecting the provided resource requirements
(on RAM usage, number of threads, etc.). It can be used independently as follows:
```python
from typing import List
from pytabkit.bench.scheduling.jobs import AbstractJob
from pytabkit.bench.scheduling.execution import RayJobManager
from pytabkit.bench.scheduling.schedulers import SimpleJobScheduler
jobs: List[AbstractJob] = [] # create a list of jobs here
scheduler = SimpleJobScheduler(RayJobManager())
scheduler.add_jobs(jobs)
scheduler.run()
```
For our tabular benchmarking code,
the `AbstractJob` objects will be created by the
`tab_bench.run.task_execution.TabBenchJobManager`.
Numerous examples for this can be found in `scripts/run_experiments.py`.
## Resource estimation
## Evaluation and plotting
================================================
FILE: docs/source/bench/adding_models.md
================================================
# Adding your own models to the benchmark
To run your own models,
- implement an `AlgInterface` subclass. There are numerous examples already implemented.
For models that can only run a single train-validation-test split at a time,
you might want to subclass or modify `SklearnSubSplitInterface` from
`pytabkit/models/alg_interfaces/sub_split_interfaces.py`. Examples can be found in
`pytabkit/models/alg_interfaces/other_interfaces.py` or
`pytabkit/models/alg_interfaces/rtdl_interfaces.py`.
- add an `AlgInterfaceWrapper` subclass. This is often just a three-liner
that specifies which `AlgInterface` subclass to instantiate.
See the numerous examples in
`pytabkit/bench/alg_wrappers/interface_wrappers.py`, especially the later ones.
- adjust the code to run your `AlgInterfaceWrapper` on the benchmark,
see `scripts/run_experiments.py` for many examples.
Note that `RunConfig` has an option to save the model predictions
on the whole datasets,
which can significantly increase disk usage
(up to 2 GB per model on the meta-test-class benchmark).
================================================
FILE: docs/source/bench/download_results.md
================================================
# Downloading the benchmark results
The benchmark data (as well as the code)
is archived at [DaRUS](https://doi.org/10.18419/darus-4555).
To download the benchmark data,
- create a folder for the data
(which is then linked in the environment variable
`TAB_BENCH_DATA_BASE_FOLDER` or in `custom_paths.py`)
- in the folder, unpack `main_no_results.tar.gz`,
this should create the folders `algs`, `result_summaries`, `times`, `plots`,
`task_collections`, and `tasks_only_infos`
(which should be renamed to `tasks` if no `tasks` folder has been created).
Since `result_summaries` stores the main metrics of the results,
this is already enough for plotting/evaluating the results.
- If you want the non-summarized results,
download and unpack `results_small.tar.gz`, which contains the `results` folder
(you might need to rename it from `results_no_gz` to `results`).
However, this does not contain the additional files storing the predictions
and optimal hyperparameters.
- If you want the full results, download and unpack
`results_main.tar.gz` (180 GB!) into the results folder
(overwriting/replacing the contents of `results_small.tar.gz`).
Moreover, there are additional files containing the results
of the individual random search steps
for the different methods,
which could be used for retrospectively optimizing on a different metric etc.
The file `cv_refit.tar.gz` contains the results of the cross-validation/refitting experiments,
which are also somewhat large.
- If you need the datasets (in the `tasks` folder),
you can normally just obtain them by running `scripts/download_data.py`.
However, there is the option to request access to download `tasks.tar.gz` directly.
================================================
FILE: docs/source/bench/refine_then_calibrate.md
================================================
# Reproducing results of "Rethinking Early Stopping: Refine, Then Calibrate"
Here, we document how to reproduce results from our paper [Rethinking Early Stopping: Refine, Then Calibrate](https://arxiv.org/abs/2501.19195).
For general instructions on how to set data paths and use slurm,
we refer to the installation page.
The following will be the parts specific to this paper.
## Installation
```bash
pip install probmetrics[extra] # to get smECE
pip install pytabkit[bench,dev]
```
### Original environment
The original conda environment for exact reproduction
is stored in `original_requirements/conda_env_2025_01_15.yml`.
## Downloading datasets
Download the zipped datasets (`dataset-latest.zip`) of the TALENT benchmark from
[here](https://drive.google.com/drive/folders/1j1zt3zQIo8dO6vkO-K-WE6pSrl71bf0z).
Extract them into a folder. Then, use
```commandline
python3 scripts/download_data.py --import_talent_class_small --talent_folder=<unzipped data folder>
```
where the provided data folder should be the `data` folder inside the unzipped results.
## Running experiments
Experiments can be run using `python3 scripts/run_probclass_experiments.py`,
then plots can be generated using `python3 scripts/create_probclass_plots.py`.
================================================
FILE: docs/source/bench/using_the_scheduler.md
================================================
# Using the scheduler
`pytabkit` includes a flexible scheduler that can schedule jobs within python using `ray` and `multiprocessing`.
Essentially, it is a much fancier version of `multiprocessing.Pool`.
Custom jobs need to provide an estimate of their required resources. The scheduler will
- run as many jobs in parallel as possible on the current hardware while respecting the RAM and resource constraints
- try to run the slowest jobs first, to avoid waiting for a few slow jobs in the end
- measure free CPU RAM in the beginning, and add the fixed RAM that a CPU process uses to the requested RAM.
For processes requesting a GPU, the fixed RAM used by a process using torch CUDA will be added to the requested RAM.
- print info, including remaining time estimates, after each newly started job, failed job, etc.
(unless the jobs run so fast that multiple ones are started at once).
The time estimates are based on the estimates provided by the jobs,
but they are adapted by a factor learned from the actual time taken by already finished jobs.
Hence, the time estimate is only accurate after a few jobs have finished,
and it often underestimates the actually needed time to some extent.
(This is probably also due to selection bias, since the estimated longest jobs are run first.)
The scheduler also works on multi-GPU systems,
and it even works on multi-node systems thanks to `ray`'s multi-node support.
See [`ray_slurm_launch.py`](https://github.com/dholzmueller/pytabkit/blob/main/scripts/ray_slurm_launch.py)
and [`ray_slurm_template.sh`](https://github.com/dholzmueller/pytabkit/blob/main/scripts/ray_slurm_template.sh).
To use the scheduler, install `pytabkit[models,bench]`.
Here is some example code:
```python
from pytabkit.models.alg_interfaces.base import RequiredResources
from pytabkit.bench.scheduling.execution import RayJobManager
from pytabkit.bench.scheduling.jobs import AbstractJob
from pytabkit.bench.scheduling.resources import NodeResources
from pytabkit.bench.scheduling.schedulers import SimpleJobScheduler
class CustomJob(AbstractJob):
    def get_group(self):
        # group name; for all jobs with the same group name,
        # one joint time multiplier will be fitted in the scheduler
        return 'default'

    def get_desc(self) -> str:
        return 'CustomJob'  # name for displaying

    def __call__(self, assigned_resources: NodeResources) -> bool:
        # the main job, should only use the assigned resources
        print(f'Running job with {assigned_resources.get_n_threads()} threads', flush=True)
        return True  # job finished successfully

    def get_required_resources(self) -> RequiredResources:
        # Return the resources requested by this job (RAM should be upper bounds, time doesn't need to be)
        return RequiredResources(time_s=1.0, n_threads=1, cpu_ram_gb=0.1, n_gpus=0, gpu_ram_gb=0.0, gpu_usage=1.0)


sched = SimpleJobScheduler(RayJobManager(available_gpu_ram_multiplier=0.7))
sched.add_jobs([CustomJob() for _ in range(1000)])
sched.run()
```
================================================
FILE: docs/source/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
# following https://stackoverflow.com/questions/10324393/sphinx-build-fail-autodoc-cant-import-find-module
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))
from pytabkit.__about__ import __version__
project = 'pytabkit'
copyright = '2024, David Holzmüller, Léo Grinsztajn, Ingo Steinwart'
author = 'David Holzmüller, Léo Grinsztajn, Ingo Steinwart'
release = __version__
# release = "0.0.1"
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = ['myst_parser', 'sphinx.ext.autodoc']
templates_path = ['_templates']
exclude_patterns = []
# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
# html_theme = 'alabaster'
html_theme = 'sphinx_rtd_theme'
# html_theme = 'default'
html_static_path = ['_static']
# Automatically extract typehints when specified and place them in
# descriptions of the relevant function/method.
autodoc_typehints = "description"
# python_maximum_signature_line_length = 88
# Don't show class signature with the class' name.
autodoc_class_signature = "separated"
================================================
FILE: docs/source/index.rst
================================================
Welcome to PyTabKit's documentation!
======================================
.. toctree::
   :maxdepth: 2
   :caption: Contents:

Tabular ML models in pytabkit.models
====================================

.. toctree::

   models/00_overview
   models/01_sklearn_interfaces
   models/02_hpo
   models/examples
   models/nn_classes
   models/03_training_implementation
   models/quantile_reg

Tabular benchmarking using pytabkit.bench
=========================================

.. toctree::

   bench/00_installation
   bench/01_running_the_benchmark
   bench/adding_models
   bench/02_stored_data
   bench/03_code
   bench/download_results
   bench/refine_then_calibrate
   bench/using_the_scheduler
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
================================================
FILE: docs/source/models/00_overview.md
================================================
# Overview of the `models` part
## Scikit-learn interfaces
We provide scikit-learn interfaces for various methods in
`sklearn/sklearn_interfaces.py`.
These use the default parameter dictionaries defined in `sklearn/default_params.py`.
## AlgInterface: more fine-grained control
We implement all our methods
through subclassing `AlgInterface` in `alg_interfaces/alg_interfaces.py`.
`AlgInterface` provides more functionality than scikit-learn interfaces,
which is crucial for our benchmarking in `pytabkit.bench`.
All our scikit-learn interfaces are wrappers around `AlgInterface` classes,
using the `sklearn.sklearn_base.AlgInterfaceEstimator` base class.
Compared to scikit-learn interfaces,
`AlgInterface` provides the following additional features:
- Vectorized evaluation on multiple train-validation-test splits
(used by RealMLP-TD and RealMLP-TD-S).
- Specification of train-validation-test splits, random seeds, temporary folder, custom loggers
- Inclusion of required resource estimates (CPU RAM, GPU RAM, GPU usage, n_threads, time)
- Evaluation on a list of metrics
- Refitting with best found parameters
## Hyperparameter handling
Hyperparameters are explicitly defined in scikit-learn constructors.
Elsewhere, we generally pass all configuration parameters as `**kwargs`;
the corresponding functions then pick out the parameters they need
and pass the rest on to nested function calls.
This allows for very convenient coding,
but one has to pay attention to typos in parameter names,
which will often not be caught.
For example, one could have the following structure:
```python
def fit(**kwargs):
    model = build_model(**kwargs)
    train_model(model, **kwargs)

def build_model(n_layers=4, **kwargs):
    ...

def train_model(model, lr=4e-2, batch_size=256, **kwargs):
    ...
```
We usually write `**config` instead of `**kwargs`.
We also generally try to give unique names to parameters.
For example, the epsilon parameter of the optimizer
is called `opt_eps` and the epsilon parameter of label smoothing is called `ls_eps`.
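For example (a minimal sketch, assuming both names are accepted as constructor overrides of the TD interface):
```python
from pytabkit import RealMLP_TD_Classifier

# override the optimizer epsilon and disable label smoothing
clf = RealMLP_TD_Classifier(opt_eps=1e-7, ls_eps=0.0)
```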
## Internal data representation
We represent datasets internally using the `DictDataset` class.
It contains a dictionary of PyTorch tensors.
In our case, there are usually three tensors:
`'x_cont'` for continuous features,
`'x_cat'` for categorical features (`dtype=torch.long`), and
`'y'` for labels.
A `DictDataset` also contains a dictionary `tensor_infos`,
which for each of these keys contains a `TensorInfo` object.
The latter describes the number of features and,
if applicable, the number of categories for each feature
(for categorical variables or classification labels).
We reserve the category `0` as the category for missing values
(and for values that were not seen at training time).
Missing numerical values are currently not handled by the NN code,
so they need to be encoded beforehand.
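As a small illustration (a sketch mirroring the usage on the training implementation page):
```python
import torch
from pytabkit.models.data.data import DictDataset, TensorInfo

n_samples = 100
x_cont = torch.randn(n_samples, 3)           # 3 continuous features
x_cat = torch.randint(1, 5, (n_samples, 2))  # 2 categorical features; category 0 is reserved for missing values
y = torch.randint(0, 2, (n_samples, 1))      # binary classification labels

ds = DictDataset(tensors={'x_cont': x_cont, 'x_cat': x_cat.long(), 'y': y.long()},
                 tensor_infos={'x_cont': TensorInfo(feat_shape=[3]),
                               'x_cat': TensorInfo(cat_sizes=[5, 5]),
                               'y': TensorInfo(cat_sizes=[2])})
```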
## Data preprocessing (also available for other models)
Most models allow customizing the data preprocessing
through the `tfms` parameter.
This is done using the NN preprocessing code in
`nn_models.models.PreprocessingFactory`
(see the corresponding documentation page
for an explanation of the Factory classes).
## NN implementation
For the implementation of RealMLP,
we extend and alter the typical PyTorch structure,
see the documentation page on NN classes.
## Vectorization
Due to the vectorization of NN models, we use different terms for similar things:
- `n_cv` refers to the number
of training-validation splits in cross-validation (bagging)
- `n_refit` refers to the number of models
that are refitted on training+validation data after the CV stage
- `n_tv_splits` (or `n_models`) refers to the number of training-validation
splits used in the current training (could be `n_cv` or `n_refit`)
- `n_tt_splits` (or `n_parallel`) refers to the number of trainval-test splits used
(this is normally 1 when used through the scikit-learn interface,
but can be larger when using RealMLP through the benchmark)
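For example, through the scikit-learn interface (a sketch; `n_cv` and `n_refit` are constructor parameters of the interfaces):
```python
from pytabkit import RealMLP_TD_Classifier

# 5-fold cross-validation ensembling (n_cv=5), followed by
# refitting 2 models on the combined training+validation data (n_refit=2)
clf = RealMLP_TD_Classifier(n_cv=5, n_refit=2)
```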
================================================
FILE: docs/source/models/01_sklearn_interfaces.rst
================================================
Scikit-learn interfaces
=======================
We provide scikit-learn interfaces for numerous methods in
``pytabkit.models.sklearn.sklearn_interfaces``.
Below, we provide an overview.
All of our interfaces allow specifying the validation set(s)
and categorical features in the ``fit`` method:
.. autofunction:: pytabkit.models.sklearn.sklearn_base.AlgInterfaceEstimator.fit
Important: For HPO and ensemble interfaces, it is recommended to set ``tmp_folder``
to allow these methods to store fitted models on disk instead of holding them in RAM.
This means that ``tmp_folder`` should not be deleted while the associated interface
still exists (even when it is pickled).
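For example, the validation set can be passed explicitly to ``fit`` (a minimal sketch using the signature above):

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from pytabkit import RealMLP_TD_Classifier

    X, y = make_classification(random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = RealMLP_TD_Classifier()
    clf.fit(X_train, y_train, X_val, y_val)  # explicit validation set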
RealMLP
-------
For RealMLP, we provide TD (tuned default),
HPO (hyperparameter optimization with random search),
and Ensemble (weighted ensembling of random search configurations) variants:
- RealMLP_TD_Classifier
- RealMLP_TD_Regressor
- RealMLP_HPO_Classifier
- RealMLP_HPO_Regressor
- RealMLP_Ensemble_Classifier
- RealMLP_Ensemble_Regressor
While the TD variants have good defaults,
they also allow overriding any hyperparameter.
The classifier and regressor have the same hyperparameters;
therefore, we only show the constructor of the classifier here.
The first parameters, up to and including ``verbosity``,
are provided for every scikit-learn interface,
although ``random_state``, ``n_threads``, ``tmp_folder``,
and ``verbosity`` may be ignored by some of the methods.
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.RealMLP_TD_Classifier.__init__
For the HPO and Ensemble variants, we currently provide only a few options:
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.RealMLP_HPO_Classifier.__init__
Boosted Trees
-------------
For boosted trees, we provide TD, D, and HPO variants,
but do not wrap the full parameter space of the respective libraries.
Here are some representative examples:
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.XGB_TD_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.LGBM_TD_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.CatBoost_TD_Classifier.__init__
Other NN baselines
------------------
We offer interfaces (D and HPO variants) for
- MLP (from the RTDL code)
- ResNet (from the RTDL code)
- FTT (FT-Transformer from the RTDL code)
- MLP-PLR (from the RTDL code)
- TabR (requires installing faiss)
- TabM
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.MLP_RTDL_D_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.Resnet_RTDL_D_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.FTT_D_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.MLP_PLR_D_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.TabR_S_D_Classifier.__init__
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.TabM_D_Classifier.__init__
xRFM
------
We offer D and HPO variants for xRFM.
.. autofunction:: pytabkit.models.sklearn.sklearn_interfaces.XRFM_D_Classifier.__init__
Other methods
-------------
For convenience, we also wrap scikit-learn's RF and MLP estimators
in our scikit-learn interfaces,
although in this case the validation sets are not used.
The respective classes are called
``RF_SKL_Classifier`` and ``MLP_SKL_Classifier`` etc.
We also provide our ``Ensemble_TD_Classifier`` and ``Ensemble_HPO_Classifier``,
a weighted ensemble of our TD / HPO models (and similar for regression).
.. autoclass:: pytabkit.models.sklearn.sklearn_interfaces.RealMLPConstructorMixin
.. automodule:: pytabkit.models.sklearn.sklearn_interfaces
    :members:
    :undoc-members:
    :show-inheritance:
Saving and loading
------------------
RealMLP and possibly other models (except probably TabR)
can be saved using pickle-like modules.
With standard pickling,
a model trained on a GPU will be restored to use the same GPU,
and will fail to load if the GPU is not present.
(Note that dill fails to save torch models in newer torch versions,
while pickle can still save them.)
The following code allows loading GPU-trained models to the CPU,
but fails to run ``predict()`` due to pytorch-lightning device issues.
.. code-block:: python

    import torch
    import dill  # might also work with pickle instead

    torch.save(model, 'model.pkl', pickle_module=dill, _use_new_zipfile_serialization=False)
    model = torch.load('model.pkl', map_location='cpu', pickle_module=dill)
================================================
FILE: docs/source/models/02_hpo.md
================================================
# Hyperparameter optimization
This is a guide on how to perform hyperparameter optimization (HPO)
to get the best results out of RealMLP.
We consider RealMLP for classification here, but most of the guide
applies to regression and to other baselines as well.
## Option 1: Using the HPO interface
The easiest option is to use the direct HPO interface:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from pytabkit.models.sklearn.sklearn_interfaces import RealMLP_HPO_Classifier
X, y = make_classification(random_state=42, n_samples=200, n_features=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
clf = RealMLP_HPO_Classifier(n_hyperopt_steps=10, n_cv=1, verbosity=2, val_metric_name='brier')
clf.fit(X_train, y_train)
clf.predict(X_test)
```
The code above
- runs random search with 10 configurations from the HPO space in the paper
(should be increased to, say, 50 for better results)
- only uses one training-validation split
(should be increased to, say, 5 for better results)
- prints validation results of each epoch and best found parameters thanks to `verbosity=2`
- selects the best model and best epoch based on the Brier score
(default would be classification error)
While using the interface directly is convenient, it has certain drawbacks:
- It is not possible to change the search space,
  e.g., to reduce label smoothing for metrics other than classification error.
- It is not possible to save and resume from an intermediate state.
- It is not possible to use another HPO method than random search.
- It is not (easily) possible to access intermediate results.
Therefore, we now look at a more manual approach.
## Option 2: Performing your own HPO
The following code provides an example of how to do HPO manually.
```python
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold

from pytabkit.models.alg_interfaces.nn_interfaces import RealMLPParamSampler
from pytabkit.models.sklearn.sklearn_interfaces import RealMLP_TD_Classifier
from pytabkit.models.training.metrics import Metrics

n_hyperopt_steps = 10
n_cv = 1
is_classification = True

X, y = make_classification(random_state=42, n_samples=200, n_features=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# We compute train-validation splits here instead of letting the sklearn interface do it
# such that we can compute the validation error ourselves
if n_cv == 1:
    # we cannot do 1-fold CV, so we do an 80%-20% train-validation split
    _, val_idxs = train_test_split(np.arange(X_train.shape[0]), test_size=0.2, random_state=0)
    val_idxs = val_idxs[None, :]
else:
    skf = StratifiedKFold(n_splits=n_cv, shuffle=True, random_state=0)
    val_idxs_list = [val_idxs for train_idxs, val_idxs in skf.split(X_train, y_train)]
    # make sure that each validation set has the same length, so we can exploit vectorization
    min_len = min([len(val_idxs) for val_idxs in val_idxs_list])
    val_idxs_list = [val_idxs[:min_len] for val_idxs in val_idxs_list]
    val_idxs = np.asarray(val_idxs_list)

best_val_loss = np.inf
best_clf = None
best_params = None

for hpo_step in range(n_hyperopt_steps):
    # sample random params according to the proposed search space; this can be replaced by a custom HPO method
    params = RealMLPParamSampler(is_classification=is_classification).sample_params(seed=hpo_step)
    # we only use one classifier that will fit n_cv sub-models, since RealMLP can vectorize the fitting,
    # but it would also be possible to use one classifier per cross-validation split.
    clf = RealMLP_TD_Classifier(**params, n_cv=n_cv, verbosity=2, val_metric_name='brier')
    clf.fit(X_train, y_train, val_idxs=val_idxs)
    # evaluate validation loss
    # for n_cv >= 2, predict_proba() only outputs averaged predictions of the cross-validation models,
    # but we need separate predictions of each of the cross-validation members to extract the out-of-bag ones,
    # so we use predict_proba_ensemble().
    # There is also predict_ensemble(), which replaces predict().
    y_pred_prob = clf.predict_proba_ensemble(X_train)
    val_predictions = np.concatenate([y_pred_prob[i, val_idxs[i, :]] for i in range(n_cv)], axis=0)
    val_labels = np.concatenate([y_train[val_idxs[i, :]] for i in range(n_cv)], axis=0)
    val_logits = np.log(val_predictions + 1e-30)
    val_loss = Metrics.apply(torch.as_tensor(val_logits, dtype=torch.float32), torch.as_tensor(val_labels),
                             metric_name='brier').item()
    # update the best model if the loss improved
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_clf = clf
        best_params = params

best_clf.predict(X_test)
print(f'best params: {best_params}')
```
Here is the equivalent search space for `hyperopt`:
```python
from hyperopt import hp
import numpy as np

space = {
    'num_emb_type': hp.choice('num_emb_type', ['none', 'pbld', 'pl', 'plr']),
    'add_front_scale': hp.pchoice('add_front_scale', [(0.6, True), (0.4, False)]),
    'lr': hp.loguniform('lr', np.log(2e-2), np.log(3e-1)),
    'p_drop': hp.pchoice('p_drop', [(0.3, 0.0), (0.5, 0.15), (0.2, 0.3)]),
    'wd': hp.choice('wd', [0.0, 2e-2]),
    'plr_sigma': hp.loguniform('plr_sigma', np.log(0.05), np.log(0.5)),
    'hidden_sizes': hp.pchoice('hidden_sizes', [(0.6, [256] * 3), (0.2, [64] * 5), (0.2, [512])]),
    'act': hp.choice('act', ['selu', 'mish', 'relu']),
    'ls_eps': hp.pchoice('ls_eps', [(0.3, 0.0), (0.7, 0.1)])
}
```
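A sketch of how this space could be plugged into `hyperopt` (the `evaluate_validation_loss` helper is hypothetical; it would fit `RealMLP_TD_Classifier(**params, ...)` and return a validation loss, e.g. computed as in the manual loop above):
```python
from hyperopt import fmin, tpe, Trials

def objective(params):
    # hypothetical helper: fit RealMLP_TD_Classifier(**params, ...) and
    # return the validation loss, e.g. computed as in the manual HPO loop above
    return evaluate_validation_loss(params)

trials = Trials()
best_params = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best_params)
```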
================================================
FILE: docs/source/models/03_training_implementation.md
================================================
# Training directly with PyTorch Lightning
## Using PyTorch Lightning
The TabNN models are implemented using [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/).
They follow the basic training setup described [here](https://lightning.ai/docs/pytorch/stable/model/train_model_basic.html):
```python
# define Dataloader
train_loader = DataLoader(x_train, y_train)
val_loader = DataLoader(x_val, y_val)
test_loader = DataLoader(x_test, y_test)
# define model using a Pytorch LightningModule
nn_model = MyModel(hyper_param1, hyper_param2, ...)
# train model using the Pytorch Lightning Trainer
trainer = pl.Trainer()
trainer.fit(model=nn_model, train_dataloaders=train_loader, val_dataloaders=val_loader)
# make predictions using the Trainer
pred = trainer.predict(nn_model, dataloaders=test_loader)
```
In our use case, adapted to the tabular NN models, the implementation looks like this:
``` { .python .annotate }
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from pytabkit.models.alg_interfaces.base import SplitIdxs, InterfaceResources
from pytabkit.models.data.data import DictDataset, TensorInfo
from pytabkit.models.sklearn.default_params import DefaultParams
from pytabkit.models.training.lightning_modules import TabNNModule
import lightning.pytorch as pl  # or: import pytorch_lightning as pl
import numpy as np
import torch

n_epochs = 200
X, y = make_classification()
idxs = np.arange(len(X))
trainval_idxs, test_idxs = train_test_split(idxs, test_size=0.2)

n_trainval_splits = 5
train_idxs_list = []
val_idxs_list = []
for i in range(n_trainval_splits):
    train_idxs, val_idxs = train_test_split(trainval_idxs, test_size=0.2)
    train_idxs_list.append(train_idxs)
    val_idxs_list.append(val_idxs)

# define datasets
ds = DictDataset(tensors={'x_cont': torch.as_tensor(X, dtype=torch.float32),
                          'x_cat': torch.zeros(len(X), 0, dtype=torch.long),  # no categorical features here
                          'y': torch.as_tensor(y, dtype=torch.long)[:, None]},
                 tensor_infos={'x_cont': TensorInfo(feat_shape=[X.shape[1]]),
                               'x_cat': TensorInfo(cat_sizes=[]),
                               'y': TensorInfo(cat_sizes=[np.max(y) + 1])})  # (1)
train_val_splitting_idxs_list = [
    SplitIdxs(train_idxs=torch.as_tensor(np.stack(train_idxs_list, axis=0), dtype=torch.long),
              val_idxs=torch.as_tensor(np.stack(val_idxs_list, axis=0), dtype=torch.long),
              test_idxs=torch.as_tensor(test_idxs, dtype=torch.long),
              split_seed=0, sub_split_seeds=list(range(len(train_idxs_list))), split_id=0)]
test_ds = ds.get_sub_dataset(torch.as_tensor(test_idxs, dtype=torch.long))

# create the assigned resources
# interface_resources = InterfaceResources(n_threads=4, gpu_devices=['cuda:0'])  # to train on GPU
interface_resources = InterfaceResources(n_threads=4, gpu_devices=[])  # (2)

# define the model using our LightningModule TabNNModule
nn_model = TabNNModule(**DefaultParams.RealMLP_TD_CLASS)
# build and 'compile' the model using the data; now it is ready to use
nn_model.compile_model(ds, train_val_splitting_idxs_list, interface_resources)

# train the model using the PyTorch Lightning Trainer
trainer = pl.Trainer(
    callbacks=nn_model.create_callbacks(),
    max_epochs=n_epochs,
    enable_checkpointing=False,
    enable_progress_bar=False,
    num_sanity_val_steps=0,
    logger=pl.loggers.logger.DummyLogger(),
)  # (3)
trainer.fit(
    model=nn_model,
    train_dataloaders=nn_model.train_dl,
    val_dataloaders=nn_model.val_dl
)

# make predictions using the Trainer
pred = trainer.predict(
    model=nn_model,
    dataloaders=nn_model.get_predict_dataloader(test_ds)
)
```
1. The NN models have special requirements for their dataloaders; therefore, we first need to use the `DictDataset` class to create a dataset covering both training and validation.
2. We handle resource management manually rather than through Lightning; therefore, we need to create an `InterfaceResources` object.
3. We use the original [`Trainer`](https://lightning.ai/docs/pytorch/stable/common/trainer.html#trainer-class-api) class from Lightning. However, all of the parameters specified here are required for the `TabNNModule` to work properly.
================================================
FILE: docs/source/models/examples.md
================================================
# Examples
## Refitting RealMLP on train+val data using the best epoch from a previous run
You can refit RealMLP by simply using `n_refit=1`
(or, better, larger values to ensemble multiple NNs).
But in case you want more control, you can do it manually
(e.g., if you only want to refit the best configuration from HPO,
but you're not using the HPO within pytabkit).
```python
import numpy as np
from sklearn.model_selection import train_test_split
from pytabkit import RealMLP_TD_Regressor
np.random.seed(0)
X = np.random.randn(500, 5)
y = np.random.randn(500)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
reg = RealMLP_TD_Regressor(verbosity=2, random_state=0)
reg.fit(X_train, y_train, X_val, y_val)
refit = RealMLP_TD_Regressor(verbosity=2, stop_epoch=list(reg.fit_params_['stop_epoch'].values())[0], val_fraction=0.0, random_state=0)
refit.fit(X, y)
```
## Fitting again after HPO on a smaller subset
Here is an example of how to run HPO on a smaller training subset
and then fit the best configuration again with validation.
(It might be better to just use `n_refit` in the HPO classifier/regressor instead.)
```python
import numpy as np
from pytabkit import LGBM_HPO_TPE_Regressor, LGBM_TD_Regressor

# This is an example of how to fit an HPO method on a smaller subset of the data,
# and then refit the best hyperparams on the full dataset
np.random.seed(0)
X = np.random.randn(500, 5)
y = np.random.randn(500)

# use 90% for validation to train faster;
# if there is too much validation data, validation can become the bottleneck,
# in which case you should pass a smaller val_fraction
model = LGBM_HPO_TPE_Regressor(val_fraction=0.9, n_hyperopt_steps=5)
model.fit(X, y)

# unfortunately, params are not always called the same way, so we need to rename a few
params = model.fit_params_['hyper_fit_params']
params['subsample'] = params.pop('bagging_fraction')
params['colsample_bytree'] = params.pop('feature_fraction')
params['lr'] = params.pop('learning_rate')
# unfortunately, it is hard right now to check whether this is exactly the same config,
# as this might set some default params that are not used in the HPO config
model_refit = LGBM_TD_Regressor(**params)
model_refit.fit(X, y)
```
================================================
FILE: docs/source/models/nn_classes.md
================================================
# NN implementation
While RealMLP is implemented in PyTorch,
we extend the conventional `nn.Module` logic.
Traditionally, one writes some PyTorch code to assemble an NN model,
which is an `nn.Module` composed of building blocks
that are also `nn.Module` objects (the Composite design pattern).
The `nn.Module` classes initialize the parameters in the constructor
and are then callable objects providing the `forward()` transformation.
Data preprocessing is done separately via different code/classes.
We use a different structure of classes that unifies preprocessing and NN layers,
which is useful for vectorized NNs:
The vectorized NNs can share a single non-preprocessed data set,
loaded into GPU RAM,
while having different preprocessing parameters
(fitted on different training sets since different splits are used).
Individual preprocessed data sets are never fully instantiated in GPU RAM;
instead, the vectorized NN models do preprocessing on batches individually,
which saves GPU RAM (consider, e.g.,
having 50-100 NNs on the same GPU at the same time).
The class structure uses three base classes:
- `Layer` classes are similar to nn.Module,
but they do not perform random initialization in the constructor.
Instead, they simply take the already initialized parameters as input.
There are some additional features:
Layer objects of the same type can be combined into a vectorized Layer.
The vectorized NN is not built directly;
instead, the NNs are first built and initialized sequentially,
for better reproducibility (random seeds etc.) and to save RAM,
and then vectorized after initialization using the `Layer.stack()` function.
Additionally, Layer classes work with the DictDataset class,
which usually contains 'x_cont' and 'x_cat' tensors
for continuous and categorical variables.
Moreover, during training, we also pass the labels 'y' through the Layer,
which allows implementing mixup, label smoothing,
and output standardization as Layer objects.
- `Fitter` classes initialize the NN based on a single forward pass
on the (subsampled) training (and possibly validation) set.
This is done using the `fit()` or `fit_transform()` functions
similar to scikit-learn preprocessing classes,
which return a `Layer` object
(and, in case of `fit_transform()`, the transformed dataset).
Initialization can be random or depend on the training set as transformed so far.
Typically, parameters of preprocessing layers
such as standardization depend on the training set,
while NN parameters do not depend on the training set.
However, we also use weight and bias initializations
that depend on the training set,
and the unification of NN and preprocessing makes this much more convenient.
- `FitterFactory` (could also be called ArchitectureBuilder) classes
build the NN structure based on the input and output shape and type.
Specifically, `FitterFactory` objects can build `Fitter` objects
given the corresponding 'tensor_infos' of the data set,
which specifies the number of continuous variables,
the number of categorical variables and the category sizes,
and the same for the labels.
For example, a `FitterFactory` can decide to use one-hot encoding
for categorical variables with small category sizes,
and Embedding layers for larger category sizes.
The `Layer`, `Fitter`, and `FitterFactory` classes are defined in `nn_models/base.py`.
Other subclasses are also defined in the `nn_models` folder. There are some more features:
- We introduce a class called `Variable` that inherits from `torch.nn.Parameter`.
Variable has a parameter `trainable: bool`, and in the case `trainable==False`,
the `Layer` class will register it using `register_buffer()`.
One might also be able to just use `nn.Parameter(..., requires_grad=False)`
for this, though we did not check whether it has the same effect
(will it be saved when using `model.state_dict()`?).
There is also the convenience function `Variable.stack()` used by `Layer.stack()`.
Moreover, Variables can have names
(to assign individual hyperparameter values to them),
and they can have custom hyperparameter factors
(e.g. to specify that the lr should be multiplied
by a certain value for this Variable).
- The classes above can be given scope names,
which are then prepended to variable names.
For example, using scope names,
the weight of the first linear layer
in a NN could be called 'net/first_layer/layer-0/weight',
where 0 is the layer index and 'first_layer' is
redundant information that can be useful when regex matching variable names.
One can assign an individual lr to this layer by using
`lr={'': global_lr, '.*first_layer.*weight': first_layer_weight_lr}`
in `**kwargs` to the `NNAlgInterface`.
This works as follows: The `HyperparamManager`,
which is available through a global context managed by the `TrainContext` class,
stores the hyperparameter configurations obtained through **kwargs.
Different classes can require getters for specific hyperparameters
for specific variables.
If multiple lr values are specified above,
the one from the last matching regex is taken.
The scope names are passed on from FitterFactory to Fitter and then
to Layer and Variable by a somewhat complicated context manager system,
for which I didn't find a more elegant solution.
- Fitter objects can be split up into three parts using the `split_off_dynamic()`
and `split_off_individual()` functions.
The static part would typically be the one-hot encoding,
since it does not depend on the data and is not trainable,
which means that even in a vectorized context,
it can be applied once to the single shared data set
since it does not depend on the train/val/test split.
Then, there is the dynamic but not individual part,
which can depend on the fitting data but is not trained or randomized,
and can therefore be shared by models with the same trainval-test split.
Finally, there is the individual (trainable/randomized) part,
which is usually the NN part.
- `Fitter` classes should implement methods that allow estimating
the RAM usage of the parameters and of a forward pass,
which makes it possible to decide how many NNs fit onto a GPU when running the benchmark.
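To make the pattern concrete, here is a self-contained sketch of the Fitter/Layer split (purely illustrative; the class and method names mirror the concepts above, not the exact pytabkit API):
```python
import torch

class ScaleLayer:
    # plays the role of a `Layer`: it receives already-fitted parameters
    # and performs no random initialization in the constructor
    def __init__(self, mean: torch.Tensor, std: torch.Tensor):
        self.mean, self.std = mean, std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / self.std

class ScaleFitter:
    # plays the role of a `Fitter`: it computes parameters from the
    # training data and returns a Layer, like a scikit-learn transformer
    def fit(self, x_train: torch.Tensor) -> ScaleLayer:
        return ScaleLayer(x_train.mean(dim=0), x_train.std(dim=0) + 1e-8)

    def fit_transform(self, x_train: torch.Tensor):
        layer = self.fit(x_train)
        return layer, layer.forward(x_train)

# fit on the training set once, then apply the resulting Layer batch-wise
layer, x_scaled = ScaleFitter().fit_transform(torch.randn(100, 5))
```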
================================================
FILE: docs/source/models/quantile_reg.md
================================================
# (Multi)quantile regression with RealMLP
RealMLP supports multiquantile regression, for example by using
```python
from pytabkit import RealMLP_TD_Regressor
reg = RealMLP_TD_Regressor(
train_metric_name='multi_pinball(0.25,0.5,0.75)',
val_metric_name='multi_pinball(0.25,0.5,0.75)'
)
```
This will adjust the training objective
as well as the metric for best-epoch selection on the validation set.
The quantiles can be specified in any format
that Python can convert to a float.
There must be no spaces around the commas,
and the quantiles need to be in ascending order.
The latter is relevant because RealMLP
will by default sort the prediction outputs,
to always have ascending quantile predictions.
This can be deactivated by passing `sort_quantile_predictions=False`.
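A minimal end-to-end sketch (assuming, as a plausible but unverified detail, that `predict` then returns one column per requested quantile, in the order given above):
```python
import numpy as np
from pytabkit import RealMLP_TD_Regressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = rng.standard_normal(500)

reg = RealMLP_TD_Regressor(
    train_metric_name='multi_pinball(0.25,0.5,0.75)',
    val_metric_name='multi_pinball(0.25,0.5,0.75)',
)
reg.fit(X, y)
pred = reg.predict(X)  # expected shape: (n_samples, 3), one column per quantile
```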
================================================
FILE: examples/tutorial_notebook.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "enZVuzCHCy1n"
},
"source": [
"**To train neural networks faster, you need to enable GPUs for the notebook:**\n",
"* Navigate to Edit→Notebook Settings\n",
"* select GPU from the Hardware Accelerator drop-down"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rtKFT1oSCy1p"
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Sr0lfFYqCy1q"
},
"source": [
"## Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d-Zn1o8jCy1q"
},
"outputs": [],
"source": [
"!pip install pytabkit\n",
"!pip install openml"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "V1Qo43ciCy1r"
},
"source": [
"## Getting a dataset"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "o-MpREHMCy1r"
},
"outputs": [],
"source": [
"import openml\n",
"from sklearn.model_selection import train_test_split\n",
"import numpy as np\n",
"\n",
"task = openml.tasks.get_task(361113) # covertype dataset\n",
"dataset = openml.datasets.get_dataset(task.dataset_id, download_data=False)\n",
"X, y, categorical_indicator, attribute_names = dataset.get_data(\n",
" dataset_format='dataframe',\n",
" target=task.target_name\n",
")\n",
"# we restrict to 15K samples for demonstration purposes\n",
"index = np.random.choice(range(len(X)), 15_000, replace=False)\n",
"X = X.iloc[index]\n",
"y = y.iloc[index]\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PeMtLz0ICy1s"
},
"source": [
"# Using RealMLP"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CgSOr3l0Cy1s",
"outputId": "d2b0ea97-45ac-4a9e-ff3d-291d72094615"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy of RealMLP: 0.8770666666666667\n",
"CPU times: user 1min 11s, sys: 192 ms, total: 1min 11s\n",
"Wall time: 1min 11s\n"
]
}
],
"source": [
"%%time\n",
"from pytabkit import RealMLP_TD_Classifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"model = RealMLP_TD_Classifier()\n",
"model.fit(X_train, y_train)\n",
"\n",
"y_pred = model.predict(X_test)\n",
"acc = accuracy_score(y_test, y_pred)\n",
"print(f\"Accuracy of RealMLP: {acc}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-G8Oblk5Cy1s"
},
"source": [
"## With bagging\n",
"It is possible to do bagging (ensembling of models on 5-fold cross-validation) simply by passing `n_cv=5` to the constructor. Note that it doesn't take 5x as long because of vectorized training."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i0NpWvjKCy1s",
"outputId": "89c07496-fd0e-4f46-ea59-3457f8a35371"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy of RealMLP with bagging: 0.8930666666666667\n",
"CPU times: user 1min 8s, sys: 180 ms, total: 1min 9s\n",
"Wall time: 1min 8s\n"
]
}
],
"source": [
"%%time\n",
"from pytabkit import RealMLP_TD_Classifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"model = RealMLP_TD_Classifier(n_cv=5)\n",
"model.fit(X_train, y_train)\n",
"\n",
"y_pred = model.predict(X_test)\n",
"acc = accuracy_score(y_test, y_pred)\n",
"print(f\"Accuracy of RealMLP with bagging: {acc}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KHphiGKBCy1t"
},
"source": [
"## With hyperparameter optimization\n",
"It is possible to do hyperparameter optimization directly inside a sklearn interface by using the `RealMLP_HPO_Regressor` interface.\n",
"This is also available for classification, and for other models, for instance `LGBM_HPO_Classifier` or `LGBM_HPO_TPE_Classifier` (to use the Tree-structured Parzen Estimator algorithm)."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7e4wjdYJCy1t",
"outputId": "a7ed7867-c808-4ed9-dbc2-badea992eae2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy of RealMLP with 3 steps HPO: 0.8605333333333334\n",
"CPU times: user 2min 27s, sys: 442 ms, total: 2min 28s\n",
"Wall time: 2min 28s\n"
]
}
],
"source": [
"%%time\n",
"from pytabkit import RealMLP_HPO_Classifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"n_hyperopt_steps = 3 # small number for demonstration purposes\n",
"model = RealMLP_HPO_Classifier(n_hyperopt_steps=n_hyperopt_steps)\n",
"model.fit(X_train, y_train)\n",
"\n",
"y_pred = model.predict(X_test)\n",
"acc = accuracy_score(y_test, y_pred)\n",
"print(f\"Accuracy of RealMLP with {n_hyperopt_steps} steps HPO: {acc}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SB0D5MnbCy1t"
},
"source": [
"# Using improved defaults for tree-based models"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OLulH2rGCy1t"
},
"source": [
"`TD` stands for *tuned defaults*, which are the improved defaults we propose. `D` stands for *defaults*, which are the libraries' defaults."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UEZU3kaDCy1t",
"outputId": "1c5bd06f-caf6-499c-8f84-5496db9d0ce6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy of CatBoost_TD_Classifier: 0.8685333333333334\n",
"Accuracy of CatBoost_D_Classifier: 0.8464\n",
"Accuracy of LGBM_TD_Classifier: 0.8602666666666666\n",
"Accuracy of LGBM_D_Classifier: 0.8344\n",
"Accuracy of XGB_TD_Classifier: 0.8544\n",
"Accuracy of XGB_D_Classifier: 0.8472\n",
"CPU times: user 1min 55s, sys: 44.3 s, total: 2min 40s\n",
"Wall time: 24 s\n"
]
}
],
"source": [
"%%time\n",
"from pytabkit import CatBoost_TD_Classifier, CatBoost_D_Classifier, LGBM_TD_Classifier, LGBM_D_Classifier, XGB_TD_Classifier, XGB_D_Classifier\n",
"\n",
"for model in [CatBoost_TD_Classifier(), CatBoost_D_Classifier(), LGBM_TD_Classifier(), LGBM_D_Classifier(), XGB_TD_Classifier(), XGB_D_Classifier()]:\n",
" model.fit(X_train, y_train)\n",
" y_pred = model.predict(X_test)\n",
" acc = accuracy_score(y_test, y_pred)\n",
" print(f\"Accuracy of {model.__class__.__name__}: {acc}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tMzbmtJMCy1t"
},
"source": [
"# Ensembling tuned defaults of tree-based methods and RealMLP: a very good baseline"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JZJH1sWfCy1t",
"outputId": "8d059418-5236-4a84-b55a-6829200bb330"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy of Ensemble_TD_Classifier: 0.8834666666666666\n",
"CPU times: user 2min 34s, sys: 38 s, total: 3min 12s\n",
"Wall time: 1min 30s\n"
]
}
],
"source": [
"%%time\n",
"from pytabkit import Ensemble_TD_Classifier\n",
"\n",
"model = Ensemble_TD_Classifier()\n",
"model.fit(X_train, y_train)\n",
"y_pred = model.predict(X_test)\n",
"acc = accuracy_score(y_test, y_pred)\n",
"print(f\"Accuracy of Ensemble_TD_Classifier: {acc}\")"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "undefined.undefined.undefined"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: original_requirements/conda_env_2024_06_25.yml
================================================
name: tab_bench_venv_3
channels:
- pytorch
- nvidia
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=5.1
- _py-xgboost-mutex=2.0
- abseil-cpp=20211102.0
- arrow-cpp=11.0.0
- atk-1.0=2.36.0
- aws-c-common=0.6.8
- aws-c-event-stream=0.1.6
- aws-checksums=0.1.11
- aws-sdk-cpp=1.8.185
- blas=1.0
- boost-cpp=1.82.0
- bottleneck=1.3.5
- brotli=1.0.9
- brotli-bin=1.0.9
- brotlipy=0.7.0
- bzip2=1.0.8
- c-ares=1.19.1
- ca-certificates=2024.3.11
- cairo=1.16.0
- catboost=1.2
- certifi=2024.6.2
- cffi=1.16.0
- charset-normalizer=2.0.4
- configparser=5.0.2
- contourpy=1.2.0
- coverage=7.2.2
- cryptography=41.0.7
- cuda-cudart=11.7.99
- cuda-cupti=11.7.101
- cuda-libraries=11.7.1
- cuda-nvrtc=11.7.99
- cuda-nvtx=11.7.91
- cuda-runtime=11.7.1
- cudatoolkit=11.4.1
- cycler=0.11.0
- cyrus-sasl=2.1.28
- cython=3.0.6
- dbus=1.13.18
- dill=0.3.7
- et_xmlfile=1.1.0
- exceptiongroup=1.2.0
- expat=2.5.0
- faiss-gpu=1.7.4
- filelock=3.13.1
- font-ttf-dejavu-sans-mono=2.37
- font-ttf-inconsolata=2.001
- font-ttf-source-code-pro=2.030
- font-ttf-ubuntu=0.83
- fontconfig=2.14.1
- fonts-anaconda=1
- fonts-conda-ecosystem=1
- fonttools=4.25.0
- freetype=2.12.1
- fribidi=1.0.10
- fsspec=2023.10.0
- future=0.18.3
- gdk-pixbuf=2.42.10
- gflags=2.2.2
- giflib=5.2.1
- glib=2.69.1
- glog=0.5.0
- gmp=6.2.1
- gmpy2=2.1.2
- gobject-introspection=1.72.0
- graphite2=1.3.14
- graphviz=2.50.0
- grpc-cpp=1.48.2
- gst-plugins-base=1.14.1
- gstreamer=1.14.1
- gtk2=2.24.33
- gts=0.7.6
- harfbuzz=4.3.0
- icu=73.1
- idna=3.4
- iniconfig=1.1.1
- intel-openmp=2021.4.0
- jinja2=3.1.2
- joblib=1.2.0
- jpeg=9e
- kiwisolver=1.4.4
- krb5=1.20.1
- lcms2=2.12
- ld_impl_linux-64=2.38
- lerc=3.0
- liac-arff=2.5.0
- libboost=1.82.0
- libbrotlicommon=1.0.9
- libbrotlidec=1.0.9
- libbrotlienc=1.0.9
- libclang=14.0.6
- libclang13=14.0.6
- libcublas=11.10.3.66
- libcufft=10.7.2.124
- libcufile=1.8.1.2
- libcups=2.4.2
- libcurand=10.3.4.107
- libcurl=8.5.0
- libcusolver=11.4.0.1
- libcusparse=11.7.4.91
- libdeflate=1.17
- libedit=3.1.20230828
- libev=4.33
- libevent=2.1.12
- libfaiss=1.7.4
- libffi=3.4.4
- libgcc-ng=11.2.0
- libgd=2.3.3
- libgfortran-ng=11.2.0
- libgfortran5=11.2.0
- libgomp=11.2.0
- libiconv=1.16
- libllvm14=14.0.6
- libnghttp2=1.57.0
- libnpp=11.7.4.75
- libnvjpeg=11.8.0.2
- libpng=1.6.39
- libpq=12.17
- libprotobuf=3.20.3
- librsvg=2.54.4
- libssh2=1.10.0
- libstdcxx-ng=11.2.0
- libthrift=0.15.0
- libtiff=4.5.1
- libtool=2.4.6
- libuuid=1.41.5
- libwebp=1.3.2
- libwebp-base=1.3.2
- libxcb=1.15
- libxgboost=1.7.3
- libxkbcommon=1.0.1
- libxml2=2.10.4
- lightgbm=4.1.0
- lightning-utilities=0.9.0
- llvm-openmp=14.0.6
- lz4-c=1.9.4
- markupsafe=2.1.3
- matplotlib=3.8.0
- matplotlib-base=3.8.0
- minio=7.1.0
- mkl=2021.4.0
- mkl-service=2.4.0
- mkl_fft=1.3.1
- mkl_random=1.2.2
- mpc=1.1.0
- mpfr=4.0.2
- mpmath=1.3.0
- munkres=1.1.4
- mysql=5.7.24
- ncurses=6.4
- networkx=3.1
- ninja=1.10.2
- ninja-base=1.10.2
- nspr=4.35
- nss=3.89.1
- numexpr=2.8.4
- openjpeg=2.4.0
- openml=0.12.2
- openpyxl=3.0.10
- openssl=3.0.13
- orc=1.7.4
- packaging=23.1
- pandas=2.1.4
- pango=1.50.7
- pcre=8.45
- pigz=2.6
- pillow=10.0.1
- pip=23.3.1
- pixman=0.40.0
- platformdirs=3.10.0
- plotly=5.9.0
- pluggy=1.0.0
- ply=3.11
- pooch=1.7.0
- poppler=22.12.0
- poppler-data=0.4.11
- psutil=5.9.0
- py-xgboost=1.7.3
- pyarrow=11.0.0
- pycparser=2.21
- pyopenssl=23.2.0
- pyparsing=3.0.9
- pyqt=5.15.10
- pyqt5-sip=12.13.0
- pysocks=1.7.1
- pytest=7.4.0
- pytest-cov=4.1.0
- python=3.10.13
- python-dateutil=2.8.2
- python-graphviz=0.20.1
- python-tzdata=2023.3
- pytorch=2.0.1
- pytorch-cuda=11.7
- pytorch-lightning=2.0.3
- pytorch-mutex=1.0
- pytz=2023.3.post1
- pyyaml=6.0.1
- qt-main=5.15.2
- re2=2022.04.01
- readline=8.2
- requests=2.31.0
- scikit-learn=1.3.0
- scipy=1.10.1
- seaborn=0.12.2
- setuptools=68.2.2
- sip=6.7.12
- six=1.16.0
- snappy=1.1.10
- sqlite=3.41.2
- swig=4.0.2
- sympy=1.12
- tbb=2021.8.0
- tenacity=8.2.2
- threadpoolctl=2.2.0
- tk=8.6.12
- toml=0.10.2
- tomli=2.0.1
- torchmetrics=1.1.2
- torchtriton=2.0.0
- tornado=6.3.3
- tqdm=4.65.0
- typing-extensions=4.9.0
- typing_extensions=4.9.0
- tzdata=2023d
- urllib3=1.26.16
- utf8proc=2.6.1
- wheel=0.41.2
- xgboost=1.7.3
- xlrd=2.0.1
- xmltodict=0.13.0
- xz=5.4.5
- yaml=0.2.5
- zlib=1.2.13
- zstd=1.5.5
- pip:
- adjusttext==1.0.4
- aiosignal==1.3.1
- annotated-types==0.6.0
- attrs==23.2.0
- babel==2.14.0
- blis==0.7.11
- catalogue==2.0.10
- cir-model==0.2.0
- click==8.1.7
- cloudpathlib==0.16.0
- cloudpickle==3.0.0
- colorama==0.4.6
- confection==0.1.4
- configspace==0.7.1
- cramjam==2.8.1
- cymem==2.0.8
- dask==2024.1.1
- dask-jobqueue==0.8.2
- distributed==2024.1.1
- einops==0.7.0
- emcee==3.1.4
- fastparquet==2023.10.1
- fire==0.5.0
- frozenlist==1.4.1
- gensim==4.3.2
- ghp-import==2.1.0
- griffe==0.39.1
- hyperopt==0.2.7
- importlib-metadata==7.0.1
- imutils==0.5.4
- jsonschema==4.21.1
- jsonschema-specifications==2023.12.1
- kditransform==0.2.0
- langcodes==3.3.0
- llvmlite==0.41.1
- locket==1.0.0
- markdown==3.5.2
- mergedeep==1.3.4
- mkdocs==1.5.3
- mkdocs-autorefs==0.5.0
- mkdocs-material==9.5.6
- mkdocs-material-extensions==1.3.1
- mkdocstrings==0.24.0
- mkdocstrings-python==1.8.0
- more-itertools==10.2.0
- msgpack==1.0.7
- msgpack-numpy==0.4.8
- murmurhash==1.0.10
- numba==0.58.1
- numpy==1.26.4
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.3.101
- nvidia-nvtx-cu12==12.1.105
- opencv-contrib-python==4.9.0.80
- paginate==0.5.6
- partd==1.4.1
- pathspec==0.12.1
- patool==2.1.1
- preshed==3.0.9
- protobuf==4.25.2
- py4j==0.10.9.7
- pydantic==2.5.3
- pydantic-core==2.14.6
- pygments==2.17.2
- pymdown-extensions==10.7
- pynisher==1.0.10
- pynvml==11.5.0
- pyrfr==0.9.0
- pytorch-widedeep==1.4.0
- pyyaml-env-tag==0.1
- ray==2.9.1
- referencing==0.32.1
- regex==2023.12.25
- rpds-py==0.17.1
- skorch==0.15.0
- smac==2.0.2
- smart-open==6.4.0
- sortedcontainers==2.4.0
- spacy==3.7.2
- spacy-legacy==3.0.12
- spacy-loggers==1.0.5
- srsly==2.4.8
- tabulate==0.9.0
- tblib==3.0.0
- termcolor==2.4.0
- thinc==8.2.2
- toolz==0.12.1
- torch==2.1.2
- torchvision==0.16.2
- triton==2.1.0
- tueplots==0.0.13
- typer==0.9.0
- venn-abers==1.4.1
- wasabi==1.1.2
- watchdog==3.0.0
- weasel==0.3.4
- wrapt==1.16.0
- zict==3.0.0
- zipp==3.17.0
================================================
FILE: original_requirements/conda_env_2024_10_28.yml
================================================
name: tab_bench_conda
channels:
- pytorch
- nvidia
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=5.1
- _py-xgboost-mutex=2.0
- abseil-cpp=20211102.0
- arrow-cpp=11.0.0
- atk-1.0=2.36.0
- aws-c-common=0.6.8
- aws-c-event-stream=0.1.6
- aws-checksums=0.1.11
- aws-sdk-cpp=1.8.185
- blas=1.0
- boost-cpp=1.82.0
- bottleneck=1.3.5
- brotli=1.0.9
- brotli-bin=1.0.9
- brotlipy=0.7.0
- bzip2=1.0.8
- c-ares=1.19.1
- ca-certificates=2024.7.2
- cairo=1.16.0
- catboost=1.2.3
- certifi=2024.8.30
- cffi=1.16.0
- charset-normalizer=2.0.4
- configparser=5.0.2
- contourpy=1.2.0
- coverage=7.2.2
- cryptography=41.0.7
- cuda-cudart=11.7.99
- cuda-cupti=11.7.101
- cuda-libraries=11.7.1
- cuda-nvrtc=11.7.99
- cuda-nvtx=11.7.91
- cuda-runtime=11.7.1
- cudatoolkit=11.4.1
- cycler=0.11.0
- cyrus-sasl=2.1.28
- cython=3.0.6
- dbus=1.13.18
- dill=0.3.7
- et_xmlfile=1.1.0
- exceptiongroup=1.2.0
- expat=2.5.0
- faiss-gpu=1.7.4
- filelock=3.13.1
- font-ttf-dejavu-sans-mono=2.37
- font-ttf-inconsolata=2.001
- font-ttf-source-code-pro=2.030
- font-ttf-ubuntu=0.83
- fontconfig=2.14.1
- fonts-anaconda=1
- fonts-conda-ecosystem=1
- fonttools=4.25.0
- freetype=2.12.1
- fribidi=1.0.10
- fsspec=2023.10.0
- future=0.18.3
- gdk-pixbuf=2.42.10
- gflags=2.2.2
- giflib=5.2.1
- glib=2.69.1
- glog=0.5.0
- gmp=6.2.1
- gmpy2=2.1.2
- gobject-introspection=1.72.0
- graphite2=1.3.14
- graphviz=2.50.0
- grpc-cpp=1.48.2
- gst-plugins-base=1.14.1
- gstreamer=1.14.1
- gtk2=2.24.33
- gts=0.7.6
- harfbuzz=4.3.0
- icu=73.1
- idna=3.4
- iniconfig=1.1.1
- intel-openmp=2021.4.0
- jinja2=3.1.2
- joblib=1.2.0
- jpeg=9e
- kiwisolver=1.4.4
- krb5=1.20.1
- lcms2=2.12
- ld_impl_linux-64=2.38
- lerc=3.0
- liac-arff=2.5.0
- libboost=1.82.0
- libbrotlicommon=1.0.9
- libbrotlidec=1.0.9
- libbrotlienc=1.0.9
- libclang=14.0.6
- libclang13=14.0.6
- libcublas=11.10.3.66
- libcufft=10.7.2.124
- libcufile=1.8.1.2
- libcups=2.4.2
- libcurand=10.3.4.107
- libcurl=8.5.0
- libcusolver=11.4.0.1
- libcusparse=11.7.4.91
- libdeflate=1.17
- libedit=3.1.20230828
- libev=4.33
- libevent=2.1.12
- libfaiss=1.7.4
- libffi=3.4.4
- libgcc-ng=11.2.0
- libgd=2.3.3
- libgfortran-ng=11.2.0
- libgfortran5=11.2.0
- libgomp=11.2.0
- libiconv=1.16
- libllvm14=14.0.6
- libnghttp2=1.57.0
- libnpp=11.7.4.75
- libnvjpeg=11.8.0.2
- libpng=1.6.39
- libpq=12.17
- libprotobuf=3.20.3
- librsvg=2.54.4
- libssh2=1.10.0
- libstdcxx-ng=11.2.0
- libthrift=0.15.0
- libtiff=4.5.1
- libtool=2.4.6
- libuuid=1.41.5
- libwebp=1.3.2
- libwebp-base=1.3.2
- libxcb=1.15
- libxgboost=1.7.3
- libxkbcommon=1.0.1
- libxml2=2.10.4
- lightgbm=4.1.0
- lightning-utilities=0.9.0
- llvm-openmp=14.0.6
- lz4-c=1.9.4
- markupsafe=2.1.3
- matplotlib=3.8.0
- matplotlib-base=3.8.0
- minio=7.1.0
- mkl=2021.4.0
- mkl-service=2.4.0
- mkl_fft=1.3.1
- mkl_random=1.2.2
- mpc=1.1.0
- mpfr=4.0.2
- mpmath=1.3.0
- munkres=1.1.4
- mysql=5.7.24
- ncurses=6.4
- networkx=3.1
- ninja=1.10.2
- ninja-base=1.10.2
- nspr=4.35
- nss=3.89.1
- numexpr=2.8.4
- numpy-base=1.24.3
- openjpeg=2.4.0
- openml=0.12.2
- openpyxl=3.0.10
- openssl=3.0.15
- orc=1.7.4
- packaging=23.1
- pandas=2.1.4
- pango=1.50.7
- pcre=8.45
- pigz=2.6
- pillow=10.0.1
- pip=23.3.1
- pixman=0.40.0
- platformdirs=3.10.0
- plotly=5.9.0
- pluggy=1.0.0
- ply=3.11
- pooch=1.7.0
- poppler=22.12.0
- poppler-data=0.4.11
- psutil=5.9.0
- py-xgboost=1.7.3
- pyarrow=11.0.0
- pycparser=2.21
- pyopenssl=23.2.0
- pyparsing=3.0.9
- pyqt=5.15.10
- pyqt5-sip=12.13.0
- pysocks=1.7.1
- pytest=7.4.0
- pytest-cov=4.1.0
- python=3.10.13
- python-dateutil=2.8.2
- python-graphviz=0.20.1
- python-tzdata=2023.3
- pytorch=2.0.1
- pytorch-cuda=11.7
- pytorch-lightning=2.0.3
- pytorch-mutex=1.0
- pytz=2023.3.post1
- pyyaml=6.0.1
- qt-main=5.15.2
- re2=2022.04.01
- readline=8.2
- requests=2.31.0
- scikit-learn=1.3.0
- scipy=1.10.1
- setuptools=68.2.2
- sip=6.7.12
- six=1.16.0
- snappy=1.1.10
- sqlite=3.41.2
- swig=4.0.2
- sympy=1.12
- tbb=2021.8.0
- tenacity=8.2.2
- threadpoolctl=2.2.0
- tk=8.6.12
- toml=0.10.2
- tomli=2.0.1
- torchmetrics=1.4.0.post0
- torchtriton=2.0.0
- tornado=6.3.3
- tqdm=4.65.0
- typing-extensions=4.9.0
- typing_extensions=4.9.0
- tzdata=2023d
- urllib3=1.26.16
- utf8proc=2.6.1
- wheel=0.41.2
- xgboost=1.7.3
- xlrd=2.0.1
- xmltodict=0.13.0
- xz=5.4.5
- yaml=0.2.5
- zlib=1.2.13
- zstd=1.5.5
- pip:
- adjusttext==1.0.4
- aiosignal==1.3.1
- annotated-types==0.6.0
- attrs==23.2.0
- autorank==1.1.3
- babel==2.14.0
- baycomp==1.0.3
- blis==0.7.11
- catalogue==2.0.10
- cir-model==0.2.0
- click==8.1.7
- cloudpathlib==0.16.0
- cloudpickle==3.0.0
- colorama==0.4.6
- confection==0.1.4
- configspace==0.7.1
- cramjam==2.8.1
- cymem==2.0.8
- dask==2024.1.1
- dask-jobqueue==0.8.2
- distributed==2024.1.1
- einops==0.7.0
- emcee==3.1.4
- fastparquet==2023.10.1
- fire==0.5.0
- frozenlist==1.4.1
- gensim==4.3.2
- ghp-import==2.1.0
- griffe==0.39.1
- hyperopt==0.2.7
- importlib-metadata==7.0.1
- imutils==0.5.4
- jsonschema==4.21.1
- jsonschema-specifications==2023.12.1
- kditransform==0.2.0
- langcodes==3.3.0
- llvmlite==0.41.1
- locket==1.0.0
- markdown==3.5.2
- mergedeep==1.3.4
- mkdocs==1.5.3
- mkdocs-autorefs==0.5.0
- mkdocs-material==9.5.6
- mkdocs-material-extensions==1.3.1
- mkdocstrings==0.24.0
- mkdocstrings-python==1.8.0
- more-itertools==10.2.0
- msgpack==1.0.7
- msgpack-numpy==0.4.8
- murmurhash==1.0.10
- numba==0.58.1
- numpy==1.26.4
- nvidia-cublas-cu12==12.1.3.1
- nvidia-cuda-cupti-cu12==12.1.105
- nvidia-cuda-nvrtc-cu12==12.1.105
- nvidia-cuda-runtime-cu12==12.1.105
- nvidia-cudnn-cu12==8.9.2.26
- nvidia-cufft-cu12==11.0.2.54
- nvidia-curand-cu12==10.3.2.106
- nvidia-cusolver-cu12==11.4.5.107
- nvidia-cusparse-cu12==12.1.0.106
- nvidia-nccl-cu12==2.18.1
- nvidia-nvjitlink-cu12==12.3.101
- nvidia-nvtx-cu12==12.1.105
- opencv-contrib-python==4.9.0.80
- paginate==0.5.6
- partd==1.4.1
- pathspec==0.12.1
- patool==2.1.1
- patsy==0.5.6
- preshed==3.0.9
- protobuf==4.25.2
- py4j==0.10.9.7
- pydantic==2.5.3
- pydantic-core==2.14.6
- pygments==2.17.2
- pymdown-extensions==10.7
- pynisher==1.0.10
- pynvml==11.5.0
- pyrfr==0.9.0
- pytorch-widedeep==1.4.0
- pyyaml-env-tag==0.1
- ray==2.9.1
- referencing==0.32.1
- regex==2023.12.25
- rpds-py==0.17.1
- rtdl-revisiting-models==0.0.2
- seaborn==0.13.2
- skorch==0.15.0
- smac==2.0.2
- smart-open==6.4.0
- sortedcontainers==2.4.0
- spacy==3.7.2
- spacy-legacy==3.0.12
- spacy-loggers==1.0.5
- srsly==2.4.8
- statsmodels==0.14.3
- tabulate==0.9.0
- tblib==3.0.0
- termcolor==2.4.0
- thinc==8.2.2
- toolz==0.12.1
- torch==2.1.2
- torchvision==0.16.2
- triton==2.1.0
- tueplots==0.0.13
- typer==0.9.0
- venn-abers==1.4.1
- wasabi==1.1.2
- watchdog==3.0.0
- weasel==0.3.4
- wrapt==1.16.0
- zict==3.0.0
- zipp==3.17.0
================================================
FILE: original_requirements/conda_env_2025_01_15.yml
================================================
name: probclass
channels:
- pytorch
- nvidia
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=5.1
- blas=1.0
- brotli-python=1.0.9
- bzip2=1.0.8
- ca-certificates=2024.12.31
- certifi=2024.12.14
- charset-normalizer=3.3.2
- cuda-cudart=11.8.89
- cuda-cupti=11.8.87
- cuda-libraries=11.8.0
- cuda-nvrtc=11.8.89
- cuda-nvtx=11.8.86
- cuda-runtime=11.8.0
- cuda-version=12.6
- expat=2.6.4
- ffmpeg=4.3
- filelock=3.13.1
- freetype=2.12.1
- giflib=5.2.2
- gmp=6.2.1
- gnutls=3.6.15
- idna=3.7
- intel-openmp=2023.1.0
- jinja2=3.1.4
- jpeg=9e
- lame=3.100
- lcms2=2.16
- ld_impl_linux-64=2.40
- lerc=4.0.0
- libcublas=11.11.3.6
- libcufft=10.9.0.58
- libcufile=1.11.1.6
- libcurand=10.3.7.77
- libcusolver=11.4.1.48
- libcusparse=11.7.5.86
- libdeflate=1.22
- libffi=3.4.4
- libgcc-ng=11.2.0
- libgomp=11.2.0
- libiconv=1.16
- libidn2=2.3.4
- libjpeg-turbo=2.0.0
- libnpp=11.8.0.86
- libnvjpeg=11.9.0.86
- libpng=1.6.39
- libstdcxx-ng=11.2.0
- libtasn1=4.19.0
- libtiff=4.5.1
- libunistring=0.9.10
- libuuid=1.41.5
- libwebp=1.3.2
- libwebp-base=1.3.2
- llvm-openmp=14.0.6
- lz4-c=1.9.4
- markupsafe=2.1.3
- mkl=2023.1.0
- mkl-service=2.4.0
- mkl_fft=1.3.11
- mkl_random=1.2.8
- mpmath=1.3.0
- ncurses=6.4
- nettle=3.7.3
- networkx=3.2.1
- openh264=2.1.1
- openjpeg=2.5.2
- openssl=3.0.15
- pillow=11.0.0
- pip=24.2
- pysocks=1.7.1
- python=3.12.8
- pytorch=2.5.1
- pytorch-cuda=11.8
- pytorch-mutex=1.0
- pyyaml=6.0.2
- readline=8.2
- requests=2.32.3
- setuptools=72.1.0
- sqlite=3.45.3
- tbb=2021.8.0
- tk=8.6.14
- torchtriton=3.1.0
- torchvision=0.20.1
- typing_extensions=4.12.2
- urllib3=2.2.3
- wheel=0.44.0
- xz=5.4.6
- yaml=0.2.5
- zlib=1.2.13
- zstd=1.5.6
- pip:
- absl-py==2.1.0
- adjusttext==1.3.0
- aiohappyeyeballs==2.4.4
- aiohttp==3.11.11
- aiosignal==1.3.2
- alabaster==1.0.0
- argon2-cffi==23.1.0
- argon2-cffi-bindings==21.2.0
- attrs==24.3.0
- autorank==1.2.1
- babel==2.16.0
- baycomp==1.0.3
- catboost==1.2.7
- cffi==1.17.1
- cir-model==0.2.0
- click==8.1.8
- cloudpickle==3.1.0
- contourpy==1.3.1
- coverage==7.6.10
- cycler==0.12.1
- dask==2024.12.1
- dask-expr==1.1.21
- deprecation==2.1.0
- dill==0.3.9
- docutils==0.21.2
- et-xmlfile==2.0.0
- fire==0.7.0
- fonttools==4.55.3
- frozenlist==1.5.0
- fsspec==2024.12.0
- gpytorch==1.13
- grpcio==1.69.0
- imagesize==1.4.1
- iniconfig==2.0.0
- jaxtyping==0.2.19
- joblib==1.4.2
- jsonschema==4.23.0
- jsonschema-specifications==2024.10.1
- kiwisolver==1.4.8
- liac-arff==2.5.0
- lightgbm==4.5.0
- lightning-utilities==0.11.9
- linear-operator==0.5.3
- locket==1.0.0
- markdown==3.7
- markdown-it-py==3.0.0
- matplotlib==3.7.5
- mdit-py-plugins==0.4.2
- mdurl==0.1.2
- minio==7.2.14
- msgpack==1.1.0
- msgpack-numpy==0.4.8
- multidict==6.1.0
- myst-parser==4.0.0
- netcal==1.3.6
- numpy==1.26.4
- nvidia-ml-py==12.560.30
- nvidia-nccl-cu12==2.24.3
- openml==0.15.0
- openpyxl==3.1.5
- opt-einsum==3.4.0
- packaging==24.2
- pandas==2.2.3
- partd==1.4.2
- patool==3.1.0
- patsy==1.0.1
- plotly==5.24.1
- pluggy==1.5.0
- probmetrics==0.0.1
- propcache==0.2.1
- protobuf==5.29.3
- psutil==6.1.1
- pyarrow==18.1.0
- pycparser==2.22
- pycryptodome==3.21.0
- pygments==2.19.1
- pynvml==12.0.0
- pyparsing==3.2.1
- pyro-api==0.1.2
- pyro-ppl==1.9.1
- pytabkit==1.1.3
- pytest==8.3.4
- pytest-cov==6.0.0
- python-dateutil==2.9.0.post0
- python-graphviz==0.20.3
- pytorch-lightning==2.5.0.post0
- pytz==2024.2
- ray==2.40.0
- referencing==0.35.1
- relplot==1.0
- rpds-py==0.22.3
- scikit-learn==1.5.2
- scipy==1.15.1
- seaborn==0.13.2
- six==1.17.0
- skorch==1.1.0
- snowballstemmer==2.2.0
- sphinx==8.1.3
- sphinx-rtd-theme==3.0.2
- sphinxcontrib-applehelp==2.0.0
- sphinxcontrib-devhelp==2.0.0
- sphinxcontrib-htmlhelp==2.1.0
- sphinxcontrib-jquery==4.1
- sphinxcontrib-jsmath==1.0.1
- sphinxcontrib-qthelp==2.0.0
- sphinxcontrib-serializinghtml==2.0.0
- statsmodels==0.14.4
- swig==4.3.0
- sympy==1.13.1
- tabulate==0.9.0
- tenacity==9.0.0
- tensorboard==2.18.0
- tensorboard-data-server==0.7.2
- termcolor==2.5.0
- threadpoolctl==3.5.0
- tikzplotlib==0.9.8
- toolz==1.0.0
- torchmetrics==1.6.1
- tqdm==4.67.1
- tueplots==0.0.17
- typeguard==4.4.1
- tzdata==2024.2
- venn-abers==1.4.6
- werkzeug==3.1.3
- xgboost==2.1.3
- xlrd==2.0.1
- xmltodict==0.14.2
- yarl==1.18.3
================================================
FILE: original_requirements/requirements_2024_06_25.txt
================================================
adjustText==1.0.4
aiohttp==3.9.1
aiosignal==1.3.1
annotated-types==0.6.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.1.0
autorank==1.1.3
Babel==2.14.0
baycomp==1.0.3
blis==0.7.11
boltons==23.0.0
brotlipy==0.7.0
catalogue==2.0.10
catboost==1.2.2
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==2.0.4
cir-model==0.2.0
click==8.1.7
cloudpathlib==0.16.0
cloudpickle==3.0.0
cmake==3.28.1
colorama==0.4.6
comm==0.2.0
confection==0.1.4
ConfigSpace==0.7.1
contourpy==1.2.0
coverage==7.3.3
cramjam==2.7.0
cryptography==41.0.3
cycler==0.12.1
cymem==2.0.8
dask==2023.12.1
dask-jobqueue==0.8.2
debugpy==1.8.0
decorator==5.1.1
dill==0.3.7
distinctipy==1.3.4
distributed==2023.12.1
einops==0.7.0
emcee==3.1.4
et-xmlfile==1.1.0
exceptiongroup==1.1.3
executing==2.0.1
fastparquet==2023.10.1
filelock==3.13.1
fire==0.5.0
fonttools==4.46.0
frozenlist==1.4.1
fsspec==2023.12.2
future==0.18.3
gensim==4.3.2
ghp-import==2.1.0
graphviz==0.20.1
griffe==0.38.1
hyperopt==0.2.7
idna==3.4
importlib-metadata==6.8.0
importlib-resources==6.1.1
imutils==0.5.4
iniconfig==2.0.0
ipykernel==6.26.0
ipython==8.17.2
jedi==0.19.1
Jinja2==3.1.2
joblib==1.3.2
jsonpatch==1.32
jsonpointer==2.1
jsonschema==4.20.0
jsonschema-specifications==2023.11.2
jupyter_client==8.6.0
jupyter_core==5.5.0
kditransform==0.2.0
kiwisolver==1.4.5
langcodes==3.3.0
liac-arff==2.5.0
lightgbm==4.1.0
lightning-utilities==0.10.0
lit==17.0.6
llvmlite==0.41.1
locket==1.0.0
Markdown==3.5.1
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
mergedeep==1.3.4
minio==7.2.0
mkdocs==1.5.3
mkdocs-autorefs==0.5.0
mkdocs-material==9.5.2
mkdocs-material-extensions==1.3.1
mkdocstrings==0.24.0
mkdocstrings-python==1.7.5
more-itertools==10.1.0
mpmath==1.3.0
msgpack==1.0.7
msgpack-numpy==0.4.8
multidict==6.0.4
murmurhash==1.0.10
nest-asyncio==1.5.8
networkx==3.2.1
numba==0.58.1
numpy==1.26.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
opencv-contrib-python==4.8.1.78
openml==0.14.1
openpyxl==3.1.2
packaging==23.1
paginate==0.5.6
pandas==2.1.4
parso==0.8.3
partd==1.4.1
pathspec==0.12.1
patool==1.15.0
patsy==0.5.6
pexpect==4.8.0
Pillow==10.1.0
pkg_resources==0.0.0
platformdirs==3.11.0
plotly==5.18.0
pluggy==1.0.0
preshed==3.0.9
prompt-toolkit==3.0.39
protobuf==4.25.1
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
py4j==0.10.9.7
pyarrow==14.0.2
pycosat==0.6.6
pycparser==2.21
pycryptodome==3.19.0
pydantic==1.10.13
pydantic_core==2.14.5
Pygments==2.16.1
pymdown-extensions==10.5
pynisher==1.0.10
pynvml==11.5.0
pyOpenSSL==23.2.0
pyparsing==3.1.1
pyrfr==0.9.0
PySocks==1.7.1
pytest==7.4.3
pytest-cov==4.1.0
python-dateutil==2.8.2
pytorch-lightning==2.1.2
pytorch-widedeep==1.4.0
pytz==2023.3.post1
PyYAML==6.0.1
pyyaml_env_tag==0.1
pyzmq==25.1.1
ray==2.8.1
referencing==0.32.0
regex==2023.10.3
requests==2.31.0
rpds-py==0.15.2
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.6
scikit-learn==1.3.2
scipy==1.11.4
seaborn==0.13.1
six==1.16.0
skorch==0.15.0
smac==2.0.2
smart-open==6.4.0
sortedcontainers==2.4.0
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.4.8
stack-data==0.6.3
statsmodels==0.14.2
sympy==1.12
tabulate==0.9.0
tblib==3.0.0
tenacity==8.2.3
termcolor==2.4.0
textalloc==0.0.7
thinc==8.2.2
threadpoolctl==3.2.0
tomli==2.0.1
toolz==0.12.0
torch==2.0.0
torchmetrics==1.2.1
torchvision==0.16.2
tornado==6.3.3
tqdm==4.65.0
traitlets==5.13.0
triton==2.0.0
tueplots==0.0.12
typer==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==1.26.16
venn-abers==1.4.1
wasabi==1.1.2
watchdog==3.0.0
wcwidth==0.2.9
weasel==0.3.4
wrapt==1.16.0
xgboost==2.0.2
xlrd==2.0.1
xmltodict==0.13.0
yarl==1.9.4
zict==3.0.0
zipp==3.17.0
zstandard==0.19.0
================================================
FILE: pyproject.toml
================================================
[build-system]
requires = ["hatchling>=1.26.1"] # https://github.com/pypa/hatch/issues/1818
build-backend = "hatchling.build"
[project]
name = "pytabkit"
dynamic = ["version"]
description = 'ML models + benchmark for tabular data classification and regression'
readme = "README.md"
requires-python = ">=3.9"
license = "Apache-2.0"
keywords = ['tabular data', 'scikit-learn', 'deep learning', 'gradient boosting', 'RealMLP']
authors = [
{ name = "David Holzmüller" }, #, email = "a@b.org" },
{ name = "Léo Grinsztajn" }, #, email = "a@b.org" },
{ name = "Ingo Steinwart" }, #, email = "a@b.org" },
]
classifiers = [
"Development Status :: 4 - Beta",
"Programming Language :: Python",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
"License :: OSI Approved :: Apache Software License",
]
dependencies = [
"torch>=2.0",
"numpy>=1.25", # hopefully don't need <2.0 anymore?
"pandas>=2.0",
"scikit-learn>=1.3",
# these could be made optional with lazy imports
# older versions of torchmetrics (<1.2.1) have a bug that makes certain metrics used in TabR slow:
# https://github.com/Lightning-AI/torchmetrics/pull/2184
"torchmetrics>=1.2.1",
# can also install the newer lightning package with more dependencies instead, it will be prioritized
"pytorch_lightning>=2.0",
"psutil>=5.0", # used for getting logical CPU count in the sklearn base and for getting process RAM usage
]
[project.optional-dependencies]
models = [
# use <2.6 for now since it can run into pickling issues with skorch if the skorch version is too old
# see https://github.com/skorch-dev/skorch/commit/be93b7769d61aa22fb928d2e89e258c629bfeaf9
"torch>=2.0",
"xgboost>=2.0",
"catboost>=1.2",
"lightgbm>=4.1",
"xrfm>=0.4.3", # lower bound is not checked extensively
# for rtdl models (MLP, ResNet) but also lightly used in TabR
# note that scikit-learn 1.6 needs skorch >= 1.1.0
"skorch>=0.15",
"dask[dataframe]>=2023", # this is here because of a pandas warning:
# "Dask dataframe query planning is disabled because dask-expr is not installed"
# "packaging", # unclear why this is here?
"tqdm", # for TabM with verbosity >= 1
# more classification metrics and post-hoc calibrators
# not necessary unless these things are actually used
"probmetrics>=0.0.1",
# more powerful pickle, used for file-saving and multiprocessing.
# Unfortunately it can't save certain torch objects
"dill",
# saving objects in yaml/msgpack
# needed if used in utils.serialize() / deserialize()
"pyyaml>=5.0",
"msgpack>=1.0",
# apparently msgpack_numpy fixed some bug in using numpy arrays in msgpack?
# but apparently it can also cause a bug in ray due to its monkey-patching of msgpack functions.
# In theory, we shouldn't be using it for numpy arrays at the moment; not sure why the need for this occurred.
# Maybe it occurred because we tried to save hyperparameters that were numpy scalars instead of python scalars.
# "msgpack_numpy>=0.4",
# this is needed because probmetrics uses unpinned numba,
# but for some reason the github actions CI wants to install 0.53.1
# which is incompatible with Python 3.11 and 3.12.
# 0.59.0 is the lowest version that is compatible with 3.12
"numba>=0.59.0",
]
autogluon = [
"autogluon.tabular[all]>=1.0",
"autogluon.multimodal>=1.0",
]
extra = [
"kditransform>=0.2",
]
hpo = [
"ConfigSpace>=0.7",
"smac>=2.0",
"hyperopt>=0.2",
]
bench = [
"fire", # argparse utilities
"ray>=2.8", # parallelization
"openml>=0.14", # OpenML data download
# ----- UCI import ------
"requests>=2.0",
"patool>=1.0",
"openpyxl>=3.0",
"xlrd>=2.0",
# ----- plotting -----
"matplotlib>=3.0",
"tueplots>=0.0.12",
"seaborn>=0.0.13",
"adjustText>=1.0",
"autorank>=1.0",
]
dev = [
"pytest>=7.0",
"pytest-cov>=4.0",
"sphinx>=7.0",
"myst_parser>=3.0",
"sphinx_rtd_theme>=2.0",
]
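# Usage note (hedged, assuming a standard pip install): the extras above can be
# combined, e.g. `pip install "pytabkit[models,hpo]"` pulls in the model
# dependencies together with the HPO libraries.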
[tool.hatch.version]
path = "pytabkit/__about__.py"
[tool.hatch.envs.default]
installer = "uv"
features = ["models", "bench", "autogluon", "extra", "hpo", "dev"]
[tool.hatch.envs.hatch-test]
installer = "uv"
features = ["models", "bench", "dev", "hpo"]
#features = ["models","bench","autogluon","extra","hpo","dev"]
[tool.hatch.build.targets.sdist]
package = ['pytabkit']
only-include = ['pytabkit']
[tool.hatch.build.targets.wheel]
package = ['pytabkit']
only-include = ['pytabkit']
[project.urls]
Documentation = "https://github.com/dholzmueller/pytabkit#readme"
Issues = "https://github.com/dholzmueller/pytabkit/issues"
Source = "https://github.com/dholzmueller/pytabkit"
[tool.hatch.envs.types]
extra-dependencies = [
"mypy>=1.0.0",
]
[tool.hatch.envs.types.scripts]
check = "mypy --install-types --non-interactive {args:pytabkit tests}"
[tool.coverage.run]
source_pkgs = ["pytabkit", "tests"]
branch = true
parallel = true
omit = [
"pytabkit/__about__.py",
]
[tool.coverage.paths]
models = ["pytabkit/models", "*/pytabkit/pytabkit/models"]
bench = ["pytabkit/bench", "*/pytabkit/pytabkit/bench"]
tests = ["tests", "*/pytabkit/tests"]
[tool.coverage.report]
exclude_lines = [
"no cov",
"if __name__ == .__main__.:",
"if TYPE_CHECKING:",
]
================================================
FILE: pytabkit/__about__.py
================================================
# SPDX-FileCopyrightText: 2024-present David Holzmüller
#
# SPDX-License-Identifier: Apache-2.0
__version__ = "1.7.3"
================================================
FILE: pytabkit/__init__.py
================================================
from .models.sklearn.sklearn_interfaces import *
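# Hedged usage sketch (assumes the re-exported interfaces include a
# scikit-learn-style RealMLP_TD_Classifier; names may differ):
#
# from pytabkit import RealMLP_TD_Classifier
# clf = RealMLP_TD_Classifier()
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)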
================================================
FILE: pytabkit/bench/__init__.py
================================================
================================================
FILE: pytabkit/bench/alg_wrappers/__init__.py
================================================
================================================
FILE: pytabkit/bench/alg_wrappers/general.py
================================================
from pathlib import Path
from typing import List, Dict, Optional
from pytabkit.bench.data.tasks import TaskPackage, TaskInfo
from pytabkit.bench.run.results import ResultManager
from pytabkit.models.training.logging import Logger
from pytabkit.bench.scheduling.resources import NodeResources
from pytabkit.models.alg_interfaces.base import RequiredResources
from pytabkit.models.training.metrics import Metrics
class AlgWrapper:
"""
Base class for ML methods that can be run in the benchmarking code.
"""
def __init__(self, **config):
"""
Constructor.
:param config: Configuration parameters.
"""
self.config = config
def run(self, task_package: TaskPackage, logger: Logger, assigned_resources: NodeResources,
tmp_folders: List[Path], metrics: Optional[Metrics] = None) -> Dict[str, List[ResultManager]]:
"""
Run the ML method on the given task. Should be overridden in subclasses.
:param task_package: Information about the task to be run.
:param logger: Logger.
:param assigned_resources: Assigned resources (e.g. number of threads).
:param tmp_folders: Temporary folders, one for each train/test split, to save temporary data to.
:param metrics: Optional Metrics object; if None, implementations should fall back to default metrics for the task.
:return: A dictionary of lists of ResultManager objects.
The dict key is the predict params name, which is used as a suffix for the alg_name,
and each list contains one ResultManager per train/test split.
"""
raise NotImplementedError()
def get_required_resources(self, task_package: TaskPackage) -> RequiredResources:
"""
Should be overridden in subclasses.
:param task_package: Information about the task that should be executed.
:return: Information about the estimated required resources that will be needed to run this task.
"""
raise NotImplementedError()
def get_max_n_vectorized(self, task_info: TaskInfo) -> int:
"""
Returns 1 by default; should be overridden in subclasses that benefit from vectorization.
:param task_info: Information about the task that this method should run on.
:return: Maximum number of train/test splits that this method can be run on at once.
"""
return 1
def get_pred_param_names(self, task_package: TaskPackage) -> List[str]:
"""
Return the possible prediction parameter names, used as suffixes for alg names.
:param task_package: Task package.
:return: List of the possible names.
"""
raise NotImplementedError()
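# --- Illustrative sketch (hypothetical, not used by the benchmark) ---
# A minimal AlgWrapper subclass only needs resource estimates, the prediction
# parameter names, and a run() implementation; everything below is a toy stub.
class _ExampleAlgWrapper(AlgWrapper):
def get_required_resources(self, task_package: TaskPackage) -> RequiredResources:
# constant toy estimate; real wrappers should scale with the task size
return RequiredResources(time_s=1.0, cpu_ram_gb=0.5, n_threads=1)
def get_pred_param_names(self, task_package: TaskPackage) -> List[str]:
return ['default']  # hypothetical single prediction parameterization
def run(self, task_package: TaskPackage, logger: Logger, assigned_resources: NodeResources,
tmp_folders: List[Path], metrics: Optional[Metrics] = None) -> Dict[str, List[ResultManager]]:
# one (empty) ResultManager per train/test split, keyed by the predict params name
return {'default': [ResultManager() for _ in task_package.split_infos]}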
# want to have:
# - more general / easy ResourceComputation
# - generic thread-allocation parameters for such a ResourceComputation
# that allow allocating more threads for larger workloads
# - better NodeResources class that supports mps or perhaps a new class that summarizes the allocated resources
# - should the resource estimation be moved to AlgInterface?
# Then, we would need to instantiate an AlgInterface in the wrapper to do the estimation
# - maybe code that estimates RAM (and time) constants? With fake data sets?
# better ResourceComputation:
# have identical components for CPU and GPU, and maybe also for RAM and time
# components:
# - dataset size
# - factory (model) size
# - RAM for forward (and backward) pass
# - generic calculation (constant, per-tree, per-class, per-sample),
# for the NN we might also need to include the batch size, number of epochs, etc.
# what about the number of threads etc.?
# want to have one per device?
# better NodeResources:
# maybe just have a dict with the devices that are being referred to by the array?
================================================
FILE: pytabkit/bench/alg_wrappers/interface_wrappers.py
================================================
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Dict
import torch
from pytabkit.bench.data.paths import Paths
from pytabkit.models import utils
from pytabkit.models.alg_interfaces.autogluon_model_interfaces import AutoGluonModelAlgInterface
from pytabkit.models.alg_interfaces.catboost_interfaces import CatBoostSubSplitInterface, CatBoostHyperoptAlgInterface, \
CatBoostSklearnSubSplitInterface, RandomParamsCatBoostAlgInterface
from pytabkit.models.alg_interfaces.ensemble_interfaces import PrecomputedPredictionsAlgInterface, \
CaruanaEnsembleAlgInterface, AlgorithmSelectionAlgInterface
from pytabkit.models.alg_interfaces.lightgbm_interfaces import LGBMSubSplitInterface, LGBMHyperoptAlgInterface, \
LGBMSklearnSubSplitInterface, RandomParamsLGBMAlgInterface
from pytabkit.bench.alg_wrappers.general import AlgWrapper
from pytabkit.bench.data.tasks import TaskPackage, TaskInfo
from pytabkit.bench.run.results import ResultManager
from pytabkit.models.alg_interfaces.other_interfaces import RFSubSplitInterface, SklearnMLPSubSplitInterface, \
KANSubSplitInterface, GrandeSubSplitInterface, GBTSubSplitInterface, RandomParamsRFAlgInterface, \
TabPFN2SubSplitInterface, TabICLSubSplitInterface, RandomParamsExtraTreesAlgInterface, RandomParamsKNNAlgInterface, \
ExtraTreesSubSplitInterface, KNNSubSplitInterface, RandomParamsLinearModelAlgInterface, \
LinearModelSubSplitInterface
from pytabkit.bench.scheduling.resources import NodeResources
from pytabkit.models.alg_interfaces.alg_interfaces import AlgInterface, MultiSplitWrapperAlgInterface
from pytabkit.models.alg_interfaces.base import SplitIdxs, RequiredResources
from pytabkit.models.alg_interfaces.rtdl_interfaces import RTDL_MLPSubSplitInterface, ResnetSubSplitInterface, \
FTTransformerSubSplitInterface, RandomParamsResnetAlgInterface, RandomParamsRTDLMLPAlgInterface, \
RandomParamsFTTransformerAlgInterface
from pytabkit.models.alg_interfaces.sub_split_interfaces import SingleSplitWrapperAlgInterface
from pytabkit.models.alg_interfaces.tabm_interface import TabMSubSplitInterface
from pytabkit.models.alg_interfaces.tabr_interface import TabRSubSplitInterface, \
RandomParamsTabRAlgInterface
from pytabkit.models.alg_interfaces.nn_interfaces import NNAlgInterface, RandomParamsNNAlgInterface, \
NNHyperoptAlgInterface
from pytabkit.models.alg_interfaces.xgboost_interfaces import XGBSubSplitInterface, XGBHyperoptAlgInterface, \
XGBSklearnSubSplitInterface, RandomParamsXGBAlgInterface
from pytabkit.models.alg_interfaces.xrfm_interfaces import xRFMSubSplitInterface, RandomParamsxRFMAlgInterface
from pytabkit.models.data.data import TaskType, DictDataset
from pytabkit.models.nn_models.models import PreprocessingFactory
from pytabkit.models.torch_utils import TorchTimer
from pytabkit.models.training.logging import Logger
from pytabkit.models.training.metrics import Metrics
# what is the value of wrappers around AlgInterface?
# - it has a create-function that can create multiple instances,
# and can wrap with MultiSplitAlgInterface and SingleSplitAlgInterface
# - there is some wrapping code in run(), but this could be moved to where the wrapper is used
# - it provides get_max_n_vectorized()
# perhaps we should generalize TreeResourceComputation to also work for NNs?
# But this would require extra functionality for backprop, GPU RAM, etc.
def get_prep_factory(**config):
return config.get('factory', None) or PreprocessingFactory(**config)
class AlgInterfaceWrapper(AlgWrapper):
"""
Base class for wrapping AlgInterface classes for benchmarking.
"""
def __init__(self, create_alg_interface_fn: Optional[Callable[..., AlgInterface]], **config):
"""
Constructor.
:param create_alg_interface_fn: Function to create an AlgInterface via create_alg_interface_fn(**config).
:param config: Configuration parameters.
"""
super().__init__(**config)
self.create_alg_interface_fn = create_alg_interface_fn
# def _create_alg_interface_impl(self, n_cv: int, n_splits: int, task_type: TaskType) -> AlgInterface:
def _create_alg_interface_impl(self, task_package: TaskPackage) -> AlgInterface:
"""
Factory method to create an AlgInterface.
Should be overridden unless ``create_alg_interface_fn`` has been provided in the constructor.
This method should not be used directly; use create_alg_interface() instead.
:param task_package: Task information.
:return: An AlgInterface corresponding to an ML method.
"""
if self.create_alg_interface_fn is not None:
return self.create_alg_interface_fn(**self.config)
else:
raise NotImplementedError()
def create_alg_interface(self, task_package: TaskPackage) -> AlgInterface:
"""
Method to create an AlgInterface.
:param task_package: Task information.
:return: An AlgInterface corresponding to an ML method.
"""
alg_interface = self._create_alg_interface_impl(task_package)
if 'calibration_method' in self.config:
try:
from pytabkit.models.alg_interfaces.calibration import PostHocCalibrationAlgInterface
alg_interface = PostHocCalibrationAlgInterface(alg_interface, **self.config)
except ImportError:
raise ValueError('Calibration methods are not implemented')
if 'quantile_calib_alpha' in self.config:
try:
from pytabkit.models.alg_interfaces.custom_interfaces import QuantileCalibrationAlgInterface
alg_interface = QuantileCalibrationAlgInterface(alg_interface, **self.config)
except ImportError:
raise ValueError('Quantile Calibration methods are not implemented')
return alg_interface
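# Hedged illustration: with e.g. calibration_method='temp-scaling' in the config
# (value hypothetical), the base interface returned above is wrapped in a
# PostHocCalibrationAlgInterface; quantile_calib_alpha analogously triggers
# QuantileCalibrationAlgInterface.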
def run(self, task_package: TaskPackage, logger: Logger, assigned_resources: NodeResources,
tmp_folders: List[Path], metrics: Optional[Metrics] = None) -> Dict[str, List[ResultManager]]:
task = task_package.task_info.load_task(task_package.paths)
task_desc = task_package.task_info.task_desc
n_cv = task_package.n_cv
n_refit = task_package.n_refit
interface_resources = assigned_resources.get_interface_resources()
old_torch_n_threads = torch.get_num_threads()
old_torch_n_interop_threads = torch.get_num_interop_threads()
torch.set_num_threads(interface_resources.n_threads)
# don't set this because it can throw
# Error: cannot set number of interop threads after parallel work has started or set_num_interop_threads called
# torch.set_num_interop_threads(interface_resources.n_threads)
ds = task.ds
name = 'alg ' + task_package.alg_name + ' on task ' + str(task_desc)
# return_preds = self.config.get(f'save_y_pred', False)
return_preds = task_package.save_y_pred
if metrics is None:
metrics = Metrics.defaults(ds.tensor_infos['y'].cat_sizes,
val_metric_name=self.config.get('val_metric_name', None))
cv_idxs_list = []
refit_idxs_list = []
n_splits = len(task_package.split_infos)
if n_splits == 1:
logger.log(1,
f'Running on split {task_package.split_infos[0].id} of task {task_package.task_info.task_desc}')
else:
logger.log(1, f'Running on {n_splits} splits of task {task_package.task_info.task_desc}')
for split_id, split_info in enumerate(task_package.split_infos):
# this will usually be called with len(task_package.split_infos) == 1, but do a loop for safety
test_split = split_info.splitter.split_ds(task.ds)
trainval_idxs, test_idxs = test_split.idxs[0], test_split.idxs[1]
trainval_ds = test_split.get_sub_ds(0)
cv_sub_splits = split_info.get_sub_splits(trainval_ds, n_splits=n_cv, is_cv=True)
cv_train_idxs = []
cv_val_idxs = []
for sub_idx, sub_split in enumerate(cv_sub_splits):
cv_train_idxs.append(trainval_idxs[sub_split.idxs[0]])
cv_val_idxs.append(trainval_idxs[sub_split.idxs[1]])
cv_train_idxs = torch.stack(cv_train_idxs, dim=0)
cv_val_idxs = torch.stack(cv_val_idxs, dim=0)
cv_alg_seeds = [split_info.get_sub_seed(split_idx, is_cv=True) for split_idx in range(n_cv)]
cv_idxs_list.append(SplitIdxs(cv_train_idxs, cv_val_idxs, test_idxs, split_seed=split_info.alg_seed,
sub_split_seeds=cv_alg_seeds, split_id=split_id))
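# refit trains on the full train+val data (no validation split) and reuses the same test split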
if n_refit > 0:
refit_train_idxs = torch.stack([trainval_idxs] * n_refit, dim=0)
refit_alg_seeds = [split_info.get_sub_seed(split_idx, is_cv=False) for split_idx in range(n_refit)]
refit_idxs_list.append(SplitIdxs(refit_train_idxs, None, test_idxs, split_seed=split_info.alg_seed,
sub_split_seeds=refit_alg_seeds, split_id=split_id))
if task_package.rerun:
for tmp_folder in tmp_folders:
if utils.existsDir(tmp_folder):
# delete the folder such that the method doesn't load old results from the tmp folder
shutil.rmtree(tmp_folder)
cv_tmp_folders = [tmp_folder / 'cv' for tmp_folder in tmp_folders]
refit_tmp_folders = [tmp_folder / 'refit' for tmp_folder in tmp_folders]
cv_alg_interface = self.create_alg_interface(task_package)
pred_param_names = list(cv_alg_interface.get_available_predict_params().keys())
if n_refit > 0 and len(pred_param_names) > 1:
raise NotImplementedError('Refitting with multiple prediction parameters is currently not implemented')
rms = {name: [ResultManager() for _ in task_package.split_infos] for name in pred_param_names}
with TorchTimer() as cv_fit_timer:
cv_alg_interface.fit(ds, cv_idxs_list, interface_resources, logger, cv_tmp_folders, name)
for pred_param_name in pred_param_names:
cv_alg_interface.set_current_predict_params(pred_param_name)
with TorchTimer() as cv_eval_timer:
cv_results_list = cv_alg_interface.eval(ds, cv_idxs_list, metrics, return_preds)
for rm, cv_results in zip(rms[pred_param_name], cv_results_list):
rm.add_results(is_cv=True, results_dict=cv_results.get_dict() |
dict(fit_time_s=cv_fit_timer.elapsed,
eval_time_s=cv_eval_timer.elapsed))
if n_refit > 0:
refit_alg_interface = cv_alg_interface.get_refit_interface(n_refit)
with TorchTimer() as refit_fit_timer:
refit_alg_interface.fit(ds, refit_idxs_list, interface_resources, logger, refit_tmp_folders, name)
with TorchTimer() as refit_eval_timer:
refit_results_list = refit_alg_interface.eval(ds, refit_idxs_list, metrics, return_preds)
for rm, refit_results in zip(rms[pred_param_name], refit_results_list):
rm.add_results(is_cv=False,
results_dict=refit_results.get_dict() |
dict(fit_time_s=refit_fit_timer.elapsed,
eval_time_s=refit_eval_timer.elapsed))
torch.set_num_threads(old_torch_n_threads)
# torch.set_num_interop_threads(old_torch_n_interop_threads)
return rms
def get_required_resources(self, task_package: TaskPackage) -> RequiredResources:
ds = DictDataset(tensors=None, tensor_infos=task_package.task_info.tensor_infos,
device='cpu', n_samples=task_package.task_info.n_samples)
alg_interface = self.create_alg_interface(task_package)
n_train, n_val = task_package.split_infos[0].get_train_and_val_size(n_samples=task_package.task_info.n_samples,
n_splits=len(task_package.split_infos),
is_cv=True)
# n_train = split_info.get_sub_splits(trainval_ds, n_splits=n_cv, is_cv=True)
return alg_interface.get_required_resources(ds=ds, n_cv=task_package.n_cv, n_refit=task_package.n_refit,
n_splits=len(task_package.split_infos),
split_seeds=[si.alg_seed for si in task_package.split_infos],
n_train=n_train)
def get_pred_param_names(self, task_package: TaskPackage) -> List[str]:
return list(self.create_alg_interface(task_package).get_available_predict_params().keys())
class LoadResultsWrapper(AlgInterfaceWrapper):
def __init__(self, alg_name: str, **config):
super().__init__(create_alg_interface_fn=None, **config)
self.alg_name = alg_name
def _create_alg_interface_impl(self, task_package: TaskPackage) -> AlgInterface:
assert len(task_package.split_infos) == 1 # only support single-split
paths = self.config.get('paths', Paths.from_env_variables())
task_info = task_package.task_info
split_info = task_package.split_infos[0]
split_id = split_info.id
results_path = paths.results_alg_task_split(task_desc=task_info.task_desc, alg_name=self.alg_name,
n_cv=task_package.n_cv, split_type=split_info.split_type,
split_id=split_id)
rm = ResultManager.load(results_path)
y_preds_cv = rm.y_preds_cv if rm.y_preds_cv is not None else rm.other_dict['cv']['y_preds']
y_preds_cv = torch.as_tensor(y_preds_cv, dtype=torch.float32)
y_preds_refit = None
if rm.y_preds_refit is not None:
y_preds_refit = torch.as_tensor(rm.y_preds_refit, dtype=torch.float32)
elif 'refit' in rm.other_dict:
y_preds_refit = torch.as_tensor(rm.other_dict['refit']['y_preds'], dtype=torch.float32)
fit_params_cv = rm.other_dict['cv']['fit_params']
fit_params_refit = None if 'refit' not in rm.other_dict else rm.other_dict['refit']['fit_params']
return PrecomputedPredictionsAlgInterface(y_preds_cv=y_preds_cv, y_preds_refit=y_preds_refit,
fit_params_cv=fit_params_cv, fit_params_refit=fit_params_refit)
def get_required_resources(self, task_package: TaskPackage) -> RequiredResources:
# do this here such that we don't have to load the results for computing the required resources
return RequiredResources(time_s=1e-5 * task_package.task_info.n_samples, cpu_ram_gb=1.5, n_threads=1)
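# e.g., a task with 100,000 samples gets a time budget of 1e-5 * 100,000 = 1 s here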
class CaruanaEnsembleWrapper(AlgInterfaceWrapper):
def __init__(self, sub_wrappers: List[AlgInterfaceWrapper], **config):
super().__init__(create_alg_interface_fn=None, **config)
self.sub_wrappers = sub_wrappers
def _create_alg_interface_impl(self, task_package: TaskPackage) -> AlgInterface:
single_split_alg_interfaces = []
for split_info in task_package.split_infos:
single_alg_interfaces = []
for sub_wrapper in self.sub_wrappers:
sub_tp = TaskPackage(task_info=task_package.task_info, split_infos=[split_info], n_cv=task_package.n_cv,
n_refit=task_package.n_refit, paths=task_package.paths, rerun=task_package.rerun,
alg_name=task_package.alg_name, save_y_pred=task_package.save_y_pred)
single_alg_interfaces.append(sub_wrapper.create_alg_interface(sub_tp))
single_split_alg_interfaces.append(CaruanaEnsembleAlgInterface(single_alg_interfaces, **self.config))
return MultiSplitWrapperAlgInterface(single_split_alg_interfaces)
def get_required_resources(self, task_package: TaskPackage) -> RequiredResources:
single_resources = [sub_wrapper.get_required_resources(task_package)
for sub_wrapper in self.sub_wrappers]
return RequiredResources.combine_sequential(single_resources)
class AlgorithmSelectionWrapper(AlgInterfaceWrapper):
def __init__(self, sub_wrappers: List[AlgInterfaceWrapper], **config):
super().__init__(create_alg_interface_fn=None, **config)
self.sub_wrappers = sub_wrappers
def _create_alg_interface_impl(self, task_package: TaskPackage) -> AlgInterface:
single_split_alg_interfaces = []
for split_info in task_package.split_infos:
single_alg_interfaces = []
for sub_wrapper in self.sub_wrappers:
sub_tp = TaskPackage(task_info=task_package.task_info, split_infos=[split_info], n_cv=task_package.n_cv,
n_refit=task_package.n_refit, paths=task_package.paths, rerun=task_package.rerun,
alg_name=task_package.alg_name, save_y_pred=task_package.save_y_pred)
single_alg_interfaces.append(sub_wrapper.create_alg_interface(sub_tp))
single_split_alg_interfaces.append(AlgorithmSelectionAlgInterface(single_alg_interfaces, **self.config))
return MultiSplitWrapperAlgInterface(single_split_alg_interfaces)
def get_required_resources(self, task_package: TaskPackage) -> RequiredResources:
# too pessimistic for refit...
single_resources = [sub_wrapper.get_required_resources(task_package)
for sub_wrapper in self.sub_wrappers]
return RequiredResources.combine_sequential(single_resources)
class MultiSplitAlgInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, **config):
super().__init__(create_alg_interface_fn=None, **config)
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
raise NotImplementedError()
def _create_alg_interface_impl(self, task_package: TaskPackage) -> AlgInterface:
n_cv = task_package.n_cv
task_type = task_package.task_info.task_type
n_splits = len(task_package.split_infos)
return MultiSplitWrapperAlgInterface(
single_split_interfaces=[self.create_single_alg_interface(n_cv, task_type)
for i in range(n_splits)], **self.config)
class SubSplitInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def __init__(self, create_sub_split_learner_fn: Optional[Callable[..., AlgInterface]] = None, **config):
super().__init__(**config)
self.create_sub_split_learner_fn = create_sub_split_learner_fn
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
if self.create_sub_split_learner_fn is not None:
return self.create_sub_split_learner_fn(**self.config)
raise NotImplementedError()
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return SingleSplitWrapperAlgInterface([self.create_sub_split_interface(task_type)
for i in range(n_cv)], **self.config)
class NNInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, **config):
super().__init__(NNAlgInterface, **config)
def get_max_n_vectorized(self, task_info: TaskInfo) -> int:
ds = DictDataset(tensors=None, tensor_infos=task_info.tensor_infos, device='cpu',
n_samples=task_info.n_samples)
max_ram_gb = 8.0
max_n_vectorized = self.config.get('max_n_vectorized', 50)
alg_interface = NNAlgInterface(**self.config)
while max_n_vectorized > 1:
required_resources = alg_interface.get_required_resources(ds, n_cv=1, n_refit=0, n_splits=max_n_vectorized,
split_seeds=[0] * max_n_vectorized,
n_train=task_info.n_samples)
if required_resources.gpu_ram_gb <= max_ram_gb and required_resources.cpu_ram_gb <= max_ram_gb:
return max_n_vectorized
max_n_vectorized -= 1
return 1
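# Note on the loop above: starting from config['max_n_vectorized'] (default 50),
# the count is decreased until the estimated CPU and GPU RAM both fit within
# max_ram_gb = 8, so methods with larger per-split estimates vectorize fewer splits.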
class NNHyperoptInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, **config):
super().__init__(NNHyperoptAlgInterface, **config)
def get_max_n_vectorized(self, task_info: TaskInfo) -> int:
ds = DictDataset(tensors=None, tensor_infos=task_info.tensor_infos, device='cpu',
n_samples=task_info.n_samples)
max_ram_gb = 8.0
max_n_vectorized = self.config.get('max_n_vectorized', 50)
alg_interface = NNHyperoptAlgInterface(**self.config)
while max_n_vectorized > 1:
required_resources = alg_interface.get_required_resources(ds, n_cv=1, n_refit=0, n_splits=max_n_vectorized,
split_seeds=[0] * max_n_vectorized,
n_train=task_info.n_samples)
if required_resources.gpu_ram_gb <= max_ram_gb and required_resources.cpu_ram_gb <= max_ram_gb:
return max_n_vectorized
max_n_vectorized -= 1
return 1
class RandomParamsNNInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsNNAlgInterface, model_idx=model_idx, **config)
class LGBMSklearnInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType):
return LGBMSklearnSubSplitInterface(**self.config)
class LGBMInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return LGBMSubSplitInterface(**self.config)
class LGBMHyperoptInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return LGBMHyperoptAlgInterface(**self.config)
class RandomParamsLGBMInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return RandomParamsLGBMAlgInterface(**self.config)
class XGBSklearnInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return XGBSklearnSubSplitInterface(**self.config)
class XGBInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return XGBSubSplitInterface(**self.config)
class RandomParamsXGBInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return RandomParamsXGBAlgInterface(**self.config)
class XGBHyperoptInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return XGBHyperoptAlgInterface(**self.config)
class CatBoostSklearnInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return CatBoostSklearnSubSplitInterface(**self.config)
class CatBoostInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return CatBoostSubSplitInterface(**self.config)
class CatBoostHyperoptInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return CatBoostHyperoptAlgInterface(**self.config)
class RandomParamsCatBoostInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return RandomParamsCatBoostAlgInterface(**self.config)
class RFInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return RFSubSplitInterface(**self.config)
class ExtraTreesInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return ExtraTreesSubSplitInterface(**self.config)
class KNNInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return KNNSubSplitInterface(**self.config)
class LinearModelInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return LinearModelSubSplitInterface(**self.config)
class GBTInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return GBTSubSplitInterface(**self.config)
class SklearnMLPInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return SklearnMLPSubSplitInterface(**self.config)
class KANInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return KANSubSplitInterface(**self.config)
class GrandeInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return GrandeSubSplitInterface(**self.config)
class TabPFN2InterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return TabPFN2SubSplitInterface(**self.config)
class TabICLInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return TabICLSubSplitInterface(**self.config)
class MLPRTDLInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return RTDL_MLPSubSplitInterface(**self.config)
class ResNetRTDLInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return ResnetSubSplitInterface(**self.config)
class FTTransformerInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return FTTransformerSubSplitInterface(**self.config)
class TabRInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return TabRSubSplitInterface(**self.config)
class TabMInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return TabMSubSplitInterface(**self.config)
class RandomParamsResnetInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsResnetAlgInterface, model_idx=model_idx, **config)
class RandomParamsRTDLMLPInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsRTDLMLPAlgInterface, model_idx=model_idx, **config)
class RandomParamsFTTransformerInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsFTTransformerAlgInterface, model_idx=model_idx, **config)
class AutoGluonModelInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, **config):
super().__init__(AutoGluonModelAlgInterface, **config)
class RandomParamsTabRInterfaceWrapper(SubSplitInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return RandomParamsTabRAlgInterface(**self.config)
class RandomParamsRFInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsRFAlgInterface, model_idx=model_idx, **config)
class RandomParamsExtraTreesInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsExtraTreesAlgInterface, model_idx=model_idx, **config)
class RandomParamsKNNInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsKNNAlgInterface, model_idx=model_idx, **config)
class RandomParamsLinearModelInterfaceWrapper(AlgInterfaceWrapper):
def __init__(self, model_idx: int, **config):
# model_idx should be the random search iteration (i.e. start from zero)
super().__init__(RandomParamsLinearModelAlgInterface, model_idx=model_idx, **config)
class xRFMInterfaceWrapper(SubSplitInterfaceWrapper):
def create_sub_split_interface(self, task_type: TaskType) -> AlgInterface:
return xRFMSubSplitInterface(**self.config)
class RandomParamsxRFMInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
-> AlgInterface:
return RandomParamsxRFMAlgInterface(**self.config)
================================================
FILE: pytabkit/bench/data/__init__.py
================================================
================================================
FILE: pytabkit/bench/data/common.py
================================================
class TaskSource:
UCI_BIN_CLASS = 'uci-bin-class'
UCI_MULTI_CLASS = 'uci-multi-class'
UCI_REGRESSION = 'uci-reg'
OPENML_CLASS = 'openml-class'
OPENML_CLASS_BIN_EXTRA = 'openml-class-bin-extra'
OPENML_REGRESSION = 'openml-reg'
AUTOML_CLASS_SMALL = 'automl-class-small'
TABARENA_CLASS = 'tabarena-class'
TABARENA_REG = 'tabarena-reg'
CUSTOM = 'custom'
class SplitType:
RANDOM = 'random-split'
DEFAULT = 'default-split'
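# Hedged usage sketch: these string constants tag tasks and splits elsewhere in
# the benchmark, e.g.
# source = TaskSource.OPENML_REGRESSION # == 'openml-reg'
# split_type = SplitType.RANDOM # == 'random-split'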
================================================
FILE: pytabkit/bench/data/get_uci.py
================================================
#!/usr/bin/python3
import os
import shutil
import ssl
import pandas
from pytabkit.bench.data.paths import Paths
from pytabkit.bench.data.uci_file_ops import prepare_new_data_set_group_id, download_and_save, replace_chars_in_file, \
load_raw_data, remove_columns, save_data_to_file, unzip_raw_data, concat_files, remove_files, UCIVars, \
move_label_in_front, remove_rows_with_label, ungz_raw_data, load_mixed_raw_data, \
auto_replace_categories_in_mixed_data, write_mixed_raw_data, replace_ordinals_in_mixed_data, \
replace_isodate_by_day_in_mixed_data, replace_circulars_in_mixed_data, get_categories_in_mixed_data, \
replace_time_by_seconds_in_mixed_data, unrar_raw_data, unarff_raw_data, un_z_raw_data, untar_raw_data, \
replace_categories_in_mixed_data, replace_bin_cats_in_mixed_data, auto_replace_missing_in_mixed_data, \
replace_manual_in_mixed_data
from pytabkit.models import utils
import numpy
import sklearn.datasets as datasets
import re as re
#---------------------------------------------------------------------------------------------------
#---------------------------------------------------------------------------------------------------
#---------------------------------------------------------------------------------------------------
#---------------------------------------------------------------------------------------------------
#---------------------------------------------------------------------------------------------------
def get_skill_craft():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv', 'skill_craft.data')
replace_chars_in_file('skill_craft.data', '"', '')
data = load_raw_data('skill_craft.data', sep = ',')
data = remove_columns(data, [0])
save_data_to_file(data, 'skill_craft', is_classification = True)
#---------------------------------------------------------------------------------------------------
def get_cargo_2000():
prepare_new_data_set_group_id()
print("Cargo 2000 data set is currently not processed since:")
print(" - from the description it is completely unclear how this data set can be used")
#---------------------------------------------------------------------------------------------------
def get_KDC_4007():
prepare_new_data_set_group_id()
print("KDC 4007 data set is currently not processed since:")
print(" - from the description it is completely unclear how this data set can be used")
#---------------------------------------------------------------------------------------------------
def get_sml2010():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00274/NEW-DATA.zip', 'sml2010.zip')
unzip_raw_data('sml2010.zip')
concat_files(UCIVars.raw_data_folder + 'NEW-DATA*.txt', UCIVars.raw_data_folder + 'sml2010.data')
remove_files(UCIVars.raw_data_folder, 'NEW-DATA*.txt')
replace_chars_in_file('sml2010.data', '#', '')
data = load_raw_data('sml2010.data', sep = ' ', description_columns = 2)
data_dining = remove_columns(data, [1])
save_data_to_file(data_dining, 'sml2010_dining', is_classification = False)
data_room = remove_columns(data, [0])
save_data_to_file(data_room, 'sml2010_room', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_wine_quality():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', 'wine_quality_red.data')
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv', 'wine_quality_white.data')
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality.names', 'wine_quality.description')
# The first task is to create data sets in which the quality is the label.
# To this end, we add a column at the right, which indicates whether the wine is white or red.
data_white = load_raw_data('wine_quality_white.data', sep = ';', header = True)
data_white = move_label_in_front(data_white, 11)
white_label = numpy.ones((numpy.shape(data_white)[0], 1))
data_white = numpy.concatenate((data_white, white_label), axis = 1)
save_data_to_file(data_white, 'wine_quality_white', is_classification = True)
data_red = load_raw_data('wine_quality_red.data', sep = ';', header = True)
data_red = move_label_in_front(data_red, 11)
red_label = numpy.zeros((numpy.shape(data_red)[0], 1))
data_red = numpy.concatenate((data_red, red_label), axis = 1)
data_all = numpy.concatenate((data_red, data_white), axis = 0)
save_data_to_file(data_all, 'wine_quality_all', is_classification = True)
# The next task is to combine the white and red wine data set and
# to add a label describing the color of the wine. We further remove
# the quality of the wine, since this may give too much information
# about the color.
data_white = load_raw_data('wine_quality_white.data', sep = ';', header = True)
data_white = remove_columns(data_white, [11])
white_label = numpy.ones((numpy.shape(data_white)[0], 1))
data_white = numpy.concatenate((white_label, data_white), axis = 1)
data_red = load_raw_data('wine_quality_red.data', sep = ';', header = True)
data_red = remove_columns(data_red, [11])
red_label = numpy.zeros((numpy.shape(data_red)[0], 1))
data_red = numpy.concatenate((red_label, data_red), axis = 1)
data_all = numpy.concatenate((data_red, data_white), axis = 0)
save_data_to_file(data_all, 'wine_quality_type', is_classification = True, is_regression = False)
#---------------------------------------------------------------------------------------------------
def get_parkinson():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/telemonitoring/parkinsons_updrs.data', 'parkinson_updrs.data')
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/telemonitoring/parkinsons_updrs.names', 'parkinson_updrs.description')
data = load_raw_data('parkinson_updrs.data', sep = ',', description_columns = 1)
# The data has two variables that can be predicted, namely updrs_motor and updrs_total.
# For both prediction tasks, the other target variable needs to be removed from the data
data_motor = remove_columns(data, [4])
data_motor = move_label_in_front(data_motor, 3)
save_data_to_file(data_motor, 'parkinson_motor', is_classification = False)
data_total = remove_columns(data, [3])
data_total = move_label_in_front(data_total, 3)
save_data_to_file(data_total, 'parkinson_total', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_insurance_benchmark():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/tic-mld/ticdata2000.txt', 'insurance_benchmark.train.data')
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/tic-mld/ticeval2000.txt', 'insurance_benchmark.test.data')
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/tic-mld/tictgts2000.txt', 'insurance_benchmark.test.labels.data')
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/tic-mld/TicDataDescr.txt', 'insurance_benchmark.description')
train_data = load_raw_data('insurance_benchmark.train.data', sep = '\t')
test_data = load_raw_data('insurance_benchmark.test.data', sep = '\t')
test_label = load_raw_data('insurance_benchmark.test.labels.data', sep = '\t')
test_data = numpy.concatenate((test_data, test_label), axis = 1)
data = numpy.concatenate((train_data, test_data), axis = 0)
data = move_label_in_front(data, 85)
save_data_to_file(data, 'insurance_benchmark', is_classification = True)
#---------------------------------------------------------------------------------------------------
def get_EEG_steady_state():
prepare_new_data_set_group_id()
print("EEG Steady State Visual data set is currently not processed since:")
print(" - the description indicates that it is time series data")
#---------------------------------------------------------------------------------------------------
def get_air_quality():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00360/AirQualityUCI.zip', 'air_quality.zip')
unzip_raw_data('air_quality.zip')
os.rename(UCIVars.raw_data_folder + 'AirQualityUCI.csv', UCIVars.raw_data_folder + 'air_quality.data')
os.remove(UCIVars.raw_data_folder + 'AirQualityUCI.xlsx')
data = load_raw_data('air_quality.data', sep = ';', date_column = 0, date_sep = '/', date_order = 'dmY', time_column = 1, time_sep = '.', german_decimal = True)
# The data has five variables that can be predicted,
# namely those in columns 2, 4, 5, 7, and 9 (C++ like).
# For these prediction tasks, the other target variables
# need to be removed from the data.
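# (In this data set, -200 marks missing measurements, which is why rows with
# label -200.0 are removed below.)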
data_co2 = remove_columns(data, [4, 5, 7, 9])
data_co2 = move_label_in_front(data_co2, 2)
data_co2 = remove_rows_with_label(data_co2, -200.0)
save_data_to_file(data_co2, 'air_quality_co2', is_classification = False)
# The hydrocarbon reference measurements have only been taken 914 times.
# For this reason, they are not included in the constructed data sets.
data_bc = remove_columns(data, [2, 4, 7, 9])
data_bc = move_label_in_front(data_bc, 3)
data_bc = remove_rows_with_label(data_bc, -200.0)
save_data_to_file(data_bc, 'air_quality_bc', is_classification = False)
data_nox = remove_columns(data, [2, 4, 5, 9])
data_nox = move_label_in_front(data_nox, 4)
data_nox = remove_rows_with_label(data_nox, -200.0)
save_data_to_file(data_nox, 'air_quality_nox', is_classification = False)
data_no2 = remove_columns(data, [2, 4, 5, 7])
data_no2 = move_label_in_front(data_no2, 5)
data_no2 = remove_rows_with_label(data_no2, -200.0)
save_data_to_file(data_no2, 'air_quality_no2', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_cycle_power_plant():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00294/CCPP.zip', 'cycle_power_plant.zip')
unzip_raw_data('cycle_power_plant.zip')
# The zip file contains some junk, and in addition the data is in Excel format. This is addressed now:
excel_data = pandas.read_excel(UCIVars.raw_data_folder + 'CCPP/Folds5x2_pp.xlsx', engine = 'openpyxl')
excel_data.to_csv(UCIVars.raw_data_folder + 'cycle_power_plant.data')
shutil.rmtree(UCIVars.raw_data_folder + 'CCPP')
# The response variable is in the last column
data = load_raw_data('cycle_power_plant.data', sep = ',', description_columns = 1)
data = move_label_in_front(data, 4)
save_data_to_file(data, 'cycle_power_plant', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_carbon_nanotubes():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00448/carbon_nanotubes.csv', 'carbon_nanotubes.data')
data = load_raw_data('carbon_nanotubes.data', sep = ';', german_decimal = True)
data_u = remove_columns(data, [6, 7])
data_u = move_label_in_front(data_u, 5)
save_data_to_file(data_u, 'carbon_nanotubes_u', is_classification = False)
data_v = remove_columns(data, [5, 7])
data_v = move_label_in_front(data_v, 5)
save_data_to_file(data_v, 'carbon_nanotubes_v', is_classification = False)
data_w = remove_columns(data, [5, 6])
data_w = move_label_in_front(data_w, 5)
save_data_to_file(data_w, 'carbon_nanotubes_w', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_naval_propulsion():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00316/UCI%20CBM%20Dataset.zip', 'naval_propulsion.zip')
unzip_raw_data('naval_propulsion.zip')
# The zip file contains quite a bit of junk, which is removed in the following
shutil.copy(UCIVars.raw_data_folder + 'UCI CBM Dataset/data.txt', UCIVars.raw_data_folder + 'naval_propulsion.data')
shutil.copy(UCIVars.raw_data_folder + 'UCI CBM Dataset/Features.txt', UCIVars.raw_data_folder + 'naval_propulsion.features.txt')
shutil.copy(UCIVars.raw_data_folder + 'UCI CBM Dataset/README.txt', UCIVars.raw_data_folder + 'naval_propulsion.description')
shutil.rmtree(UCIVars.raw_data_folder + 'UCI CBM Dataset/')
shutil.rmtree(UCIVars.raw_data_folder + '__MACOSX')
data = load_raw_data('naval_propulsion.data', sep = ' ')
# The data actually has three response variables, but one of those, namely the ship speed,
# is affine-linear in the lever position, which is also recorded in the data. For this
# reason, only the other two response variables are considered.
data_comp = remove_columns(data, [17])
data_comp = move_label_in_front(data_comp, 16)
save_data_to_file(data_comp, 'naval_propulsion_comp', is_classification = False)
data_turb = remove_columns(data, [16])
data_turb = move_label_in_front(data_turb, 16)
save_data_to_file(data_turb, 'naval_propulsion_turb', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_blood_pressure():
prepare_new_data_set_group_id()
print("Cuff-Less Blood pressure Estimation is currently not processed since:")
print(" - the zip file is about 3.1GB large")
print(" - the description indicates that each of the three features is actually a times series")
print(" - the file is in matlab format")
#print('The following download may take a while, since the .zip file is about 3.1GB large.')
#download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00340/data.zip', 'blood_pressure.zip')
#unzip_raw_data('blood_pressure.zip')
#---------------------------------------------------------------------------------------------------
def get_gas_sensor_drift():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00270/driftdataset.zip', 'gas_sensor_drift.zip')
unzip_raw_data('gas_sensor_drift.zip')
concat_files(UCIVars.raw_data_folder + 'batch*.dat', UCIVars.raw_data_folder + 'gas_sensor_drift.data')
remove_files(UCIVars.raw_data_folder, 'batch*.dat')
# Next we need to replace ; by , in the .data file, since otherwise the routines for libsvm-like formats won't work.
# Also, the first label is multiplied by 10000 since the routine for libsvm-like formats seems to sort the
# labels. By multiplying the label by 10000, we can guarantee that the first label is always the larger
# one, so that the routine places it at the second position in the list of labels.
# Then we read a libsvm-like file with multiple labels and convert it from Compressed Sparse Row format to dense format.
replace_chars_in_file('gas_sensor_drift.data', ';', '0000,')
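# e.g., a raw line '3;25.0 1:...' presumably becomes '30000,25.0 1:...',
# so the scaled class label (3 * 10000) is always larger than the concentration.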
data = datasets.load_svmlight_file(UCIVars.raw_data_folder + 'gas_sensor_drift.data', multilabel = True)
x_data = data[0].toarray()
all_labels = numpy.reshape(data[1], newshape = (-1, 2))
## The data has two response variables, one indicating which chemical is measured
## and one reporting its concentration. We simply take both as being of interest ...
class_labels = numpy.reshape(all_labels[ :, 1], newshape = (-1, 1)) / 10000.0
data_class = numpy.concatenate((class_labels, x_data), axis = 1)
save_data_to_file(data_class, 'gas_sensor_drift_class', is_classification = True)
conc_labels = numpy.reshape(all_labels[ :, 0], newshape = (-1, 1))
data_conc = numpy.concatenate((conc_labels, x_data), axis = 1)
save_data_to_file(data_conc, 'gas_sensor_drift_conc', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_bike_sharing():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip', 'bike_sharing.zip')
unzip_raw_data('bike_sharing.zip')
os.remove(UCIVars.raw_data_folder + 'day.csv')
os.rename(UCIVars.raw_data_folder + 'hour.csv', UCIVars.raw_data_folder + 'bike_sharing.data')
os.rename(UCIVars.raw_data_folder + 'Readme.txt', UCIVars.raw_data_folder + 'bike_sharing.description')
data = load_raw_data('bike_sharing.data', sep = ',', description_columns = 2)
data_casual = remove_columns(data, [13, 14])
data_casual = move_label_in_front(data_casual, 12)
save_data_to_file(data_casual, 'bike_sharing_casual', is_classification = False)
data_total = remove_columns(data, [12, 13])
data_total = move_label_in_front(data_total, 12)
save_data_to_file(data_total, 'bike_sharing_total', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_appliances_energy():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00374/energydata_complete.csv', 'appliances_energy.data')
# The data entries are saved as quoted strings, i.e. as "...". In addition, date and time are not
# separated by a comma. The following lines fix this.
replace_chars_in_file('appliances_energy.data', '"', '')
replace_chars_in_file('appliances_energy.data', ', ', ',')
replace_chars_in_file('appliances_energy.data', ', ', ',')
replace_chars_in_file('appliances_energy.data', ' ', ',')
data = load_raw_data('appliances_energy.data', sep = ',', date_column = 0, date_sep = '-', date_order = 'Ymd', time_column = 1, time_sep = ':')
data = move_label_in_front(data, 2)
save_data_to_file(data, 'appliances_energy', is_classification = False)
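# A minimal sketch (illustration only) of what the replacements above do to one
# hypothetical, shortened row of energydata_complete.csv:
def _appliances_line_sketch():
    line = '"2016-01-11 17:00:00",60,30,19.89\n'
    line = line.replace('"', '')    # strip the quoting
    line = line.replace(', ', ',')  # remove stray spaces after commas
    line = line.replace(' ', ',')   # split date and time into separate fields
    return line                     # -> '2016-01-11,17:00:00,60,30,19.89\n'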
#---------------------------------------------------------------------------------------------------
def get_indoor_loc():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00310/UJIndoorLoc.zip', 'indoor_loc.zip')
unzip_raw_data('indoor_loc.zip')
os.rename(UCIVars.raw_data_folder + 'UJIndoorLoc/trainingData.csv', UCIVars.raw_data_folder + 'indoor_loc.train.csv')
os.rename(UCIVars.raw_data_folder + 'UJIndoorLoc/validationData.csv', UCIVars.raw_data_folder + 'indoor_loc.val.csv')
shutil.rmtree(UCIVars.raw_data_folder + 'UJIndoorLoc')
concat_files(UCIVars.raw_data_folder + 'indoor*.csv', UCIVars.raw_data_folder + 'indoor_loc.data')
remove_files(UCIVars.raw_data_folder, 'indoor*.csv')
# --- Regression part ------
data = load_raw_data('indoor_loc.data', sep = ',')
data = remove_columns(data, range(523, 529))
data_long = remove_columns(data, [521, 522])
data_long = move_label_in_front(data_long, 520)
save_data_to_file(data_long, 'indoor_loc_long', is_classification = False)
data_lat = remove_columns(data, [520, 522])
data_lat = move_label_in_front(data_lat, 520)
save_data_to_file(data_lat, 'indoor_loc_lat', is_classification = False)
data_alt = remove_columns(data, [520, 521])
data_alt = move_label_in_front(data_alt, 520)
save_data_to_file(data_alt, 'indoor_loc_alt', is_classification = False)
# --- Classification part -----
data = load_raw_data('indoor_loc.data', sep = ',')
data = remove_columns(data, range(526, 529))
data_relative = move_label_in_front(data, 525)
data_relative = remove_columns(data_relative, range(521, 526))
save_data_to_file(data_relative, 'indoor_loc_relative', is_classification = True, is_regression = False)
data_building = move_label_in_front(data, 523)
data_building = remove_columns(data_building, range(521, 526))
save_data_to_file(data_building, 'indoor_loc_building', is_classification = True, is_regression = False)
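# A minimal index-bookkeeping sketch (illustration only): after
# move_label_in_front(data, c), every former column j < c shifts right by one
# position; for instance, with c = 525 the call remove_columns(..., range(521, 526))
# above addresses the original columns 520..524.
def _move_label_index_sketch():
    import numpy
    data = numpy.arange(12).reshape(3, 4)  # columns 0..3
    # mimic move_label_in_front(data, 2): old column 2 moves to the front,
    # old columns 0 and 1 now sit at positions 1 and 2
    return numpy.concatenate((data[:, 2:3], data[:, :2], data[:, 3:]), axis=1)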
#---------------------------------------------------------------------------------------------------
def get_online_news_popularity():
prepare_new_data_set_group_id()
download_and_save('http://archive.ics.uci.edu/ml/machine-learning-databases/00332/OnlineNewsPopularity.zip', 'online_news_popularity.zip')
unzip_raw_data('online_news_popularity.zip')
os.rename(UCIVars.raw_data_folder + 'OnlineNewsPopularity/OnlineNewsPopularity.csv', UCIVars.raw_data_folder + 'online_news_popularity.data')
os.rename(UCIVars.raw_data_folder + 'OnlineNewsPopularity/OnlineNewsPopularity.names', UCIVars.raw_data_folder + 'online_news_popularity.description')
shutil.rmtree(UCIVars.raw_data_folder + 'OnlineNewsPopularity')
data = load_raw_data('online_news_popularity.data', sep = ', ', description_columns = 2)
data = move_label_in_front(data, 58)
save_data_to_file(data, 'online_news_popularity', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_facebook_comment_volume():
prepare_new_data_set_group_id()
download_and_save('http://archive.ics.uci.edu/ml/machine-learning-databases/00363/Dataset.zip', 'facebook_comment_volume.zip')
unzip_raw_data('facebook_comment_volume.zip')
os.rename(UCIVars.raw_data_folder + 'Dataset/Training/Features_Variant_1.csv', UCIVars.raw_data_folder + 'facebook_comment_volume.data')
shutil.rmtree(UCIVars.raw_data_folder + 'Dataset')
shutil.rmtree(UCIVars.raw_data_folder + '__MACOSX')
data = load_raw_data('facebook_comment_volume.data', sep = ',')
data = move_label_in_front(data, 53)
save_data_to_file(data, 'facebook_comment_volume', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_bejing_pm25():
prepare_new_data_set_group_id()
download_and_save('http://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv', 'bejing_pm25.data')
# The combined wind direction (column 'cbwd') is turned into two numeric features, one for
# the N/S component and one for the E/W component; 'cv' (calm and variable) becomes (0, 0).
# A small sketch after this function illustrates the replacement.
replace_chars_in_file('bejing_pm25.data', 'cv', '0,0')
replace_chars_in_file('bejing_pm25.data', 'NW', '1,2')
replace_chars_in_file('bejing_pm25.data', 'NE', '1,1')
replace_chars_in_file('bejing_pm25.data', 'SE', '2,1')
replace_chars_in_file('bejing_pm25.data', 'SW', '2,2')
data = load_raw_data('bejing_pm25.data', sep = ',', description_columns = 1)
data = move_label_in_front(data, 4)
save_data_to_file(data, 'bejing_pm25', is_classification = False)
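# A minimal sketch (illustration only) of the wind-direction replacement above on
# one hypothetical raw row: the single text field becomes two numeric fields.
def _wind_direction_sketch():
    line = '1,2010,1,1,0,129,-16,-4,1020,SE,1.79,0,0\n'  # hypothetical values
    for old, new in [('cv', '0,0'), ('NW', '1,2'), ('NE', '1,1'),
                     ('SE', '2,1'), ('SW', '2,2')]:
        line = line.replace(old, new)
    return line  # 'SE' is turned into the two fields '2,1'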
#---------------------------------------------------------------------------------------------------
def get_protein_tertiary_structure():
prepare_new_data_set_group_id()
download_and_save('http://archive.ics.uci.edu/ml/machine-learning-databases/00265/CASP.csv', 'protein_tertiary_structure.data')
data = load_raw_data('protein_tertiary_structure.data', sep = ',')
save_data_to_file(data, 'protein_tertiary_structure', is_classification = False)
#---------------------------------------------------------------------------------------------------
def get_tamilnadu_electricity():
prepare_new_data_set_group_id()
print("Tamilnadu Electricity data set is currently not processed since:")
print(" - from the description it is completely unclear how this data set can be used")
#---------------------------------------------------------------------------------------------------
def get_metro_interstate_traffic_volume():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz', 'metro_interstate_traffic_volume.zip')
ungz_raw_data('metro_interstate_traffic_volume.zip')
os.rename(UCIVars.raw_data_folder + 'metro_interstate_traffic_volume.zip.data', UCIVars.raw_data_folder + 'metro_interstate_traffic_volume.data')
data = load_mixed_raw_data('metro_interstate_traffic_volume.data', sep = ',', header = True)
size = data.shape[0]
data[0:size, 7] = [date_and_time.replace(' ', ',') for date_and_time in data[0:size, 7]]
# Deal with the holidays: we put all holidays in one category and all non-holidays in the other.
# There are 11 holidays and 'None'. The latter receives the value 0, while all holidays receive the
# value 1. The following code relies on string replacement and on the particular form of the
# entries; a small sketch after this function illustrates the idea.
data[0:size, 0] = [re.sub(r" ", '', holiday) for holiday in data[0:size, 0]]
data[0:size, 0] = [re.sub(r"None", '0', holiday) for holiday in data[0:size, 0]]
data[0:size, 0] = [re.sub(r"D", '1', holiday) for holiday in data[0:size, 0]]
data[0:size, 0] = [re.sub(r"WashingtonsBirthday", '1', holiday) for holiday in data[0:size, 0]]
data[0:size, 0] = [re.sub(r"StateFair", '1', holiday) for holiday in data[0:size, 0]]
data[0:size, 0] = [re.sub(r"[a-zA-Z]", '', holiday) for holiday in data[0:size, 0]]
# The weather is briefly described in column 5 and in more detail in column 6.
# We create two data sets, one for each type of description.
data_short = auto_replace_categories_in_mixed_data(data, 5, ',')
data_short = remove_columns(data_short, 6)
write_mixed_raw_data(UCIVars.raw_data_folder + 'metro_interstate_traffic_volume_short.data', data_short, sep = ",")
data_long = auto_replace_categories_in_mixed_data(data, 6, ',')
data_long = remove_columns(data_long, 5)
write_mixed_raw_data(UCIVars.raw_data_folder + 'metro_interstate_traffic_volume_long.data', data_long, sep = ",")
write_mixed_raw_data(UCIVars.raw_data_folder + 'metro_interstate_traffic_volume.data', data, sep = ",")
replace_chars_in_file('metro_interstate_traffic_volume.data', '  ', ' ')  # collapse double spaces
# Now we are in a position to read the data, convert the time and date, and move the labels
data = load_raw_data('metro_interstate_traffic_volume_short.data', ',', description_columns = 0, date_column = 16, date_sep = '-', date_order = 'Ymd', time_column = 17, time_sep = ':')
data = move_label_in_front(data, 18)
save_data_to_file(data, 'metro_interstate_traffic_volume_short', is_classification = False, is_regression = True)
data = load_raw_data('metro_interstate_traffic_volume_long.data', ',', description_columns = 0, date_column = 43, date_sep = '-', date_order = 'Ymd', time_column = 44, time_sep = ':')
data = move_label_in_front(data, 45)
save_data_to_file(data, 'metro_interstate_traffic_volume_long', is_classification = False, is_regression = True)
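# A minimal sketch (illustration only) of the holiday binarization above on a few
# hypothetical field values; the order matters, since 'D' must be rewritten to '1'
# before the remaining letters are stripped.
def _holiday_binarize_sketch(values=('None', 'LaborDay', 'StateFair')):
    import re
    out = []
    for v in values:
        v = re.sub(r" ", '', v)
        v = re.sub(r"None", '0', v)
        v = re.sub(r"D", '1', v)
        v = re.sub(r"WashingtonsBirthday", '1', v)
        v = re.sub(r"StateFair", '1', v)
        v = re.sub(r"[a-zA-Z]", '', v)
        out.append(v)
    return out  # -> ['0', '1', '1']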
#---------------------------------------------------------------------------------------------------
def get_facebook_live_sellers_thailand():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00488/Live_20210128.csv', 'facebook_live_sellers_thailand.data')
data = load_mixed_raw_data('facebook_live_sellers_thailand.data', sep = ",", header = True)
# Columns 0 and 2 contain id and time information; these are deleted. The last 4 columns are empty
# and are thus deleted, too.
data = remove_columns(data, [0, 2, 12, 13, 14, 15])
# Next, we replace the status_type category by ordinal numbers
categories = [u'link', u'photo', u'status', u'video']
data = replace_ordinals_in_mixed_data(data, categories, 0, separator = ',')
write_mixed_raw_data(UCIVars.raw_data_folder + 'facebook_live_sellers_thailand.data', data, sep = ",")
data = load_raw_data('facebook_live_sellers_thailand.data', ',')
# Classes 1 and 3 contain only 63 and 365 samples, respectively. We remove them for the classification data set.
data_class = remove_rows_with_label(data, 1)
data_class = remove_rows_with_label(data_class, 3)
save_data_to_file(data_class, 'facebook_live_sellers_thailand_status', is_classification = True, is_regression = False)
# For the regression data set, we pick the 'shares' column as label
data_regr = move_label_in_front(data, 3)
save_data_to_file(data_regr, 'facebook_live_sellers_thailand_shares', is_classification = False, is_regression = True)
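# A minimal sketch (illustration only) of the ordinal replacement above: each
# status_type value is mapped to its index in the category list (the exact
# numbering convention of replace_ordinals_in_mixed_data is an assumption here).
def _ordinal_replacement_sketch(values=('photo', 'video', 'link')):
    categories = [u'link', u'photo', u'status', u'video']
    return [categories.index(v) for v in values]  # -> [1, 3, 0]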
#---------------------------------------------------------------------------------------------------
def get_parking_birmingham():
prepare_new_data_set_group_id()
download_and_save('https://archive.ics.uci.edu/ml/machine-learning-databases/00482/dataset.zip', 'parking_birmingham.zip')
unzip_raw_data('parking_birmingham.zip')
os.rename(UCIVars.raw_data_folder + 'dataset.csv', UCIVars.raw_data_folder + 'parking_birmingham.data')
# One could also convert the name of the parking spot into a binary (one-hot) vector. However, this
# vector would be of dimension 30 and would therefore dominate the remaining features. We thus use a
# one-dimensional representation instead.
data = load_mixed_raw_data('parking_birmingham.data', sep = ',', header = True)
categories = ['BHMEURBRD01', 'BHMEURBRD02', 'Bull Ring', 'BHMBRCBRG02', 'BHMBRCBRG03', 'BHMBRCBRG01', 'Shopping', 'BHMNCPLDH01', 'BHMBCCSNH01', 'BHMNCPRAN01', 'BHMBCCPST01', 'Others-CCCPS133', 'BHMBRTARC01', 'Others-CCCPS98', 'NIA North', 'BHMNCPHST01', 'BHMNCPNST01', 'BHMNCPNHS01', 'BHMBCCTHL01', 'Others-CCCPS119a', 'Others-CCCPS8', 'Others-CCCPS105a', 'Broad Street', 'NIA South', 'NIA Car Parks', 'BHMBCCMKT01', 'BHMMBMMBX01', 'Others-CCCPS202', 'Others-CCCPS135a', 'BHMNCPPLS01']
data = replace_ordinals_in_mixed_data(data, categories, 0, separator = ',')
write_mixed_raw_data(UCIVars.raw_data_folder + 'parking_birmingham.data', data, sep = ",")
# Next we split date-time into two features
replace_chars_in_file('parking_birmingham.data', ' ', ',')
# Now, we convert the date into a weekday and then into a point on the circle (see the sketch
# after this function). Furthermore, we create a second data set with rounded times for possible
# future time-series treatment.
data = load_mixed_raw_data('parking_birmingham.data', sep = ",", header = False)
data = replace_isodate_by_day_in_mixed_data(data, 3)
data = replace_circulars_in_mixed_data(data, get_categories_in_mixed_data(data, 3), 3, ",")
write_mixed_raw_data(UCIVars.raw_data_folder + 'parking_birmingham.data', data, sep = ",")
data = replace_time_by_seconds_in_mixed_data(data, 4, sep = ':', rounded = 1800)
write_mixed_raw_data(UCIVars.raw_data_folder + 'parking_birmingham.rounded.data', data, sep = ",")
# Now we compute the relative occupancy and use it as label
# Note that we keep both the parking spot number and its capacity
data = load_raw_data('parking_birmingham.data', ',', time_column = 5, time_sep = ':')
data[:, 2] = data[:, 2] / data[:, 1]
data = move_label_in_front(data, 2)
save_data_to_file(data, 'parking_birmingham', is_classification = False, is_regression = True)
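# A minimal sketch (illustration only) of a circular encoding like the one used for
# the weekday above: k equally spaced categories are mapped to points on the unit
# circle, so that the 'distance' between Sunday and Monday matches the one between
# Monday and Tuesday.
def _circular_encoding_sketch(k=7):
    import math
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(k)]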
#---------------------------------------------------------------------------------------------------
def get_tarvel_review_ratings():
prepare_new_data_set_group_id()
# Download the data and correct the misspelling of its name
download_and_save('http://archive.ics.uci.edu/ml/machine-learning-databases/00485/google_review_ratings.csv', 'travel_review_ratings.data')
# Remove the commas at the end of each row and clean a few messy lines
replace_chars_in_file('travel_review_ratings.data', ',\r', '\r')
replace_chars_in_file('travel_review_ratings.data', '"', '')
replace_chars_in_file('travel_review_ratings.data', ',,', ',')
replace_chars_in_file('travel_review_ratings.data', '\t', '')
data = load_raw_data('travel_review_ratings.data', ',', description_columns = 1, header = True)
# Determine the (first) column that contains the most ratings, use it as the label, and remove
# rows with label = 0
ratings_counts = data.astype(bool).sum(axis=0)
most_rated_column = numpy.argmax(ratings_counts)
data = move_label_in_front(data, most_rated_column)
data = remove_rows_with_label(data, 0.0)
save_data_to_file(data, 'travel_review_ratings', is_classification = False, is_regression = True)
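# A minimal sketch (illustration only) of the label-column selection above:
# count the non-zero entries per column and take the argmax as the label column.
def _most_rated_column_sketch():
    import numpy
    data = numpy.array([[0.0, 1.5, 2.0],
                        [0.0, 0.0, 4.5],
                        [3.0, 0.0, 1.0]])
    counts = data.astype(bool).sum(axis=0)  # non-zero ratings per column
    return int(numpy.argmax(counts))        # -> 2 (the densest column)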
#---------------------------------------------------------------------------------------------------
def get_superconductivity():
prepare_new_data_set_group_id()
download_and_save('http://archive.ics.uci.edu/ml/machine-learning-databases/00464/superconduct.zip', 'superconductivity.zip')
unzip_raw_data('superconductivity.zip')
os.rename(UCIVars.raw_data_folder + 'train.csv', UCIVars.raw_data_folder + 'superconductivity.data')
os.remove(UCIVars.raw_data_folder + 'unique_m.csv')
data = load_raw_data('superconductivity.data', ',', header = True)
data_regr = move_label_in_front(data, 81)
save_data_to_file(data_regr, 'superconductivity', is_classification = False, is_regression = True)
# We also create a classification data set, in which we try to identify materials with critical temperature above 77 K.
# We refer to https://en.wikipedia.org/wiki/Superconductivity for the importance of this threshold.
SYMBOL INDEX (2572 symbols across 99 files)
FILE: pytabkit/bench/alg_wrappers/general.py
class AlgWrapper (line 13) | class AlgWrapper:
method __init__ (line 17) | def __init__(self, **config):
method run (line 25) | def run(self, task_package: TaskPackage, logger: Logger, assigned_reso...
method get_required_resources (line 40) | def get_required_resources(self, task_package: TaskPackage) -> Require...
method get_max_n_vectorized (line 49) | def get_max_n_vectorized(self, task_info: TaskInfo) -> int:
method get_pred_param_names (line 58) | def get_pred_param_names(self, task_package: TaskPackage) -> List[str]:
FILE: pytabkit/bench/alg_wrappers/interface_wrappers.py
function get_prep_factory (line 55) | def get_prep_factory(**config):
class AlgInterfaceWrapper (line 59) | class AlgInterfaceWrapper(AlgWrapper):
method __init__ (line 64) | def __init__(self, create_alg_interface_fn: Optional[Callable[[...], A...
method _create_alg_interface_impl (line 75) | def _create_alg_interface_impl(self, task_package: TaskPackage) -> Alg...
method create_alg_interface (line 89) | def create_alg_interface(self, task_package: TaskPackage) -> AlgInterf...
method run (line 112) | def run(self, task_package: TaskPackage, logger: Logger, assigned_reso...
method get_required_resources (line 222) | def get_required_resources(self, task_package: TaskPackage) -> Require...
method get_pred_param_names (line 235) | def get_pred_param_names(self, task_package: TaskPackage) -> List[str]:
class LoadResultsWrapper (line 239) | class LoadResultsWrapper(AlgInterfaceWrapper):
method __init__ (line 240) | def __init__(self, alg_name: str, **config):
method _create_alg_interface_impl (line 244) | def _create_alg_interface_impl(self, task_package: TaskPackage) -> Alg...
method get_required_resources (line 267) | def get_required_resources(self, task_package: TaskPackage) -> Require...
class CaruanaEnsembleWrapper (line 272) | class CaruanaEnsembleWrapper(AlgInterfaceWrapper):
method __init__ (line 273) | def __init__(self, sub_wrappers: List[AlgInterfaceWrapper], **config):
method _create_alg_interface_impl (line 277) | def _create_alg_interface_impl(self, task_package: TaskPackage) -> Alg...
method get_required_resources (line 289) | def get_required_resources(self, task_package: TaskPackage) -> Require...
class AlgorithmSelectionWrapper (line 295) | class AlgorithmSelectionWrapper(AlgInterfaceWrapper):
method __init__ (line 296) | def __init__(self, sub_wrappers: List[AlgInterfaceWrapper], **config):
method _create_alg_interface_impl (line 300) | def _create_alg_interface_impl(self, task_package: TaskPackage) -> Alg...
method get_required_resources (line 312) | def get_required_resources(self, task_package: TaskPackage) -> Require...
class MultiSplitAlgInterfaceWrapper (line 319) | class MultiSplitAlgInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 320) | def __init__(self, **config):
method create_single_alg_interface (line 323) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
method _create_alg_interface_impl (line 327) | def _create_alg_interface_impl(self, task_package: TaskPackage) -> Alg...
class SubSplitInterfaceWrapper (line 336) | class SubSplitInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method __init__ (line 337) | def __init__(self, create_sub_split_learner_fn: Optional[Callable[[......
method create_sub_split_interface (line 341) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
method create_single_alg_interface (line 346) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class NNInterfaceWrapper (line 352) | class NNInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 353) | def __init__(self, **config):
method get_max_n_vectorized (line 356) | def get_max_n_vectorized(self, task_info: TaskInfo) -> int:
class NNHyperoptInterfaceWrapper (line 373) | class NNHyperoptInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 374) | def __init__(self, **config):
method get_max_n_vectorized (line 377) | def get_max_n_vectorized(self, task_info: TaskInfo) -> int:
class RandomParamsNNInterfaceWrapper (line 394) | class RandomParamsNNInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 395) | def __init__(self, model_idx: int, **config):
class LGBMSklearnInterfaceWrapper (line 400) | class LGBMSklearnInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 401) | def create_sub_split_interface(self, task_type: TaskType):
class LGBMInterfaceWrapper (line 405) | class LGBMInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 406) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class LGBMHyperoptInterfaceWrapper (line 410) | class LGBMHyperoptInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 411) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class RandomParamsLGBMInterfaceWrapper (line 416) | class RandomParamsLGBMInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 417) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class XGBSklearnInterfaceWrapper (line 422) | class XGBSklearnInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 423) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class XGBInterfaceWrapper (line 427) | class XGBInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 428) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class RandomParamsXGBInterfaceWrapper (line 432) | class RandomParamsXGBInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 433) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class XGBHyperoptInterfaceWrapper (line 438) | class XGBHyperoptInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 439) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class CatBoostSklearnInterfaceWrapper (line 444) | class CatBoostSklearnInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 445) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class CatBoostInterfaceWrapper (line 449) | class CatBoostInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 450) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class CatBoostHyperoptInterfaceWrapper (line 454) | class CatBoostHyperoptInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 455) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class RandomParamsCatBoostInterfaceWrapper (line 460) | class RandomParamsCatBoostInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 461) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class RFInterfaceWrapper (line 466) | class RFInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 467) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class ExtraTreesInterfaceWrapper (line 471) | class ExtraTreesInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 472) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class KNNInterfaceWrapper (line 476) | class KNNInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 477) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class LinearModelInterfaceWrapper (line 481) | class LinearModelInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 482) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class GBTInterfaceWrapper (line 486) | class GBTInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 487) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class SklearnMLPInterfaceWrapper (line 491) | class SklearnMLPInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 492) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class KANInterfaceWrapper (line 496) | class KANInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 497) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class GrandeInterfaceWrapper (line 501) | class GrandeInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 502) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class TabPFN2InterfaceWrapper (line 506) | class TabPFN2InterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 507) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class TabICLInterfaceWrapper (line 511) | class TabICLInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 512) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class MLPRTDLInterfaceWrapper (line 516) | class MLPRTDLInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 517) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class ResNetRTDLInterfaceWrapper (line 521) | class ResNetRTDLInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 522) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class FTTransformerInterfaceWrapper (line 526) | class FTTransformerInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 527) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class TabRInterfaceWrapper (line 531) | class TabRInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 532) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class TabMInterfaceWrapper (line 536) | class TabMInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 537) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class RandomParamsResnetInterfaceWrapper (line 541) | class RandomParamsResnetInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 542) | def __init__(self, model_idx: int, **config):
class RandomParamsRTDLMLPInterfaceWrapper (line 547) | class RandomParamsRTDLMLPInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 548) | def __init__(self, model_idx: int, **config):
class RandomParamsFTTransformerInterfaceWrapper (line 553) | class RandomParamsFTTransformerInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 554) | def __init__(self, model_idx: int, **config):
class AutoGluonModelInterfaceWrapper (line 559) | class AutoGluonModelInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 560) | def __init__(self, **config):
class RandomParamsTabRInterfaceWrapper (line 565) | class RandomParamsTabRInterfaceWrapper(SubSplitInterfaceWrapper):
method create_single_alg_interface (line 566) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
class RandomParamsRFInterfaceWrapper (line 571) | class RandomParamsRFInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 572) | def __init__(self, model_idx: int, **config):
class RandomParamsExtraTreesInterfaceWrapper (line 577) | class RandomParamsExtraTreesInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 578) | def __init__(self, model_idx: int, **config):
class RandomParamsKNNInterfaceWrapper (line 583) | class RandomParamsKNNInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 584) | def __init__(self, model_idx: int, **config):
class RandomParamsLinearModelInterfaceWrapper (line 589) | class RandomParamsLinearModelInterfaceWrapper(AlgInterfaceWrapper):
method __init__ (line 590) | def __init__(self, model_idx: int, **config):
class xRFMInterfaceWrapper (line 595) | class xRFMInterfaceWrapper(SubSplitInterfaceWrapper):
method create_sub_split_interface (line 596) | def create_sub_split_interface(self, task_type: TaskType) -> AlgInterf...
class RandomParamsxRFMInterfaceWrapper (line 600) | class RandomParamsxRFMInterfaceWrapper(MultiSplitAlgInterfaceWrapper):
method create_single_alg_interface (line 601) | def create_single_alg_interface(self, n_cv: int, task_type: TaskType) \
FILE: pytabkit/bench/data/common.py
class TaskSource (line 2) | class TaskSource:
class SplitType (line 15) | class SplitType:
FILE: pytabkit/bench/data/get_uci.py
function get_skill_craft (line 32) | def get_skill_craft():
function get_cargo_2000 (line 49) | def get_cargo_2000():
function get_KDC_4007 (line 60) | def get_KDC_4007():
function get_sml2010 (line 71) | def get_sml2010():
function get_wine_quality (line 97) | def get_wine_quality():
function get_parkinson (line 151) | def get_parkinson():
function get_insurance_benchmark (line 176) | def get_insurance_benchmark():
function get_EEG_steady_state (line 200) | def get_EEG_steady_state():
function get_air_quality (line 212) | def get_air_quality():
function get_cycle_power_plant (line 257) | def get_cycle_power_plant():
function get_carbon_nanotubes (line 280) | def get_carbon_nanotubes():
function get_naval_propulsion (line 304) | def get_naval_propulsion():
function get_blood_pressure (line 340) | def get_blood_pressure():
function get_gas_sensor_drift (line 356) | def get_gas_sensor_drift():
function get_bike_sharing (line 394) | def get_bike_sharing():
function get_appliances_energy (line 419) | def get_appliances_energy():
function get_indoor_loc (line 444) | def get_indoor_loc():
function get_online_news_popularity (line 500) | def get_online_news_popularity():
function get_facebook_comment_volume (line 518) | def get_facebook_comment_volume():
function get_bejing_pm25 (line 539) | def get_bejing_pm25():
function get_protein_tertiary_structure (line 560) | def get_protein_tertiary_structure():
function get_tamilnadu_electricity (line 574) | def get_tamilnadu_electricity():
function get_metro_interstate_traffic_volume (line 584) | def get_metro_interstate_traffic_volume():
function get_facebook_live_sellers_thailand (line 642) | def get_facebook_live_sellers_thailand():
function get_parking_birmingham (line 680) | def get_parking_birmingham():
function get_tarvel_review_ratings (line 734) | def get_tarvel_review_ratings():
function get_superconductivity (line 770) | def get_superconductivity():
function get_gnfuv_unmanned_surface_vehicles (line 803) | def get_gnfuv_unmanned_surface_vehicles():
function get_five_cities_pm25 (line 815) | def get_five_cities_pm25():
function get_phishing (line 873) | def get_phishing():
function get_ozone_level (line 895) | def get_ozone_level():
function get_opportunity_activity (line 917) | def get_opportunity_activity():
function get_australian_sign_language (line 932) | def get_australian_sign_language():
function get_seismic_bumps (line 945) | def get_seismic_bumps():
function get_meu_mobile_ksd (line 968) | def get_meu_mobile_ksd():
function get_character_trajectories (line 981) | def get_character_trajectories():
function get_vicon_physical_action (line 996) | def get_vicon_physical_action():
function get_simulated_falls (line 1008) | def get_simulated_falls():
function get_chess (line 1021) | def get_chess():
function get_abalone (line 1048) | def get_abalone():
function get_madelon (line 1066) | def get_madelon():
function get_spambase (line 1100) | def get_spambase():
function get_wilt (line 1117) | def get_wilt():
function get_waveform (line 1144) | def get_waveform():
function get_wall_following_robot (line 1176) | def get_wall_following_robot():
function get_page_blocks (line 1219) | def get_page_blocks():
function get_optical_recognition_handwritten_digits (line 1248) | def get_optical_recognition_handwritten_digits():
function get_bach_chorals_harmony (line 1269) | def get_bach_chorals_harmony():
function get_turkiye_student_evaluation (line 1282) | def get_turkiye_student_evaluation():
function get_smartphone_human_activity (line 1299) | def get_smartphone_human_activity():
function get_artificial_characters (line 1332) | def get_artificial_characters():
function get_first_order_theorem_proving (line 1345) | def get_first_order_theorem_proving():
function get_landsat_satimage (line 1381) | def get_landsat_satimage():
function get_hiv_1_protease (line 1400) | def get_hiv_1_protease():
function get_musk (line 1413) | def get_musk():
function get_ble_rssi_indoor_location (line 1435) | def get_ble_rssi_indoor_location():
function get_australian_sign_language (line 1447) | def get_australian_sign_language():
function get_anuran_calls (line 1460) | def get_anuran_calls():
function get_thyroids (line 1523) | def get_thyroids():
function get_isolet (line 1769) | def get_isolet():
function get_mushroom (line 1783) | def get_mushroom():
function get_assamese_characters (line 1802) | def get_assamese_characters():
function get_arabic_digit (line 1815) | def get_arabic_digit():
function get_eeg_steady_state_visual (line 1829) | def get_eeg_steady_state_visual():
function get_gesture_phase_segmentation (line 1843) | def get_gesture_phase_segmentation():
function get_emg_physical_action (line 1891) | def get_emg_physical_action():
function get_human_activity_smartphone (line 1904) | def get_human_activity_smartphone():
function get_polish_companies_bankruptcy (line 1943) | def get_polish_companies_bankruptcy():
function get_crowd_sourced_mapping (line 1969) | def get_crowd_sourced_mapping():
function get_firm_teacher_clave (line 2008) | def get_firm_teacher_clave():
function get_smartphone_human_activity_postural (line 2042) | def get_smartphone_human_activity_postural():
function get_pen_recognition_handwritten_characters (line 2088) | def get_pen_recognition_handwritten_characters():
function get_epileptic_seizure_recognition (line 2108) | def get_epileptic_seizure_recognition():
function get_nursery (line 2124) | def get_nursery():
function get_indoor_user_movement_prediction (line 2175) | def get_indoor_user_movement_prediction():
function get_eeg_eye_state (line 2189) | def get_eeg_eye_state():
function get_htru2 (line 2203) | def get_htru2():
function get_magic_gamma_telescope (line 2230) | def get_magic_gamma_telescope():
function get_letter_recognition (line 2248) | def get_letter_recognition():
function get_occupancy_detection (line 2267) | def get_occupancy_detection():
function get_avila (line 2289) | def get_avila():
function get_grammatical_facial_expressions (line 2323) | def get_grammatical_facial_expressions():
function get_chess_krvk (line 2335) | def get_chess_krvk():
function get_default_credit_card (line 2362) | def get_default_credit_card():
function get_nomao (line 2380) | def get_nomao():
function get_indoor_loc_mag (line 2414) | def get_indoor_loc_mag():
function get_activity_recognition (line 2427) | def get_activity_recognition():
function get_bank_marketing (line 2441) | def get_bank_marketing():
function get_census_income (line 2543) | def get_census_income():
function get_emg_for_gestures (line 2604) | def get_emg_for_gestures():
function get_indoor_channel_measurements (line 2617) | def get_indoor_channel_measurements():
function get_electrical_grid_stability_simulated (line 2629) | def get_electrical_grid_stability_simulated():
function get_online_shoppers_attention (line 2657) | def get_online_shoppers_attention():
function get_pmu_ud (line 2685) | def get_pmu_ud():
function get_seoul_bike_data (line 2697) | def get_seoul_bike_data():
function get_south_german_credit (line 2727) | def get_south_german_credit():
function get_shill_bidding (line 2744) | def get_shill_bidding():
function get_gas_turbine (line 2764) | def get_gas_turbine():
function get_oral_toxicity (line 2789) | def get_oral_toxicity():
function get_wave_energy (line 2812) | def get_wave_energy():
function get_firewall (line 2850) | def get_firewall():
function get_real_estate_value (line 2869) | def get_real_estate_value():
function get_crop_mapping (line 2887) | def get_crop_mapping():
function get_bitcoin_heist (line 2901) | def get_bitcoin_heist():
function get_query_analytics (line 2928) | def get_query_analytics():
function download_all_uci (line 2961) | def download_all_uci(paths: Paths):
FILE: pytabkit/bench/data/import_talent_benchmark.py
function import_talent_benchmark (line 14) | def import_talent_benchmark(paths: Paths, talent_folder: str, source_nam...
FILE: pytabkit/bench/data/import_tasks.py
function download_if_not_exists (line 17) | def download_if_not_exists(url: str, dest: str):
function extract_categories (line 39) | def extract_categories(X):
function check_zero_hot (line 72) | def check_zero_hot(uci_base_path):
function convert_to_class_numbers (line 92) | def convert_to_class_numbers(y):
function import_from_csv (line 102) | def import_from_csv(ds_path: Union[Path, str], task_type: TaskType, task...
function import_uci_tasks (line 140) | def import_uci_tasks(paths: Paths, remove_duplicates: bool = False, reru...
function get_openml_task_ids (line 164) | def get_openml_task_ids(suite_id: Union[str, int]) -> List[int]:
class PandasTask (line 170) | class PandasTask:
method __init__ (line 171) | def __init__(self, x_df: pd.DataFrame, y_df: pd.Series, cat_indicator:...
method get_n_classes (line 190) | def get_n_classes(self):
method get_n_samples (line 197) | def get_n_samples(self):
method deduplicate (line 200) | def deduplicate(self):
method limit_n_classes (line 207) | def limit_n_classes(self, max_n_classes: int):
method subsample (line 223) | def subsample(self, max_size: int):
method remove_missing_cont (line 231) | def remove_missing_cont(self):
method normalize_regression_y (line 239) | def normalize_regression_y(self):
method get_task (line 244) | def get_task(self, task_desc: TaskDescription) -> Task:
method from_openml_task_id (line 286) | def from_openml_task_id(task_id: int):
function set_openml_cache_dir (line 304) | def set_openml_cache_dir(dir_name: Union[str, Path]):
function get_openml_ds_names (line 314) | def get_openml_ds_names(task_ids: List[int]):
function import_openml (line 325) | def import_openml(task_ids: List[int], task_source_name: str, paths: Pat...
FILE: pytabkit/bench/data/paths.py
class TmpPathContextManager (line 10) | class TmpPathContextManager:
method __init__ (line 14) | def __init__(self, path: Path):
method __enter__ (line 17) | def __enter__(self) -> Path:
method __exit__ (line 23) | def __exit__(self, type, value, traceback):
class Paths (line 27) | class Paths:
method __init__ (line 35) | def __init__(self, base_folder: str, tasks_folder: Optional[str] = Non...
method from_env_variables (line 46) | def from_env_variables() -> 'Paths':
method base (line 67) | def base(self) -> Path:
method algs (line 70) | def algs(self) -> Path:
method tasks (line 73) | def tasks(self) -> Path:
method task_collections (line 76) | def task_collections(self) -> Path:
method results (line 79) | def results(self) -> Path:
method result_summaries (line 82) | def result_summaries(self) -> Path:
method eval (line 85) | def eval(self) -> Path:
method plots (line 88) | def plots(self) -> Path:
method tmp (line 91) | def tmp(self) -> Path:
method uci_download (line 94) | def uci_download(self) -> Path:
method resources (line 97) | def resources(self):
method times (line 100) | def times(self) -> Path:
method new_tmp_folder (line 103) | def new_tmp_folder(self) -> TmpPathContextManager:
method results_alg_task (line 107) | def results_alg_task(self, task_desc: 'TaskDescription', alg_name: str...
method summary_alg_task (line 110) | def summary_alg_task(self, task_desc: 'TaskDescription', alg_name: str...
method results_alg_task_split (line 114) | def results_alg_task_split(self, task_desc: 'TaskDescription', alg_nam...
method tasks_task (line 118) | def tasks_task(self, task_desc: 'TaskDescription') -> Path:
method results_task (line 121) | def results_task(self, task_desc: 'TaskDescription') -> Path:
method resources_exp_it (line 124) | def resources_exp_it(self, exp_name: str, iteration: int) -> Path:
method task_source (line 127) | def task_source(self, task_source_name: str) -> Path:
method times_alg_task (line 130) | def times_alg_task(self, alg_name: str, task_desc: 'TaskDescription'):
FILE: pytabkit/bench/data/tasks.py
class TaskDescription (line 25) | class TaskDescription:
method __init__ (line 30) | def __init__(self, task_source: str, task_name: str):
method load_info (line 38) | def load_info(self, paths: Paths) -> 'TaskInfo':
method load_task (line 47) | def load_task(self, paths: Paths):
method exists_task (line 56) | def exists_task(self, paths: Paths):
method __str__ (line 65) | def __str__(self):
method to_dict (line 71) | def to_dict(self) -> Dict:
method from_dict (line 80) | def from_dict(data: Dict) -> 'TaskDescription':
method __hash__ (line 89) | def __hash__(self):
method __eq__ (line 92) | def __eq__(self, other):
class TaskCollection (line 98) | class TaskCollection:
method __init__ (line 104) | def __init__(self, coll_name: str, task_descs: List[TaskDescription]):
method save (line 112) | def save(self, paths: Paths):
method load_infos (line 117) | def load_infos(self, paths: Paths) -> List['TaskInfo']:
method from_name (line 121) | def from_name(coll_name: str, paths: Paths) -> 'TaskCollection':
method from_source (line 128) | def from_source(task_source: str, paths: Paths) -> 'TaskCollection':
class TaskInfo (line 145) | class TaskInfo:
method __init__ (line 149) | def __init__(self, task_desc: TaskDescription, n_samples: int, tensor_...
method get_n_classes (line 170) | def get_n_classes(self) -> int:
method load_task (line 176) | def load_task(self, paths: Paths) -> 'Task':
method get_ds_size_gb (line 191) | def get_ds_size_gb(self) -> float:
method save (line 200) | def save(self, paths: Paths):
method load (line 210) | def load(paths: Paths, task_desc: TaskDescription):
method from_ds (line 221) | def from_ds(task_desc: TaskDescription, ds: DictDataset, default_split...
method get_random_splits (line 226) | def get_random_splits(self, n_splits: int, trainval_fraction: float = ...
method get_default_splits (line 234) | def get_default_splits(self, n_splits) -> List[SplitInfo]:
class Task (line 243) | class Task:
method __init__ (line 248) | def __init__(self, task_info: TaskInfo, ds: DictDataset):
method save (line 252) | def save(self, paths: Paths):
class TaskPackage (line 262) | class TaskPackage:
method __init__ (line 266) | def __init__(self, task_info: TaskInfo, split_infos: List[SplitInfo], ...
FILE: pytabkit/bench/data/uci_file_ops.py
class UCIVars (line 35) | class UCIVars:
function prepare_new_data_set_group_id (line 59) | def prepare_new_data_set_group_id():
function make_folder (line 68) | def make_folder(folder):
function download_and_save (line 79) | def download_and_save(url, filename):
function unzip_raw_data (line 94) | def unzip_raw_data(filename):
function unrar_raw_data (line 104) | def unrar_raw_data(filename):
function my_decode (line 113) | def my_decode(x):
function unarff_raw_data (line 124) | def unarff_raw_data(filename):
function un_z_raw_data (line 142) | def un_z_raw_data(filename):
function untar_raw_data (line 156) | def untar_raw_data(filename):
function ungz_raw_data (line 167) | def ungz_raw_data(filename):
function replace_chars_in_file (line 184) | def replace_chars_in_file(filename, old_char, new_char):
function get_category_replace_string (line 201) | def get_category_replace_string(category_size, position, separator):
function replace_categories_in_file (line 224) | def replace_categories_in_file(filename, categories, separator):
function convert_replace_string_to_vector (line 233) | def convert_replace_string_to_vector(string, separator):
function get_categories_in_mixed_data (line 246) | def get_categories_in_mixed_data(data, column):
function auto_replace_categories_in_mixed_data (line 258) | def auto_replace_categories_in_mixed_data(data, column, separator, unkno...
function auto_replace_missing_in_mixed_data (line 274) | def auto_replace_missing_in_mixed_data(data, unknown_string = '?'):
function replace_categories_in_mixed_data (line 299) | def replace_categories_in_mixed_data(data, categories, column, separator...
function replace_bin_cats_in_mixed_data (line 330) | def replace_bin_cats_in_mixed_data(data, categories, column, separator, ...
function replace_ordinals_in_mixed_data (line 359) | def replace_ordinals_in_mixed_data(data, categories, column, separator, ...
function replace_manual_in_mixed_data (line 387) | def replace_manual_in_mixed_data(data, categories, column, replacement, ...
function replace_circulars_in_mixed_data (line 414) | def replace_circulars_in_mixed_data(data, categories, column, separator,...
function replace_isodate_by_day_in_mixed_data (line 447) | def replace_isodate_by_day_in_mixed_data(data, column):
function replace_time_by_seconds_in_mixed_data (line 466) | def replace_time_by_seconds_in_mixed_data(data, column, sep, rounded = 1):
function remove_files (line 483) | def remove_files(folder, filename_pattern):
function concat_files (line 493) | def concat_files(source_filename_pattern, target_filename):
function load_mixed_raw_data (line 508) | def load_mixed_raw_data(filename, sep, header = False):
function write_mixed_raw_data (line 535) | def write_mixed_raw_data(filename, data, sep):
function load_raw_data (line 548) | def load_raw_data(filename, sep, description_columns = 0, date_column = ...
function remove_rows_with_label (line 670) | def remove_rows_with_label(data, label):
function remove_empty_columns (line 683) | def remove_empty_columns(data):
function save_data_to_file (line 702) | def save_data_to_file(data, filename, is_classification, is_regression =...
function save_data_stats (line 795) | def save_data_stats(data_stats):
function is_number (line 812) | def is_number(string, german_decimal):
function remove_columns (line 835) | def remove_columns(data, columns):
function move_label_in_front (line 843) | def move_label_in_front(data, label_column):
function count_bin_columns (line 857) | def count_bin_columns(data):
function convert_time_to_seconds (line 873) | def convert_time_to_seconds(time, sep):
FILE: pytabkit/bench/eval/analysis.py
class ResultsTables (line 14) | class ResultsTables:
method __init__ (line 15) | def __init__(self, paths: Paths):
method get (line 19) | def get(self, coll_name: str, n_cv: int = 1, tag: str = 'paper') -> Mu...
function _get_t_mean_confidence_interval_single (line 32) | def _get_t_mean_confidence_interval_single(values: np.ndarray) -> Tuple[...
function get_t_mean_confidence_interval (line 46) | def get_t_mean_confidence_interval(values: np.ndarray) -> Tuple[np.ndarr...
function get_benchmark_results (line 59) | def get_benchmark_results(paths: Paths, table: MultiResultsTable, coll_n...
function get_opt_groups (line 221) | def get_opt_groups(task_type_name: str) -> Dict[str, List[str]]:
function get_ensemble_groups (line 246) | def get_ensemble_groups(task_type_name: str) -> Dict[str, List[str]]:
function get_simplified_name (line 266) | def get_simplified_name(alg_name: str):
function get_display_name (line 282) | def get_display_name(alg_name: str) -> str:
FILE: pytabkit/bench/eval/colors.py
function bilin_int (line 4) | def bilin_int(x: float, values: List[Tuple[float, float]]) -> float:
function bisection_find (line 21) | def bisection_find(f: Callable[[float], float], y: float, xmin: float, x...
function more_percep_uniform_hue (line 48) | def more_percep_uniform_hue(x: float) -> float:
FILE: pytabkit/bench/eval/evaluation.py
class AlgFilter (line 13) | class AlgFilter:
method __call__ (line 14) | def __call__(self, alg_name: str, tags: List[str], alg_config: Dict[st...
class FunctionAlgFilter (line 18) | class FunctionAlgFilter(AlgFilter):
method __init__ (line 19) | def __init__(self, f):
method __call__ (line 22) | def __call__(self, alg_name: str, tags: List[str], alg_config: Dict[st...
class EvalModeSelector (line 26) | class EvalModeSelector: # base class
method select_eval_modes (line 27) | def select_eval_modes(self, eval_modes: List[Tuple[str, str, str]]) ->...
method select (line 32) | def select(self, alg_name: str, task_results: List) -> Tuple[List[str]...
class DefaultEvalModeSelector (line 57) | class DefaultEvalModeSelector(EvalModeSelector):
method select_eval_modes (line 58) | def select_eval_modes(self, eval_modes: List[Tuple[str, str, str]]) ->...
class AlgTaskTable (line 89) | class AlgTaskTable:
method __init__ (line 90) | def __init__(self, alg_names: List[str], task_infos: List[TaskInfo], a...
method map (line 95) | def map(self, f):
method filter_n_splits (line 100) | def filter_n_splits(self, n_splits: int) -> 'AlgTaskTable':
method to_array (line 114) | def to_array(self) -> np.ndarray:
method rename_algs (line 117) | def rename_algs(self, f: Callable[[str], str]) -> 'AlgTaskTable':
method filter_algs (line 121) | def filter_algs(self, alg_names: List[str]) -> 'AlgTaskTable':
class MultiResultsTable (line 127) | class MultiResultsTable:
method __init__ (line 128) | def __init__(self, train_table: AlgTaskTable, val_table: AlgTaskTable,...
method get_test_results_table (line 138) | def get_test_results_table(self, eval_mode_selector: EvalModeSelector,...
method load (line 290) | def load(task_collection: TaskCollection, n_cv: int, paths: Paths, alg...
class TableAnalyzer (line 356) | class TableAnalyzer:
method __init__ (line 357) | def __init__(self, post_f: Optional[Callable[[float], float]] = None):
method _print_table (line 360) | def _print_table(self, alg_names: List[str], means, stds=None, is_high...
method print_analysis (line 379) | def print_analysis(self, alg_task_table: AlgTaskTable):
class TaskWeighting (line 383) | class TaskWeighting:
method __init__ (line 384) | def __init__(self, task_infos: List[TaskInfo], separate_task_names: Op...
method get_n_groups (line 404) | def get_n_groups(self) -> int:
method get_task_weights (line 407) | def get_task_weights(self) -> np.ndarray:
class MeanTableAnalyzer (line 411) | class MeanTableAnalyzer(TableAnalyzer):
method __init__ (line 412) | def __init__(self, f=None, use_weighting=False, separate_task_names: O...
method print_analysis (line 418) | def print_analysis(self, alg_task_table: AlgTaskTable) -> None:
method get_means (line 438) | def get_means(self, alg_task_table: AlgTaskTable) -> List[float]:
method get_intervals (line 451) | def get_intervals(self, alg_task_table: AlgTaskTable, std_factor: floa...
class ArrayTableAnalyzer (line 471) | class ArrayTableAnalyzer(TableAnalyzer):
method __init__ (line 476) | def __init__(self, f=None, use_weighting=False, separate_task_names: O...
method _is_higher_better (line 482) | def _is_higher_better(self) -> bool:
method _process_losses (line 486) | def _process_losses(self, loss_arr: np.ndarray, val_loss_arr: Optional...
method print_analysis (line 491) | def print_analysis(self, alg_task_table: AlgTaskTable, val_table: Opti...
class WinsTableAnalyzer (line 526) | class WinsTableAnalyzer(ArrayTableAnalyzer):
method _process_losses (line 527) | def _process_losses(self, loss_arr: np.ndarray, val_loss_arr: Optional...
method _is_higher_better (line 531) | def _is_higher_better(self) -> bool:
function get_ranks (line 535) | def get_ranks(values: np.ndarray) -> np.ndarray:
class RankTableAnalyzer (line 544) | class RankTableAnalyzer(ArrayTableAnalyzer):
method _process_losses (line 545) | def _process_losses(self, loss_arr: np.ndarray, val_loss_arr: Optional...
class NormalizedLossTableAnalyzer (line 550) | class NormalizedLossTableAnalyzer(ArrayTableAnalyzer):
method _process_losses (line 551) | def _process_losses(self, loss_arr: np.ndarray, val_loss_arr: Optional...
class GreedyAlgSelectionTableAnalyzer (line 558) | class GreedyAlgSelectionTableAnalyzer(ArrayTableAnalyzer):
method _process_losses (line 563) | def _process_losses(self, loss_arr: np.ndarray, val_loss_arr: Optional...
function alg_results_str (line 591) | def alg_results_str(alg_task_table: AlgTaskTable, alg_name: str):
function alg_comparison_str (line 607) | def alg_comparison_str(alg_task_table: AlgTaskTable, alg_names: List[str]):
FILE: pytabkit/bench/eval/plotting.py
function get_plot_color_idx (line 55) | def get_plot_color_idx(alg_name: str):
function gg_color_hue (line 72) | def gg_color_hue(n: int, saturation: float = 1.0, value: float = 0.65):
function get_plot_color (line 80) | def get_plot_color(alg_name: str):
function coll_name_to_title (line 89) | def coll_name_to_title(coll_name: str) -> str:
function plot_schedule (line 112) | def plot_schedule(paths: Paths, filename: str, sched_name: str) -> None:
function plot_schedules (line 128) | def plot_schedules(paths: Paths, filename: str, sched_names: List[str], ...
function _create_benchmark_result_plot (line 146) | def _create_benchmark_result_plot(file_path: Path, benchmark_results: Di...
function _create_benchmark_result_plot_with_intervals (line 207) | def _create_benchmark_result_plot_with_intervals(file_path: Path, benchm...
function get_equidistant_colors (line 279) | def get_equidistant_colors(n: int):
function plot_benchmark_bars (line 286) | def plot_benchmark_bars(paths: Paths, tables: ResultsTables, filename: s...
function plot_scatter_ax (line 354) | def plot_scatter_ax(paths: Paths, tables: ResultsTables, ax: matplotlib....
function plot_scatter (line 419) | def plot_scatter(paths: Paths, filename: str, tables: ResultsTables, col...
function _plot_scatter_with_labels (line 455) | def _plot_scatter_with_labels(x_dict: Dict[str, float], y_dict: Dict[str...
function extend_runtimes (line 651) | def extend_runtimes(times: Dict[str, float], task_type_name: str, keep_g...
function plot_pareto_ax (line 718) | def plot_pareto_ax(ax: matplotlib.axes.Axes, paths: Paths, tables: Resul...
function shorten_coll_names (line 853) | def shorten_coll_names(coll_names: List[str]) -> List[str]:
function plot_pareto (line 862) | def plot_pareto(paths: Paths, tables: ResultsTables, coll_names: List[st...
function plot_winrates (line 951) | def plot_winrates(paths: Paths, tables: ResultsTables, coll_name: str, a...
function plot_stopping_ax (line 1029) | def plot_stopping_ax(ax: plt.Axes, paths: Paths, tables: ResultsTables, ...
function plot_stopping (line 1062) | def plot_stopping(paths: Paths, tables: ResultsTables, classification: b...
function get_equidistant_blue_colors (line 1093) | def get_equidistant_blue_colors(n: int):
function _create_cumul_abl_plot (line 1103) | def _create_cumul_abl_plot(file_path: Path, benchmark_results: Dict[str,...
function plot_cumulative_ablations (line 1266) | def plot_cumulative_ablations(paths: Paths, tables: ResultsTables, filen...
function plot_cdd_ax (line 1375) | def plot_cdd_ax(ax: matplotlib.axes.Axes, paths: Paths, tables: ResultsT...
function plot_cdd (line 1418) | def plot_cdd(paths: Paths, tables: ResultsTables, coll_names: List[str],...
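`gg_color_hue(n, saturation, value)` suggests a ggplot2-style palette: n equidistant hues on the HSV wheel at fixed saturation and value. A hedged re-implementation sketch (the starting hue offset is an assumption; ggplot2 itself uses a small nonzero offset):

```python
import colorsys

def gg_color_hue_sketch(n: int, saturation: float = 1.0, value: float = 0.65):
    # n equidistant hues in [0, 1); the library's exact offset may differ
    return [colorsys.hsv_to_rgb(i / n, saturation, value) for i in range(n)]

print(gg_color_hue_sketch(3))  # three evenly spaced RGB tuples
```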
FILE: pytabkit/bench/eval/runtimes.py
function get_avg_train_times (line 10) | def get_avg_train_times(paths: Paths, coll_name: str, per_1k_samples: bo...
function get_avg_predict_times (line 28) | def get_avg_predict_times(paths: Paths, coll_name: str, per_1k_samples: ...
FILE: pytabkit/bench/eval/tables.py
function _get_table_str (line 14) | def _get_table_str(*parts: List[List[str]]):
function generate_ds_table (line 29) | def generate_ds_table(paths: Paths, task_collection: TaskCollection, inc...
function generate_collections_table (line 67) | def generate_collections_table(paths: Paths):
function generate_individual_results_table (line 131) | def generate_individual_results_table(paths: Paths, tables: ResultsTable...
function generate_ablations_table (line 181) | def generate_ablations_table(paths: Paths, tables: ResultsTables):
function generate_refit_table (line 303) | def generate_refit_table(paths: Paths, tables: ResultsTables, alg_family...
function generate_preprocessing_table (line 355) | def generate_preprocessing_table(paths: Paths, tables: ResultsTables):
function generate_stopping_table (line 404) | def generate_stopping_table(paths: Paths, tables: ResultsTables):
function generate_architecture_table (line 449) | def generate_architecture_table(paths: Paths, tables: ResultsTables):
FILE: pytabkit/bench/run/results.py
class ResultManager (line 11) | class ResultManager:
method __init__ (line 16) | def __init__(self):
method add_results (line 29) | def add_results(self, is_cv: bool, results_dict: Dict) -> None:
method save (line 51) | def save(self, path: Path) -> None:
method load (line 66) | def load(path: Path, load_other: bool = True, load_preds: bool = True):
function save_summaries (line 94) | def save_summaries(paths: Paths, task_infos: List[TaskInfo], alg_name: s...
FILE: pytabkit/bench/run/task_execution.py
class TabBenchJob (line 24) | class TabBenchJob(AbstractJob):
method __init__ (line 29) | def __init__(self, alg_name: str, alg_wrapper: AlgWrapper, task_packag...
method get_group (line 43) | def get_group(self) -> str:
method __call__ (line 49) | def __call__(self, assigned_resources: NodeResources) -> bool:
method get_required_resources (line 92) | def get_required_resources(self) -> RequiredResources:
method get_desc (line 95) | def get_desc(self) -> str:
class RunConfig (line 106) | class RunConfig:
method __init__ (line 111) | def __init__(self, n_tt_splits: int, n_cv: int = 1, n_refit: int = 0, ...
class TabBenchJobManager (line 141) | class TabBenchJobManager:
method __init__ (line 146) | def __init__(self, paths: Paths):
method add_jobs (line 154) | def add_jobs(self, task_infos: List[TaskInfo], run_config: RunConfig, ...
method run_jobs (line 252) | def run_jobs(self, scheduler: BaseJobScheduler) -> None:
function run_alg_selection (line 269) | def run_alg_selection(paths: Paths, config: RunConfig, task_infos: List[...
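Putting the pieces of this file together, a benchmark run presumably builds a `RunConfig`, registers jobs with `TabBenchJobManager`, and hands them to a scheduler. A hedged usage sketch; `Paths.from_env_variables()` and any `add_jobs` arguments beyond those visible in the signatures are assumptions:

```python
from pytabkit.bench.data.paths import Paths
from pytabkit.bench.run.task_execution import RunConfig, TabBenchJobManager
from pytabkit.bench.scheduling.execution import RayJobManager
from pytabkit.bench.scheduling.schedulers import SimpleJobScheduler

paths = Paths.from_env_variables()  # assumed constructor; see paths.py
config = RunConfig(n_tt_splits=10, n_cv=1, n_refit=0)
job_manager = TabBenchJobManager(paths)
# job_manager.add_jobs(task_infos, config, ...)  # task_infos: List[TaskInfo]
job_manager.run_jobs(SimpleJobScheduler(RayJobManager()))
```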
FILE: pytabkit/bench/scheduling/execution.py
function get_gpu_rams_gb (line 16) | def get_gpu_rams_gb(use_reserved: bool = True):
function measure_node_resources (line 48) | def measure_node_resources(node_id: int) -> Tuple[NodeResources, NodeRes...
function node_runner (line 88) | def node_runner(feedback_queue, job_queue, node_id: int):
class NodeManager (line 143) | class NodeManager:
method start (line 144) | def start(self):
method terminate (line 147) | def terminate(self):
class RayJobManager (line 151) | class RayJobManager(NodeManager):
method __init__ (line 152) | def __init__(self, max_n_threads: Optional[int] = None, available_cpu_...
method start (line 163) | def start(self) -> None:
method get_resource_manager (line 208) | def get_resource_manager(self) -> ResourceManager:
method submit_job (line 213) | def submit_job(self, job_info: JobInfo) -> None:
method pop_finished_job_infos (line 228) | def pop_finished_job_infos(self, timeout_s: float = -1.0) -> List[JobI...
method terminate (line 252) | def terminate(self) -> None:
FILE: pytabkit/bench/scheduling/jobs.py
class JobResult (line 10) | class JobResult:
method __init__ (line 14) | def __init__(self, job_id: int, time_s: float,
method set_max_cpu_ram_gb (line 39) | def set_max_cpu_ram_gb(self, value: float) -> None:
class AbstractJob (line 47) | class AbstractJob:
method get_group (line 51) | def get_group(self) -> str:
method __call__ (line 58) | def __call__(self, assigned_resources: NodeResources) -> bool:
method get_required_resources (line 72) | def get_required_resources(self) -> RequiredResources:
method get_desc (line 78) | def get_desc(self) -> str:
class JobRunner (line 85) | class JobRunner:
method __init__ (line 89) | def __init__(self, job: AbstractJob, job_id: int, assigned_resources: ...
method __call__ (line 99) | def __call__(self) -> JobResult:
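A custom job only needs the four `AbstractJob` hooks listed above. A minimal sketch; the truncated tail of the `RequiredResources` constructor is assumed to be optional, and the boolean return convention of `__call__` is a guess:

```python
import time

from pytabkit.bench.scheduling.jobs import AbstractJob
from pytabkit.bench.scheduling.resources import NodeResources
from pytabkit.models.alg_interfaces.base import RequiredResources

class SleepJob(AbstractJob):
    def get_group(self) -> str:
        return 'sleep'  # jobs in one group share runtime estimates

    def __call__(self, assigned_resources: NodeResources) -> bool:
        time.sleep(1.0)
        return False  # assumed convention: False = no failure

    def get_required_resources(self) -> RequiredResources:
        # only the first three constructor fields are visible in the listing
        return RequiredResources(time_s=2.0, n_threads=1.0, cpu_ram_gb=0.5)

    def get_desc(self) -> str:
        return 'SleepJob'
```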
FILE: pytabkit/bench/scheduling/resource_manager.py
class JobStatus (line 10) | class JobStatus(enum.Enum):
class JobInfo (line 17) | class JobInfo:
method __init__ (line 18) | def __init__(self, job: AbstractJob, job_id: int, start_time: Optional...
method get_status (line 27) | def get_status(self) -> JobStatus:
method set_started (line 37) | def set_started(self, assigned_resources: NodeResources):
method set_finished (line 41) | def set_finished(self, job_result: JobResult):
method is_remaining (line 44) | def is_remaining(self):
method is_running (line 47) | def is_running(self):
method is_finished (line 50) | def is_finished(self):
method is_failed (line 53) | def is_failed(self):
method is_succeed (line 56) | def is_succeed(self):
class ResourceManager (line 60) | class ResourceManager:
method __init__ (line 64) | def __init__(self, total_resources: SystemResources, fixed_resources: ...
method get_fixed_resources (line 69) | def get_fixed_resources(self):
method get_total_resources (line 72) | def get_total_resources(self):
method get_free_resources (line 75) | def get_free_resources(self):
method job_started (line 84) | def job_started(self, job_info: JobInfo):
method job_finished (line 90) | def job_finished(self, job_result: JobResult) -> JobInfo:
FILE: pytabkit/bench/scheduling/resources.py
class NodeResources (line 15) | class NodeResources:
method __init__ (line 19) | def __init__(self, node_id: int, n_threads: float, cpu_ram_gb: float, ...
method get_n_threads (line 27) | def get_n_threads(self) -> int:
method set_n_threads (line 30) | def set_n_threads(self, n_threads: int):
method get_cpu_ram_gb (line 35) | def get_cpu_ram_gb(self) -> float:
method set_cpu_ram_gb (line 38) | def set_cpu_ram_gb(self, cpu_ram_gb: float) -> None:
method set_gpu_rams_gb (line 43) | def set_gpu_rams_gb(self, gpu_rams_gb: np.ndarray) -> None:
method get_gpu_usages (line 48) | def get_gpu_usages(self) -> np.ndarray:
method get_gpu_rams_gb (line 51) | def get_gpu_rams_gb(self) -> np.ndarray:
method get_physical_core_usages (line 54) | def get_physical_core_usages(self) -> np.ndarray:
method get_n_physical_cores (line 57) | def get_n_physical_cores(self) -> int:
method get_total_gpu_ram_gb (line 60) | def get_total_gpu_ram_gb(self) -> float:
method get_total_gpu_usage (line 63) | def get_total_gpu_usage(self) -> float:
method get_used_gpu_ids (line 66) | def get_used_gpu_ids(self) -> np.ndarray: # todo: naming
method get_used_physical_cores (line 69) | def get_used_physical_cores(self) -> np.ndarray:
method get_resource_vector (line 72) | def get_resource_vector(self) -> np.ndarray:
method get_interface_resources (line 76) | def get_interface_resources(self) -> InterfaceResources:
method __iadd__ (line 80) | def __iadd__(self, other: 'NodeResources') -> 'NodeResources': # oper...
method __isub__ (line 84) | def __isub__(self, other: 'NodeResources') -> 'NodeResources':
method __imul__ (line 88) | def __imul__(self, other: 'NodeResources') -> 'NodeResources':
method __itruediv__ (line 92) | def __itruediv__(self, other: 'NodeResources') -> 'NodeResources':
method __add__ (line 96) | def __add__(self, other: 'NodeResources') -> 'NodeResources':
method __sub__ (line 101) | def __sub__(self, other: 'NodeResources') -> 'NodeResources':
method __mul__ (line 106) | def __mul__(self, other: 'NodeResources') -> 'NodeResources':
method __truediv__ (line 111) | def __truediv__(self, other: 'NodeResources') -> 'NodeResources':
method try_assign (line 116) | def try_assign(self, required_resources: RequiredResources,
method zeros_like (line 172) | def zeros_like(node_resources: 'NodeResources') -> 'NodeResources':
class SystemResources (line 178) | class SystemResources:
method __init__ (line 182) | def __init__(self, resources: List[NodeResources]):
method __getitem__ (line 185) | def __getitem__(self, index: int):
method __len__ (line 188) | def __len__(self):
method __iadd__ (line 191) | def __iadd__(self, other):
method __isub__ (line 196) | def __isub__(self, other):
method __imul__ (line 201) | def __imul__(self, other):
method __itruediv__ (line 206) | def __itruediv__(self, other):
method __add__ (line 211) | def __add__(self, other):
method __sub__ (line 216) | def __sub__(self, other):
method __mul__ (line 221) | def __mul__(self, other):
method __truediv__ (line 226) | def __truediv__(self, other):
method get_n_threads (line 231) | def get_n_threads(self):
method get_cpu_ram_gb (line 234) | def get_cpu_ram_gb(self):
method get_gpu_usage (line 237) | def get_gpu_usage(self):
method get_gpu_ram_gb (line 240) | def get_gpu_ram_gb(self):
method get_num_gpus (line 243) | def get_num_gpus(self):
method get_resource_vector (line 246) | def get_resource_vector(self):
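The operator overloads above indicate that resource bookkeeping is plain vector arithmetic: free = total − assigned. A standalone sketch of the idea with a reduced field set (the real classes also track per-GPU RAM and usage):

```python
import numpy as np

total = np.array([16.0, 64.0])                            # [n_threads, cpu_ram_gb]
assigned = [np.array([4.0, 8.0]), np.array([2.0, 12.0])]  # two running jobs
free = total - sum(assigned)
print(free)  # [10. 44.] -- what get_free_resources() would report
```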
FILE: pytabkit/bench/scheduling/schedulers.py
function format_length_s (line 13) | def format_length_s(duration: float) -> str:
function format_date_s (line 33) | def format_date_s(time_s: float) -> str:
class BaseJobScheduler (line 37) | class BaseJobScheduler:
method __init__ (line 42) | def __init__(self, job_manager: RayJobManager):
method _submit_more_jobs (line 47) | def _submit_more_jobs(self) -> None:
method add_jobs (line 51) | def add_jobs(self, jobs: List[AbstractJob]):
method run (line 55) | def run(self):
method _has_unfinished_jobs (line 84) | def _has_unfinished_jobs(self) -> bool:
method _print_start (line 87) | def _print_start(self):
method _print_end (line 97) | def _print_end(self):
method _compute_group_stats (line 123) | def _compute_group_stats(self) -> Dict[str, Dict[str, Union[int, float...
method _get_time_estimates (line 156) | def _get_time_estimates(self, job_infos: List[JobInfo], group_stats: D...
method _print_progress (line 174) | def _print_progress(self):
method _print_running_jobs (line 226) | def _print_running_jobs(self):
class SimpleJobScheduler (line 264) | class SimpleJobScheduler(BaseJobScheduler):
method _submit_more_jobs (line 271) | def _submit_more_jobs(self) -> None:
class CustomJobScheduler (line 336) | class CustomJobScheduler(BaseJobScheduler):
method _submit_more_jobs (line 342) | def _submit_more_jobs(self) -> None:
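Driving the scheduler directly follows from the signatures above: wrap a `RayJobManager`, add `AbstractJob`s, and call `run()`. A hedged sketch reusing the `SleepJob` from the jobs.py sketch; `RayJobManager` arguments beyond `max_n_threads` are truncated in the listing and omitted:

```python
from pytabkit.bench.scheduling.execution import RayJobManager
from pytabkit.bench.scheduling.schedulers import SimpleJobScheduler

scheduler = SimpleJobScheduler(RayJobManager(max_n_threads=8))
scheduler.add_jobs([SleepJob() for _ in range(4)])  # SleepJob: see jobs.py sketch
scheduler.run()  # prints progress/ETA via the _print_* helpers above
```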
FILE: pytabkit/models/alg_interfaces/alg_interfaces.py
class AlgInterface (line 19) | class AlgInterface:
method __init__ (line 44) | def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None, ...
method fit (line 58) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method fit_and_eval (line 87) | def fit_and_eval(self, ds: DictDataset, idxs_list: List[SplitIdxs], in...
method eval (line 104) | def eval(self, ds: DictDataset, idxs_list: List[SplitIdxs], metrics: O...
method predict (line 191) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_refit_interface (line 203) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method get_fit_params (line 215) | def get_fit_params(self) -> Optional[List[Dict]]:
method get_required_resources (line 221) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method get_available_predict_params (line 237) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method get_current_predict_params_name (line 241) | def get_current_predict_params_name(self):
method get_current_predict_params_dict (line 244) | def get_current_predict_params_dict(self):
method set_current_predict_params (line 247) | def set_current_predict_params(self, name: str) -> None:
method to (line 250) | def to(self, device: str) -> None:
class MultiSplitWrapperAlgInterface (line 254) | class MultiSplitWrapperAlgInterface(AlgInterface):
method __init__ (line 256) | def __init__(self, single_split_interfaces: List[AlgInterface], **conf...
method get_refit_interface (line 261) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 273) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method fit_and_eval (line 285) | def fit_and_eval(self, ds: DictDataset, idxs_list: List[SplitIdxs], in...
method predict (line 300) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 303) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method get_available_predict_params (line 310) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method set_current_predict_params (line 313) | def set_current_predict_params(self, name: str) -> None:
class SingleSplitAlgInterface (line 319) | class SingleSplitAlgInterface(AlgInterface):
class OptAlgInterface (line 323) | class OptAlgInterface(SingleSplitAlgInterface):
method __init__ (line 324) | def __init__(self, hyper_optimizer: HyperOptimizer, max_resource_confi...
method create_alg_interface (line 345) | def create_alg_interface(self, n_sub_splits: int, **config) -> AlgInte...
method get_refit_interface (line 348) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method objective (line 360) | def objective(self, params, ds: DictDataset, idxs_list: List[SplitIdxs...
method fit_and_eval (line 418) | def fit_and_eval(self, ds: DictDataset, idxs_list: List[SplitIdxs], in...
method predict (line 444) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 447) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsAlgInterface (line 456) | class RandomParamsAlgInterface(SingleSplitAlgInterface):
method __init__ (line 457) | def __init__(self, model_idx: int, fit_params: Optional[List[Dict[str,...
method _sample_params (line 468) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 471) | def _create_interface_from_config(self, n_tv_splits: int, **config):
method get_refit_interface (line 474) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _create_sub_interface (line 479) | def _create_sub_interface(self, ds: DictDataset, seed: int, n_train: i...
method fit (line 489) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 500) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 504) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
FILE: pytabkit/models/alg_interfaces/autogluon_model_interfaces.py
class AutoGluonModelAlgInterface (line 16) | class AutoGluonModelAlgInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 20) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method _create_df (line 59) | def _create_df(self, X: pd.DataFrame, y: Optional[np.ndarray]):
method get_required_resources (line 69) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method _fit_sklearn (line 91) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
method _predict_sklearn (line 161) | def _predict_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
method _predict_proba_sklearn (line 164) | def _predict_proba_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
FILE: pytabkit/models/alg_interfaces/base.py
class SplitIdxs (line 7) | class SplitIdxs:
method __init__ (line 11) | def __init__(self, train_idxs: torch.Tensor, val_idxs: Optional[torch....
method get_sub_split_idxs (line 39) | def get_sub_split_idxs(self, i: int) -> 'SubSplitIdxs':
method get_sub_split_idxs_alt (line 43) | def get_sub_split_idxs_alt(self, i: int) -> 'SplitIdxs':
class SubSplitIdxs (line 48) | class SubSplitIdxs:
method __init__ (line 52) | def __init__(self, train_idxs: torch.Tensor, val_idxs: Optional[torch....
class InterfaceResources (line 66) | class InterfaceResources:
method __init__ (line 70) | def __init__(self, n_threads: int, gpu_devices: List[str], time_in_sec...
class RequiredResources (line 76) | class RequiredResources:
method __init__ (line 80) | def __init__(self, time_s: float, n_threads: float, cpu_ram_gb: float,...
method get_resource_vector (line 91) | def get_resource_vector(self, fixed_resource_vector: np.ndarray):
method should_add_fixed_resources (line 99) | def should_add_fixed_resources(self) -> bool:
method combine_sequential (line 103) | def combine_sequential(resources_list: List['RequiredResources']):
FILE: pytabkit/models/alg_interfaces/calibration.py
class PostHocCalibrationAlgInterface (line 24) | class PostHocCalibrationAlgInterface(AlgInterface):
method __init__ (line 25) | def __init__(self, alg_interface: AlgInterface, fit_params: Optional[L...
method _transform_probs (line 31) | def _transform_probs(self, probs: np.ndarray) -> np.ndarray:
method fit (line 38) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 95) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 127) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method to (line 131) | def to(self, device: str) -> None:
FILE: pytabkit/models/alg_interfaces/catboost_interfaces.py
class CatBoostSklearnSubSplitInterface (line 24) | class CatBoostSklearnSubSplitInterface(SklearnSubSplitInterface):
method _get_cat_indexes_arg_name (line 25) | def _get_cat_indexes_arg_name(self) -> str:
method _create_sklearn_model (line 28) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 62) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class CatBoostCustomMetric (line 73) | class CatBoostCustomMetric:
method __init__ (line 77) | def __init__(self, metric_name: str, is_classification: bool, is_highe...
method is_max_optimal (line 84) | def is_max_optimal(self):
method evaluate (line 87) | def evaluate(self, approxes, target, weight):
method get_final_error (line 116) | def get_final_error(self, error, weight):
class CatBoostSubSplitInterface (line 120) | class CatBoostSubSplitInterface(TreeBasedSubSplitInterface):
method _get_params (line 121) | def _get_params(self):
method get_refit_interface (line 166) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _get_eval_metric (line 170) | def _get_eval_metric(self, val_metric_name: Optional[str], n_classes: ...
method _preprocess_params (line 197) | def _preprocess_params(self, params: Dict[str, Any], n_classes: int) -...
method _convert_ds (line 227) | def _convert_ds(self, ds: DictDataset) -> Any:
method _fit (line 239) | def _fit(self, train_ds: DictDataset, val_ds: Optional[DictDataset], p...
method _predict (line 272) | def _predict(self, bst, ds: DictDataset, n_classes: int,
method get_required_resources (line 297) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class CatBoostHyperoptAlgInterface (line 308) | class CatBoostHyperoptAlgInterface(OptAlgInterface):
method __init__ (line 309) | def __init__(self, space=None, n_hyperopt_steps: int = 50, **config):
method create_alg_interface (line 392) | def create_alg_interface(self, n_sub_splits: int, **config) -> AlgInte...
class RandomParamsCatBoostAlgInterface (line 396) | class RandomParamsCatBoostAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 397) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 758) | def _create_interface_from_config(self, n_tv_splits: int, **config):
FILE: pytabkit/models/alg_interfaces/ensemble_interfaces.py
class WeightedPrediction (line 18) | class WeightedPrediction:
method __init__ (line 19) | def __init__(self, y_pred_list: List[torch.Tensor], task_type: TaskType):
method predict_for_weights (line 24) | def predict_for_weights(self, weights: np.ndarray):
class CaruanaEnsembleAlgInterface (line 33) | class CaruanaEnsembleAlgInterface(SingleSplitAlgInterface):
method __init__ (line 39) | def __init__(self, alg_interfaces: List[AlgInterface], fit_params: Opt...
method get_refit_interface (line 44) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 49) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 160) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 172) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method to (line 179) | def to(self, device: str) -> None:
class AlgorithmSelectionAlgInterface (line 186) | class AlgorithmSelectionAlgInterface(SingleSplitAlgInterface):
method __init__ (line 191) | def __init__(self, alg_interfaces: List[AlgInterface], fit_params: Opt...
method get_refit_interface (line 196) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 205) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 273) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 278) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method to (line 286) | def to(self, device: str) -> None:
class PrecomputedPredictionsAlgInterface (line 293) | class PrecomputedPredictionsAlgInterface(SingleSplitAlgInterface):
method __init__ (line 294) | def __init__(self, y_preds_cv: torch.Tensor, y_preds_refit: Optional[t...
method get_refit_interface (line 303) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 306) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 311) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 317) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
FILE: pytabkit/models/alg_interfaces/lightgbm_interfaces.py
class LGBMCustomMetric (line 23) | class LGBMCustomMetric:
method __init__ (line 24) | def __init__(self, metric_name: str, is_classification: bool, is_highe...
method __call__ (line 29) | def __call__(self, y_pred: np.ndarray, eval_data):
class LGBMSklearnSubSplitInterface (line 61) | class LGBMSklearnSubSplitInterface(SklearnSubSplitInterface):
method _get_cat_indexes_arg_name (line 62) | def _get_cat_indexes_arg_name(self) -> str:
method _create_sklearn_model (line 65) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 96) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class LGBMSubSplitInterface (line 107) | class LGBMSubSplitInterface(TreeBasedSubSplitInterface):
method _get_params (line 108) | def _get_params(self):
method get_refit_interface (line 134) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _preprocess_params (line 139) | def _preprocess_params(self, params: Dict[str, Any], n_classes: int) -...
method _convert_ds (line 161) | def _convert_ds(self, ds: DictDataset) -> Any:
method _fit (line 177) | def _fit(self, train_ds: DictDataset, val_ds: Optional[DictDataset], p...
method _predict (line 250) | def _predict(self, bst, ds: DictDataset, n_classes: int, other_params:...
method get_required_resources (line 266) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class LGBMHyperoptAlgInterface (line 277) | class LGBMHyperoptAlgInterface(OptAlgInterface):
method __init__ (line 278) | def __init__(self, space=None, n_hyperopt_steps: int = 50, opt_method:...
method create_alg_interface (line 356) | def create_alg_interface(self, n_sub_splits: int, **config) -> AlgInte...
class RandomParamsLGBMAlgInterface (line 360) | class RandomParamsLGBMAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 361) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 704) | def _create_interface_from_config(self, n_tv_splits: int, **config):
FILE: pytabkit/models/alg_interfaces/nn_interfaces.py
function get_lignting_accel_and_devices (line 33) | def get_lignting_accel_and_devices(device: str):
class NNAlgInterface (line 52) | class NNAlgInterface(AlgInterface):
method __init__ (line 53) | def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None, ...
method get_refit_interface (line 58) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 61) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 155) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_available_predict_params (line 181) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method get_required_resources (line 189) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method get_model_ram_gb (line 242) | def get_model_ram_gb(self, ds: DictDataset, n_cv: int, n_refit: int, n...
method to (line 255) | def to(self, device: str) -> None:
method get_importances (line 260) | def get_importances(self) -> torch.Tensor:
method get_first_layer_weights (line 303) | def get_first_layer_weights(self, with_scale: bool) -> torch.Tensor:
class NNHyperoptAlgInterface (line 325) | class NNHyperoptAlgInterface(OptAlgInterface):
method __init__ (line 326) | def __init__(self, space: Optional[Union[str, Dict[str, Any]]] = None,...
method create_alg_interface (line 355) | def create_alg_interface(self, n_sub_splits: int, **config) -> AlgInte...
method get_required_resources (line 358) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RealMLPParamSampler (line 369) | class RealMLPParamSampler:
method __init__ (line 370) | def __init__(self, is_classification: bool, hpo_space_name: str = 'def...
method sample_params (line 374) | def sample_params(self, seed: int) -> Dict[str, Any]:
class RandomParamsNNAlgInterface (line 984) | class RandomParamsNNAlgInterface(SingleSplitAlgInterface):
method __init__ (line 985) | def __init__(self, model_idx: int, fit_params: Optional[List[Dict[str,...
method get_refit_interface (line 992) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _create_sub_interface (line 997) | def _create_sub_interface(self, ds: DictDataset, seed: int):
method fit (line 1011) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 1019) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 1023) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method get_available_predict_params (line 1029) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method to (line 1032) | def to(self, device: str) -> None:
FILE: pytabkit/models/alg_interfaces/other_interfaces.py
class RFSubSplitInterface (line 21) | class RFSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 22) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 55) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsRFAlgInterface (line 68) | class RandomParamsRFAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 69) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 508) | def _create_interface_from_config(self, n_tv_splits: int, **config):
class ExtraTreesSubSplitInterface (line 512) | class ExtraTreesSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 513) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 546) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsExtraTreesAlgInterface (line 559) | class RandomParamsExtraTreesAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 560) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 773) | def _create_interface_from_config(self, n_tv_splits: int, **config):
class GBTSubSplitInterface (line 777) | class GBTSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 778) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 802) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class KNNSubSplitInterface (line 815) | class KNNSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 816) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 830) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsKNNAlgInterface (line 843) | class RandomParamsKNNAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 844) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 866) | def _create_interface_from_config(self, n_tv_splits: int, **config):
class LinearModelSubSplitInterface (line 870) | class LinearModelSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 871) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 906) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsLinearModelAlgInterface (line 919) | class RandomParamsLinearModelAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 920) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 984) | def _create_interface_from_config(self, n_tv_splits: int, **config):
class SklearnMLPSubSplitInterface (line 988) | class SklearnMLPSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 989) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 998) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class KANSubSplitInterface (line 1011) | class KANSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 1012) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 1023) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method _fit_sklearn (line 1036) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
method _predict_sklearn (line 1052) | def _predict_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
method _predict_proba_sklearn (line 1055) | def _predict_proba_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
class GrandeWrapper (line 1059) | class GrandeWrapper:
method __init__ (line 1064) | def __init__(self, **config):
method fit (line 1067) | def fit(self, X, y, X_val, y_val, cat_features: Optional[List[str]] = ...
method predict_proba (line 1134) | def predict_proba(self, X):
method predict (line 1137) | def predict(self, X):
class GrandeSubSplitInterface (line 1145) | class GrandeSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 1146) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 1152) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method _fit_sklearn (line 1164) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
class TabPFN2SubSplitInterface (line 1180) | class TabPFN2SubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 1181) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method _fit_sklearn (line 1208) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
method get_required_resources (line 1220) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class TabICLSubSplitInterface (line 1233) | class TabICLSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 1234) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method _fit_sklearn (line 1262) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
method _predict_sklearn (line 1284) | def _predict_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
method _predict_proba_sklearn (line 1292) | def _predict_proba_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
method get_required_resources (line 1300) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
FILE: pytabkit/models/alg_interfaces/resource_computation.py
function get_resource_features (line 24) | def get_resource_features(config: Dict, ds: DictDataset, n_cv: int, n_re...
function process_resource_features (line 72) | def process_resource_features(raw_features: Dict[str, Any], feature_spec...
function eval_linear_product_model (line 91) | def eval_linear_product_model(raw_features: Dict[str, Any], params: Dict...
class FeatureSpec (line 108) | class FeatureSpec:
method _listify (line 113) | def _listify(spec: Union[List, str]):
method _product_str (line 122) | def _product_str(first: str, second: str) -> str:
method concat (line 135) | def concat(*feature_specs):
method product (line 141) | def product(*feature_specs):
method powerset_products (line 154) | def powerset_products(*feature_specs):
class NormalizedDataRegressor (line 166) | class NormalizedDataRegressor:
method __init__ (line 167) | def __init__(self, sub_regressor):
method fit (line 170) | def fit(self, X: np.ndarray, y: np.ndarray):
method get_coefs (line 175) | def get_coefs(self) -> np.ndarray:
method predict (line 178) | def predict(self, X: np.ndarray) -> np.ndarray:
class LogLinearModule (line 182) | class LogLinearModule(nn.Module):
method __init__ (line 183) | def __init__(self, n_features: int):
method forward (line 187) | def forward(self, x: torch.Tensor) -> torch.Tensor:
class LogLinearRegressor (line 191) | class LogLinearRegressor:
method __init__ (line 192) | def __init__(self, pessimistic: bool):
method fit (line 195) | def fit(self, X: np.ndarray, y: np.ndarray):
method get_coefs (line 221) | def get_coefs(self) -> np.ndarray:
function fit_resource_factors (line 225) | def fit_resource_factors(data: List[Tuple[Dict[str, float], float]], pes...
class TimeWrapper (line 254) | class TimeWrapper:
method __init__ (line 255) | def __init__(self, f: Callable):
method __call__ (line 258) | def __call__(self):
function create_ds (line 265) | def create_ds(n_samples: int, n_cont: int, n_cat: int, cat_size: int, n_...
class Sampler (line 281) | class Sampler:
method sample (line 282) | def sample(self) -> Union[int, float]:
class UniformSampler (line 286) | class UniformSampler(Sampler):
method __init__ (line 287) | def __init__(self, low: Union[int, float], high: Union[int, float], lo...
method sample (line 293) | def sample(self) -> Union[int, float]:
function ds_to_xy (line 307) | def ds_to_xy(ds: DictDataset) -> Tuple[pd.DataFrame, np.ndarray]:
class ResourcePredictor (line 313) | class ResourcePredictor:
method __init__ (line 317) | def __init__(self, config: Dict[str, Any], time_params: Dict[str, floa...
method get_required_resources (line 334) | def get_required_resources(self, ds: DictDataset, **extra_params) -> R...
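`LogLinearRegressor` and `eval_linear_product_model` suggest that runtime/RAM is modeled as a product of feature powers, i.e. log y = w·log x + b. A standalone least-squares sketch in log space (the `pessimistic` variant presumably penalizes under-prediction, which this sketch omits):

```python
import numpy as np

def fit_log_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    # append a bias column, then solve ordinary least squares in log space
    logX = np.hstack([np.log(X), np.ones((X.shape[0], 1))])
    coefs, *_ = np.linalg.lstsq(logX, np.log(y), rcond=None)
    return coefs  # feature exponents followed by the log-scale intercept

X = np.array([[1000, 10], [2000, 10], [4000, 20]])  # e.g. n_samples, n_features
y = np.array([1.0, 2.1, 8.3])                       # measured times in seconds
print(fit_log_linear(X, y))
```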
FILE: pytabkit/models/alg_interfaces/resource_params.py
class ResourceParams (line 1) | class ResourceParams:
class ResourceParamsOld (line 141) | class ResourceParamsOld:
FILE: pytabkit/models/alg_interfaces/rtdl_interfaces.py
function allow_single_underscore (line 23) | def allow_single_underscore(params_config: List[Tuple]) -> List[Tuple]:
class SkorchSubSplitInterface (line 36) | class SkorchSubSplitInterface(SklearnSubSplitInterface):
method _fit_sklearn (line 37) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
method predict (line 87) | def predict(self, ds: DictDataset) -> torch.Tensor:
class RTDL_MLPSubSplitInterface (line 124) | class RTDL_MLPSubSplitInterface(SkorchSubSplitInterface):
method _create_sklearn_model (line 125) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 166) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class ResnetSubSplitInterface (line 181) | class ResnetSubSplitInterface(SkorchSubSplitInterface):
method _create_sklearn_model (line 182) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 226) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class FTTransformerSubSplitInterface (line 243) | class FTTransformerSubSplitInterface(SkorchSubSplitInterface):
method _create_sklearn_model (line 244) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 286) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
function choose_batch_size_rtdl (line 311) | def choose_batch_size_rtdl(train_size) -> int:
function choose_batch_size_rtdl_new (line 324) | def choose_batch_size_rtdl_new(train_size: int) -> int:
class RTDL_MLP_ParamSamplerNew (line 337) | class RTDL_MLP_ParamSamplerNew:
method __init__ (line 338) | def __init__(self, is_classification: bool, train_size: int, num_emb_t...
method sample_params (line 343) | def sample_params(self, seed: int) -> Dict[str, Any]:
class RTDL_ResNet_ParamSampler (line 400) | class RTDL_ResNet_ParamSampler:
method __init__ (line 401) | def __init__(self, is_classification: bool, train_size: int):
method sample_params (line 405) | def sample_params(self, seed: int) -> Dict[str, Any]:
class RTDL_ResNet_ParamSamplerNew (line 447) | class RTDL_ResNet_ParamSamplerNew:
method __init__ (line 448) | def __init__(self, is_classification: bool, train_size: int):
method sample_params (line 452) | def sample_params(self, seed: int) -> Dict[str, Any]:
class RandomParamsResnetAlgInterface (line 493) | class RandomParamsResnetAlgInterface(SingleSplitAlgInterface):
method __init__ (line 494) | def __init__(self, model_idx: int, fit_params: Optional[List[Dict[str,...
method get_refit_interface (line 499) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _create_sub_interface (line 503) | def _create_sub_interface(self, ds: DictDataset, seed: int, n_train: i...
method fit (line 513) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 519) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 522) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsFTTransformerAlgInterface (line 529) | class RandomParamsFTTransformerAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 530) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 561) | def _create_interface_from_config(self, n_tv_splits: int, **config):
class RandomParamsRTDLMLPAlgInterface (line 565) | class RandomParamsRTDLMLPAlgInterface(SingleSplitAlgInterface):
method __init__ (line 566) | def __init__(self, model_idx: int, fit_params: Optional[List[Dict[str,...
method get_refit_interface (line 571) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _create_sub_interface (line 575) | def _create_sub_interface(self, ds: DictDataset, seed: int, n_train: i...
method fit (line 587) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 594) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 597) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
FILE: pytabkit/models/alg_interfaces/sub_split_interfaces.py
class SingleSplitWrapperAlgInterface (line 19) | class SingleSplitWrapperAlgInterface(SingleSplitAlgInterface):
method __init__ (line 25) | def __init__(self, sub_split_interfaces: List[AlgInterface], fit_param...
method get_refit_interface (line 33) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 57) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 126) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 130) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
method get_available_predict_params (line 140) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method set_current_predict_params (line 143) | def set_current_predict_params(self, name: str) -> None:
class SklearnSubSplitInterface (line 149) | class SklearnSubSplitInterface(SingleSplitAlgInterface): # todo: have a...
method __init__ (line 154) | def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None, ...
method fit (line 161) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method _fit_sklearn (line 224) | def _fit_sklearn(self, x_df: pd.DataFrame, y: np.ndarray, val_idxs: np...
method predict (line 237) | def predict(self, ds: DictDataset) -> torch.Tensor:
method _predict_sklearn (line 259) | def _predict_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
method _predict_proba_sklearn (line 262) | def _predict_proba_sklearn(self, x_df: pd.DataFrame) -> np.ndarray:
method _create_sklearn_model (line 265) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method _get_cat_indexes_arg_name (line 269) | def _get_cat_indexes_arg_name(self) -> str:
class TreeBasedSubSplitInterface (line 274) | class TreeBasedSubSplitInterface(SingleSplitAlgInterface): # todo: inse...
method __init__ (line 279) | def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None, ...
method fit (line 286) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 350) | def predict(self, ds: DictDataset) -> torch.Tensor:
method _fit (line 364) | def _fit(self, train_ds: DictDataset, val_ds: Optional[DictDataset], p...
method _predict (line 369) | def _predict(self, bst: Any, ds: DictDataset, n_classes: int, other_pa...
method _get_params (line 372) | def _get_params(self) -> Dict[str, Any]:
method get_available_predict_params (line 375) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
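`SklearnSubSplitInterface` reads as a template class: subclasses typically override `_create_sklearn_model` and inherit the fit/predict plumbing. A hedged sketch for plugging in a new estimator; the truncated trailing parameters of the hook are absorbed via `*args, **kwargs`:

```python
from sklearn.tree import DecisionTreeClassifier

from pytabkit.models.alg_interfaces.sub_split_interfaces import SklearnSubSplitInterface

class DecisionTreeSubSplitInterface(SklearnSubSplitInterface):
    def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices,
                              *args, **kwargs):
        # how hyperparameters reach this hook (e.g. via a config dict) is an
        # assumption; only the seed is wired through here
        return DecisionTreeClassifier(random_state=seed)
```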
FILE: pytabkit/models/alg_interfaces/tabm_interface.py
function get_tabm_auto_batch_size (line 28) | def get_tabm_auto_batch_size(n_train: int) -> int:
class TabMSubSplitInterface (line 43) | class TabMSubSplitInterface(SingleSplitAlgInterface):
method __init__ (line 44) | def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None, ...
method get_refit_interface (line 47) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 50) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 414) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 453) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsTabMAlgInterface (line 471) | class RandomParamsTabMAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 472) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 524) | def _create_interface_from_config(self, n_tv_splits: int, **config):
method get_available_predict_params (line 527) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method set_current_predict_params (line 530) | def set_current_predict_params(self, name: str) -> None:
FILE: pytabkit/models/alg_interfaces/tabr_interface.py
class ExceptionPrintingCallback (line 37) | class ExceptionPrintingCallback(pl.callbacks.Callback):
method on_exception (line 38) | def on_exception(self, trainer, pl_module, exception):
class TabRSubSplitInterface (line 44) | class TabRSubSplitInterface(AlgInterface):
method __init__ (line 45) | def __init__(self, **config):
method create_model (line 52) | def create_model(self, n_num_features, n_bin_features,
method infer_batch_size (line 95) | def infer_batch_size(self, n_samples_train: int) -> int:
method fit (line 107) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 348) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 435) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class RandomParamsTabRAlgInterface (line 458) | class RandomParamsTabRAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 459) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 531) | def _create_interface_from_config(self, n_tv_splits: int, **config):
FILE: pytabkit/models/alg_interfaces/xgboost_interfaces.py
class XGBCustomMetric (line 22) | class XGBCustomMetric:
method __init__ (line 23) | def __init__(self, metric_names: Union[str, List[str]], is_classificat...
method __call__ (line 28) | def __call__(self, y_pred: np.ndarray, dtrain):
class XGBSklearnSubSplitInterface (line 71) | class XGBSklearnSubSplitInterface(SklearnSubSplitInterface):
method _create_sklearn_model (line 72) | def _create_sklearn_model(self, seed: int, n_threads: int, gpu_devices...
method get_required_resources (line 103) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class XGBSubSplitInterface (line 114) | class XGBSubSplitInterface(TreeBasedSubSplitInterface):
method _get_params (line 116) | def _get_params(self):
method get_refit_interface (line 144) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method _preprocess_params (line 149) | def _preprocess_params(self, params: Dict[str, Any], n_classes: int) -...
method _convert_ds (line 179) | def _convert_ds(self, ds: DictDataset) -> Any:
method _fit (line 188) | def _fit(self, train_ds: DictDataset, val_ds: Optional[DictDataset], p...
method _predict (line 267) | def _predict(self, bst, ds: DictDataset, n_classes: int, other_params:...
method get_required_resources (line 284) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
class XGBHyperoptAlgInterface (line 295) | class XGBHyperoptAlgInterface(OptAlgInterface):
method __init__ (line 296) | def __init__(self, space=None, n_hyperopt_steps: int = 50, **config):
method create_alg_interface (line 430) | def create_alg_interface(self, n_sub_splits: int, **config) -> AlgInte...
class RandomParamsXGBAlgInterface (line 434) | class RandomParamsXGBAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 435) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 650) | def _create_interface_from_config(self, n_tv_splits: int, **config):
method get_available_predict_params (line 653) | def get_available_predict_params(self) -> Dict[str, Dict[str, Any]]:
method set_current_predict_params (line 656) | def set_current_predict_params(self, name: str) -> None:
FILE: pytabkit/models/alg_interfaces/xrfm_interfaces.py
class xRFMSubSplitInterface (line 23) | class xRFMSubSplitInterface(SingleSplitAlgInterface):
method __init__ (line 24) | def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None, ...
method get_refit_interface (line 27) | def get_refit_interface(self, n_refit: int, fit_params: Optional[List[...
method fit (line 30) | def fit(self, ds: DictDataset, idxs_list: List[SplitIdxs], interface_r...
method predict (line 229) | def predict(self, ds: DictDataset) -> torch.Tensor:
method get_required_resources (line 245) | def get_required_resources(self, ds: DictDataset, n_cv: int, n_refit: ...
function sample_xrfm_params (line 275) | def sample_xrfm_params(seed: int, hpo_space_name: str = 'default'):
class RandomParamsxRFMAlgInterface (line 517) | class RandomParamsxRFMAlgInterface(RandomParamsAlgInterface):
method _sample_params (line 518) | def _sample_params(self, is_classification: bool, seed: int, n_train: ...
method _create_interface_from_config (line 521) | def _create_interface_from_config(self, n_tv_splits: int, **config):
FILE: pytabkit/models/data/conversion.py
class ToDictDatasetConverter (line 14) | class ToDictDatasetConverter:
method __init__ (line 15) | def __init__(self, cat_features: Optional[Union[List[bool], np.ndarray...
method fit_transform (line 25) | def fit_transform(self, x: Union[np.ndarray, pd.DataFrame, pd.Series, ...
method transform (line 83) | def transform(self, x: Union[np.ndarray, pd.DataFrame, pd.Series, Dict...
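A hedged sketch of converting a pandas frame into the internal representation; whether `fit_transform` also accepts labels is not visible in the truncated signature, so only features are passed here:

```python
import pandas as pd

from pytabkit.models.data.conversion import ToDictDatasetConverter

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': ['x', 'y', 'x']})
conv = ToDictDatasetConverter(cat_features=[False, True])  # 'b' is categorical
ds = conv.fit_transform(df)   # -> DictDataset (see data.py below)
# conv.transform(new_df) reuses the fitted category mappings on new data
```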
FILE: pytabkit/models/data/data.py
class TaskType (line 12) | class TaskType:
class TensorInfo (line 19) | class TensorInfo:
method __init__ (line 20) | def __init__(self, feat_shape: Optional[Union[List, np.ndarray, torch....
method get_feat_shape (line 27) | def get_feat_shape(self) -> np.ndarray:
method get_cat_sizes (line 33) | def get_cat_sizes(self) -> torch.Tensor:
method get_n_features (line 38) | def get_n_features(self) -> int:
method get_cat_size_product (line 41) | def get_cat_size_product(self) -> int:
method is_empty (line 44) | def is_empty(self) -> bool:
method is_cont (line 47) | def is_cont(self) -> bool:
method is_cat (line 51) | def is_cat(self) -> bool:
method to_dict (line 54) | def to_dict(self) -> Dict:
method from_dict (line 59) | def from_dict(data: Dict) -> 'TensorInfo':
method concat (line 63) | def concat(tensor_infos: List['TensorInfo']) -> 'TensorInfo':
class DictDataset (line 76) | class DictDataset:
method __init__ (line 79) | def __init__(self, tensors: Optional[Dict[str, torch.Tensor]], tensor_...
method split_xy (line 93) | def split_xy(self) -> Tuple['DictDataset', 'DictDataset']:
method without_labels (line 98) | def without_labels(self) -> 'DictDataset':
method to_df (line 101) | def to_df(self) -> pd.DataFrame:
method get_batch (line 119) | def get_batch(self, idxs) -> Dict[str, torch.Tensor]:
method get_sub_dataset (line 122) | def get_sub_dataset(self, idxs) -> 'DictDataset':
method get_shuffled (line 125) | def get_shuffled(self, seed) -> 'DictDataset':
method get_size_gb (line 128) | def get_size_gb(self) -> float:
method join (line 136) | def join(*datasets):
method to (line 140) | def to(self, device):
method __getitem__ (line 143) | def __getitem__(self, key):
method get_n_classes (line 150) | def get_n_classes(self):
class ParallelDictDataLoader (line 158) | class ParallelDictDataLoader:
method __init__ (line 159) | def __init__(self, ds: DictDataset, idxs: torch.Tensor, batch_size: in...
method get_num_samples (line 194) | def get_num_samples(self):
method get_num_iterated_samples (line 197) | def get_num_iterated_samples(self):
method __len__ (line 202) | def __len__(self):
method __iter__ (line 205) | def __iter__(self):
class ValDictDataLoader (line 217) | class ValDictDataLoader:
method __init__ (line 218) | def __init__(self, ds: DictDataset, val_idxs: torch.Tensor, val_batch_...
method __len__ (line 228) | def __len__(self):
method __iter__ (line 231) | def __iter__(self):
FILE: pytabkit/models/data/nested_dict.py
class NestedDict (line 6) | class NestedDict:
method __init__ (line 18) | def __init__(self, data_dict=None):
method __getitem__ (line 21) | def __getitem__(self, idxs):
method __setitem__ (line 29) | def __setitem__(self, idxs, value):
method __contains__ (line 44) | def __contains__(self, item: Union[List, Tuple]):
method get (line 53) | def get(self, idxs, default=None):
method _dict_update_rec (line 59) | def _dict_update_rec(self, d1: dict, d2: dict):
method update (line 66) | def update(self, other: 'NestedDict'):
method __str__ (line 69) | def __str__(self):
method __repr__ (line 72) | def __repr__(self):
method get_dict (line 75) | def get_dict(self) -> Dict:
method from_kwargs (line 79) | def from_kwargs(**kwargs):
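The methods above imply that `NestedDict` indexes nested plain dicts by a key path (list or tuple). A behavioral sketch under that assumption:

```python
from pytabkit.models.data.nested_dict import NestedDict

nd = NestedDict.from_kwargs(lr=0.01, bs=256)
nd['model', 'depth'] = 3                       # __setitem__ creates nested levels
print(nd['model', 'depth'])                    # 3
print(nd.get(['model', 'width'], default=-1))  # missing path -> default
print(('model', 'depth') in nd)                # True via __contains__
```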
FILE: pytabkit/models/data/splits.py
class Split (line 14) | class Split:
method __init__ (line 15) | def __init__(self, ds: DictDataset, idxs: Tuple[torch.Tensor, torch.Te...
method get_sub_ds (line 23) | def get_sub_ds(self, i):
method get_sub_idxs (line 26) | def get_sub_idxs(self, i):
class Splitter (line 30) | class Splitter:
method get_idxs (line 31) | def get_idxs(self, ds: DictDataset) -> Tuple[torch.Tensor, torch.Tensor]:
method split_ds (line 34) | def split_ds(self, ds: DictDataset) -> Split:
method get_split_sizes (line 38) | def get_split_sizes(self, n_samples: int) -> Tuple:
class RandomSplitter (line 42) | class RandomSplitter(Splitter):
method __init__ (line 43) | def __init__(self, seed, first_fraction=0.8, max_n_first: Optional[int...
method get_idxs (line 48) | def get_idxs(self, ds: DictDataset) -> Tuple[torch.Tensor, torch.Tensor]:
method get_split_sizes (line 56) | def get_split_sizes(self, n_samples: int) -> Tuple:
class IndexSplitter (line 63) | class IndexSplitter(Splitter):
method __init__ (line 64) | def __init__(self, index):
method get_idxs (line 67) | def get_idxs(self, ds: DictDataset) -> Tuple[torch.Tensor, torch.Tensor]:
method get_split_sizes (line 71) | def get_split_sizes(self, n_samples: int) -> Tuple:
class AllNothingSplitter (line 75) | class AllNothingSplitter(Splitter):
method get_idxs (line 76) | def get_idxs(self, ds: DictDataset) -> Tuple[torch.Tensor, torch.Tensor]:
method split_ds (line 81) | def split_ds(self, ds: DictDataset) -> Split:
method get_split_sizes (line 85) | def get_split_sizes(self, n_samples: int) -> Tuple:
class MultiSplitter (line 89) | class MultiSplitter:
method get_idxs (line 90) | def get_idxs(self, ds: DictDataset) -> List[Tuple[torch.Tensor, torch....
method split_ds (line 93) | def split_ds(self, ds: DictDataset) -> List[Split]:
class KFoldSplitter (line 98) | class KFoldSplitter(MultiSplitter):
method __init__ (line 99) | def __init__(self, k: int, seed: int, stratified=False):
method get_idxs (line 106) | def get_idxs(self, ds: DictDataset) -> List[Tuple[torch.Tensor, torch....
method get_split_sizes (line 121) | def get_split_sizes(self, n_samples: int) -> Tuple:
class SplitInfo (line 126) | class SplitInfo:
method __init__ (line 127) | def __init__(self, splitter: Splitter, split_type: str, id: int, alg_s...
method get_sub_seed (line 134) | def get_sub_seed(self, split_idx: int, is_cv: bool):
method get_sub_splits (line 138) | def get_sub_splits(self, ds: DictDataset, n_splits: int, is_cv: bool) ...
method get_train_and_val_size (line 149) | def get_train_and_val_size(self, n_samples: int, n_splits: int, is_cv:...
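Note: a sketch of what RandomSplitter(seed, first_fraction, max_n_first) above appears to compute, based only on its parameter names (not the library code): a seeded random permutation split into two index sets, with an optional cap on the first part.

```python
from typing import Optional
import torch

def random_split_idxs(n_samples: int, seed: int, first_fraction: float = 0.8,
                      max_n_first: Optional[int] = None):
    # Seeded permutation so the split is reproducible.
    perm = torch.randperm(n_samples, generator=torch.Generator().manual_seed(seed))
    n_first = int(first_fraction * n_samples)
    if max_n_first is not None:
        n_first = min(n_first, max_n_first)
    return perm[:n_first], perm[n_first:]

train_idxs, val_idxs = random_split_idxs(1000, seed=0)
print(len(train_idxs), len(val_idxs))  # 800 200
```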
FILE: pytabkit/models/hyper_opt/coord_opt.py
function identity (line 13) | def identity(x):
class Hyperparameter (line 17) | class Hyperparameter:
method __init__ (line 18) | def __init__(self, start_value: Union[int, float], min_step_size: Unio...
method adjust_step_size (line 44) | def adjust_step_size(self, current_value: float, step_size: float) -> ...
method apply_tfms (line 93) | def apply_tfms(self, x: Any) -> Any:
class CoordOptimizerImpl (line 97) | class CoordOptimizerImpl:
method __init__ (line 101) | def __init__(self, f: Callable[[Dict], Tuple[float, Any]], space: Dict...
method suggest (line 140) | def suggest(self, new_hp_values) -> float:
method convert_hp_values (line 156) | def convert_hp_values(self, values: np.ndarray) -> Dict[str, Any]:
method eval (line 159) | def eval(self, new_hp_values: np.ndarray) -> Tuple[float, Any]:
method already_evaluated (line 169) | def already_evaluated(self, new_hp_values: np.ndarray) -> bool:
method coord_opt_idx (line 180) | def coord_opt_idx(self, idx: int):
method run (line 226) | def run(self) -> None:
class CoordOptimizer (line 252) | class CoordOptimizer(HyperOptimizer):
class CoordOptFuncWrapper (line 253) | class CoordOptFuncWrapper:
method __init__ (line 254) | def __init__(self, f: Callable[[dict], Tuple[float, Any]], fixed_par...
method __call__ (line 258) | def __call__(self, params: Dict[str, Any], seed: int = 0):
method __init__ (line 263) | def __init__(self, space: Dict[str, Hyperparameter], fixed_params: Dic...
method _optimize_impl (line 270) | def _optimize_impl(self, f: Callable[[dict], Tuple[float, Any]], seed:...
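Note: a rough sketch of the coordinate-wise search idea suggested by CoordOptimizerImpl.coord_opt_idx and Hyperparameter.adjust_step_size above; the library's actual schedule, transforms, and step-size rules are not reproduced here. The halving factor and termination rule are assumptions for the example.

```python
import numpy as np

def coord_opt(f, x0: np.ndarray, step_sizes: np.ndarray, min_step: float = 1e-3):
    # Optimize one coordinate at a time; shrink a coordinate's step size
    # when neither direction improves the objective.
    x, best = x0.copy(), f(x0)
    while np.any(step_sizes > min_step):
        for i in range(len(x)):
            improved = False
            for sign in (+1.0, -1.0):
                cand = x.copy()
                cand[i] += sign * step_sizes[i]
                val = f(cand)
                if val < best:
                    x, best, improved = cand, val, True
                    break
            if not improved:
                step_sizes[i] *= 0.5
    return x, best

x, val = coord_opt(lambda v: float(np.sum((v - 1.0) ** 2)), np.zeros(2), np.ones(2))
print(np.round(x, 3), round(val, 6))  # near [1. 1.], near 0
```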
FILE: pytabkit/models/hyper_opt/hyper_optimizers.py
class FunctionEvaluationTracker (line 11) | class FunctionEvaluationTracker:
method __init__ (line 15) | def __init__(self, f: Callable[[dict], Tuple[float, Any]], n_steps: in...
method __call__ (line 24) | def __call__(self, params: dict) -> Tuple[float, Any]:
method get_best_params_and_result (line 41) | def get_best_params_and_result(self) -> Tuple[Dict, Tuple[float, Any]]:
class HyperOptimizer (line 45) | class HyperOptimizer:
method __init__ (line 46) | def __init__(self, n_hyperopt_steps: int):
method _optimize_impl (line 49) | def _optimize_impl(self, f: Callable[[dict], Tuple[float, Any]], seed:...
method optimize (line 53) | def optimize(self, f: Callable[[dict], Tuple[float, Any]], seed: int, ...
method get_n_hyperopt_steps (line 75) | def get_n_hyperopt_steps(self) -> int:
class ConstantHyperOptimizer (line 85) | class ConstantHyperOptimizer(HyperOptimizer):
method __init__ (line 86) | def __init__(self, params: dict):
method _optimize_impl (line 90) | def _optimize_impl(self, f: Callable[[dict], Tuple[float, Any]], seed:...
function f_unpack_dict (line 94) | def f_unpack_dict(dct):
class HyperoptOptimizer (line 121) | class HyperoptOptimizer(HyperOptimizer):
class HyperoptFuncWrapper (line 122) | class HyperoptFuncWrapper:
method __init__ (line 123) | def __init__(self, f: Callable[[dict], Tuple[float, Any]], fixed_par...
method __call__ (line 127) | def __call__(self, params: dict):
method __init__ (line 136) | def __init__(self, space, fixed_params, n_hyperopt_steps: int = 50, **...
method _optimize_impl (line 142) | def _optimize_impl(self, f: Callable[[dict], Tuple[float, Any]], seed:...
class SMACOptimizer (line 165) | class SMACOptimizer(HyperOptimizer):
class SMACFuncWrapper (line 166) | class SMACFuncWrapper:
method __init__ (line 167) | def __init__(self, f: Callable[[dict], Tuple[float, Any]], fixed_par...
method __call__ (line 171) | def __call__(self, params, seed: int = 0):
method __init__ (line 178) | def __init__(self, space, fixed_params: Dict[str, Any], n_hyperopt_ste...
method _optimize_impl (line 187) | def _optimize_impl(self, f: Callable[[dict], Tuple[float, Any]], seed:...
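Note: an illustrative analogue of the wrapper pattern behind FunctionEvaluationTracker above (not the library code): wrap an objective f returning (loss, extra), record every evaluation, and expose the best (params, result) pair at the end, mirroring get_best_params_and_result.

```python
from typing import Any, Callable, Dict, Optional, Tuple

class BestTracker:
    def __init__(self, f: Callable[[dict], Tuple[float, Any]]):
        self.f = f
        self.best: Optional[Tuple[Dict, Tuple[float, Any]]] = None

    def __call__(self, params: dict) -> Tuple[float, Any]:
        result = self.f(params)
        # Keep the params with the lowest loss seen so far.
        if self.best is None or result[0] < self.best[1][0]:
            self.best = (dict(params), result)
        return result

tracker = BestTracker(lambda p: ((p['lr'] - 0.1) ** 2, None))
for lr in (0.01, 0.1, 1.0):
    tracker(dict(lr=lr))
print(tracker.best[0])  # {'lr': 0.1}
```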
FILE: pytabkit/models/nn_models/activations.py
function _swish_jit_fwd (line 13) | def _swish_jit_fwd(x): return x.mul(torch.sigmoid(x))
function _swish_jit_bwd (line 17) | def _swish_jit_bwd(x, grad_output):
class _SwishJitAutoFn (line 22) | class _SwishJitAutoFn(torch.autograd.Function):
method forward (line 24) | def forward(ctx, x):
method backward (line 29) | def backward(ctx, grad_output):
function swish (line 36) | def swish(x): return x * torch.sigmoid(x)
function _mish_jit_fwd (line 40) | def _mish_jit_fwd(x): return x.mul(torch.tanh(F.softplus(x)))
function _mish_jit_bwd (line 44) | def _mish_jit_bwd(x, grad_output):
class MishJitAutoFn (line 50) | class MishJitAutoFn(torch.autograd.Function):
method forward (line 52) | def forward(ctx, x):
method backward (line 57) | def backward(ctx, grad_output):
function mish (line 64) | def mish(x): return x.mul(torch.tanh(F.softplus(x)))
function golu (line 67) | def golu(x):
class ParametricActivationLayer (line 73) | class ParametricActivationLayer(Layer):
method __init__ (line 74) | def __init__(self, f, weight):
method forward_cont (line 79) | def forward_cont(self, x):
method _stack (line 83) | def _stack(self, layers):
class ParametricActivationFitter (line 87) | class ParametricActivationFitter(Fitter):
method __init__ (line 88) | def __init__(self, f, **config):
method get_n_params (line 94) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 97) | def _fit(self, ds: DictDataset) -> Layer:
class ActivationFactory (line 104) | class ActivationFactory(FitterFactory):
method __init__ (line 105) | def __init__(self, **config):
method _create (line 109) | def _create(self, tensor_infos) -> Fitter:
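Note: the swish and mish one-liners are shown in full in the entries above; here they are reproduced as a self-contained check that swish(x) = x * sigmoid(x) and mish(x) = x * tanh(softplus(x)). The JIT autograd variants above exist only to make the backward pass cheaper.

```python
import torch
import torch.nn.functional as F

def swish(x): return x * torch.sigmoid(x)
def mish(x): return x.mul(torch.tanh(F.softplus(x)))

x = torch.linspace(-3, 3, 5)
print(swish(x))
print(mish(x))
```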
FILE: pytabkit/models/nn_models/base.py
class Scope (line 62) | class Scope:
method __init__ (line 63) | def __init__(self, names: Optional[List[str]] = None):
method get_sub_scope (line 66) | def get_sub_scope(self, name: str) -> 'Scope':
method __str__ (line 69) | def __str__(self):
method matches (line 72) | def matches(self, regex: Union[str, re.Pattern]) -> bool:
class TrainContext (line 78) | class TrainContext:
method __init__ (line 82) | def __init__(self, scope: Optional[Scope] = None, hp_manager: Optional...
method clone (line 86) | def clone(self):
method get_global_context (line 90) | def get_global_context() -> 'TrainContext':
function sub_scope_context (line 97) | def sub_scope_context(name: str):
function sub_scopes_context (line 106) | def sub_scopes_context(names: List[str]):
function set_scope_context (line 118) | def set_scope_context(scope: Scope):
function set_hp_context (line 127) | def set_hp_context(hp_manager: Optional[HyperparamManager]):
class ContextAware (line 136) | class ContextAware:
method __init__ (line 137) | def __init__(self, scope_names: Optional[List[str]] = None):
method add_scope (line 141) | def add_scope(self, name: str):
method add_others_scope (line 145) | def add_others_scope(self, other: 'ContextAware'):
method set_context (line 150) | def set_context(self):
class ContextRecorder (line 155) | class ContextRecorder:
method __init__ (line 156) | def __init__(self):
method set_context (line 161) | def set_context(self):
class StringConvertible (line 167) | class StringConvertible:
method __init__ (line 168) | def __init__(self):
method __repr__ (line 171) | def __repr__(self):
method __str__ (line 174) | def __str__(self):
class Variable (line 179) | class Variable(ContextRecorder, nn.Parameter):
method __new__ (line 180) | def __new__(cls, data=None, trainable=True, requires_grad=None, hyper_...
method __init__ (line 190) | def __init__(self, data=None, trainable=True, requires_grad=None, hype...
method __deepcopy__ (line 193) | def __deepcopy__(self, memo):
method __repr__ (line 202) | def __repr__(self):
method stack (line 208) | def stack(vars: List['Variable'], dim=0):
class Layer (line 220) | class Layer(ContextRecorder, StringConvertible, nn.Module):
method __init__ (line 232) | def __init__(self, new_tensor_infos: Optional[Dict[str, TensorInfo]] =...
method forward_tensor_infos (line 253) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method forward (line 263) | def forward(self, data: Union[DictDataset, Dict[str, torch.Tensor]]) -...
method forward_ds (line 275) | def forward_ds(self, ds: DictDataset) -> DictDataset:
method forward_cont (line 280) | def forward_cont(self, x: torch.Tensor) -> torch.Tensor:
method forward_tensors (line 287) | def forward_tensors(self, tensors: Dict[str, torch.Tensor]) -> Dict[st...
method _stack (line 297) | def _stack(self, layers: List['Layer']) -> 'Layer':
method stack (line 313) | def stack(self, layers: List['Layer']) -> 'Layer':
method __setattr__ (line 323) | def __setattr__(self, name, value):
class IdentityLayer (line 352) | class IdentityLayer(Layer):
method forward_tensors (line 354) | def forward_tensors(self, x):
class SequentialLayer (line 358) | class SequentialLayer(Layer):
method __init__ (line 359) | def __init__(self, tfms: List[Layer]):
method forward_tensor_infos (line 363) | def forward_tensor_infos(self, tensor_infos):
method forward_ds (line 368) | def forward_ds(self, ds: DictDataset):
method forward_tensors (line 373) | def forward_tensors(self, tensors):
method _stack (line 378) | def _stack(self, seq_tfms):
method __repr__ (line 382) | def __repr__(self):
method __str__ (line 385) | def __str__(self):
class ResidualLayer (line 390) | class ResidualLayer(Layer):
method __init__ (line 391) | def __init__(self, inner_layer: Layer):
method forward_tensor_infos (line 395) | def forward_tensor_infos(self, tensor_infos):
method forward_tensors (line 398) | def forward_tensors(self, tensors: Dict[str, torch.Tensor]):
method _stack (line 403) | def _stack(self, seq_tfms):
method __repr__ (line 406) | def __repr__(self):
method __str__ (line 409) | def __str__(self):
class ConcatParallelLayer (line 414) | class ConcatParallelLayer(Layer):
method __init__ (line 422) | def __init__(self, layers: List[Layer], fitter: 'Fitter'):
method forward_tensors (line 426) | def forward_tensors(self, tensors):
method _stack (line 433) | def _stack(self, tfms: List[Layer]):
method __repr__ (line 437) | def __repr__(self):
method __str__ (line 440) | def __str__(self):
class FilterTensorsLayer (line 445) | class FilterTensorsLayer(Layer):
method __init__ (line 449) | def __init__(self, include_keys: Optional[List[str]], exclude_keys: Op...
method forward_tensors (line 457) | def forward_tensors(self, tensors: Dict[str, torch.Tensor]) -> Dict[st...
class FunctionLayer (line 467) | class FunctionLayer(Layer):
method __init__ (line 468) | def __init__(self, f):
method forward_cont (line 472) | def forward_cont(self, x: torch.Tensor) -> torch.Tensor:
class BiasLayer (line 476) | class BiasLayer(Layer):
method __init__ (line 477) | def __init__(self, bias: Variable, factor: float = 1.0):
method forward_cont (line 482) | def forward_cont(self, x):
method _stack (line 489) | def _stack(self, tfms):
class ScaleLayer (line 493) | class ScaleLayer(Layer):
method __init__ (line 494) | def __init__(self, scale: Variable):
method forward_cont (line 498) | def forward_cont(self, x):
method _stack (line 502) | def _stack(self, tfms):
class WeightLayer (line 506) | class WeightLayer(Layer):
method __init__ (line 507) | def __init__(self, weight: Variable, factor: float = 1.0):
method forward_cont (line 513) | def forward_cont(self, x):
method _stack (line 519) | def _stack(self, tfms):
class RenameTensorLayer (line 523) | class RenameTensorLayer(Layer):
method __init__ (line 524) | def __init__(self, old_name: str, new_name: str, fitter: 'Fitter'):
method forward_tensors (line 529) | def forward_tensors(self, tensors: Dict[str, torch.Tensor]) -> Dict[st...
method _stack (line 539) | def _stack(self, layers: List['Layer']) -> 'Layer':
class Fitter (line 546) | class Fitter(ContextAware, StringConvertible):
method __init__ (line 551) | def __init__(self, needs_tensors: bool = True, is_individual: bool = T...
method _get_n_values (line 572) | def _get_n_values(self, tensor_infos: Dict[str, TensorInfo], relevant_...
method get_n_params (line 584) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method get_n_forward (line 592) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 602) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method fit (line 610) | def fit(self, ds: DictDataset) -> Layer:
method fit_transform (line 620) | def fit_transform(self, ds: DictDataset, needs_tensors: bool = True) -...
method fit_transform_subsample (line 632) | def fit_transform_subsample(self, ds: DictDataset, ram_limit_gb: float...
method _fit (line 645) | def _fit(self, ds: DictDataset) -> Layer:
method _fit_transform (line 663) | def _fit_transform(self, ds: DictDataset, needs_tensors: bool) -> Tupl...
method _fit_transform_subsample (line 681) | def _fit_transform_subsample(self, ds: DictDataset, ram_limit_gb: floa...
method split_off_dynamic (line 696) | def split_off_dynamic(self) -> Tuple['Fitter', 'Fitter']:
method split_off_individual (line 711) | def split_off_individual(self):
class IdentityFitter (line 725) | class IdentityFitter(Fitter):
method __init__ (line 726) | def __init__(self, **config):
method _fit (line 729) | def _fit(self, ds: DictDataset) -> Layer:
class SequentialFitter (line 733) | class SequentialFitter(Fitter):
method __init__ (line 734) | def __init__(self, fitters: List[Fitter], **config):
method forward_tensor_infos (line 740) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]):
method get_n_params (line 745) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]):
method get_n_forward (line 752) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]):
method _fit_transform (line 759) | def _fit_transform(self, ds: DictDataset, needs_tensors: bool = True):
method _fit_transform_subsample (line 768) | def _fit_transform_subsample(self, ds: DictDataset, ram_limit_gb: floa...
method split_off_dynamic (line 778) | def split_off_dynamic(self):
method split_off_individual (line 788) | def split_off_individual(self):
method __str__ (line 798) | def __str__(self):
class ResidualFitter (line 803) | class ResidualFitter(Fitter):
method __init__ (line 804) | def __init__(self, inner_fitter: Fitter, **config):
method forward_tensor_infos (line 809) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]):
method get_n_params (line 812) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]):
method get_n_forward (line 815) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]):
method _fit_transform (line 818) | def _fit_transform(self, ds: DictDataset, needs_tensors=True):
method split_off_dynamic (line 824) | def split_off_dynamic(self):
method split_off_individual (line 830) | def split_off_individual(self):
method __str__ (line 836) | def __str__(self):
class FunctionFitter (line 841) | class FunctionFitter(Fitter):
method __init__ (line 842) | def __init__(self, f, **config):
method _fit (line 846) | def _fit(self, ds: DictDataset):
class ConcatParallelFitter (line 850) | class ConcatParallelFitter(Fitter):
method __init__ (line 852) | def __init__(self, fitters: List[Fitter]):
method get_n_forward (line 857) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method get_n_params (line 863) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 866) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method _fit (line 874) | def _fit(self, ds: DictDataset) -> Layer:
class FitterFactory (line 880) | class FitterFactory(ContextAware, StringConvertible):
method __init__ (line 885) | def __init__(self, scope_names: Optional[List[str]] = None):
method create (line 888) | def create(self, tensor_infos: Dict[str, TensorInfo]) -> Fitter:
method create_transform (line 900) | def create_transform(self, tensor_infos: Dict[str, TensorInfo]) -> Tup...
method _create (line 912) | def _create(self, tensor_infos: Dict[str, TensorInfo]) -> Fitter:
method _create_transform (line 926) | def _create_transform(self, tensor_infos: Dict[str, TensorInfo]) -> Tu...
class SequentialFactory (line 931) | class SequentialFactory(FitterFactory):
method __init__ (line 932) | def __init__(self, factories: List[FitterFactory]):
method _create_transform (line 936) | def _create_transform(self, tensor_infos: Dict[str, TensorInfo]):
method __str__ (line 943) | def __str__(self):
class IdentityFactory (line 948) | class IdentityFactory(FitterFactory):
method _create (line 949) | def _create(self, tensor_infos):
class FunctionFactory (line 953) | class FunctionFactory(FitterFactory):
method __init__ (line 954) | def __init__(self, f):
method _create (line 958) | def _create(self, tensor_infos):
class ConcatParallelFactory (line 962) | class ConcatParallelFactory(FitterFactory):
method __init__ (line 963) | def __init__(self, factories: List[FitterFactory]):
method _create (line 967) | def _create(self, tensor_infos) -> Fitter:
class FilterTensorsFactory (line 971) | class FilterTensorsFactory(Fitter, FitterFactory):
method __init__ (line 972) | def __init__(self, include_keys: Optional[List[str]] = None, exclude_k...
method forward_tensor_infos (line 977) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method _fit (line 983) | def _fit(self, ds: DictDataset) -> Layer:
class RenameTensorFactory (line 987) | class RenameTensorFactory(Fitter, FitterFactory):
method __init__ (line 988) | def __init__(self, old_name: str, new_name: str, **config):
method get_n_forward (line 993) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 999) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method _fit (line 1011) | def _fit(self, ds: DictDataset) -> Layer:
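Note: an illustrative analogue of the three-stage design above (FitterFactory -> Fitter -> Layer), not pytabkit's code: a factory is created from tensor shapes alone, a fitter learns statistics from training data via fit(ds), and the returned layer is the pure transformation applied at run time. The mean-centering example is a stand-in for any such preprocessing step.

```python
import torch

class MeanCenterFitter:
    # Analogue of Fitter._fit: learn per-feature statistics from data.
    def fit(self, x: torch.Tensor) -> "MeanCenterLayer":
        return MeanCenterLayer(x.mean(dim=0))

class MeanCenterLayer:
    # Analogue of Layer.forward_cont: a fixed transformation, no further fitting.
    def __init__(self, means: torch.Tensor):
        self.means = means
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x - self.means

x = torch.randn(64, 4) + 5.0
layer = MeanCenterFitter().fit(x)
print(layer.forward(x).mean(dim=0))  # approximately zero per feature
```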
FILE: pytabkit/models/nn_models/categorical.py
class SingleEncodingFactory (line 14) | class SingleEncodingFactory(FitterFactory):
method __init__ (line 15) | def __init__(self, create_fitter, min_cat_size=0, max_cat_size=-1):
method apply_on (line 21) | def apply_on(self, cat_size: int, n_classes: int):
method _create (line 25) | def _create(self, tensor_infos):
class EncodingLayer (line 40) | class EncodingLayer(Layer):
method __init__ (line 41) | def __init__(self, single_enc_layers: Iterable[Layer], enc_output_name...
method forward_tensors (line 46) | def forward_tensors(self, tensors):
method _stack (line 70) | def _stack(self, layers: List['EncodingLayer']):
class EncodingFitter (line 75) | class EncodingFitter(Fitter):
method __init__ (line 76) | def __init__(self, single_encoder_fitters: List[Fitter], enc_output_na...
method get_n_params (line 83) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method get_n_forward (line 87) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _sub_tensor_infos (line 97) | def _sub_tensor_infos(self, tensor_infos):
method forward_tensor_infos (line 103) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 122) | def _fit(self, ds: DictDataset) -> Layer:
class EncodingFactory (line 147) | class EncodingFactory(FitterFactory):
method __init__ (line 148) | def __init__(self, single_encoder_factory, enc_output_name: str = 'x_c...
method _create (line 153) | def _create(self, tensor_infos):
class SingleOneHotLayer (line 166) | class SingleOneHotLayer(Layer):
method __init__ (line 167) | def __init__(self, fitter: Fitter, onoff, cat_size, use_missing_zero: ...
method _binary (line 174) | def _binary(self, x_cat, values):
method _multiple (line 180) | def _multiple(self, x_cat, on_value, off_value):
method forward_tensors (line 188) | def forward_tensors(self, tensors):
class SingleOneHotFitter (line 213) | class SingleOneHotFitter(Fitter):
method __init__ (line 214) | def __init__(self, use_missing_zero: bool, bin_onoff: Tuple[float, flo...
method forward_tensor_infos (line 222) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 230) | def _fit(self, ds: DictDataset) -> Layer:
class SingleOneHotFactory (line 238) | class SingleOneHotFactory(SingleEncodingFactory):
method __init__ (line 239) | def __init__(self, use_missing_zero=True, bin_onoff=(1.0, 0.0), multi_...
method apply_on (line 249) | def apply_on(self, cat_size: int, n_classes: int):
class SingleEmbeddingLayer (line 259) | class SingleEmbeddingLayer(Layer):
method __init__ (line 260) | def __init__(self, emb: Variable):
method forward_tensors (line 266) | def forward_tensors(self, tensors):
method _stack (line 310) | def _stack(self, layers: List['SingleEmbeddingLayer']):
function fastai_emb_size_fn (line 314) | def fastai_emb_size_fn(n_cat: int):
class ConstantFunction (line 318) | class ConstantFunction:
method __init__ (line 319) | def __init__(self, value: Any):
method __call__ (line 322) | def __call__(self, *args, **kwargs) -> Any:
function get_embedding_size (line 326) | def get_embedding_size(fn: Optional[Union[int, str, Callable[[int], int]...
class SingleEmbeddingFitter (line 342) | class SingleEmbeddingFitter(Fitter):
method __init__ (line 343) | def __init__(self, embedding_size=None, **config):
method get_n_params (line 353) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 357) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 361) | def _fit(self, ds: DictDataset) -> Layer:
class SingleEmbeddingFactory (line 380) | class SingleEmbeddingFactory(SingleEncodingFactory):
method __init__ (line 381) | def __init__(self, embedding_size=None, min_embedding_cat_size=0, max_...
class SingleTargetEncodingFitter (line 389) | class SingleTargetEncodingFitter(Fitter):
method __init__ (line 390) | def __init__(self, n_classes, **config):
method get_n_params (line 394) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 400) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 404) | def _fit(self, ds: DictDataset) -> Layer:
class SingleTargetEncodingFactory (line 428) | class SingleTargetEncodingFactory(SingleEncodingFactory):
method __init__ (line 429) | def __init__(self, min_targetenc_cat_size=0, max_targetenc_cat_size=-1...
class SingleOrdinalEncodingLayer (line 438) | class SingleOrdinalEncodingLayer(Layer):
method __init__ (line 439) | def __init__(self, fitter, cat_size: int, permute_ordinal_encoding: bo...
method forward_tensors (line 447) | def forward_tensors(self, tensors):
class SingleOrdinalEncodingFitter (line 454) | class SingleOrdinalEncodingFitter(Fitter):
method __init__ (line 455) | def __init__(self, permute_ordinal_encoding: bool = False, **config):
method forward_tensor_infos (line 459) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 462) | def _fit(self, ds: DictDataset) -> Layer:
class SingleOrdinalEncodingFactory (line 467) | class SingleOrdinalEncodingFactory(SingleEncodingFactory):
method __init__ (line 468) | def __init__(self, min_labelenc_cat_size=0, max_labelenc_cat_size=-1, ...
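Note: fastai's published embedding-size heuristic, which fastai_emb_size_fn above presumably mirrors given its name (an assumption; check the source for the exact constants): embedding width grows sublinearly with category count and is capped.

```python
def emb_size(n_cat: int) -> int:
    # fastai rule of thumb: min(600, round(1.6 * n_cat ** 0.56))
    return min(600, round(1.6 * n_cat ** 0.56))

for n in (2, 10, 100, 10_000):
    print(n, emb_size(n))
```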
FILE: pytabkit/models/nn_models/models.py
class BlockFactory (line 30) | class BlockFactory(FitterFactory):
method __init__ (line 31) | def __init__(self, out_features: int, block_str: str = 'w-b-a', **conf...
method _create_transform (line 39) | def _create_transform(self, tensor_infos):
function smooth_clip_func (line 71) | def smooth_clip_func(x, max_abs_value: float = 3.0):
function tanh_clip_func (line 75) | def tanh_clip_func(x):
class PreprocessingFactory (line 79) | class PreprocessingFactory(FitterFactory):
method __init__ (line 80) | def __init__(self, **config):
method _create (line 84) | def _create(self, tensor_infos: Dict[str, TensorInfo]) -> Fitter:
class NNFactory (line 157) | class NNFactory(FitterFactory):
method __init__ (line 158) | def __init__(self, **config):
method _create_transform (line 168) | def _create_transform(self, tensor_infos: Dict[str, TensorInfo]) -> Tu...
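Note: a hedged reading of BlockFactory's block_str parameter above: given the WeightFitter, BiasFitter, and ActivationFactory classes elsewhere in this package, 'w-b-a' plausibly spells out a weight -> bias -> activation block. The toy parser below is an illustration under that assumption, not the library's parsing code; the stand-in ops are hypothetical.

```python
import torch

OPS = {
    'w': lambda x: x @ torch.randn(x.shape[-1], x.shape[-1]),  # stand-in linear map
    'b': lambda x: x + 0.1,                                    # stand-in bias
    'a': torch.relu,                                           # stand-in activation
}

def apply_block(x: torch.Tensor, block_str: str = 'w-b-a') -> torch.Tensor:
    # Apply the ops named in block_str, left to right.
    for op in block_str.split('-'):
        x = OPS[op](x)
    return x

print(apply_block(torch.randn(2, 4)).shape)  # torch.Size([2, 4])
```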
FILE: pytabkit/models/nn_models/nn.py
class WeightFitter (line 16) | class WeightFitter(Fitter):
method __init__ (line 17) | def __init__(self, out_features, **config):
method get_n_params (line 46) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 49) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 52) | def _fit(self, ds: DictDataset):
class BiasFitter (line 192) | class BiasFitter(Fitter):
method __init__ (line 193) | def __init__(self, **config):
method get_n_params (line 210) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method heplus_bias (line 213) | def heplus_bias(self, x, n_simplex):
method _fit (line 222) | def _fit(self, ds: DictDataset):
class ScaleFitter (line 279) | class ScaleFitter(Fitter):
method __init__ (line 280) | def __init__(self, **config):
method get_n_params (line 290) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 293) | def _fit(self, ds: DictDataset):
class ScaleFactory (line 318) | class ScaleFactory(FitterFactory):
method __init__ (line 319) | def __init__(self, **config):
method _create (line 323) | def _create(self, tensor_infos: Dict[str, TensorInfo]) -> Fitter:
class DropoutLayer (line 327) | class DropoutLayer(Layer):
method __init__ (line 328) | def __init__(self):
method forward_cont (line 332) | def forward_cont(self, x):
class DropoutFitter (line 339) | class DropoutFitter(Fitter):
method __init__ (line 340) | def __init__(self):
method _fit (line 343) | def _fit(self, ds: DictDataset) -> Layer:
class NoiseLayer (line 347) | class NoiseLayer(Layer):
method __init__ (line 348) | def __init__(self):
method forward_cont (line 352) | def forward_cont(self, x):
class NoiseFitter (line 359) | class NoiseFitter(Fitter):
method __init__ (line 360) | def __init__(self, **config):
method _fit (line 363) | def _fit(self, ds: DictDataset) -> Layer:
class ClampLayer (line 370) | class ClampLayer(Layer):
method __init__ (line 371) | def __init__(self, low: Variable, high: Variable):
method forward_cont (line 376) | def forward_cont(self, x):
method _stack (line 382) | def _stack(self, layers):
class ClampOutputFactory (line 387) | class ClampOutputFactory(Fitter, FitterFactory):
method __init__ (line 388) | def __init__(self, **config):
method get_n_params (line 392) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 395) | def _fit(self, ds: DictDataset) -> Layer:
class NormalizeOutputLayer (line 401) | class NormalizeOutputLayer(Layer):
method __init__ (line 402) | def __init__(self, mean: Variable, std: Variable):
method forward_tensors (line 407) | def forward_tensors(self, tensors):
method _stack (line 416) | def _stack(self, layers):
class NormalizeOutputFactory (line 421) | class NormalizeOutputFactory(Fitter, FitterFactory):
method __init__ (line 422) | def __init__(self, **config):
method get_n_params (line 425) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 428) | def _fit(self, ds: DictDataset) -> Layer:
class NormWeightLayer (line 434) | class NormWeightLayer(Layer):
method __init__ (line 435) | def __init__(self, weight: Variable, factor: float, fitter: Fitter, tr...
method forward_cont (line 441) | def forward_cont(self, x):
method _stack (line 444) | def _stack(self, layers):
class FixedScaleFactory (line 451) | class FixedScaleFactory(Fitter, FitterFactory):
method __init__ (line 452) | def __init__(self, scale: torch.Tensor):
method _fit (line 456) | def _fit(self, ds: DictDataset) -> Layer:
class FeatureImportanceFactory (line 460) | class FeatureImportanceFactory(Fitter, FitterFactory):
method __init__ (line 461) | def __init__(self):
method _fit (line 464) | def _fit(self, ds: DictDataset) -> Layer:
class FixedWeightFactory (line 469) | class FixedWeightFactory(Fitter, FitterFactory):
method __init__ (line 470) | def __init__(self):
method _fit (line 473) | def _fit(self, ds: DictDataset) -> Layer:
class RFFeatureImportanceFactory (line 478) | class RFFeatureImportanceFactory(Fitter, FitterFactory):
method __init__ (line 479) | def __init__(self):
method _fit (line 482) | def _fit(self, ds: DictDataset) -> Layer:
class PLREmbeddingsFactory (line 501) | class PLREmbeddingsFactory(Fitter, FitterFactory):
method __init__ (line 503) | def __init__(self, plr_sigma: float = 1.0, plr_hidden_1: int = 8, plr_...
method get_n_params (line 521) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method get_n_forward (line 531) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 546) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method _fit (line 551) | def _fit(self, ds: DictDataset) -> Layer:
class PLREmbeddingsLayer (line 594) | class PLREmbeddingsLayer(Layer):
method __init__ (line 597) | def __init__(self, fitter: Fitter, weight_1: Variable, weight_2: Varia...
method forward_cont (line 606) | def forward_cont(self, x):
method _stack (line 630) | def _stack(self, layers):
class PLREmbeddingsLayerCosBias (line 639) | class PLREmbeddingsLayerCosBias(Layer):
method __init__ (line 642) | def __init__(self, fitter: Fitter, weight_1: Variable, bias_1: Variable,
method forward_cont (line 653) | def forward_cont(self, x):
method _stack (line 679) | def _stack(self, layers):
class PeriodicEmbeddingsFactory (line 689) | class PeriodicEmbeddingsFactory(Fitter, FitterFactory):
method __init__ (line 691) | def __init__(self, periodic_emb_sigma: float = 1.0, periodic_emb_dim: ...
method get_n_params (line 702) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method get_n_forward (line 712) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 717) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]) ->...
method _fit (line 722) | def _fit(self, ds: DictDataset) -> Layer:
class PeriodicEmbeddingsLayerSinCos (line 745) | class PeriodicEmbeddingsLayerSinCos(Layer):
method __init__ (line 748) | def __init__(self, fitter: Fitter, weight: Variable, periodic_emb_dens...
method forward_cont (line 753) | def forward_cont(self, x):
method _stack (line 769) | def _stack(self, layers):
class ToSoftLabelLayer (line 775) | class ToSoftLabelLayer(Layer):
method __init__ (line 776) | def __init__(self, y_tensor_info, fitter: Fitter):
method forward_tensors (line 780) | def forward_tensors(self, tensors):
class ToSoftLabelFitter (line 798) | class ToSoftLabelFitter(Fitter):
method __init__ (line 799) | def __init__(self):
method forward_tensor_infos (line 802) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 809) | def _fit(self, ds: DictDataset) -> Layer:
class LabelSmoothingLayer (line 813) | class LabelSmoothingLayer(Layer):
method __init__ (line 815) | def __init__(self, ls_dist: Variable):
method forward_tensors (line 820) | def forward_tensors(self, tensors):
method _stack (line 832) | def _stack(self, layers):
class LabelSmoothingFitter (line 836) | class LabelSmoothingFitter(Fitter):
method __init__ (line 837) | def __init__(self, use_ls_prior=False, **config):
method _fit (line 847) | def _fit(self, ds: DictDataset) -> Layer:
class LabelSmoothingFactory (line 860) | class LabelSmoothingFactory(FitterFactory):
method __init__ (line 861) | def __init__(self, **config):
method _create (line 865) | def _create(self, tensor_infos) -> Fitter:
class StochasticLabelNoiseLayer (line 873) | class StochasticLabelNoiseLayer(Layer):
method __init__ (line 874) | def __init__(self):
method forward_tensors (line 878) | def forward_tensors(self, tensors):
class StochasticLabelNoiseFitter (line 886) | class StochasticLabelNoiseFitter(Fitter):
method __init__ (line 887) | def __init__(self):
method _fit (line 890) | def _fit(self, ds: DictDataset) -> Layer:
class StochasticLabelNoiseFactory (line 895) | class StochasticLabelNoiseFactory(FitterFactory):
method _create (line 896) | def _create(self, tensor_infos) -> Fitter:
class StochasticGateLayer (line 905) | class StochasticGateLayer(Layer):
method __init__ (line 906) | def __init__(self, mu: Variable):
method forward_cont (line 912) | def forward_cont(self, x):
method _stack (line 923) | def _stack(self, layers):
class StochasticGateFactory (line 927) | class StochasticGateFactory(Fitter, FitterFactory):
method __init__ (line 928) | def __init__(self):
method get_n_params (line 931) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method get_n_forward (line 934) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 938) | def _fit(self, ds: DictDataset) -> Layer:
class AntisymmetricInitializationFactory (line 944) | class AntisymmetricInitializationFactory(FitterFactory):
method __init__ (line 945) | def __init__(self, factory, **config):
method _create (line 950) | def _create(self, tensor_infos) -> Fitter:
class AntisymmetricInitializationFitter (line 959) | class AntisymmetricInitializationFitter(Fitter):
method __init__ (line 964) | def __init__(self, fitter: Fitter, **config):
method forward_tensor_infos (line 970) | def forward_tensor_infos(self, tensor_infos: Dict[str, TensorInfo]):
method get_n_params (line 973) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]):
method get_n_forward (line 976) | def get_n_forward(self, tensor_infos: Dict[str, TensorInfo]):
method _fit (line 979) | def _fit(self, ds: DictDataset) -> Layer:
method __str__ (line 992) | def __str__(self):
class SubtractionLayer (line 997) | class SubtractionLayer(Layer):
method __init__ (line 998) | def __init__(self, layer1: Layer, layer2: Layer):
method forward_tensor_infos (line 1003) | def forward_tensor_infos(self, tensor_infos):
method forward_tensors (line 1007) | def forward_tensors(self, tensors):
method _stack (line 1015) | def _stack(self, layers):
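Note: a sketch of the PLR ("periodic-linear-ReLU") numerical embedding of Gorishniy et al. (2022), which PLREmbeddingsFactory/PLREmbeddingsLayer above implement; exact weight shapes, initialization, and the cos-bias variant in pytabkit may differ. All tensor names below are assumptions for the example.

```python
import math
import torch

def plr_embed(x, freq, w2, b2):
    # x: (batch, n_feat); freq: (n_feat, k); w2: (n_feat, 2k, d); b2: (n_feat, d)
    z = 2 * math.pi * x.unsqueeze(-1) * freq             # (batch, n_feat, k)
    z = torch.cat([torch.sin(z), torch.cos(z)], dim=-1)  # periodic features
    z = torch.einsum('bfk,fkd->bfd', z, w2) + b2         # per-feature linear layer
    return torch.relu(z)                                 # (batch, n_feat, d)

b, f, k, d = 32, 5, 8, 16
out = plr_embed(torch.randn(b, f), torch.randn(f, k),
                torch.randn(f, 2 * k, d), torch.randn(f, d))
print(out.shape)  # torch.Size([32, 5, 16])
```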
FILE: pytabkit/models/nn_models/pipeline.py
class ReplaceMissingContLayer (line 17) | class ReplaceMissingContLayer(Layer):
method __init__ (line 18) | def __init__(self, means: Variable):
method forward_cont (line 24) | def forward_cont(self, x):
method _stack (line 27) | def _stack(self, layers: List['ReplaceMissingContLayer']):
class MeanReplaceMissingContFactory (line 31) | class MeanReplaceMissingContFactory(Fitter, FitterFactory):
method __init__ (line 32) | def __init__(self, trainable=False, **config):
method get_n_params (line 36) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 39) | def _fit(self, ds: DictDataset) -> Layer:
class MeanCenterFactory (line 50) | class MeanCenterFactory(Fitter, FitterFactory):
method __init__ (line 51) | def __init__(self, trainable=False, **config):
method get_n_params (line 55) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 58) | def _fit(self, ds: DictDataset) -> Layer:
class MedianCenterFactory (line 65) | class MedianCenterFactory(Fitter, FitterFactory):
method __init__ (line 66) | def __init__(self, median_center_trainable=False, **config):
method get_n_params (line 70) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 73) | def _fit(self, ds: DictDataset) -> Layer:
class L2NormalizeFactory (line 83) | class L2NormalizeFactory(Fitter, FitterFactory):
method __init__ (line 84) | def __init__(self, trainable=False, l2_normalize_eps=1e-8, **config):
method get_n_params (line 89) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 92) | def _fit(self, ds: DictDataset) -> Layer:
class L1NormalizeFactory (line 100) | class L1NormalizeFactory(Fitter, FitterFactory):
method __init__ (line 101) | def __init__(self, trainable=False, eps=1e-8, **config):
method get_n_params (line 106) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 109) | def _fit(self, ds: DictDataset) -> Layer:
class MinMaxScaleFactory (line 116) | class MinMaxScaleFactory(Fitter, FitterFactory):
method __init__ (line 117) | def __init__(self, trainable=False, eps=1e-8, **config):
method get_n_params (line 122) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 125) | def _fit(self, ds: DictDataset) -> Layer:
class RobustScaleFactory (line 137) | class RobustScaleFactory(Fitter, FitterFactory):
method __init__ (line 138) | def __init__(self, robust_scale_trainable=False, robust_scale_eps=1e-3...
method get_n_params (line 143) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 146) | def _fit(self, ds: DictDataset) -> Layer:
class RobustScaleV2Factory (line 160) | class RobustScaleV2Factory(Fitter, FitterFactory):
method __init__ (line 161) | def __init__(self, robust_scale_trainable=False, **config):
method get_n_params (line 165) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 168) | def _fit(self, ds: DictDataset) -> Layer:
class GlobalScaleNormalizeFactory (line 183) | class GlobalScaleNormalizeFactory(Fitter, FitterFactory):
method __init__ (line 184) | def __init__(self, global_scale_factor=1.0, **config):
method get_n_params (line 188) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method _fit (line 191) | def _fit(self, ds: DictDataset) -> Layer:
class ThermometerCodingLayer (line 199) | class ThermometerCodingLayer(Layer):
method __init__ (line 200) | def __init__(self, centers: Variable, scale: float, fitter: Fitter):
method forward_cont (line 205) | def forward_cont(self, x):
method _stack (line 209) | def _stack(self, layers):
class ThermometerCodingFactory (line 213) | class ThermometerCodingFactory(Fitter, FitterFactory):
method __init__ (line 214) | def __init__(self, tc_low=-1.0, tc_high=1.0, tc_num=3, tc_scale=1.0, *...
method get_n_params (line 221) | def get_n_params(self, tensor_infos: Dict[str, TensorInfo]) -> int:
method forward_tensor_infos (line 224) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 228) | def _fit(self, ds: DictDataset) -> Layer:
class CircleCodingLayer (line 236) | class CircleCodingLayer(Layer):
method __init__ (line 237) | def __init__(self, scale: float, fitter: Fitter):
method forward_cont (line 241) | def forward_cont(self, x):
method _stack (line 246) | def _stack(self, layers):
class CircleCodingFactory (line 250) | class CircleCodingFactory(Fitter, FitterFactory):
method __init__ (line 251) | def __init__(self, circle_coding_scale=1.0, **config):
method forward_tensor_infos (line 255) | def forward_tensor_infos(self, tensor_infos):
method _fit (line 259) | def _fit(self, ds: DictDataset) -> Layer:
function apply_tfms_rec (line 265) | def apply_tfms_rec(tfms: Union[BaseEstimator, List], x: torch.Tensor):
class SklearnTransformLayer (line 272) | class SklearnTransformLayer(Layer):
method __init__ (line 273) | def __init__(self, tfms: Union[BaseEstimator, List], fitter: Fitter):
method forward_cont (line 277) | def forward_cont(self, x):
method _stack (line 280) | def _stack(self, layers):
class SklearnTransformFactory (line 284) | class SklearnTransformFactory(Fitter, FitterFactory):
method __init__ (line 285) | def __init__(self, tfm: BaseEstimator, **config):
method _fit (line 289) | def _fit(self, ds: DictDataset) -> Layer:
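Note: what RobustScaleFactory above presumably computes, inferred from its name and the robust_scale_eps parameter (an assumption, not the library code): center by the median and scale by an interquartile-style range, so outliers barely affect the statistics.

```python
import torch

def robust_scale(x: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    # Per-column median and quantile range; eps guards constant columns.
    median = x.median(dim=0).values
    q75 = x.quantile(0.75, dim=0)
    q25 = x.quantile(0.25, dim=0)
    return (x - median) / (q75 - q25 + eps)

x = torch.randn(200, 3) * torch.tensor([1.0, 10.0, 0.1])
print(robust_scale(x).std(dim=0))  # roughly comparable scales per column
```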
FILE: pytabkit/models/nn_models/rtdl_num_embeddings.py
function _check_input_shape (line 36) | def _check_input_shape(x: Tensor, expected_n_features: int) -> None:
class LinearEmbeddings (line 48) | class LinearEmbeddings(nn.Module):
method __init__ (line 67) | def __init__(self, n_features: int, d_embedding: int) -> None:
method reset_parameters (line 83) | def reset_parameters(self) -> None:
method forward (line 88) | def forward(self, x: Tensor) -> Tensor:
class LinearReLUEmbeddings (line 94) | class LinearReLUEmbeddings(nn.Sequential):
method __init__ (line 114) | def __init__(self, n_features: int, d_embedding: int = 32) -> None:
class _Periodic (line 128) | class _Periodic(nn.Module):
method __init__ (line 137) | def __init__(self, n_features: int, k: int, sigma: float) -> None:
method reset_parameters (line 146) | def reset_parameters(self):
method forward (line 154) | def forward(self, x: Tensor) -> Tensor:
class _NLinear (line 164) | class _NLinear(nn.Module):
method __init__ (line 171) | def __init__(
method reset_parameters (line 179) | def reset_parameters(self):
method forward (line 186) | def forward(self, x: torch.Tensor) -> torch.Tensor:
class PeriodicEmbeddings (line 203) | class PeriodicEmbeddings(nn.Module):
method __init__ (line 234) | def __init__(
method forward (line 272) | def forward(self, x: Tensor) -> Tensor:
function _check_bins (line 281) | def _check_bins(bins: list[Tensor]) -> None:
function compute_bins (line 319) | def compute_bins(
class _PiecewiseLinearEncodingImpl (line 500) | class _PiecewiseLinearEncodingImpl(nn.Module):
method __init__ (line 562) | def __init__(self, bins: list[Tensor]) -> None:
method get_max_n_bins (line 624) | def get_max_n_bins(self) -> int:
method forward (line 627) | def forward(self, x: Tensor) -> Tensor:
class PiecewiseLinearEncoding (line 655) | class PiecewiseLinearEncoding(nn.Module):
method __init__ (line 671) | def __init__(self, bins: list[Tensor]) -> None:
method forward (line 679) | def forward(self, x: Tensor) -> Tensor:
class PiecewiseLinearEmbeddings (line 685) | class PiecewiseLinearEmbeddings(nn.Module):
method __init__ (line 694) | def __init__(
method forward (line 748) | def forward(self, x: Tensor) -> Tensor:
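Note: a minimal usage sketch for the vendored rtdl LinearEmbeddings above, whose constructor signature is shown in full; the (batch, n_features, d_embedding) output shape follows the upstream rtdl_num_embeddings package and is assumed to carry over here.

```python
import torch
from pytabkit.models.nn_models.rtdl_num_embeddings import LinearEmbeddings

emb = LinearEmbeddings(n_features=3, d_embedding=8)
x = torch.randn(16, 3)   # batch of continuous features
print(emb(x).shape)      # expected: torch.Size([16, 3, 8])
```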
FILE: pytabkit/models/nn_models/rtdl_resnet.py
function reglu (line 33) | def reglu(x: Tensor) -> Tensor:
function geglu (line 38) | def geglu(x: Tensor) -> Tensor:
function get_activation_fn (line 43) | def get_activation_fn(name: str) -> ty.Callable[[Tensor], Tensor]:
function get_nonglu_activation_fn (line 55) | def get_nonglu_activation_fn(name: str) -> ty.Callable[[Tensor], Tensor]:
function print_but_serializable (line 65) | def print_but_serializable(*args, **kwargs):
class RTDL_MLP (line 73) | class RTDL_MLP(nn.Module):
method __init__ (line 75) | def __init__(
method forward (line 151) | def forward(self, x):
class ResNet (line 187) | class ResNet(nn.Module):
method __init__ (line 188) | def __init__(
method forward (line 257) | def forward(self, x) -> Tensor:
class Tokenizer (line 298) | class Tokenizer(nn.Module):
method __init__ (line 301) | def __init__(
method n_tokens (line 340) | def n_tokens(self) -> int:
method forward (line 345) | def forward(self, x_num: Tensor, x_cat: ty.Optional[Tensor]) -> Tensor:
class MultiheadAttention (line 373) | class MultiheadAttention(nn.Module):
method __init__ (line 374) | def __init__(
method _reshape (line 397) | def _reshape(self, x: Tensor) -> Tensor:
method forward (line 406) | def forward(
class FT_Transformer (line 444) | class FT_Transformer(nn.Module):
method __init__ (line 453) | def __init__(
method _get_kv_compressions (line 539) | def _get_kv_compressions(self, layer):
method _start_residual (line 550) | def _start_residual(self, x, layer, norm_idx):
method _end_residual (line 558) | def _end_residual(self, x, x_residual, layer, norm_idx):
method forward (line 566) | def forward(self, x) -> Tensor:
class InputShapeSetterResnet (line 610) | class InputShapeSetterResnet(skorch.callbacks.Callback):
method __init__ (line 611) | def __init__(
method on_train_begin (line 619) | def on_train_begin(self, net, X, y):
class LearningRateLogger (line 657) | class LearningRateLogger(Callback):
method on_epoch_begin (line 658) | def on_epoch_begin(self, net, dataset_train=None, dataset_valid=None, ...
class UniquePrefixCheckpoint (line 667) | class UniquePrefixCheckpoint(Checkpoint):
method initialize (line 676) | def initialize(self):
method on_train_end (line 683) | def on_train_end(self, net, **kwargs):
class MyCustomError (line 719) | class MyCustomError(Exception):
class EarlyStoppingCustomError (line 723) | class EarlyStoppingCustomError(EarlyStopping):
method on_epoch_end (line 724) | def on_epoch_end(self, net, **kwargs):
class NeuralNetRegressorWrapped (line 742) | class NeuralNetRegressorWrapped(NeuralNetRegressor):
method __init__ (line 743) | def __init__(self, *args, **kwargs):
method set_categorical_indicator (line 750) | def set_categorical_indicator(self, categorical_indicator):
method set_predict_mean (line 753) | def set_predict_mean(self, predict_mean):
method set_y_train_mean (line 756) | def set_y_train_mean(self, y_train_mean):
method get_default_callbacks (line 759) | def get_default_callbacks(self):
method fit (line 765) | def fit(self, X, y):
method predict (line 771) | def predict(self, X):
method partial_fit (line 781) | def partial_fit(self, X, y=None, classes=None, **fit_params):
class NeuralNetClassifierWrapped (line 795) | class NeuralNetClassifierWrapped(NeuralNetClassifier):
method __init__ (line 796) | def __init__(self, *args, **kwargs):
method set_categorical_indicator (line 801) | def set_categorical_indicator(self, categorical_indicator):
method set_n_classes (line 804) | def set_n_classes(self, n_classes):
method fit (line 807) | def fit(self, X, y):
method get_default_callbacks (line 811) | def get_default_callbacks(self):
method partial_fit (line 821) | def partial_fit(self, X, y=None, classes=None, **fit_params):
function initialize_optimizer_ft_transformer (line 836) | def initialize_optimizer_ft_transformer(self, triggered_directly=None):
class NeuralNetClassifierCustomOptim (line 875) | class NeuralNetClassifierCustomOptim(NeuralNetClassifierWrapped):
method initialize_optimizer (line 876) | def initialize_optimizer(self, triggered_directly=None):
class NeuralNetRegressorCustomOptim (line 879) | class NeuralNetRegressorCustomOptim(NeuralNetRegressorWrapped):
method initialize_optimizer (line 880) | def initialize_optimizer(self, triggered_directly=None):
function mse_constant_predictor (line 883) | def mse_constant_predictor(model, X, y):
function create_regressor_skorch (line 887) | def create_regressor_skorch(
function create_classifier_skorch (line 1002) | def create_classifier_skorch(
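Note: the standard rtdl definitions of reglu/geglu, which the functions above follow in the upstream code: split the last dimension in half and gate one half with the (ReLU- or GELU-activated) other half, so the output has half the input's last dimension.

```python
import torch
import torch.nn.functional as F
from torch import Tensor

def reglu(x: Tensor) -> Tensor:
    a, b = x.chunk(2, dim=-1)
    return a * F.relu(b)

def geglu(x: Tensor) -> Tensor:
    a, b = x.chunk(2, dim=-1)
    return a * F.gelu(b)

print(reglu(torch.randn(4, 6)).shape)  # torch.Size([4, 3])
```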
FILE: pytabkit/models/nn_models/tabm.py
function init_rsqrt_uniform_ (line 22) | def init_rsqrt_uniform_(x: Tensor, d: int) -> Tensor:
function init_random_signs_ (line 29) | def init_random_signs_(x: Tensor) -> Tensor:
class NLinear (line 36) | class NLinear(nn.Module):
method __init__ (line 53) | def __init__(
method reset_parameters (line 61) | def reset_parameters(self):
method forward (line 67) | def forward(self, x: torch.Tensor) -> torch.Tensor:
class OneHotEncoding0d (line 79) | class OneHotEncoding0d(nn.Module):
method __init__ (line 83) | def __init__(self, cardinalities: List[int]) -> None:
method forward (line 87) | def forward(self, x: Tensor) -> Tensor:
class ScaleEnsemble (line 114) | class ScaleEnsemble(nn.Module):
method __init__ (line 115) | def __init__(
method reset_parameters (line 127) | def reset_parameters(self) -> None:
method forward (line 137) | def forward(self, x: Tensor) -> Tensor:
class LinearEfficientEnsemble (line 142) | class LinearEfficientEnsemble(nn.Module):
method __init__ (line 168) | def __init__(
method reset_parameters (line 220) | def reset_parameters(self):
method forward (line 243) | def forward(self, x: Tensor) -> Tensor:
class MLP (line 260) | class MLP(nn.Module):
method __init__ (line 261) | def __init__(
method forward (line 286) | def forward(self, x: Tensor) -> Tensor:
function make_efficient_ensemble (line 294) | def make_efficient_ensemble(module: nn.Module, EnsembleLayer, **kwargs) ...
function _get_first_ensemble_layer (line 318) | def _get_first_ensemble_layer(backbone: MLP) -> LinearEfficientEnsemble:
function _init_first_adapter (line 326) | def _init_first_adapter(
function make_module (line 375) | def make_module(type: str, *args, **kwargs) -> nn.Module:
function default_zero_weight_decay_condition (line 385) | def default_zero_weight_decay_condition(
function make_parameter_groups (line 397) | def make_parameter_groups(
class Model (line 433) | class Model(nn.Module):
method __init__ (line 436) | def __init__(
method forward (line 594) | def forward(
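Note: the TabM initialization helpers listed above, written out as in the upstream TabM code (assumed to be vendored unchanged here): a uniform init bounded by 1/sqrt(d), and random +/-1 signs used for the ensemble's multiplicative adapters.

```python
import torch
from torch import Tensor, nn

def init_rsqrt_uniform_(x: Tensor, d: int) -> Tensor:
    # Uniform in [-1/sqrt(d), 1/sqrt(d)], the usual linear-layer bound.
    bound = d ** -0.5
    return nn.init.uniform_(x, -bound, bound)

def init_random_signs_(x: Tensor) -> Tensor:
    # Each entry becomes -1 or +1 with equal probability.
    return x.bernoulli_(0.5).mul_(2).add_(-1)

w = torch.empty(4, 8)
init_rsqrt_uniform_(w, d=8)
print(init_random_signs_(torch.empty(4)))
```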
FILE: pytabkit/models/nn_models/tabr.py
class NTPLinearLayer (line 24) | class NTPLinearLayer(nn.Module):
method __init__ (line 25) | def __init__(self, in_features: int, out_features: int, bias: bool = T...
method forward (line 41) | def forward(self, x):
class ParametricMishActivationLayer (line 48) | class ParametricMishActivationLayer(nn.Module):
method __init__ (line 49) | def __init__(self, n_features: int, lr_factor: float = 1.0):
method f (line 54) | def f(self, x):
method forward (line 57) | def forward(self, x):
class ParametricReluActivationLayer (line 62) | class ParametricReluActivationLayer(nn.Module):
method __init__ (line 63) | def __init__(self, n_features: int, lr_factor: float = 1.0):
method f (line 68) | def f(self, x):
method forward (line 71) | def forward(self, x):
class ScalingLayer (line 76) | class ScalingLayer(nn.Module):
method __init__ (line 77) | def __init__(self, n_features: int, lr_factor: float = 6.0):
method forward (line 82) | def forward(self, x):
function bce_with_logits_and_label_smoothing (line 86) | def bce_with_logits_and_label_smoothing(inputs, *args, ls_eps: float, **...
class TabrModel (line 92) | class TabrModel(nn.Module):
method __init__ (line 93) | def __init__(
method reset_parameters (line 223) | def reset_parameters(self):
method _encode (line 232) | def _encode(self, x_: dict[str, Tensor]) -> tuple[Tensor, Tensor]:
method forward (line 265) | def forward(
function zero_wd_condition (line 403) | def zero_wd_condition(
class TabrLightning (line 418) | class TabrLightning(pl.LightningModule):
method __init__ (line 419) | def __init__(self, model, train_dataset,
method setup (line 461) | def setup(self, stage=None):
method get_Xy (line 474) | def get_Xy(self, part: str, idx) -> tuple[dict[str, Tensor], Tensor]:
method training_step (line 496) | def training_step(self, batch, batch_idx):
method validation_step (line 543) | def validation_step(self, batch, batch_idx):
method predict_step (line 585) | def predict_step(self, batch, batch_idx, dataloader_idx=None):
method configure_optimizers (line 614) | def configure_optimizers(self):
method train_dataloader (line 621) | def train_dataloader(self):
method val_dataloader (line 626) | def val_dataloader(self):
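Note: what bce_with_logits_and_label_smoothing above presumably does, inferred from its name and ls_eps parameter (an assumption, not the library code): shrink binary targets toward 0.5 before the usual BCE-with-logits loss.

```python
import torch
import torch.nn.functional as F

def bce_ls(logits, targets, ls_eps: float):
    # Label smoothing for binary targets: 0 -> eps/2, 1 -> 1 - eps/2.
    targets = targets * (1 - ls_eps) + 0.5 * ls_eps
    return F.binary_cross_entropy_with_logits(logits, targets)

logits = torch.randn(8)
targets = torch.randint(0, 2, (8,)).float()
print(bce_ls(logits, targets, ls_eps=0.1))
```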
FILE: pytabkit/models/nn_models/tabr_context_freeze.py
class TabrModelContextFreeze (line 29) | class TabrModelContextFreeze(nn.Module):
class ForwardOutput (line 30) | class ForwardOutput(NamedTuple):
method __init__ (line 35) | def __init__(
method reset_parameters (line 155) | def reset_parameters(self):
method _encode (line 164) | def _encode(self, x_: dict[str, Tensor]) -> tuple[Tensor, Tensor]:
method forward (line 197) | def forward(
function zero_wd_condition (line 310) | def zero_wd_condition(
class TabrLightningContextFreeze (line 325) | class TabrLightningContextFreeze(pl.LightningModule):
method __init__ (line 326) | def __init__(self, model, train_dataset,
method setup (line 370) | def setup(self, stage=None):
method get_Xy (line 383) | def get_Xy(self, part: str, idx) -> tuple[dict[str, Tensor], Tensor]:
method apply_model (line 405) | def apply_model(self, part, batch, batch_idx, training):
method training_step (line 447) | def training_step(self, batch, batch_idx):
method validation_step (line 517) | def validation_step(self, batch, batch_idx):
method predict_step (line 572) | def predict_step(self, batch, batch_idx, dataloader_idx=None):
method evaluate (line 612) | def evaluate(self, eval_batch_size: int, *, progress_bar: bool = False):
method configure_optimizers (line 677) | def configure_optimizers(self):
method train_dataloader (line 684) | def train_dataloader(self):
method val_dataloader (line 689) | def val_dataloader(self):
FILE: pytabkit/models/nn_models/tabr_lib.py
function _initialize_embeddings (line 27) | def _initialize_embeddings(weight: Tensor, d: Optional[int]) -> None:
function make_trainable_vector (line 34) | def make_trainable_vector(d: int) -> Parameter:
class OneHotEncoder (line 56) | class OneHotEncoder(nn.Module):
method __init__ (line 59) | def __init__(self, cardinalities: list[int]) -> None:
method forward (line 63) | def forward(self, x: torch.Tensor) -> torch.Tensor:
class CLSEmbedding (line 79) | class CLSEmbedding(nn.Module):
method __init__ (line 80) | def __init__(self, d_embedding: int) -> None:
method forward (line 84) | def forward(self, x: Tensor) -> Tensor:
class CatEmbeddings (line 90) | class CatEmbeddings(nn.Module):
method __init__ (line 91) | def __init__(
method reset_parameters (line 123) | def reset_parameters(self) -> None:
method forward (line 127) | def forward(self, x: Tensor) -> Tensor:
class LinearEmbeddings (line 134) | class LinearEmbeddings(nn.Module):
method __init__ (line 135) | def __init__(self, n_features: int, d_embedding: int, bias: bool = True):
method reset_parameters (line 141) | def reset_parameters(self) -> None:
method forward (line 146) | def forward(self, x: Tensor) -> Tensor:
class PeriodicEmbeddings (line 154) | class PeriodicEmbeddings(nn.Module):
method __init__ (line 155) | def __init__(
method forward (line 163) | def forward(self, x: Tensor) -> Tensor:
class NLinear (line 170) | class NLinear(nn.Module):
method __init__ (line 171) | def __init__(
method forward (line 184) | def forward(self, x):
class LREmbeddings (line 193) | class LREmbeddings(nn.Sequential):
method __init__ (line 196) | def __init__(self, n_features: int, d_embedding: int) -> None:
class PLREmbeddings (line 200) | class PLREmbeddings(nn.Sequential):
method __init__ (line 208) | def __init__(
class PBLDEmbeddings (line 227) | class PBLDEmbeddings(nn.Module):
method __init__ (line 228) | def __init__(self, n_features: int,
method forward (line 245) | def forward(self, x):
class MLP (line 272) | class MLP(nn.Module):
class Block (line 273) | class Block(nn.Module):
method __init__ (line 274) | def __init__(
method forward (line 288) | def forward(self, x: Tensor) -> Tensor:
method __init__ (line 293) | def __init__(
method d_out (line 321) | def d_out(self) -> int:
method forward (line 328) | def forward(self, x: Tensor) -> Tensor:
function register_module (line 347) | def register_module(key: str, f: Callable[..., nn.Module]) -> None:
function make_module (line 352) | def make_module(spec: ModuleSpec, *args, **kwargs) -> nn.Module:
function get_n_parameters (line 377) | def get_n_parameters(m: nn.Module):
function get_d_out (line 381) | def get_d_out(n_classes: Optional[int]) -> int:
function default_zero_weight_decay_condition (line 388) | def default_zero_weight_decay_condition(
function make_parameter_groups (line 404) | def make_parameter_groups(
function make_optimizer (line 442) | def make_optimizer(
function get_lr (line 460) | def get_lr(optimizer: optim.Optimizer) -> float:
function set_lr (line 464) | def set_lr(optimizer: optim.Optimizer, lr: float) -> None:
class Lambda (line 471) | class Lambda(torch.nn.Module):
method __init__ (line 512) | def __init__(self, fn: Callable[..., torch.Tensor], /, **kwargs) -> None:
method forward (line 561) | def forward(self, x: torch.Tensor) -> torch.Tensor:
function _make_index_batches (line 568) | def _make_index_batches(
function iter_batches (line 590) | def iter_batches(
function cat (line 775) | def cat(data: List[T], /, dim: int = 0) -> T:
function is_oom_exception (line 946) | def is_oom_exception(err: RuntimeError) -> bool:
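Note: a sketch of the convention behind make_parameter_groups / default_zero_weight_decay_condition above (the exact condition in tabr_lib may differ): biases and typically also normalization/embedding parameters go into a weight_decay=0 group, everything else keeps the optimizer's default decay.

```python
import torch
from torch import nn

def make_parameter_groups(model: nn.Module, zero_wd_names=('bias',)):
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        # Route parameters by name substring; 'bias' is the minimal condition.
        (no_decay if any(key in name for key in zero_wd_names) else decay).append(param)
    return [{'params': decay},
            {'params': no_decay, 'weight_decay': 0.0}]

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.AdamW(make_parameter_groups(model), lr=1e-3, weight_decay=1e-2)
print([len(g['params']) for g in opt.param_groups])  # [2, 2]
```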
FILE: pytabkit/models/optim/adopt.py
class ADOPT (line 36) | class ADOPT(Optimizer):
method __init__ (line 37) | def __init__(
method __setstate__ (line 98) | def __setstate__(self, state):
method _init_group (line 120) | def _init_group(
method step (line 188) | def step(self, closure=None):
function _single_tensor_adopt (line 244) | def _single_tensor_adopt(
function _multi_tensor_adopt (line 318) | def _multi_tensor_adopt(
function adopt (line 432) | def adopt(
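ADOPT subclasses torch.optim.Optimizer, so it should work as a drop-in replacement in a standard training step. A minimal sketch, assuming the constructor accepts the usual params and an lr keyword (the defaults are not visible in the listing):

    import torch
    from pytabkit.models.optim.adopt import ADOPT

    model = torch.nn.Linear(10, 1)
    opt = ADOPT(model.parameters(), lr=1e-3)  # lr kwarg assumed; defaults not shown above
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()       # step(closure=None) per the listing
    opt.zero_grad()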
FILE: pytabkit/models/optim/optimizers.py
class OptimizerBase (line 15) | class OptimizerBase(torch.optim.Optimizer):
method __init__ (line 16) | def __init__(self, opt, hyper_mappings, hp_manager: HyperparamManager):
method get_hyper_values (line 31) | def get_hyper_values(self, name, i, use_hyper_factor=True):
method step (line 38) | def step(self, closure=None, loss: Optional[torch.Tensor] = None):
method train (line 69) | def train(self):
method eval (line 74) | def eval(self):
method _opt_step_with_loss (line 79) | def _opt_step_with_loss(self, loss: Optional[torch.Tensor]):
method __getstate__ (line 82) | def __getstate__(self) -> Dict[str, Any]:
method __setstate__ (line 86) | def __setstate__(self, state: Dict[str, Any]) -> None:
class AdamOptimizer (line 91) | class AdamOptimizer(OptimizerBase):
method __init__ (line 92) | def __init__(self, param_groups, hp_manager):
class SchedulingAdamOptimizer (line 99) | class SchedulingAdamOptimizer(OptimizerBase):
method __init__ (line 100) | def __init__(self, param_groups, hp_manager):
class AMSGradOptimizer (line 107) | class AMSGradOptimizer(OptimizerBase):
method __init__ (line 108) | def __init__(self, param_groups, hp_manager):
class AdamaxOptimizer (line 115) | class AdamaxOptimizer(OptimizerBase):
method __init__ (line 116) | def __init__(self, param_groups, hp_manager):
class SGDOptimizer (line 123) | class SGDOptimizer(OptimizerBase):
method __init__ (line 124) | def __init__(self, param_groups, hp_manager):
class SFAdamOptimizer (line 129) | class SFAdamOptimizer(OptimizerBase):
method __init__ (line 130) | def __init__(self, param_groups, hp_manager: HyperparamManager):
class MoMoAdamOptimizer (line 140) | class MoMoAdamOptimizer(OptimizerBase):
method __init__ (line 141) | def __init__(self, param_groups, hp_manager: HyperparamManager):
method _opt_step_with_loss (line 148) | def _opt_step_with_loss(self, loss: Optional[torch.Tensor]):
class AdoptOptimizer (line 152) | class AdoptOptimizer(OptimizerBase):
method __init__ (line 153) | def __init__(self, param_groups, hp_manager: HyperparamManager):
function get_opt_class (line 161) | def get_opt_class(opt_name):
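These wrappers adapt several torch optimizers to one interface driven by a HyperparamManager (see pytabkit/models/training/coord.py below). Two listed details stand out: step accepts an optional loss tensor (consumed by loss-aware optimizers such as MoMoAdam via _opt_step_with_loss), and train()/eval() exist (relevant for schedule-free optimizers such as SFAdam). A rough sketch under those assumptions; the param_groups format and the HyperparamManager config keys are guesses, not taken from the listing:

    import torch
    from pytabkit.models.optim.optimizers import AdamOptimizer
    from pytabkit.models.training.coord import HyperparamManager

    model = torch.nn.Linear(10, 1)
    hp_manager = HyperparamManager(lr=1e-3)                # config keys are an assumption
    param_groups = [{'params': list(model.parameters())}]  # torch-style groups; format assumed
    opt = AdamOptimizer(param_groups, hp_manager)          # (param_groups, hp_manager) per the listing
    opt.train()                                            # listed; relevant for schedule-free variants
    loss = model(torch.randn(8, 10)).square().mean()
    loss.backward()
    opt.step(loss=loss)  # step(closure=None, loss=None); the loss is used by e.g. MoMoAdam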
FILE: pytabkit/models/optim/scheduling_adam.py
class SchedulingAdam (line 7) | class SchedulingAdam(Optimizer):
method __init__ (line 35) | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
method __setstate__ (line 51) | def __setstate__(self, state):
method step (line 57) | def step(self, closure=None):
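Here the constructor defaults are visible in the listing, so construction can be shown directly; whatever in-step scheduling the name implies is not visible from the signatures alone. A minimal sketch:

    import torch
    from pytabkit.models.optim.scheduling_adam import SchedulingAdam

    model = torch.nn.Linear(10, 1)
    # Defaults exactly as listed: lr=1e-3, betas=(0.9, 0.999), eps=1e-8
    opt = SchedulingAdam(model.parameters(), lr=1e-3)
    loss = model(torch.randn(4, 10)).square().mean()
    loss.backward()
    opt.step()  # step(closure=None) per the listing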
FILE: pytabkit/models/sklearn/default_params.py
class DefaultParams (line 6) | class DefaultParams:
FILE: pytabkit/models/sklearn/sklearn_base.py
function to_df (line 30) | def to_df(x) -> pd.DataFrame:
function to_normal_type (line 39) | def to_normal_type(x) -> Any:
function concat_arrays (line 45) | def concat_arrays(x1, x2) -> Any:
function check_X_y_wrapper (line 53) | def check_X_y_wrapper(*args, **kwargs):
function check_array_wrapper (line 66) | def check_array_wrapper(*args, **kwargs):
class AlgInterfaceEstimator (line 79) | class AlgInterfaceEstimator(BaseEstimator):
method _create_alg_interface (line 84) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 88) | def _supports_multioutput(self) -> bool:
method _supports_single_class (line 92) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 97) | def _supports_single_sample(self) -> bool:
method _non_deterministic_tag (line 100) | def _non_deterministic_tag(self) -> bool:
method _is_classification (line 103) | def _is_classification(self) -> bool:
method _get_default_params (line 106) | def _get_default_params(self) -> Dict[str, Any]:
method _allowed_device_names (line 111) | def _allowed_device_names(self) -> List[str]:
method _more_tags (line 115) | def _more_tags(self):
method __sklearn_tags__ (line 118) | def __sklearn_tags__(self):
method get_config (line 123) | def get_config(self) -> Dict[str, Any]:
method fit (line 142) | def fit(self, X, y, X_val: Optional = None, y_val: Optional = None, va...
method _predict_raw (line 462) | def _predict_raw(self, X) -> torch.Tensor:
method to (line 486) | def to(self, device: str) -> None:
class AlgInterfaceClassifier (line 496) | class AlgInterfaceClassifier(ClassifierMixin, AlgInterfaceEstimator):
method _is_classification (line 499) | def _is_classification(self) -> bool:
method predict_proba (line 502) | def predict_proba(self, X) -> np.ndarray:
method predict_proba_ensemble (line 508) | def predict_proba_ensemble(self, X) -> np.ndarray:
method predict (line 515) | def predict(self, X):
method predict_ensemble (line 533) | def predict_ensemble(self, X):
class AlgInterfaceRegressor (line 539) | class AlgInterfaceRegressor(RegressorMixin, AlgInterfaceEstimator):
method _is_classification (line 542) | def _is_classification(self) -> bool:
method _more_tags (line 545) | def _more_tags(self):
method __sklearn_tags__ (line 548) | def __sklearn_tags__(self):
method predict (line 554) | def predict(self, X):
method predict_ensemble (line 565) | def predict_ensemble(self, X):
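The key contract here is fit(X, y, X_val=None, y_val=None, ...): estimators can take an explicit validation split instead of carving one out of the training data. A minimal sketch using a concrete subclass from sklearn_interfaces.py (listed below); only arguments visible in the truncated signatures are used:

    import numpy as np
    from pytabkit import RealMLP_TD_Classifier  # re-exported at package level (see pytabkit/__init__.py)

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((200, 5)), rng.integers(0, 2, size=200)
    X_val, y_val = rng.standard_normal((50, 5)), rng.integers(0, 2, size=50)

    clf = RealMLP_TD_Classifier(device='cpu', random_state=0)  # per RealMLPConstructorMixin
    clf.fit(X, y, X_val=X_val, y_val=y_val)                    # explicit validation split
    proba = clf.predict_proba(X_val)                           # class probabilities, per the listing
    clf.to('cpu')                                              # move a fitted model between devices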
FILE: pytabkit/models/sklearn/sklearn_interfaces.py
class RealMLPConstructorMixin (line 43) | class RealMLPConstructorMixin:
method __init__ (line 44) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class RealMLP_TD_Classifier (line 343) | class RealMLP_TD_Classifier(RealMLPConstructorMixin, AlgInterfaceClassif...
method _get_default_params (line 348) | def _get_default_params(self):
method _create_alg_interface (line 351) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 354) | def _allowed_device_names(self) -> List[str]:
class RealMLP_TD_S_Classifier (line 358) | class RealMLP_TD_S_Classifier(RealMLPConstructorMixin, AlgInterfaceClass...
method _get_default_params (line 363) | def _get_default_params(self):
method _create_alg_interface (line 366) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 369) | def _allowed_device_names(self) -> List[str]:
class RealMLP_TD_Regressor (line 373) | class RealMLP_TD_Regressor(RealMLPConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 378) | def _get_default_params(self):
method _create_alg_interface (line 381) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 384) | def _allowed_device_names(self) -> List[str]:
class RealMLP_TD_S_Regressor (line 388) | class RealMLP_TD_S_Regressor(RealMLPConstructorMixin, AlgInterfaceRegres...
method _get_default_params (line 393) | def _get_default_params(self):
method _create_alg_interface (line 396) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 399) | def _allowed_device_names(self) -> List[str]:
class LGBMConstructorMixin (line 406) | class LGBMConstructorMixin:
method __init__ (line 407) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class LGBM_TD_Classifier (line 457) | class LGBM_TD_Classifier(LGBMConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 458) | def _get_default_params(self):
method _create_alg_interface (line 461) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class LGBM_D_Classifier (line 466) | class LGBM_D_Classifier(LGBMConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 467) | def _get_default_params(self):
method _create_alg_interface (line 470) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class LGBM_TD_Regressor (line 475) | class LGBM_TD_Regressor(LGBMConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 476) | def _get_default_params(self):
method _create_alg_interface (line 479) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 483) | def _supports_multioutput(self) -> bool:
class LGBM_D_Regressor (line 487) | class LGBM_D_Regressor(LGBMConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 488) | def _get_default_params(self):
method _create_alg_interface (line 491) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 495) | def _supports_multioutput(self) -> bool:
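The _supports_multioutput overrides on the LGBM regressors suggest they accept a 2D target. A hedged sketch, assuming those overrides return True and that shapes follow scikit-learn's multioutput convention (neither is visible in the listing):

    import numpy as np
    from pytabkit import LGBM_TD_Regressor

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    Y = rng.standard_normal((200, 3))        # 2D target: three outputs per sample
    reg = LGBM_TD_Regressor(random_state=0)  # kwargs per the truncated LGBMConstructorMixin signature
    reg.fit(X, Y)
    pred = reg.predict(X)                    # expected (200, 3) under sklearn's multioutput convention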
class XGBConstructorMixin (line 499) | class XGBConstructorMixin:
method __init__ (line 500) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class XGB_TD_Classifier (line 555) | class XGB_TD_Classifier(XGBConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 556) | def _get_default_params(self):
method _create_alg_interface (line 559) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 563) | def _allowed_device_names(self) -> List[str]:
class XGB_D_Classifier (line 567) | class XGB_D_Classifier(XGBConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 568) | def _get_default_params(self):
method _create_alg_interface (line 571) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 575) | def _allowed_device_names(self) -> List[str]:
class XGB_PBB_D_Classifier (line 579) | class XGB_PBB_D_Classifier(XGBConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 580) | def _get_default_params(self):
method _create_alg_interface (line 583) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 587) | def _allowed_device_names(self) -> List[str]:
class XGB_TD_Regressor (line 591) | class XGB_TD_Regressor(XGBConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 592) | def _get_default_params(self):
method _create_alg_interface (line 595) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 599) | def _allowed_device_names(self) -> List[str]:
method _supports_multioutput (line 602) | def _supports_multioutput(self) -> bool:
class XGB_D_Regressor (line 606) | class XGB_D_Regressor(XGBConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 607) | def _get_default_params(self):
method _create_alg_interface (line 610) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 614) | def _allowed_device_names(self) -> List[str]:
method _supports_multioutput (line 617) | def _supports_multioutput(self) -> bool:
class CatBoostConstructorMixin (line 621) | class CatBoostConstructorMixin:
method __init__ (line 622) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class CatBoost_TD_Classifier (line 677) | class CatBoost_TD_Classifier(CatBoostConstructorMixin, AlgInterfaceClass...
method _get_default_params (line 678) | def _get_default_params(self):
method _create_alg_interface (line 681) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_single_class (line 685) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 688) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 691) | def _allowed_device_names(self) -> List[str]:
class CatBoost_D_Classifier (line 695) | class CatBoost_D_Classifier(CatBoostConstructorMixin, AlgInterfaceClassi...
method _get_default_params (line 696) | def _get_default_params(self):
method _create_alg_interface (line 699) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_single_class (line 703) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 706) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 709) | def _allowed_device_names(self) -> List[str]:
class CatBoost_TD_Regressor (line 713) | class CatBoost_TD_Regressor(CatBoostConstructorMixin, AlgInterfaceRegres...
method _get_default_params (line 714) | def _get_default_params(self):
method _create_alg_interface (line 717) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 721) | def _supports_multioutput(self) -> bool:
method _supports_single_sample (line 724) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 727) | def _allowed_device_names(self) -> List[str]:
class CatBoost_D_Regressor (line 731) | class CatBoost_D_Regressor(CatBoostConstructorMixin, AlgInterfaceRegress...
method _get_default_params (line 732) | def _get_default_params(self):
method _create_alg_interface (line 735) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 739) | def _supports_multioutput(self) -> bool:
method _supports_single_sample (line 742) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 745) | def _allowed_device_names(self) -> List[str]:
class RFConstructorMixin (line 749) | class RFConstructorMixin:
method __init__ (line 750) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class RF_SKL_D_Classifier (line 786) | class RF_SKL_D_Classifier(RFConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 787) | def _get_default_params(self):
method _create_alg_interface (line 790) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class RF_SKL_D_Regressor (line 795) | class RF_SKL_D_Regressor(RFConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 796) | def _get_default_params(self):
method _create_alg_interface (line 799) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class MLPSKLConstructorMixin (line 804) | class MLPSKLConstructorMixin:
method __init__ (line 805) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class MLP_SKL_D_Classifier (line 823) | class MLP_SKL_D_Classifier(MLPSKLConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 824) | def _get_default_params(self):
method _create_alg_interface (line 827) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class MLP_SKL_D_Regressor (line 832) | class MLP_SKL_D_Regressor(MLPSKLConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 833) | def _get_default_params(self):
method _create_alg_interface (line 836) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class GBDTHPOConstructorMixin (line 843) | class GBDTHPOConstructorMixin:
method __init__ (line 844) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class XGB_HPO_Classifier (line 872) | class XGB_HPO_Classifier(GBDTHPOConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 873) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 876) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 886) | def _allowed_device_names(self) -> List[str]:
class XGB_HPO_TPE_Classifier (line 890) | class XGB_HPO_TPE_Classifier(GBDTHPOConstructorMixin, AlgInterfaceClassi...
method _get_default_params (line 891) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 896) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 900) | def _allowed_device_names(self) -> List[str]:
class XGB_HPO_Regressor (line 904) | class XGB_HPO_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 905) | def _get_default_params(self) -> Dict[str, Any]:
method _allowed_device_names (line 908) | def _allowed_device_names(self) -> List[str]:
method _create_alg_interface (line 911) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 921) | def _supports_multioutput(self) -> bool:
class XGB_HPO_TPE_Regressor (line 925) | class XGB_HPO_TPE_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRegress...
method _get_default_params (line 926) | def _get_default_params(self) -> Dict[str, Any]:
method _allowed_device_names (line 931) | def _allowed_device_names(self) -> List[str]:
method _create_alg_interface (line 934) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 938) | def _supports_multioutput(self) -> bool:
class LGBM_HPO_Classifier (line 942) | class LGBM_HPO_Classifier(GBDTHPOConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 943) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 946) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class LGBM_HPO_TPE_Classifier (line 957) | class LGBM_HPO_TPE_Classifier(GBDTHPOConstructorMixin, AlgInterfaceClass...
method _get_default_params (line 958) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 963) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
class LGBM_HPO_Regressor (line 968) | class LGBM_HPO_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 969) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 972) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 982) | def _supports_multioutput(self) -> bool:
class LGBM_HPO_TPE_Regressor (line 986) | class LGBM_HPO_TPE_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRegres...
method _get_default_params (line 987) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 992) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 996) | def _supports_multioutput(self) -> bool:
class CatBoost_HPO_Classifier (line 1000) | class CatBoost_HPO_Classifier(GBDTHPOConstructorMixin, AlgInterfaceClass...
method _get_default_params (line 1001) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 1004) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_single_class (line 1015) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1018) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 1021) | def _allowed_device_names(self) -> List[str]:
class CatBoost_HPO_TPE_Classifier (line 1025) | class CatBoost_HPO_TPE_Classifier(GBDTHPOConstructorMixin, AlgInterfaceC...
method _get_default_params (line 1026) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 1031) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_single_class (line 1035) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1038) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 1041) | def _allowed_device_names(self) -> List[str]:
class CatBoost_HPO_Regressor (line 1045) | class CatBoost_HPO_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRegres...
method _get_default_params (line 1046) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 1049) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 1060) | def _supports_multioutput(self) -> bool:
method _supports_single_sample (line 1063) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 1066) | def _allowed_device_names(self) -> List[str]:
class CatBoost_HPO_TPE_Regressor (line 1070) | class CatBoost_HPO_TPE_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRe...
method _get_default_params (line 1071) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 1076) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 1080) | def _supports_multioutput(self) -> bool:
method _supports_single_sample (line 1083) | def _supports_single_sample(self) -> bool:
method _allowed_device_names (line 1086) | def _allowed_device_names(self) -> List[str]:
class RF_HPO_Classifier (line 1090) | class RF_HPO_Classifier(GBDTHPOConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 1091) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 1094) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_single_class (line 1105) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1108) | def _supports_single_sample(self) -> bool:
class RF_HPO_Regressor (line 1112) | class RF_HPO_Regressor(GBDTHPOConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 1113) | def _get_default_params(self) -> Dict[str, Any]:
method _create_alg_interface (line 1116) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _supports_multioutput (line 1127) | def _supports_multioutput(self) -> bool:
method _supports_single_sample (line 1130) | def _supports_single_sample(self) -> bool:
class RealMLPHPOConstructorMixin (line 1134) | class RealMLPHPOConstructorMixin:
method __init__ (line 1135) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class RealMLP_HPO_Classifier (line 1224) | class RealMLP_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfaceCla...
method _get_default_params (line 1225) | def _get_default_params(self):
method _create_alg_interface (line 1228) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1236) | def _allowed_device_names(self) -> List[str]:
class RealMLP_HPO_Regressor (line 1240) | class RealMLP_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterfaceRegr...
method _get_default_params (line 1241) | def _get_default_params(self):
method _create_alg_interface (line 1244) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1252) | def _allowed_device_names(self) -> List[str]:
class ResnetConstructorMixin (line 1256) | class ResnetConstructorMixin:
method __init__ (line 1257) | def __init__(self,
class Resnet_RTDL_D_Classifier (line 1321) | class Resnet_RTDL_D_Classifier(ResnetConstructorMixin, AlgInterfaceClass...
method _get_default_params (line 1322) | def _get_default_params(self):
method _create_alg_interface (line 1325) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1329) | def _allowed_device_names(self) -> List[str]:
method _supports_single_class (line 1332) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1335) | def _supports_single_sample(self) -> bool:
method _non_deterministic_tag (line 1338) | def _non_deterministic_tag(self) -> bool:
class Resnet_RTDL_D_Regressor (line 1345) | class Resnet_RTDL_D_Regressor(ResnetConstructorMixin, AlgInterfaceRegres...
method _get_default_params (line 1346) | def _get_default_params(self):
method _create_alg_interface (line 1349) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1353) | def _allowed_device_names(self) -> List[str]:
method _supports_single_sample (line 1356) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 1359) | def _supports_multioutput(self) -> bool:
method _non_deterministic_tag (line 1362) | def _non_deterministic_tag(self) -> bool:
class FTTransformerConstructorMixin (line 1368) | class FTTransformerConstructorMixin:
method __init__ (line 1369) | def __init__(self,
class FTT_D_Classifier (line 1442) | class FTT_D_Classifier(FTTransformerConstructorMixin, AlgInterfaceClassi...
method _get_default_params (line 1443) | def _get_default_params(self):
method _create_alg_interface (line 1446) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1451) | def _allowed_device_names(self) -> List[str]:
method _supports_single_class (line 1454) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1457) | def _supports_single_sample(self) -> bool:
method _non_deterministic_tag (line 1460) | def _non_deterministic_tag(self) -> bool:
class FTT_D_Regressor (line 1467) | class FTT_D_Regressor(FTTransformerConstructorMixin, AlgInterfaceRegress...
method _get_default_params (line 1468) | def _get_default_params(self):
method _create_alg_interface (line 1471) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1476) | def _allowed_device_names(self) -> List[str]:
method _supports_single_sample (line 1479) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 1482) | def _supports_multioutput(self) -> bool:
method _non_deterministic_tag (line 1485) | def _non_deterministic_tag(self) -> bool:
class RTDL_MLPConstructorMixin (line 1491) | class RTDL_MLPConstructorMixin:
method __init__ (line 1492) | def __init__(self,
class MLP_RTDL_D_Classifier (line 1561) | class MLP_RTDL_D_Classifier(RTDL_MLPConstructorMixin, AlgInterfaceClassi...
method _get_default_params (line 1562) | def _get_default_params(self):
method _create_alg_interface (line 1565) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1569) | def _allowed_device_names(self) -> List[str]:
method _supports_single_class (line 1572) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1575) | def _supports_single_sample(self) -> bool:
method _non_deterministic_tag (line 1578) | def _non_deterministic_tag(self) -> bool:
class MLP_RTDL_D_Regressor (line 1585) | class MLP_RTDL_D_Regressor(RTDL_MLPConstructorMixin, AlgInterfaceRegress...
method _get_default_params (line 1586) | def _get_default_params(self):
method _create_alg_interface (line 1589) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1593) | def _allowed_device_names(self) -> List[str]:
method _supports_single_sample (line 1596) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 1599) | def _supports_multioutput(self) -> bool:
method _non_deterministic_tag (line 1602) | def _non_deterministic_tag(self) -> bool:
class MLP_PLR_D_Classifier (line 1608) | class MLP_PLR_D_Classifier(RTDL_MLPConstructorMixin, AlgInterfaceClassif...
method _get_default_params (line 1609) | def _get_default_params(self):
method _create_alg_interface (line 1612) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1616) | def _allowed_device_names(self) -> List[str]:
method _supports_single_class (line 1619) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1622) | def _supports_single_sample(self) -> bool:
method _non_deterministic_tag (line 1625) | def _non_deterministic_tag(self) -> bool:
class MLP_PLR_D_Regressor (line 1632) | class MLP_PLR_D_Regressor(RTDL_MLPConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 1633) | def _get_default_params(self):
method _create_alg_interface (line 1636) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1640) | def _allowed_device_names(self) -> List[str]:
method _supports_single_sample (line 1643) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 1646) | def _supports_multioutput(self) -> bool:
method _non_deterministic_tag (line 1649) | def _non_deterministic_tag(self) -> bool:
class TabrConstructorMixin (line 1655) | class TabrConstructorMixin:
method __init__ (line 1656) | def __init__(self,
class TabR_S_D_Classifier (line 1737) | class TabR_S_D_Classifier(TabrConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 1738) | def _get_default_params(self):
method _create_alg_interface (line 1741) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1745) | def _allowed_device_names(self) -> List[str]:
class TabR_S_D_Regressor (line 1749) | class TabR_S_D_Regressor(TabrConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 1750) | def _get_default_params(self):
method _create_alg_interface (line 1753) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1757) | def _allowed_device_names(self) -> List[str]:
class RealTabR_D_Classifier (line 1761) | class RealTabR_D_Classifier(TabrConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 1762) | def _get_default_params(self):
method _create_alg_interface (line 1765) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1769) | def _allowed_device_names(self) -> List[str]:
class RealTabR_D_Regressor (line 1773) | class RealTabR_D_Regressor(TabrConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 1774) | def _get_default_params(self):
method _create_alg_interface (line 1777) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1781) | def _allowed_device_names(self) -> List[str]:
class TabMConstructorMixin (line 1785) | class TabMConstructorMixin:
method __init__ (line 1786) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class TabM_D_Classifier (line 1912) | class TabM_D_Classifier(TabMConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 1913) | def _get_default_params(self):
method _create_alg_interface (line 1916) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1920) | def _allowed_device_names(self) -> List[str]:
method _supports_single_class (line 1923) | def _supports_single_class(self) -> bool:
method _supports_single_sample (line 1926) | def _supports_single_sample(self) -> bool:
class TabM_D_Regressor (line 1930) | class TabM_D_Regressor(TabMConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 1931) | def _get_default_params(self):
method _create_alg_interface (line 1934) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1938) | def _allowed_device_names(self) -> List[str]:
method _supports_multioutput (line 1941) | def _supports_multioutput(self) -> bool:
method _supports_single_sample (line 1944) | def _supports_single_sample(self) -> bool:
class TabM_HPO_Classifier (line 1948) | class TabM_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfaceClassi...
method _get_default_params (line 1952) | def _get_default_params(self):
method _create_alg_interface (line 1955) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1964) | def _allowed_device_names(self) -> List[str]:
class TabM_HPO_Regressor (line 1968) | class TabM_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterfaceRegress...
method _get_default_params (line 1972) | def _get_default_params(self):
method _create_alg_interface (line 1975) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 1984) | def _allowed_device_names(self) -> List[str]:
class XRFMConstructorMixin (line 1990) | class XRFMConstructorMixin:
method __init__ (line 1991) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class XRFM_D_Classifier (line 2115) | class XRFM_D_Classifier(XRFMConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 2116) | def _get_default_params(self):
method _create_alg_interface (line 2119) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2123) | def _allowed_device_names(self) -> List[str]:
method _non_deterministic_tag (line 2126) | def _non_deterministic_tag(self) -> bool:
method _supports_single_sample (line 2131) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 2134) | def _supports_multioutput(self) -> bool:
class XRFM_D_Regressor (line 2139) | class XRFM_D_Regressor(XRFMConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 2140) | def _get_default_params(self):
method _create_alg_interface (line 2143) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2147) | def _allowed_device_names(self) -> List[str]:
method _non_deterministic_tag (line 2150) | def _non_deterministic_tag(self) -> bool:
method _supports_single_sample (line 2155) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 2158) | def _supports_multioutput(self) -> bool:
class XRFMHPOConstructorMixin (line 2162) | class XRFMHPOConstructorMixin:
method __init__ (line 2163) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class XRFM_HPO_Classifier (line 2264) | class XRFM_HPO_Classifier(XRFMHPOConstructorMixin, AlgInterfaceClassifier):
method _get_default_params (line 2268) | def _get_default_params(self):
method _create_alg_interface (line 2271) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2280) | def _allowed_device_names(self) -> List[str]:
method _supports_single_sample (line 2283) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 2286) | def _supports_multioutput(self) -> bool:
class XRFM_HPO_Regressor (line 2290) | class XRFM_HPO_Regressor(XRFMHPOConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 2294) | def _get_default_params(self):
method _create_alg_interface (line 2297) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2306) | def _allowed_device_names(self) -> List[str]:
method _supports_single_sample (line 2309) | def _supports_single_sample(self) -> bool:
method _supports_multioutput (line 2312) | def _supports_multioutput(self) -> bool:
class MLP_RTDL_HPO_Classifier (line 2319) | class MLP_RTDL_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfaceCl...
method _get_default_params (line 2320) | def _get_default_params(self):
method _create_alg_interface (line 2323) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2332) | def _allowed_device_names(self) -> List[str]:
class MLP_RTDL_HPO_Regressor (line 2336) | class MLP_RTDL_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterfaceReg...
method _get_default_params (line 2337) | def _get_default_params(self):
method _create_alg_interface (line 2340) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2349) | def _allowed_device_names(self) -> List[str]:
class MLP_PLR_HPO_Classifier (line 2353) | class MLP_PLR_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfaceCla...
method _get_default_params (line 2354) | def _get_default_params(self):
method _create_alg_interface (line 2357) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2367) | def _allowed_device_names(self) -> List[str]:
class MLP_PLR_HPO_Regressor (line 2371) | class MLP_PLR_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterfaceRegr...
method _get_default_params (line 2372) | def _get_default_params(self):
method _create_alg_interface (line 2375) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2385) | def _allowed_device_names(self) -> List[str]:
class Resnet_RTDL_HPO_Classifier (line 2389) | class Resnet_RTDL_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfac...
method _get_default_params (line 2390) | def _get_default_params(self):
method _create_alg_interface (line 2393) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2402) | def _allowed_device_names(self) -> List[str]:
class Resnet_RTDL_HPO_Regressor (line 2406) | class Resnet_RTDL_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterface...
method _get_default_params (line 2407) | def _get_default_params(self):
method _create_alg_interface (line 2410) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2419) | def _allowed_device_names(self) -> List[str]:
class FTT_HPO_Classifier (line 2423) | class FTT_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfaceClassif...
method _get_default_params (line 2424) | def _get_default_params(self):
method _create_alg_interface (line 2427) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2436) | def _allowed_device_names(self) -> List[str]:
class FTT_HPO_Regressor (line 2440) | class FTT_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterfaceRegressor):
method _get_default_params (line 2441) | def _get_default_params(self):
method _create_alg_interface (line 2444) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2453) | def _allowed_device_names(self) -> List[str]:
class TabR_HPO_Classifier (line 2457) | class TabR_HPO_Classifier(RealMLPHPOConstructorMixin, AlgInterfaceClassi...
method _get_default_params (line 2458) | def _get_default_params(self):
method _create_alg_interface (line 2461) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2470) | def _allowed_device_names(self) -> List[str]:
class TabR_HPO_Regressor (line 2474) | class TabR_HPO_Regressor(RealMLPHPOConstructorMixin, AlgInterfaceRegress...
method _get_default_params (line 2475) | def _get_default_params(self):
method _create_alg_interface (line 2478) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2487) | def _allowed_device_names(self) -> List[str]:
class Ensemble_TD_Classifier (line 2493) | class Ensemble_TD_Classifier(AlgInterfaceClassifier):
method __init__ (line 2494) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
method _create_alg_interface (line 2513) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2539) | def _allowed_device_names(self) -> List[str]:
class Ensemble_TD_Regressor (line 2543) | class Ensemble_TD_Regressor(AlgInterfaceRegressor):
method __init__ (line 2544) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
method _create_alg_interface (line 2560) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2582) | def _allowed_device_names(self) -> List[str]:
class EnsembleHPOConstructorMixin (line 2586) | class EnsembleHPOConstructorMixin:
method __init__ (line 2587) | def __init__(self, device: Optional[str] = None, random_state: Optiona...
class Ensemble_HPO_Classifier (line 2638) | class Ensemble_HPO_Classifier(EnsembleHPOConstructorMixin, AlgInterfaceC...
method _create_alg_interface (line 2639) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2670) | def _allowed_device_names(self) -> List[str]:
class Ensemble_HPO_Regressor (line 2674) | class Ensemble_HPO_Regressor(EnsembleHPOConstructorMixin, AlgInterfaceRe...
method _create_alg_interface (line 2675) | def _create_alg_interface(self, n_cv: int) -> AlgInterface:
method _allowed_device_names (line 2703) | def _allowed_device_names(self) -> List[str]:
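Ensemble_TD_Classifier exposes the same estimator interface as the individual models, presumably combining the tuned-default (TD) variants above. A minimal sketch; only device and random_state are visible in the truncated constructor, and the package-level export is assumed to work as for the other estimators:

    import numpy as np
    from pytabkit import Ensemble_TD_Classifier  # package-level export assumed

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((300, 4)), rng.integers(0, 3, size=300)
    ens = Ensemble_TD_Classifier(device='cpu', random_state=0)
    ens.fit(X, y)                 # same fit contract as the other estimators
    proba = ens.predict_proba(X)  # combined class probabilities; the combination rule is not visible here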
FILE: pytabkit/models/torch_utils.py
function get_available_device_names (line 7) | def get_available_device_names() -> List['str']:
function seeded_randperm (line 14) | def seeded_randperm(n, device, seed):
function permute_idxs (line 21) | def permute_idxs(idxs, seed):
function batch_randperm (line 25) | def batch_randperm(n_batch, n, device='cpu'):
function gauss_cdf (line 33) | def gauss_cdf(x):
class ClampWithIdentityGradientFunc (line 37) | class ClampWithIdentityGradientFunc(torch.autograd.Function):
method forward (line 39) | def forward(ctx, input: torch.Tensor, low: torch.Tensor, high: torch.T...
method backward (line 43) | def backward(ctx, grad_output: torch.Tensor):
function clamp_with_identity_gradient_func (line 47) | def clamp_with_identity_gradient_func(x, low, high):
function cat_if_necessary (line 51) | def cat_if_necessary(tensors: List[torch.Tensor], dim: int):
function hash_tensor (line 64) | def hash_tensor(tensor: torch.Tensor) -> int:
function torch_np_quantile (line 72) | def torch_np_quantile(tensor: torch.Tensor, q: float, dim: int, keepdim:...
function _cuda_in_use (line 93) | def _cuda_in_use() -> bool:
class TorchTimer (line 104) | class TorchTimer:
method __init__ (line 121) | def __init__(self, use_cuda: Optional[bool] = None, record_history: bo...
method _do_cuda_sync (line 138) | def _do_cuda_sync(self) -> bool:
method __enter__ (line 148) | def __enter__(self):
method __exit__ (line 152) | def __exit__(self, exc_type, exc_val, exc_tb):
method start (line 157) | def start(self):
method stop (line 162) | def stop(self):
function get_available_memory_gb (line 173) | def get_available_memory_gb(device: Union[str, torch.device]) -> float:
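Two of these helpers have fully visible signatures, and TorchTimer exposes the context-manager protocol via __enter__/__exit__. A minimal sketch (the printed device list is only an example):

    import torch
    from pytabkit.models.torch_utils import TorchTimer, get_available_device_names, seeded_randperm

    print(get_available_device_names())               # e.g. ['cpu'] or ['cpu', 'cuda:0']
    idxs = seeded_randperm(10, device='cpu', seed=0)  # deterministic permutation of 0..9

    timer = TorchTimer(use_cuda=False)                # use_cuda: Optional[bool] per the listing
    with timer:                                       # context-manager protocol per __enter__/__exit__
        _ = torch.randn(256, 256) @ torch.randn(256, 256)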
FILE: pytabkit/models/training/auc_mu.py
function auc_mu_impl (line 26) | def auc_mu_impl(y_true, y_score, A=None, W=None):
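AUC-mu (Kleiman & Page, 2019) generalizes ROC AUC to multiclass problems. A hedged sketch of the expected call, assuming y_true holds integer class labels and y_score per-class scores of shape (n_samples, n_classes); the optional A and W matrices from the AUC-mu definition are left at their listed None defaults:

    import numpy as np
    from pytabkit.models.training.auc_mu import auc_mu_impl

    y_true = np.array([0, 1, 2, 1, 0])
    y_score = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.6, 0.2],
                        [0.1, 0.2, 0.7],
                        [0.3, 0.5, 0.2],
                        [0.6, 0.3, 0.1]])
    score = auc_mu_impl(y_true, y_score)  # shapes assumed; A=None, W=None per the listing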
FILE: pytabkit/models/training/coord.py
class HyperparamManager (line 8) | class HyperparamManager:
class HyperGetter (line 9) | class HyperGetter:
method __init__ (line 10) | def __init__(self, tc: 'HyperparamManager', hyper_name: str, base_va...
method __call__ (line 16) | def __call__(self):
method __init__ (line 20) | def __init__(self, **config):
method get_more_info_dict (line 30) | def get_more_info_dict(self) -> Dict:
method _find_pattern (line 33) | def _find_pattern(self, d: dict, scope):
method register_hyper (line 43) | def register_hyper(self, name: str, scope, default=None, default_sched...
method get_hyper_sched_values (line 70) | def get_hyper_sched_values(self):
method update_hyper_sched_values (line 74) | def update_hyper_sched_values(self):
method add_reg_term (line 81) | def add_reg_term(self, loss):
method update_hypers (line 84) | def update_hypers(self, learner):
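HyperparamManager centralizes (possibly scheduled) hyperparameters: register_hyper appears to hand back a HyperGetter, a zero-argument callable that re-reads the current value, so training code can pick up schedule updates each step. A speculative sketch; the meaning of the config keys, the scope argument's type, and register_hyper's return value are all inferred from the listing, so the uncertain calls are shown only as comments:

    from pytabkit.models.training.coord import HyperparamManager

    hp = HyperparamManager(lr=1e-2)  # __init__ takes **config per the listing; key names assumed
    # register_hyper(name, scope, default=None, ...) presumably returns a HyperGetter,
    # i.e. a zero-argument callable re-evaluated at each step. The expected type of
    # `scope` is not visible here, so the calls below are left as comments:
    # get_lr = hp.register_hyper('lr', scope, default=1e-3)
    # current_lr = get_lr()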
FILE: pytabki
Condensed preview — 157 files, each showing path, character count, and a content snippet (full structured content: 2,212K chars).
[
{
"path": ".github/workflows/testing.yml",
"chars": 965,
"preview": "name: 'test'\n\non:\n push:\n branches:\n - \"main\"\n - \"dev\"\n pull_request:\n branches:\n - '*'\n\njobs:\n "
},
{
"path": ".gitignore",
"chars": 333,
"preview": "*.pyc\n*.pdf\n*.zip\n*.ckpt\n\nexperiments/*/\nexperiments/trace.json\n!experiments/meta_hpo\n!experiments/prototypes\npublic_exp"
},
{
"path": ".readthedocs.yaml",
"chars": 1122,
"preview": "# Read the Docs configuration file for Sphinx projects\n# See https://docs.readthedocs.io/en/stable/config-file/v2.html f"
},
{
"path": "LICENSE.txt",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 18721,
"preview": "[](https://colab.research.google.com/github/dh"
},
{
"path": "docs/Makefile",
"chars": 638,
"preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
},
{
"path": "docs/make.bat",
"chars": 804,
"preview": "@ECHO OFF\r\n\r\npushd %~dp0\r\n\r\nREM Command file for Sphinx documentation\r\n\r\nif \"%SPHINXBUILD%\" == \"\" (\r\n\tset SPHINXBUILD=sp"
},
{
"path": "docs/requirements.txt",
"chars": 498,
"preview": "adjustText>=1.0\nautorank>=1.0\ncatboost>=1.2\ndask[dataframe]>=2023\ndill\nfire\nlightgbm>=4.1\nmatplotlib>=3.0\nmsgpack>=1.0\nm"
},
{
"path": "docs/source/bench/00_installation.md",
"chars": 1556,
"preview": "# Overview and Installation of the Benchmarking code\n\nOur benchmarking code contains several features:\n\n- Automatic data"
},
{
"path": "docs/source/bench/01_running_the_benchmark.md",
"chars": 5086,
"preview": "# Running the benchmark\n\n## Configuration of data paths\n\nThe paths for storing data and results are configured\nthrough t"
},
{
"path": "docs/source/bench/02_stored_data.md",
"chars": 3657,
"preview": "# Data format\n\nHere, we describe how the main data is stored \ninside the main data folder configured in the `tab_bench.d"
},
{
"path": "docs/source/bench/03_code.md",
"chars": 2434,
"preview": "# Code structure\n\n## Algorithm wrappers\n\nTo run methods in `tab_bench`, one needs to \nprovide them as a subclass of `tab"
},
{
"path": "docs/source/bench/adding_models.md",
"chars": 1066,
"preview": "# Adding your own models to the benchmark\n\nTo run your own models,\n- implement an `AlgInterface` subclass. There are num"
},
{
"path": "docs/source/bench/download_results.md",
"chars": 1693,
"preview": "# Downloading the benchmark results\n\nThe benchmark data (as well as the code)\nis archived at [DaRUS](https://doi.org/10."
},
{
"path": "docs/source/bench/refine_then_calibrate.md",
"chars": 1252,
"preview": "# Reproducing results of \"Rethinking Early Stopping: Refine, Then Calibrate\"\n\nHere, we document how to reproduce results"
},
{
"path": "docs/source/bench/using_the_scheduler.md",
"chars": 3053,
"preview": "# Using the scheduler\n\n`pytabkit` includes a flexible scheduler that can schedule jobs within python using `ray` and `mu"
},
{
"path": "docs/source/conf.py",
"chars": 1621,
"preview": "# Configuration file for the Sphinx documentation builder.\n#\n# For the full list of built-in configuration values, see t"
},
{
"path": "docs/source/index.rst",
"chars": 783,
"preview": "Welcome to PyTabKit's documentation!\n======================================\n\n.. toctree::\n :maxdepth: 2\n :caption: C"
},
{
"path": "docs/source/models/00_overview.md",
"chars": 4225,
"preview": "# Overview of the `models` part\n\n## Scikit-learn interfaces\n\nWe provide scikit-learn interfaces for various methods in \n"
},
{
"path": "docs/source/models/01_sklearn_interfaces.rst",
"chars": 4590,
"preview": "Scikit-learn interfaces\n=======================\n\nWe provide scikit-learn interfaces for numerous methods in\n``pytabkit.m"
},
{
"path": "docs/source/models/02_hpo.md",
"chars": 5667,
"preview": "# Hyperparameter optimization\n\nThis is a guide how to perform hyperparameter optimization (HPO) \nto get the best results"
},
{
"path": "docs/source/models/03_training_implementation.md",
"chars": 4326,
"preview": "# Training directly with PyTorch Lightning\n\n## Using PyTorch Lightning\n\nThe TabNN models are implemented using [Pytorch "
},
{
"path": "docs/source/models/examples.md",
"chars": 2349,
"preview": "# Examples\n\n## Refitting RealMLP on train+val data using the best epoch from a previous run\n\nYou can refit RealMLP by si"
},
{
"path": "docs/source/models/nn_classes.md",
"chars": 6187,
"preview": "# NN implementation\n\nWhile RealMLP is implemented in PyTorch, \nwe extend the conventional `nn.Module` logic. \nTraditiona"
},
{
"path": "docs/source/models/quantile_reg.md",
"chars": 788,
"preview": "# (Multi)quantile regression with RealMLP\n\nRealMLP supports multiquantile regression, for example by using\n```python\nfro"
},
{
"path": "examples/tutorial_notebook.ipynb",
"chars": 9905,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"markdown\",\n \"metadata\": {\n \"id\": \"enZVuzCHCy1n\"\n },\n \"sou"
},
{
"path": "original_requirements/conda_env_2024_06_25.yml",
"chars": 7670,
"preview": "name: tab_bench_venv_3\nchannels:\n - pytorch\n - nvidia\n - defaults\ndependencies:\n - _libgcc_mutex=0.1\n - _openmp_mut"
},
{
"path": "original_requirements/conda_env_2024_10_28.yml",
"chars": 7838,
"preview": "name: tab_bench_conda\nchannels:\n - pytorch\n - nvidia\n - defaults\ndependencies:\n - _libgcc_mutex=0.1\n - _openmp_mute"
},
{
"path": "original_requirements/conda_env_2025_01_15.yml",
"chars": 5194,
"preview": "name: probclass\nchannels:\n - pytorch\n - nvidia\n - defaults\ndependencies:\n - _libgcc_mutex=0.1\n - _openmp_mutex=5.1\n"
},
{
"path": "original_requirements/requirements_2024_06_25.txt",
"chars": 4314,
"preview": "adjustText==1.0.4\naiohttp==3.9.1\naiosignal==1.3.1\nannotated-types==0.6.0\nargon2-cffi==23.1.0\nargon2-cffi-bindings==21.2."
},
{
"path": "pyproject.toml",
"chars": 5616,
"preview": "[build-system]\nrequires = [\"hatchling>=1.26.1\"] # https://github.com/pypa/hatch/issues/1818\nbuild-backend = \"hatchling."
},
{
"path": "pytabkit/__about__.py",
"chars": 119,
"preview": "# SPDX-FileCopyrightText: 2024-present David Holzmüller\n#\n# SPDX-License-Identifier: Apache-2.0\n\n__version__ = \"1.7.3\"\n"
},
{
"path": "pytabkit/__init__.py",
"chars": 49,
"preview": "from .models.sklearn.sklearn_interfaces import *\n"
},
{
"path": "pytabkit/bench/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/bench/alg_wrappers/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/bench/alg_wrappers/general.py",
"chars": 3701,
"preview": "from pathlib import Path\nfrom typing import List, Dict, Optional\n\nfrom pytabkit.bench.data.tasks import TaskPackage, Tas"
},
{
"path": "pytabkit/bench/alg_wrappers/interface_wrappers.py",
"chars": 30352,
"preview": "import shutil\nfrom pathlib import Path\nfrom typing import Callable, List, Optional, Dict\n\nimport torch\n\nfrom pytabkit.be"
},
{
"path": "pytabkit/bench/data/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/bench/data/common.py",
"chars": 468,
"preview": "\nclass TaskSource:\n UCI_BIN_CLASS = 'uci-bin-class'\n UCI_MULTI_CLASS = 'uci-multi-class'\n UCI_REGRESSION = 'uci"
},
{
"path": "pytabkit/bench/data/get_uci.py",
"chars": 143346,
"preview": "#!/usr/bin/python3\nimport os\nimport shutil\nimport ssl\n\nimport pandas\n\nfrom pytabkit.bench.data.paths import Paths\nfrom p"
},
{
"path": "pytabkit/bench/data/import_talent_benchmark.py",
"chars": 6209,
"preview": "from pathlib import Path\nfrom typing import Optional\n\nimport numpy as np\nimport pandas as pd\n\nfrom pytabkit.bench.data.i"
},
{
"path": "pytabkit/bench/data/import_tasks.py",
"chars": 16946,
"preview": "from typing import Union, Optional, List, Dict\n\nimport sklearn.model_selection\n\nimport torch\nfrom pathlib import Path\nim"
},
{
"path": "pytabkit/bench/data/paths.py",
"chars": 5391,
"preview": "import os\nimport uuid\nfrom pathlib import Path\nfrom typing import Optional\n\nfrom pytabkit.models import utils\nimport shu"
},
{
"path": "pytabkit/bench/data/tasks.py",
"chars": 12090,
"preview": "from typing import Dict, List, Optional\n\nfrom pytabkit.bench.data.common import SplitType\nfrom pytabkit.bench.data.paths"
},
{
"path": "pytabkit/bench/data/uci_file_ops.py",
"chars": 30747,
"preview": "\n\n\nimport os as os\nimport re as re\nimport csv as csv\nimport math as math\nfrom pathlib import Path\n\nimport pandas as pand"
},
{
"path": "pytabkit/bench/eval/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/bench/eval/analysis.py",
"chars": 14221,
"preview": "from typing import Optional, Callable, Tuple, Dict, List, Union\n\nimport numpy as np\nimport scipy\n\nfrom pytabkit.bench.da"
},
{
"path": "pytabkit/bench/eval/colors.py",
"chars": 1546,
"preview": "from typing import List, Tuple, Callable\n\n\ndef bilin_int(x: float, values: List[Tuple[float, float]]) -> float:\n # in"
},
{
"path": "pytabkit/bench/eval/evaluation.py",
"chars": 32782,
"preview": "import distutils.command.build_ext\nfrom typing import List, Dict, Any, Tuple, Optional, Callable, Union\n\nimport numpy as"
},
{
"path": "pytabkit/bench/eval/plotting.py",
"chars": 70654,
"preview": "import copy\nfrom pathlib import Path\nfrom typing import List, Dict, Optional, Tuple, Callable\n\nimport matplotlib\nimport "
},
{
"path": "pytabkit/bench/eval/runtimes.py",
"chars": 2224,
"preview": "from typing import Dict\n\nimport numpy as np\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit.bench.data.tasks "
},
{
"path": "pytabkit/bench/eval/tables.py",
"chars": 24989,
"preview": "from typing import List, Optional\n\nimport numpy as np\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit.bench.d"
},
{
"path": "pytabkit/bench/run/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/bench/run/results.py",
"chars": 5873,
"preview": "from pathlib import Path\nfrom typing import Dict, List\n\nimport numpy as np\n\nfrom pytabkit.bench.data.paths import Paths\n"
},
{
"path": "pytabkit/bench/run/task_execution.py",
"chars": 17562,
"preview": "import shutil\nimport traceback\nfrom typing import List, Optional\n\nimport numpy as np\n\nfrom pytabkit.bench.alg_wrappers.g"
},
{
"path": "pytabkit/bench/scheduling/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/bench/scheduling/execution.py",
"chars": 11079,
"preview": "import os\n\nimport time\nimport multiprocessing as mp\nimport traceback\nfrom typing import Tuple, Optional, List\n\nimport nu"
},
{
"path": "pytabkit/bench/scheduling/jobs.py",
"chars": 5020,
"preview": "import time\nimport traceback\nimport sys\nfrom typing import Optional\n\nfrom pytabkit.bench.scheduling.resources import Nod"
},
{
"path": "pytabkit/bench/scheduling/resource_manager.py",
"chars": 3210,
"preview": "import copy\nimport enum\nimport time\nfrom typing import Optional\n\nfrom pytabkit.bench.scheduling.jobs import AbstractJob,"
},
{
"path": "pytabkit/bench/scheduling/resources.py",
"chars": 9296,
"preview": "from typing import Optional, List\n\nimport numpy as np\nimport copy\n\nfrom pytabkit.models.alg_interfaces.base import Inter"
},
{
"path": "pytabkit/bench/scheduling/schedulers.py",
"chars": 25305,
"preview": "import copy\nimport sys\nimport time\nfrom typing import List, Dict, Union\n\nimport numpy as np\n\nfrom pytabkit.bench.schedul"
},
{
"path": "pytabkit/models/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/alg_interfaces/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/alg_interfaces/alg_interfaces.py",
"chars": 28484,
"preview": "import functools\nimport warnings\nfrom pathlib import Path\nfrom typing import List, Tuple, Any, Optional, Dict\n\nimport to"
},
{
"path": "pytabkit/models/alg_interfaces/autogluon_model_interfaces.py",
"chars": 8019,
"preview": "import copy\nimport os\nfrom typing import List, Any, Optional\n\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom p"
},
{
"path": "pytabkit/models/alg_interfaces/base.py",
"chars": 5462,
"preview": "from typing import Optional, List\n\nimport numpy as np\nimport torch\n\n\nclass SplitIdxs:\n \"\"\"\n Represents multiple tr"
},
{
"path": "pytabkit/models/alg_interfaces/calibration.py",
"chars": 6367,
"preview": "import traceback\nfrom pathlib import Path\nfrom typing import List, Optional, Tuple, Dict, Any, Callable\n\nimport numpy as"
},
{
"path": "pytabkit/models/alg_interfaces/catboost_interfaces.py",
"chars": 40972,
"preview": "import copy\nimport warnings\nfrom pathlib import Path\nfrom typing import Optional, Dict, Any, List, Tuple, Union\n\nimport "
},
{
"path": "pytabkit/models/alg_interfaces/ensemble_interfaces.py",
"chars": 16187,
"preview": "import copy\nimport time\nfrom pathlib import Path\nfrom typing import List, Optional, Dict\n\nimport numpy as np\nimport torc"
},
{
"path": "pytabkit/models/alg_interfaces/lightgbm_interfaces.py",
"chars": 39463,
"preview": "import copy\nfrom pathlib import Path\nfrom typing import Optional, Dict, Tuple, Any, List\n\nimport numpy as np\nimport torc"
},
{
"path": "pytabkit/models/alg_interfaces/nn_interfaces.py",
"chars": 59875,
"preview": "import copy\nimport warnings\nfrom pathlib import Path\nfrom typing import List, Optional, Dict, Any, Union\n\nimport numpy a"
},
{
"path": "pytabkit/models/alg_interfaces/other_interfaces.py",
"chars": 69487,
"preview": "import os\nfrom typing import Any, List, Optional\n\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom sklearn.compo"
},
{
"path": "pytabkit/models/alg_interfaces/resource_computation.py",
"chars": 16343,
"preview": "import numbers\nimport time\nfrom collections.abc import Callable\nfrom typing import Dict, Union, List, Any, Tuple, Option"
},
{
"path": "pytabkit/models/alg_interfaces/resource_params.py",
"chars": 29301,
"preview": "class ResourceParams:\n # determined using estimate_resource_params.py\n cb_class_time = {'': 1.1074866100217955, 'd"
},
{
"path": "pytabkit/models/alg_interfaces/rtdl_interfaces.py",
"chars": 29938,
"preview": "import copy\nfrom typing import List, Any, Optional, Dict, Tuple\nfrom pathlib import Path\n\nimport numpy as np\nimport pand"
},
{
"path": "pytabkit/models/alg_interfaces/sub_split_interfaces.py",
"chars": 19208,
"preview": "import copy\nimport random\nfrom pathlib import Path\nfrom typing import List, Optional, Dict, Any, Tuple\n\nimport numpy as "
},
{
"path": "pytabkit/models/alg_interfaces/tabm_interface.py",
"chars": 23183,
"preview": "import functools\nimport math\nimport random\nfrom pathlib import Path\n\nimport scipy\nimport sklearn\nimport torch\nimport num"
},
{
"path": "pytabkit/models/alg_interfaces/tabr_interface.py",
"chars": 23872,
"preview": "from typing import List, Any, Optional, Dict, Tuple\nfrom pathlib import Path\n\nimport numpy as np\nimport pandas as pd\nimp"
},
{
"path": "pytabkit/models/alg_interfaces/xgboost_interfaces.py",
"chars": 36025,
"preview": "import copy\nfrom pathlib import Path\nfrom typing import Optional, Dict, Any, Tuple, List, Union\n\nimport numpy as np\nimpo"
},
{
"path": "pytabkit/models/alg_interfaces/xrfm_interfaces.py",
"chars": 25769,
"preview": "import contextlib\nimport random\nfrom pathlib import Path\nfrom typing import Optional, List, Any, Tuple, Dict\n\nimport num"
},
{
"path": "pytabkit/models/data/__init__.py",
"chars": 1,
"preview": "\n"
},
{
"path": "pytabkit/models/data/conversion.py",
"chars": 5574,
"preview": "import warnings\nfrom typing import Union, List, Optional\n\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom panda"
},
{
"path": "pytabkit/models/data/data.py",
"chars": 9984,
"preview": "import math\nfrom typing import Optional, Union, List, Dict, Tuple\n\nimport numpy as np\nimport pandas as pd\nimport torch\n\n"
},
{
"path": "pytabkit/models/data/nested_dict.py",
"chars": 2384,
"preview": "from typing import Union, List, Tuple, Dict\n\nfrom pytabkit.models import utils\n\n\nclass NestedDict:\n \"\"\"\n Dictionar"
},
{
"path": "pytabkit/models/data/splits.py",
"chars": 6126,
"preview": "import math\nfrom typing import Tuple, List, Optional\n\nimport torch\n\nfrom pytabkit.models import utils\nfrom pytabkit.mode"
},
{
"path": "pytabkit/models/hyper_opt/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/hyper_opt/coord_opt.py",
"chars": 12881,
"preview": "from pathlib import Path\n\nimport numpy as np\nfrom typing import Union, Callable, Any, Optional, Dict, Tuple\n\nfrom pytabk"
},
{
"path": "pytabkit/models/hyper_opt/hyper_optimizers.py",
"chars": 9637,
"preview": "import time\nfrom pathlib import Path\nfrom typing import Callable, Tuple, Any, Dict, Union, Optional\n\nimport numpy as np\n"
},
{
"path": "pytabkit/models/nn_models/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/nn_models/activations.py",
"chars": 4357,
"preview": "import torch\nimport torch.nn.functional as F\nfrom typing import Dict\n\n# ------ from fastai2\nfrom torch.jit import script"
},
{
"path": "pytabkit/models/nn_models/base.py",
"chars": 44144,
"preview": "from pytabkit.models import torch_utils, utils\nfrom pytabkit.models.data.data import TensorInfo, DictDataset\nfrom pytabk"
},
{
"path": "pytabkit/models/nn_models/categorical.py",
"chars": 22693,
"preview": "from typing import Iterable, List, Dict, Tuple, Any, Callable, Optional, Union\n\nimport numpy as np\nimport torch\nimport t"
},
{
"path": "pytabkit/models/nn_models/models.py",
"chars": 15968,
"preview": "import copy\nimport functools\nfrom typing import Dict, Tuple\n\nimport numpy as np\nimport torch\nfrom sklearn.preprocessing "
},
{
"path": "pytabkit/models/nn_models/nn.py",
"chars": 48672,
"preview": "import copy\nfrom typing import Dict\n\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom sklearn.ensemb"
},
{
"path": "pytabkit/models/nn_models/pipeline.py",
"chars": 12576,
"preview": "from typing import List, Dict, Union\n\nimport sklearn\nimport torch\nfrom sklearn.base import BaseEstimator, TransformerMix"
},
{
"path": "pytabkit/models/nn_models/rtdl_num_embeddings.py",
"chars": 28432,
"preview": "# taken from https://github.com/yandex-research/rtdl-num-embeddings/blob/main/package/rtdl_num_embeddings.py\n\"\"\"On Embed"
},
{
"path": "pytabkit/models/nn_models/rtdl_resnet.py",
"chars": 41592,
"preview": "import math\nimport numbers\nimport typing as ty\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimpor"
},
{
"path": "pytabkit/models/nn_models/tabm.py",
"chars": 21909,
"preview": "# License: https://github.com/yandex-research/tabm/blob/main/LICENSE\n\n# NOTE\n# The minimum required versions of the depe"
},
{
"path": "pytabkit/models/nn_models/tabr.py",
"chars": 25644,
"preview": "import os\nimport inspect\nimport warnings\nimport math\nfrom functools import partial\n\nimport numpy as np\nimport torch\nfrom"
},
{
"path": "pytabkit/models/nn_models/tabr_context_freeze.py",
"chars": 27891,
"preview": "import os\nimport inspect\nimport warnings\nimport math\nfrom functools import partial\n\nimport torch\nfrom torch import Tenso"
},
{
"path": "pytabkit/models/nn_models/tabr_lib.py",
"chars": 33311,
"preview": "import math\nimport inspect\nimport warnings\nimport dataclasses\nfrom typing import Any, Callable, Optional, Union, cast, I"
},
{
"path": "pytabkit/models/optim/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/optim/adopt.py",
"chars": 18446,
"preview": "# taken from https://github.com/iShohei220/adopt/blob/main/adopt.py\n# Apache 2.0 license\n# requires torch >= 2.4\n\n# mypy"
},
{
"path": "pytabkit/models/optim/optimizers.py",
"chars": 7834,
"preview": "import warnings\nfrom collections import defaultdict\nfrom copy import deepcopy\nfrom itertools import chain\nfrom typing im"
},
{
"path": "pytabkit/models/optim/scheduling_adam.py",
"chars": 6250,
"preview": "import torch\nfrom torch.optim import Optimizer\nimport math\n\n\n# modification of normal adam to properly handle varying b"
},
{
"path": "pytabkit/models/sklearn/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/sklearn/default_params.py",
"chars": 14995,
"preview": "import numpy as np\n\nfrom pytabkit.models import utils\n\n\nclass DefaultParams:\n RealMLP_TD_CLASS = dict(\n hidden"
},
{
"path": "pytabkit/models/sklearn/sklearn_base.py",
"chars": 25371,
"preview": "import copy\nfrom pathlib import Path\nfrom typing import Dict, Any, Optional, Union, List\nfrom warnings import warn\nfrom "
},
{
"path": "pytabkit/models/sklearn/sklearn_interfaces.py",
"chars": 139895,
"preview": "import pathlib\nfrom typing import Optional, Any, Union, List, Dict, Literal\n\nimport numpy as np\n\nfrom pytabkit.models im"
},
{
"path": "pytabkit/models/torch_utils.py",
"chars": 7322,
"preview": "from typing import List, Union, Optional\n\nimport torch\nimport numpy as np\n\n\ndef get_available_device_names() -> List['st"
},
{
"path": "pytabkit/models/training/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "pytabkit/models/training/auc_mu.py",
"chars": 5287,
"preview": "# taken from https://github.com/kleimanr/auc_mu/blob/master/auc_mu.py\n\n\"\"\"\nComputation of the measure 'AUC Mu'. This mea"
},
{
"path": "pytabkit/models/training/coord.py",
"chars": 3937,
"preview": "from typing import Dict\n\nfrom pytabkit.models.training.scheduling import ConstantSchedule, get_schedule\n\n# layers are cr"
},
{
"path": "pytabkit/models/training/lightning_callbacks.py",
"chars": 8851,
"preview": "from typing import List, Any, Optional, Union, Dict\n\nimport numpy as np\nimport torch\n\ntry:\n from lightning.pytorch.ca"
},
{
"path": "pytabkit/models/training/lightning_modules.py",
"chars": 16199,
"preview": "from pytabkit.models.training.lightning_callbacks import ModelCheckpointCallback\n\ntry:\n import lightning.pytorch as p"
},
{
"path": "pytabkit/models/training/logging.py",
"chars": 624,
"preview": "class Logger:\n def __init__(self, verbosity_level):\n # higher verbosity level means more verbose\n self."
},
{
"path": "pytabkit/models/training/metrics.py",
"chars": 27846,
"preview": "import traceback\nfrom typing import Dict, Any, List, Optional, Tuple, Callable\n\nimport numpy as np\nfrom sklearn.metrics "
},
{
"path": "pytabkit/models/training/nn_creator.py",
"chars": 12780,
"preview": "import functools\nfrom typing import List, Optional, Tuple, Callable, Dict, Any\n\nimport numpy as np\nimport torch\n\nfrom py"
},
{
"path": "pytabkit/models/training/scheduling.py",
"chars": 19523,
"preview": "import numpy as np\nimport math\n\n\nclass LearnerProgress:\n def __init__(self):\n self.epoch = 0\n self.epoc"
},
{
"path": "pytabkit/models/utils.py",
"chars": 18095,
"preview": "import multiprocessing as mp\nimport os\nimport os.path\nimport heapq\nimport glob\nimport gzip\nimport shutil\nimport timeit\nf"
},
{
"path": "scripts/analyze_hpo_best_params.py",
"chars": 4029,
"preview": "import msgpack_numpy as m\nm.patch()\nimport numbers\nfrom typing import Optional\n\nimport fire\nimport numpy as np\n\nfrom pyt"
},
{
"path": "scripts/analyze_tasks.py",
"chars": 5366,
"preview": "from pathlib import Path\nfrom typing import List, Optional\n\nimport fire\nimport matplotlib.pyplot as plt\nimport numpy as "
},
{
"path": "scripts/check_missing_values.py",
"chars": 2448,
"preview": "from typing import Optional\n\nimport fire\nimport openml\n\nfrom pytabkit.bench.data.import_tasks import set_openml_cache_di"
},
{
"path": "scripts/copy_algs.py",
"chars": 1262,
"preview": "import shutil\nfrom typing import List\n\nimport fire\n\nfrom pytabkit.bench.data.paths import Paths\n\n\ndef copy_algs_in_paths"
},
{
"path": "scripts/create_plots_and_tables.py",
"chars": 14403,
"preview": "from pytabkit.bench.data.paths import Paths\nfrom pytabkit.bench.data.tasks import TaskCollection\nfrom pytabkit.bench.eva"
},
{
"path": "scripts/create_probclass_plots.py",
"chars": 33978,
"preview": "from typing import Optional, List\n\nimport numpy as np\nimport pandas as pd\nimport torch\nfrom adjustText import adjust_tex"
},
{
"path": "scripts/create_xrfm_ablations_table.py",
"chars": 5140,
"preview": "from typing import List, Optional\n\nimport numpy as np\nfrom pytabkit.bench.run.results import ResultManager\n\nfrom pytabki"
},
{
"path": "scripts/custom_paths.py.default",
"chars": 51,
"preview": "def get_base_folder():\n return 'tab_bench_data'\n"
},
{
"path": "scripts/download_data.py",
"chars": 18954,
"preview": "from typing import Optional\n\nimport fire\n\nfrom pytabkit.bench.data.common import TaskSource\nfrom pytabkit.bench.data.get"
},
{
"path": "scripts/estimate_resource_params.py",
"chars": 18138,
"preview": "import multiprocessing\nimport time\nfrom typing import List, Dict, Any, Callable\n\nimport numpy as np\nimport sklearn\nimpor"
},
{
"path": "scripts/get_sklearn_names.py",
"chars": 655,
"preview": "import importlib\n# get the names of all sklearn interfaces, for exporting them in __all__ to import them from a higher-l"
},
{
"path": "scripts/make_plot_animation.py",
"chars": 8566,
"preview": "from typing import List\n\nfrom pytabkit.bench.eval.plotting import plot_pareto\nfrom pytabkit.bench.data.paths import Path"
},
{
"path": "scripts/meta_hyperopt.py",
"chars": 30505,
"preview": "from typing import Optional, Tuple, Any, Dict\n\nimport numpy as np\n\nfrom pytabkit.bench.alg_wrappers.interface_wrappers i"
},
{
"path": "scripts/move_algs.py",
"chars": 2033,
"preview": "import shutil\nfrom typing import Optional\n\nimport fire\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit.models"
},
{
"path": "scripts/move_many_algs.py",
"chars": 880,
"preview": "from typing import Optional\n\nimport fire\n\nfrom pytabkit.models import utils\nfrom scripts.move_algs import move_algs\n\n\nde"
},
{
"path": "scripts/print_complete_results.py",
"chars": 908,
"preview": "import fire\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit.bench.eval.analysis import ResultsTables\nfrom pyt"
},
{
"path": "scripts/print_runtimes.py",
"chars": 876,
"preview": "from pytabkit.bench.data.paths import Paths\nfrom pytabkit.bench.eval.runtimes import get_avg_train_times, get_avg_predic"
},
{
"path": "scripts/ray_slurm_launch.py",
"chars": 5320,
"preview": "# from https://docs.ray.io/en/latest/cluster/examples/slurm-launch.html#slurm-launch\n# slurm-launch.py\n# Usage:\n# python"
},
{
"path": "scripts/ray_slurm_template.sh",
"chars": 2534,
"preview": "#!/bin/bash\n# shellcheck disable=SC2206\n# THIS FILE IS GENERATED BY AUTOMATION SCRIPT! PLEASE REFER TO ORIGINAL SCRIPT!\n"
},
{
"path": "scripts/rename_alg.py",
"chars": 2239,
"preview": "import os\nimport shutil\nfrom pathlib import Path\n\nimport fire\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit"
},
{
"path": "scripts/rename_tag.py",
"chars": 557,
"preview": "import fire\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit.models import utils\n\n\ndef rename_tag(old_name: st"
},
{
"path": "scripts/run_evaluation.py",
"chars": 11882,
"preview": "import time\nfrom typing import Optional\n\nimport numpy as np\n\nimport fire\n\nfrom pytabkit.bench.data.common import SplitTy"
},
{
"path": "scripts/run_experiments.py",
"chars": 94094,
"preview": "from typing import Optional, Dict, Any, List\n\nimport numpy as np\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytab"
},
{
"path": "scripts/run_experiments_unused.py",
"chars": 28054,
"preview": "from typing import List, Optional, Dict, Any\n\nimport numpy as np\n\nfrom pytabkit.bench.alg_wrappers.interface_wrappers im"
},
{
"path": "scripts/run_probclass_experiments.py",
"chars": 16437,
"preview": "import copy\nimport time\nfrom typing import List, Optional, Dict, Any\n\nimport numpy as np\nimport pandas as pd\nimport skle"
},
{
"path": "scripts/run_single_task.py",
"chars": 5463,
"preview": "import time\n\nimport numpy as np\nimport torch\n\nfrom pytabkit.bench.data.paths import Paths\nfrom pytabkit.bench.data.tasks"
},
{
"path": "scripts/run_slurm.py",
"chars": 253,
"preview": "import functools\n\nimport fire\n\nfrom run_experiments import run_gbdt_rs_configs\nfrom pytabkit.bench.data.paths import Pat"
},
{
"path": "scripts/run_time_measurement.py",
"chars": 16192,
"preview": "import random\nimport time\nimport torch\n\nimport numpy as np\nimport sklearn\n\nfrom pytabkit.bench.data.paths import Paths\nf"
},
{
"path": "scripts/run_xrfm_large_ablations.py",
"chars": 8178,
"preview": "import fire\n\nfrom pytabkit.bench.alg_wrappers.interface_wrappers import RandomParamsxRFMInterfaceWrapper\nfrom pytabkit.b"
},
{
"path": "tests/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "tests/test_bench.py",
"chars": 2829,
"preview": "from pathlib import Path\n\nfrom sklearn.datasets import make_classification\nimport torch\n\nfrom pytabkit import XGB_TD_Cla"
},
{
"path": "tests/test_ensemble.py",
"chars": 826,
"preview": "import pytest\nimport sklearn.base\nimport numpy as np\n\nfrom pytabkit import Ensemble_TD_Classifier, Ensemble_TD_Regressor"
},
{
"path": "tests/test_metrics.py",
"chars": 415,
"preview": "import numpy as np\nimport torch\nimport sklearn\n\nfrom pytabkit.models.training.metrics import Metrics\n\n\ndef test_pinball("
},
{
"path": "tests/test_rtdl_nns.py",
"chars": 10972,
"preview": "import numpy as np\nimport pandas as pd\nfrom sklearn.utils.estimator_checks import check_estimator\nfrom sklearn.model_sel"
},
{
"path": "tests/test_sklearn_interfaces.py",
"chars": 2517,
"preview": "import pytest\nfrom sklearn.utils.estimator_checks import parametrize_with_checks\n\nfrom pytabkit import XRFM_D_Classifier"
},
{
"path": "tests/test_tabr.py",
"chars": 6013,
"preview": "import numpy as np\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import "
},
{
"path": "tests/test_variants.py",
"chars": 3045,
"preview": "import pytest\nimport numpy as np\nimport pandas as pd\nimport sklearn\nfrom sklearn.base import ClassifierMixin\nimport torc"
}
]
About this extraction
This document contains the full source code of the dholzmueller/pytabkit GitHub repository, extracted and formatted as plain text for use by AI agents and large language models: 157 files (2.0 MB, approximately 544.9k tokens), plus a symbol index of 2,572 extracted functions, classes, methods, constants, and types.
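The per-file index above is a plain JSON array, so it can be consumed programmatically. A minimal Python sketch, assuming the array has been saved to a local file named file_index.json (both the file name and the top-5 cutoff are illustrative, not part of the extraction):

import json

# Load the extracted file index (a local copy of the JSON array above).
with open('file_index.json') as f:
    entries = json.load(f)

# Rank files by character count, e.g. to decide which sources to read first.
for entry in sorted(entries, key=lambda e: e['chars'], reverse=True)[:5]:
    print(f"{entry['chars']:>7} chars  {entry['path']}")

Each entry carries only path, chars, and a short truncated preview; the full file contents appear earlier in the document.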