[
  {
    "path": ".gitignore",
    "content": ".DS_Store\n\n# Created by https://www.gitignore.io/api/python\n\n### Python ###\n# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\n*.egg-info/\n.installed.cfg\n*.egg\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n.hypothesis/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# pyenv\n.python-version\n\n# celery beat schedule file\ncelerybeat-schedule\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n\n# End of https://www.gitignore.io/api/python\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2021 Marco Cerliani\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# shap-hypetune\nA python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models.\n\n![shap-hypetune diagram](https://raw.githubusercontent.com/cerlymarco/shap-hypetune/master/imgs/shap-hypetune-diagram.png#center)\n\n## Overview\nHyperparameters tuning and features selection are two common steps in every machine learning pipeline. Most of the time they are computed separately and independently. This may result in suboptimal performances and in a more time expensive process.\n\nshap-hypetune aims to combine hyperparameters tuning and features selection in a single pipeline optimizing the optimal number of features while searching for the optimal parameters configuration. Hyperparameters Tuning or Features Selection can also be carried out as standalone operations.\n\n**shap-hypetune main features:**\n\n- designed for gradient boosting models, as LGBModel or XGBModel;\n- developed to be integrable with the scikit-learn ecosystem;\n- effective in both classification or regression tasks;\n- customizable training process, supporting early-stopping and all the other fitting options available in the standard algorithms api;\n- ranking feature selection algorithms: Recursive Feature Elimination (RFE); Recursive Feature Addition (RFA); or Boruta;\n- classical boosting based feature importances or SHAP feature importances (the later can be computed also on the eval_set);\n- apply grid-search, random-search, or bayesian-search (from hyperopt);\n- parallelized computations with joblib.\n\n## Installation\n```shell\npip install --upgrade shap-hypetune\n```\nlightgbm, xgboost are not needed requirements. The module depends only on NumPy, shap, scikit-learn and hyperopt. 
Python 3.6 or above is supported.\n\n## Media\n- [SHAP for Feature Selection and HyperParameter Tuning](https://towardsdatascience.com/shap-for-feature-selection-and-hyperparameter-tuning-a330ec0ea104)\n- [Boruta and SHAP for better Feature Selection](https://towardsdatascience.com/boruta-and-shap-for-better-feature-selection-20ea97595f4a)\n- [Recursive Feature Selection: Addition or Elimination?](https://towardsdatascience.com/recursive-feature-selection-addition-or-elimination-755e5d86a791)\n- [Boruta SHAP for Temporal Feature Selection](https://towardsdatascience.com/boruta-shap-for-temporal-feature-selection-96a7840c7713)\n\n## Usage\n```python\nfrom shaphypetune import BoostSearch, BoostRFE, BoostRFA, BoostBoruta\n```\n#### Hyperparameters Tuning\n```python\nBoostSearch(\n    estimator,                              # LGBMModel or XGBModel\n    param_grid=None,                        # parameters to be optimized\n    greater_is_better=False,                # minimize or maximize the monitored score\n    n_iter=None,                            # number of sampled parameter configurations\n    sampling_seed=None,                     # the seed used for parameter sampling\n    verbose=1,                              # verbosity mode\n    n_jobs=None                             # number of jobs to run in parallel\n)\n```\n#### Feature Selection (RFE)\n```python\nBoostRFE(\n    estimator,                              # LGBMModel or XGBModel\n    min_features_to_select=None,            # the minimum number of features to be selected\n    step=1,                                 # number of features to remove at each iteration\n    param_grid=None,                        # parameters to be optimized\n    greater_is_better=False,                # minimize or maximize the monitored score\n    importance_type='feature_importances',  # which importance measure to use: default or shap\n    train_importance=True,                  # where to compute the shap feature 
importance\n    n_iter=None,                            # number of sampled parameter configurations\n    sampling_seed=None,                     # the seed used for parameter sampling\n    verbose=1,                              # verbosity mode\n    n_jobs=None                             # number of jobs to run in parallel\n)\n```\n#### Feature Selection (BORUTA)\n```python\nBoostBoruta(\n    estimator,                              # LGBMModel or XGBModel\n    perc=100,                               # threshold used to compare shadow and real features\n    alpha=0.05,                             # p-value level for feature rejection\n    max_iter=100,                           # maximum Boruta iterations to perform\n    early_stopping_boruta_rounds=None,      # maximum iterations without confirming a feature\n    param_grid=None,                        # parameters to be optimized\n    greater_is_better=False,                # minimize or maximize the monitored score\n    importance_type='feature_importances',  # which importance measure to use: default or shap\n    train_importance=True,                  # where to compute the shap feature importance\n    n_iter=None,                            # number of sampled parameter configurations\n    sampling_seed=None,                     # the seed used for parameter sampling\n    verbose=1,                              # verbosity mode\n    n_jobs=None                             # number of jobs to run in parallel\n)\n```\n#### Feature Selection (RFA)\n```python\nBoostRFA(\n    estimator,                              # LGBMModel or XGBModel\n    min_features_to_select=None,            # the minimum number of features to be selected\n    step=1,                                 # number of features to add at each iteration\n    param_grid=None,                        # parameters to be optimized\n    greater_is_better=False,                # minimize or maximize the monitored score\n    
importance_type='feature_importances',  # which importance measure to use: default or shap\n    train_importance=True,                  # where to compute the shap feature importance\n    n_iter=None,                            # number of sampled parameter configurations\n    sampling_seed=None,                     # the seed used for parameter sampling\n    verbose=1,                              # verbosity mode\n    n_jobs=None                             # number of jobs to run in parallel\n)\n```\n\nFull examples in the [notebooks folder](https://github.com/cerlymarco/shap-hypetune/tree/main/notebooks).\n"
  },
  {
    "path": "notebooks/LGBM_usage.ipynb",
    "content": "{\"metadata\":{\"kernelspec\":{\"language\":\"python\",\"display_name\":\"Python 3\",\"name\":\"python3\"},\"language_info\":{\"name\":\"python\",\"version\":\"3.7.12\",\"mimetype\":\"text/x-python\",\"codemirror_mode\":{\"name\":\"ipython\",\"version\":3},\"pygments_lexer\":\"ipython3\",\"nbconvert_exporter\":\"python\",\"file_extension\":\".py\"}},\"nbformat_minor\":4,\"nbformat\":4,\"cells\":[{\"cell_type\":\"code\",\"source\":\"import numpy as np\\nimport pandas as pd\\nfrom scipy import stats\\n\\nfrom sklearn.model_selection import train_test_split\\nfrom sklearn.datasets import make_classification, make_regression\\n\\nfrom hyperopt import hp\\nfrom hyperopt import Trials\\n\\nfrom lightgbm import *\\n\\ntry:\\n    from shaphypetune import BoostSearch, BoostBoruta, BoostRFE, BoostRFA\\nexcept:\\n    !pip install --upgrade shap-hypetune\\n    from shaphypetune import BoostSearch, BoostBoruta, BoostRFE, BoostRFA\\n\\nimport warnings\\nwarnings.simplefilter('ignore')\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:43.363945Z\",\"iopub.execute_input\":\"2022-01-01T11:46:43.364356Z\",\"iopub.status.idle\":\"2022-01-01T11:46:45.084134Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:43.364243Z\",\"shell.execute_reply\":\"2022-01-01T11:46:45.083177Z\"},\"trusted\":true},\"execution_count\":1,\"outputs\":[{\"output_type\":\"display_data\",\"data\":{\"text/plain\":\"<IPython.core.display.HTML object>\",\"text/html\":\"<style type='text/css'>\\n.datatable table.frame { margin-bottom: 0; }\\n.datatable table.frame thead { border-bottom: none; }\\n.datatable table.frame tr.coltypes td {  color: #FFFFFF;  line-height: 6px;  padding: 0 0.5em;}\\n.datatable .bool    { background: #DDDD99; }\\n.datatable .object  { background: #565656; }\\n.datatable .int     { background: #5D9E5D; }\\n.datatable .float   { background: #4040CC; }\\n.datatable .str     { background: #CC4040; }\\n.datatable .time    { background: #40CC40; 
}\\n.datatable .row_index {  background: var(--jp-border-color3);  border-right: 1px solid var(--jp-border-color0);  color: var(--jp-ui-font-color3);  font-size: 9px;}\\n.datatable .frame tbody td { text-align: left; }\\n.datatable .frame tr.coltypes .row_index {  background: var(--jp-border-color0);}\\n.datatable th:nth-child(2) { padding-left: 12px; }\\n.datatable .hellipsis {  color: var(--jp-cell-editor-border-color);}\\n.datatable .vellipsis {  background: var(--jp-layout-color0);  color: var(--jp-cell-editor-border-color);}\\n.datatable .na {  color: var(--jp-cell-editor-border-color);  font-size: 80%;}\\n.datatable .sp {  opacity: 0.25;}\\n.datatable .footer { font-size: 9px; }\\n.datatable .frame_dimensions {  background: var(--jp-border-color3);  border-top: 1px solid var(--jp-border-color0);  color: var(--jp-ui-font-color3);  display: inline-block;  opacity: 0.6;  padding: 1px 10px 1px 5px;}\\n</style>\\n\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"X_clf, y_clf = make_classification(n_samples=6000, n_features=20, n_classes=2, \\n                                   n_informative=4, n_redundant=6, random_state=0)\\n\\nX_clf_train, X_clf_valid, y_clf_train, y_clf_valid = train_test_split(\\n    X_clf, y_clf, test_size=0.3, shuffle=False)\\n\\nX_regr, y_regr = make_classification(n_samples=6000, n_features=20,\\n                                     n_informative=7, random_state=0)\\n\\nX_regr_train, X_regr_valid, y_regr_train, y_regr_valid = train_test_split(\\n    X_regr, y_regr, test_size=0.3, shuffle=False)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:45.086875Z\",\"iopub.execute_input\":\"2022-01-01T11:46:45.087123Z\",\"iopub.status.idle\":\"2022-01-01T11:46:45.118700Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:45.087094Z\",\"shell.execute_reply\":\"2022-01-01T11:46:45.117983Z\"},\"trusted\":true},\"execution_count\":2,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"param_grid = {\\n    
'learning_rate': [0.2, 0.1],\\n    'num_leaves': [25, 35],\\n    'max_depth': [10, 12]\\n}\\n\\nparam_dist = {\\n    'learning_rate': stats.uniform(0.09, 0.25),\\n    'num_leaves': stats.randint(20, 40),\\n    'max_depth': [10, 12]\\n}\\n\\nparam_dist_hyperopt = {\\n    'max_depth': 15 + hp.randint('max_depth', 5), \\n    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.2)),\\n    'colsample_bytree': hp.uniform('colsample_bytree', 0.6, 1.0)\\n}\\n\\n\\nregr_lgbm = LGBMRegressor(n_estimators=150, random_state=0, n_jobs=-1)\\nclf_lgbm = LGBMClassifier(n_estimators=150, random_state=0, n_jobs=-1)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:45.120073Z\",\"iopub.execute_input\":\"2022-01-01T11:46:45.120376Z\",\"iopub.status.idle\":\"2022-01-01T11:46:45.132838Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:45.120336Z\",\"shell.execute_reply\":\"2022-01-01T11:46:45.131615Z\"},\"trusted\":true},\"execution_count\":3,\"outputs\":[]},{\"cell_type\":\"markdown\",\"source\":\"# Hyperparameters Tuning\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH GRID-SEARCH ###\\n\\nmodel = BoostSearch(clf_lgbm, param_grid=param_grid)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:45.134450Z\",\"iopub.execute_input\":\"2022-01-01T11:46:45.135435Z\",\"iopub.status.idle\":\"2022-01-01T11:46:46.383589Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:45.135389Z\",\"shell.execute_reply\":\"2022-01-01T11:46:46.382860Z\"},\"trusted\":true},\"execution_count\":4,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00023 ### eval_score: 0.2085\\ntrial: 0002 ### iterations: 00019 ### eval_score: 0.21112\\ntrial: 0003 ### iterations: 00026 ### eval_score: 0.21162\\ntrial: 0004 
### iterations: 00032 ### eval_score: 0.20747\\ntrial: 0005 ### iterations: 00054 ### eval_score: 0.20244\\ntrial: 0006 ### iterations: 00071 ### eval_score: 0.20052\\ntrial: 0007 ### iterations: 00047 ### eval_score: 0.20306\\ntrial: 0008 ### iterations: 00050 ### eval_score: 0.20506\\n\",\"output_type\":\"stream\"},{\"execution_count\":4,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostSearch(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n            param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                        'num_leaves': [25, 35]})\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:46.388010Z\",\"iopub.execute_input\":\"2022-01-01T11:46:46.389926Z\",\"iopub.status.idle\":\"2022-01-01T11:46:46.397550Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:46.389888Z\",\"shell.execute_reply\":\"2022-01-01T11:46:46.396658Z\"},\"trusted\":true},\"execution_count\":5,\"outputs\":[{\"execution_count\":5,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMClassifier(max_depth=12, n_estimators=150, num_leaves=25, random_state=0),\\n {'learning_rate': 0.1, 'num_leaves': 25, 'max_depth': 12},\\n 0.20051586840398297)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:46.398765Z\",\"iopub.execute_input\":\"2022-01-01T11:46:46.399534Z\",\"iopub.status.idle\":\"2022-01-01T11:46:46.436761Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:46.399498Z\",\"shell.execute_reply\":\"2022-01-01T11:46:46.431623Z\"},\"trusted\":true},\"execution_count\":6,\"outputs\":[{\"execution_count\":6,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9183333333333333, 
(1800,), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH RANDOM-SEARCH ###\\n\\nmodel = BoostSearch(\\n    regr_lgbm, param_grid=param_dist,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:46.438241Z\",\"iopub.execute_input\":\"2022-01-01T11:46:46.438923Z\",\"iopub.status.idle\":\"2022-01-01T11:46:47.128794Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:46.438892Z\",\"shell.execute_reply\":\"2022-01-01T11:46:47.128107Z\"},\"trusted\":true},\"execution_count\":7,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00023 ### eval_score: 0.07643\\ntrial: 0002 ### iterations: 00052 ### eval_score: 0.06818\\ntrial: 0003 ### iterations: 00062 ### eval_score: 0.07042\\ntrial: 0004 ### iterations: 00033 ### eval_score: 0.07035\\ntrial: 0005 ### iterations: 00032 ### eval_score: 0.07153\\ntrial: 0006 ### iterations: 00012 ### eval_score: 0.07547\\ntrial: 0007 ### iterations: 00041 ### eval_score: 0.07355\\ntrial: 0008 ### iterations: 00025 ### eval_score: 0.07805\\n\",\"output_type\":\"stream\"},{\"execution_count\":7,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostSearch(estimator=LGBMRegressor(n_estimators=150, random_state=0), n_iter=8,\\n            param_grid={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd50407f2d0>,\\n                        'max_depth': [10, 12],\\n                        'num_leaves': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd50407f590>},\\n            sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, 
model.best_score_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:47.132071Z\",\"iopub.execute_input\":\"2022-01-01T11:46:47.132611Z\",\"iopub.status.idle\":\"2022-01-01T11:46:47.142185Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:47.132575Z\",\"shell.execute_reply\":\"2022-01-01T11:46:47.141271Z\"},\"trusted\":true},\"execution_count\":8,\"outputs\":[{\"execution_count\":8,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(learning_rate=0.1350674222191923, max_depth=10, n_estimators=150,\\n               num_leaves=38, random_state=0),\\n {'learning_rate': 0.1350674222191923, 'num_leaves': 38, 'max_depth': 10},\\n 0.06817737242646997)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:47.143613Z\",\"iopub.execute_input\":\"2022-01-01T11:46:47.143856Z\",\"iopub.status.idle\":\"2022-01-01T11:46:47.611056Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:47.143827Z\",\"shell.execute_reply\":\"2022-01-01T11:46:47.610379Z\"},\"trusted\":true},\"execution_count\":9,\"outputs\":[{\"execution_count\":9,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7272820930747703, (1800,), (1800, 21))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH HYPEROPT ###\\n\\nmodel = BoostSearch(\\n    regr_lgbm, param_grid=param_dist_hyperopt,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, 
verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:47.614530Z\",\"iopub.execute_input\":\"2022-01-01T11:46:47.616779Z\",\"iopub.status.idle\":\"2022-01-01T11:46:49.268236Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:47.616738Z\",\"shell.execute_reply\":\"2022-01-01T11:46:49.267608Z\"},\"trusted\":true},\"execution_count\":10,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('max_depth', 'learning_rate', 'colsample_bytree')\\n\\ntrial: 0001 ### iterations: 00149 ### eval_score: 0.06979\\ntrial: 0002 ### iterations: 00055 ### eval_score: 0.07039\\ntrial: 0003 ### iterations: 00056 ### eval_score: 0.0716\\ntrial: 0004 ### iterations: 00150 ### eval_score: 0.07352\\ntrial: 0005 ### iterations: 00150 ### eval_score: 0.07936\\ntrial: 0006 ### iterations: 00147 ### eval_score: 0.06833\\ntrial: 0007 ### iterations: 00032 ### eval_score: 0.07261\\ntrial: 0008 ### iterations: 00096 ### eval_score: 0.07074\\n\",\"output_type\":\"stream\"},{\"execution_count\":10,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostSearch(estimator=LGBMRegressor(n_estimators=150, random_state=0), n_iter=8,\\n            param_grid={'colsample_bytree': <hyperopt.pyll.base.Apply object at 0x7fd50407fd10>,\\n                        'learning_rate': <hyperopt.pyll.base.Apply object at 0x7fd50407fa50>,\\n                        'max_depth': <hyperopt.pyll.base.Apply object at 0x7fd50407f710>},\\n            sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, 
model.best_score_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:49.271739Z\",\"iopub.execute_input\":\"2022-01-01T11:46:49.272301Z\",\"iopub.status.idle\":\"2022-01-01T11:46:49.279337Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:49.272264Z\",\"shell.execute_reply\":\"2022-01-01T11:46:49.278727Z\"},\"trusted\":true},\"execution_count\":11,\"outputs\":[{\"execution_count\":11,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(colsample_bytree=0.7597292534356749,\\n               learning_rate=0.059836658149176665, max_depth=16,\\n               n_estimators=150, random_state=0),\\n {'colsample_bytree': 0.7597292534356749,\\n  'learning_rate': 0.059836658149176665,\\n  'max_depth': 16},\\n 0.06832542425080958)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:49.280499Z\",\"iopub.execute_input\":\"2022-01-01T11:46:49.280735Z\",\"iopub.status.idle\":\"2022-01-01T11:46:50.260345Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:49.280700Z\",\"shell.execute_reply\":\"2022-01-01T11:46:50.259694Z\"},\"trusted\":true},\"execution_count\":12,\"outputs\":[{\"execution_count\":12,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7266898674988451, (1800,), (1800, 21))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Features Selection\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### BORUTA ###\\n\\nmodel = BoostBoruta(clf_lgbm, max_iter=200, perc=100)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, 
verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:50.263726Z\",\"iopub.execute_input\":\"2022-01-01T11:46:50.265917Z\",\"iopub.status.idle\":\"2022-01-01T11:46:56.714012Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:50.265869Z\",\"shell.execute_reply\":\"2022-01-01T11:46:56.713278Z\"},\"trusted\":true},\"execution_count\":13,\"outputs\":[{\"execution_count\":13,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n            max_iter=200)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:56.720017Z\",\"iopub.execute_input\":\"2022-01-01T11:46:56.720486Z\",\"iopub.status.idle\":\"2022-01-01T11:46:56.727782Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:56.720450Z\",\"shell.execute_reply\":\"2022-01-01T11:46:56.726815Z\"},\"trusted\":true},\"execution_count\":14,\"outputs\":[{\"execution_count\":14,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMClassifier(n_estimators=150, random_state=0), 10)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:56.730004Z\",\"iopub.execute_input\":\"2022-01-01T11:46:56.730326Z\",\"iopub.status.idle\":\"2022-01-01T11:46:56.765852Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:56.730286Z\",\"shell.execute_reply\":\"2022-01-01T11:46:56.760625Z\"},\"trusted\":true},\"execution_count\":15,\"outputs\":[{\"execution_count\":15,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.91, (1800,), (1800, 10), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ELIMINATION (RFE) 
###\\n\\nmodel = BoostRFE(regr_lgbm, min_features_to_select=1, step=1)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:56.767160Z\",\"iopub.execute_input\":\"2022-01-01T11:46:56.767432Z\",\"iopub.status.idle\":\"2022-01-01T11:46:59.411924Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:56.767401Z\",\"shell.execute_reply\":\"2022-01-01T11:46:59.411240Z\"},\"trusted\":true},\"execution_count\":16,\"outputs\":[{\"execution_count\":16,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         min_features_to_select=1)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:59.415300Z\",\"iopub.execute_input\":\"2022-01-01T11:46:59.417330Z\",\"iopub.status.idle\":\"2022-01-01T11:46:59.424201Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:59.417288Z\",\"shell.execute_reply\":\"2022-01-01T11:46:59.423561Z\"},\"trusted\":true},\"execution_count\":17,\"outputs\":[{\"execution_count\":17,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(n_estimators=150, random_state=0), 7)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, 
pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:46:59.425449Z\",\"iopub.execute_input\":\"2022-01-01T11:46:59.425674Z\",\"iopub.status.idle\":\"2022-01-01T11:47:00.248420Z\",\"shell.execute_reply.started\":\"2022-01-01T11:46:59.425645Z\",\"shell.execute_reply\":\"2022-01-01T11:47:00.247703Z\"},\"trusted\":true},\"execution_count\":18,\"outputs\":[{\"execution_count\":18,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7766363424352807, (1800,), (1800, 7), (1800, 8))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ADDITION (RFA) ###\\n\\nmodel = BoostRFA(regr_lgbm, min_features_to_select=1, step=1)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:00.251993Z\",\"iopub.execute_input\":\"2022-01-01T11:47:00.252510Z\",\"iopub.status.idle\":\"2022-01-01T11:47:03.954790Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:00.252473Z\",\"shell.execute_reply\":\"2022-01-01T11:47:03.954052Z\"},\"trusted\":true},\"execution_count\":19,\"outputs\":[{\"execution_count\":19,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         min_features_to_select=1)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:03.958397Z\",\"iopub.execute_input\":\"2022-01-01T11:47:03.958982Z\",\"iopub.status.idle\":\"2022-01-01T11:47:03.967715Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:03.958931Z\",\"shell.execute_reply\":\"2022-01-01T11:47:03.966909Z\"},\"trusted\":true},\"execution_count\":20,\"outputs\":[{\"execution_count\":20,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(n_estimators=150, random_state=0), 
8)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:03.969215Z\",\"iopub.execute_input\":\"2022-01-01T11:47:03.969612Z\",\"iopub.status.idle\":\"2022-01-01T11:47:04.838820Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:03.969569Z\",\"shell.execute_reply\":\"2022-01-01T11:47:04.838192Z\"},\"trusted\":true},\"execution_count\":21,\"outputs\":[{\"execution_count\":21,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7723191919698336, (1800,), (1800, 8), (1800, 9))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Features Selection with SHAP\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### BORUTA SHAP ###\\n\\nmodel = BoostBoruta(\\n    clf_lgbm, max_iter=200, perc=100,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:04.842289Z\",\"iopub.execute_input\":\"2022-01-01T11:47:04.844564Z\",\"iopub.status.idle\":\"2022-01-01T11:47:17.780389Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:04.844522Z\",\"shell.execute_reply\":\"2022-01-01T11:47:17.779726Z\"},\"trusted\":true},\"execution_count\":22,\"outputs\":[{\"execution_count\":22,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n            importance_type='shap_importances', max_iter=200,\\n            train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:17.781535Z\",\"iopub.execute_input\":\"2022-01-01T11:47:17.784569Z\",\"iopub.status.idle\":\"2022-01-01T11:47:17.791371Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:17.784530Z\",\"shell.execute_reply\":\"2022-01-01T11:47:17.790591Z\"},\"trusted\":true},\"execution_count\":23,\"outputs\":[{\"execution_count\":23,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMClassifier(n_estimators=150, random_state=0), 9)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:17.794450Z\",\"iopub.execute_input\":\"2022-01-01T11:47:17.794986Z\",\"iopub.status.idle\":\"2022-01-01T11:47:17.813842Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:17.794933Z\",\"shell.execute_reply\":\"2022-01-01T11:47:17.813126Z\"},\"trusted\":true},\"execution_count\":24,\"outputs\":[{\"execution_count\":24,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9111111111111111, (1800,), (1800, 9), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ELIMINATION (RFE) SHAP ###\\n\\nmodel = BoostRFE(\\n    regr_lgbm, min_features_to_select=1, step=1,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, 
verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:17.817477Z\",\"iopub.execute_input\":\"2022-01-01T11:47:17.819641Z\",\"iopub.status.idle\":\"2022-01-01T11:47:32.735329Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:17.819595Z\",\"shell.execute_reply\":\"2022-01-01T11:47:32.734687Z\"},\"trusted\":true},\"execution_count\":25,\"outputs\":[{\"execution_count\":25,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         importance_type='shap_importances', min_features_to_select=1,\\n         train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:32.736646Z\",\"iopub.execute_input\":\"2022-01-01T11:47:32.737109Z\",\"iopub.status.idle\":\"2022-01-01T11:47:32.743398Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:32.737074Z\",\"shell.execute_reply\":\"2022-01-01T11:47:32.742747Z\"},\"trusted\":true},\"execution_count\":26,\"outputs\":[{\"execution_count\":26,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(n_estimators=150, random_state=0), 7)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:32.744765Z\",\"iopub.execute_input\":\"2022-01-01T11:47:32.747374Z\",\"iopub.status.idle\":\"2022-01-01T11:47:33.570515Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:32.747336Z\",\"shell.execute_reply\":\"2022-01-01T11:47:33.569899Z\"},\"trusted\":true},\"execution_count\":27,\"outputs\":[{\"execution_count\":27,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7766363424352807, (1800,), (1800, 7), (1800, 
8))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ADDITION (RFA) SHAP ###\\n\\nmodel = BoostRFA(\\n    regr_lgbm, min_features_to_select=1, step=1,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:33.571778Z\",\"iopub.execute_input\":\"2022-01-01T11:47:33.572261Z\",\"iopub.status.idle\":\"2022-01-01T11:47:39.941084Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:33.572226Z\",\"shell.execute_reply\":\"2022-01-01T11:47:39.940356Z\"},\"trusted\":true},\"execution_count\":28,\"outputs\":[{\"execution_count\":28,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         importance_type='shap_importances', min_features_to_select=1,\\n         train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:39.944497Z\",\"iopub.execute_input\":\"2022-01-01T11:47:39.946592Z\",\"iopub.status.idle\":\"2022-01-01T11:47:39.953717Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:39.946550Z\",\"shell.execute_reply\":\"2022-01-01T11:47:39.952924Z\"},\"trusted\":true},\"execution_count\":29,\"outputs\":[{\"execution_count\":29,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(n_estimators=150, random_state=0), 9)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, 
pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:39.954955Z\",\"iopub.execute_input\":\"2022-01-01T11:47:39.955713Z\",\"iopub.status.idle\":\"2022-01-01T11:47:40.853749Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:39.955669Z\",\"shell.execute_reply\":\"2022-01-01T11:47:40.853100Z\"},\"trusted\":true},\"execution_count\":30,\"outputs\":[{\"execution_count\":30,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7699366468805918, (1800,), (1800, 9), (1800, 10))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Hyperparameters Tuning + Features Selection\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH GRID-SEARCH + BORUTA ###\\n\\nmodel = BoostBoruta(clf_lgbm, param_grid=param_grid, max_iter=200, perc=100)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:47:40.857000Z\",\"iopub.execute_input\":\"2022-01-01T11:47:40.859123Z\",\"iopub.status.idle\":\"2022-01-01T11:48:08.045782Z\",\"shell.execute_reply.started\":\"2022-01-01T11:47:40.859074Z\",\"shell.execute_reply\":\"2022-01-01T11:48:08.043191Z\"},\"trusted\":true},\"execution_count\":31,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00023 ### eval_score: 0.19868\\ntrial: 0002 ### iterations: 00030 ### eval_score: 0.19844\\ntrial: 0003 ### iterations: 00023 ### eval_score: 0.19695\\ntrial: 0004 ### iterations: 00026 ### eval_score: 0.19949\\ntrial: 0005 ### iterations: 00067 ### eval_score: 0.19583\\ntrial: 0006 ### iterations: 00051 ### eval_score: 0.1949\\ntrial: 0007 ### iterations: 00045 ### eval_score: 0.19675\\ntrial: 0008 ### iterations: 00055 ### eval_score: 
0.19906\\n\",\"output_type\":\"stream\"},{\"execution_count\":31,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n            max_iter=200,\\n            param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                        'num_leaves': [25, 35]})\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:08.047190Z\",\"iopub.execute_input\":\"2022-01-01T11:48:08.048047Z\",\"iopub.status.idle\":\"2022-01-01T11:48:08.056353Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:08.048000Z\",\"shell.execute_reply\":\"2022-01-01T11:48:08.055615Z\"},\"trusted\":true},\"execution_count\":32,\"outputs\":[{\"execution_count\":32,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMClassifier(max_depth=12, n_estimators=150, num_leaves=25, random_state=0),\\n {'learning_rate': 0.1, 'num_leaves': 25, 'max_depth': 12},\\n 0.19489866976777023,\\n 9)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:08.058015Z\",\"iopub.execute_input\":\"2022-01-01T11:48:08.058593Z\",\"iopub.status.idle\":\"2022-01-01T11:48:08.109632Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:08.058410Z\",\"shell.execute_reply\":\"2022-01-01T11:48:08.108670Z\"},\"trusted\":true},\"execution_count\":33,\"outputs\":[{\"execution_count\":33,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.915, (1800,), (1800, 9), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH RANDOM-SEARCH + RECURSIVE FEATURE ELIMINATION (RFE) ###\\n\\nmodel = 
BoostRFE(\\n    regr_lgbm, param_grid=param_dist, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:08.114460Z\",\"iopub.execute_input\":\"2022-01-01T11:48:08.116626Z\",\"iopub.status.idle\":\"2022-01-01T11:48:20.506235Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:08.116579Z\",\"shell.execute_reply\":\"2022-01-01T11:48:20.505511Z\"},\"trusted\":true},\"execution_count\":34,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00107 ### eval_score: 0.06016\\ntrial: 0002 ### iterations: 00095 ### eval_score: 0.05711\\ntrial: 0003 ### iterations: 00121 ### eval_score: 0.05926\\ntrial: 0004 ### iterations: 00103 ### eval_score: 0.05688\\ntrial: 0005 ### iterations: 00119 ### eval_score: 0.05618\\ntrial: 0006 ### iterations: 00049 ### eval_score: 0.06188\\ntrial: 0007 ### iterations: 00150 ### eval_score: 0.05538\\ntrial: 0008 ### iterations: 00083 ### eval_score: 0.06084\\n\",\"output_type\":\"stream\"},{\"execution_count\":34,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         min_features_to_select=1, n_iter=8,\\n         param_grid={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd50407f2d0>,\\n                     'max_depth': [10, 12],\\n                     'num_leaves': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd50407f590>},\\n         sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:20.509788Z\",\"iopub.execute_input\":\"2022-01-01T11:48:20.511633Z\",\"iopub.status.idle\":\"2022-01-01T11:48:20.521139Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:20.511592Z\",\"shell.execute_reply\":\"2022-01-01T11:48:20.520293Z\"},\"trusted\":true},\"execution_count\":35,\"outputs\":[{\"execution_count\":35,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(learning_rate=0.13639381870463482, max_depth=12, n_estimators=150,\\n               num_leaves=25, random_state=0),\\n {'learning_rate': 0.13639381870463482, 'num_leaves': 25, 'max_depth': 12},\\n 0.0553821617278472,\\n 7)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:20.522443Z\",\"iopub.execute_input\":\"2022-01-01T11:48:20.522817Z\",\"iopub.status.idle\":\"2022-01-01T11:48:21.145683Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:20.522785Z\",\"shell.execute_reply\":\"2022-01-01T11:48:21.145033Z\"},\"trusted\":true},\"execution_count\":36,\"outputs\":[{\"execution_count\":36,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7784645155736596, (1800,), (1800, 7), (1800, 8))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH HYPEROPT + RECURSIVE FEATURE ADDITION (RFA) ###\\n\\nmodel = BoostRFA(\\n    regr_lgbm, param_grid=param_dist_hyperopt, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, 
verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:21.149492Z\",\"iopub.execute_input\":\"2022-01-01T11:48:21.151302Z\",\"iopub.status.idle\":\"2022-01-01T11:48:56.679453Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:21.151261Z\",\"shell.execute_reply\":\"2022-01-01T11:48:56.678720Z\"},\"trusted\":true},\"execution_count\":37,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('max_depth', 'learning_rate', 'colsample_bytree')\\n\\ntrial: 0001 ### iterations: 00150 ### eval_score: 0.06507\\ntrial: 0002 ### iterations: 00075 ### eval_score: 0.05784\\ntrial: 0003 ### iterations: 00095 ### eval_score: 0.06088\\ntrial: 0004 ### iterations: 00150 ### eval_score: 0.06976\\ntrial: 0005 ### iterations: 00150 ### eval_score: 0.07593\\ntrial: 0006 ### iterations: 00149 ### eval_score: 0.05995\\ntrial: 0007 ### iterations: 00058 ### eval_score: 0.05916\\ntrial: 0008 ### iterations: 00150 ### eval_score: 0.06366\\n\",\"output_type\":\"stream\"},{\"execution_count\":37,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         min_features_to_select=1, n_iter=8,\\n         param_grid={'colsample_bytree': <hyperopt.pyll.base.Apply object at 0x7fd50407fd10>,\\n                     'learning_rate': <hyperopt.pyll.base.Apply object at 0x7fd50407fa50>,\\n                     'max_depth': <hyperopt.pyll.base.Apply object at 0x7fd50407f710>},\\n         sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:56.682847Z\",\"iopub.execute_input\":\"2022-01-01T11:48:56.684405Z\",\"iopub.status.idle\":\"2022-01-01T11:48:56.691812Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:56.684368Z\",\"shell.execute_reply\":\"2022-01-01T11:48:56.690932Z\"},\"trusted\":true},\"execution_count\":38,\"outputs\":[{\"execution_count\":38,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(colsample_bytree=0.8515260655364685,\\n               learning_rate=0.13520045129619862, max_depth=18, n_estimators=150,\\n               random_state=0),\\n {'colsample_bytree': 0.8515260655364685,\\n  'learning_rate': 0.13520045129619862,\\n  'max_depth': 18},\\n 0.0578369356489881,\\n 8)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:56.693078Z\",\"iopub.execute_input\":\"2022-01-01T11:48:56.693305Z\",\"iopub.status.idle\":\"2022-01-01T11:48:57.115924Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:56.693277Z\",\"shell.execute_reply\":\"2022-01-01T11:48:57.115308Z\"},\"trusted\":true},\"execution_count\":39,\"outputs\":[{\"execution_count\":39,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7686451168212334, (1800,), (1800, 8), (1800, 9))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Hyperparameters Tuning + Features Selection with SHAP\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH GRID-SEARCH + BORUTA SHAP ###\\n\\nmodel = BoostBoruta(\\n    clf_lgbm, param_grid=param_grid, max_iter=200, perc=100,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], 
early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"scrolled\":true,\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:48:57.119397Z\",\"iopub.execute_input\":\"2022-01-01T11:48:57.120009Z\",\"iopub.status.idle\":\"2022-01-01T11:50:15.982498Z\",\"shell.execute_reply.started\":\"2022-01-01T11:48:57.119958Z\",\"shell.execute_reply\":\"2022-01-01T11:50:15.981774Z\"},\"trusted\":true},\"execution_count\":40,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00036 ### eval_score: 0.19716\\ntrial: 0002 ### iterations: 00030 ### eval_score: 0.19818\\ntrial: 0003 ### iterations: 00031 ### eval_score: 0.19881\\ntrial: 0004 ### iterations: 00026 ### eval_score: 0.19949\\ntrial: 0005 ### iterations: 00067 ### eval_score: 0.19583\\ntrial: 0006 ### iterations: 00051 ### eval_score: 0.1949\\ntrial: 0007 ### iterations: 00045 ### eval_score: 0.19675\\ntrial: 0008 ### iterations: 00057 ### eval_score: 0.19284\\n\",\"output_type\":\"stream\"},{\"execution_count\":40,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n            importance_type='shap_importances', max_iter=200,\\n            param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                        'num_leaves': [25, 35]},\\n            train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:15.988196Z\",\"iopub.execute_input\":\"2022-01-01T11:50:15.988729Z\",\"iopub.status.idle\":\"2022-01-01T11:50:15.996898Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:15.988685Z\",\"shell.execute_reply\":\"2022-01-01T11:50:15.996175Z\"},\"trusted\":true},\"execution_count\":41,\"outputs\":[{\"execution_count\":41,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMClassifier(max_depth=12, n_estimators=150, num_leaves=35, random_state=0),\\n {'learning_rate': 0.1, 'num_leaves': 35, 'max_depth': 12},\\n 0.1928371931511303,\\n 10)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:15.998631Z\",\"iopub.execute_input\":\"2022-01-01T11:50:15.999269Z\",\"iopub.status.idle\":\"2022-01-01T11:50:16.029050Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:15.999228Z\",\"shell.execute_reply\":\"2022-01-01T11:50:16.028270Z\"},\"trusted\":true},\"execution_count\":42,\"outputs\":[{\"execution_count\":42,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9111111111111111, (1800,), (1800, 10), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH RANDOM-SEARCH + RECURSIVE FEATURE ELIMINATION (RFE) SHAP ###\\n\\nmodel = BoostRFE(\\n    regr_lgbm, param_grid=param_dist, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, 
verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:16.030261Z\",\"iopub.execute_input\":\"2022-01-01T11:50:16.030658Z\",\"iopub.status.idle\":\"2022-01-01T11:51:19.095150Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:16.030625Z\",\"shell.execute_reply\":\"2022-01-01T11:51:19.094483Z\"},\"trusted\":true},\"execution_count\":43,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00107 ### eval_score: 0.06016\\ntrial: 0002 ### iterations: 00102 ### eval_score: 0.05525\\ntrial: 0003 ### iterations: 00150 ### eval_score: 0.05869\\ntrial: 0004 ### iterations: 00149 ### eval_score: 0.05863\\ntrial: 0005 ### iterations: 00119 ### eval_score: 0.05618\\ntrial: 0006 ### iterations: 00049 ### eval_score: 0.06188\\ntrial: 0007 ### iterations: 00150 ### eval_score: 0.05538\\ntrial: 0008 ### iterations: 00083 ### eval_score: 0.06084\\n\",\"output_type\":\"stream\"},{\"execution_count\":43,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         importance_type='shap_importances', min_features_to_select=1, n_iter=8,\\n         param_grid={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd50407f2d0>,\\n                     'max_depth': [10, 12],\\n                     'num_leaves': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7fd50407f590>},\\n         sampling_seed=0, train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:19.098487Z\",\"iopub.execute_input\":\"2022-01-01T11:51:19.099062Z\",\"iopub.status.idle\":\"2022-01-01T11:51:19.108772Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:19.099027Z\",\"shell.execute_reply\":\"2022-01-01T11:51:19.107939Z\"},\"trusted\":true},\"execution_count\":44,\"outputs\":[{\"execution_count\":44,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(learning_rate=0.1350674222191923, max_depth=10, n_estimators=150,\\n               num_leaves=38, random_state=0),\\n {'learning_rate': 0.1350674222191923, 'num_leaves': 38, 'max_depth': 10},\\n 0.05524518772497125,\\n 9)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:19.110141Z\",\"iopub.execute_input\":\"2022-01-01T11:51:19.110358Z\",\"iopub.status.idle\":\"2022-01-01T11:51:19.840667Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:19.110333Z\",\"shell.execute_reply\":\"2022-01-01T11:51:19.840035Z\"},\"trusted\":true},\"execution_count\":45,\"outputs\":[{\"execution_count\":45,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.779012428496056, (1800,), (1800, 9), (1800, 10))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH HYPEROPT + RECURSIVE FEATURE ADDITION (RFA) SHAP ###\\n\\nmodel = BoostRFA(\\n    regr_lgbm, param_grid=param_dist_hyperopt, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, 
verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:19.844245Z\",\"iopub.execute_input\":\"2022-01-01T11:51:19.844839Z\",\"iopub.status.idle\":\"2022-01-01T11:52:27.830673Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:19.844800Z\",\"shell.execute_reply\":\"2022-01-01T11:52:27.829915Z\"},\"trusted\":true},\"execution_count\":46,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('max_depth', 'learning_rate', 'colsample_bytree')\\n\\ntrial: 0001 ### iterations: 00150 ### eval_score: 0.06508\\ntrial: 0002 ### iterations: 00091 ### eval_score: 0.05997\\ntrial: 0003 ### iterations: 00094 ### eval_score: 0.06078\\ntrial: 0004 ### iterations: 00150 ### eval_score: 0.06773\\ntrial: 0005 ### iterations: 00150 ### eval_score: 0.07565\\ntrial: 0006 ### iterations: 00150 ### eval_score: 0.05935\\ntrial: 0007 ### iterations: 00083 ### eval_score: 0.06047\\ntrial: 0008 ### iterations: 00150 ### eval_score: 0.05966\\n\",\"output_type\":\"stream\"},{\"execution_count\":46,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=LGBMRegressor(n_estimators=150, random_state=0),\\n         importance_type='shap_importances', min_features_to_select=1, n_iter=8,\\n         param_grid={'colsample_bytree': <hyperopt.pyll.base.Apply object at 0x7fd50407fd10>,\\n                     'learning_rate': <hyperopt.pyll.base.Apply object at 0x7fd50407fa50>,\\n                     'max_depth': <hyperopt.pyll.base.Apply object at 0x7fd50407f710>},\\n         sampling_seed=0, train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:52:27.834402Z\",\"iopub.execute_input\":\"2022-01-01T11:52:27.835864Z\",\"iopub.status.idle\":\"2022-01-01T11:52:27.842813Z\",\"shell.execute_reply.started\":\"2022-01-01T11:52:27.835812Z\",\"shell.execute_reply\":\"2022-01-01T11:52:27.842095Z\"},\"trusted\":true},\"execution_count\":47,\"outputs\":[{\"execution_count\":47,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(LGBMRegressor(colsample_bytree=0.7597292534356749,\\n               learning_rate=0.059836658149176665, max_depth=16,\\n               n_estimators=150, random_state=0),\\n {'colsample_bytree': 0.7597292534356749,\\n  'learning_rate': 0.059836658149176665,\\n  'max_depth': 16},\\n 0.059352961644604275,\\n 9)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape,\\n model.predict(X_regr_valid, pred_contrib=True).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:52:27.844085Z\",\"iopub.execute_input\":\"2022-01-01T11:52:27.844320Z\",\"iopub.status.idle\":\"2022-01-01T11:52:28.690931Z\",\"shell.execute_reply.started\":\"2022-01-01T11:52:27.844291Z\",\"shell.execute_reply\":\"2022-01-01T11:52:28.690302Z\"},\"trusted\":true},\"execution_count\":48,\"outputs\":[{\"execution_count\":48,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7625808256692885, (1800,), (1800, 9), (1800, 10))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# CUSTOM EVAL METRIC SUPPORT\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"from sklearn.metrics import roc_auc_score\\n\\ndef AUC(y_true, y_hat):\\n    return 'auc', roc_auc_score(y_true, y_hat), 
True\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:52:28.691909Z\",\"iopub.execute_input\":\"2022-01-01T11:52:28.692560Z\",\"iopub.status.idle\":\"2022-01-01T11:52:28.696813Z\",\"shell.execute_reply.started\":\"2022-01-01T11:52:28.692526Z\",\"shell.execute_reply\":\"2022-01-01T11:52:28.696058Z\"},\"trusted\":true},\"execution_count\":49,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"model = BoostRFE(\\n    LGBMClassifier(n_estimators=150, random_state=0, metric=\\\"custom\\\"), \\n    param_grid=param_grid, min_features_to_select=1, step=1,\\n    greater_is_better=True\\n)\\nmodel.fit(\\n    X_clf_train, y_clf_train, \\n    eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0, \\n    eval_metric=AUC\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:52:28.700234Z\",\"iopub.execute_input\":\"2022-01-01T11:52:28.700461Z\",\"iopub.status.idle\":\"2022-01-01T11:52:49.577997Z\",\"shell.execute_reply.started\":\"2022-01-01T11:52:28.700433Z\",\"shell.execute_reply\":\"2022-01-01T11:52:49.577317Z\"},\"trusted\":true},\"execution_count\":50,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00028 ### eval_score: 0.97581\\ntrial: 0002 ### iterations: 00016 ### eval_score: 0.97514\\ntrial: 0003 ### iterations: 00015 ### eval_score: 0.97574\\ntrial: 0004 ### iterations: 00032 ### eval_score: 0.97549\\ntrial: 0005 ### iterations: 00075 ### eval_score: 0.97551\\ntrial: 0006 ### iterations: 00041 ### eval_score: 0.97597\\ntrial: 0007 ### iterations: 00076 ### eval_score: 0.97592\\ntrial: 0008 ### iterations: 00060 ### eval_score: 0.97539\\n\",\"output_type\":\"stream\"},{\"execution_count\":50,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMClassifier(metric='custom', n_estimators=150,\\n                                  random_state=0),\\n         greater_is_better=True, 
min_features_to_select=1,\\n         param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                     'num_leaves': [25, 35]})\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# CATEGORICAL FEATURE SUPPORT\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"categorical_feature = [0,1,2]\\n\\nX_clf_train[:,categorical_feature] = (X_clf_train[:,categorical_feature]+100).clip(0).astype(int)\\nX_clf_valid[:,categorical_feature] = (X_clf_valid[:,categorical_feature]+100).clip(0).astype(int)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:52:49.581409Z\",\"iopub.execute_input\":\"2022-01-01T11:52:49.581982Z\",\"iopub.status.idle\":\"2022-01-01T11:52:49.589315Z\",\"shell.execute_reply.started\":\"2022-01-01T11:52:49.581931Z\",\"shell.execute_reply\":\"2022-01-01T11:52:49.588511Z\"},\"trusted\":true},\"execution_count\":51,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"### MANUAL PASS categorical_feature WITH NUMPY ARRAYS ###\\n\\nmodel = BoostRFE(clf_lgbm, param_grid=param_grid, min_features_to_select=1, step=1)\\nmodel.fit(\\n    X_clf_train, y_clf_train, \\n    eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0,\\n    categorical_feature=categorical_feature\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:52:49.590366Z\",\"iopub.execute_input\":\"2022-01-01T11:52:49.590604Z\",\"iopub.status.idle\":\"2022-01-01T11:53:00.495917Z\",\"shell.execute_reply.started\":\"2022-01-01T11:52:49.590576Z\",\"shell.execute_reply\":\"2022-01-01T11:53:00.495224Z\"},\"trusted\":true},\"execution_count\":52,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00029 ### eval_score: 0.2036\\ntrial: 0002 ### iterations: 00030 ### eval_score: 0.2034\\ntrial: 0003 ### iterations: 00027 ### eval_score: 0.20617\\ntrial: 0004 ### iterations: 00024 ### eval_score: 0.20003\\ntrial: 0005 ### 
iterations: 00060 ### eval_score: 0.20332\\ntrial: 0006 ### iterations: 00063 ### eval_score: 0.20329\\ntrial: 0007 ### iterations: 00054 ### eval_score: 0.20136\\ntrial: 0008 ### iterations: 00052 ### eval_score: 0.19959\\n\",\"output_type\":\"stream\"},{\"execution_count\":52,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n         min_features_to_select=1,\\n         param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                     'num_leaves': [25, 35]})\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"X_clf_train = pd.DataFrame(X_clf_train)\\nX_clf_train[categorical_feature] = X_clf_train[categorical_feature].astype('category')\\n\\nX_clf_valid = pd.DataFrame(X_clf_valid)\\nX_clf_valid[categorical_feature] = X_clf_valid[categorical_feature].astype('category')\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:53:00.499198Z\",\"iopub.execute_input\":\"2022-01-01T11:53:00.499858Z\",\"iopub.status.idle\":\"2022-01-01T11:53:00.527402Z\",\"shell.execute_reply.started\":\"2022-01-01T11:53:00.499814Z\",\"shell.execute_reply\":\"2022-01-01T11:53:00.526779Z\"},\"trusted\":true},\"execution_count\":53,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"### PASS category COLUMNS IN PANDAS DF ###\\n\\nmodel = BoostRFE(clf_lgbm, param_grid=param_grid, min_features_to_select=1, step=1)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:53:00.529027Z\",\"iopub.execute_input\":\"2022-01-01T11:53:00.529320Z\",\"iopub.status.idle\":\"2022-01-01T11:53:12.422092Z\",\"shell.execute_reply.started\":\"2022-01-01T11:53:00.529281Z\",\"shell.execute_reply\":\"2022-01-01T11:53:12.421368Z\"},\"trusted\":true},\"execution_count\":54,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 
'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00029 ### eval_score: 0.2036\\ntrial: 0002 ### iterations: 00030 ### eval_score: 0.2034\\ntrial: 0003 ### iterations: 00027 ### eval_score: 0.20617\\ntrial: 0004 ### iterations: 00024 ### eval_score: 0.20003\\ntrial: 0005 ### iterations: 00060 ### eval_score: 0.20332\\ntrial: 0006 ### iterations: 00063 ### eval_score: 0.20329\\ntrial: 0007 ### iterations: 00054 ### eval_score: 0.20136\\ntrial: 0008 ### iterations: 00052 ### eval_score: 0.19959\\n\",\"output_type\":\"stream\"},{\"execution_count\":54,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=LGBMClassifier(n_estimators=150, random_state=0),\\n         min_features_to_select=1,\\n         param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                     'num_leaves': [25, 35]})\"},\"metadata\":{}}]}]}"
  },
  {
    "path": "notebooks/XGBoost_usage.ipynb",
    "content": "{\"metadata\":{\"kernelspec\":{\"language\":\"python\",\"display_name\":\"Python 3\",\"name\":\"python3\"},\"language_info\":{\"name\":\"python\",\"version\":\"3.7.12\",\"mimetype\":\"text/x-python\",\"codemirror_mode\":{\"name\":\"ipython\",\"version\":3},\"pygments_lexer\":\"ipython3\",\"nbconvert_exporter\":\"python\",\"file_extension\":\".py\"}},\"nbformat_minor\":4,\"nbformat\":4,\"cells\":[{\"cell_type\":\"code\",\"source\":\"import numpy as np\\nimport pandas as pd\\nfrom scipy import stats\\n\\nfrom sklearn.model_selection import train_test_split\\nfrom sklearn.datasets import make_classification, make_regression\\n\\nfrom hyperopt import hp\\nfrom hyperopt import Trials\\n\\nfrom xgboost import *\\n\\ntry:\\n    from shaphypetune import BoostSearch, BoostBoruta, BoostRFE, BoostRFA\\nexcept:\\n    !pip install --upgrade shap-hypetune\\n    from shaphypetune import BoostSearch, BoostBoruta, BoostRFE, BoostRFA\\n\\nimport warnings\\nwarnings.simplefilter('ignore')\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:44.031173Z\",\"iopub.execute_input\":\"2022-01-01T11:49:44.031497Z\",\"iopub.status.idle\":\"2022-01-01T11:49:45.071830Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:44.031410Z\",\"shell.execute_reply\":\"2022-01-01T11:49:45.070928Z\"},\"trusted\":true},\"execution_count\":1,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"X_clf, y_clf = make_classification(n_samples=6000, n_features=20, n_classes=2, \\n                                   n_informative=4, n_redundant=6, random_state=0)\\n\\nX_clf_train, X_clf_valid, y_clf_train, y_clf_valid = train_test_split(\\n    X_clf, y_clf, test_size=0.3, shuffle=False)\\n\\nX_regr, y_regr = make_classification(n_samples=6000, n_features=20,\\n                                     n_informative=7, random_state=0)\\n\\nX_regr_train, X_regr_valid, y_regr_train, y_regr_valid = train_test_split(\\n    X_regr, y_regr, test_size=0.3, 
shuffle=False)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:45.073832Z\",\"iopub.execute_input\":\"2022-01-01T11:49:45.074046Z\",\"iopub.status.idle\":\"2022-01-01T11:49:45.098178Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:45.074004Z\",\"shell.execute_reply\":\"2022-01-01T11:49:45.097461Z\"},\"trusted\":true},\"execution_count\":2,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"param_grid = {\\n    'learning_rate': [0.2, 0.1],\\n    'num_leaves': [25, 35],\\n    'max_depth': [10, 12]\\n}\\n\\nparam_dist = {\\n    'learning_rate': stats.uniform(0.09, 0.25),\\n    'num_leaves': stats.randint(20,40),\\n    'max_depth': [10, 12]\\n}\\n\\nparam_dist_hyperopt = {\\n    'max_depth': 15 + hp.randint('num_leaves', 5), \\n    'learning_rate': hp.loguniform('learning_rate', np.log(0.01), np.log(0.2)),\\n    'colsample_bytree': hp.uniform('colsample_by_tree', 0.6, 1.0)\\n}\\n\\n\\nregr_xgb = XGBRegressor(n_estimators=150, random_state=0, verbosity=0, n_jobs=-1)\\nclf_xgb = XGBClassifier(n_estimators=150, random_state=0, verbosity=0, n_jobs=-1)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:45.099715Z\",\"iopub.execute_input\":\"2022-01-01T11:49:45.099916Z\",\"iopub.status.idle\":\"2022-01-01T11:49:45.108765Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:45.099890Z\",\"shell.execute_reply\":\"2022-01-01T11:49:45.107996Z\"},\"trusted\":true},\"execution_count\":3,\"outputs\":[]},{\"cell_type\":\"markdown\",\"source\":\"# Hyperparameters Tuning\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH GRID-SEARCH ###\\n\\nmodel = BoostSearch(clf_xgb, param_grid=param_grid)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, 
verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:45.109686Z\",\"iopub.execute_input\":\"2022-01-01T11:49:45.109871Z\",\"iopub.status.idle\":\"2022-01-01T11:49:52.490942Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:45.109848Z\",\"shell.execute_reply\":\"2022-01-01T11:49:52.490078Z\"},\"trusted\":true},\"execution_count\":4,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00021 ### eval_score: 0.2045\\ntrial: 0002 ### iterations: 00026 ### eval_score: 0.19472\\ntrial: 0003 ### iterations: 00021 ### eval_score: 0.2045\\ntrial: 0004 ### iterations: 00026 ### eval_score: 0.19472\\ntrial: 0005 ### iterations: 00045 ### eval_score: 0.19964\\ntrial: 0006 ### iterations: 00050 ### eval_score: 0.20157\\ntrial: 0007 ### iterations: 00045 ### eval_score: 0.19964\\ntrial: 0008 ### iterations: 00050 ### eval_score: 0.20157\\n\",\"output_type\":\"stream\"},{\"execution_count\":4,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostSearch(estimator=XGBClassifier(base_score=None, booster=None,\\n                                    colsample_bylevel=None,\\n                                    colsample_bynode=None,\\n                                    colsample_bytree=None,\\n                                    enable_categorical=False, gamma=None,\\n                                    gpu_id=None, importance_type=None,\\n                                    interaction_constraints=None,\\n                                    learning_rate=None, max_delta_step=None,\\n                                    max_depth=None, min_child_weight=None,\\n                                    missing=nan, monotone_constraints=None,\\n                                    n_estimators=150, n_jobs=-1,\\n                                    num_parallel_tree=None, predictor=None,\\n                                    random_state=0, 
reg_alpha=None,\\n                                    reg_lambda=None, scale_pos_weight=None,\\n                                    subsample=None, tree_method=None,\\n                                    validate_parameters=None, verbosity=0),\\n            param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                        'num_leaves': [25, 35]})\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:52.493607Z\",\"iopub.execute_input\":\"2022-01-01T11:49:52.494126Z\",\"iopub.status.idle\":\"2022-01-01T11:49:52.504649Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:52.494081Z\",\"shell.execute_reply\":\"2022-01-01T11:49:52.503849Z\"},\"trusted\":true},\"execution_count\":5,\"outputs\":[{\"execution_count\":5,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n               colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n               gamma=0, gpu_id=-1, importance_type=None,\\n               interaction_constraints='', learning_rate=0.2, max_delta_step=0,\\n               max_depth=12, min_child_weight=1, missing=nan,\\n               monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n               num_leaves=25, num_parallel_tree=1, predictor='auto',\\n               random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,\\n               subsample=1, tree_method='exact', validate_parameters=1,\\n               verbosity=0),\\n {'learning_rate': 0.2, 'num_leaves': 25, 'max_depth': 12},\\n 0.194719)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n 
model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:52.506201Z\",\"iopub.execute_input\":\"2022-01-01T11:49:52.506365Z\",\"iopub.status.idle\":\"2022-01-01T11:49:52.528604Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:52.506344Z\",\"shell.execute_reply\":\"2022-01-01T11:49:52.528078Z\"},\"trusted\":true},\"execution_count\":6,\"outputs\":[{\"execution_count\":6,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9138888888888889, (1800,), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH RANDOM-SEARCH ###\\n\\nmodel = BoostSearch(\\n    regr_xgb, param_grid=param_dist,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:49:52.529476Z\",\"iopub.execute_input\":\"2022-01-01T11:49:52.530097Z\",\"iopub.status.idle\":\"2022-01-01T11:50:03.018637Z\",\"shell.execute_reply.started\":\"2022-01-01T11:49:52.530066Z\",\"shell.execute_reply\":\"2022-01-01T11:50:03.017927Z\"},\"trusted\":true},\"execution_count\":7,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00012 ### eval_score: 0.27616\\ntrial: 0002 ### iterations: 00056 ### eval_score: 0.26211\\ntrial: 0003 ### iterations: 00078 ### eval_score: 0.27603\\ntrial: 0004 ### iterations: 00045 ### eval_score: 0.26117\\ntrial: 0005 ### iterations: 00046 ### eval_score: 0.27868\\ntrial: 0006 ### iterations: 00035 ### eval_score: 0.27815\\ntrial: 0007 ### iterations: 00039 ### eval_score: 0.2753\\ntrial: 0008 ### iterations: 00016 ### eval_score: 0.28116\\n\",\"output_type\":\"stream\"},{\"execution_count\":7,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostSearch(estimator=XGBRegressor(base_score=None, booster=None,\\n    
                               colsample_bylevel=None,\\n                                   colsample_bynode=None, colsample_bytree=None,\\n                                   enable_categorical=False, gamma=None,\\n                                   gpu_id=None, importance_type=None,\\n                                   interaction_constraints=None,\\n                                   learning_rate=None, max_delta_step=None,\\n                                   max_depth=None, min_child_weight=None,\\n                                   missing=nan, monotone_constraints=None,\\n                                   n_estim...\\n                                   random_state=0, reg_alpha=None,\\n                                   reg_lambda=None, scale_pos_weight=None,\\n                                   subsample=None, tree_method=None,\\n                                   validate_parameters=None, verbosity=0),\\n            n_iter=8,\\n            param_grid={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f5d29426490>,\\n                        'max_depth': [10, 12],\\n                        'num_leaves': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f5d29426710>},\\n            sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:03.019747Z\",\"iopub.execute_input\":\"2022-01-01T11:50:03.020416Z\",\"iopub.status.idle\":\"2022-01-01T11:50:03.030730Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:03.020379Z\",\"shell.execute_reply\":\"2022-01-01T11:50:03.030065Z\"},\"trusted\":true},\"execution_count\":8,\"outputs\":[{\"execution_count\":8,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              
gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.1669837381562427,\\n              max_delta_step=0, max_depth=10, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_leaves=25, num_parallel_tree=1, predictor='auto',\\n              random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,\\n              subsample=1, tree_method='exact', validate_parameters=1,\\n              verbosity=0),\\n {'learning_rate': 0.1669837381562427, 'num_leaves': 25, 'max_depth': 10},\\n 0.26117)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:03.032146Z\",\"iopub.execute_input\":\"2022-01-01T11:50:03.032612Z\",\"iopub.status.idle\":\"2022-01-01T11:50:03.058721Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:03.032572Z\",\"shell.execute_reply\":\"2022-01-01T11:50:03.058084Z\"},\"trusted\":true},\"execution_count\":9,\"outputs\":[{\"execution_count\":9,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7271524639165458, (1800,))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH HYPEROPT ###\\n\\nmodel = BoostSearch(\\n    regr_xgb, param_grid=param_dist_hyperopt,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, 
verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:03.059800Z\",\"iopub.execute_input\":\"2022-01-01T11:50:03.062204Z\",\"iopub.status.idle\":\"2022-01-01T11:50:32.323625Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:03.062158Z\",\"shell.execute_reply\":\"2022-01-01T11:50:32.322789Z\"},\"trusted\":true},\"execution_count\":10,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('max_depth', 'learning_rate', 'colsample_bytree')\\n\\ntrial: 0001 ### iterations: 00149 ### eval_score: 0.27498\\ntrial: 0002 ### iterations: 00074 ### eval_score: 0.27186\\ntrial: 0003 ### iterations: 00038 ### eval_score: 0.28326\\ntrial: 0004 ### iterations: 00149 ### eval_score: 0.29455\\ntrial: 0005 ### iterations: 00149 ### eval_score: 0.28037\\ntrial: 0006 ### iterations: 00149 ### eval_score: 0.26421\\ntrial: 0007 ### iterations: 00052 ### eval_score: 0.27191\\ntrial: 0008 ### iterations: 00133 ### eval_score: 0.29251\\n\",\"output_type\":\"stream\"},{\"execution_count\":10,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostSearch(estimator=XGBRegressor(base_score=None, booster=None,\\n                                   colsample_bylevel=None,\\n                                   colsample_bynode=None, colsample_bytree=None,\\n                                   enable_categorical=False, gamma=None,\\n                                   gpu_id=None, importance_type=None,\\n                                   interaction_constraints=None,\\n                                   learning_rate=None, max_delta_step=None,\\n                                   max_depth=None, min_child_weight=None,\\n                                   missing=nan, monotone_constraints=None,\\n                                   n_estim...\\n                                   random_state=0, reg_alpha=None,\\n                                   reg_lambda=None, scale_pos_weight=None,\\n                                   subsample=None, 
tree_method=None,\\n                                   validate_parameters=None, verbosity=0),\\n            n_iter=8,\\n            param_grid={'colsample_bytree': <hyperopt.pyll.base.Apply object at 0x7f5d29426890>,\\n                        'learning_rate': <hyperopt.pyll.base.Apply object at 0x7f5d29426a50>,\\n                        'max_depth': <hyperopt.pyll.base.Apply object at 0x7f5d29426790>},\\n            sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:32.324994Z\",\"iopub.execute_input\":\"2022-01-01T11:50:32.325480Z\",\"iopub.status.idle\":\"2022-01-01T11:50:32.335828Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:32.325441Z\",\"shell.execute_reply\":\"2022-01-01T11:50:32.334970Z\"},\"trusted\":true},\"execution_count\":11,\"outputs\":[{\"execution_count\":11,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=0.7597292534356749,\\n              enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.059836658149176665,\\n              max_delta_step=0, max_depth=16, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n              validate_parameters=1, verbosity=0),\\n {'colsample_bytree': 0.7597292534356749,\\n  'learning_rate': 0.059836658149176665,\\n  'max_depth': 16},\\n 0.264211)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n 
model.predict(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:32.337011Z\",\"iopub.execute_input\":\"2022-01-01T11:50:32.337395Z\",\"iopub.status.idle\":\"2022-01-01T11:50:32.370381Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:32.337369Z\",\"shell.execute_reply\":\"2022-01-01T11:50:32.369816Z\"},\"trusted\":true},\"execution_count\":12,\"outputs\":[{\"execution_count\":12,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7207605727361562, (1800,))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Features Selection\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### BORUTA ###\\n\\nmodel = BoostBoruta(clf_xgb, max_iter=200, perc=100)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:32.371634Z\",\"iopub.execute_input\":\"2022-01-01T11:50:32.372109Z\",\"iopub.status.idle\":\"2022-01-01T11:50:50.797541Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:32.372066Z\",\"shell.execute_reply\":\"2022-01-01T11:50:50.797059Z\"},\"trusted\":true},\"execution_count\":13,\"outputs\":[{\"execution_count\":13,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=XGBClassifier(base_score=None, booster=None,\\n                                    colsample_bylevel=None,\\n                                    colsample_bynode=None,\\n                                    colsample_bytree=None,\\n                                    enable_categorical=False, gamma=None,\\n                                    gpu_id=None, importance_type=None,\\n                                    interaction_constraints=None,\\n                                    learning_rate=None, max_delta_step=None,\\n                                    max_depth=None, min_child_weight=None,\\n                                    missing=nan, 
monotone_constraints=None,\\n                                    n_estimators=150, n_jobs=-1,\\n                                    num_parallel_tree=None, predictor=None,\\n                                    random_state=0, reg_alpha=None,\\n                                    reg_lambda=None, scale_pos_weight=None,\\n                                    subsample=None, tree_method=None,\\n                                    validate_parameters=None, verbosity=0),\\n            max_iter=200)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:50.800394Z\",\"iopub.execute_input\":\"2022-01-01T11:50:50.800795Z\",\"iopub.status.idle\":\"2022-01-01T11:50:50.809566Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:50.800767Z\",\"shell.execute_reply\":\"2022-01-01T11:50:50.808911Z\"},\"trusted\":true},\"execution_count\":14,\"outputs\":[{\"execution_count\":14,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n               colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n               gamma=0, gpu_id=-1, importance_type=None,\\n               interaction_constraints='', learning_rate=0.300000012,\\n               max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,\\n               monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n               num_parallel_tree=1, predictor='auto', random_state=0,\\n               reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\\n               tree_method='exact', validate_parameters=1, verbosity=0),\\n 11)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n 
model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:50.810633Z\",\"iopub.execute_input\":\"2022-01-01T11:50:50.811078Z\",\"iopub.status.idle\":\"2022-01-01T11:50:50.834426Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:50.811040Z\",\"shell.execute_reply\":\"2022-01-01T11:50:50.833776Z\"},\"trusted\":true},\"execution_count\":15,\"outputs\":[{\"execution_count\":15,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9161111111111111, (1800,), (1800, 11), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ELIMINATION (RFE) ###\\n\\nmodel = BoostRFE(regr_xgb, min_features_to_select=1, step=1)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:50.835608Z\",\"iopub.execute_input\":\"2022-01-01T11:50:50.836142Z\",\"iopub.status.idle\":\"2022-01-01T11:50:58.558180Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:50.836100Z\",\"shell.execute_reply\":\"2022-01-01T11:50:58.557365Z\"},\"trusted\":true},\"execution_count\":16,\"outputs\":[{\"execution_count\":16,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                missing=nan, monotone_constraints=None,\\n                                n_estimators=150, n_jobs=-1,\\n                                
num_parallel_tree=None, predictor=None,\\n                                random_state=0, reg_alpha=None, reg_lambda=None,\\n                                scale_pos_weight=None, subsample=None,\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         min_features_to_select=1)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:58.559585Z\",\"iopub.execute_input\":\"2022-01-01T11:50:58.560110Z\",\"iopub.status.idle\":\"2022-01-01T11:50:58.569301Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:58.560048Z\",\"shell.execute_reply\":\"2022-01-01T11:50:58.568542Z\"},\"trusted\":true},\"execution_count\":17,\"outputs\":[{\"execution_count\":17,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.300000012,\\n              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n              validate_parameters=1, verbosity=0),\\n 7)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n 
model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:58.570558Z\",\"iopub.execute_input\":\"2022-01-01T11:50:58.570828Z\",\"iopub.status.idle\":\"2022-01-01T11:50:58.584624Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:58.570792Z\",\"shell.execute_reply\":\"2022-01-01T11:50:58.584081Z\"},\"trusted\":true},\"execution_count\":18,\"outputs\":[{\"execution_count\":18,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7317444492376407, (1800,), (1800, 7))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ADDITION (RFA) ###\\n\\nmodel = BoostRFA(regr_xgb, min_features_to_select=1, step=1)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:50:58.585749Z\",\"iopub.execute_input\":\"2022-01-01T11:50:58.586163Z\",\"iopub.status.idle\":\"2022-01-01T11:51:09.404587Z\",\"shell.execute_reply.started\":\"2022-01-01T11:50:58.586126Z\",\"shell.execute_reply\":\"2022-01-01T11:51:09.403781Z\"},\"trusted\":true},\"execution_count\":19,\"outputs\":[{\"execution_count\":19,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                missing=nan, monotone_constraints=None,\\n                                n_estimators=150, n_jobs=-1,\\n                                num_parallel_tree=None, 
predictor=None,\\n                                random_state=0, reg_alpha=None, reg_lambda=None,\\n                                scale_pos_weight=None, subsample=None,\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         min_features_to_select=1)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:09.406057Z\",\"iopub.execute_input\":\"2022-01-01T11:51:09.406434Z\",\"iopub.status.idle\":\"2022-01-01T11:51:09.416068Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:09.406399Z\",\"shell.execute_reply\":\"2022-01-01T11:51:09.415411Z\"},\"trusted\":true},\"execution_count\":20,\"outputs\":[{\"execution_count\":20,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.300000012,\\n              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n              validate_parameters=1, verbosity=0),\\n 8)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n 
model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:09.417248Z\",\"iopub.execute_input\":\"2022-01-01T11:51:09.417698Z\",\"iopub.status.idle\":\"2022-01-01T11:51:09.450280Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:09.417657Z\",\"shell.execute_reply\":\"2022-01-01T11:51:09.449664Z\"},\"trusted\":true},\"execution_count\":21,\"outputs\":[{\"execution_count\":21,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7274037362877257, (1800,), (1800, 8))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Features Selection with SHAP\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### BORUTA SHAP ###\\n\\nmodel = BoostBoruta(\\n    clf_xgb, max_iter=200, perc=100,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:09.451169Z\",\"iopub.execute_input\":\"2022-01-01T11:51:09.451507Z\",\"iopub.status.idle\":\"2022-01-01T11:51:33.925757Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:09.451482Z\",\"shell.execute_reply\":\"2022-01-01T11:51:33.925076Z\"},\"trusted\":true},\"execution_count\":22,\"outputs\":[{\"execution_count\":22,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=XGBClassifier(base_score=None, booster=None,\\n                                    colsample_bylevel=None,\\n                                    colsample_bynode=None,\\n                                    colsample_bytree=None,\\n                                    enable_categorical=False, gamma=None,\\n                                    gpu_id=None, importance_type=None,\\n                                    interaction_constraints=None,\\n                                    learning_rate=None, max_delta_step=None,\\n                                    
max_depth=None, min_child_weight=None,\\n                                    missing=nan, monotone_constraints=None,\\n                                    n_estimators=150, n_jobs=-1,\\n                                    num_parallel_tree=None, predictor=None,\\n                                    random_state=0, reg_alpha=None,\\n                                    reg_lambda=None, scale_pos_weight=None,\\n                                    subsample=None, tree_method=None,\\n                                    validate_parameters=None, verbosity=0),\\n            importance_type='shap_importances', max_iter=200,\\n            train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:33.926762Z\",\"iopub.execute_input\":\"2022-01-01T11:51:33.926940Z\",\"iopub.status.idle\":\"2022-01-01T11:51:33.934907Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:33.926918Z\",\"shell.execute_reply\":\"2022-01-01T11:51:33.934315Z\"},\"trusted\":true},\"execution_count\":23,\"outputs\":[{\"execution_count\":23,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n               colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n               gamma=0, gpu_id=-1, importance_type=None,\\n               interaction_constraints='', learning_rate=0.300000012,\\n               max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,\\n               monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n               num_parallel_tree=1, predictor='auto', random_state=0,\\n               reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\\n               tree_method='exact', validate_parameters=1, verbosity=0),\\n 10)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n 
model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:33.935950Z\",\"iopub.execute_input\":\"2022-01-01T11:51:33.936419Z\",\"iopub.status.idle\":\"2022-01-01T11:51:33.961319Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:33.936381Z\",\"shell.execute_reply\":\"2022-01-01T11:51:33.960533Z\"},\"trusted\":true},\"execution_count\":24,\"outputs\":[{\"execution_count\":24,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.91, (1800,), (1800, 10), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ELIMINATION (RFE) SHAP ###\\n\\nmodel = BoostRFE(\\n    regr_xgb, min_features_to_select=1, step=1,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:33.962369Z\",\"iopub.execute_input\":\"2022-01-01T11:51:33.962555Z\",\"iopub.status.idle\":\"2022-01-01T11:51:47.059712Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:33.962532Z\",\"shell.execute_reply\":\"2022-01-01T11:51:47.058892Z\"},\"trusted\":true},\"execution_count\":25,\"outputs\":[{\"execution_count\":25,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                
missing=nan, monotone_constraints=None,\\n                                n_estimators=150, n_jobs=-1,\\n                                num_parallel_tree=None, predictor=None,\\n                                random_state=0, reg_alpha=None, reg_lambda=None,\\n                                scale_pos_weight=None, subsample=None,\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         importance_type='shap_importances', min_features_to_select=1,\\n         train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:47.060847Z\",\"iopub.execute_input\":\"2022-01-01T11:51:47.061090Z\",\"iopub.status.idle\":\"2022-01-01T11:51:47.069229Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:47.061061Z\",\"shell.execute_reply\":\"2022-01-01T11:51:47.068462Z\"},\"trusted\":true},\"execution_count\":26,\"outputs\":[{\"execution_count\":26,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.300000012,\\n              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n              validate_parameters=1, verbosity=0),\\n 7)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n 
model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:47.070353Z\",\"iopub.execute_input\":\"2022-01-01T11:51:47.071217Z\",\"iopub.status.idle\":\"2022-01-01T11:51:47.087333Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:47.071168Z\",\"shell.execute_reply\":\"2022-01-01T11:51:47.086754Z\"},\"trusted\":true},\"execution_count\":27,\"outputs\":[{\"execution_count\":27,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7317444492376407, (1800,), (1800, 7))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### RECURSIVE FEATURE ADDITION (RFA) SHAP ###\\n\\nmodel = BoostRFA(\\n    regr_xgb, min_features_to_select=1, step=1,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:47.088455Z\",\"iopub.execute_input\":\"2022-01-01T11:51:47.088921Z\",\"iopub.status.idle\":\"2022-01-01T11:51:59.186202Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:47.088885Z\",\"shell.execute_reply\":\"2022-01-01T11:51:59.185431Z\"},\"trusted\":true},\"execution_count\":28,\"outputs\":[{\"execution_count\":28,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                missing=nan, monotone_constraints=None,\\n                
                n_estimators=150, n_jobs=-1,\\n                                num_parallel_tree=None, predictor=None,\\n                                random_state=0, reg_alpha=None, reg_lambda=None,\\n                                scale_pos_weight=None, subsample=None,\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         importance_type='shap_importances', min_features_to_select=1,\\n         train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:59.187276Z\",\"iopub.execute_input\":\"2022-01-01T11:51:59.188081Z\",\"iopub.status.idle\":\"2022-01-01T11:51:59.199276Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:59.188004Z\",\"shell.execute_reply\":\"2022-01-01T11:51:59.198325Z\"},\"trusted\":true},\"execution_count\":29,\"outputs\":[{\"execution_count\":29,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.300000012,\\n              max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n              validate_parameters=1, verbosity=0),\\n 9)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n 
model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:59.200366Z\",\"iopub.execute_input\":\"2022-01-01T11:51:59.200640Z\",\"iopub.status.idle\":\"2022-01-01T11:51:59.222774Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:59.200592Z\",\"shell.execute_reply\":\"2022-01-01T11:51:59.222078Z\"},\"trusted\":true},\"execution_count\":30,\"outputs\":[{\"execution_count\":30,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7249664284333042, (1800,), (1800, 9))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Hyperparameters Tuning + Features Selection\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH GRID-SEARCH + BORUTA ###\\n\\nmodel = BoostBoruta(clf_xgb, param_grid=param_grid, max_iter=200, perc=100)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T11:51:59.224176Z\",\"iopub.execute_input\":\"2022-01-01T11:51:59.224707Z\",\"iopub.status.idle\":\"2022-01-01T12:14:09.045290Z\",\"shell.execute_reply.started\":\"2022-01-01T11:51:59.224667Z\",\"shell.execute_reply\":\"2022-01-01T12:14:09.044649Z\"},\"trusted\":true},\"execution_count\":31,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00026 ### eval_score: 0.20001\\ntrial: 0002 ### iterations: 00022 ### eval_score: 0.20348\\ntrial: 0003 ### iterations: 00026 ### eval_score: 0.20001\\ntrial: 0004 ### iterations: 00022 ### eval_score: 0.20348\\ntrial: 0005 ### iterations: 00048 ### eval_score: 0.19925\\ntrial: 0006 ### iterations: 00052 ### eval_score: 0.20307\\ntrial: 0007 ### iterations: 00048 ### eval_score: 0.19925\\ntrial: 0008 ### iterations: 00052 ### eval_score: 
0.20307\\n\",\"output_type\":\"stream\"},{\"execution_count\":31,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=XGBClassifier(base_score=None, booster=None,\\n                                    colsample_bylevel=None,\\n                                    colsample_bynode=None,\\n                                    colsample_bytree=None,\\n                                    enable_categorical=False, gamma=None,\\n                                    gpu_id=None, importance_type=None,\\n                                    interaction_constraints=None,\\n                                    learning_rate=None, max_delta_step=None,\\n                                    max_depth=None, min_child_weight=None,\\n                                    missing=nan, monotone_constraints=None,\\n                                    n_estimators=150, n_jobs=-1,\\n                                    num_parallel_tree=None, predictor=None,\\n                                    random_state=0, reg_alpha=None,\\n                                    reg_lambda=None, scale_pos_weight=None,\\n                                    subsample=None, tree_method=None,\\n                                    validate_parameters=None, verbosity=0),\\n            max_iter=200,\\n            param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                        'num_leaves': [25, 35]})\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:14:09.046490Z\",\"iopub.execute_input\":\"2022-01-01T12:14:09.047104Z\",\"iopub.status.idle\":\"2022-01-01T12:14:09.056559Z\",\"shell.execute_reply.started\":\"2022-01-01T12:14:09.047070Z\",\"shell.execute_reply\":\"2022-01-01T12:14:09.056076Z\"},\"trusted\":true},\"execution_count\":32,\"outputs\":[{\"execution_count\":32,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n               colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n               gamma=0, gpu_id=-1, importance_type=None,\\n               interaction_constraints='', learning_rate=0.1, max_delta_step=0,\\n               max_depth=10, min_child_weight=1, missing=nan,\\n               monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n               num_leaves=25, num_parallel_tree=1, predictor='auto',\\n               random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,\\n               subsample=1, tree_method='exact', validate_parameters=1,\\n               verbosity=0),\\n {'learning_rate': 0.1, 'num_leaves': 25, 'max_depth': 10},\\n 0.199248,\\n 11)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:14:09.057462Z\",\"iopub.execute_input\":\"2022-01-01T12:14:09.057740Z\",\"iopub.status.idle\":\"2022-01-01T12:14:09.086612Z\",\"shell.execute_reply.started\":\"2022-01-01T12:14:09.057716Z\",\"shell.execute_reply\":\"2022-01-01T12:14:09.085920Z\"},\"trusted\":true},\"execution_count\":33,\"outputs\":[{\"execution_count\":33,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9144444444444444, (1800,), (1800, 11), (1800, 
2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH RANDOM-SEARCH + RECURSIVE FEATURE ELIMINATION (RFE) ###\\n\\nmodel = BoostRFE(\\n    regr_xgb, param_grid=param_dist, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:14:09.087595Z\",\"iopub.execute_input\":\"2022-01-01T12:14:09.087798Z\",\"iopub.status.idle\":\"2022-01-01T12:16:42.203604Z\",\"shell.execute_reply.started\":\"2022-01-01T12:14:09.087772Z\",\"shell.execute_reply\":\"2022-01-01T12:16:42.202743Z\"},\"trusted\":true},\"execution_count\":34,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00021 ### eval_score: 0.25941\\ntrial: 0002 ### iterations: 00077 ### eval_score: 0.25055\\ntrial: 0003 ### iterations: 00086 ### eval_score: 0.25676\\ntrial: 0004 ### iterations: 00098 ### eval_score: 0.25383\\ntrial: 0005 ### iterations: 00050 ### eval_score: 0.25751\\ntrial: 0006 ### iterations: 00028 ### eval_score: 0.26007\\ntrial: 0007 ### iterations: 00084 ### eval_score: 0.2603\\ntrial: 0008 ### iterations: 00024 ### eval_score: 0.26278\\n\",\"output_type\":\"stream\"},{\"execution_count\":34,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n          
                      missing=nan, monotone_constraints=None,\\n                                n_estimato...\\n                                random_state=0, reg_alpha=None, reg_lambda=None,\\n                                scale_pos_weight=None, subsample=None,\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         min_features_to_select=1, n_iter=8,\\n         param_grid={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f5d29426490>,\\n                     'max_depth': [10, 12],\\n                     'num_leaves': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f5d29426710>},\\n         sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:16:42.205176Z\",\"iopub.execute_input\":\"2022-01-01T12:16:42.205439Z\",\"iopub.status.idle\":\"2022-01-01T12:16:42.215355Z\",\"shell.execute_reply.started\":\"2022-01-01T12:16:42.205404Z\",\"shell.execute_reply\":\"2022-01-01T12:16:42.214732Z\"},\"trusted\":true},\"execution_count\":35,\"outputs\":[{\"execution_count\":35,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.1350674222191923,\\n              max_delta_step=0, max_depth=10, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_leaves=38, num_parallel_tree=1, predictor='auto',\\n              random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,\\n              subsample=1, tree_method='exact', validate_parameters=1,\\n           
   verbosity=0),\\n {'learning_rate': 0.1350674222191923, 'num_leaves': 38, 'max_depth': 10},\\n 0.250552,\\n 10)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:16:42.216398Z\",\"iopub.execute_input\":\"2022-01-01T12:16:42.216642Z\",\"iopub.status.idle\":\"2022-01-01T12:16:42.242381Z\",\"shell.execute_reply.started\":\"2022-01-01T12:16:42.216606Z\",\"shell.execute_reply\":\"2022-01-01T12:16:42.241879Z\"},\"trusted\":true},\"execution_count\":36,\"outputs\":[{\"execution_count\":36,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7488873349293266, (1800,), (1800, 10))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH HYPEROPT + RECURSIVE FEATURE ADDITION (RFA) ###\\n\\nmodel = BoostRFA(\\n    regr_xgb, param_grid=param_dist_hyperopt, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:16:42.245655Z\",\"iopub.execute_input\":\"2022-01-01T12:16:42.247219Z\",\"iopub.status.idle\":\"2022-01-01T12:26:08.685124Z\",\"shell.execute_reply.started\":\"2022-01-01T12:16:42.247188Z\",\"shell.execute_reply\":\"2022-01-01T12:26:08.684364Z\"},\"trusted\":true},\"execution_count\":37,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('max_depth', 'learning_rate', 'colsample_bytree')\\n\\ntrial: 0001 ### iterations: 00149 ### eval_score: 0.26412\\ntrial: 0002 ### iterations: 00080 ### eval_score: 0.25357\\ntrial: 0003 ### iterations: 00054 ### eval_score: 0.26123\\ntrial: 0004 ### iterations: 00149 ### eval_score: 0.2801\\ntrial: 0005 ### iterations: 00149 ### eval_score: 
0.27046\\ntrial: 0006 ### iterations: 00149 ### eval_score: 0.24789\\ntrial: 0007 ### iterations: 00054 ### eval_score: 0.25928\\ntrial: 0008 ### iterations: 00140 ### eval_score: 0.27284\\n\",\"output_type\":\"stream\"},{\"execution_count\":37,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                missing=nan, monotone_constraints=None,\\n                                n_estimato...\\n                                random_state=0, reg_alpha=None, reg_lambda=None,\\n                                scale_pos_weight=None, subsample=None,\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         min_features_to_select=1, n_iter=8,\\n         param_grid={'colsample_bytree': <hyperopt.pyll.base.Apply object at 0x7f5d29426890>,\\n                     'learning_rate': <hyperopt.pyll.base.Apply object at 0x7f5d29426a50>,\\n                     'max_depth': <hyperopt.pyll.base.Apply object at 0x7f5d29426790>},\\n         sampling_seed=0)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:26:08.686184Z\",\"iopub.execute_input\":\"2022-01-01T12:26:08.686931Z\",\"iopub.status.idle\":\"2022-01-01T12:26:08.696854Z\",\"shell.execute_reply.started\":\"2022-01-01T12:26:08.686898Z\",\"shell.execute_reply\":\"2022-01-01T12:26:08.696004Z\"},\"trusted\":true},\"execution_count\":38,\"outputs\":[{\"execution_count\":38,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=0.7597292534356749,\\n              enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.059836658149176665,\\n              max_delta_step=0, max_depth=16, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n              validate_parameters=1, verbosity=0),\\n {'colsample_bytree': 0.7597292534356749,\\n  'learning_rate': 0.059836658149176665,\\n  'max_depth': 16},\\n 0.247887,\\n 8)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:26:08.697934Z\",\"iopub.execute_input\":\"2022-01-01T12:26:08.698155Z\",\"iopub.status.idle\":\"2022-01-01T12:26:08.736781Z\",\"shell.execute_reply.started\":\"2022-01-01T12:26:08.698128Z\",\"shell.execute_reply\":\"2022-01-01T12:26:08.736145Z\"},\"trusted\":true},\"execution_count\":39,\"outputs\":[{\"execution_count\":39,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7542006308661441, (1800,), (1800, 
8))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# Hyperparameters Tuning + Features Selection with SHAP\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH GRID-SEARCH + BORUTA SHAP ###\\n\\nmodel = BoostBoruta(\\n    clf_xgb, param_grid=param_grid, max_iter=200, perc=100,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_clf_train, y_clf_train, eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"scrolled\":true,\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:26:08.740222Z\",\"iopub.execute_input\":\"2022-01-01T12:26:08.741848Z\",\"iopub.status.idle\":\"2022-01-01T12:56:13.612807Z\",\"shell.execute_reply.started\":\"2022-01-01T12:26:08.741813Z\",\"shell.execute_reply\":\"2022-01-01T12:56:13.611991Z\"},\"trusted\":true},\"execution_count\":40,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00024 ### eval_score: 0.20151\\ntrial: 0002 ### iterations: 00020 ### eval_score: 0.20877\\ntrial: 0003 ### iterations: 00024 ### eval_score: 0.20151\\ntrial: 0004 ### iterations: 00020 ### eval_score: 0.20877\\ntrial: 0005 ### iterations: 00048 ### eval_score: 0.20401\\ntrial: 0006 ### iterations: 00048 ### eval_score: 0.20575\\ntrial: 0007 ### iterations: 00048 ### eval_score: 0.20401\\ntrial: 0008 ### iterations: 00048 ### eval_score: 0.20575\\n\",\"output_type\":\"stream\"},{\"execution_count\":40,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostBoruta(estimator=XGBClassifier(base_score=None, booster=None,\\n                                    colsample_bylevel=None,\\n                                    colsample_bynode=None,\\n                                    colsample_bytree=None,\\n                                    enable_categorical=False, gamma=None,\\n                                    gpu_id=None, 
importance_type=None,\\n                                    interaction_constraints=None,\\n                                    learning_rate=None, max_delta_step=None,\\n                                    max_depth=None, min_child_weight=None,\\n                                    missing=nan, monotone_constraints=None,\\n                                    n_estimators=150, n_jobs=-1,\\n                                    num_parallel_tree=None, predictor=None,\\n                                    random_state=0, reg_alpha=None,\\n                                    reg_lambda=None, scale_pos_weight=None,\\n                                    subsample=None, tree_method=None,\\n                                    validate_parameters=None, verbosity=0),\\n            importance_type='shap_importances', max_iter=200,\\n            param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                        'num_leaves': [25, 35]},\\n            train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:56:13.617168Z\",\"iopub.execute_input\":\"2022-01-01T12:56:13.617372Z\",\"iopub.status.idle\":\"2022-01-01T12:56:13.626563Z\",\"shell.execute_reply.started\":\"2022-01-01T12:56:13.617349Z\",\"shell.execute_reply\":\"2022-01-01T12:56:13.626036Z\"},\"trusted\":true},\"execution_count\":41,\"outputs\":[{\"execution_count\":41,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n               colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n               gamma=0, gpu_id=-1, importance_type=None,\\n               interaction_constraints='', learning_rate=0.2, max_delta_step=0,\\n               max_depth=10, min_child_weight=1, missing=nan,\\n               monotone_constraints='()', 
n_estimators=150, n_jobs=-1,\\n               num_leaves=25, num_parallel_tree=1, predictor='auto',\\n               random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,\\n               subsample=1, tree_method='exact', validate_parameters=1,\\n               verbosity=0),\\n {'learning_rate': 0.2, 'num_leaves': 25, 'max_depth': 10},\\n 0.201509,\\n 10)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_clf_valid, y_clf_valid), \\n model.predict(X_clf_valid).shape, \\n model.transform(X_clf_valid).shape,\\n model.predict_proba(X_clf_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:56:13.627454Z\",\"iopub.execute_input\":\"2022-01-01T12:56:13.627825Z\",\"iopub.status.idle\":\"2022-01-01T12:56:13.665907Z\",\"shell.execute_reply.started\":\"2022-01-01T12:56:13.627797Z\",\"shell.execute_reply\":\"2022-01-01T12:56:13.664686Z\"},\"trusted\":true},\"execution_count\":42,\"outputs\":[{\"execution_count\":42,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.9144444444444444, (1800,), (1800, 10), (1800, 2))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH RANDOM-SEARCH + RECURSIVE FEATURE ELIMINATION (RFE) SHAP ###\\n\\nmodel = BoostRFE(\\n    regr_xgb, param_grid=param_dist, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(X_regr_train, y_regr_train, eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T12:56:13.667149Z\",\"iopub.execute_input\":\"2022-01-01T12:56:13.667539Z\",\"iopub.status.idle\":\"2022-01-01T13:08:38.854835Z\",\"shell.execute_reply.started\":\"2022-01-01T12:56:13.667509Z\",\"shell.execute_reply\":\"2022-01-01T13:08:38.854142Z\"},\"trusted\":true},\"execution_count\":43,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for 
('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00021 ### eval_score: 0.25941\\ntrial: 0002 ### iterations: 00064 ### eval_score: 0.25075\\ntrial: 0003 ### iterations: 00075 ### eval_score: 0.25493\\ntrial: 0004 ### iterations: 00084 ### eval_score: 0.25002\\ntrial: 0005 ### iterations: 00093 ### eval_score: 0.25609\\ntrial: 0006 ### iterations: 00039 ### eval_score: 0.2573\\ntrial: 0007 ### iterations: 00074 ### eval_score: 0.25348\\ntrial: 0008 ### iterations: 00032 ### eval_score: 0.2583\\n\",\"output_type\":\"stream\"},{\"execution_count\":43,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                missing=nan, monotone_constraints=None,\\n                                n_estimato...\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         importance_type='shap_importances', min_features_to_select=1, n_iter=8,\\n         param_grid={'learning_rate': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f5d29426490>,\\n                     'max_depth': [10, 12],\\n                     'num_leaves': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f5d29426710>},\\n         sampling_seed=0, train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, 
model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:08:38.855807Z\",\"iopub.execute_input\":\"2022-01-01T13:08:38.856007Z\",\"iopub.status.idle\":\"2022-01-01T13:08:38.866421Z\",\"shell.execute_reply.started\":\"2022-01-01T13:08:38.855982Z\",\"shell.execute_reply\":\"2022-01-01T13:08:38.865771Z\"},\"trusted\":true},\"execution_count\":44,\"outputs\":[{\"execution_count\":44,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,\\n              gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.1669837381562427,\\n              max_delta_step=0, max_depth=10, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_leaves=25, num_parallel_tree=1, predictor='auto',\\n              random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,\\n              subsample=1, tree_method='exact', validate_parameters=1,\\n              verbosity=0),\\n {'learning_rate': 0.1669837381562427, 'num_leaves': 25, 'max_depth': 10},\\n 0.250021,\\n 11)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:08:38.867249Z\",\"iopub.execute_input\":\"2022-01-01T13:08:38.867888Z\",\"iopub.status.idle\":\"2022-01-01T13:08:38.887178Z\",\"shell.execute_reply.started\":\"2022-01-01T13:08:38.867860Z\",\"shell.execute_reply\":\"2022-01-01T13:08:38.886666Z\"},\"trusted\":true},\"execution_count\":45,\"outputs\":[{\"execution_count\":45,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7499501426259738, (1800,), (1800, 
11))\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"### HYPERPARAM TUNING WITH HYPEROPT + RECURSIVE FEATURE ADDITION (RFA) SHAP ###\\n\\nmodel = BoostRFA(\\n    regr_xgb, param_grid=param_dist_hyperopt, min_features_to_select=1, step=1,\\n    n_iter=8, sampling_seed=0,\\n    importance_type='shap_importances', train_importance=False\\n)\\nmodel.fit(\\n    X_regr_train, y_regr_train, trials=Trials(), \\n    eval_set=[(X_regr_valid, y_regr_valid)], early_stopping_rounds=6, verbose=0\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:08:38.890197Z\",\"iopub.execute_input\":\"2022-01-01T13:08:38.891876Z\",\"iopub.status.idle\":\"2022-01-01T13:41:32.886109Z\",\"shell.execute_reply.started\":\"2022-01-01T13:08:38.891845Z\",\"shell.execute_reply\":\"2022-01-01T13:41:32.885257Z\"},\"trusted\":true},\"execution_count\":46,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('max_depth', 'learning_rate', 'colsample_bytree')\\n\\ntrial: 0001 ### iterations: 00149 ### eval_score: 0.25811\\ntrial: 0002 ### iterations: 00078 ### eval_score: 0.25554\\ntrial: 0003 ### iterations: 00059 ### eval_score: 0.26658\\ntrial: 0004 ### iterations: 00149 ### eval_score: 0.27356\\ntrial: 0005 ### iterations: 00149 ### eval_score: 0.26426\\ntrial: 0006 ### iterations: 00149 ### eval_score: 0.25537\\ntrial: 0007 ### iterations: 00052 ### eval_score: 0.26107\\ntrial: 0008 ### iterations: 00137 ### eval_score: 0.27787\\n\",\"output_type\":\"stream\"},{\"execution_count\":46,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFA(estimator=XGBRegressor(base_score=None, booster=None,\\n                                colsample_bylevel=None, colsample_bynode=None,\\n                                colsample_bytree=None, enable_categorical=False,\\n                                gamma=None, gpu_id=None, importance_type=None,\\n                                interaction_constraints=None,\\n                                
learning_rate=None, max_delta_step=None,\\n                                max_depth=None, min_child_weight=None,\\n                                missing=nan, monotone_constraints=None,\\n                                n_estimato...\\n                                tree_method=None, validate_parameters=None,\\n                                verbosity=0),\\n         importance_type='shap_importances', min_features_to_select=1, n_iter=8,\\n         param_grid={'colsample_bytree': <hyperopt.pyll.base.Apply object at 0x7f5d29426890>,\\n                     'learning_rate': <hyperopt.pyll.base.Apply object at 0x7f5d29426a50>,\\n                     'max_depth': <hyperopt.pyll.base.Apply object at 0x7f5d29426790>},\\n         sampling_seed=0, train_importance=False)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"model.estimator_, model.best_params_, model.best_score_, model.n_features_\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:41:32.887300Z\",\"iopub.execute_input\":\"2022-01-01T13:41:32.887495Z\",\"iopub.status.idle\":\"2022-01-01T13:41:32.897203Z\",\"shell.execute_reply.started\":\"2022-01-01T13:41:32.887472Z\",\"shell.execute_reply\":\"2022-01-01T13:41:32.896455Z\"},\"trusted\":true},\"execution_count\":47,\"outputs\":[{\"execution_count\":47,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\\n              colsample_bynode=1, colsample_bytree=0.7597292534356749,\\n              enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None,\\n              interaction_constraints='', learning_rate=0.059836658149176665,\\n              max_delta_step=0, max_depth=16, min_child_weight=1, missing=nan,\\n              monotone_constraints='()', n_estimators=150, n_jobs=-1,\\n              num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,\\n              reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',\\n     
         validate_parameters=1, verbosity=0),\\n {'colsample_bytree': 0.7597292534356749,\\n  'learning_rate': 0.059836658149176665,\\n  'max_depth': 16},\\n 0.255374,\\n 11)\"},\"metadata\":{}}]},{\"cell_type\":\"code\",\"source\":\"(model.score(X_regr_valid, y_regr_valid), \\n model.predict(X_regr_valid).shape, \\n model.transform(X_regr_valid).shape)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:41:32.898201Z\",\"iopub.execute_input\":\"2022-01-01T13:41:32.898493Z\",\"iopub.status.idle\":\"2022-01-01T13:41:32.931801Z\",\"shell.execute_reply.started\":\"2022-01-01T13:41:32.898469Z\",\"shell.execute_reply\":\"2022-01-01T13:41:32.931131Z\"},\"trusted\":true},\"execution_count\":48,\"outputs\":[{\"execution_count\":48,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"(0.7391290836488575, (1800,), (1800, 11))\"},\"metadata\":{}}]},{\"cell_type\":\"markdown\",\"source\":\"# CUSTOM EVAL METRIC SUPPORT\",\"metadata\":{}},{\"cell_type\":\"code\",\"source\":\"from sklearn.metrics import roc_auc_score\\n\\ndef AUC(y_hat, dtrain):\\n    y_true = dtrain.get_label()\\n    return 'auc', roc_auc_score(y_true, y_hat)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:41:32.932773Z\",\"iopub.execute_input\":\"2022-01-01T13:41:32.932979Z\",\"iopub.status.idle\":\"2022-01-01T13:41:32.940277Z\",\"shell.execute_reply.started\":\"2022-01-01T13:41:32.932952Z\",\"shell.execute_reply\":\"2022-01-01T13:41:32.939659Z\"},\"trusted\":true},\"execution_count\":49,\"outputs\":[]},{\"cell_type\":\"code\",\"source\":\"model = BoostRFE(\\n    clf_xgb, \\n    param_grid=param_grid, min_features_to_select=1, step=1,\\n    greater_is_better=True\\n)\\nmodel.fit(\\n    X_clf_train, y_clf_train, \\n    eval_set=[(X_clf_valid, y_clf_valid)], early_stopping_rounds=6, verbose=0,\\n    
eval_metric=AUC\\n)\",\"metadata\":{\"execution\":{\"iopub.status.busy\":\"2022-01-01T13:41:32.943194Z\",\"iopub.execute_input\":\"2022-01-01T13:41:32.944797Z\",\"iopub.status.idle\":\"2022-01-01T13:43:50.574377Z\",\"shell.execute_reply.started\":\"2022-01-01T13:41:32.944765Z\",\"shell.execute_reply\":\"2022-01-01T13:43:50.573628Z\"},\"trusted\":true},\"execution_count\":50,\"outputs\":[{\"name\":\"stdout\",\"text\":\"\\n8 trials detected for ('learning_rate', 'num_leaves', 'max_depth')\\n\\ntrial: 0001 ### iterations: 00017 ### eval_score: 0.9757\\ntrial: 0002 ### iterations: 00026 ### eval_score: 0.97632\\ntrial: 0003 ### iterations: 00017 ### eval_score: 0.9757\\ntrial: 0004 ### iterations: 00026 ### eval_score: 0.97632\\ntrial: 0005 ### iterations: 00033 ### eval_score: 0.97594\\ntrial: 0006 ### iterations: 00034 ### eval_score: 0.97577\\ntrial: 0007 ### iterations: 00033 ### eval_score: 0.97594\\ntrial: 0008 ### iterations: 00034 ### eval_score: 0.97577\\n\",\"output_type\":\"stream\"},{\"execution_count\":50,\"output_type\":\"execute_result\",\"data\":{\"text/plain\":\"BoostRFE(estimator=XGBClassifier(base_score=None, booster=None,\\n                                 colsample_bylevel=None, colsample_bynode=None,\\n                                 colsample_bytree=None,\\n                                 enable_categorical=False, gamma=None,\\n                                 gpu_id=None, importance_type=None,\\n                                 interaction_constraints=None,\\n                                 learning_rate=None, max_delta_step=None,\\n                                 max_depth=None, min_child_weight=None,\\n                                 missing=nan, monotone_constraints=None,\\n                                 n_estimators=150, n_jobs=-1,\\n                                 num_parallel_tree=None, predictor=None,\\n                                 random_state=0, reg_alpha=None,\\n                                 reg_lambda=None, 
scale_pos_weight=None,\\n                                 subsample=None, tree_method=None,\\n                                 validate_parameters=None, verbosity=0),\\n         greater_is_better=True, min_features_to_select=1,\\n         param_grid={'learning_rate': [0.2, 0.1], 'max_depth': [10, 12],\\n                     'num_leaves': [25, 35]})\"},\"metadata\":{}}]}]}"
  },
  {
    "path": "requirements.txt",
    "content": "numpy\nscipy\nscikit-learn>=0.24.1\nshap>=0.39.0\nhyperopt==0.2.5"
  },
  {
    "path": "setup.py",
    "content": "import pathlib\nfrom setuptools import setup, find_packages\n\nHERE = pathlib.Path(__file__).parent\n\nVERSION = '0.2.7'\nPACKAGE_NAME = 'shap-hypetune'\nAUTHOR = 'Marco Cerliani'\nAUTHOR_EMAIL = 'cerlymarco@gmail.com'\nURL = 'https://github.com/cerlymarco/shap-hypetune'\n\nLICENSE = 'MIT'\nDESCRIPTION = 'A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models.'\nLONG_DESCRIPTION = (HERE / \"README.md\").read_text()\nLONG_DESC_TYPE = \"text/markdown\"\n\nINSTALL_REQUIRES = [\n    'numpy',\n    'scipy',\n    'scikit-learn>=0.24.1',\n    'shap>=0.39.0',\n    'hyperopt==0.2.5'\n]\n\nsetup(name=PACKAGE_NAME,\n      version=VERSION,\n      description=DESCRIPTION,\n      long_description=LONG_DESCRIPTION,\n      long_description_content_type=LONG_DESC_TYPE,\n      author=AUTHOR,\n      license=LICENSE,\n      author_email=AUTHOR_EMAIL,\n      url=URL,\n      install_requires=INSTALL_REQUIRES,\n      python_requires='>=3',\n      packages=find_packages()\n      )"
  },
  {
    "path": "shaphypetune/__init__.py",
    "content": "from .utils import *\nfrom ._classes import *\nfrom .shaphypetune import *"
  },
  {
    "path": "shaphypetune/_classes.py",
    "content": "import io\nimport contextlib\nimport warnings\nimport numpy as np\nimport scipy as sp\nfrom copy import deepcopy\n\nfrom sklearn.base import clone\nfrom sklearn.utils.validation import check_is_fitted\nfrom sklearn.base import BaseEstimator, TransformerMixin\n\nfrom joblib import Parallel, delayed\nfrom hyperopt import fmin, tpe\n\nfrom .utils import ParameterSampler, _check_param, _check_boosting\nfrom .utils import _set_categorical_indexes, _get_categorical_support\nfrom .utils import _feature_importances, _shap_importances\n\n\nclass _BoostSearch(BaseEstimator):\n    \"\"\"Base class for BoostSearch meta-estimator.\n\n    Warning: This class should not be used directly. Use derived classes\n    instead.\n    \"\"\"\n\n    def __init__(self):\n        pass\n\n    def _validate_param_grid(self, fit_params):\n        \"\"\"Private method to validate fitting parameters.\"\"\"\n\n        if not isinstance(self.param_grid, dict):\n            raise ValueError(\"Pass param_grid in dict format.\")\n        self._param_grid = self.param_grid.copy()\n\n        for p_k, p_v in self._param_grid.items():\n            self._param_grid[p_k] = _check_param(p_v)\n\n        if 'eval_set' not in fit_params:\n            raise ValueError(\n                \"When tuning parameters, at least \"\n                \"a evaluation set is required.\")\n\n        self._eval_score = np.argmax if self.greater_is_better else np.argmin\n        self._score_sign = -1 if self.greater_is_better else 1\n\n        rs = ParameterSampler(\n            n_iter=self.n_iter,\n            param_distributions=self._param_grid,\n            random_state=self.sampling_seed\n        )\n        self._param_combi, self._tuning_type = rs.sample()\n        self._trial_id = 1\n\n        if self.verbose > 0:\n            n_trials = self.n_iter if self._tuning_type is 'hyperopt' \\\n                else len(self._param_combi)\n            print(\"\\n{} trials detected for {}\\n\".format(\n             
   n_trials, tuple(self.param_grid.keys())))\n\n    def _fit(self, X, y, fit_params, params=None):\n        \"\"\"Private method to fit a single boosting model and extract results.\"\"\"\n\n        model = self._build_model(params)\n        if isinstance(model, _BoostSelector):\n            model.fit(X=X, y=y, **fit_params)\n        else:\n            with contextlib.redirect_stdout(io.StringIO()):\n                model.fit(X=X, y=y, **fit_params)\n\n        results = {'params': params, 'status': 'ok'}\n\n        if isinstance(model, _BoostSelector):\n            results['booster'] = model.estimator_\n            results['model'] = model\n        else:\n            results['booster'] = model\n            results['model'] = None\n\n        if 'eval_set' not in fit_params:\n            return results\n\n        if self.boost_type_ == 'XGB':\n            # w/ eval_set and w/ early_stopping_rounds\n            if hasattr(results['booster'], 'best_score'):\n                results['iterations'] = results['booster'].best_iteration\n            # w/ eval_set and w/o early_stopping_rounds\n            else:\n                valid_id = list(results['booster'].evals_result_.keys())[-1]\n                eval_metric = list(results['booster'].evals_result_[valid_id])[-1]\n                results['iterations'] = \\\n                    len(results['booster'].evals_result_[valid_id][eval_metric])\n        else:\n            # w/ eval_set and w/ early_stopping_rounds\n            if results['booster'].best_iteration_ is not None:\n                results['iterations'] = results['booster'].best_iteration_\n            # w/ eval_set and w/o early_stopping_rounds\n            else:\n                valid_id = list(results['booster'].evals_result_.keys())[-1]\n                eval_metric = list(results['booster'].evals_result_[valid_id])[-1]\n                results['iterations'] = \\\n                    len(results['booster'].evals_result_[valid_id][eval_metric])\n\n        if 
self.boost_type_ == 'XGB':\n            # w/ eval_set and w/ early_stopping_rounds\n            if hasattr(results['booster'], 'best_score'):\n                results['loss'] = results['booster'].best_score\n            # w/ eval_set and w/o early_stopping_rounds\n            else:\n                valid_id = list(results['booster'].evals_result_.keys())[-1]\n                eval_metric = list(results['booster'].evals_result_[valid_id])[-1]\n                results['loss'] = \\\n                    results['booster'].evals_result_[valid_id][eval_metric][-1]\n        else:\n            valid_id = list(results['booster'].best_score_.keys())[-1]\n            eval_metric = list(results['booster'].best_score_[valid_id])[-1]\n            results['loss'] = results['booster'].best_score_[valid_id][eval_metric]\n\n        if params is not None:\n            if self.verbose > 0:\n                msg = \"trial: {} ### iterations: {} ### eval_score: {}\".format(\n                    str(self._trial_id).zfill(4),\n                    str(results['iterations']).zfill(5),\n                    round(results['loss'], 5)\n                )\n                print(msg)\n\n            self._trial_id += 1\n            results['loss'] *= self._score_sign\n\n        return results\n\n    def fit(self, X, y, trials=None, **fit_params):\n        \"\"\"Fit the provided boosting algorithm while searching the best subset\n        of features (according to the selected strategy) and choosing the best\n        parameters configuration (if provided).\n\n        It takes the same arguments available in the estimator fit.\n\n        Parameters\n        ----------\n        X : array-like of shape (n_samples, n_features)\n            The training input samples.\n\n        y : array-like of shape (n_samples,)\n            Target values.\n\n        trials : hyperopt.Trials() object, default=None\n            A hyperopt trials object, used to store intermediate results for all\n            optimization 
runs. Effective (and required) only when hyperopt\n            parameter searching is performed.\n\n        **fit_params : Additional fitting arguments.\n\n        Returns\n        -------\n        self : object\n        \"\"\"\n\n        self.boost_type_ = _check_boosting(self.estimator)\n\n        if self.param_grid is None:\n            results = self._fit(X, y, fit_params)\n\n            for v in vars(results['model']):\n                if v.endswith(\"_\") and not v.startswith(\"__\"):\n                    setattr(self, str(v), getattr(results['model'], str(v)))\n\n        else:\n            self._validate_param_grid(fit_params)\n\n            if self._tuning_type == 'hyperopt':\n                if trials is None:\n                    raise ValueError(\n                        \"trials must not be None when using hyperopt.\"\n                    )\n\n                search = fmin(\n                    fn=lambda p: self._fit(\n                        params=p, X=X, y=y, fit_params=fit_params\n                    ),\n                    space=self._param_combi, algo=tpe.suggest,\n                    max_evals=self.n_iter, trials=trials,\n                    rstate=np.random.RandomState(self.sampling_seed),\n                    show_progressbar=False, verbose=0\n                )\n                all_results = trials.results\n\n            else:\n                all_results = Parallel(\n                    n_jobs=self.n_jobs, verbose=self.verbose * int(bool(self.n_jobs))\n                )(delayed(self._fit)(X, y, fit_params, params)\n                  for params in self._param_combi)\n\n            # extract results from parallel loops\n            self.trials_, self.iterations_, self.scores_, models = [], [], [], []\n            for job_res in all_results:\n                self.trials_.append(job_res['params'])\n                self.iterations_.append(job_res['iterations'])\n                self.scores_.append(self._score_sign * job_res['loss'])\n                
if isinstance(job_res['model'], _BoostSelector):\n                    models.append(job_res['model'])\n                else:\n                    models.append(job_res['booster'])\n\n            # get the best\n            id_best = self._eval_score(self.scores_)\n            self.best_params_ = self.trials_[id_best]\n            self.best_iter_ = self.iterations_[id_best]\n            self.best_score_ = self.scores_[id_best]\n            self.estimator_ = models[id_best]\n\n            for v in vars(models[id_best]):\n                if v.endswith(\"_\") and not v.startswith(\"__\"):\n                    setattr(self, str(v), getattr(models[id_best], str(v)))\n\n        return self\n\n    def predict(self, X, **predict_params):\n        \"\"\"Predict X.\n\n        Parameters\n        ----------\n        X : array-like of shape (n_samples, n_features)\n            Samples.\n\n        **predict_params : Additional predict arguments.\n\n        Returns\n        -------\n        pred : ndarray of shape (n_samples,)\n            The predicted values.\n        \"\"\"\n\n        check_is_fitted(self)\n\n        if hasattr(self, 'transform'):\n            X = self.transform(X)\n\n        return self.estimator_.predict(X, **predict_params)\n\n    def predict_proba(self, X, **predict_params):\n        \"\"\"Predict X probabilities.\n\n        Parameters\n        ----------\n        X : array-like of shape (n_samples, n_features)\n            Samples.\n\n        **predict_params : Additional predict arguments.\n\n        Returns\n        -------\n        pred : ndarray of shape (n_samples, n_classes)\n            The predicted values.\n        \"\"\"\n\n        check_is_fitted(self)\n\n        # raise original AttributeError\n        getattr(self.estimator_, 'predict_proba')\n\n        if hasattr(self, 'transform'):\n            X = self.transform(X)\n\n        return self.estimator_.predict_proba(X, **predict_params)\n\n    def score(self, X, y, sample_weight=None):\n       
 \"\"\"Return the score on the given test data and labels.\n\n        Parameters\n        ----------\n        X : array-like of shape (n_samples, n_features)\n            Test samples.\n\n        y : array-like of shape (n_samples,)\n            True values for X.\n\n        sample_weight : array-like of shape (n_samples,), default=None\n            Sample weights.\n\n        Returns\n        -------\n        score : float\n            Accuracy for classification, R2 for regression.\n        \"\"\"\n\n        check_is_fitted(self)\n\n        if hasattr(self, 'transform'):\n            X = self.transform(X)\n\n        return self.estimator_.score(X, y, sample_weight=sample_weight)\n\n\nclass _BoostSelector(BaseEstimator, TransformerMixin):\n    \"\"\"Base class for feature selection meta-estimator.\n\n    Warning: This class should not be used directly. Use derived classes\n    instead.\n    \"\"\"\n\n    def __init__(self):\n        pass\n\n    def transform(self, X):\n        \"\"\"Reduces the input X to the features selected by Boruta.\n\n        Parameters\n        ----------\n        X : array-like of shape (n_samples, n_features)\n            Samples.\n\n        Returns\n        -------\n        X : array-like of shape (n_samples, n_features_)\n            The input samples with only the selected features by Boruta.\n        \"\"\"\n\n        check_is_fitted(self)\n\n        shapes = np.shape(X)\n        if len(shapes) != 2:\n            raise ValueError(\"X must be 2D.\")\n\n        if shapes[1] != self.support_.shape[0]:\n            raise ValueError(\n                \"Expected {} features, received {}.\".format(\n                    self.support_.shape[0], shapes[1]))\n\n        if isinstance(X, np.ndarray):\n            return X[:, self.support_]\n        elif hasattr(X, 'loc'):\n            return X.loc[:, self.support_]\n        else:\n            raise ValueError(\"Data type not understood.\")\n\n    def get_support(self, indices=False):\n        
\"\"\"Get a mask, or integer index, of the features selected.\n\n        Parameters\n        ----------\n        indices : bool, default=False\n            If True, the return value will be an array of integers, rather\n            than a boolean mask.\n\n        Returns\n        -------\n        support : array\n            An index that selects the retained features from a feature vector.\n            If `indices` is False, this is a boolean array of shape\n            [# input features], in which an element is True iff its\n            corresponding feature is selected for retention. If `indices` is\n            True, this is an integer array of shape [# output features] whose\n            values are indices into the input feature vector.\n        \"\"\"\n\n        check_is_fitted(self)\n\n        mask = self.support_\n        return mask if not indices else np.where(mask)[0]\n\n\nclass _Boruta(_BoostSelector):\n    \"\"\"Base class for BoostBoruta meta-estimator.\n\n    Warning: This class should not be used directly. 
Use derived classes\n    instead.\n\n    Notes\n    -----\n    The code for the Boruta algorithm is inspired and improved from:\n    https://github.com/scikit-learn-contrib/boruta_py\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 perc=100,\n                 alpha=0.05,\n                 max_iter=100,\n                 early_stopping_boruta_rounds=None,\n                 importance_type='feature_importances',\n                 train_importance=True,\n                 verbose=0):\n\n        self.estimator = estimator\n        self.perc = perc\n        self.alpha = alpha\n        self.max_iter = max_iter\n        self.early_stopping_boruta_rounds = early_stopping_boruta_rounds\n        self.importance_type = importance_type\n        self.train_importance = train_importance\n        self.verbose = verbose\n\n    def _create_X(self, X, feat_id_real):\n        \"\"\"Private method to add shadow features to the original ones. \"\"\"\n\n        if isinstance(X, np.ndarray):\n            X_real = X[:, feat_id_real].copy()\n            X_sha = X_real.copy()\n            X_sha = np.apply_along_axis(self._random_state.permutation, 0, X_sha)\n\n            X = np.hstack((X_real, X_sha))\n\n        elif hasattr(X, 'iloc'):\n            X_real = X.iloc[:, feat_id_real].copy()\n            X_sha = X_real.copy()\n            X_sha = X_sha.apply(self._random_state.permutation)\n            X_sha = X_sha.astype(X_real.dtypes)\n\n            X = X_real.join(X_sha, rsuffix='_SHA')\n\n        else:\n            raise ValueError(\"Data type not understood.\")\n\n        return X\n\n    def _check_fit_params(self, fit_params, feat_id_real=None):\n        \"\"\"Private method to validate and check fit_params.\"\"\"\n\n        _fit_params = deepcopy(fit_params)\n        estimator = clone(self.estimator)\n        # add here possible estimator checks in each iteration\n\n        _fit_params = _set_categorical_indexes(\n            self.support_, 
self._cat_support, _fit_params, duplicate=True)\n\n        if feat_id_real is None:  # final model fit\n            if 'eval_set' in _fit_params:\n                _fit_params['eval_set'] = list(map(lambda x: (\n                    self.transform(x[0]), x[1]\n                ), _fit_params['eval_set']))\n        else:\n            if 'eval_set' in _fit_params:  # iterative model fit\n                _fit_params['eval_set'] = list(map(lambda x: (\n                    self._create_X(x[0], feat_id_real), x[1]\n                ), _fit_params['eval_set']))\n\n        if 'feature_name' in _fit_params:  # LGB\n            _fit_params['feature_name'] = 'auto'\n\n        if 'feature_weights' in _fit_params:  # XGB  import warnings\n            warnings.warn(\n                \"feature_weights is not supported when selecting features. \"\n                \"It's automatically set to None.\")\n            _fit_params['feature_weights'] = None\n\n        return _fit_params, estimator\n\n    def _do_tests(self, dec_reg, hit_reg, iter_id):\n        \"\"\"Private method to operate Bonferroni corrections on the feature\n        selections.\"\"\"\n\n        active_features = np.where(dec_reg >= 0)[0]\n        hits = hit_reg[active_features]\n        # get uncorrected p values based on hit_reg\n        to_accept_ps = sp.stats.binom.sf(hits - 1, iter_id, .5).flatten()\n        to_reject_ps = sp.stats.binom.cdf(hits, iter_id, .5).flatten()\n\n        # Bonferroni correction with the total n_features in each iteration\n        to_accept = to_accept_ps <= self.alpha / float(len(dec_reg))\n        to_reject = to_reject_ps <= self.alpha / float(len(dec_reg))\n\n        # find features which are 0 and have been rejected or accepted\n        to_accept = np.where((dec_reg[active_features] == 0) * to_accept)[0]\n        to_reject = np.where((dec_reg[active_features] == 0) * to_reject)[0]\n\n        # updating dec_reg\n        dec_reg[active_features[to_accept]] = 1\n        
dec_reg[active_features[to_reject]] = -1\n\n        return dec_reg\n\n    def fit(self, X, y, **fit_params):\n        \"\"\"Fit the Boruta algorithm to automatically tune\n        the number of selected features.\"\"\"\n\n        self.boost_type_ = _check_boosting(self.estimator)\n\n        if self.max_iter < 1:\n            raise ValueError('max_iter should be an integer >0.')\n\n        if self.perc <= 0 or self.perc > 100:\n            raise ValueError('The percentile should be between 0 and 100.')\n\n        if self.alpha <= 0 or self.alpha > 1:\n            raise ValueError('alpha should be between 0 and 1.')\n\n        if self.early_stopping_boruta_rounds is None:\n            es_boruta_rounds = self.max_iter\n        else:\n            if self.early_stopping_boruta_rounds < 1:\n                raise ValueError(\n                    'early_stopping_boruta_rounds should be an integer >0.')\n            es_boruta_rounds = self.early_stopping_boruta_rounds\n\n        importances = ['feature_importances', 'shap_importances']\n        if self.importance_type not in importances:\n            raise ValueError(\n                \"importance_type must be one of {}. 
Got '{}'\".format(\n                    importances, self.importance_type))\n\n        if self.importance_type == 'shap_importances':\n            if not self.train_importance and 'eval_set' not in fit_params:\n                raise ValueError(\n                    \"When train_importance is set to False, using \"\n                    \"shap_importances, pass at least an eval_set.\")\n            eval_importance = not self.train_importance and 'eval_set' in fit_params\n\n        shapes = np.shape(X)\n        if len(shapes) != 2:\n            raise ValueError(\"X must be 2D.\")\n        n_features = shapes[1]\n\n        # create mask for user-defined categorical features\n        self._cat_support = _get_categorical_support(n_features, fit_params)\n\n        # holds the decision about each feature:\n        # default (0); accepted (1); rejected (-1)\n        dec_reg = np.zeros(n_features, dtype=int)\n        dec_history = np.zeros((self.max_iter, n_features), dtype=int)\n        # counts how many times a given feature was more important than\n        # the best of the shadow features\n        hit_reg = np.zeros(n_features, dtype=int)\n        # record the history of the iterations\n        imp_history = np.zeros(n_features, dtype=float)\n        sha_max_history = []\n\n        for i in range(self.max_iter):\n            if (dec_reg != 0).all():\n                if self.verbose > 1:\n                    print(\"All Features analyzed. 
Boruta stops!\")\n                break\n\n            if self.verbose > 1:\n                print('Iteration: {} / {}'.format(i + 1, self.max_iter))\n\n            self._random_state = np.random.RandomState(i + 1000)\n\n            # add shadow attributes, shuffle and train estimator\n            self.support_ = dec_reg >= 0\n            feat_id_real = np.where(self.support_)[0]\n            n_real = feat_id_real.shape[0]\n            _fit_params, estimator = self._check_fit_params(fit_params, feat_id_real)\n            estimator.set_params(random_state=i + 1000)\n            _X = self._create_X(X, feat_id_real)\n            with contextlib.redirect_stdout(io.StringIO()):\n                estimator.fit(_X, y, **_fit_params)\n\n            # get coefs\n            if self.importance_type == 'feature_importances':\n                coefs = _feature_importances(estimator)\n            else:\n                if eval_importance:\n                    coefs = _shap_importances(\n                        estimator, _fit_params['eval_set'][-1][0])\n                else:\n                    coefs = _shap_importances(estimator, _X)\n\n            # separate importances of real and shadow features\n            imp_sha = coefs[n_real:]\n            imp_real = np.zeros(n_features) * np.nan\n            imp_real[feat_id_real] = coefs[:n_real]\n\n            # get the threshold of shadow importances used for rejection\n            imp_sha_max = np.percentile(imp_sha, self.perc)\n\n            # record importance history\n            sha_max_history.append(imp_sha_max)\n            imp_history = np.vstack((imp_history, imp_real))\n\n            # register which feature is more imp than the max of shadows\n            hit_reg[np.where(imp_real[~np.isnan(imp_real)] > imp_sha_max)[0]] += 1\n\n            # check if a feature is doing better than expected by chance\n            dec_reg = self._do_tests(dec_reg, hit_reg, i + 1)\n            dec_history[i] = dec_reg\n\n            
es_id = i - es_boruta_rounds\n            if es_id >= 0:\n                if np.equal(dec_history[es_id:(i + 1)], dec_reg).all():\n                    if self.verbose > 0:\n                        print(\"Boruta early stopping at iteration {}\".format(i + 1))\n                    break\n\n        confirmed = np.where(dec_reg == 1)[0]\n        tentative = np.where(dec_reg == 0)[0]\n\n        self.support_ = np.zeros(n_features, dtype=bool)\n        self.ranking_ = np.ones(n_features, dtype=int) * 4\n        self.n_features_ = confirmed.shape[0]\n        self.importance_history_ = imp_history[1:]\n\n        if tentative.shape[0] > 0:\n            tentative_median = np.nanmedian(imp_history[1:, tentative], axis=0)\n            tentative_low = tentative[\n                np.where(tentative_median <= np.median(sha_max_history))[0]]\n            tentative_up = np.setdiff1d(tentative, tentative_low)\n\n            self.ranking_[tentative_low] = 3\n            if tentative_up.shape[0] > 0:\n                self.ranking_[tentative_up] = 2\n\n        if confirmed.shape[0] > 0:\n            self.support_[confirmed] = True\n            self.ranking_[confirmed] = 1\n\n        if (~self.support_).all():\n            raise RuntimeError(\n                \"Boruta didn't select any feature. Try to increase max_iter or \"\n                \"increase (if not None) early_stopping_boruta_rounds or \"\n                \"decrease perc.\")\n\n        _fit_params, self.estimator_ = self._check_fit_params(fit_params)\n        with contextlib.redirect_stdout(io.StringIO()):\n            self.estimator_.fit(self.transform(X), y, **_fit_params)\n\n        return self\n\n\nclass _RFE(_BoostSelector):\n    \"\"\"Base class for BoostRFE meta-estimator.\n\n    Warning: This class should not be used directly. 
Use derived classes\n    instead.\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 min_features_to_select=None,\n                 step=1,\n                 greater_is_better=False,\n                 importance_type='feature_importances',\n                 train_importance=True,\n                 verbose=0):\n\n        self.estimator = estimator\n        self.min_features_to_select = min_features_to_select\n        self.step = step\n        self.greater_is_better = greater_is_better\n        self.importance_type = importance_type\n        self.train_importance = train_importance\n        self.verbose = verbose\n\n    def _check_fit_params(self, fit_params):\n        \"\"\"Private method to validate and check fit_params.\"\"\"\n\n        _fit_params = deepcopy(fit_params)\n        estimator = clone(self.estimator)\n        # add here possible estimator checks in each iteration\n\n        _fit_params = _set_categorical_indexes(\n            self.support_, self._cat_support, _fit_params)\n\n        if 'eval_set' in _fit_params:\n            _fit_params['eval_set'] = list(map(lambda x: (\n                self.transform(x[0]), x[1]\n            ), _fit_params['eval_set']))\n\n        if 'feature_name' in _fit_params:  # LGB\n            _fit_params['feature_name'] = 'auto'\n\n        if 'feature_weights' in _fit_params:  # XGB\n            warnings.warn(\n                \"feature_weights is not supported when selecting features. 
\"\n                \"It's automatically set to None.\")\n            _fit_params['feature_weights'] = None\n\n        return _fit_params, estimator\n\n    def _step_score(self, estimator):\n        \"\"\"Return the score for a fit on eval_set.\"\"\"\n\n        if self.boost_type_ == 'LGB':\n            valid_id = list(estimator.best_score_.keys())[-1]\n            eval_metric = list(estimator.best_score_[valid_id])[-1]\n            score = estimator.best_score_[valid_id][eval_metric]\n        else:\n            # w/ eval_set and w/ early_stopping_rounds\n            if hasattr(estimator, 'best_score'):\n                score = estimator.best_score\n            # w/ eval_set and w/o early_stopping_rounds\n            else:\n                valid_id = list(estimator.evals_result_.keys())[-1]\n                eval_metric = list(estimator.evals_result_[valid_id])[-1]\n                score = estimator.evals_result_[valid_id][eval_metric][-1]\n\n        return score\n\n    def fit(self, X, y, **fit_params):\n        \"\"\"Fit the RFE algorithm to automatically tune\n        the number of selected features.\"\"\"\n\n        self.boost_type_ = _check_boosting(self.estimator)\n\n        importances = ['feature_importances', 'shap_importances']\n        if self.importance_type not in importances:\n            raise ValueError(\n                \"importance_type must be one of {}. 
Got '{}'\".format(\n                    importances, self.importance_type))\n\n        # scoring controls the calculation of self.score_history_\n        # scoring is used automatically when 'eval_set' is in fit_params\n        scoring = 'eval_set' in fit_params\n        if self.importance_type == 'shap_importances':\n            if not self.train_importance and not scoring:\n                raise ValueError(\n                    \"When train_importance is set to False, using \"\n                    \"shap_importances, pass at least an eval_set.\")\n            eval_importance = not self.train_importance and scoring\n\n        shapes = np.shape(X)\n        if len(shapes) != 2:\n            raise ValueError(\"X must be 2D.\")\n        n_features = shapes[1]\n\n        # create mask for user-defined categorical features\n        self._cat_support = _get_categorical_support(n_features, fit_params)\n\n        if self.min_features_to_select is None:\n            if scoring:\n                min_features_to_select = 1\n            else:\n                min_features_to_select = n_features // 2\n        else:\n            min_features_to_select = self.min_features_to_select\n\n        if 0.0 < self.step < 1.0:\n            step = int(max(1, self.step * n_features))\n        else:\n            step = int(self.step)\n        if step <= 0:\n            raise ValueError(\"Step must be >0.\")\n\n        self.support_ = np.ones(n_features, dtype=bool)\n        self.ranking_ = np.ones(n_features, dtype=int)\n        if scoring:\n            self.score_history_ = []\n            eval_score = np.max if self.greater_is_better else np.min\n            best_score = -np.inf if self.greater_is_better else np.inf\n\n        while np.sum(self.support_) > min_features_to_select:\n            # remaining features\n            features = np.arange(n_features)[self.support_]\n            _fit_params, estimator = self._check_fit_params(fit_params)\n\n            if self.verbose > 1:\n          
      print(\"Fitting estimator with {} features\".format(\n                    self.support_.sum()))\n            with contextlib.redirect_stdout(io.StringIO()):\n                estimator.fit(self.transform(X), y, **_fit_params)\n\n            # get coefs\n            if self.importance_type == 'feature_importances':\n                coefs = _feature_importances(estimator)\n            else:\n                if eval_importance:\n                    coefs = _shap_importances(\n                        estimator, _fit_params['eval_set'][-1][0])\n                else:\n                    coefs = _shap_importances(\n                        estimator, self.transform(X))\n            ranks = np.argsort(coefs)\n\n            # eliminate the worst features\n            threshold = min(step, np.sum(self.support_) - min_features_to_select)\n\n            # compute step score on the previous selection iteration\n            # because 'estimator' must use features\n            # that have not been eliminated yet\n            if scoring:\n                score = self._step_score(estimator)\n                self.score_history_.append(score)\n                if best_score != eval_score([score, best_score]):\n                    best_score = score\n                    best_support = self.support_.copy()\n                    best_ranking = self.ranking_.copy()\n                    best_estimator = estimator\n\n            self.support_[features[ranks][:threshold]] = False\n            self.ranking_[np.logical_not(self.support_)] += 1\n\n        # set final attributes\n        _fit_params, self.estimator_ = self._check_fit_params(fit_params)\n        if self.verbose > 1:\n            print(\"Fitting estimator with {} features\".format(self.support_.sum()))\n        with contextlib.redirect_stdout(io.StringIO()):\n            self.estimator_.fit(self.transform(X), y, **_fit_params)\n\n        # compute step score when only min_features_to_select features left\n        if scoring:\n 
           score = self._step_score(self.estimator_)\n            self.score_history_.append(score)\n            if best_score == eval_score([score, best_score]):\n                self.support_ = best_support\n                self.ranking_ = best_ranking\n                self.estimator_ = best_estimator\n        self.n_features_ = self.support_.sum()\n\n        return self\n\n\nclass _RFA(_BoostSelector):\n    \"\"\"Base class for BoostRFA meta-estimator.\n\n    Warning: This class should not be used directly. Use derived classes\n    instead.\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 min_features_to_select=None,\n                 step=1,\n                 greater_is_better=False,\n                 importance_type='feature_importances',\n                 train_importance=True,\n                 verbose=0):\n\n        self.estimator = estimator\n        self.min_features_to_select = min_features_to_select\n        self.step = step\n        self.greater_is_better = greater_is_better\n        self.importance_type = importance_type\n        self.train_importance = train_importance\n        self.verbose = verbose\n\n    def _check_fit_params(self, fit_params, inverse=False):\n        \"\"\"Private method to validate and check fit_params.\"\"\"\n\n        _fit_params = deepcopy(fit_params)\n        estimator = clone(self.estimator)\n        # add here possible estimator checks in each iteration\n\n        _fit_params = _set_categorical_indexes(\n            self.support_, self._cat_support, _fit_params)\n\n        if 'eval_set' in _fit_params:\n            _fit_params['eval_set'] = list(map(lambda x: (\n                self._transform(x[0], inverse), x[1]\n            ), _fit_params['eval_set']))\n\n        if 'feature_name' in _fit_params:  # LGB\n            _fit_params['feature_name'] = 'auto'\n\n        if 'feature_weights' in _fit_params:  # XGB\n            warnings.warn(\n                
\"feature_weights is not supported when selecting features. \"\n                \"It's automatically set to None.\")\n            _fit_params['feature_weights'] = None\n\n        return _fit_params, estimator\n\n    def _step_score(self, estimator):\n        \"\"\"Return the score for a fit on eval_set.\"\"\"\n\n        if self.boost_type_ == 'LGB':\n            valid_id = list(estimator.best_score_.keys())[-1]\n            eval_metric = list(estimator.best_score_[valid_id])[-1]\n            score = estimator.best_score_[valid_id][eval_metric]\n        else:\n            # w/ eval_set and w/ early_stopping_rounds\n            if hasattr(estimator, 'best_score'):\n                score = estimator.best_score\n            # w/ eval_set and w/o early_stopping_rounds\n            else:\n                valid_id = list(estimator.evals_result_.keys())[-1]\n                eval_metric = list(estimator.evals_result_[valid_id])[-1]\n                score = estimator.evals_result_[valid_id][eval_metric][-1]\n\n        return score\n\n    def fit(self, X, y, **fit_params):\n        \"\"\"Fit the RFA algorithm to automatically tune\n        the number of selected features.\"\"\"\n\n        self.boost_type_ = _check_boosting(self.estimator)\n\n        importances = ['feature_importances', 'shap_importances']\n        if self.importance_type not in importances:\n            raise ValueError(\n                \"importance_type must be one of {}. 
Got '{}'\".format(\n                    importances, self.importance_type))\n\n        # scoring controls the calculation of self.score_history_\n        # scoring is used automatically when 'eval_set' is in fit_params\n        scoring = 'eval_set' in fit_params\n        if self.importance_type == 'shap_importances':\n            if not self.train_importance and not scoring:\n                raise ValueError(\n                    \"When train_importance is set to False, using \"\n                    \"shap_importances, pass at least an eval_set.\")\n            eval_importance = not self.train_importance and scoring\n\n        shapes = np.shape(X)\n        if len(shapes) != 2:\n            raise ValueError(\"X must be 2D.\")\n        n_features = shapes[1]\n\n        # create mask for user-defined categorical features\n        self._cat_support = _get_categorical_support(n_features, fit_params)\n\n        if self.min_features_to_select is None:\n            if scoring:\n                min_features_to_select = 1\n            else:\n                min_features_to_select = n_features // 2\n        else:\n            if scoring:\n                min_features_to_select = self.min_features_to_select\n            else:\n                min_features_to_select = n_features - self.min_features_to_select\n\n        if 0.0 < self.step < 1.0:\n            step = int(max(1, self.step * n_features))\n        else:\n            step = int(self.step)\n        if step <= 0:\n            raise ValueError(\"Step must be >0.\")\n\n        self.support_ = np.zeros(n_features, dtype=bool)\n        self._support = np.ones(n_features, dtype=bool)\n        self.ranking_ = np.ones(n_features, dtype=int)\n        self._ranking = np.ones(n_features, dtype=int)\n        if scoring:\n            self.score_history_ = []\n            eval_score = np.max if self.greater_is_better else np.min\n            best_score = -np.inf if self.greater_is_better else np.inf\n\n        while 
np.sum(self._support) > min_features_to_select:\n            # remaining features\n            features = np.arange(n_features)[self._support]\n\n            # scoring the previously added features\n            if scoring and np.sum(self.support_) > 0:\n                _fit_params, estimator = self._check_fit_params(fit_params)\n                with contextlib.redirect_stdout(io.StringIO()):\n                    estimator.fit(self._transform(X, inverse=False), y, **_fit_params)\n                score = self._step_score(estimator)\n                self.score_history_.append(score)\n                if best_score != eval_score([score, best_score]):\n                    best_score = score\n                    best_support = self.support_.copy()\n                    best_ranking = self.ranking_.copy()\n                    best_estimator = estimator\n\n            # evaluate the remaining features\n            _fit_params, _estimator = self._check_fit_params(fit_params, inverse=True)\n            if self.verbose > 1:\n                print(\"Fitting estimator with {} features\".format(self._support.sum()))\n            with contextlib.redirect_stdout(io.StringIO()):\n                _estimator.fit(self._transform(X, inverse=True), y, **_fit_params)\n                if self._support.sum() == n_features:\n                    all_features_estimator = _estimator\n\n            # get coefs\n            if self.importance_type == 'feature_importances':\n                coefs = _feature_importances(_estimator)\n            else:\n                if eval_importance:\n                    coefs = _shap_importances(\n                        _estimator, _fit_params['eval_set'][-1][0])\n                else:\n                    coefs = _shap_importances(\n                        _estimator, self._transform(X, inverse=True))\n            ranks = np.argsort(-coefs)  # the rank is inverted\n\n            # add the best features\n            threshold = min(step, np.sum(self._support) - 
min_features_to_select)\n\n            # remaining features to test\n            self._support[features[ranks][:threshold]] = False\n            self._ranking[np.logical_not(self._support)] += 1\n            # features tested\n            self.support_[features[ranks][:threshold]] = True\n            self.ranking_[np.logical_not(self.support_)] += 1\n\n        # set final attributes\n        _fit_params, self.estimator_ = self._check_fit_params(fit_params)\n        if self.verbose > 1:\n            print(\"Fitting estimator with {} features\".format(self._support.sum()))\n        with contextlib.redirect_stdout(io.StringIO()):\n            self.estimator_.fit(self._transform(X, inverse=False), y, **_fit_params)\n\n        # compute step score when only min_features_to_select features left\n        if scoring:\n            score = self._step_score(self.estimator_)\n            self.score_history_.append(score)\n            if best_score == eval_score([score, best_score]):\n                self.support_ = best_support\n                self.ranking_ = best_ranking\n                self.estimator_ = best_estimator\n\n            if len(set(self.score_history_)) == 1:\n                self.support_ = np.ones(n_features, dtype=bool)\n                self.ranking_ = np.ones(n_features, dtype=int)\n                self.estimator_ = all_features_estimator\n        self.n_features_ = self.support_.sum()\n\n        return self\n\n    def _transform(self, X, inverse=False):\n        \"\"\"Private method to reduce the input X to the features selected.\"\"\"\n\n        shapes = np.shape(X)\n        if len(shapes) != 2:\n            raise ValueError(\"X must be 2D.\")\n\n        if shapes[1] != self.support_.shape[0]:\n            raise ValueError(\n                \"Expected {} features, received {}.\".format(\n                    self.support_.shape[0], shapes[1]))\n\n        if inverse:\n            if isinstance(X, np.ndarray):\n                return X[:, self._support]\n    
        elif hasattr(X, 'loc'):\n                return X.loc[:, self._support]\n            elif sp.sparse.issparse(X):\n                return X[:, self._support]\n            else:\n                raise ValueError(\"Data type not understood.\")\n        else:\n            if isinstance(X, np.ndarray):\n                return X[:, self.support_]\n            elif hasattr(X, 'loc'):\n                return X.loc[:, self.support_]\n            elif sp.sparse.issparse(X):\n                return X[:, self.support_]\n            else:\n                raise ValueError(\"Data type not understood.\")\n\n    def transform(self, X):\n        \"\"\"Reduces the input X to the features selected with RFA.\n\n        Parameters\n        ----------\n        X : array-like of shape (n_samples, n_features)\n            Samples.\n\n        Returns\n        -------\n        X : array-like of shape (n_samples, n_features_)\n            The input samples with only the features selected by RFA.\n        \"\"\"\n\n        check_is_fitted(self)\n\n        return self._transform(X, inverse=False)\n"
  },
  {
    "path": "shaphypetune/shaphypetune.py",
    "content": "from sklearn.base import clone\n\nfrom ._classes import _BoostSearch, _Boruta, _RFA, _RFE\n\n\nclass BoostSearch(_BoostSearch):\n    \"\"\"Hyperparameter searching and optimization on a given validation set\n    for LGBModel or XGBModel. \n\n    Pass a LGBModel or XGBModel, and a dictionary with the parameter boundaries \n    for grid, random or bayesian search. \n    To operate random search pass distributions in the param_grid with rvs \n    method for sampling (such as those from scipy.stats.distributions). \n    To operate bayesian search pass hyperopt distributions.   \n    The specification of n_iter or sampling_seed is effective only with random\n    or hyperopt searches.\n    The best parameter combination is the one which obtains the best score\n    (as returned by eval_metric) on the provided eval_set.\n\n    If all parameters are presented as lists, floats, or integers, grid search \n    is performed. If at least one parameter is given as a distribution (such as \n    those from scipy.stats.distributions), random search is performed,\n    sampling with replacement. Bayesian search is effective only when all the \n    parameters to tune are in the form of hyperopt distributions. \n    It is highly recommended to use continuous distributions for continuous \n    parameters.\n\n    Parameters\n    ----------\n    estimator : object\n        A supervised learning estimator of LGBModel or XGBModel type.\n\n    param_grid : dict\n        Dictionary with parameter names (`str`) as keys and distributions\n        or lists of parameters to try. \n\n    greater_is_better : bool, default=False\n        Whether the quantity to monitor is a score function, \n        meaning high is good, or a loss function, meaning low is good.\n\n    n_iter : int, default=None\n        Effective only for random or hyperopt search.\n        Number of parameter settings that are sampled. 
\n        n_iter trades off runtime vs quality of the solution.\n\n    sampling_seed : int, default=None\n        Effective only for random or hyperopt search.\n        The seed used to sample from the hyperparameter distributions.\n\n    n_jobs : int, default=None\n        Effective only with grid and random search.\n        The number of jobs to run in parallel for model fitting.\n        ``None`` means 1, using one processor. ``-1`` means using all\n        processors.\n\n    verbose : int, default=1\n        Verbosity mode. <=0 silent all; >0 print trial logs with the \n        associated score.\n\n    Attributes\n    ----------\n    estimator_ : estimator\n        Estimator that was chosen by the search, i.e. estimator\n        which gave the best score on the eval_set.\n\n    best_params_ : dict\n        Parameter setting that gave the best results on the eval_set.\n\n    trials_ : list\n        A list of dicts. The dicts are all the parameter combinations tried \n        and derived from the param_grid.\n\n    best_score_ : float\n        The best score achieved among all the possible combinations created.\n\n    scores_ : list\n        The scores achieved on the eval_set by all the models tried.\n\n    best_iter_ : int\n        The boosting iterations achieved by the best parameters combination.\n\n    iterations_ : list\n        The boosting iterations of all the models tried.\n\n    boost_type_ : str\n        The type of the boosting estimator (LGB or XGB).\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 param_grid,\n                 greater_is_better=False,\n                 n_iter=None,\n                 sampling_seed=None,\n                 verbose=1,\n                 n_jobs=None):\n        self.estimator = estimator\n        self.param_grid = param_grid\n        self.greater_is_better = greater_is_better\n        self.n_iter = n_iter\n        self.sampling_seed = sampling_seed\n        self.verbose = verbose\n        
self.n_jobs = n_jobs\n\n    def _build_model(self, params):\n        \"\"\"Private method to build model.\"\"\"\n\n        model = clone(self.estimator)\n        model.set_params(**params)\n\n        return model\n\n\nclass BoostBoruta(_BoostSearch, _Boruta):\n    \"\"\"Simultaneous features selection with Boruta algorithm and hyperparameter\n    searching on a given validation set for LGBModel or XGBModel.\n\n    Pass a LGBModel or XGBModel to compute features selection with Boruta\n    algorithm. The best features are used to train a new gradient boosting\n    instance. When an eval_set is provided, shadow features are built on it\n    as well.\n\n    If param_grid is a dictionary with parameter boundaries, a hyperparameter\n    tuning is computed simultaneously. The parameter combinations are scored on\n    the provided eval_set.\n    To operate random search pass distributions in the param_grid with rvs\n    method for sampling (such as those from scipy.stats.distributions).\n    To operate bayesian search pass hyperopt distributions.\n    The specification of n_iter or sampling_seed is effective only with random\n    or hyperopt searches.\n    The best parameter combination is the one which obtains the best score\n    (as returned by eval_metric) on the provided eval_set.\n\n    If all parameters are presented as lists, floats, or integers, grid search\n    is performed. If at least one parameter is given as a distribution (such as\n    those from scipy.stats.distributions), random search is performed,\n    sampling with replacement. 
Bayesian search is effective only when all the\n    parameters to tune are in the form of hyperopt distributions.\n    It is highly recommended to use continuous distributions for continuous\n    parameters.\n\n    Parameters\n    ----------\n    estimator : object\n        A supervised learning estimator of LGBModel or XGBModel type.\n\n    perc : int, default=100\n        Threshold for comparison between shadow and real features.\n        The lower perc is, the more false positives will be picked as relevant,\n        but also the fewer relevant features will be left out.\n        100 corresponds to the maximum.\n\n    alpha : float, default=0.05\n        Level at which the corrected p-values will get rejected in the\n        correction steps.\n\n    max_iter : int, default=100\n        The maximum number of Boruta iterations to perform.\n\n    early_stopping_boruta_rounds : int, default=None\n        The maximum number of iterations without confirming a tentative\n        feature. Use early stopping to terminate the selection process\n        before reaching `max_iter` iterations if the algorithm cannot\n        confirm a tentative feature after N iterations.\n        None means no early stopping search.\n\n    importance_type : str, default='feature_importances'\n         Which importance measure to use. 
It can be 'feature_importances'\n         (the default feature importance of the gradient boosting estimator)\n         or 'shap_importances'.\n\n    train_importance : bool, default=True\n        Effective only when importance_type='shap_importances'.\n        Where to compute the shap feature importance: on train (True)\n        or on eval_set (False).\n\n    param_grid : dict, default=None\n        Dictionary with parameter names (`str`) as keys and distributions\n        or lists of parameters to try.\n        None means no hyperparameters search.\n\n    greater_is_better : bool, default=False\n        Effective only when hyperparameters searching.\n        Whether the quantity to monitor is a score function,\n        meaning high is good, or a loss function, meaning low is good.\n\n    n_iter : int, default=None\n        Effective only when hyperparameters searching.\n        Effective only for random or hyperopt searches.\n        Number of parameter settings that are sampled.\n        n_iter trades off runtime vs quality of the solution.\n\n    sampling_seed : int, default=None\n        Effective only when hyperparameters searching.\n        Effective only for random or hyperopt search.\n        The seed used to sample from the hyperparameter distributions.\n\n    n_jobs : int, default=None\n        Effective only when hyperparameters searching without hyperopt.\n        The number of jobs to run in parallel for model fitting.\n        ``None`` means 1, using one processor. ``-1`` means using all\n        processors.\n\n    verbose : int, default=1\n        Verbosity mode. 
<=0 silent all; ==1 print trial logs (when\n        hyperparameters searching); >1 print feature selection logs plus\n        trial logs (when hyperparameters searching).\n\n    Attributes\n    ----------\n    estimator_ : estimator\n        The fitted estimator with the selected features and the optimal\n        parameter combination (when hyperparameters searching).\n\n    n_features_ : int\n        The number of selected features (from the best param config\n        when hyperparameters searching).\n\n    ranking_ : ndarray of shape (n_features,)\n        The feature ranking, such that ``ranking_[i]`` corresponds to the\n        ranking position of the i-th feature (from the best param config\n        when hyperparameters searching). Selected features are assigned\n        rank 1 (2: tentative upper bound, 3: tentative lower bound, 4:\n        rejected).\n\n    support_ : ndarray of shape (n_features,)\n        The mask of selected features (from the best param config\n        when hyperparameters searching).\n\n    importance_history_ : ndarray of shape (n_iters, n_features)\n        The importance values for each feature across all iterations.\n\n    best_params_ : dict\n        Available only when hyperparameters searching.\n        Parameter setting that gave the best results on the eval_set.\n\n    trials_ : list\n        Available only when hyperparameters searching.\n        A list of dicts. 
The dicts are all the parameter combinations tried\n        and derived from the param_grid.\n\n    best_score_ : float\n        Available only when hyperparameters searching.\n        The best score achieved among all the possible combinations created.\n\n    scores_ : list\n        Available only when hyperparameters searching.\n        The scores achieved on the eval_set by all the models tried.\n\n    best_iter_ : int\n        Available only when hyperparameters searching.\n        The boosting iterations achieved by the best parameters combination.\n\n    iterations_ : list\n        Available only when hyperparameters searching.\n        The boosting iterations of all the models tried.\n\n    boost_type_ : str\n        The type of the boosting estimator (LGB or XGB).\n\n    Notes\n    -----\n    The code for the Boruta algorithm is inspired and improved from:\n    https://github.com/scikit-learn-contrib/boruta_py\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 perc=100,\n                 alpha=0.05,\n                 max_iter=100,\n                 early_stopping_boruta_rounds=None,\n                 param_grid=None,\n                 greater_is_better=False,\n                 importance_type='feature_importances',\n                 train_importance=True,\n                 n_iter=None,\n                 sampling_seed=None,\n                 verbose=1,\n                 n_jobs=None):\n\n        self.estimator = estimator\n        self.perc = perc\n        self.alpha = alpha\n        self.max_iter = max_iter\n        self.early_stopping_boruta_rounds = early_stopping_boruta_rounds\n        self.param_grid = param_grid\n        self.greater_is_better = greater_is_better\n        self.importance_type = importance_type\n        self.train_importance = train_importance\n        self.n_iter = n_iter\n        self.sampling_seed = sampling_seed\n        self.verbose = verbose\n        self.n_jobs = n_jobs\n\n    def _build_model(self, 
params=None):\n        \"\"\"Private method to build model.\"\"\"\n\n        estimator = clone(self.estimator)\n\n        if params is not None:\n            estimator.set_params(**params)\n\n        model = _Boruta(\n            estimator=estimator,\n            perc=self.perc,\n            alpha=self.alpha,\n            max_iter=self.max_iter,\n            early_stopping_boruta_rounds=self.early_stopping_boruta_rounds,\n            importance_type=self.importance_type,\n            train_importance=self.train_importance,\n            verbose=self.verbose\n        )\n\n        return model\n\n\nclass BoostRFE(_BoostSearch, _RFE):\n    \"\"\"Simultaneous features selection with RFE and hyperparameter searching\n    on a given validation set for LGBModel or XGBModel.\n\n    Pass a LGBModel or XGBModel to compute features selection with RFE.\n    The gradient boosting instance with the best features is selected.\n    When an eval_set is provided, the best gradient boosting and the best\n    features are obtained by evaluating the score with eval_metric.\n    Otherwise, the best combination is obtained looking only at feature\n    importance.\n\n    If param_grid is a dictionary with parameter boundaries, a hyperparameter\n    tuning is computed simultaneously. 
The parameter combinations are scored on\n    the provided eval_set.\n    To operate random search pass distributions in the param_grid with rvs\n    method for sampling (such as those from scipy.stats.distributions).\n    To operate bayesian search pass hyperopt distributions.\n    The specification of n_iter or sampling_seed is effective only with random\n    or hyperopt searches.\n    The best parameter combination is the one which obtains the best score\n    (as returned by eval_metric) on the provided eval_set.\n\n    If all parameters are presented as lists, floats, or integers, grid search\n    is performed. If at least one parameter is given as a distribution (such as\n    those from scipy.stats.distributions), random search is performed,\n    sampling with replacement. Bayesian search is effective only when all the\n    parameters to tune are in the form of hyperopt distributions.\n    It is highly recommended to use continuous distributions for continuous\n    parameters.\n\n    Parameters\n    ----------\n    estimator : object\n        A supervised learning estimator of LGBModel or XGBModel type.\n\n    step : int or float, default=1\n        If greater than or equal to 1, then `step` corresponds to the\n        (integer) number of features to remove at each iteration.\n        If within (0.0, 1.0), then `step` corresponds to the percentage\n        (rounded down) of features to remove at each iteration.\n        Note that the last iteration may remove fewer than `step` features in\n        order to reach `min_features_to_select`.\n\n    min_features_to_select : int, default=None\n        The minimum number of features to be selected. This number of features\n        will always be scored, even if the difference between the original\n        feature count and `min_features_to_select` isn't divisible by\n        `step`. 
The default value for min_features_to_select is set to 1 when an\n        eval_set is provided, otherwise it always corresponds to n_features // 2.\n\n    importance_type : str, default='feature_importances'\n        Which importance measure to use. It can be 'feature_importances'\n        (the default feature importance of the gradient boosting estimator)\n        or 'shap_importances'.\n\n    train_importance : bool, default=True\n        Effective only when importance_type='shap_importances'.\n        Where to compute the shap feature importance: on train (True)\n        or on eval_set (False).\n\n    param_grid : dict, default=None\n        Dictionary with parameter names (`str`) as keys and distributions\n        or lists of parameters to try.\n        None means no hyperparameter search.\n\n    greater_is_better : bool, default=False\n        Effective only when hyperparameters searching.\n        Whether the quantity to monitor is a score function,\n        meaning high is good, or a loss function, meaning low is good.\n\n    n_iter : int, default=None\n        Effective only when hyperparameters searching.\n        Effective only for random or hyperopt search.\n        Number of parameter settings that are sampled.\n        n_iter trades off runtime vs quality of the solution.\n\n    sampling_seed : int, default=None\n        Effective only when hyperparameters searching.\n        Effective only for random or hyperopt search.\n        The seed used to sample from the hyperparameter distributions.\n\n    n_jobs : int, default=None\n        Effective only when hyperparameters searching without hyperopt.\n        The number of jobs to run in parallel for model fitting.\n        ``None`` means 1, using one processor. ``-1`` means using all\n        processors.\n\n    verbose : int, default=1\n        Verbosity mode. 
<=0 silences all output; ==1 prints trial logs (when\n        hyperparameters searching); >1 prints feature selection logs plus\n        trial logs (when hyperparameters searching).\n\n    Attributes\n    ----------\n    estimator_ : estimator\n        The fitted estimator with the selected features and the optimal\n        parameter combination (when hyperparameters searching).\n\n    n_features_ : int\n        The number of selected features (from the best param config\n        when hyperparameters searching).\n\n    ranking_ : ndarray of shape (n_features,)\n        The feature ranking, such that ``ranking_[i]`` corresponds to the\n        ranking position of the i-th feature (from the best param config\n        when hyperparameters searching). Selected features are assigned\n        rank 1.\n\n    support_ : ndarray of shape (n_features,)\n        The mask of selected features (from the best param config\n        when hyperparameters searching).\n\n    score_history_ : list\n        Available only when an eval_set is provided.\n        Scores obtained reducing the features (from the best param config\n        when hyperparameters searching).\n\n    best_params_ : dict\n        Available only when hyperparameters searching.\n        Parameter setting that gave the best results on the eval_set.\n\n    trials_ : list\n        Available only when hyperparameters searching.\n        A list of dicts. 
The dicts are all the parameter combinations tried\n        and derived from the param_grid.\n\n    best_score_ : float\n        Available only when hyperparameters searching.\n        The best score achieved among all the parameter combinations tried.\n\n    scores_ : list\n        Available only when hyperparameters searching.\n        The scores achieved on the eval_set by all the models tried.\n\n    best_iter_ : int\n        Available only when hyperparameters searching.\n        The boosting iterations achieved by the best parameter combination.\n\n    iterations_ : list\n        Available only when hyperparameters searching.\n        The boosting iterations of all the models tried.\n\n    boost_type_ : str\n        The type of the boosting estimator (LGB or XGB).\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 min_features_to_select=None,\n                 step=1,\n                 param_grid=None,\n                 greater_is_better=False,\n                 importance_type='feature_importances',\n                 train_importance=True,\n                 n_iter=None,\n                 sampling_seed=None,\n                 verbose=1,\n                 n_jobs=None):\n\n        self.estimator = estimator\n        self.min_features_to_select = min_features_to_select\n        self.step = step\n        self.param_grid = param_grid\n        self.greater_is_better = greater_is_better\n        self.importance_type = importance_type\n        self.train_importance = train_importance\n        self.n_iter = n_iter\n        self.sampling_seed = sampling_seed\n        self.verbose = verbose\n        self.n_jobs = n_jobs\n\n    def _build_model(self, params=None):\n        \"\"\"Private method to build model.\"\"\"\n\n        estimator = clone(self.estimator)\n\n        if params is not None:\n            estimator.set_params(**params)\n\n        model = _RFE(\n            estimator=estimator,\n            min_features_to_select=self.min_features_to_select,\n            step=self.step,\n            greater_is_better=self.greater_is_better,\n            importance_type=self.importance_type,\n            train_importance=self.train_importance,\n            verbose=self.verbose\n        )\n\n        return model\n\n\nclass BoostRFA(_BoostSearch, _RFA):\n    \"\"\"Simultaneous feature selection with RFA and hyperparameter searching\n    on a given validation set for LGBModel or XGBModel.\n\n    Pass a LGBModel or XGBModel to compute feature selection with RFA.\n    The gradient boosting instance with the best features is selected.\n    When an eval_set is provided, the best gradient boosting and the best\n    features are obtained by evaluating the score with eval_metric.\n    Otherwise, the best combination is obtained looking only at feature\n    importance.\n\n    If param_grid is a dictionary with parameter boundaries, hyperparameter\n    tuning is performed simultaneously. 
The parameter combinations are scored on\n    the provided eval_set.\n    To operate random search, pass distributions in the param_grid with an rvs\n    method for sampling (such as those from scipy.stats.distributions).\n    To operate bayesian search, pass hyperopt distributions.\n    The specification of n_iter or sampling_seed is effective only with random\n    or hyperopt searches.\n    The best parameter combination is the one which obtains the best score\n    (as returned by eval_metric) on the provided eval_set.\n\n    If all parameters are presented as lists or single values, grid search\n    is performed. If at least one parameter is given as a distribution (such as\n    those from scipy.stats.distributions), random search is performed,\n    sampling with replacement. Bayesian search is effective only when all the\n    parameters to tune are given as hyperopt distributions.\n    It is highly recommended to use continuous distributions for continuous\n    parameters.\n\n    Parameters\n    ----------\n    estimator : object\n        A supervised learning estimator of LGBModel or XGBModel type.\n\n    step : int or float, default=1\n        If greater than or equal to 1, then `step` corresponds to the\n        (integer) number of features to remove at each iteration.\n        If within (0.0, 1.0), then `step` corresponds to the percentage\n        (rounded down) of features to remove at each iteration.\n        Note that the last iteration may remove fewer than `step` features in\n        order to reach `min_features_to_select`.\n\n    min_features_to_select : int, default=None\n        The minimum number of features to be selected. This number of features\n        will always be scored, even if the difference between the original\n        feature count and `min_features_to_select` isn't divisible by\n        `step`. 
The default value for min_features_to_select is set to 1 when an\n        eval_set is provided, otherwise it always corresponds to n_features // 2.\n\n    importance_type : str, default='feature_importances'\n        Which importance measure to use. It can be 'feature_importances'\n        (the default feature importance of the gradient boosting estimator)\n        or 'shap_importances'.\n\n    train_importance : bool, default=True\n        Effective only when importance_type='shap_importances'.\n        Where to compute the shap feature importance: on train (True)\n        or on eval_set (False).\n\n    param_grid : dict, default=None\n        Dictionary with parameter names (`str`) as keys and distributions\n        or lists of parameters to try.\n        None means no hyperparameter search.\n\n    greater_is_better : bool, default=False\n        Effective only when hyperparameters searching.\n        Whether the quantity to monitor is a score function,\n        meaning high is good, or a loss function, meaning low is good.\n\n    n_iter : int, default=None\n        Effective only when hyperparameters searching.\n        Effective only for random or hyperopt search.\n        Number of parameter settings that are sampled.\n        n_iter trades off runtime vs quality of the solution.\n\n    sampling_seed : int, default=None\n        Effective only when hyperparameters searching.\n        Effective only for random or hyperopt search.\n        The seed used to sample from the hyperparameter distributions.\n\n    n_jobs : int, default=None\n        Effective only when hyperparameters searching without hyperopt.\n        The number of jobs to run in parallel for model fitting.\n        ``None`` means 1, using one processor. ``-1`` means using all\n        processors.\n\n    verbose : int, default=1\n        Verbosity mode. 
<=0 silences all output; ==1 prints trial logs (when\n        hyperparameters searching); >1 prints feature selection logs plus\n        trial logs (when hyperparameters searching).\n\n    Attributes\n    ----------\n    estimator_ : estimator\n        The fitted estimator with the selected features and the optimal\n        parameter combination (when hyperparameters searching).\n\n    n_features_ : int\n        The number of selected features (from the best param config\n        when hyperparameters searching).\n\n    ranking_ : ndarray of shape (n_features,)\n        The feature ranking, such that ``ranking_[i]`` corresponds to the\n        ranking position of the i-th feature (from the best param config\n        when hyperparameters searching). Selected features are assigned\n        rank 1.\n\n    support_ : ndarray of shape (n_features,)\n        The mask of selected features (from the best param config\n        when hyperparameters searching).\n\n    score_history_ : list\n        Available only when an eval_set is provided.\n        Scores obtained reducing the features (from the best param config\n        when hyperparameters searching).\n\n    best_params_ : dict\n        Available only when hyperparameters searching.\n        Parameter setting that gave the best results on the eval_set.\n\n    trials_ : list\n        Available only when hyperparameters searching.\n        A list of dicts. 
The dicts are all the parameter combinations tried\n        and derived from the param_grid.\n\n    best_score_ : float\n        Available only when hyperparameters searching.\n        The best score achieved among all the parameter combinations tried.\n\n    scores_ : list\n        Available only when hyperparameters searching.\n        The scores achieved on the eval_set by all the models tried.\n\n    best_iter_ : int\n        Available only when hyperparameters searching.\n        The boosting iterations achieved by the best parameter combination.\n\n    iterations_ : list\n        Available only when hyperparameters searching.\n        The boosting iterations of all the models tried.\n\n    boost_type_ : str\n        The type of the boosting estimator (LGB or XGB).\n\n    Notes\n    -----\n    The code for the RFA algorithm is inspired by and improved from:\n    https://github.com/heberleh/recursive-feature-addition\n    \"\"\"\n\n    def __init__(self,\n                 estimator, *,\n                 min_features_to_select=None,\n                 step=1,\n                 param_grid=None,\n                 greater_is_better=False,\n                 importance_type='feature_importances',\n                 train_importance=True,\n                 n_iter=None,\n                 sampling_seed=None,\n                 verbose=1,\n                 n_jobs=None):\n\n        self.estimator = estimator\n        self.min_features_to_select = min_features_to_select\n        self.step = step\n        self.param_grid = param_grid\n        self.greater_is_better = greater_is_better\n        self.importance_type = importance_type\n        self.train_importance = train_importance\n        self.n_iter = n_iter\n        self.sampling_seed = sampling_seed\n        self.verbose = verbose\n        self.n_jobs = n_jobs\n\n    def _build_model(self, params=None):\n        \"\"\"Private method to build model.\"\"\"\n\n        estimator = clone(self.estimator)\n\n        if params is not None:\n            estimator.set_params(**params)\n\n        model = _RFA(\n            estimator=estimator,\n            min_features_to_select=self.min_features_to_select,\n            step=self.step,\n            greater_is_better=self.greater_is_better,\n            importance_type=self.importance_type,\n            train_importance=self.train_importance,\n            verbose=self.verbose\n        )\n\n        return model"
  },
  {
    "path": "shaphypetune/utils.py",
"content": "import random\nimport numpy as np\nfrom itertools import product\n\nfrom shap import TreeExplainer\n\n\ndef _check_boosting(model):\n    \"\"\"Check if the estimator is a LGBModel or XGBModel.\n\n    Returns\n    -------\n    Model type in string format.\n    \"\"\"\n\n    estimator_type = str(type(model)).lower()\n\n    boost_type = ('LGB' if 'lightgbm' in estimator_type else '') + \\\n                 ('XGB' if 'xgboost' in estimator_type else '')\n\n    if len(boost_type) != 3:\n        raise ValueError(\"Pass a LGBModel or XGBModel.\")\n\n    return boost_type\n\n\ndef _shap_importances(model, X):\n    \"\"\"Extract feature importances from fitted boosting models\n    using TreeExplainer from shap.\n\n    Returns\n    -------\n    array of feature importances.\n    \"\"\"\n\n    explainer = TreeExplainer(\n        model, feature_perturbation=\"tree_path_dependent\")\n    coefs = explainer.shap_values(X)\n\n    if isinstance(coefs, list):\n        coefs = list(map(lambda x: np.abs(x).mean(0), coefs))\n        coefs = np.sum(coefs, axis=0)\n    else:\n        coefs = np.abs(coefs).mean(0)\n\n    return coefs\n\n\ndef _feature_importances(model):\n    \"\"\"Extract feature importances from fitted boosting models.\n\n    Returns\n    -------\n    array of feature importances.\n    \"\"\"\n\n    if hasattr(model, 'coef_'):  # booster='gblinear' (xgb)\n        coefs = np.square(model.coef_).sum(axis=0)\n    else:\n        coefs = model.feature_importances_\n\n    return coefs\n\n\ndef _get_categorical_support(n_features, fit_params):\n    \"\"\"Obtain boolean mask for categorical features.\"\"\"\n\n    cat_support = np.zeros(n_features, dtype=bool)\n    cat_ids = []\n\n    msg = \"When manually setting categorical features, \" \\\n          \"pass a 1D array-like of categorical column indices \" \\\n          \"(specified as integers).\"\n\n    if 'categorical_feature' in fit_params:  # LGB\n        cat_ids = fit_params['categorical_feature']\n        if len(np.shape(cat_ids)) != 1:\n            raise ValueError(msg)\n        if not all([isinstance(c, int) for c in cat_ids]):\n            raise ValueError(msg)\n\n    cat_support[cat_ids] = True\n\n    return cat_support\n\n\ndef _set_categorical_indexes(support, cat_support, _fit_params,\n                             duplicate=False):\n    \"\"\"Map categorical features in each data repartition.\"\"\"\n\n    if cat_support.any():\n\n        n_features = support.sum()\n        support_id = np.zeros_like(support, dtype='int32')\n        support_id[support] = np.arange(n_features, dtype='int32')\n        cat_feat = support_id[np.where(support & cat_support)[0]]\n        # empty if support and cat_support are not aligned\n\n        if duplicate:  # is Boruta\n            cat_feat = cat_feat.tolist() + (n_features + cat_feat).tolist()\n        else:\n            cat_feat = cat_feat.tolist()\n\n        _fit_params['categorical_feature'] = cat_feat\n\n    return _fit_params\n\n\ndef _check_param(values):\n    \"\"\"Check the parameter boundaries passed in dict values.\n\n    Returns\n    -------\n    list of checked parameters.\n    \"\"\"\n\n    if isinstance(values, (list, tuple, np.ndarray)):\n        return list(set(values))\n    elif 'scipy' in str(type(values)).lower():\n        return values\n    elif 'hyperopt' in str(type(values)).lower():\n        return values\n    else:\n        return [values]\n\n\nclass ParameterSampler(object):\n    \"\"\"Generator of parameters sampled from given distributions.\n    If all parameters are presented as a list, sampling without replacement is\n    performed. If at least one parameter is given as a scipy distribution,\n    sampling with replacement is used. 
If all parameters are given as hyperopt\n    distributions, a Tree of Parzen Estimators search from hyperopt is\n    performed.\n    It is highly recommended to use continuous distributions for continuous\n    parameters.\n\n    Parameters\n    ----------\n    param_distributions : dict\n        Dictionary with parameter names (`str`) as keys and distributions\n        or lists of parameters to try. Distributions must provide an ``rvs``\n        method for random sampling (such as those from scipy.stats.distributions)\n        or be hyperopt distributions for bayesian searching.\n        If a list is given, it is sampled uniformly.\n\n    n_iter : integer, default=None\n        Number of parameter configurations that are produced.\n\n    random_state : int, default=None\n        Pass an int for reproducible output across multiple\n        function calls.\n\n    Returns\n    -------\n    param_combi : list of dicts or dict of hyperopt distributions\n        Parameter combinations.\n\n    searching_type : str\n        The searching algorithm used.\n    \"\"\"\n\n    def __init__(self, param_distributions, n_iter=None, random_state=None):\n\n        self.n_iter = n_iter\n        self.random_state = random_state\n        self.param_distributions = param_distributions\n\n    def sample(self):\n        \"\"\"Generate parameter combinations from the given distributions.\"\"\"\n\n        param_distributions = self.param_distributions.copy()\n\n        is_grid = all(isinstance(p, list)\n                      for p in param_distributions.values())\n        is_random = all(isinstance(p, list) or 'scipy' in str(type(p)).lower()\n                        for p in param_distributions.values())\n        is_hyperopt = all('hyperopt' in str(type(p)).lower()\n                          or (len(p) < 2 if isinstance(p, list) else False)\n                          for p in param_distributions.values())\n\n        if is_grid:\n            param_combi = list(product(*param_distributions.values()))\n            param_combi = [\n                dict(zip(param_distributions.keys(), combi))\n                for combi in param_combi\n            ]\n            return param_combi, 'grid'\n\n        elif is_random:\n            if self.n_iter is None:\n                raise ValueError(\n                    \"n_iter must be an integer >0 when scipy parameter \"\n                    \"distributions are provided. Got None.\"\n                )\n\n            seed = (random.randint(1, 100) if self.random_state is None\n                    else self.random_state + 1)\n            random.seed(seed)\n\n            param_combi = []\n            k = self.n_iter\n            for i in range(self.n_iter):\n                dist = param_distributions.copy()\n                combi = []\n                for j, v in enumerate(dist.values()):\n                    if 'scipy' in str(type(v)).lower():\n                        combi.append(v.rvs(random_state=seed * (k + j)))\n                    else:\n                        combi.append(v[random.randint(0, len(v) - 1)])\n                    k += i + j\n                param_combi.append(\n                    dict(zip(param_distributions.keys(), combi))\n                )\n\n            return param_combi, 'random'\n\n        elif is_hyperopt:\n            if self.n_iter is None:\n                raise ValueError(\n                    \"n_iter must be an integer >0 when hyperopt \"\n                    \"search spaces are provided. Got None.\"\n                )\n            param_distributions = {\n                k: p[0] if isinstance(p, list) else p\n                for k, p in param_distributions.items()\n            }\n\n            return param_distributions, 'hyperopt'\n\n        else:\n            raise ValueError(\n                \"Parameters not recognized. \"\n                \"Pass lists, scipy distributions (also in conjunction \"\n                \"with lists), or hyperopt search spaces.\"\n            )"
  }
]