Repository: gdikov/hypertunity
Branch: master
Commit: 768b26137f36
Files: 54
Total size: 170.1 KB
Directory structure:
gitextract_y6a8vwvu/
├── .circleci/
│ └── config.yml
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.md
├── LICENSE
├── README.md
├── conftest.py
├── docs/
│ ├── Makefile
│ ├── conf.py
│ ├── index.rst
│ ├── manual/
│ │ ├── domain.rst
│ │ ├── installation.rst
│ │ ├── optimisation.rst
│ │ ├── quickstart.rst
│ │ ├── reports.rst
│ │ └── scheduling.rst
│ └── source/
│ ├── hypertunity.rst
│ ├── optimisation.rst
│ ├── reports.rst
│ └── scheduling.rst
├── hypertunity/
│ ├── __init__.py
│ ├── domain.py
│ ├── optimisation/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── bo.py
│ │ ├── exhaustive.py
│ │ ├── random.py
│ │ └── tests/
│ │ ├── __init__.py
│ │ ├── _common.py
│ │ ├── test_bo.py
│ │ ├── test_exhaustive.py
│ │ └── test_random.py
│ ├── reports/
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── table.py
│ │ ├── tensorboard.py
│ │ └── tests/
│ │ ├── __init__.py
│ │ ├── conftest.py
│ │ ├── test_table.py
│ │ └── test_tensorboard.py
│ ├── scheduling/
│ │ ├── __init__.py
│ │ ├── jobs.py
│ │ ├── scheduler.py
│ │ └── tests/
│ │ ├── __init__.py
│ │ ├── script.py
│ │ ├── test_jobs.py
│ │ └── test_scheduler.py
│ ├── tests/
│ │ ├── __init__.py
│ │ ├── test_domain.py
│ │ ├── test_trial.py
│ │ └── test_utils.py
│ ├── trial.py
│ └── utils.py
└── setup.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .circleci/config.yml
================================================
# Python CircleCI 2.0 configuration file
version: 2
jobs:
build:
docker:
- image: circleci/python:3.7.3
working_directory: ~/repo
steps:
- checkout
- restore_cache:
keys:
- env-build
- run:
name: setup env
command: |
python3 -m venv venv
. venv/bin/activate
pip install --upgrade pip
pip install ./[tensorboard,tests,docs]
- save_cache:
paths:
- ./venv
key: env-build
- run:
name: run tests
command: |
. venv/bin/activate
py.test --verbose --runslow hypertunity
- store_artifacts:
path: test-reports
destination: test-reports
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit tests / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Jupyter Notebook
.ipynb_checkpoints
# Environments
.venv*
# Pycharm project settings
.idea
# mkdocs documentation
/site
# mypy
.mypy_cache/
# Sphinx documentation
/docs/_build
================================================
FILE: .readthedocs.yml
================================================
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
version: 2
# Sphinx settings
sphinx:
builder: html
configuration: docs/conf.py
fail_on_warning: true
# Python settings
python:
version: 3.7
install:
- method: pip
path: .
extra_requirements:
- docs
================================================
FILE: CHANGELOG.md
================================================
# Changelog
All notable changes to this project will be documented in this file.
## [Unreleased]
## [1.0.1] - 2020-01-27
## Changed
- some code style related changes are applied, such as import sorting and line length shortening.
- refactoring in tests to use pytest parameterisation and fixtures.
## Fixed
- issue with running callables from script thanks to David Turner (https://github.com/gdikov/hypertunity/pull/43).
- issue with tensorflow version comparison in the tensorboard reporter.
## [1.0.0] - 2019-11-10
## Added
- `Reporter` instance can be loaded with data from the database of another reporter using a `from_database()` method.
- data from a `Reporter` instance can be exported into a `HistoryPoint` list to load into an optimiser.
- compiled documentation and logo.
- `BayesianOptimisation` raises `ExhaustedSearchSpaceError` if a discrete domain is exhausted.
## Changed
- minor fixes in documentation typos, argument names and tests.
- `Domain` is moved from `hypertunity.optimisation` to the `hypertunity` package.
- rename `TableReporter` to `Table` and `TensorboardReporter` to `Tensorboard`.
- `ExhaustedSearchSpaceError` is moved from `optimisation.exhastive` to `optimisation.base` module.
- `Trial` running a task from a job is now done with dict as input keyword arguments or named command line arguments.
## Fixed
- bug in `BayesianOptimisation` sample conversion for nested dictionaries.
- bug in `BayesianOptimisation` type preserving between the domain and the sample value.
- bug in `Tensorboard` reporter for real intervals with integer boundaries.
- bug in `Reporter` for not using the default metric name during logging.
## [0.4.0] - 2019-09-15
## Added
- `Trial` a wrapper class for high-level usage, which runs the optimiser, evaluates the objective
by scheduling jobs, updates the optimiser and summarises the results.
- a `Job` from a script with command line arguments can now be run with
named arguments passed as a dictionary instead of a tuple.
- checkpointing of results on disk when calling `log()` or a `Reporter` object.
- optimisation history can now be loaded into an `Optimiser`. Example use-case would be to warm-start
`BayesianOptimisation` from the history of the quicker `RandomSearch`.
## Changed
- every `Reporter` instance has a `primary_metric` attribute, which is an argument to `__init__`.
## Fixed
- validation of `Domain` is not allowing for intervals with more than 2 numbers.
- minor bugs in tests.
## [0.3.1] - 2019-09-10
## Fixed
- `Optimiser.update()` now accepts evaluation arguments that are float, `EvaluationScore` or a dict
with metric names and floats or `EvaluationScore`s. This is valid for all optimisers.
## [0.3.0] - 2019-09-08
## Added
- `Job` can now be scheduled locally to run command line scripts with arguments.
- `BayesianOptimisation.run_step` can pass arguments to the backend for better customisation.
## Changed
- any `Reporter` object can be fed with data from a tuple of a
`Sample` object and a score, which can be a float or an `EvaluationScore`.
- `BayesianOptimisation` optimiser can be updated with a `Sample` and
a float or `EvaluationScore` objective evaluation types.
- a discrete/categorical `Domain` is defined with a set literal instead of a tuple.
- `Job` supports running functions from within a script by specifying 'script_path::func_name'.
- `batch_size` is no more an attribute of an `Optimiser` but an argument to `run_step`.
- `minimise` is no more an attribute of `BayesianOptimisation` but an argument to `run_step`.
## [0.2.0] - 2019-08-28
## Added
- `Scheduler` to run jobs locally using joblib.
- `SlurmJob` and `Job` dataclasses defining the tasks to be scheduled.
- `Result` dataclass encapsulating the results from the tasks.
- `TableReporter` class for reporting results in tabular format.
- `Reporter` base class for extending reporters.
## Changed
- `Base`-prefix is removed from all base classes which reside
in `base.py` modules.
- `split_by_type` is now a method of the `Domain` class.
- `Optimiser` has a `batch_size` attribute accessible as a property.
## Removed
- `optimisation.bo` package has been removed. Instead a single `bo.py`
module supports the only BO backend---GPyOpt, as of now.
- prefix for the file encoding (default is utf-8).
## [0.1.0] - 2019-07-27
### Added
- `TensorboardReporter` result logger using `HParams`.
- `GpyOpt` backend for `BayesianOptimisation`.
- `RandomSearch` optimiser.
- `GridSearch` optimiser.
- `Domain` and `Sample` classes as foundations for the optimisers.
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
[](https://circleci.com/gh/gdikov/hypertunity)
[](https://hypertunity.readthedocs.io/en/latest/?badge=latest)

## Why Hypertunity
Hypertunity is a lightweight, high-level library for hyperparameter optimisation.
Among others, it supports:
* Bayesian optimisation by wrapping [GPyOpt](http://sheffieldml.github.io/GPyOpt/),
* external or internal objective function evaluation by a scheduler, also compatible with [Slurm](https://slurm.schedmd.com),
* real-time visualisation of results in [Tensorboard](https://www.tensorflow.org/tensorboard)
via the [HParams](https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams) plugin.
For the full set of features refer to the [documentation](https://hypertunity.readthedocs.io).
## Quick start
Define the objective function to optimise. For example, it can take the hyperparameters `params` as input and
return a raw value `score` as output:
```python
import hypertunity as ht
def foo(**params) -> float:
# do some very costly computations
...
return score
```
To define the valid ranges for the values of `params` we create a `Domain` object:
```python
domain = ht.Domain({
"x": [-10., 10.], # continuous variable within the interval [-10., 10.]
"y": {"opt1", "opt2"}, # categorical variable from the set {"opt1", "opt2"}
"z": set(range(4)) # discrete variable from the set {0, 1, 2, 3}
})
```
Then we set up the optimiser:
```python
bo = ht.BayesianOptimisation(domain=domain)
```
And we run the optimisation for 10 steps. Each result is used to update the optimiser so that informed domain
samples are drawn:
```python
n_steps = 10
for i in range(n_steps):
samples = bo.run_step(batch_size=2, minimise=True) # suggest next samples
evaluations = [foo(**s.as_dict()) for s in samples] # evaluate foo
bo.update(samples, evaluations) # update the optimiser
```
Finally, we visualise the results in Tensorboard:
```python
import hypertunity.reports.tensorboard as tb
results = tb.Tensorboard(domain=domain, metrics=["score"], logdir="path/to/logdir")
results.from_history(bo.history)
```
## Even quicker start
A high-level wrapper class `Trial` allows for seamless parallel optimisation
without bothering with scheduling jobs, updating optimisers and logging:
```python
trial = ht.Trial(objective=foo,
domain=domain,
optimiser="bo",
reporter="tensorboard",
metrics=["score"])
trial.run(n_steps, batch_size=2, n_parallel=2)
```
## Installation
### Using PyPI
To install the base version run:
```bash
pip install hypertunity
```
To use the Tensorboard dashboard, build the docs or run the test suite you will need the following extras:
```bash
pip install hypertunity[tensorboard,docs,tests]
```
### From source
Checkout the latest master and install locally:
```bash
git clone https://github.com/gdikov/hypertunity.git
cd hypertunity
pip install ./[tensorboard]
```
================================================
FILE: conftest.py
================================================
import pytest
def pytest_addoption(parser):
parser.addoption(
"--runslow",
action="store_true",
default=False,
help="run slow tests"
)
parser.addoption(
"--runslurm",
action="store_true",
default=False,
help="run slurm tests"
)
def pytest_configure(config):
config.addinivalue_line(
"markers", "slow: mark test as slow to run"
)
config.addinivalue_line(
"markers", "slurm: mark test which require slurm to run"
)
def pytest_collection_modifyitems(config, items):
def mark_skip(keyword):
if config.getoption(f"--run{keyword}"):
return
skip = pytest.mark.skip(reason=f"need --run{keyword} option to run")
for item in items:
if keyword in item.keywords:
item.add_marker(skip)
mark_skip("slow")
mark_skip("slurm")
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
import hypertunity
# The short X.Y version.
version = '.'.join(hypertunity.__version__.split('.', 2)[:2])
# The full version, including alpha/beta/rc tags.
release = hypertunity.__version__
# -- Project information -----------------------------------------------------
project = 'Hypertunity'
copyright = '2019, Georgi Dikov'
author = 'Georgi Dikov'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode'
]
# Napoleon settings
napoleon_google_docstring = True
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = True
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = True
napoleon_use_admonition_for_references = True
napoleon_use_ivar = True
napoleon_use_param = True
napoleon_use_keyword = True
napoleon_use_rtype = True
autodoc_typehints = 'none'
autodoc_mock_imports = ['tensorflow', 'tensorboard']
source_suffix = '.rst'
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'test*']
# -- Options for HTML output -------------------------------------------------
html_theme = 'sphinx_rtd_theme'
pygments_style = 'sphinx'
add_module_names = False
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# this is needed as HTML5 causes an ugly rendering of the "Parameters", "Returns", etc. fields
html4_writer = True
html_theme_options = {
"logo_only": True,
'display_version': True,
'style_nav_header_background': '#002A3F',
# Toc options
'collapse_navigation': True
}
html_context = {
"display_github": True, # Add 'Edit on Github' link instead of 'View page source'
# "last_updated": True,
# "commit": False,
}
html_logo = "_static/images/logo_inverted.svg"
html_favicon = '_static/images/favicon.ico'
github_url = "https://github.com/gdikov/hypertunity"
================================================
FILE: docs/index.rst
================================================
:github_url: https://github.com/gdikov/hypertunity
.. image:: _static/images/logo.svg
:width: 800
:align: center
:alt: Hypertunity logo
========
Welcome!
========
Hypertunity is a lightweight, high-level library for hyperparameter optimisation.
Among others, it supports:
* Bayesian optimisation by wrapping `GPyOpt `_
* external or internal objective evaluation using a scheduler, also compatible with `Slurm `_
* real-time visualisation of results in `Tensorboard `_ using the `HParams `_ plugin.
The main guiding design principles are:
* **Modular**: you can use any optimiser and reporter as well as schedule jobs locally or on Slurm without changes in the API.
* **Simple**: the small codebase (just about 1000 LOC) and the flat subpackage hierarchy makes it easy to use, maintain and extend.
* **Extensible**: base classes such as :class:`Optimiser`, :class:`Job` and :class:`Reporter` allow for seamless implementation of customized functionality.
.. toctree::
:maxdepth: 2
:caption: User Guide
manual/installation
manual/quickstart
manual/domain
manual/optimisation
manual/reports
manual/scheduling
.. toctree::
:maxdepth: 2
:caption: API Reference
source/hypertunity
source/optimisation
source/reports
source/scheduling
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
================================================
FILE: docs/manual/domain.rst
================================================
Domain
======
The set of all hyperparameters and the corresponding ranges of possible values is specified using the :class:`Domain` class.
It can be initialised with a dictionary mapping parameter names to continuous numeric intervals or discrete sets.
The former are given as python :obj:`list` and the latter---as :obj:`set`.
For example, to define a domain over the continuous interval [-10, 10] and the discrete set of
strings {"option_1", "option_2"}, it suffices to write:
.. code-block:: python
domain = Domain({"var_1": [-10, 10], "var_2": {"option_1", "option_2"}})
where ``"var_1"`` and ``"var_2"`` are two arbitrary names for the two subdomains.
Given this domain we can now generate samples from it using the :py:meth:`sample()` method:
.. code-block:: python
>>> domain.sample()
{'var_1': -8.529187978165552, 'var_2': 'option_1'}
The returned objects are of class :class:`Sample` and represent one realisation of the domain.
It is represented as a mapping of parameter names to samples from the set of possible values.
It also has a handy conversion methods such as :py:meth:`as_dict()` or :py:meth:`as_namedtuple()` which enable accessing
parameters using the `["var_1"]` or `.var_1` notation.
Both :class:`Domain` and :class:`Sample` objects allow for nested subdomains, e.g.:
.. code-block:: python
>>> domain = Domain({
... "subdomain_a": {"var_1": [-10, 10], "var_2": {"option_1", "option_2"}},
... "subdomain_b": {"var_1": [-1, 1], "var_2": {"option_1", "option_2"}}
... })
>>> sample = domain.sample()
>>> sample
{
'subdomain_a': {'var_1': -6.892359956494582, 'var_2': 'option_2'},
'subdomain_b': {'var_1': 0.21004903180560652, 'var_2': 'option_1'}
}
>>> nt_sample = sample.as_namedtuple()
>>> nt_sample.subdomain_a.var_2
'option_2'
================================================
FILE: docs/manual/installation.rst
================================================
Installation
============
Requirements
------------
Hypertunity has been tested with Python 3.6 and 3.7. As of now, there are no plans to support earlier versions of Python.
The reason for that is the usage of variable and function annotations, dataclasses as well as relying on the fact that the
insertion order of the keys in a dictionary is preserved during iteration. Porting Hypertunity to earlier versions will
only make it unnecessarily hard to maintain.
From PyPI
---------
To get the latest stable release just run:
.. code-block:: bash
pip install hypertunity
Note that this will install the basic version only, without support for Tensorboard visualisations.
To enable this feature you will need to specify the option `tensorboard`.
To run the tests or compile the docs add the `tests` and `docs` options respectively:
.. code-block:: bash
pip install hypertunity[tensorboard,tests,docs]
From source
-----------
To install the bleeding-edge version of Hypertunity, clone the repository, checkout the master branch
and install from source:
.. code-block:: bash
git clone https://github.com/gdikov/hypertunity.git
cd hypertunity
git checkout master
pip install ./[tensorboard,tests,docs]
================================================
FILE: docs/manual/optimisation.rst
================================================
Optimisation
============
Hypertunity ships with three types of hyperparameter space exploration algorithms. A Bayesian optimisation, random and
grid search. While the first one is sequential in nature and requires evaluations to update its internal model of the
objective function, so that more informed sample suggestions are generated, the latter two are able to generate all samples
in parallel and do not require updating. In this section we will give a brief overview of each.
Bayesian optimisation
---------------------
:class:`BayesianOptimisation` in Hypertunity is a wrapper around `GPyOpt.methods.BayesianOptimization` which uses
Gaussian Process regression to build a surrogate model of the objective function. It is initialised from a :class:`Domain`
object:
.. code-block:: python
bo = BayesianOptimization(domain)
The :class:`BayesianOptimisation` optimiser is highly customisable during sampling. This enables the user to
dynamically refine the model during calling :py:meth:`run_step()`. This approach introduces however the computational
burden of recomputing the surrogate model at each query. In the following example we show how one can set the GP model
using readily available ones from `GPy.models`, e.g. a `GPHeteroschedasticRegression`:
.. code-block:: python
bo = BayesianOptimisation(domain=domain, seed=7) # initialise BO optimiser
kernel = GPy.kern.RBF(1) + GPy.kern.Bias(1) # create a custom kernel
custom_model = GPy.models.GPHeteroscedasticRegression(..., kernel) # create a custom model
samples = bayes_opt.run_step(model=custom_model) # generate samples
Random search
-------------
This class is a wrapper around the :py:meth:`Domain.sample()` method. It has the API of
an :class:`Optimiser` class and yields samples which are uniformly drawn from the domain.
There is no limitation on the number of samples that can be returned in a single call of :py:meth:`run_step()`,
even if this leads to repetitions.
Grid search
-----------
:class:`GridSearch` is a wrapper around the iteration over a domain. It goes over each point in the Cartesian-product of
all discrete subdomains. If one of the subdomains is continuous :class:`GridSearch` will sample uniformly from
this interval. Once the domain is exhausted, further iteration will be prevented by raising an :class:`ExhaustedSearchSpaceError`.
To iterate again the :class:`GridSearch` optimiser must be reset by calling the :py:meth:`reset()` method.
.. code-block:: python
>>> domain = Domain({"x": {1, 2, 3}, "y": {"a", "b"}, "z": [0, 1]})
>>> gs = GridSearch(domain, sample_continuous=True)
>>> gs.run_step(batch_size=6)
[
{'x': 1, 'y': 'b', 'z': 0.054781406913364084},
{'x': 2, 'y': 'b', 'z': 0.7006391867439882},
{'x': 3, 'y': 'b', 'z': 0.9674445624792569},
{'x': 1, 'y': 'a', 'z': 0.7837727333178091},
{'x': 2, 'y': 'a', 'z': 0.17240297136803384},
{'x': 3, 'y': 'a', 'z': 0.844465575155033}
]
>>> gs.reset()
Custom optimiser
----------------
If neither of the predefined optimiser are useful for your problem, you can easily roll out a custom one.
Only thing you have to do is to inherit from the base :class:`Optimiser` class and implement the :py:meth:`run_step` method.
.. code-block:: python
class CustomOptimiser(Optimiser):
def __init__(self, domain, *args, **kwargs):
super(CustomOptimiser, self).__init__(domain)
...
def run_step(batch_size, *args, **kwargs):
...
return [samples]
================================================
FILE: docs/manual/quickstart.rst
================================================
Quick start
===========
A worked example
~~~~~~~~~~~~~~~~
Let's delve in into the API of Hypertunity by going through a worked example---neural network hyperparameter optimisation.
In the following we will tune the number of layers and units, the non-linearity type, as well as the dropout rate and the
learning rate of the optimiser.
**Disclaimer:** This example serves a demonstration purpose only. It does not represent an advanced way of performing
neural network architecture search!
First thing we do it to import Hypertunity, tensorflow and numpy and define a helper data loading function:
.. code-block:: python
import hypertunity as ht
import numpy as np
import tensorflow as tf
import hypertunity.reports.tensorboard as ht_tb
def load_mnist():
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
data_shape = train_x.shape[1:]
train_x = train_x.reshape(-1, np.prod(data_shape)).astype(np.float32) / 255.
mean_train = np.mean(train_x, axis=0)
train_x -= mean_train
test_x = test_x.reshape(-1, np.prod(data_shape)).astype(np.float32) / 255.
test_x -= mean_train
train_y = tf.keras.utils.to_categorical(train_y, num_classes=10)
test_y = tf.keras.utils.to_categorical(test_y, num_classes=10)
return (train_x, train_y), (test_x, test_y)
Next we define a function that will build the model given the architectural hyperparameters and the learning rate,
followed by the objective which will wrap the model building and evaluation:
.. code-block:: python
def build_model(inp_size, out_size, n_layers, n_units, p_dropout, activation):
inp = tf.keras.Input(inp_size)
h = inp
for l in range(n_layers - 1):
h = tf.keras.layers.Dense(n_units, activation=activation)(h)
h = tf.keras.layers.Dropout(rate=p_dropout)(h)
h = tf.keras.layers.Dense(out_size, activation=None)(h)
out = tf.keras.layers.Softmax()(h)
model = tf.keras.models.Model(inputs=inp, outputs=out)
return model
def objective_fn(**config) -> float:
(train_x, train_y), (test_x, test_y) = load_mnist()
model = build_model(train_x.shape[-1], train_y.shape[-1],
config["arch"]["n_layers"],
config["arch"]["n_units"],
config["arch"]["p_dropout"],
config["arch"]["activation"])
opt = tf.keras.optimizers.Adam(learning_rate=config["opt"]["lr"])
model.compile(optimizer=opt, loss="categorical_crossentropy")
model.fit(train_x, train_y, batch_size=100, epochs=1)
score = model.evaluate(test_x, test_y, batch_size=test_x.shape[0])
return score
Now that we can build a model, we should define the ranges of possible values for the these parameters.
This can be done with creating a :class:`Domain` instance as follows:
.. code-block:: python
domain = ht.Domain({
"arch": {
"n_layers": {1, 3, 5},
"n_units": {10, 50, 100, 500},
"p_dropout": [0, 0.9999],
"activation": {"relu", "selu", "elu"}
},
"opt": {
"lr": [1e-9, 1e-2]
}
})
The :class:`Domain` plays a central role in Hypertunity and we will make a frequent use of it later as well.
An important related class is the :class:`Sample`. It can be thought of as one realisation of the variables from the domain,
which in our case is one particular configuration of network hyperparameters.
Using the domain, we can set up the optimiser and the result visualiser also used for experiment logging.
In this case we use :class:`BayesianOptimisation` and :class:`Tensorboard` respectively:
.. code-block:: python
optimiser = ht.BayesianOptimisation(domain)
tb_rep = ht_tb.Tensorboard(domain,
metrics=["cross-entropy"],
logdir="./mnist_mlp",
database_path="./mnist_mlp")
After we create the :class:`Tensorboard` reporter we will be prompted to run `tensorboard --logdir=./mnist_mlp`
in the console and open Tensorboard in the browser. We can do this also before we launch the actual optimisation.
One last bit before running it is the definition of the job schedule as well as optimiser and reporter update loop.
This is to ensure that samples are generated, experiments are run and the results used to improve the underlying model of the :class:`BayesianOptimisation` optimiser.
To schedule one experiment at a time, for 50 consecutive steps we create a :class:`Job` for each function call of ``objective_fn``
with a set of suggested hyperparameters:
.. code-block:: python
n_steps = 50
batch_size = 1
with ht.Scheduler(n_parallel=batch_size) as scheduler:
for i in range(n_steps):
samples = optimiser.run_step(batch_size=batch_size, minimise=True)
jobs = [ht.Job(task=objective_fn, args=s.as_dict() for s in samples]
scheduler.dispatch(jobs)
evaluations = [r.data for r in scheduler.collect(n_results=batch_size, timeout=100.0)]
optimiser.update(samples, evaluations)
for sample_evaluation_pair in zip(samples, evaluations):
tb_rep.log(sample_evaluation_pair)
If we have a look at the Tensorboard dashboard while this is running, we should be able to see results being updated live!
.. image:: ../_static/images/tensorboard.gif
:width: 800
:align: center
:alt: Tensorboard
Even quicker start
~~~~~~~~~~~~~~~~~~
A high-level wrapper class :class:`Trial` allows for seamless parallel optimisation without having to schedule jobs,
update the optimiser or log results explicitly. The API is reduced to the minimum and yet remains flexible as
one can specify any optimiser or reporter:
.. code-block:: python
trial = ht.Trial(objective=objective_fn,
domain=domain,
optimiser="bo",
reporter="tensorboard",
logdir="./mnist_mlp",
database_path="./mnist_mlp",
metrics=["cross-entropy"])
trial.run(n_steps, batch_size=batch_size, n_parallel=batch_size)
================================================
FILE: docs/manual/reports.rst
================================================
Reports
=======
Saving and visualising progress can be accomplished by using :class:`Reporter` instance.
The reporter is supplied with data using the :py:meth:`log()` method which takes a tuple of a sample and score.
Optionally one can store additional information about the current experiment, e.g. the output directory or the job id,
using the ``meta`` keyword argument:
.. code-block:: python
for s, e, m in zip(samples, evaluations, meta_infos):
reporter.log((s, e), meta=m)
Table
-----
Hypertunity comes with a built-in reporter which organises the experiment results into an ascii table.
It is initialised from a domain and a list of metrics and can be viewed as a formatted string table by calling :obj:`str`
on the object.
The table can be sorted in ascending or descending order and the best results can be emphasised:
.. code-block:: python
>>> domain = ht.Domain({"x": [-5., 6.], "y": {"sin", "cos"}, "z": set(range(4))})
>>> reporter = ht.Table(domain, metrics=["score"])
>>> # run experiment and call reporter.log(...)
...
>>> print(reporter.format(order="descending"))
+=====+========+=====+===+==============+
| No. | x | y | z | score |
+=====+========+=====+===+==============+
| 6 | -4.35 | cos | 1 | 15.921 ± 0.0 |
+-----+--------+-----+---+--------------+
| 5 | -4.232 | cos | 3 | 8.906 ± 0.0 |
+-----+--------+-----+---+--------------+
| 4 | -4.588 | sin | 3 | 6.134 ± 0.0 |
+-----+--------+-----+---+--------------+
| 2 | 2.16 | cos | 0 | 4.667 ± 0.0 |
+-----+--------+-----+---+--------------+
| 3 | -0.977 | cos | 1 | -2.045 ± 0.0 |
+-----+--------+-----+---+--------------+
| 1 | -1.438 | cos | 3 | -6.933 ± 0.0 |
+-----+--------+-----+---+--------------+
Tensorboard
-----------
If Hypertunity is installed with the `tensorboard` option, a suitable version of Tensorflow and Tensorboard will be installed.
This will enable a :class:`Tensorboard` reporter which, using the HParams plugin, will generate live visualisations
as experiments are being logged. One can start the Tensorboard dashboard in the browser as usual, using the `logdir` supplied
at initialisation.
Note that to create a Tensorboard reporter one will have to import ``hypertunity.reports.tensorboard`` explicitly:
.. code-block:: python
import hypertunity.reports.tensorboard as tb
tb_reporter = tb.Tensorboard(domain, metrics=["score"], logdir="./logs")
See the :doc:`quickstart` guide for a preview of the dashboard visualisation.
================================================
FILE: docs/manual/scheduling.rst
================================================
Scheduling jobs
===============
Often in practice the objective function is a python script that might take command line arguments as parameters or define a function that has lots of dependencies.
Importing this function into the hyperparameter optimisation script or wrapping the target script involves some boilerplate code.
To help with that Hypertunity allows for specifying objective functions as ``Job`` instances which are then run in succession or in parallel using a ``Scheduler``.
The latter is a wrapper around `joblib `_ and takes care of both running jobs and collecting results.
Scheduling of ``Job`` instances is done using the ``dispatch`` method of a ``Scheduler``:
.. code-block:: python
jobs = [Job(...) for _ in range(10)]
scheduler.dispatch(jobs)
evaluations = [r.data for r in scheduler.collect(n_results=batch_size, timeout=10.0)]
There are multiple ways to define a job depending on the target to optimise.
Local python callable
~~~~~~~~~~~~~~~~~~~~~
If the function is defined or imported within the hyperparameter optimisation script, the ``task`` argument is the callable instance.
The ``args`` is then a tuple of arguments or a dict of named arguments which are supplied to the task function during calling.
For example:
.. code-block:: python
jobs = [ht.Job(task=foo, args=(*s.as_namedtuple(),)) for s in samples]
Python callable in a script
~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the function to optimise resides in a script, Hypertunity allows for specifying a target by the full path to the script.
To select the objective function from the script append ``:`` and the function name:
.. code-block:: python
jobs = [Job(task="path/to/script.py:foo", args=(*s.as_namedtuple(),)) for s in samples]
A script
~~~~~~~~
If the objective function is a full command line application or a script that accepts the hyperparameters to tune as command line arguments then you should create a job as follows:
.. code-block:: python
jobs = [Job(task="path/to/script.py",
args=(*s.as_namedtuple(),),
meta={"binary": "python"}) for s in samples]
Using Slurm
~~~~~~~~~~~
To schedule jobs using Slurm a special job type is available. It allows to configure resources and other Slurm parameters but also requires that the target script is able to write a results file on disk.
.. code-block:: python
jobs = [SlurmJob(task="path/to/script.py",
args=(*sample.as_namedtuple(),),
output_file="path/to/results.pkl",
meta={"binary": "python", "resources": {"cpu": 1}}))
================================================
FILE: docs/source/hypertunity.rst
================================================
:mod:`hypertunity`
==================
.. automodule:: hypertunity
Summary
-------
.. autosummary::
:nosignatures:
Domain
Sample
Trial
API documentation
-----------------
.. autoclass:: Domain
:members:
.. autoclass:: Sample
:members:
.. autoclass:: Trial
:members:
================================================
FILE: docs/source/optimisation.rst
================================================
:mod:`hypertunity.optimisation`
===============================
.. currentmodule:: hypertunity.optimisation
Summary
-------
Data classes
~~~~~~~~~~~~
.. autosummary::
:nosignatures:
EvaluationScore
HistoryPoint
Optimisers
~~~~~~~~~~
.. autosummary::
:nosignatures:
Optimiser
BayesianOptimisation
GridSearch
RandomSearch
API documentation
-----------------
.. autoclass:: EvaluationScore
:members:
.. autoclass:: HistoryPoint
:members:
.. autoclass:: Optimiser
:members:
.. autoclass:: BayesianOptimisation
:members:
.. autoclass:: GridSearch
:members:
.. autoclass:: RandomSearch
:members:
================================================
FILE: docs/source/reports.rst
================================================
:mod:`hypertunity.reports`
==========================
.. currentmodule:: hypertunity.reports
Summary
-------
Default
~~~~~~~
.. autosummary::
:nosignatures:
Reporter
Table
Optional
~~~~~~~~
.. autosummary::
:nosignatures:
tensorboard.Tensorboard
API documentation
-----------------
.. autoclass:: Reporter
:members:
.. autoclass:: Table
:members:
.. currentmodule:: hypertunity.reports.tensorboard
.. autoclass:: Tensorboard
:members:
================================================
FILE: docs/source/scheduling.rst
================================================
:mod:`hypertunity.scheduling`
=============================
.. currentmodule:: hypertunity.scheduling
Summary
-------
.. autosummary::
:nosignatures:
Scheduler
Job
SlurmJob
Result
API documentation
-----------------
.. autoclass:: Scheduler
:members:
.. autoclass:: Job
:members:
.. autoclass:: SlurmJob
:members:
.. autoclass:: Result
:members:
================================================
FILE: hypertunity/__init__.py
================================================
from .domain import *
from .optimisation import *
from .reports import *
from .scheduling import *
from .trial import *
__version__ = "1.0.1"
================================================
FILE: hypertunity/domain.py
================================================
"""Definition of the optimisation domain and a sample."""
import ast
import copy
import os
import pickle
import random
from collections import namedtuple
from typing import Tuple
__all__ = [
"Domain",
"DomainNotIterableError",
"DomainSpecificationError",
"Sample"
]
class _RecursiveDict:
"""Helper base class for the :class:`Domain` and :class:`Sample` classes.
It implements common logic for creation, representation, type conversion
and serialisation.
"""
def __init__(self, dct):
if isinstance(dct, dict):
self._data = dct
elif isinstance(dct, str):
self._data = ast.literal_eval(dct)
else:
raise TypeError(
f"A {self.__class__.__name__} object can be created from a "
f"Python dict or str objects only. "
f"Unknown type {type(dct)} at initialisation."
)
self._ndim = 0
for _, val in _deepiter_dict(self._data):
self._ndim += 1
def __hash__(self):
return hash(str(self))
def __repr__(self):
"""Return the representation of the recursive dict using the
string method.
"""
return str(self)
def __str__(self):
"""Return the string representation of the recursive dict."""
return str(self._data)
def __eq__(self, other):
"""Compare all subdomains for equal bounds and sets. The order of the
subdomains is not important.
"""
return self.as_dict() == other.as_dict()
def __len__(self):
"""Compute the dimensionality of the recursive dict as the length of
the flattened dict.
"""
return self._ndim
def __getitem__(self, item):
"""Return the item (possibly a subdomain) for a given key.
Args:
item: str of tuple of str. If the latter it will access nested
structures with the next str in the tuple.
"""
if isinstance(item, str):
return self._data.__getitem__(item)
elif isinstance(item, tuple) and all(map(lambda x: isinstance(x, str), item)):
sub_dict = self._data
for it in item:
if not isinstance(sub_dict, dict):
raise KeyError(f"Unknown sub-key {it}.")
sub_dict = sub_dict[it]
return sub_dict
def __add__(self, other: '_RecursiveDict'):
"""Merge self with the `other` :class:`_RecursiveDict`.
Args:
other: :class:`_RecursiveDict`. The recursive dictionary that will
be merged into the current one.
Returns:
A new :class:`_RecursiveDict` object consisting of the subdomains
of both domains. If the keys overlap and the subdomains are discrete
or categorical, the values will be unified.
Raises:
:obj:`ValueError`: if identical keys point to different values.
"""
flattened_a = self.flatten()
flattened_b = other.flatten()
# validate that the two _RecursiveDicts are disjoint
if len(flattened_a.keys()) > len(flattened_a.keys() - flattened_b.keys()):
raise ValueError(
f"Ambiguous addition of {self.__class__.__name__} objects."
)
merged = list(flattened_a.items())
merged.extend(list(flattened_b.items()))
return self.__class__.from_list(merged)
def flatten(self):
"""Return the flattened version of the recursive dict, i.e. without
nested dicts.
The keys of the nested subdomains are collected in a tuple to create a
new unique key. For the sake of type consistency, the key of a
non-nested subdomain is converted to a tuple with a single element.
"""
return {keys: val for keys, val in _deepiter_dict(self._data)}
def as_dict(self):
"""Convert the recursive dict object from :class:`_RecursiveDict`
to :obj:`dict` type.
"""
return copy.deepcopy(self._data)
@classmethod
def from_list(cls, lst):
"""Create a :class:`_RecursiveDict` object from a list of tuples.
Args:
lst: :obj:`List[Tuple]`. Each element is a pair of the keys
(tuple of strings) and the value.
Returns:
A :class:`_RecursiveDict` object.
Raises:
:obj:`ValueError`: if the list contains duplicating keys with
different values.
Examples:
```python
>>> lst = [(("a", "b"), {2, 3, 4}), (("c",), [0, 0.1])]
>>> _RecursiveDict.from_list(lst)
{"a": {"b": {2, 3, 4}}, "c": [0, 0.1]}
```
"""
dct = {}
head = dct
for keys, vals in lst:
if not keys:
continue
for k in keys[:-1]:
if k not in dct:
dct[k] = {}
dct = dct[k]
if keys[-1] in dct and dct[keys[-1]] == vals:
raise ValueError(f"Duplicating entries for keys {keys}.")
dct[keys[-1]] = vals
dct = head
return cls(head)
def serialise(self, filepath=None):
"""Serialise the :class:`_RecursiveDict` object to a file or a string
if `filepath` is not supplied.
Args:
filepath: (optional) :obj:`str`. Filepath as to dump the serialised
:class:`_RecursiveDict` object.
Returns:
The bytes representing the serialised :class:`_RecursiveDict` object.
"""
serialised = pickle.dumps(self._data)
if filepath is not None:
with open(filepath, "wb") as fp:
pickle.dump(self._data, fp)
return serialised
@classmethod
def deserialise(cls, series):
"""Deserialise a serialised :class:`_RecursiveDict` object from a byte
stream or file.
Args:
series: :obj:`str`. The serialised :class:`_RecursiveDict` object or
a filepath to it.
Returns:
A :class:`_RecursiveDict` object.
"""
if not isinstance(series, (bytes, bytearray)) and os.path.isfile(series):
with open(series, "rb") as fp:
return cls(pickle.load(fp))
return cls(pickle.loads(series))
def as_namedtuple(self):
"""Convert a :class:`_RecursiveDict` to a namedtuple type.
Returns:
A Python namedtuple object with names the same as the keys of the
:class:`_RecursiveDict` dict. Nested dicts are accessed by
successive attribute getters.
Examples:
```python
>>> rd = _RecursiveDict({"a": {"b": [1, 2]}, "c": {1, 2, 3}, "d": 2.})
>>> nt = rd.as_namedtuple()
>>> nt.a.b
[1, 2]
>>> nt.c == {1, 2, 3} and nt.d == 2.
True
```
"""
def helper(dct):
keys, vals = [], []
for k, v in dct.items():
keys.append(k)
if isinstance(v, dict):
vals.append(helper(v))
else:
vals.append(v)
# The dict.keys() and dict.values() will iterate in the same order
# as long as dct is not modified.
return namedtuple("NT_" + self.__class__.__name__, keys)(*vals)
return helper(self._data)
class Domain(_RecursiveDict):
"""Defines the optimisation domain of the objective function. It can be a
continuous interval or a discrete set of numeric or non-numeric values.
The latter is also designated as a categorical domain. It is represented as
a Python dict object with the keys naming the variables and the values defining
the set of allowed values. A :class:`Domain` can also be recursively
specified. That is, a key can name a subdomain represented as a Python dict.
For continuous sets use Python list to define an interval in the form
[a, b], a < b. For discrete sets use Python sets, e.g. {1, 2, 5, -0.1}
or {"option_a", "option_b"}.
Examples:
>>> simple_domain = {"x": {0, 1},
>>> "y": [-1, 1],
>>> "z": {-1, 2, 4}}
>>> nested_domain = {"discrete": {"x": {1, 2, 3}, "y": {4, 5, 6}}
>>> "continuous": {"x": [-4, 4], "y": [0, 1]}
>>> "categorical": {"opt1", "opt2"}}
"""
# Domain types
Continuous = 1
Discrete = 2
Categorical = 3
Invalid = 4
def __init__(self, dct, seed=None):
"""Initialise the :class:`Domain`.
Args:
dct: :obj:`dict`. The mapping of variable names to sets of
allowed values.
seed: (optional) :obj:`int`. Seed for the randomised sampling.
"""
super(Domain, self).__init__(dct)
self._validate()
self._rng = random.Random(seed)
self._is_continuous = False
for _, val in _deepiter_dict(self._data):
if isinstance(val, list):
self._is_continuous = True
def __iter__(self):
"""Iterate over the domain if it is fully discrete.
The iterations are over the Cartesian product of all 1-dim discrete
subdomains.
Raises:
:class:`DomainNotIterableError`: if the domain has a at least one
continuous subdomain.
"""
if self._is_continuous:
raise DomainNotIterableError(
"The domain has a continuous subdomain and cannot be iterated."
)
def cartesian_walk(dct):
if dct:
key, vals = dct.popitem()
if isinstance(vals, set):
for v in vals:
yield from (
dict(**rem, **{key: v})
for rem in cartesian_walk(copy.deepcopy(dct))
)
elif isinstance(vals, dict):
for sub_v in cartesian_walk(copy.deepcopy(vals)):
yield from (
dict(**rem, **{key: sub_v})
for rem in cartesian_walk(copy.deepcopy(dct))
)
else:
raise TypeError(
f"Unexpected subdomain of type {type(vals)}."
)
else:
yield {}
yield from map(Sample, cartesian_walk(copy.deepcopy(self._data)))
def _validate(self):
"""Check for invalid domain specifications."""
for keys, values in _deepiter_dict(self._data):
if not (all(map(lambda x: isinstance(x, str), keys))
and isinstance(values, (set, list, dict))):
raise DomainSpecificationError(
"Keys must be of type string and values "
"must be either of type set, list or dict."
)
if (isinstance(values, list)
and (len(values) != 2 or values[0] >= values[1])):
raise DomainSpecificationError(
"Interval must be specified by two numbers: [a, b], a < b."
)
def sample(self):
"""Draw a sample from the domain. All subdomains are sampled uniformly.
Returns:
A :class:`Sample` object.
"""
def sample_dict(dct):
sample = {}
for key, vals in dct.items():
if isinstance(vals, set):
sample[key] = self._rng.choice(list(vals))
elif isinstance(vals, list):
sample[key] = self._rng.uniform(*vals)
else:
sample[key] = sample_dict(vals)
return sample
return Sample(sample_dict(self._data))
@property
def is_continuous(self):
"""Return `True` if at least one subdomain is continuous."""
return self._is_continuous
@classmethod
def get_type(cls, subdomain):
"""Return the type of the set of values in a subdomain.
Args:
subdomain: one of :obj:`dict`, :obj:`list` or :obj:`set`. The
subdomain to get the type for.
Returns:
One of `Domain.Continuous`, `Domain.Discrete`, `Domain.Categorical`
or `Domain.Invalid`.
"""
def is_numeric(x):
try:
float(x)
except ValueError:
return False
return True
if isinstance(subdomain, list):
return Domain.Continuous
if isinstance(subdomain, set):
if all(map(is_numeric, subdomain)):
return Domain.Discrete
return Domain.Categorical
return Domain.Invalid
def split_by_type(self) -> Tuple['Domain', 'Domain', 'Domain']:
"""Split the domain into discrete, categorical and continuous
subdomains respectively.
Returns:
A tuple of three :class:`Domain` objects for the discrete
numerical, categorical and continuous subdomains.
"""
discrete, categorical, continuous = [], [], []
for keys, vals in self.flatten().items():
if Domain.get_type(vals) == Domain.Continuous:
continuous.append((keys, vals))
elif Domain.get_type(vals) == Domain.Categorical:
categorical.append((keys, vals))
elif Domain.get_type(vals) == Domain.Discrete:
discrete.append((keys, vals))
else:
raise ValueError("Encountered an invalid subdomain.")
return (
Domain.from_list(discrete),
Domain.from_list(categorical),
Domain.from_list(continuous)
)
class DomainNotIterableError(TypeError):
"""Alias for the :obj:`TypeError` raised during iteration of (partially)
continuous :class:`Domain` object.
"""
pass
class DomainSpecificationError(ValueError):
"""Alias for the :obj:`ValueError` raised during :class:`Domain` object
creation from an invalid set of values.
"""
pass
class Sample(_RecursiveDict):
"""Defines a sample from the optimisation domain.
It has the same recursive structure a :class:`Domain` object, however each
dimension is represented by one value only. The keys are exactly as the
keys of the respective domain.
Examples:
>>> domain = Domain({"x": {"y": {0, 1, 2}}, "z": [3, 4]})
>>> domain.sample()
{'x': {'y': 0}, 'z': 3.1415926535897932}
"""
def __init__(self, dct):
"""Initialise the :class:`Sample` object from a dict."""
super(Sample, self).__init__(dct)
def __iter__(self):
"""Iterate over all values in the sample.
Yields:
A tuple of keys and a single value, where the keys are a tuple
of strings.
"""
yield from self.flatten().items()
def _deepiter_dict(dct):
"""Iterate over all key, value pairs of a (possibly nested) dictionary.
In this case, all keys of the nested dicts are summarised in a tuple.
Args:
dct: dict object to iterate.
Yields:
Tuple of keys (itself a tuple) and the corresponding value.
Examples:
>>> list(_deepiter_dict({"a": {"b": 1, "c": 2}, "d": 3}))
[(('a', 'b'), 1), (('a', 'c'), 2), (('d',), 3)]
"""
def chained_keys_iter(prefix_keys, dct_tmp):
for key, val in dct_tmp.items():
chained_keys = prefix_keys + (key,)
if isinstance(val, dict):
yield from chained_keys_iter(chained_keys, val)
else:
yield chained_keys, val
yield from chained_keys_iter((), dct)
================================================
FILE: hypertunity/optimisation/__init__.py
================================================
from .base import *
from .bo import *
from .exhaustive import *
from .random import *
================================================
FILE: hypertunity/optimisation/base.py
================================================
"""Defines the API of every optimiser and implements common logic."""
import abc
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence
from hypertunity.domain import Domain, Sample
__all__ = [
"EvaluationScore",
"HistoryPoint",
"Optimiser",
"Optimizer",
"ExhaustedSearchSpaceError"
]
@dataclass(frozen=True, order=True)
class EvaluationScore:
"""A tuple of the evaluation value of the objective function
and a variance if known.
"""
value: float
variance: float = 0.0
def __str__(self):
return f"{self.value:.3f} ± {math.sqrt(self.variance):.1f}"
@dataclass(frozen=True)
class HistoryPoint:
"""A tuple of a :class:`Sample` at which the objective has been evaluated
and the corresponding metrics. The metrics are supplied as :obj:`dict`
mapping of a :obj:`str` metric name to an :class:`EvaluationScore`.
"""
sample: Sample
metrics: Dict[str, EvaluationScore]
class Optimiser:
"""Abstract class :class:`Optimiser` for all optimisers.
It must be implemented by all subclasses in this package.
Every :class:`Optimiser` instance can be run for one single step using the
:py:meth:`run_step` method. The :class:`Optimiser` does not perform the
evaluation of the objective function but only proposes values from its
domain. Therefore an evaluation history must be supplied via the
:py:meth`update` method. The history can be erased and the
:class:`Optimiser` brought to the initial state via the :py:meth:`reset`
method.
"""
DEFAULT_METRIC_NAME = "score"
def __init__(self, domain: Domain):
"""Initialise the optimiser with a domain.
Args:
domain: :class:`Domain`. The domain of the objective function.
"""
self.domain = domain
self._history: List[HistoryPoint] = []
@property
def history(self):
"""Return the accumulated optimisation history."""
return self._history
@history.setter
def history(self, history: List[HistoryPoint]):
"""Set the optimiser history.
This method can be used to warm-start an optimiser.
Args:
history: :obj:`List[HistoryPoint]`. New history which will
**overwrite** the old one.
"""
self.reset()
for hp in history:
self.update(hp.sample, hp.metrics)
@abc.abstractmethod
def run_step(self, batch_size, *args: Any, **kwargs: Any) -> List[Sample]:
"""Perform one step of optimisation and suggest the next sample to
evaluate.
Args:
batch_size: (optional) :obj:`int`. The number of samples to
suggest at once.
*args: optional arguments for the Optimiser.
**kwargs: optional keyword arguments for the Optimiser.
Returns:
A :obj:`List[Sample]` with the suggested samples to evaluate.
"""
raise NotImplementedError
def update(self, x, fx, **kwargs):
"""Update the optimiser's history with new points.
Args:
x: :class:`Sample` or :obj:`List[Sample]`. The samples at which the
objective function has been evaluated.
fx: :class:`EvaluationScore` or :obj:`List[EvaluationScore]`. The
evaluation scores at the corresponding samples.
"""
if isinstance(x, Sample):
self._update_history(x, fx)
elif (isinstance(x, Sequence)
and isinstance(fx, Sequence)
and len(x) == len(fx)):
for i, j in zip(x, fx):
self._update_history(i, j)
else:
raise ValueError("Update values for `x` and `f(x)` must be either "
"a `Sample` and an evaluation or a list thereof.")
def _update_history(self, x, fx):
if isinstance(fx, (float, int)):
history_point = HistoryPoint(
sample=x,
metrics={self.DEFAULT_METRIC_NAME: EvaluationScore(fx)}
)
elif isinstance(fx, EvaluationScore):
history_point = HistoryPoint(
sample=x, metrics={self.DEFAULT_METRIC_NAME: fx})
elif isinstance(fx, Dict):
metrics = {}
for key, val in fx.items():
if isinstance(val, (float, int)):
metrics[key] = EvaluationScore(val)
else:
metrics[key] = val
history_point = HistoryPoint(sample=x, metrics=metrics)
else:
raise TypeError(
"Cannot update history for one sample and multiple evaluations."
" Use batched update instead and provide a list of samples and "
"a list of evaluation metrics.")
self.history.append(history_point)
def reset(self):
"""Reset the optimiser to the initial state."""
self._history.clear()
class ExhaustedSearchSpaceError(Exception):
pass
Optimizer = Optimiser
================================================
FILE: hypertunity/optimisation/bo.py
================================================
"""Bayesian Optimisation using Gaussian Process regression."""
from multiprocessing import cpu_count
from typing import Any, Dict, List, Sequence, Tuple, Type, TypeVar, Union
import GPy
import GPyOpt
import numpy as np
from GPyOpt.core import errors as gpyopt_err
from hypertunity import utils
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import (
EvaluationScore,
ExhaustedSearchSpaceError,
Optimiser
)
__all__ = [
"BayesianOptimisation",
"BayesianOptimization"
]
GPyOptSample = TypeVar("GPyOptSample", List[List], np.ndarray)
GPyOptDomain = List[Dict[str, Any]]
GPyOptCategoricalValueMapper = Dict[str, Dict[Any, int]]
GPyOptDiscreteTypeMapper = Dict[str, Dict[Any, type]]
class BayesianOptimisation(Optimiser):
"""Bayesian Optimiser using `GPyOpt` as a backend."""
CONTINUOUS_TYPE = "continuous"
DISCRETE_TYPE = "discrete"
CATEGORICAL_TYPE = "categorical"
def __init__(self, domain, seed=None):
"""Initialise the optimiser's domain.
Args:
domain: :class:`Domain`. The domain of the objective function.
seed: (optional) :obj:`int`. The seed of the optimiser. Used for
reproducibility purposes.
"""
np.random.seed(seed)
domain = Domain(domain.as_dict(), seed=seed)
super(BayesianOptimisation, self).__init__(domain)
converted_and_mappers = self._convert_to_gpyopt_domain(self.domain)
(
self.gpyopt_domain,
self._categorical_value_mapper,
self._discrete_type_mapper
) = converted_and_mappers
self._inv_categorical_value_mapper = {
name: {v: k for k, v in mapping.items()}
for name, mapping in self._categorical_value_mapper.items()
}
self._data_x = np.array([[]])
self._data_fx = np.array([[]])
self.__is_empty_data = True
@staticmethod
def _convert_to_gpyopt_domain(
orig_domain: Domain
) -> Tuple[GPyOptDomain,
GPyOptCategoricalValueMapper,
GPyOptDiscreteTypeMapper]:
"""Convert a :class:`Domain` type object to :obj:`GPyOptDomain`.
Args:
orig_domain: :class:`Domain` to convert.
Returns:
A tuple of the converted :obj:`GPyOptDomain` object and a value
mapper to assign each categorical value to an integer
(0, 1, 2, 3 ...). This is done to abstract away the type of the
categorical domain from the `GPyOpt` internals and thus arbitrary
types are supported.
Notes:
The categorical options must be hashable. This behaviour may change
in the future.
"""
gpyopt_domain = []
value_mapper = {}
type_mapper = {}
flat_domain = orig_domain.flatten()
for names, vals in flat_domain.items():
dim_name = utils.join_strings(names)
domain_type = Domain.get_type(vals)
if domain_type == Domain.Continuous:
dim_type = BayesianOptimisation.CONTINUOUS_TYPE
elif domain_type == Domain.Discrete:
dim_type = BayesianOptimisation.DISCRETE_TYPE
type_mapper[dim_name] = {v: type(v) for v in vals}
elif domain_type == Domain.Categorical:
dim_type = BayesianOptimisation.CATEGORICAL_TYPE
value_mapper[dim_name] = {v: i for i, v in enumerate(vals)}
vals = tuple(range(len(vals)))
else:
raise ValueError(
f"Badly specified subdomain {names} with values {vals}."
)
gpyopt_domain.append({
"name": dim_name,
"type": dim_type,
"domain": tuple(vals)
})
assert len(gpyopt_domain) == len(orig_domain), \
"Mismatching dimensionality after domain conversion."
return gpyopt_domain, value_mapper, type_mapper
def _convert_to_gpyopt_sample(self, orig_sample: Sample) -> GPyOptSample:
"""Convert a sample of type :class:`Sample` to type :obj:`GPyOptSample`
and vice versa.
If the function is supplied with a :obj:`GPyOptSample` type object it
calls the dedicated function `self._convert_from_gpyopt_sample`.
Args:
orig_sample: :class:`Sample` type object to be converted.
Returns:
A :obj:`GPyOptSample` type object with the same values as
`orig_sample`.
"""
gpyopt_sample = []
# iterate in the order of the GPyOpt domain names
for dim in self.gpyopt_domain:
keys = utils.split_string(dim["name"])
val = orig_sample[keys]
if dim["type"] == BayesianOptimisation.CATEGORICAL_TYPE:
val = self._categorical_value_mapper[dim["name"]][val]
gpyopt_sample.append(val)
return np.asarray(gpyopt_sample)
def _convert_from_gpyopt_sample(self, gpyopt_sample: GPyOptSample) -> Sample:
"""Convert :obj:`GPyOptSample` type object to the corresponding
:class:`Sample` type.
Args:
gpyopt_sample: :obj:`GPyOptSample` object to be converted.
Returns:
A :class:`Sample` type object with the same values as
`gpyopt_sample`.
"""
if len(self.gpyopt_domain) != len(gpyopt_sample):
raise ValueError(
f"Cannot convert sample with mismatching dimensionality. "
f"The original space has {len(self.domain)} dimensions and the "
f"sample {len(gpyopt_sample)} dimensions."
)
orig_sample = {}
for dim, value in zip(self.gpyopt_domain, gpyopt_sample):
names = utils.split_string(dim["name"])
sub_dim = orig_sample
for name in names[:-1]:
if name not in sub_dim:
sub_dim[name] = {}
sub_dim = sub_dim[name]
if dim["type"] == BayesianOptimisation.CATEGORICAL_TYPE:
sub_dim[names[-1]] = self._inv_categorical_value_mapper[dim["name"]][value]
elif dim["type"] == BayesianOptimisation.DISCRETE_TYPE:
sub_dim[names[-1]] = self._discrete_type_mapper[dim["name"]][value](value)
else:
sub_dim[names[-1]] = value
return Sample(orig_sample)
@utils.support_american_spelling
def run_step(
self,
batch_size: int = 1,
minimise: bool = False,
**kwargs
) -> List[Sample]:
"""Run one step of Bayesian optimisation with a GP regression surrogate
model.
The first sample of the domain is chosen at random. Only after the model
has been updated with at least one (data point, evaluation score)-pair
the GPs are built and the acquisition function computed and optimised.
Args:
batch_size: (optional) :obj:`int`. The number of samples to suggest
at once. If larger than one, there is no guarantee for the
optimality of the number of probes.
minimise: (optional) :obj:`bool`. Whether the objective should be
minimised
**kwargs: optional keyword arguments which will be passed to the
backend `GPyOpt.methods.BayesianOptimisation` optimiser.
Keyword Args:
model: :obj:`str` or :obj:`GPy.Model` object. The surrogate model
used by the backend optimiser.
kernel: :obj:`GPy.Kern` object. The kernel used by the model.
variance: :obj:`float`. The variance of the objective function.
Returns:
A list of `batch_size`-many :class:`Sample` instances at which the
objective should be evaluated next.
Raises:
:class:`ExhaustedSearchSpaceError`: if the domain is discrete and
gets exhausted.
"""
if self.__is_empty_data:
next_samples = [self.domain.sample() for _ in range(batch_size)]
else:
assert len(self._data_x) > 0 and len(self._data_fx) > 0, \
"Cannot initialise BO from empty data."
default_kwargs = {
"num_cores": min(batch_size, cpu_count() - 1),
"normalize_Y": True,
"acquisition_type": "EI",
"de_duplication": True,
"model_type": "GP",
"evaluator_type": "local_penalization" if batch_size > 1 else "sequential"
}
if "model" in kwargs:
model = kwargs.pop("model")
# NOTE: Remove this test for model type after the bug in GPyOpt
# is fixed: https://github.com/SheffieldML/GPyOpt/issues/183
if (isinstance(model, str)
and model.lower() == "gp_mcmc"
and batch_size > 1):
raise ValueError(
"GP_MCMC model cannot be used with a batch size > 1 "
"due to a bug in GPyOpt: "
"https://github.com/SheffieldML/GPyOpt/issues/183"
)
kernel = kwargs.pop("kernel", None)
variance = kwargs.pop("variance", None)
default_kwargs["model"] = self._build_model(
model, kernel, variance
)
if (variance is not None
and all(np.atleast_1d(np.isclose(variance, 0.0)))):
default_kwargs["exact_feval"] = True
default_kwargs = _overwrite_dict(default_kwargs, kwargs)
# NOTE: as of GPyOpt 1.2.5 adding new data to an existing model is
# not yet possible, hence the object recreation. This behaviour
# might be changed in future versions. In this case the code should
# be refactored such that `bo` is initialised once and `update`
# takes care of the extension of the (X, Y) samples.
bo = GPyOpt.methods.BayesianOptimization(
f=None, domain=self.gpyopt_domain,
maximize=not minimise,
X=self._data_x,
# NOTE: the following hack is necessary due to a bug in GPyOpt.
# The code should be updated once this gets fixed:
# https://github.com/SheffieldML/GPyOpt/issues/180
Y=(-1 + 2 * minimise) * self._data_fx,
initial_design_numdata=len(self._data_x),
batch_size=batch_size,
**default_kwargs)
try:
gpyopt_samples = bo.suggest_next_locations()
except gpyopt_err.FullyExploredOptimizationDomainError as err:
raise ExhaustedSearchSpaceError from err
next_samples = [self._convert_from_gpyopt_sample(s)
for s in gpyopt_samples]
return next_samples
def _build_model(self, model: Union[str, Type[GPy.Model]] = "GP",
kernel: GPy.kern.Kern = None,
variance: float = None):
"""Build the surrogate model for the GPyOpt BayesianOptimisation.
The default model is 'gp'. In case of a large number of already
evaluated samples, a 'sparse_gp' is used to speed up computation.
Args:
model: :obj:`str` or :obj:`GPy.Model`, the GP regression model.
kernel: :obj:`GPy.kern.Kern`, the kernel of the GP regression model.
variance: :obj:`float`, the variance of the evaluations
(used only if supported by the model).
Returns:
A :obj:`GPy.Model` instance.
"""
if isinstance(model, GPy.Model):
return model
if isinstance(model, str):
model = model.lower()
if model == "gp":
return GPyOpt.models.GPModel(kernel=kernel, noise_var=variance,
sparse=len(self._data_x) > 25)
if model == "gp_mcmc":
return GPyOpt.models.GPModel_MCMC(
kernel=kernel,
noise_var=variance
)
raise ValueError(
f"Unknown model {model}. When supplying a custom kernel or "
f"the variance of the objective function, the model has to be "
f"one from {{'GP', 'GP_MCMC'}}. Otherwise you should supply a "
f"custom `GPy.Model` instance."
)
raise TypeError("Argument `model` must be of type str or `GPy.Model`.")
def update(self, x, fx, **kwargs):
"""Update the surrogate model with the domain sample `x` and the
function evaluation `fx`.
Args:
x: class:`Sample`. One sample of the domain of the objective
function.
fx: a :obj:`float`, an :class:`EvaluationScore` or a :obj:`dict`.
The evaluation scores of the objective evaluated at `x`. If
given as :obj:`dict` then it must be a mapping from metric names
to :class:`EvaluationScore` or :obj:`float` results.
**kwargs: unused by this model.
"""
super(BayesianOptimisation, self).update(x, fx)
# both `converted_x` and `array_fx` must be 2dim arrays
if isinstance(x, Sample):
converted_x, array_fx = self._convert_evaluation_sample(x, fx)
elif (isinstance(x, Sequence)
and isinstance(fx, Sequence)
and len(x) == len(fx)):
# append each history point to the tracked history and
# convert to numpy arrays
converted_x, array_fx = map(
np.concatenate, zip(*[self._convert_evaluation_sample(i, j)
for i, j in zip(x, fx)]))
else:
raise ValueError(
"Update values for `x` and `f(x)` must be either "
"`Sample` and an evaluation or a list thereof."
)
if self._data_x.size == 0:
self._data_x = converted_x
self._data_fx = array_fx
else:
self._data_x = np.concatenate([self._data_x, converted_x])
self._data_fx = np.concatenate([self._data_fx, array_fx])
self.__is_empty_data = False
def _convert_evaluation_sample(self, x, fx):
if isinstance(fx, (float, int)):
array_fx = np.array([[fx]])
elif isinstance(fx, EvaluationScore):
array_fx = np.array([[fx.value]])
elif isinstance(fx, Dict):
if not len(fx) == 1:
raise NotImplementedError(
"Currently only evaluations with a single metric are supported."
)
array_fx = np.array([[list(fx.values())[0].value]])
else:
raise TypeError(
"Cannot update history for one sample and multiple evaluations."
" Use batched update instead and provide a list of samples and "
"a list of evaluation metrics."
)
converted_x = self._convert_to_gpyopt_sample(x).reshape(1, -1)
return converted_x, array_fx
def reset(self):
"""Reset the optimiser for a fresh start."""
super(BayesianOptimisation, self).reset()
self._data_x = np.array([])
self._data_fx = np.array([])
self.__is_empty_data = True
BayesianOptimization = BayesianOptimisation
def _overwrite_dict(old_dict, new_dict):
updated_old = {}
# copy the old dict
for key, value in old_dict.items():
updated_old[key] = value
# overwrite the existing and add the new values
for key, value in new_dict.items():
updated_old[key] = value
return updated_old
================================================
FILE: hypertunity/optimisation/exhaustive.py
================================================
"""Optimisation by exhaustive search, aka grid search."""
from typing import List
from hypertunity.domain import Domain, DomainNotIterableError, Sample
from hypertunity.optimisation.base import ExhaustedSearchSpaceError, Optimiser
__all__ = [
"GridSearch"
]
class GridSearch(Optimiser):
"""Grid search pseudo-optimiser."""
def __init__(self,
domain: Domain,
sample_continuous: bool = False,
seed: int = None):
"""Initialise the :class:`GridSearch` optimiser from a discrete domain.
If the domain contains continuous subspaces, then they could be sampled
if `sample_continuous` is enabled.
Args:
domain: :class:`Domain`. The domain to iterate over.
sample_continuous: (optional) :obj:`bool`. Whether to sample the
continuous subspaces of the domain.
seed: (optional) :obj:`int`. Seed for the sampling of the continuous
subspace if necessary.
"""
if domain.is_continuous and not sample_continuous:
raise DomainNotIterableError(
"Cannot perform grid search on (partially) continuous domain. "
"To enable grid search in this case, set the argument "
"'sample_continuous' to True."
)
super(GridSearch, self).__init__(domain)
(
discrete_domain,
categorical_domain,
continuous_domain
) = domain.split_by_type()
# unify the discrete and the categorical into one,
# as they can be iterated:
self.discrete_domain = discrete_domain + categorical_domain
if seed is not None:
self.continuous_domain = Domain(
continuous_domain.as_dict(), seed=seed
)
else:
self.continuous_domain = continuous_domain
self._discrete_domain_iter = iter(self.discrete_domain)
self._is_exhausted = len(self.discrete_domain) == 0
self.__exhausted_err = ExhaustedSearchSpaceError(
"The domain has been exhausted. Reset the optimiser to start again."
)
def run_step(self, batch_size: int = 1, **kwargs) -> List[Sample]:
"""Get the next `batch_size` samples from the Cartesian-product walk
over the domain.
Args:
batch_size: (optional) :obj:`int`. The number of samples to suggest
at once.
Returns:
A list of :class:`Sample` instances from the domain.
Raises:
:class:`ExhaustedSearchSpaceError`: if the (discrete part of the)
domain is fully exhausted and no samples can be generated.
Notes:
This method does not guarantee that the returned list of
:class:`Samples` will be of length `batch_size`. This is due to the
size of the domain and the fact that samples will not be repeated.
"""
if self._is_exhausted:
raise self.__exhausted_err
samples = []
for i in range(batch_size):
try:
discrete = next(self._discrete_domain_iter)
except StopIteration:
self._is_exhausted = True
break
if self.continuous_domain:
continuous = self.continuous_domain.sample()
samples.append(discrete + continuous)
else:
samples.append(discrete)
if samples:
return samples
raise self.__exhausted_err
def reset(self):
"""Reset the optimiser to the beginning of the Cartesian-product walk."""
super(GridSearch, self).reset()
self._discrete_domain_iter = iter(self.discrete_domain)
self._is_exhausted = len(self.discrete_domain) == 0
================================================
FILE: hypertunity/optimisation/random.py
================================================
"""Optimisation by a uniformly random search."""
from typing import List
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import Optimiser
__all__ = [
"RandomSearch"
]
class RandomSearch(Optimiser):
"""Uniform random sampling pseudo-optimiser."""
def __init__(self, domain: Domain, seed: int = None):
"""Initialise the :class:`RandomSearch` search space.
Args:
domain: :class:`Domain`. The domain of the objective function.
It will be sampled uniformly using the :py:meth:`sample()`
method of the :class:`Domain`.
seed: (optional) :obj:`int`. The seed for the domain sampling.
"""
if seed is not None:
domain = Domain(domain.as_dict(), seed=seed)
super(RandomSearch, self).__init__(domain)
def run_step(self, batch_size=1, **kwargs) -> List[Sample]:
"""Sample uniformly the domain for `batch_size` number of times.
Args:
batch_size: (optional) :obj:`int`. The number of samples to return
at one step.
Returns:
A list of `batch_size` many :class:`Sample` instances.
"""
return [self.domain.sample() for _ in range(batch_size)]
================================================
FILE: hypertunity/optimisation/tests/__init__.py
================================================
================================================
FILE: hypertunity/optimisation/tests/_common.py
================================================
import numpy as np
from hypertunity.optimisation import EvaluationScore
CONT_1D_ARGMAX = 3.989333
CONT_1D_MAX = 5.958363
def continuous_1d(x):
"""Compute x * sin(2x) + 2 if x in [0, 5] else 0."""
fx = np.atleast_1d(x * np.sin(2 * x) + 2)
fx[np.logical_and(x < 0, x > 5)] = 0.
return fx
CONT_HETEROSCED_1D_ARGMAX = 0.0
CONT_HETEROSCED_1D_MAX = 2.0
def continuous_heteroscedastic_1d(x):
"""Compute 0.2 * x^4 - x^2 + 2 + eps
where eps ~ N(0, |0.2 * x| + 1e-7) and x in [-2., 2]
"""
rng = np.random.RandomState(7)
noise = rng.normal(0., 0.2 * np.abs(x) + 1e-7)
fx = np.atleast_1d(0.2 * x**4 - x**2 + 2 + noise)
fx[np.logical_and(x < -2., x > 2.)] = 0.
return fx
HETEROGEN_3D_ARGMAX = (6.0, "sqr", 0)
HETEROGEN_3D_MAX = 36.0
def heterogeneous_3d(x, y, z):
"""Compute `continuous_1d` + z if y == 'sin', else return x**2 - 3 * z
where x is continuous, y is categorical ("sin", "sqr"), z is discrete.
Args:
x: float or np.ndarray, continuous variable [-5.0, 6.0]
y: str, categorical variable ("sin", "sqr")
z: float or int or np.ndarray, discrete variable (0, 1, 2, 3)
"""
if y == "sin":
return (continuous_1d(x) + z)[0]
elif y == "sqr" and z in [0, 1, 2, 3]:
return x**2 - 3 * z
else:
raise ValueError("`y` can only be 'sin' or 'sqr' and z [0, 1, 2, 3].")
DISCRETE_3D_ARGMAX = (4, 5, "large")
DISCRETE_3D_MAX = 3.0
def discrete_3d(x, y, z):
"""Compute c * x * y where c = 0.1 if z == "small" else 0.15.
`x` and `y` are discrete numerical values, z is categorical.
Args:
x: int, discrete variable (1, 2, 3, 4)
y: int, discrete variable (-3, 2, 5)
z: str, categorical variable ("small", "large")
"""
if (x not in {1, 2, 3, 4}
and y not in {-3, 2, 5}
and z not in {"small", "large"}):
raise ValueError("Outside the allowed domain.")
if z == "small":
return 0.1 * x * y
return 0.15 * x * y
def evaluate_continuous_1d(opt, batch_size, n_steps, **kwargs):
all_samples = []
all_evaluations = []
for i in range(n_steps):
samples = opt.run_step(batch_size, minimise=False, **kwargs)
evaluations = continuous_1d(np.array([s["x"] for s in samples]))
opt.update(samples, [EvaluationScore(ev) for ev in evaluations], )
# gather the samples and evaluations for later assessment
all_samples.extend([s["x"] for s in samples])
all_evaluations.extend(evaluations)
best_eval_index = int(np.argmax(all_evaluations))
best_sample = all_samples[best_eval_index]
best_eval = all_evaluations[best_eval_index]
assert np.isclose(best_sample, CONT_1D_ARGMAX, atol=1e-1)
assert np.isclose(best_eval, CONT_1D_MAX, atol=1e-1)
def evaluate_heterogeneous_3d(opt, batch_size, n_steps):
all_samples = []
all_evaluations = []
for i in range(n_steps):
samples = opt.run_step(batch_size, minimise=False)
evaluations = [heterogeneous_3d(s["x"], s["y"], s["z"])
for s in samples]
opt.update(samples, [EvaluationScore(ev) for ev in evaluations], )
# gather the samples and evaluations for later assessment
all_samples.extend([(s["x"], s["y"], s["z"]) for s in samples])
all_evaluations.extend(evaluations)
best_eval_index = int(np.argmax(all_evaluations))
best_sample = all_samples[best_eval_index]
best_eval = all_evaluations[best_eval_index]
assert np.isclose(best_sample[0], HETEROGEN_3D_ARGMAX[0], atol=1.0)
assert best_sample[1:] == HETEROGEN_3D_ARGMAX[1:]
assert np.isclose(best_eval, HETEROGEN_3D_MAX, atol=1.0)
def evaluate_discrete_3d(opt, batch_size, n_steps):
all_samples = []
all_evaluations = []
for i in range(n_steps):
samples = opt.run_step(batch_size, minimise=False)
evaluations = [discrete_3d(s["x"], s["y"], s["z"]) for s in samples]
opt.update(samples, [EvaluationScore(ev) for ev in evaluations], )
# gather the samples and evaluations for later assessment
all_samples.extend([(s["x"], s["y"], s["z"]) for s in samples])
all_evaluations.extend(evaluations)
best_eval_index = int(np.argmax(all_evaluations))
best_sample = all_samples[best_eval_index]
best_eval = all_evaluations[best_eval_index]
assert best_sample == DISCRETE_3D_ARGMAX
assert best_eval == DISCRETE_3D_MAX
================================================
FILE: hypertunity/optimisation/tests/test_bo.py
================================================
import GPy
import numpy as np
import pytest
from hypertunity.domain import Domain
from hypertunity.optimisation import base, bo
from . import _common as test_utils
def test_bo_update_and_reset():
domain = Domain({"a": {"b": [2, 3], "d": {"f": [3, 4]}}, "c": [0, 0.1]})
bayes_opt = bo.BayesianOptimisation(domain, seed=7)
samples = []
n_reps = 3
for i in range(n_reps):
samples.extend(bayes_opt.run_step(batch_size=1, minimise=False))
bayes_opt.update(samples[-1], base.EvaluationScore(2. * i))
assert len(bayes_opt._data_x) == n_reps
assert len(bayes_opt._data_fx) == n_reps
assert np.all(
bayes_opt._data_x == np.array([bayes_opt._convert_to_gpyopt_sample(s)
for s in samples])
)
assert np.all(
bayes_opt._data_fx == 2. * np.arange(n_reps).reshape(n_reps, 1)
)
bayes_opt.reset()
assert len(bayes_opt.history) == 0
def test_bo_set_history():
n_samples = 10
domain = Domain({"a": {"b": [2, 3]}, "c": [0, 0.1]})
history = [
base.HistoryPoint(
domain.sample(),
{"score": base.EvaluationScore(float(i))}
)
for i in range(n_samples)
]
bayes_opt = bo.BayesianOptimisation(domain, seed=7)
bayes_opt.history = history
assert bayes_opt.history == history
assert len(bayes_opt._data_x) == len(bayes_opt._data_fx) == len(history)
@pytest.mark.slow
def test_bo_simple_continuous():
domain = Domain({"x": [-1., 6.]})
bayes_opt = bo.BayesianOptimization(domain=domain, seed=7)
test_utils.evaluate_continuous_1d(bayes_opt, batch_size=2, n_steps=7)
@pytest.mark.slow
def test_bo_simple_mixed():
domain = Domain({"x": [-5., 6.], "y": {"sin", "sqr"}, "z": set(range(4))})
bayes_opt = bo.BayesianOptimization(domain=domain, seed=7)
test_utils.evaluate_heterogeneous_3d(bayes_opt, batch_size=7, n_steps=3)
@pytest.mark.slow
def test_bo_custom_model():
domain = Domain({"x": [-2., 2.]})
bayes_opt = bo.BayesianOptimisation(domain=domain, seed=7)
kernel = GPy.kern.RBF(1) + GPy.kern.Bias(1)
n_steps = 3
batch_size = 3
all_samples = []
all_evaluations = []
first_samples = bayes_opt.run_step(batch_size=batch_size, minimise=False)
xs = np.atleast_2d([s["x"] for s in first_samples])
ys = np.atleast_2d(test_utils.continuous_heteroscedastic_1d(
np.array([s["x"] for s in first_samples]))
)
for i in range(n_steps):
custom_model = GPy.models.GPHeteroscedasticRegression(xs, ys, kernel)
samples = bayes_opt.run_step(
batch_size,
minimise=False,
model=custom_model
)
evaluations = test_utils.continuous_heteroscedastic_1d(
np.array([s["x"] for s in samples])
)
bayes_opt.update(
samples, [base.EvaluationScore(ev) for ev in evaluations]
)
xs = np.concatenate(
[xs, np.atleast_2d([s["x"] for s in samples])], axis=0
)
ys = np.concatenate([ys, np.atleast_2d(evaluations)], axis=0)
# gather the samples and evaluations for later assessment
all_samples.extend([s["x"] for s in samples])
all_evaluations.extend(evaluations)
best_eval_index = int(np.argmax(all_evaluations))
best_sample = all_samples[best_eval_index]
assert np.isclose(
best_sample, test_utils.CONT_HETEROSCED_1D_ARGMAX, atol=1e-1
)
@pytest.mark.skip("Due to https://github.com/SheffieldML/GPyOpt/issues/260"
" using GP_MCMC model can not be tested yet.")
@pytest.mark.slow
def test_bo_gp_mcmc_model():
domain = Domain({"x": [-1., 6.]})
bayes_opt = bo.BayesianOptimization(domain=domain, seed=7)
test_utils.evaluate_continuous_1d(
bayes_opt,
batch_size=1,
n_steps=7,
model="GP_MCMC",
evaluator_type="sequential"
)
================================================
FILE: hypertunity/optimisation/tests/test_exhaustive.py
================================================
import pytest
from hypertunity.domain import Domain
from hypertunity.optimisation import exhaustive
from . import _common as test_utils
def test_grid_simple_discrete():
domain = Domain({
"x": {1, 2, 3, 4},
"y": {-3, 2, 5},
"z": {"small", "large"}
})
gs = exhaustive.GridSearch(domain=domain)
test_utils.evaluate_discrete_3d(gs, batch_size=4, n_steps=3 * 2)
with pytest.raises(exhaustive.ExhaustedSearchSpaceError):
gs.run_step(batch_size=4)
gs.reset()
assert len(gs.run_step(batch_size=4)) == 4
def test_grid_simple_mixed():
domain = Domain({"x": [-5., 6.], "y": {"sin", "sqr"}, "z": set(range(4))})
with pytest.raises(exhaustive.DomainNotIterableError):
_ = exhaustive.GridSearch(domain)
gs = exhaustive.GridSearch(domain, sample_continuous=True, seed=93)
assert len(gs.run_step(batch_size=8)) == 8
def test_update():
domain = Domain({"x": {-5., 6.}})
gs = exhaustive.GridSearch(domain)
gs.update([domain.sample() for _ in range(10)], list(range(10)))
gs.update(domain.sample(), {"score": 23.0})
gs.update(domain.sample(), 2.0)
assert len(gs.history) == 12
================================================
FILE: hypertunity/optimisation/tests/test_random.py
================================================
from hypertunity.domain import Domain
from hypertunity.optimisation import random
from . import _common as test_utils
def test_random_simple_continuous():
domain = Domain({"x": [-1., 6.]})
rs = random.RandomSearch(domain=domain, seed=7)
test_utils.evaluate_continuous_1d(rs, batch_size=50, n_steps=2)
def test_random_simple_mixed():
domain = Domain({"x": [-5., 6.], "y": {"sin", "sqr"}, "z": set(range(4))})
rs = random.RandomSearch(domain=domain, seed=1)
test_utils.evaluate_heterogeneous_3d(rs, batch_size=50, n_steps=25)
def test_update():
domain = Domain({"x": [-5., 6.]})
rs = random.RandomSearch(domain)
rs.update([domain.sample() for _ in range(4)], list(range(4)))
rs.update(domain.sample(), {"score": 23.0})
rs.update(domain.sample(), 2.0)
assert len(rs.history) == 6
rs.reset()
assert len(rs.history) == 0
================================================
FILE: hypertunity/reports/__init__.py
================================================
from .base import Reporter
from .table import Table
================================================
FILE: hypertunity/reports/base.py
================================================
import abc
import datetime
import os
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import tinydb
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import EvaluationScore, HistoryPoint
__all__ = [
"Reporter"
]
HistoryEntryType = Union[
HistoryPoint,
Tuple[Sample, Union[float, Dict[str, float], Dict[str, EvaluationScore]]]
]
class Reporter:
"""Abstract class :class:`Reporter` for result visualisation."""
def __init__(self, domain: Domain,
metrics: List[str],
primary_metric: str = "",
database_path: str = None):
"""Initialise the base reporter with domain and metrics.
Args:
domain: A :class:`Domain` from which all evaluated samples are drawn.
metrics: :obj:`List[str]` with names of the metrics used during
evaluation.
primary_metric: (optional) :obj:`str` primary metric from `metrics`.
This is used to determine the best sample. Defaults to the first one.
database_path: (optional) :obj:`str` path to the database for
storing experiment history on disk. Defaults to in-memory storage.
"""
self.domain = domain
if not metrics:
self.metrics = ["score"]
else:
self.metrics = metrics
if not primary_metric:
self.primary_metric = self.metrics[0]
else:
self.primary_metric = primary_metric
self._default_table_name = f"trial_{datetime.datetime.now().isoformat()}"
if database_path is not None:
if not os.path.exists(database_path):
os.makedirs(database_path)
db_path = os.path.join(database_path, "db.json")
self._db = tinydb.TinyDB(
db_path,
sort_keys=True,
indent=4,
separators=(',', ': ')
)
else:
from tinydb.storages import MemoryStorage
self._db = tinydb.TinyDB(storage=MemoryStorage,
default_table=self._default_table_name)
self._db_default_table = self._db.table(self._default_table_name)
@property
def database(self):
"""Return the logging database."""
return self._db
@property
def default_database_table(self):
"""Return the default database table name."""
return self._default_table_name
def log(self, entry: HistoryEntryType, **kwargs: Any):
"""Create an entry for an optimisation history point in the
:class:`Reporter`.
Args:
entry: :class:`HistoryPoint` or :obj:`Tuple[Sample, Dict]`.
The history point to log. If given as a tuple of :class:`Sample`
instance and a mapping from metric names to results, the
variance of the evaluation noise can be supplied by adding
an entry in the dict with the metric name and the suffix '_var'.
**kwargs: (optional) :obj:`Any`. Additional arguments for the
logging implementation in a subclass.
Keyword Args:
meta: (optional) additional information to be logged in the database
for this entry.
"""
if isinstance(entry, Tuple):
log_fn = self._log_tuple
elif isinstance(entry, HistoryPoint):
self._add_to_db(entry, kwargs.pop("meta", None))
log_fn = self._log_history_point
else:
raise TypeError(
"The history point can be either a tuple or a "
"`HistoryPoint` type object."
)
log_fn(entry, **kwargs)
def _log_tuple(self, entry: Tuple, **kwargs):
"""Helper function to convert the history entry from tuple to
:class:`HistoryPoint` and then log it using the overridden method
:method:`_log_history_point`.
"""
if not (len(entry) == 2 and isinstance(entry[0], Sample)
and isinstance(entry[1], (Dict, EvaluationScore, float))):
raise ValueError(f"Malformed history entry tuple: {entry}.")
sample, metrics_obj = entry
if isinstance(metrics_obj, (float, EvaluationScore)):
# use default name for score column
metrics_obj = {self.primary_metric: metrics_obj}
metrics = {}
# create a properly formatted metrics dict of type Dict[str, EvaluationScore]
for name, val in metrics_obj.items():
if name in metrics:
continue
if name.endswith("_var"):
metric_name = name.rstrip("_var")
if (metric_name not in metrics_obj
or not isinstance(metrics_obj[metric_name], float)):
raise ValueError(
f"Metrics dict does not contain a proper value "
f"for metric {metric_name}."
)
metrics[metric_name] = EvaluationScore(
value=metrics_obj[metric_name],
variance=val
)
elif isinstance(val, EvaluationScore):
metrics[name] = val
elif isinstance(val, float):
metrics[name] = EvaluationScore(
value=val,
variance=metrics_obj.get(f"{name}_var", 0.0)
)
entry = HistoryPoint(sample=sample, metrics=metrics)
self._add_to_db(entry, kwargs.pop("meta", None))
self._log_history_point(entry, **kwargs)
@abc.abstractmethod
def _log_history_point(self, entry: HistoryPoint, **kwargs: Any):
"""Abstract method to override.
Log the :class:`HistoryPoint` entry into the reporter.
Args:
entry: :class:`HistoryPoint`. The sample and evaluation metrics to log.
"""
raise NotImplementedError
def _add_to_db(self, entry: HistoryPoint, meta: Any = None):
document = self._convert_history_to_doc(entry)
if meta is not None:
document["meta"] = meta
self._db_default_table.insert(document)
def get_best(self, criterion: Union[str, Callable] = "max") -> Optional[Dict[str, Any]]:
"""Return the entry from the database which corresponds to the best
scoring experiment.
Args:
criterion: :obj:`str` or :obj:`Callable`. The function used to
determine whether the highest or lowest score is requested. If
several evaluation metrics are present, then a custom `criterion`
must be supplied.
Returns:
JSON object or `None` if the database is empty. The content of the
database for the best experiment.
"""
if not self._db_default_table:
return None
if isinstance(criterion, str):
predefined = {"max": max, "min": min}
if criterion not in predefined:
raise ValueError(
f"Unknown criterion for finding best experiment. "
f"Select one from {list(predefined.keys())} "
f"or supply a custom function."
)
selection_fn = predefined[criterion]
elif isinstance(criterion, Callable):
selection_fn = criterion
else:
raise TypeError("The criterion must be of type str or Callable.")
return self._get_best_from_db(selection_fn)
def _get_best_from_db(self, selection_fn: Callable):
best_entry = self._db_default_table.get(doc_id=1)
best_score = best_entry["metrics"][self.primary_metric]["value"]
for entry in self._db_default_table:
current_score = entry["metrics"][self.primary_metric]["value"]
new_score = selection_fn(current_score, best_score)
if new_score != best_score:
best_entry = entry
best_score = new_score
return best_entry
def from_history(self, history: List[HistoryEntryType]):
"""Load the reporter with data from an entry of evaluations.
Args:
history: :obj:`List[HistoryPoint]` or :obj:`Tuple`. The sequence of
evaluations comprised of samples and metrics.
"""
for h in history:
self.log(h)
def from_database(self, database: Union[str, tinydb.TinyDB], table: str = None):
"""Load history from a database supplied as a path to a file or a
:obj:`tinydb.TinyDB` object.
Args:
database: :obj:`str` or :obj:`tinydb.TinyDB`. The database to load.
table: (optional) :obj:`str`. The table to load from the database.
This argument is not required if the database has only one table.
Raises:
:class:`ValueError`: if the database contains more than one table
and `table` is not given.
"""
if isinstance(database, str):
db = tinydb.TinyDB(database, sort_keys=True, indent=4, separators=(',', ': '))
elif isinstance(database, tinydb.TinyDB):
db = database
else:
raise TypeError("The database must be of type str or tinydb.TinyDB.")
if len(db.tables()) > 1 and table is None:
raise ValueError(
"Ambiguous database with multiple tables. "
"Specify a table name."
)
if table is None:
table = list(db.tables())[0]
self._db = db
self._db_default_table = self._db.table(table)
def to_history(self, table: str = None) -> List[HistoryPoint]:
"""Export the reporter logged history from a database table to an
optimiser-friendly history.
Args:
table: (optional) :obj:`str`. The name of the table to export.
Defaults to the one created during reporter initialisation.
Returns:
A list of :class:`HistoryPoint` objects which can be loaded into
an :class:`Optimiser` instance.
"""
history = []
if table is None:
default_table = self._db_default_table
else:
default_table = self._db.table(table)
for doc in default_table:
history.append(self._convert_doc_to_history(doc))
return history
@staticmethod
def _convert_history_to_doc(entry: HistoryPoint) -> Dict:
db_entry = {
"sample": entry.sample.as_dict(),
"metrics": {k: {
"value": v.value,
"variance": v.variance
} for k, v in entry.metrics.items()}
}
return db_entry
@staticmethod
def _convert_doc_to_history(document: Dict) -> HistoryPoint:
hist_point = HistoryPoint(
sample=Sample(document["sample"]),
metrics={k: EvaluationScore(v["value"], v["variance"])
for k, v in document["metrics"].items()}
)
return hist_point
================================================
FILE: hypertunity/reports/table.py
================================================
from typing import Any, List, Union
import beautifultable as bt
import numpy as np
import tinydb
from hypertunity import utils
from hypertunity.domain import Domain
from hypertunity.optimisation.base import HistoryPoint
from .base import Reporter
__all__ = [
"Table"
]
class Table(Reporter):
"""A :class:`Reporter` subclass to print and store a formatted table of
the results.
"""
def __init__(self, domain: Domain,
metrics: List[str],
primary_metric: str = "",
database_path: str = None):
"""Initialise the table reporter with domain and metrics.
Args:
domain: A :class:`Domain` from which all evaluated samples are drawn.
metrics: :obj:`List[str]` with names of the metrics used during evaluation.
primary_metric: (optional) :obj:`str` primary metric from `metrics`.
This is used to determine the best sample. Defaults to the first one.
database_path: (optional) :obj:`str` path to the database for
storing experiment history on disk. Defaults to in-memory storage.
"""
super(Table, self).__init__(
domain, metrics, primary_metric, database_path
)
self._table = bt.BeautifulTable()
self._table.set_style(bt.STYLE_SEPARATED)
dim_names = [".".join(dns) for dns in self.domain.flatten()]
self._table.column_headers = ["No.", *dim_names, *self.metrics]
def __str__(self):
"""Return the string representation of the table."""
return str(self._table)
@property
def data(self) -> np.array:
"""Return the table as a numpy array."""
return np.array(self._table)
def _log_history_point(self, entry: HistoryPoint, **kwargs: Any):
"""Create an entry for a :class:`HistoryPoint` in the table.
Args:
entry: :class:`HistoryPoint`. The history point to log. If given as
a tuple of :class:`Sample` instance and a mapping from metric
names to results, the variance of the evaluation noise can be
supplied by adding an entry in the dict with the metric name and
the suffix '_var'.
"""
id_ = len(self._table)
row = [id_ + 1,
*entry.sample.flatten().values(),
*entry.metrics.values()]
self._table.append_row(row)
@utils.support_american_spelling
def format(self, order: str = "none", emphasise: bool = False) -> str:
"""Format the table and return it as a string.
Supported formatting is sorting and emphasising of the best result.
Args:
order: (optional) :obj:`str`. The order of sorting by the primary
metric. Can be "none", "ascending" or "descending".
Defaults to "none".
emphasise: (optional) :obj:`bool`. Whether to emphasise the best
experiment by marking it in yellow and blinking if supported.
Defaults to `False`.
Returns:
:obj:`str` of the formatted table.
"""
table_copy = self._table.copy()
if order not in ["none", "descending", "ascending"]:
raise ValueError(
"`order` argument can only be 'ascending' or 'descending'."
)
if order != "none":
table_copy.sort(
key=self.primary_metric,
reverse=order == "descending"
)
if emphasise:
best_row_ind = int(np.argmax(
list(table_copy.get_column(self.primary_metric))
))
emphasised_best_row = map(
lambda x: f"\033[33;5;7m{x}\033[0m", table_copy[best_row_ind]
)
table_copy.update_row(best_row_ind, emphasised_best_row)
return str(table_copy)
def from_database(self, database: Union[str, tinydb.TinyDB], table: str = None):
"""Load history from a database supplied as a path to a file or a
:obj:`tinydb.TinyDB` object.
Args:
database: :obj:`str` or :obj:`tinydb.TinyDB`. The database to load.
table: (optional) :obj:`str`. The table to load from the database.
This argument is not required if the database has only one table.
Raises:
:class:`ValueError`: if the database contains more than one table
and `table` is not given.
"""
super(Table, self).from_database(database, table)
for doc in self._db_default_table:
history_point = self._convert_doc_to_history(doc)
self._log_history_point(history_point)
================================================
FILE: hypertunity/reports/tensorboard.py
================================================
import os
import sys
from typing import Any, Dict, List, Union
import tinydb
from hypertunity import utils
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import HistoryPoint
from .base import Reporter
try:
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
except ImportError as err:
raise ImportError("Install tensorflow>=1.14 and tensorboard>=1.14 "
"to support the HParams plugin.") from err
__all__ = [
"Tensorboard"
]
EAGER_MODE = tf.executing_eagerly()
session_builder = tf.compat.v1.Session
if str(tf.version.VERSION) < "2.":
summary_file_writer = tf.compat.v2.summary.create_file_writer
summary_scalar = tf.compat.v2.summary.scalar
else:
summary_file_writer = tf.summary.create_file_writer
summary_scalar = tf.summary.scalar
class Tensorboard(Reporter):
"""A :class:`Reporter` subclass to visualise the results in Tensorboard.
It utilises Tensorboard's HParams plugin as a dashboard for the summary of
the optimisation. This class prepares and creates entries with the scalar
data of the experiment trials, containing the domain sample and the
corresponding metrics.
Notes:
The user is responsible for launching TensorBoard in the browser.
"""
def __init__(self, domain: Domain, metrics: List[str], logdir: str,
primary_metric: str = "",
database_path: str = None):
"""Initialise the TensorBoard reporter.
Args:
domain: :class:`Domain`. The domain to which all evaluated samples belong.
metrics: :obj:`List[str]`. The names of the metrics.
logdir: :obj:`str`. Path to a folder for storing the Tensorboard events.
primary_metric: (optional) :obj:`str`. Primary metric from `metrics`.
This is used by the :py:meth:`format` method to determine the
sorting column and the best value. Default is the first one.
database_path: (optional) :obj:`str`. The path to the database for
storing experiment history on disk. Default is in-memory storage.
"""
super(Tensorboard, self).__init__(
domain, metrics, primary_metric, database_path
)
self._hparams_domain = self._convert_to_hparams_domain(self.domain)
if not os.path.exists(logdir):
os.makedirs(logdir)
self._logdir = logdir
self._experiment_counter = 0
self._set_up()
print(f"Run 'tensorboard --logdir={logdir}' to launch "
f"the visualisation in TensorBoard", file=sys.stderr)
@staticmethod
def _convert_to_hparams_domain(domain: Domain) -> Dict[str, hp.HParam]:
hparams = {}
for var_name, dim in domain.flatten().items():
dim_type = Domain.get_type(dim)
joined_name = utils.join_strings(var_name, join_char="/")
if dim_type == Domain.Continuous:
hp_dim_type = hp.RealInterval
vals = list(map(float, dim))
elif dim_type in [Domain.Discrete, Domain.Categorical]:
hp_dim_type = hp.Discrete
vals = (dim,)
else:
raise TypeError(
f"Cannot map subdomain of type {dim_type} "
f"to a known HParams domain."
)
hparams[joined_name] = hp.HParam(joined_name, hp_dim_type(*vals))
return hparams
def _convert_to_hparams_sample(self, sample: Sample) -> Dict[hp.HParam, Any]:
hparams = {}
for name, val in sample:
joined_name = utils.join_strings(name, join_char="/")
hparams[self._hparams_domain[joined_name]] = val
return hparams
def _set_up(self):
with summary_file_writer(self._logdir).as_default():
hp.hparams_config(
hparams=self._hparams_domain.values(),
metrics=[hp.Metric(m) for m in self.metrics])
@staticmethod
def _log_tf_eager_mode(params, metrics, full_experiment_dir):
"""Log in eager mode."""
with summary_file_writer(full_experiment_dir).as_default():
hp.hparams(params)
for metric_name, metric_value in metrics.items():
summary_scalar(metric_name, metric_value.value, step=1)
@staticmethod
def _log_tf_graph_mode(params, metrics, full_experiment_dir):
"""Log in legacy graph execution mode with session creation."""
with summary_file_writer(full_experiment_dir).as_default() as fw, session_builder() as sess:
sess.run(fw.init())
sess.run(hp.hparams(params))
for metric_name, metric_value in metrics.items():
sess.run(summary_scalar(metric_name, metric_value.value, step=1))
sess.run(fw.flush())
def _log_history_point(self, entry: HistoryPoint, experiment_dir: str = None):
"""Create an entry for a :class:`HistoryPoint` in Tensorboard.
Args:
entry: :class:`HistoryPoint`. The sample and evaluation metrics to log.
experiment_dir: (optional) :obj:`str`. The directory name where to
store all experiment related data. It will be prefixed by the
`logdir` path which is provided on initialisation of the
:class:`Tensorboard` object. Default is 'experiment_[number]'.
"""
converted = self._convert_to_hparams_sample(entry.sample)
if not experiment_dir:
experiment_dir = f"experiment_{str(self._experiment_counter)}"
self._experiment_counter += 1
full_experiment_dir = os.path.join(self._logdir, experiment_dir)
if EAGER_MODE:
self._log_tf_eager_mode(converted, entry.metrics, full_experiment_dir)
else:
self._log_tf_graph_mode(converted, entry.metrics, full_experiment_dir)
def from_database(self, database: Union[str, tinydb.TinyDB], table: str = None):
"""Load history from a database supplied as a path to a file or a
:obj:`tinydb.TinyDB` object.
Args:
database: :obj:`str` or :obj:`tinydb.TinyDB`. The database to load.
table: (optional) :obj:`str`. The table to load from the database.
This argument is not required if the database has only one table.
Raises:
:class:`ValueError`: if the database contains more than one table
and `table` is not given.
"""
super(Tensorboard, self).from_database(database, table)
for doc in self._db_default_table:
history_point = self._convert_doc_to_history(doc)
self._log_history_point(history_point)
================================================
FILE: hypertunity/reports/tests/__init__.py
================================================
================================================
FILE: hypertunity/reports/tests/conftest.py
================================================
import pytest
from hypertunity.domain import Domain
from hypertunity.optimisation.base import EvaluationScore, HistoryPoint
@pytest.fixture(scope="session")
def generated_history():
domain = Domain({
"x": [-5., 6.],
"y": {"sin", "sqr"},
"z": set(range(4))
}, seed=7)
n_samples = 10
history = [HistoryPoint(sample=domain.sample(),
metrics={"metric_1": EvaluationScore(float(i)),
"metric_2": EvaluationScore(i * 2.)})
for i in range(n_samples)]
if len(history) == 1:
history = history[0]
return history, domain
================================================
FILE: hypertunity/reports/tests/test_table.py
================================================
import os
import tempfile
from hypertunity.optimisation.base import EvaluationScore
from ..table import Table
def test_from_to_history(generated_history):
history, domain = generated_history
rep = Table(
domain,
metrics=["metric_1", "metric_2"],
primary_metric="metric_1"
)
rep.from_history(history)
data_history = [
[i + 1, *list(h.sample.flatten().values()), *list(h.metrics.values())]
for i, h in enumerate(history)
]
assert rep.data.tolist() == data_history
assert rep.to_history() == history
def test_from_tuple_and_history_point(generated_history):
history, domain = generated_history
hist_point = history[0]
rep = Table(
domain,
metrics=["metric_1", "metric_2"],
primary_metric="metric_1"
)
rep.log(hist_point)
sample = domain.sample()
rep.log((sample, {"metric_1": 1.0, "metric_2": 2.0, "metric_2_var": 3.0}))
assert rep.data.tolist() == [
[1, *list(hist_point.sample.flatten().values()),
*list(hist_point.metrics.values())],
[2, *list(sample.flatten().values()),
EvaluationScore(1.0), EvaluationScore(2.0, 3.0)]
]
def test_database_and_get_best(generated_history):
history, domain = generated_history
with tempfile.TemporaryDirectory() as db_dir:
rep = Table(
domain,
metrics=["metric_1", "metric_2"],
database_path=db_dir
)
best_meta, best_metrics, best_sample = {}, {}, {}
best_score = float("-inf")
for i, hp in enumerate(history):
rep.log(hp, meta={"id": i})
if hp.metrics["metric_1"].value > best_score:
best_meta = {"id": i}
best_metrics = {k: {"value": v.value, "variance": v.variance}
for k, v in hp.metrics.items()}
best_sample = hp.sample.as_dict()
best_score = hp.metrics["metric_1"].value
assert len(rep.database.table(rep.default_database_table)) == len(history)
best_entry = rep.get_best(criterion="max")
assert best_entry["meta"] == best_meta
assert best_entry["metrics"] == best_metrics
assert best_entry["sample"] == best_sample
rep2 = Table(domain, metrics=["metric_1", "metric_2"])
rep2.from_database(rep.database, table=rep.default_database_table)
rep3 = Table(domain, metrics=["metric_1", "metric_2"])
rep3.from_database(os.path.join(db_dir, "db.json"),
table=rep.default_database_table)
assert str(rep) == str(rep2) == str(rep3)
assert rep.get_best() == rep2.get_best() == rep3.get_best()
================================================
FILE: hypertunity/reports/tests/test_tensorboard.py
================================================
import os
import tempfile
from ..tensorboard import Tensorboard
def test_from_to_history(generated_history):
history, domain = generated_history
with tempfile.TemporaryDirectory() as tmp_dir:
rep = Tensorboard(
domain,
metrics=["metric_1", "metric_2"],
logdir=tmp_dir
)
rep.from_history(history)
assert len([dirname for dirname in os.listdir(tmp_dir)
if dirname.startswith("experiment_")]) == len(history)
for root, dirs, files in os.walk(tmp_dir):
assert all(map(lambda x: x.startswith("events.out.tfevents"), files))
assert rep.to_history() == history
def test_from_tuple_and_history_point(generated_history):
history, domain = generated_history
hist_point = history[0]
with tempfile.TemporaryDirectory() as tmp_dir:
rep = Tensorboard(
domain,
metrics=["metric_1", "metric_2"],
logdir=tmp_dir
)
rep.log(hist_point)
rep.log((domain.sample(),
{"metric_1": 1.0, "metric_2": 2.0, "metric_2_var": 3.0}))
assert len([dirname for dirname in os.listdir(tmp_dir)
if dirname.startswith("experiment_")]) == 2
for root, dirs, files in os.walk(tmp_dir):
assert all(map(lambda x: x.startswith("events.out.tfevents"), files))
================================================
FILE: hypertunity/scheduling/__init__.py
================================================
from .jobs import *
from .scheduler import *
================================================
FILE: hypertunity/scheduling/jobs.py
================================================
"""Definition of `Job` and `Result` classes used to encapsulate an experiment
and the corresponding outcomes.
"""
import enum
import importlib
import os
import pickle
import re
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass, field
from functools import partial
from typing import Any, Callable, Dict, List, Tuple, Union
__all__ = [
"Job",
"SlurmJob",
"Result"
]
# Global registries to control the job and result id assignment
_JOB_REGISTRY = set()
_RESULT_REGISTRY = set()
_ID_COUNTER = -1
def reset_registry():
"""Reset the global job and result registries.
Notes:
This function should be used with care as it will allow for jobs with
repeating IDs to be created. As a consequence, two or more
:class:`Result` objects might coexist end make the actual experiment
outcome ambiguous.
"""
global _ID_COUNTER
_JOB_REGISTRY.clear()
_RESULT_REGISTRY.clear()
_ID_COUNTER = -1
def generate_id():
"""Generate a new, unused integer job id."""
global _ID_COUNTER
_ID_COUNTER += 1
return _ID_COUNTER
def import_script(path):
"""Import a module or script by a given path.
Args:
path: :obj:`str`, can be either a module import of the form
[package.]*[module] if the outer most package is in the
`PYTHONPATH`, or a path to an arbitrary python script.
Returns:
The loaded python script as a module.
"""
try:
module = importlib.import_module(path)
except ModuleNotFoundError:
if not os.path.isfile(path):
raise FileNotFoundError(f"Cannot find script {path}.")
if not os.path.basename(path).endswith(".py"):
raise ValueError(
f"Expected a python script ending with *.py, "
f"found {os.path.basename(path)}.")
import_path = os.path.dirname(os.path.abspath(path))
sys.path.append(import_path)
module = importlib.import_module(
f"{os.path.basename(path).rstrip('.py')}",
package=f"{os.path.basename(import_path)}"
)
sys.path.pop()
return module
def run_command(cmd: List[str]) -> str:
"""Execute a command in the shell.
Args:
cmd: :obj:`List[str]`. The command with its arguments to execute.
Returns:
The standard output of the command.
Raises:
:obj:`OSError`: if the standard error stream is not empty.
"""
ps = subprocess.run(args=cmd, capture_output=True)
if ps.stderr:
raise OSError(f"Failed running {' '.join(cmd)} with error message: "
f"{ps.stderr.decode('utf-8')}.")
return ps.stdout.decode("utf-8")
def get_callable_from_script(script_path: str, func_name: str = "main") -> Callable:
"""Convert a module to a callable function and call the `main` function of
the module.
Args:
script_path: str, the file path to the python script to run. It can
either be given as a module i.e. in the [package.]*[module] form,
or as a path to a *.py file in case it is not added into the
PYTHONPATH environment variable.
func_name: str, the name of the function to run.
Returns:
The wrapper which calls a function from the script module.
Raises:
`AttributeError` if the script does not define a `func_name` function.
"""
def wrapper(*args):
module = import_script(script_path)
if not hasattr(module, func_name):
raise AttributeError(
f"Cannot find {func_name} function in {script_path}."
)
return getattr(module, func_name)(*args)
return wrapper
def run_script_with_args(binary: str, script_path: str, *args: Any, **kwargs: Any):
"""Run script using a binary and command line arguments.
Args:
binary: str, the binary to run the script with, e.g. 'python'.
script_path: str, the path to the script.
*args: Any, a collection of arguments which will be converted to string
and passed on to the run command.
**kwargs: Any, keyword arguments which will be converted to named script
arguments.
Returns:
The contents of the results, which the script is assumed to store,
given an output file path as an argument.
Raises:
FileNotFoundError if the script cannot be found.
Notes:
It assumes that the script will store the results on disk using the
path provided by the last of the command line arguments.
"""
if not os.path.isfile(script_path):
raise FileNotFoundError(f"Cannot find script {script_path}.")
with tempfile.TemporaryDirectory() as tmpdir:
output_file = os.path.join(tmpdir, "results.pkl")
args_as_str, kwargs_as_str = [], []
if args:
args_as_str.extend([*map(str, args), output_file])
if kwargs:
kwargs_as_str.extend([
str(item) for k_v in kwargs.items() for item in k_v
])
kwargs_as_str.extend(["--output_file", output_file])
run_command([binary, script_path, *args_as_str, *kwargs_as_str])
return fetch_result(output_file)
def fetch_result(output_file, n_trials: int = 5, waiting_time: float = 1.0) -> Any:
"""Load the output file.
Args:
output_file: str, a path to the output file.
n_trials: int, optional number of trials to load the file, afterwards a
None is returned.
waiting_time: float, time in seconds to wait before retrying to load
the file.
Returns:
The unpickled output file if found, else None.
"""
if output_file is None:
return None
for _ in range(n_trials):
if os.path.isfile(output_file):
break
time.sleep(waiting_time)
else:
return None
with open(output_file, 'rb') as fp:
return pickle.load(fp)
@dataclass(frozen=True)
class Job:
"""Default :class:`Job` class defining an experiment as a runnable task on
the local machine.
The job is defined by a callable function or a script task. In the case of
the former the `args` will be passed directly to it upon calling. Otherwise
either a module will be run as a scirpt with command line arguments or a
function, attribute of the module, will be called with the `args` as input.
In both cases a :class:`Result` object will be returned.
Attributes:
id: :obj:`int`. The job identifier. Must be unique.
args: :obj:`tuple` or :obj:`dict`. The arguments or keyword arguments
for the callable function or script.
task: :obj:`Callable` or :obj:`str`, a python function to run or a
file path to a python script.
"""
task: Union[Callable, str]
args: Union[Tuple, Dict] = ()
id: int = field(default_factory=generate_id)
meta: Any = None
# job related constants
_JOB_SCRIPT_FUNC_SEPARATOR = ":"
_JOB_DEFAULT_BINARY = "source"
_JOB_SCRIPT_FUNC_SEPARATION_REGEX = r"[^\w\/\.]+"
def __post_init__(self):
if not isinstance(self.task, (Callable, str)):
raise ValueError(
"Job's task must be either a callable function "
"or a path to a script."
)
if self.id in _JOB_REGISTRY:
raise ValueError(
f"Job with an ID {self.id} is already created. "
f"Reusing IDs is prohibited."
)
_JOB_REGISTRY.add(self.id)
def __hash__(self):
return hash(str(self.id))
def __call__(self, *args, **kwargs) -> 'Result':
all_args = args
all_kwargs = kwargs
if isinstance(self.args, Tuple):
all_args += self.args
else:
all_kwargs = dict(**kwargs, **self.args)
if isinstance(self.task, Callable):
runnable = self.task
else:
runnable = self._build_callable()
return Result(id=self.id, data=runnable(*all_args, **all_kwargs))
def _build_callable(self):
"""Create a function from a string task.
If the task is of the form /path/to/script.py::func_to_run, split the
path from the func and return a script.func_to_run callable.
If the task is of the form /path/to/script.py, then return a
python /path/to/script.py callable.
"""
if self._JOB_SCRIPT_FUNC_SEPARATOR in self.task:
# split the task string by the [:]+ marker
script_path, func_name = re.split(
self._JOB_SCRIPT_FUNC_SEPARATION_REGEX, self.task
)
assert script_path and func_name, \
f"Empty path {script_path} or function name {func_name}"
runnable = get_callable_from_script(script_path, func_name)
else:
binary = self._infer_binary()
runnable = partial(run_script_with_args, binary, self.task)
return runnable
def _infer_binary(self):
if isinstance(self.meta, dict) and "binary" in self.meta:
return self.meta["binary"]
if self.task.endswith(".py"):
return "python"
if self.task.endswith(".sh"):
return "bash"
return self._JOB_DEFAULT_BINARY
class SlurmJobState(enum.Enum):
"""Some of the most frequently encountered slurm job statuses."""
PENDING = 0
RUNNING = 1
COMPLETED = 2
FAILED = 3
CANCELLED = 4
UNKNOWN = 5
@classmethod
def from_string(cls, state: str):
if state == "running":
return cls.RUNNING
if state == "pending":
return cls.PENDING
if state == "completed":
return cls.COMPLETED
if state == "failed":
return cls.FAILED
if state == "cancelled":
return cls.CANCELLED
return cls.UNKNOWN
@dataclass(frozen=True)
class SlurmJob(Job):
"""A :class:`Job` subclass to schedule tasks on Slurm.
Runs an 'sbatch' command in the shell with the script.
Attributes:
output_file: (optional) :obj:`str`. Path to the file where the executed
script will dump the result file. If none is provided, a temporary
file will be created.
"""
output_file: str = None
# slurm shell commands
_SLURM_CMD_PUSH = ["sbatch"]
_SLURM_CMD_KILL = ["scancel"]
_SLURM_CMD_INFO = ["scontrol", "show", "job"]
# slurm script elements
_SLURM_SCRIPT_PREAMBLE = "#!/bin/bash"
_SLURM_SCRIPT_LINE_PREFIX = "#SBATCH"
_SLURM_SCRIPT_JOB_NAME = "--job-name"
_SLURM_SCRIPT_OUT_NAME = "--output"
_SLURM_SCRIPT_RESOURCES_MEM = "--mem"
_SLURM_SCRIPT_RESOURCES_TIME = "--time"
_SLURM_SCRIPT_RESOURCES_CPU = "--cpus-per-task"
_SLURM_SCRIPT_RESOURCES_GPU = "--gres"
# other macros
_SLURM_JOB_STATE_REGEX = r"JobState=(RUNNING|PENDING|COMPLETED|FAILED|CANCELLED)"
def __post_init__(self):
if not isinstance(self.task, str):
raise ValueError("Slurm job must be defined with a script to run.")
super(SlurmJob, self).__post_init__()
def __call__(self) -> 'Result':
res = self._execute_job()
return Result(id=self.id, data=res)
def _execute_job(self) -> Any:
with tempfile.NamedTemporaryFile(mode="w+t", suffix=".sh") as fp:
contents = self._create_slurm_script()
fp.writelines(contents)
fp.seek(0)
response = run_command(self._SLURM_CMD_PUSH + [f"{fp.name}"])
slurm_id = int(re.search(r"[\d]+", response).group())
while True:
slurm_status = self._query_job_status(slurm_id)
if slurm_status in [SlurmJobState.RUNNING, SlurmJobState.PENDING]:
time.sleep(1)
elif slurm_status in [SlurmJobState.CANCELLED, SlurmJobState.FAILED]:
return None
elif slurm_status == SlurmJobState.COMPLETED:
return fetch_result(self.output_file)
else:
raise RuntimeError(f"Unknown state of slurm job {slurm_id}.")
def _create_slurm_script(self) -> List[str]:
if not self.meta:
raise ValueError(f"Cannot infer slurm job parameters. "
f"Fill in meta dict in job {self.id}.")
else:
# Preamble, job name and output log filename definitions
content_lines = [
f"{self._SLURM_SCRIPT_PREAMBLE}\n",
f"{self._SLURM_SCRIPT_LINE_PREFIX} "
f"{self._SLURM_SCRIPT_JOB_NAME}=job_{self.id}\n",
f"{self._SLURM_SCRIPT_LINE_PREFIX} "
f"{self._SLURM_SCRIPT_OUT_NAME}=log_%j.txt\n"]
# Resources specification
n_cpus = int(self.meta.get("resources", {}).get("cpu", 1))
if n_cpus >= 1:
content_lines.append(
f"{self._SLURM_SCRIPT_LINE_PREFIX} "
f"{self._SLURM_SCRIPT_RESOURCES_CPU}={n_cpus}\n"
)
gpus = str(self.meta.get("resources", {}).get("gpu", ""))
if gpus:
if gpus.isnumeric():
gpus = f"gpu:{gpus}"
content_lines.append(
f"{self._SLURM_SCRIPT_LINE_PREFIX} "
f"{self._SLURM_SCRIPT_RESOURCES_GPU}={gpus}\n"
)
mem = str(self.meta.get("resources", {}).get("memory", ""))
if mem:
content_lines.append(
f"{self._SLURM_SCRIPT_LINE_PREFIX} "
f"{self._SLURM_SCRIPT_RESOURCES_MEM}={mem}\n"
)
limit_time = str(self.meta.get("resources", {}).get("time", ""))
if limit_time:
content_lines.append(
f"{self._SLURM_SCRIPT_LINE_PREFIX} "
f"{self._SLURM_SCRIPT_RESOURCES_TIME}={limit_time}\n"
)
# Task specification
binary = self.meta.get("binary", "python")
if isinstance(self.args, Tuple):
# build positional arguments
script_args = ' '.join([*map(str, self.args), self.output_file])
else:
# build named arguments
script_args = ' '.join([
*(str(item)
for key_val in self.args.items()
for item in key_val),
"--output_file", self.output_file
])
content_lines.append(f"{binary} {self.task} {script_args}")
return content_lines
def _query_job_status(self, slurm_id: int) -> SlurmJobState:
response = run_command(self._SLURM_CMD_INFO + [str(slurm_id)])
job_state = re.search(self._SLURM_JOB_STATE_REGEX, response)
if job_state is not None:
job_state = job_state.group(1).lower()
return SlurmJobState.from_string(job_state)
@dataclass(frozen=True)
class Result:
"""A :class:`Result` class to store the output of the executed :class:`Job`.
It shares the same id as the job which generated it.
Attributes:
id: :obj:`int`. The identifier of the `Result` object which corresponds
to the job that has been run.
data: :obj:`Any`. The output data of the job.
"""
data: Any
id: int
def __post_init__(self):
if self.id in _RESULT_REGISTRY:
raise ValueError(
f"Result with an ID {self.id} is already created. "
f"Reusing IDs is prohibited."
)
_RESULT_REGISTRY.add(self.id)
================================================
FILE: hypertunity/scheduling/scheduler.py
================================================
"""A scheduler for running jobs locally in a parallel manner using joblib as
a backend.
"""
import multiprocessing as mp
import time
from typing import List
import joblib
from hypertunity import utils
from .jobs import Job, Result
__all__ = [
"Scheduler"
]
class Scheduler:
"""A manager for parallel execution of jobs.
A job must be of type :class:`Job` which produces a :class:`Result`
object upon successful completion. The scheduler maintains a job and
result queues.
Notes:
This class should be used as a context manager.
"""
def __init__(self, n_parallel: int = None):
"""Setup the job and results queues.
Args:
n_parallel: (optional) :obj:`int`. The number of jobs that can be
run in parallel. Defaults to `None` in which case all but one
available CPUs will be used.
"""
self._job_queue = mp.Queue()
self._result_queue = mp.Queue()
self._is_queue_closed = False
if n_parallel is None:
self.n_parallel = -2 # using all CPUs but 1
else:
self.n_parallel = max(n_parallel, 1)
self._servant = mp.Process(target=self._run_servant)
self._interrupt_event = mp.Event()
self._servant.start()
def __del__(self):
"""Clean up subprocesses on object deletion.
Close the queues and join all subprocesses before the object is deleted.
"""
if not self._is_queue_closed:
self.exit()
if self._servant.is_alive():
self._servant.terminate()
def __enter__(self):
"""Enter the context manager."""
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""Exit the context manager."""
self.exit()
def _run_servant(self):
"""Run the pool of workers on the dispatched jobs, fetched from the job
queue and collect the results into the result queue.
Notes:
The runner will take as long as all jobs from the job queue finish
before any results are written to the result queue.
"""
# TODO: Switch backend back to default "loky", after the leakage
# of semaphores is fixed
with joblib.Parallel(n_jobs=self.n_parallel,
backend="multiprocessing") as parallel:
while not self._interrupt_event.is_set():
current_jobs = utils.drain_queue(self._job_queue)
if not current_jobs:
continue
# the order of the results corresponds to the that of the jobs
# and the IDs don't need to be shuffled.
ids = [job.id for job in current_jobs]
# TODO: in a future version of joblib, this could be a generator
# and then the inputs would be stored immediately in the results
# queue. Be ready to update whenever this PR gets merged:
# https://github.com/joblib/joblib/pull/588
results = parallel(joblib.delayed(job)() for job in current_jobs)
assert len(ids) == len(results)
for res in results:
self._result_queue.put_nowait(res)
def dispatch(self, jobs: List[Job]):
"""Dispatch the jobs for parallel execution.
This method is non-blocking.
Args:
jobs: :obj:`List[Job]`. A list of jobs to run whenever resources
are available.
Notes:
Although the jobs are scheduled to run immediately, the actual
execution may take place after indefinite delay if the job runner
is occupied with older jobs.
"""
for job in jobs:
self._job_queue.put_nowait(job)
def collect(self, n_results: int, timeout: float = None) -> List[Result]:
"""Collect all the available results or wait until they become available.
Args:
n_results: :obj:`int`, number of results to wait for.
If `n_results` ≤ 0 then all available results will be returned.
timeout: (optional) :obj:`float`, number of seconds to wait for
results to appear. If None (default) then it will wait until
all `n_results` are collected.
Returns:
A list of :class:`Result` objects with length `n_results` at least.
Notes:
If `n_results` is overestimated and timeout is None, then this
method will hang forever. Therefore it is recommended that a timeout
is set.
Raises:
:obj:`TimeoutError`: if more than `timeout` seconds elapse before a
:class:`Result` is collected.
"""
if n_results > 0:
results = []
for i in range(n_results):
results.append(self._result_queue.get(block=True, timeout=timeout))
else:
results = utils.drain_queue(self._result_queue)
return results
def interrupt(self):
"""Interrupt the scheduler and all running jobs."""
self._interrupt_event.set()
def exit(self):
"""Exit the scheduler by closing the queues and terminating the
servant process.
"""
if not self._is_queue_closed:
utils.drain_queue(self._job_queue, close_queue=True)
self._job_queue.join_thread()
utils.drain_queue(self._result_queue, close_queue=True)
self._result_queue.join_thread()
self._is_queue_closed = True
self.interrupt()
# wait a bit for the subprocess to exit gracefully
n_retries = 3
while self._servant.is_alive() and n_retries > 0:
n_retries -= 1
time.sleep(0.05)
self._servant.terminate()
================================================
FILE: hypertunity/scheduling/tests/__init__.py
================================================
================================================
FILE: hypertunity/scheduling/tests/script.py
================================================
import argparse
import os
import pickle
import sys
class DoNotReplaceAction(argparse.Action):
def __call__(self, parser, namespace, values, option_string=None):
if getattr(namespace, self.dest) is None:
setattr(namespace, self.dest, values)
def parse_args(args):
parser = argparse.ArgumentParser()
parser.add_argument("x", nargs='?', type=int, action=DoNotReplaceAction)
parser.add_argument("--x", type=int)
parser.add_argument("y", nargs='?', type=float, action=DoNotReplaceAction)
parser.add_argument("--y", type=float)
parser.add_argument("z", nargs='?', type=str, action=DoNotReplaceAction)
parser.add_argument("--z", type=str)
parser.add_argument("output_file", nargs='?', type=str, action=DoNotReplaceAction)
parser.add_argument("--output_file", type=str)
return parser.parse_args(args)
def main(x: int, y: float, z: str) -> float:
if z.endswith(tuple("0123456789")):
return y * x
return y * x**2
if __name__ == '__main__':
parsed_args = parse_args(sys.argv[1:])
result = main(parsed_args.x, parsed_args.y, parsed_args.z)
print(result)
output_dir = os.path.dirname(parsed_args.output_file)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
with open(parsed_args.output_file, 'wb') as fp:
pickle.dump(result, fp)
================================================
FILE: hypertunity/scheduling/tests/test_jobs.py
================================================
import pytest
from ..jobs import Job
def test_repeating_id():
_ = Job(task=sum, args=(), id=-100)
with pytest.raises(ValueError):
_ = Job(task=max, args=(), id=-100)
_ = Job(task=sum, args=(), id=-99)
def test_callable_job():
job_args = (131212, 123123123)
job = Job(task=lambda x, y: x + y, args=job_args)
result = job()
assert result.data == sum(job_args)
================================================
FILE: hypertunity/scheduling/tests/test_scheduler.py
================================================
import os
import tempfile
import pytest
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation import base
from ..jobs import Job, SlurmJob
from ..scheduler import Scheduler
from . import script
@pytest.fixture(scope="module")
def shared_slurm_tmp_dir():
return "/tmp"
def square(sample: Sample) -> base.EvaluationScore:
return base.EvaluationScore(sample["x"]**2)
def run_jobs(jobs):
with Scheduler(n_parallel=2) as scheduler:
scheduler.dispatch(jobs)
results = scheduler.collect(n_results=len(jobs), timeout=60.0)
assert len(results) == len(jobs)
assert all([r.id == j.id for r, j in zip(results, jobs)])
return results
@pytest.mark.timeout(10.0)
def test_local_from_script_and_function():
domain = Domain({
"x": {0, 1, 2, 3},
"y": [-1., 1.],
"z": {"123", "abc"}
}, seed=7)
jobs = [Job(task="hypertunity/scheduling/tests/script.py::main",
args=(*domain.sample().as_namedtuple(),)) for _ in range(10)]
results = run_jobs(jobs)
assert all([r.data == script.main(*j.args) for r, j in zip(results, jobs)])
@pytest.mark.timeout(10.0)
def test_local_from_script_and_cmdline_args():
domain = Domain({
"x": {0, 1, 2, 3},
"y": [-1., 1.],
"z": {"123", "abc"}
}, seed=7)
jobs = [Job(task="hypertunity/scheduling/tests/script.py",
args=(*domain.sample().as_namedtuple(),),
meta={"binary": "python"}) for _ in range(10)]
results = run_jobs(jobs)
assert all([r.data == script.main(*j.args) for r, j in zip(results, jobs)])
@pytest.mark.timeout(10.0)
def test_local_from_script_and_cmdline_named_args():
domain = Domain({
"--x": {0, 1, 2, 3},
"--y": [-1., 1.],
"--z": {"acb123", "abc"}
}, seed=7)
jobs = [Job(task="hypertunity/scheduling/tests/script.py",
args=domain.sample().as_dict(),
meta={"binary": "python"}) for _ in range(10)]
results = run_jobs(jobs)
assert all([
r.data == script.main(**{k.lstrip("-"): v for k, v in j.args.items()})
for r, j in zip(results, jobs)
])
@pytest.mark.timeout(10.0)
def test_local_from_fn():
domain = Domain({"x": [0., 1.]}, seed=7)
jobs = [Job(task=square, args=(domain.sample(),)) for _ in range(10)]
results = run_jobs(jobs)
assert all([r.data.value == square(*j.args).value
for r, j in zip(results, jobs)])
@pytest.mark.slurm
@pytest.mark.timeout(60.0)
def test_slurm_from_script(shared_slurm_tmp_dir):
domain = Domain({
"x": {0, 1, 2, 3},
"y": [-1., 1.],
"z": {"123", "abc"}
}, seed=7)
jobs, dirs = [], []
n_jobs = 4
for i in range(n_jobs):
sample = domain.sample()
dirs.append(tempfile.TemporaryDirectory(dir=shared_slurm_tmp_dir))
jobs.append(SlurmJob(
task="hypertunity/scheduling/tests/script.py",
args=(*sample.as_namedtuple(),),
output_file=f"{os.path.join(dirs[-1].name, 'results.pkl')}",
meta={"binary": "python", "resources": {"cpu": 1}}
))
results = run_jobs(jobs)
assert all([r.data == script.main(*j.args) for r, j in zip(results, jobs)])
# clean-up the temporary dirs
for d in dirs:
d.cleanup()
================================================
FILE: hypertunity/tests/__init__.py
================================================
================================================
FILE: hypertunity/tests/test_domain.py
================================================
from collections import namedtuple
import pytest
from hypertunity.domain import (
Domain,
DomainNotIterableError,
DomainSpecificationError,
Sample
)
@pytest.mark.parametrize("domain,expectation", [
({1: {"b": [2, 3]}, "c": [0, 0.1]},
pytest.raises(DomainSpecificationError)),
({"a": {"b": (1, 2, 3, 4)}, "c": [0, 0.1]},
pytest.raises(DomainSpecificationError)),
({"a": {"b": lambda x: x}, "c": [0, 0.1]},
pytest.raises(DomainSpecificationError)),
# this one should fail from the ast.literal_eval parsing
('{"a": {"b": lambda x: x}, "c": [0, 0.1]}',
pytest.raises(ValueError))
])
def test_invalid_domain(domain, expectation):
with expectation:
Domain(domain)
@pytest.mark.parametrize("domain", [
{"a": {"b": {0, 1}}, "c": [0, 0.1]},
'{"a": {"b": {0, 1}}, "c": [0, 0.1]}'
])
def test_valid_domain(domain):
Domain(domain)
def test_eq():
d1 = Domain({"a": {"b": [2, 3]}, "c": [0, 0.1]})
d2 = Domain({"a": {"b": [2, 3]}, "c": [0, 0.1]})
assert d1 == d2
def test_flatten():
dom = Domain({"a": {"b": [0, 1]}, "c": {0, 0.1}})
assert dom.flatten() == {("a", "b"): [0, 1], ("c",): {0, 0.1}}
def test_addition():
domain_all = Domain({
"a": [1, 2],
"b": {"c": {1, 2, 3}, "d": {"o1", "o2"}},
"e": {3, 4, 5}
})
domain_1 = Domain({"a": [1, 2], "b": {"c": {1, 2, 3}}})
domain_2 = Domain({"b": {"d": {"o1", "o2"}}})
domain_3 = Domain({"e": {3, 4, 5}})
assert domain_1 + domain_2 + domain_3 == domain_all
with pytest.raises(ValueError):
_ = domain_1 + domain_1
def test_serialisation():
domain = Domain({"a": [1, 2], "b": {"c": {1, 2, 3}, "d": {"o1", "o2"}}})
serialised = domain.serialise()
deserialised = Domain.deserialise(serialised)
assert deserialised == domain
def test_as_dict():
dict_domain = {"a": {"b": [2, 3]}, "c": [0, 0.1]}
domain = Domain(dict_domain)
assert domain.as_dict() == dict_domain
def test_as_namedtuple():
domain = Domain({"a": {"b": {2, 3, 4}}, "c": [0, 0.1]})
nt = domain.as_namedtuple()
assert nt.a == namedtuple("_", "b")({2, 3, 4})
assert nt.a.b == {2, 3, 4}
assert nt.c == [0, 0.1]
def test_from_list():
lst = [
(("a", "b"), {2, 3, 4}),
(("c",), {0, 0.1}),
(("d", "e", "f"), {0, 1}),
(("d", "g"), {2, 3})
]
domain_true = Domain({
"a": {"b": {2, 3, 4}},
"c": {0, 0.1},
"d": {"e": {"f": {0, 1}}, "g": {2, 3}}
})
domain_from_list = Domain.from_list(lst)
assert domain_true == domain_from_list
assert lst == list(domain_true.flatten().items())
def test_fail_iter_cont_domain():
with pytest.raises(DomainNotIterableError):
list(iter(Domain({"a": {"b": {2, 3, 4}}, "c": [0, 0.1]})))
def test_iter():
discrete_domain = Domain({
"a": {"b": {2, 3, 4}, "j": {"d": {5, 6}, "f": {"g": {7}}}},
"c": {"op1", 0.1}
})
all_samples = set(iter(discrete_domain))
assert all_samples == {
Sample({'a': {'b': 2, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 'op1'}),
Sample({'a': {'b': 3, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 'op1'}),
Sample({'a': {'b': 4, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 'op1'}),
Sample({'a': {'b': 2, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 'op1'}),
Sample({'a': {'b': 3, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 'op1'}),
Sample({'a': {'b': 4, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 'op1'}),
Sample({'a': {'b': 2, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 0.1}),
Sample({'a': {'b': 3, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 0.1}),
Sample({'a': {'b': 4, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 0.1}),
Sample({'a': {'b': 2, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 0.1}),
Sample({'a': {'b': 3, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 0.1}),
Sample({'a': {'b': 4, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 0.1})
}
def test_sampling():
domain = Domain({"a": {"b": {2, 3, 4}}, "c": [0, 0.1]})
for i in range(10):
sample = domain.sample()
assert sample["a"]["b"] in {2, 3, 4} and 0. <= sample["c"] <= 0.1
def test_split_by_type():
domain = Domain({"x": [1, 2], "y": {-3, 2, 5}, "z": {"small", 1, 0.1}})
discr, cat, cont = domain.split_by_type()
assert sum(domain.split_by_type(), Domain({})) == domain
assert discr == Domain({"y": {-3, 2, 5}})
assert cat == Domain({"z": {"small", 1, 0.1}})
assert cont == Domain({"x": [1, 2]})
================================================
FILE: hypertunity/tests/test_trial.py
================================================
import pytest
from hypertunity import Domain, Trial
from hypertunity.optimisation import RandomSearch
from hypertunity.reports import Table
from hypertunity.scheduling import Job
from hypertunity.scheduling.tests.test_scheduler import run_jobs
def foo(x, y, z):
return x**2 + y**2 - z**3
@pytest.mark.timeout(60.0)
def test_trial_with_callable():
domain = Domain({"x": [-1., 1.], "y": [-2, 2], "z": {1, 2, 3, 4}})
trial = Trial(objective=foo, domain=domain,
optimiser="random_search",
database_path=None,
seed=7, metrics=["score"])
n_steps = 10
batch_size = 2
trial.run(n_steps, batch_size=batch_size, n_parallel=batch_size)
rs = RandomSearch(domain=domain, seed=7)
rep = Table(domain, metrics=["score"])
for i in range(n_steps):
samples = rs.run_step(batch_size=batch_size, minimise=False)
results = [foo(*s.as_namedtuple(), ) for s in samples]
for sample_eval in zip(samples, results):
rep.log(sample_eval)
assert len(trial.optimiser.history) == n_steps * batch_size
assert str(rep.format(order="ascending")) == str(
trial.reporter.format(order="ascending")
)
@pytest.mark.timeout(60.0)
def test_trial_with_script():
domain = Domain({
"--x": {0, 1, 2, 3},
"--y": [-1., 1.],
"--z": {"123", "abc"}
})
trial = Trial(objective="hypertunity/scheduling/tests/script.py",
domain=domain,
optimiser="random_search",
database_path=None,
seed=7, metrics=["score"])
batch_size = 4
trial.run(n_steps=1, batch_size=batch_size, n_parallel=batch_size)
rs = RandomSearch(domain=domain, seed=7)
samples = rs.run_step(batch_size=batch_size)
jobs = [Job(task="hypertunity/scheduling/tests/script.py",
args=s.as_dict(),
meta={"binary": "python"}) for s in samples]
results = [r.data for r in run_jobs(jobs)]
assert results == [h.metrics["score"].value
for h in trial.optimiser.history]
================================================
FILE: hypertunity/tests/test_utils.py
================================================
import queue
import pytest
from .. import utils
try:
from contextlib import nullcontext
except ImportError:
from contextlib import contextmanager
@contextmanager
def nullcontext():
yield
def test_support_american_spelling():
@utils.support_american_spelling
def gb_spelling_func(minimise, optimise, maximise):
return minimise, optimise, maximise
expected = (True, 1, None)
assert gb_spelling_func(minimise=True, optimise=1, maximise=None) == expected
assert gb_spelling_func(minimize=True, optimize=1, maximize=None) == expected
@pytest.mark.parametrize("test_input,expectation", [
(("vxc", "", "", "___"), nullcontext()),
(("_", "_", ""), nullcontext()),
(("asd",), nullcontext()),
(("asd", "dxcv"), nullcontext()),
(("asd", "\\", "\n"), pytest.raises(ValueError))
])
def test_split_and_join_strings(test_input, expectation):
with expectation:
assert test_input == utils.split_string(
utils.join_strings(test_input, join_char="_"),
split_char="_"
)
def test_drain_queue():
q = queue.Queue(10)
elems = list(range(10))
for i in elems:
q.put(i)
items = utils.drain_queue(q)
assert items == elems
with pytest.raises(queue.Empty):
q.get_nowait()
================================================
FILE: hypertunity/trial.py
================================================
"""A wrapper class for conducting multiple experiments, scheduling jobs and
saving results.
"""
from typing import Callable, Type, Union
from hypertunity import optimisation, reports, utils
from hypertunity.domain import Domain
from hypertunity.optimisation import Optimiser
from hypertunity.reports import Reporter
from hypertunity.scheduling import Job, Scheduler, SlurmJob
__all__ = [
"Trial"
]
OptimiserTypes = Union[str, Type[Optimiser], Optimiser]
ReporterTypes = Union[str, Type[Reporter], Reporter]
class Trial:
"""High-level API class for running hyperparameter optimisation.
This class encapsulates optimiser querying, job building, scheduling and
results collection as well as checkpointing and report generation.
"""
@utils.support_american_spelling
def __init__(self, objective: Union[Callable, str],
domain: Domain,
optimiser: OptimiserTypes = "bo",
reporter: ReporterTypes = "table",
device: str = "local",
**kwargs):
"""Initialise the :class:`Trial` experiment manager.
Args:
objective: :obj:`Callable` or :obj:`str`. The objective function or
script to run.
domain: :class:`Domain`. The optimisation domain of the objective
function.
optimiser: :class:`Optimiser` or :obj:`str`. The optimiser method
for domain sampling.
reporter: :class:`Reporter` or :obj:`str`. The reporting method for
the results.
device: :obj:`str`. The host device running the evaluations. Can be
'local' or 'slurm'.
**kwargs: additional parameters for the optimiser, reporter and
scheduler.
Keyword Args:
timeout: :obj:`float`. The number of seconds to wait for a
:class:`Job` instance to finish. Default is 259200 seconds,
or approximately 3 days.
"""
self.objective = objective
self.domain = domain
self.optimiser = self._init_optimiser(optimiser, **kwargs)
self.reporter = self._init_reporter(reporter, **kwargs)
self.scheduler = Scheduler
# 259200 is the number of seconds contained in 3 days
self._timeout = kwargs.get("timeout", 259200.0)
self._job = self._init_job(device)
def _init_optimiser(self, optimiser: OptimiserTypes, **kwargs) -> Optimiser:
if isinstance(optimiser, str):
optimiser_class = get_optimiser(optimiser)
elif issubclass(optimiser, Optimiser):
optimiser_class = optimiser
elif isinstance(optimiser, Optimiser):
return optimiser
else:
raise TypeError(
"An optimiser must be a either a string, "
"an Optimiser type or an Optimiser instance."
)
opt_kwargs = {}
if "seed" in kwargs:
opt_kwargs["seed"] = kwargs["seed"]
return optimiser_class(self.domain, **opt_kwargs)
def _init_reporter(self, reporter: ReporterTypes, **kwargs) -> Reporter:
if isinstance(reporter, str):
reporter_class = get_reporter(reporter)
elif issubclass(reporter, Reporter):
reporter_class = reporter
elif isinstance(reporter, Reporter):
return reporter
else:
raise TypeError("A reporter must be either a string, "
"a Reporter type or a Reporter instance.")
rep_kwargs = {"metrics": kwargs.get("metrics", ["score"]),
"database_path": kwargs.get("database_path", ".")}
if not issubclass(reporter_class, reports.Table):
rep_kwargs["logdir"] = kwargs.get("logdir", "tensorboard/")
return reporter_class(self.domain, **rep_kwargs)
@staticmethod
def _init_job(device: str) -> Type[Job]:
device = device.lower()
if device == "local":
return Job
if device == "slurm":
return SlurmJob
raise ValueError(
f"Unknown device {device}. Select one from {{'local', 'slurm'}}."
)
def run(self, n_steps: int, n_parallel: int = 1, **kwargs):
"""Run the optimisation and objective function evaluation.
Args:
n_steps: :obj:`int`. The total number of optimisation steps.
n_parallel: (optional) :obj:`int`. The number of jobs that can be
scheduled at once.
**kwargs: additional keyword arguments for the optimisation,
supplied to the :py:meth:`run_step` method of the
:class:`Optimiser` instance.
Keyword Args:
batch_size: (optional) :obj:`int`. The number of samples that are
suggested at once. Default is 1.
minimise: (optional) :obj:`bool`. If the optimiser is
:class:`BayesianOptimisation` then this flag tells whether the
objective function is being minimised or maximised. Otherwise
it has no effect. Default is `False`.
"""
batch_size = kwargs.get("batch_size", 1)
n_parallel = min(n_parallel, batch_size)
with self.scheduler(n_parallel=n_parallel) as scheduler:
for i in range(n_steps):
samples = self.optimiser.run_step(
batch_size=batch_size,
minimise=kwargs.get("minimise", False)
)
jobs = [
self._job(task=self.objective, args=s.as_dict())
for s in samples
]
scheduler.dispatch(jobs)
evaluations = [
r.data for r in scheduler.collect(
n_results=batch_size, timeout=self._timeout
)
]
self.optimiser.update(samples, evaluations)
for s, e, j in zip(samples, evaluations, jobs):
self.reporter.log((s, e), meta={"job_id": j.id})
def get_optimiser(name: str) -> Type[Optimiser]:
name = name.lower()
if name.startswith(("bayes", "bo")):
return optimisation.BayesianOptimisation
if name.startswith("random"):
return optimisation.RandomSearch
if name.startswith(("grid", "exhaustive")):
return optimisation.GridSearch
raise ValueError(
f"Unknown optimiser {name}. Select one from "
f"{{'bayesian_optimisation', 'random_search', 'grid_search'}}."
)
def get_reporter(name: str) -> Type[Reporter]:
name = name.lower()
if name.startswith("table"):
return reports.Table
if name.startswith(("tensor", "tb")):
import reports.tensorboard as tb
return tb.Tensorboard
raise ValueError(
f"Unknown reporter {name}. Select one from {{'table', 'tensorboard'}}."
)
================================================
FILE: hypertunity/utils.py
================================================
import queue
from functools import wraps
GB_US_SPELLING = {
"minimise": "minimize",
"maximise": "maximize",
"optimise": "optimize",
"optimiser": "optimizer",
"emphasise": "emphasize"
}
US_GB_SPELLING = {us: gb for gb, us in GB_US_SPELLING.items()}
def support_american_spelling(func):
"""Convert American spelling keyword arguments to British
(default for hypertunity).
Args:
func: a Python callable to decorate.
Returns:
The decorated function which supports American keyword arguments.
"""
# using functools.wraps(func) enables automated documentation generation
# for more information see: https://github.com/sphinx-doc/sphinx/issues/3783
@wraps(func)
def british_spelling_func(*args, **kwargs):
gb_kwargs = {US_GB_SPELLING.get(kw, kw): val
for kw, val in kwargs.items()}
return func(*args, **gb_kwargs)
return british_spelling_func
def join_strings(strings, join_char="_"):
"""Join list of strings with an underscore.
The strings must contain string.printable characters only, otherwise an
exception is raised. If one of the strings has already an underscore, it
will be replace by a null character.
Args:
strings: iterable of strings.
join_char: str, the character to join with.
Returns:
The joined string with an underscore character.
Examples:
```python
>>> join_strings(['asd', '', '_xcv__'])
'asd__\x00xcv\x00\x00'
```
Raises:
ValueError if a string contains an unprintable character.
"""
all_cleaned = []
for s in strings:
if not s.isprintable():
raise ValueError(
"Encountered unexpected name containing non-printable characters."
)
all_cleaned.append(s.replace(join_char, "\0"))
return join_char.join(all_cleaned)
def split_string(joined, split_char="_"):
"""Split joined string and substitute back the null characters with an
underscore if necessary.
Inverse function of `join_strings(strings)`.
Args:
joined: str, the joined representation of the substrings.
split_char: str, the character to split by.
Returns:
A tuple of strings with the splitting character (underscore) removed.
Examples:
```python
>>> split_string('asd__\x00xcv\x00\x00')
('asd', '', '_xcv__')
```
"""
strings = joined.split(split_char)
strings_copy = []
for s in strings:
strings_copy.append(s.replace("\0", split_char))
return tuple(strings_copy)
def drain_queue(q, close_queue=False):
"""Get all items from a queue until an `Empty` exception is raised.
Args:
q: `Queue`, the queue to drain.
close_queue: bool, whether to close the queue, such that no other
object can be put in. Default is False.
Returns:
List of all items from the queue.
"""
items = []
while True:
try:
it = q.get_nowait()
except queue.Empty:
break
items.append(it)
if close_queue:
q.close()
return items
================================================
FILE: setup.py
================================================
import re
from setuptools import setup, find_packages
with open("hypertunity/__init__.py", "r", encoding="utf8") as f:
version = re.search(r"__version__ = [\'\"](.*?)[\'\"]", f.read()).group(1)
with open("README.md", "r", encoding="utf8") as f:
readme = f.read()
required_packages = [
"beautifultable>=0.7.0",
"dataclasses;python_version<'3.7'",
"gpy>=1.9.8",
"gpyopt==1.2.5",
"joblib>=0.13.2",
"matplotlib>=3.0",
"numpy>=1.16",
"tinydb>=3.13.0"
]
extras = {
"tensorboard": ["tensorflow>=1.14.0", "tensorboard>=1.14.0"],
"tests": ["pytest>=4.6.3", "pytest-timeout>=1.3.3"],
"docs": ["sphinx>=2.2.0", "sphinx_rtd_theme>=0.4.3"]
}
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"Intended Audience :: Education",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Topic :: Software Development :: Libraries",
"Topic :: Software Development :: Libraries :: Python Modules"
]
setup(
name="hypertunity",
version=version,
author="Georgi Dikov",
author_email="gvdikov@gmail.com",
url="https://github.com/gdikov/hypertunity",
description="A toolset for distributed black-box hyperparameter optimisation.",
long_description=readme,
long_description_content_type='text/markdown',
packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
python_requires=">=3.6",
install_requires=required_packages,
extras_require=extras,
classifiers=classifiers
)