Repository: gdikov/hypertunity
Branch: master
Commit: 768b26137f36
Files: 54
Total size: 170.1 KB

Directory structure:
gitextract_y6a8vwvu/

├── .circleci/
│   └── config.yml
├── .gitignore
├── .readthedocs.yml
├── CHANGELOG.md
├── LICENSE
├── README.md
├── conftest.py
├── docs/
│   ├── Makefile
│   ├── conf.py
│   ├── index.rst
│   ├── manual/
│   │   ├── domain.rst
│   │   ├── installation.rst
│   │   ├── optimisation.rst
│   │   ├── quickstart.rst
│   │   ├── reports.rst
│   │   └── scheduling.rst
│   └── source/
│       ├── hypertunity.rst
│       ├── optimisation.rst
│       ├── reports.rst
│       └── scheduling.rst
├── hypertunity/
│   ├── __init__.py
│   ├── domain.py
│   ├── optimisation/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── bo.py
│   │   ├── exhaustive.py
│   │   ├── random.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── _common.py
│   │       ├── test_bo.py
│   │       ├── test_exhaustive.py
│   │       └── test_random.py
│   ├── reports/
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── table.py
│   │   ├── tensorboard.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── conftest.py
│   │       ├── test_table.py
│   │       └── test_tensorboard.py
│   ├── scheduling/
│   │   ├── __init__.py
│   │   ├── jobs.py
│   │   ├── scheduler.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       ├── script.py
│   │       ├── test_jobs.py
│   │       └── test_scheduler.py
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── test_domain.py
│   │   ├── test_trial.py
│   │   └── test_utils.py
│   ├── trial.py
│   └── utils.py
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .circleci/config.yml
================================================
# Python CircleCI 2.0 configuration file
version: 2
jobs:
  build:
    docker:
      - image: circleci/python:3.7.3

    working_directory: ~/repo

    steps:
      - checkout

      - restore_cache:
          keys:
          - env-build

      - run:
          name: setup env
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install --upgrade pip
            pip install ./[tensorboard,tests,docs]
      - save_cache:
          paths:
            - ./venv
          key: env-build

      - run:
          name: run tests
          command: |
            . venv/bin/activate
            py.test --verbose --runslow hypertunity
      - store_artifacts:
          path: test-reports
          destination: test-reports


================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit tests / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Jupyter Notebook
.ipynb_checkpoints

# Environments
.venv*

# Pycharm project settings
.idea

# mkdocs documentation
/site

# mypy
.mypy_cache/

# Sphinx documentation
/docs/_build


================================================
FILE: .readthedocs.yml
================================================
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

# Sphinx settings
sphinx:
  builder: html
  configuration: docs/conf.py
  fail_on_warning: true

# Python settings
python:
   version: 3.7
   install:
      - method: pip
        path: .
        extra_requirements:
            - docs


================================================
FILE: CHANGELOG.md
================================================
# Changelog
All notable changes to this project will be documented in this file.

## [Unreleased]

## [1.0.1] - 2020-01-27
### Changed
- some code style related changes are applied, such as import sorting and line length shortening.
- refactoring in tests to use pytest parameterisation and fixtures.

### Fixed
- issue with running callables from script thanks to David Turner (https://github.com/gdikov/hypertunity/pull/43).
- issue with tensorflow version comparison in the tensorboard reporter.

## [1.0.0] - 2019-11-10
### Added
- `Reporter` instance can be loaded with data from the database of another reporter using a `from_database()` method.
- data from a `Reporter` instance can be exported into a `HistoryPoint` list to load into an optimiser.
- compiled documentation and logo.
- `BayesianOptimisation` raises `ExhaustedSearchSpaceError` if a discrete domain is exhausted.

### Changed
- minor fixes in documentation typos, argument names and tests.
- `Domain` is moved from `hypertunity.optimisation` to the `hypertunity` package.
- rename `TableReporter` to `Table` and `TensorboardReporter` to `Tensorboard`.
- `ExhaustedSearchSpaceError` is moved from `optimisation.exhaustive` to the `optimisation.base` module.
- `Trial` running a task from a job is now done with dict as input keyword arguments or named command line arguments.

### Fixed
- bug in `BayesianOptimisation` sample conversion for nested dictionaries.
- bug in `BayesianOptimisation` type preserving between the domain and the sample value.
- bug in `Tensorboard` reporter for real intervals with integer boundaries. 
- bug in `Reporter` for not using the default metric name during logging.

## [0.4.0] - 2019-09-15
### Added
- `Trial`, a wrapper class for high-level usage, which runs the optimiser, evaluates the objective
 by scheduling jobs, updates the optimiser and summarises the results.
- a `Job` from a script with command line arguments can now be run with 
 named arguments passed as a dictionary instead of a tuple.
- checkpointing of results on disk when calling `log()` or a `Reporter` object.
- optimisation history can now be loaded into an `Optimiser`. An example use case would be to warm-start
`BayesianOptimisation` from the history of the quicker `RandomSearch`.

### Changed
- every `Reporter` instance has a `primary_metric` attribute, which is an argument to `__init__`.

### Fixed
- validation of `Domain` is not allowing for intervals with more than 2 numbers.
- minor bugs in tests.

## [0.3.1] - 2019-09-10
### Fixed
- `Optimiser.update()` now accepts evaluation arguments that are float, `EvaluationScore` or a dict
 with metric names and floats or `EvaluationScore`s. This is valid for all optimisers. 

## [0.3.0] - 2019-09-08
### Added
- `Job` can now be scheduled locally to run command line scripts with arguments.
- `BayesianOptimisation.run_step` can pass arguments to the backend for better customisation.

### Changed
- any `Reporter` object can be fed with data from a tuple of a 
`Sample` object and a score, which can be a float or an `EvaluationScore`.
- `BayesianOptimisation` optimiser can be updated with a `Sample` and 
a float or `EvaluationScore` objective evaluation types.
- a discrete/categorical `Domain` is defined with a set literal instead of a tuple.
- `Job` supports running functions from within a script by specifying 'script_path::func_name'.
- `batch_size` is no longer an attribute of an `Optimiser` but an argument to `run_step`.
- `minimise` is no longer an attribute of `BayesianOptimisation` but an argument to `run_step`.

## [0.2.0] - 2019-08-28
### Added
- `Scheduler` to run jobs locally using joblib.
- `SlurmJob` and `Job` dataclasses defining the tasks to be scheduled.
- `Result` dataclass encapsulating the results from the tasks.
- `TableReporter` class for reporting results in tabular format.
- `Reporter` base class for extending reporters.

### Changed
- `Base`-prefix is removed from all base classes which reside 
 in `base.py` modules.
- `split_by_type` is now a method of the `Domain` class.
- `Optimiser` has a `batch_size` attribute accessible as a property.

### Removed
- the `optimisation.bo` package has been removed. Instead, a single `bo.py`
 module supports GPyOpt, the only BO backend as of now.
- prefix for the file encoding (default is utf-8).
 
## [0.1.0] - 2019-07-27
### Added
- `TensorboardReporter` result logger using `HParams`.
- `GpyOpt` backend for `BayesianOptimisation`.
- `RandomSearch` optimiser.
- `GridSearch` optimiser.
- `Domain` and `Sample` classes as foundations for the optimisers.


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
<div align="center">
  <img src="https://raw.githubusercontent.com/gdikov/hypertunity/master/docs/_static/images/logo.svg?sanitize=true" width="100%">
</div>

[![CircleCI](https://img.shields.io/circleci/build/github/gdikov/hypertunity)](https://circleci.com/gh/gdikov/hypertunity)
[![Documentation Status](https://readthedocs.org/projects/hypertunity/badge/?version=latest)](https://hypertunity.readthedocs.io/en/latest/?badge=latest)
![GitHub](https://img.shields.io/github/license/gdikov/hypertunity)

## Why Hypertunity

Hypertunity is a lightweight, high-level library for hyperparameter optimisation. 
Among other things, it supports:
 * Bayesian optimisation by wrapping [GPyOpt](http://sheffieldml.github.io/GPyOpt/),
 * external or internal objective function evaluation by a scheduler, also compatible with [Slurm](https://slurm.schedmd.com),
 * real-time visualisation of results in [Tensorboard](https://www.tensorflow.org/tensorboard) 
 via the [HParams](https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams) plugin.

For the full set of features refer to the [documentation](https://hypertunity.readthedocs.io).

## Quick start

Define the objective function to optimise. For example, it can take the hyperparameters `params` as input and 
return a raw value `score` as output:

```python
import hypertunity as ht

def foo(**params) -> float:
    # do some very costly computations
    ...
    return score
```

To define the valid ranges for the values of `params` we create a `Domain` object:

```python
domain = ht.Domain({
    "x": [-10., 10.],         # continuous variable within the interval [-10., 10.]
    "y": {"opt1", "opt2"},    # categorical variable from the set {"opt1", "opt2"}
    "z": set(range(4))        # discrete variable from the set {0, 1, 2, 3}
})
```
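
To make the three kinds of variables concrete, here is a minimal, standard-library-only sketch of how one sample could be drawn from such a specification. It is not Hypertunity's implementation: `draw_sample` is a hypothetical helper, and the only assumption carried over from above is that a list denotes a continuous interval and a set denotes a finite collection of options.

```python
import random


def draw_sample(spec, rng=random):
    """Draw one value per variable from a dict-based domain specification.

    Hypothetical helper for illustration only (not part of Hypertunity):
    a list [low, high] is treated as a continuous interval, a set as a
    finite collection of options and a dict as a nested subdomain.
    """
    sample = {}
    for name, values in spec.items():
        if isinstance(values, dict):
            # nested subdomain: recurse
            sample[name] = draw_sample(values, rng)
        elif isinstance(values, list):
            low, high = values
            sample[name] = rng.uniform(low, high)
        elif isinstance(values, (set, frozenset)):
            # sets are unordered, so sort first to keep seeded runs reproducible
            sample[name] = rng.choice(sorted(values, key=repr))
        else:
            raise TypeError(f"unsupported subdomain for {name!r}: {values!r}")
    return sample


spec = {
    "x": [-10., 10.],         # continuous interval
    "y": {"opt1", "opt2"},    # categorical options
    "z": set(range(4)),       # discrete options
}
sample = draw_sample(spec, random.Random(0))
print(sample)
```

Passing a seeded `random.Random` instance keeps the draw reproducible, which is convenient when comparing runs.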

Then we set up the optimiser:

```python
bo = ht.BayesianOptimisation(domain=domain)
```

And we run the optimisation for 10 steps. Each result is used to update the optimiser so that informed domain 
samples are drawn:

```python
n_steps = 10
for i in range(n_steps):
    samples = bo.run_step(batch_size=2, minimise=True)      # suggest next samples
    evaluations = [foo(**s.as_dict()) for s in samples]     # evaluate foo
    bo.update(samples, evaluations)                         # update the optimiser
```
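
The same suggest, evaluate, update cycle can be tried out without installing anything. The sketch below uses a toy random-search class as a stand-in for the optimiser; `ToyRandomSearch` and `objective` are hypothetical names made up for this illustration, and the class only mimics the `run_step`/`update`/`history` shape used above rather than reproducing Hypertunity's behaviour.

```python
import random


class ToyRandomSearch:
    """Hypothetical stand-in optimiser exposing a suggest/update interface."""

    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.history = []                    # list of (sample, score) pairs
        self._rng = random.Random(seed)

    def run_step(self, batch_size=1):
        # suggest `batch_size` samples drawn uniformly from [low, high]
        return [{"x": self._rng.uniform(self.low, self.high)}
                for _ in range(batch_size)]

    def update(self, samples, evaluations):
        # random search does not learn; it only records what was evaluated
        self.history.extend(zip(samples, evaluations))


def objective(x):
    # cheap stand-in for a costly computation, minimised at x = 3
    return (x - 3.0) ** 2


opt = ToyRandomSearch(-10.0, 10.0)
for _ in range(10):
    samples = opt.run_step(batch_size=2)               # suggest next samples
    evaluations = [objective(**s) for s in samples]    # evaluate the objective
    opt.update(samples, evaluations)                   # record the results

best_sample, best_score = min(opt.history, key=lambda pair: pair[1])
print(best_sample, best_score)
```

A model-based optimiser differs only in `update` and `run_step`: it would use the recorded history to decide where to sample next instead of drawing uniformly.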

Finally, we visualise the results in Tensorboard: 

```python
import hypertunity.reports.tensorboard as tb

results = tb.Tensorboard(domain=domain, metrics=["score"], logdir="path/to/logdir")
results.from_history(bo.history)
```

## Even quicker start

A high-level wrapper class `Trial` allows for seamless parallel optimisation
without having to schedule jobs, update optimisers or log results by hand:
   
```python
trial = ht.Trial(objective=foo,
                 domain=domain,
                 optimiser="bo",
                 reporter="tensorboard",
                 metrics=["score"])
trial.run(n_steps, batch_size=2, n_parallel=2)
```
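
For intuition about the bookkeeping that is avoided this way, here is a rough sketch of evaluating one batch of samples concurrently by hand. It uses the standard library's `concurrent.futures` rather than Hypertunity's scheduler, and `evaluate_batch` and `objective` are hypothetical names; how `Trial` actually dispatches jobs is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def objective(x, y):
    # cheap stand-in for a costly computation
    return (x - 1.0) ** 2 + (0.0 if y == "opt1" else 0.5)


def evaluate_batch(func, samples, n_parallel=2):
    """Evaluate func(**sample) for every sample, at most n_parallel at a time."""
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        # map() returns results in the same order as the inputs
        return list(pool.map(lambda s: func(**s), samples))


batch = [{"x": 0.0, "y": "opt1"}, {"x": 2.0, "y": "opt2"}]
scores = evaluate_batch(objective, batch, n_parallel=2)
print(scores)
```

Threads are a reasonable choice when the objective mostly waits on I/O or on an external process; for CPU-bound Python code a process pool would be the more usual option.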

## Installation

### Using PyPI
To install the base version run:
```bash
pip install hypertunity
```
To use the Tensorboard dashboard, build the docs or run the test suite you will need the following extras:
```bash
pip install hypertunity[tensorboard,docs,tests]
```

### From source
Check out the latest master and install locally:
```bash
git clone https://github.com/gdikov/hypertunity.git
cd hypertunity
pip install ./[tensorboard]
```


================================================
FILE: conftest.py
================================================
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--runslow",
        action="store_true",
        default=False,
        help="run slow tests"
    )
    parser.addoption(
        "--runslurm",
        action="store_true",
        default=False,
        help="run slurm tests"
    )


def pytest_configure(config):
    config.addinivalue_line(
        "markers", "slow: mark test as slow to run"
    )
    config.addinivalue_line(
        "markers", "slurm: mark test as requiring slurm to run"
    )


def pytest_collection_modifyitems(config, items):
    def mark_skip(keyword):
        if config.getoption(f"--run{keyword}"):
            return
        skip = pytest.mark.skip(reason=f"need --run{keyword} option to run")
        for item in items:
            if keyword in item.keywords:
                item.add_marker(skip)

    mark_skip("slow")
    mark_skip("slurm")


================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


================================================
FILE: docs/conf.py
================================================
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys


sys.path.insert(0, os.path.abspath('..'))
import hypertunity

# The short X.Y version.
version = '.'.join(hypertunity.__version__.split('.', 2)[:2])
# The full version, including alpha/beta/rc tags.
release = hypertunity.__version__


# -- Project information -----------------------------------------------------

project = 'Hypertunity'
copyright = '2019, Georgi Dikov'
author = 'Georgi Dikov'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode'
]

# Napoleon settings
napoleon_google_docstring = True
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = True
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = True
napoleon_use_admonition_for_references = True
napoleon_use_ivar = True
napoleon_use_param = True
napoleon_use_keyword = True
napoleon_use_rtype = True

autodoc_typehints = 'none'
autodoc_mock_imports = ['tensorflow', 'tensorboard']


source_suffix = '.rst'


# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'test*']


# -- Options for HTML output -------------------------------------------------
html_theme = 'sphinx_rtd_theme'

pygments_style = 'sphinx'
add_module_names = False

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# this is needed as HTML5 causes an ugly rendering of the "Parameters", "Returns", etc. fields
html4_writer = True

html_theme_options = {
    "logo_only": True,
    'display_version': True,
    'style_nav_header_background': '#002A3F',
    # Toc options
    'collapse_navigation': True
}

html_context = {
    "display_github": True,     # Add 'Edit on Github' link instead of 'View page source'
    # "last_updated": True,
    # "commit": False,
}

html_logo = "_static/images/logo_inverted.svg"
html_favicon = '_static/images/favicon.ico'

github_url = "https://github.com/gdikov/hypertunity"


================================================
FILE: docs/index.rst
================================================
:github_url: https://github.com/gdikov/hypertunity

.. image:: _static/images/logo.svg
  :width: 800
  :align: center
  :alt: Hypertunity logo

========
Welcome!
========

Hypertunity is a lightweight, high-level library for hyperparameter optimisation.
Among other things, it supports:

* Bayesian optimisation by wrapping `GPyOpt <http://sheffieldml.github.io/GPyOpt/>`_
* external or internal objective evaluation using a scheduler, also compatible with `Slurm <https://slurm.schedmd.com>`_
* real-time visualisation of results in `Tensorboard <https://www.tensorflow.org/tensorboard>`_ using the `HParams <https://www.tensorflow.org/tensorboard/r2/hyperparameter_tuning_with_hparams>`_ plugin.

The main guiding design principles are:

* **Modular**: you can use any optimiser and reporter as well as schedule jobs locally or on Slurm without changes in the API.
* **Simple**: the small codebase (just about 1000 LOC) and the flat subpackage hierarchy make it easy to use, maintain and extend.
* **Extensible**: base classes such as :class:`Optimiser`, :class:`Job` and :class:`Reporter` allow for seamless implementation of customised functionality.


.. toctree::
  :maxdepth: 2
  :caption: User Guide

  manual/installation
  manual/quickstart
  manual/domain
  manual/optimisation
  manual/reports
  manual/scheduling


.. toctree::
  :maxdepth: 2
  :caption: API Reference

  source/hypertunity
  source/optimisation
  source/reports
  source/scheduling


Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


================================================
FILE: docs/manual/domain.rst
================================================
Domain
======

The set of all hyperparameters and the corresponding ranges of possible values is specified using the :class:`Domain` class.
It can be initialised with a dictionary mapping parameter names to continuous numeric intervals or discrete sets.
The former are given as a Python :obj:`list` and the latter as a :obj:`set`.

For example, to define a domain over the continuous interval [-10, 10] and the discrete set of
strings {"option_1", "option_2"}, it suffices to write:

.. code-block:: python

    domain = Domain({"var_1": [-10, 10], "var_2": {"option_1", "option_2"}})

where ``"var_1"`` and ``"var_2"`` are two arbitrary names for the two subdomains.

Given this domain we can now generate samples from it using the :py:meth:`sample()` method:

.. code-block:: python

    >>> domain.sample()
    {'var_1': -8.529187978165552, 'var_2': 'option_1'}

The returned object is of class :class:`Sample` and represents one realisation of the domain.
It is a mapping of parameter names to values drawn from the corresponding sets of possible values.
It also has handy conversion methods such as :py:meth:`as_dict()` or :py:meth:`as_namedtuple()`, which enable accessing
parameters using the ``["var_1"]`` or ``.var_1`` notation.

Both :class:`Domain` and :class:`Sample` objects allow for nested subdomains, e.g.:

.. code-block:: python

    >>> domain = Domain({
    ...    "subdomain_a": {"var_1": [-10, 10], "var_2": {"option_1", "option_2"}},
    ...    "subdomain_b": {"var_1": [-1, 1], "var_2": {"option_1", "option_2"}}
    ... })
    >>> sample = domain.sample()
    >>> sample
    {
        'subdomain_a': {'var_1': -6.892359956494582, 'var_2': 'option_2'},
        'subdomain_b': {'var_1': 0.21004903180560652, 'var_2': 'option_1'}
    }
    >>> nt_sample = sample.as_namedtuple()
    >>> nt_sample.subdomain_a.var_2
    'option_2'


================================================
FILE: docs/manual/installation.rst
================================================
Installation
============

Requirements
------------

Hypertunity has been tested with Python 3.6 and 3.7. As of now, there are no plans to support earlier versions of Python.
The reason for that is the usage of variable and function annotations, dataclasses as well as relying on the fact that the
insertion order of the keys in a dictionary is preserved during iteration. Porting Hypertunity to earlier versions will
only make it unnecessarily hard to maintain.

From PyPI
---------

To get the latest stable release just run:

.. code-block:: bash

    pip install hypertunity

Note that this will install the basic version only, without support for Tensorboard visualisations.
To enable this feature you will need to specify the option `tensorboard`.
To run the tests or compile the docs add the `tests` and `docs` options respectively:

.. code-block:: bash

    pip install hypertunity[tensorboard,tests,docs]


From source
-----------

To install the bleeding-edge version of Hypertunity, clone the repository, check out the master branch
and install from source:

.. code-block:: bash

    git clone https://github.com/gdikov/hypertunity.git
    cd hypertunity
    git checkout master
    pip install ./[tensorboard,tests,docs]


================================================
FILE: docs/manual/optimisation.rst
================================================
Optimisation
============

Hypertunity ships with three types of hyperparameter space exploration algorithms: Bayesian optimisation, random search and
grid search. The first one is sequential in nature and requires evaluations to update its internal model of the
objective function, so that it can generate more informed sample suggestions. The latter two can generate all samples
in parallel and do not require updating. In this section we give a brief overview of each.

Bayesian optimisation
---------------------

:class:`BayesianOptimisation` in Hypertunity is a wrapper around `GPyOpt.methods.BayesianOptimization` which uses
Gaussian Process regression to build a surrogate model of the objective function. It is initialised from a :class:`Domain`
object:

.. code-block:: python

    bo = BayesianOptimisation(domain)

The :class:`BayesianOptimisation` optimiser is highly customisable during sampling. This enables the user to
dynamically refine the model when calling :py:meth:`run_step()`. This approach, however, introduces the computational
burden of recomputing the surrogate model at each query. In the following example we show how one can set the GP model
using readily available ones from `GPy.models`, e.g. a `GPHeteroscedasticRegression`:

.. code-block:: python

    bo = BayesianOptimisation(domain=domain, seed=7)                    # initialise BO optimiser
    kernel = GPy.kern.RBF(1) + GPy.kern.Bias(1)                         # create a custom kernel
    custom_model = GPy.models.GPHeteroscedasticRegression(..., kernel)  # create a custom model
    samples = bo.run_step(model=custom_model)                           # generate samples


Random search
-------------

:class:`RandomSearch` is a wrapper around the :py:meth:`Domain.sample()` method. It has the API of
an :class:`Optimiser` class and yields samples which are uniformly drawn from the domain.
There is no limitation on the number of samples that can be returned in a single call of :py:meth:`run_step()`,
even if this leads to repetitions.


Grid search
-----------

:class:`GridSearch` is a wrapper around the iteration over a domain. It goes over each point in the Cartesian product of
all discrete subdomains. If one of the subdomains is continuous, :class:`GridSearch` will sample uniformly from
this interval. Once the domain is exhausted, further iteration will be prevented by raising an :class:`ExhaustedSearchSpaceError`.
To iterate again, the :class:`GridSearch` optimiser must be reset by calling the :py:meth:`reset()` method.

.. code-block:: python

    >>> domain = Domain({"x": {1, 2, 3}, "y": {"a", "b"}, "z": [0, 1]})
    >>> gs = GridSearch(domain, sample_continuous=True)
    >>> gs.run_step(batch_size=6)
    [
        {'x': 1, 'y': 'b', 'z': 0.054781406913364084},
        {'x': 2, 'y': 'b', 'z': 0.7006391867439882},
        {'x': 3, 'y': 'b', 'z': 0.9674445624792569},
        {'x': 1, 'y': 'a', 'z': 0.7837727333178091},
        {'x': 2, 'y': 'a', 'z': 0.17240297136803384},
        {'x': 3, 'y': 'a', 'z': 0.844465575155033}
    ]
    >>> gs.reset()
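
The Cartesian-product iteration over the discrete subdomains can be sketched with the standard library alone.
The helper ``grid_points`` below is hypothetical and handles only a flat specification of sets; it is meant as an illustration of the idea rather than a description of how :class:`GridSearch` is implemented:

```python
import itertools


def grid_points(discrete_spec):
    """Yield every combination of values from a flat dict of sets.

    Hypothetical helper: nested or continuous subdomains are not handled.
    """
    names = sorted(discrete_spec)
    # Sort each set by repr to obtain a reproducible iteration order.
    choices = [sorted(discrete_spec[name], key=repr) for name in names]
    for combination in itertools.product(*choices):
        yield dict(zip(names, combination))


points = list(grid_points({"x": {1, 2, 3}, "y": {"a", "b"}}))
```

For the two discrete subdomains of the example above this produces 3 × 2 = 6 points, which matches the batch of six samples returned by ``run_step``.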


Custom optimiser
----------------

If none of the predefined optimisers suits your problem, you can easily roll out a custom one.
The only thing you have to do is inherit from the base :class:`Optimiser` class and implement the :py:meth:`run_step` method.

.. code-block:: python

    class CustomOptimiser(Optimiser):
        def __init__(self, domain, *args, **kwargs):
            super(CustomOptimiser, self).__init__(domain)
            ...

        def run_step(self, batch_size, *args, **kwargs):
            ...
            return [samples]


================================================
FILE: docs/manual/quickstart.rst
================================================
Quick start
===========

A worked example
~~~~~~~~~~~~~~~~

Let's delve into the API of Hypertunity by going through a worked example: neural network hyperparameter optimisation.
In the following we will tune the number of layers and units, the non-linearity type, as well as the dropout rate and the
learning rate of the optimiser.

**Disclaimer:** This example serves a demonstration purpose only. It does not represent an advanced way of performing
neural network architecture search!

The first thing we do is import Hypertunity, TensorFlow and NumPy and define a helper data loading function:

.. code-block:: python

    import hypertunity as ht
    import numpy as np
    import tensorflow as tf

    import hypertunity.reports.tensorboard as ht_tb


    def load_mnist():
        (train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
        data_shape = train_x.shape[1:]
        train_x = train_x.reshape(-1, np.prod(data_shape)).astype(np.float32) / 255.
        mean_train = np.mean(train_x, axis=0)
        train_x -= mean_train
        test_x = test_x.reshape(-1, np.prod(data_shape)).astype(np.float32) / 255.
        test_x -= mean_train
        train_y = tf.keras.utils.to_categorical(train_y, num_classes=10)
        test_y = tf.keras.utils.to_categorical(test_y, num_classes=10)
        return (train_x, train_y), (test_x, test_y)


Next we define a function that will build the model given the architectural hyperparameters,
followed by the objective which will wrap the model building, the training with the given learning rate and the evaluation:

.. code-block:: python

    def build_model(inp_size, out_size, n_layers, n_units, p_dropout, activation):
        inp = tf.keras.Input(inp_size)
        h = inp
        for l in range(n_layers - 1):
            h = tf.keras.layers.Dense(n_units, activation=activation)(h)
            h = tf.keras.layers.Dropout(rate=p_dropout)(h)
        h = tf.keras.layers.Dense(out_size, activation=None)(h)
        out = tf.keras.layers.Softmax()(h)
        model = tf.keras.models.Model(inputs=inp, outputs=out)
        return model


    def objective_fn(**config) -> float:
        (train_x, train_y), (test_x, test_y) = load_mnist()
        model = build_model(train_x.shape[-1], train_y.shape[-1],
                            config["arch"]["n_layers"],
                            config["arch"]["n_units"],
                            config["arch"]["p_dropout"],
                            config["arch"]["activation"])
        opt = tf.keras.optimizers.Adam(learning_rate=config["opt"]["lr"])
        model.compile(optimizer=opt, loss="categorical_crossentropy")
        model.fit(train_x, train_y, batch_size=100, epochs=1)
        score = model.evaluate(test_x, test_y, batch_size=test_x.shape[0])
        return score

Now that we can build a model, we should define the ranges of possible values for these parameters.
This can be done by creating a :class:`Domain` instance as follows:

.. code-block:: python

    domain = ht.Domain({
        "arch": {
            "n_layers": {1, 3, 5},
            "n_units": {10, 50, 100, 500},
            "p_dropout": [0, 0.9999],
            "activation": {"relu", "selu", "elu"}
        },
        "opt": {
            "lr": [1e-9, 1e-2]
        }
    })

The :class:`Domain` plays a central role in Hypertunity and we will make frequent use of it later as well.
An important related class is the :class:`Sample`. It can be thought of as one realisation of the variables from the domain,
which in our case is one particular configuration of network hyperparameters.

Using the domain, we can set up the optimiser and the result visualiser also used for experiment logging.
In this case we use :class:`BayesianOptimisation` and :class:`Tensorboard` respectively:

.. code-block:: python

    optimiser = ht.BayesianOptimisation(domain)
    tb_rep = ht_tb.Tensorboard(domain,
                               metrics=["cross-entropy"],
                               logdir="./mnist_mlp",
                               database_path="./mnist_mlp")


After we create the :class:`Tensorboard` reporter we will be prompted to run `tensorboard --logdir=./mnist_mlp`
in the console and open Tensorboard in the browser. We can do this also before we launch the actual optimisation.

One last bit before running it is the definition of the job schedule as well as the optimiser and reporter update loop.
This is to ensure that samples are generated, experiments are run and the results are used to improve the underlying model of the :class:`BayesianOptimisation` optimiser.
To schedule one experiment at a time for 50 consecutive steps, we create a :class:`Job` for each function call of ``objective_fn``
with a set of suggested hyperparameters:

.. code-block:: python

    n_steps = 50
    batch_size = 1
    with ht.Scheduler(n_parallel=batch_size) as scheduler:
        for i in range(n_steps):
            samples = optimiser.run_step(batch_size=batch_size, minimise=True)
            jobs = [ht.Job(task=objective_fn, args=s.as_dict()) for s in samples]
            scheduler.dispatch(jobs)
            evaluations = [r.data for r in scheduler.collect(n_results=batch_size, timeout=100.0)]
            optimiser.update(samples, evaluations)
            for sample_evaluation_pair in zip(samples, evaluations):
                tb_rep.log(sample_evaluation_pair)

If we have a look at the Tensorboard dashboard while this is running, we should be able to see results being updated live!

.. image:: ../_static/images/tensorboard.gif
  :width: 800
  :align: center
  :alt: Tensorboard

Even quicker start
~~~~~~~~~~~~~~~~~~

A high-level wrapper class :class:`Trial` allows for seamless parallel optimisation without having to schedule jobs,
update the optimiser or log results explicitly. The API is reduced to the minimum and yet remains flexible as
one can specify any optimiser or reporter:

.. code-block:: python

    trial = ht.Trial(objective=objective_fn,
                     domain=domain,
                     optimiser="bo",
                     reporter="tensorboard",
                     logdir="./mnist_mlp",
                     database_path="./mnist_mlp",
                     metrics=["cross-entropy"])

    trial.run(n_steps, batch_size=batch_size, n_parallel=batch_size)


================================================
FILE: docs/manual/reports.rst
================================================
Reports
=======

Saving and visualising progress can be accomplished by using a :class:`Reporter` instance.
The reporter is supplied with data using the :py:meth:`log()` method which takes a tuple of a sample and score.
Optionally one can store additional information about the current experiment, e.g. the output directory or the job id,
using the ``meta`` keyword argument:

.. code-block:: python

    for s, e, m in zip(samples, evaluations, meta_infos):
        reporter.log((s, e), meta=m)

Table
-----

Hypertunity comes with a built-in reporter which organises the experiment results into an ASCII table.
It is initialised from a domain and a list of metrics and can be viewed as a formatted string table by calling :obj:`str`
on the object.
The table can be sorted in ascending or descending order and the best results can be emphasised:

.. code-block:: python

    >>> domain = ht.Domain({"x": [-5., 6.], "y": {"sin", "cos"}, "z": set(range(4))})
    >>> reporter = ht.Table(domain, metrics=["score"])
    >>> # run experiment and call reporter.log(...)
    ...
    >>> print(reporter.format(order="descending"))
    +=====+========+=====+===+==============+
    | No. |   x    |  y  | z |    score     |
    +=====+========+=====+===+==============+
    |  6  | -4.35  | cos | 1 | 15.921 ± 0.0 |
    +-----+--------+-----+---+--------------+
    |  5  | -4.232 | cos | 3 | 8.906 ± 0.0  |
    +-----+--------+-----+---+--------------+
    |  4  | -4.588 | sin | 3 | 6.134 ± 0.0  |
    +-----+--------+-----+---+--------------+
    |  2  |  2.16  | cos | 0 | 4.667 ± 0.0  |
    +-----+--------+-----+---+--------------+
    |  3  | -0.977 | cos | 1 | -2.045 ± 0.0 |
    +-----+--------+-----+---+--------------+
    |  1  | -1.438 | cos | 3 | -6.933 ± 0.0 |
    +-----+--------+-----+---+--------------+

Tensorboard
-----------

If Hypertunity is installed with the `tensorboard` option, a suitable version of Tensorflow and Tensorboard will be installed.
This will enable a :class:`Tensorboard` reporter which, using the HParams plugin, will generate live visualisations
as experiments are being logged. One can start the Tensorboard dashboard in the browser as usual, using the `logdir` supplied
at initialisation.

Note that to create a Tensorboard reporter one will have to import ``hypertunity.reports.tensorboard`` explicitly:

.. code-block:: python

    import hypertunity.reports.tensorboard as tb
    tb_reporter = tb.Tensorboard(domain, metrics=["score"], logdir="./logs")

See the :doc:`quickstart` guide for a preview of the dashboard visualisation.


================================================
FILE: docs/manual/scheduling.rst
================================================
Scheduling jobs
===============

Often in practice the objective function is a Python script that might take command line arguments as parameters or define a function that has lots of dependencies.
Importing this function into the hyperparameter optimisation script or wrapping the target script involves some boilerplate code.
To help with that, Hypertunity allows for specifying objective functions as ``Job`` instances which are then run in succession or in parallel using a ``Scheduler``.
The latter is a wrapper around `joblib <https://joblib.readthedocs.io>`_ and takes care of both running jobs and collecting results.

Scheduling of ``Job`` instances is done using the ``dispatch`` method of a ``Scheduler``:

.. code-block:: python

    jobs = [Job(...) for _ in range(10)]
    scheduler.dispatch(jobs)
    evaluations = [r.data for r in scheduler.collect(n_results=len(jobs), timeout=10.0)]

There are multiple ways to define a job depending on the target to optimise.

Local python callable
~~~~~~~~~~~~~~~~~~~~~

If the function is defined or imported within the hyperparameter optimisation script, the ``task`` argument is the callable itself.
The ``args`` argument is then a tuple of positional arguments or a dict of named arguments which are supplied to the task function when it is called.
For example:

.. code-block:: python

    jobs = [ht.Job(task=foo, args=(*s.as_namedtuple(),)) for s in samples]


Python callable in a script
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the function to optimise resides in a script, Hypertunity allows for specifying a target by the full path to the script.
To select the objective function from the script append ``:`` and the function name:

.. code-block:: python

    jobs = [Job(task="path/to/script.py:foo", args=(*s.as_namedtuple(),)) for s in samples]
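
To make the ``path/to/script.py:foo`` convention more concrete, here is a minimal standalone sketch of how such a string could be resolved to a callable with ``importlib``.
The helper ``load_callable`` and the module name ``_task_module`` are hypothetical; this is an assumption about one possible approach and not a description of how ``Job`` resolves its task internally:

```python
import importlib.util
import os
import tempfile


def load_callable(task):
    """Resolve a 'path/to/script.py:function_name' string to a callable.

    Hypothetical helper for illustration only.
    """
    # Split on the last colon so that paths containing colons still work.
    path, separator, function_name = task.rpartition(":")
    if not separator or not function_name:
        raise ValueError(f"Expected 'path:function', got {task!r}.")
    spec = importlib.util.spec_from_file_location("_task_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function_name)


# Demonstration with a throwaway script written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "script.py")
    with open(script_path, "w") as fp:
        fp.write("def foo(x, y):\n    return x + y\n")
    foo = load_callable(script_path + ":foo")
    result = foo(1, 2)
```

Note that loading the script this way executes its top-level code, so the target script should guard any side effects with ``if __name__ == "__main__":``.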


A script
~~~~~~~~

If the objective function is a full command line application or a script that accepts the hyperparameters to tune as command line arguments then you should create a job as follows:

.. code-block:: python

    jobs = [Job(task="path/to/script.py",
                args=(*s.as_namedtuple(),),
                meta={"binary": "python"}) for s in samples]


Using Slurm
~~~~~~~~~~~

To schedule jobs using Slurm, a special job type is available. It allows configuring resources and other Slurm parameters, but also requires that the target script is able to write a results file to disk.

.. code-block:: python

    jobs = [SlurmJob(task="path/to/script.py",
                     args=(*sample.as_namedtuple(),),
                     output_file="path/to/results.pkl",
                     meta={"binary": "python", "resources": {"cpu": 1}})
            for sample in samples]


================================================
FILE: docs/source/hypertunity.rst
================================================
:mod:`hypertunity`
==================

.. automodule:: hypertunity

Summary
-------

.. autosummary::
   :nosignatures:

   Domain
   Sample
   Trial

API documentation
-----------------

.. autoclass:: Domain
   :members:

.. autoclass:: Sample
   :members:

.. autoclass:: Trial
   :members:


================================================
FILE: docs/source/optimisation.rst
================================================
:mod:`hypertunity.optimisation`
===============================

.. currentmodule:: hypertunity.optimisation

Summary
-------

Data classes
~~~~~~~~~~~~

.. autosummary::
   :nosignatures:

   EvaluationScore
   HistoryPoint

Optimisers
~~~~~~~~~~

.. autosummary::
   :nosignatures:

   Optimiser
   BayesianOptimisation
   GridSearch
   RandomSearch

API documentation
-----------------

.. autoclass:: EvaluationScore
   :members:

.. autoclass:: HistoryPoint
   :members:

.. autoclass:: Optimiser
   :members:

.. autoclass:: BayesianOptimisation
   :members:

.. autoclass:: GridSearch
   :members:

.. autoclass:: RandomSearch
   :members:


================================================
FILE: docs/source/reports.rst
================================================
:mod:`hypertunity.reports`
==========================

.. currentmodule:: hypertunity.reports

Summary
-------

Default
~~~~~~~

.. autosummary::
   :nosignatures:

   Reporter
   Table

Optional
~~~~~~~~

.. autosummary::
    :nosignatures:

    tensorboard.Tensorboard

API documentation
-----------------

.. autoclass:: Reporter
  :members:

.. autoclass:: Table
  :members:

.. currentmodule:: hypertunity.reports.tensorboard

.. autoclass:: Tensorboard
   :members:


================================================
FILE: docs/source/scheduling.rst
================================================
:mod:`hypertunity.scheduling`
=============================

.. currentmodule:: hypertunity.scheduling

Summary
-------

.. autosummary::
   :nosignatures:

   Scheduler
   Job
   SlurmJob
   Result

API documentation
-----------------

.. autoclass:: Scheduler
   :members:

.. autoclass:: Job
   :members:

.. autoclass:: SlurmJob
   :members:

.. autoclass:: Result
   :members:


================================================
FILE: hypertunity/__init__.py
================================================
from .domain import *
from .optimisation import *
from .reports import *
from .scheduling import *
from .trial import *

__version__ = "1.0.1"


================================================
FILE: hypertunity/domain.py
================================================
"""Definition of the optimisation domain and a sample."""

import ast
import copy
import os
import pickle
import random
from collections import namedtuple
from typing import Tuple

__all__ = [
    "Domain",
    "DomainNotIterableError",
    "DomainSpecificationError",
    "Sample"
]


class _RecursiveDict:
    """Helper base class for the :class:`Domain` and :class:`Sample` classes.

    It implements common logic for creation, representation, type conversion
    and serialisation.
    """

    def __init__(self, dct):
        if isinstance(dct, dict):
            self._data = dct
        elif isinstance(dct, str):
            self._data = ast.literal_eval(dct)
        else:
            raise TypeError(
                f"A {self.__class__.__name__} object can be created from a "
                f"Python dict or str objects only. "
                f"Unknown type {type(dct)} at initialisation."
            )

        self._ndim = 0
        for _, val in _deepiter_dict(self._data):
            self._ndim += 1

    def __hash__(self):
        return hash(str(self))

    def __repr__(self):
        """Return the representation of the recursive dict using the
        string method.
        """
        return str(self)

    def __str__(self):
        """Return the string representation of the recursive dict."""
        return str(self._data)

    def __eq__(self, other):
        """Compare all subdomains for equal bounds and sets. The order of the
        subdomains is not important.
        """
        return self.as_dict() == other.as_dict()

    def __len__(self):
        """Compute the dimensionality of the recursive dict as the length of
        the flattened dict.
        """
        return self._ndim

    def __getitem__(self, item):
        """Return the item (possibly a subdomain) for a given key.

        Args:
            item: str or tuple of str. If the latter, it will access nested
                structures with the next str in the tuple.
        """
        if isinstance(item, str):
            return self._data.__getitem__(item)
        elif isinstance(item, tuple) and all(map(lambda x: isinstance(x, str), item)):
            sub_dict = self._data
            for it in item:
                if not isinstance(sub_dict, dict):
                    raise KeyError(f"Unknown sub-key {it}.")
                sub_dict = sub_dict[it]
            return sub_dict

    def __add__(self, other: '_RecursiveDict'):
        """Merge self with the `other` :class:`_RecursiveDict`.

        Args:
            other: :class:`_RecursiveDict`. The recursive dictionary that will
                be merged into the current one.

        Returns:
            A new :class:`_RecursiveDict` object consisting of the subdomains
            of both operands. The operands must not share any flattened key.

        Raises:
            :obj:`ValueError`: if the operands share at least one key.
        """
        flattened_a = self.flatten()
        flattened_b = other.flatten()
        # validate that the two _RecursiveDicts are disjoint
        if len(flattened_a.keys()) > len(flattened_a.keys() - flattened_b.keys()):
            raise ValueError(
                f"Ambiguous addition of {self.__class__.__name__} objects."
            )
        merged = list(flattened_a.items())
        merged.extend(list(flattened_b.items()))
        return self.__class__.from_list(merged)

    def flatten(self):
        """Return the flattened version of the recursive dict, i.e. without
        nested dicts.

        The keys of the nested subdomains are collected in a tuple to create a
        new unique key. For the sake of type consistency, the key of a
        non-nested subdomain is converted to a tuple with a single element.
        """
        return {keys: val for keys, val in _deepiter_dict(self._data)}

    def as_dict(self):
        """Convert the recursive dict object from :class:`_RecursiveDict`
        to :obj:`dict` type.
        """
        return copy.deepcopy(self._data)

    @classmethod
    def from_list(cls, lst):
        """Create a :class:`_RecursiveDict` object from a list of tuples.

        Args:
            lst: :obj:`List[Tuple]`. Each element is a pair of the keys
            (tuple of strings) and the value.

        Returns:
            A :class:`_RecursiveDict` object.

        Raises:
            :obj:`ValueError`: if the list contains duplicating keys with
            different values.

        Examples:
        ```python
            >>> lst = [(("a", "b"), {2, 3, 4}), (("c",), [0, 0.1])]
            >>> _RecursiveDict.from_list(lst)
            {"a": {"b": {2, 3, 4}}, "c": [0, 0.1]}
        ```
        """
        dct = {}
        head = dct
        for keys, vals in lst:
            if not keys:
                continue
            for k in keys[:-1]:
                if k not in dct:
                    dct[k] = {}
                dct = dct[k]
            if keys[-1] in dct and dct[keys[-1]] == vals:
                raise ValueError(f"Duplicating entries for keys {keys}.")
            dct[keys[-1]] = vals
            dct = head
        return cls(head)

    def serialise(self, filepath=None):
        """Serialise the :class:`_RecursiveDict` object to a file or a string
        if `filepath` is not supplied.

        Args:
            filepath: (optional) :obj:`str`. File path to which the serialised
                :class:`_RecursiveDict` object is dumped.

        Returns:
            The bytes representing the serialised :class:`_RecursiveDict` object.
        """
        serialised = pickle.dumps(self._data)
        if filepath is not None:
            with open(filepath, "wb") as fp:
                pickle.dump(self._data, fp)
        return serialised

    @classmethod
    def deserialise(cls, series):
        """Deserialise a serialised :class:`_RecursiveDict` object from a byte
        stream or file.

        Args:
            series: :obj:`bytes` or :obj:`str`. The serialised
                :class:`_RecursiveDict` object or a filepath to it.

        Returns:
            A :class:`_RecursiveDict` object.
        """
        if not isinstance(series, (bytes, bytearray)) and os.path.isfile(series):
            with open(series, "rb") as fp:
                return cls(pickle.load(fp))
        return cls(pickle.loads(series))

    def as_namedtuple(self):
        """Convert a :class:`_RecursiveDict` to a namedtuple type.

        Returns:
            A Python namedtuple object with names the same as the keys of the
            :class:`_RecursiveDict` dict. Nested dicts are accessed by
            successive attribute getters.

        Examples:
        ```python
            >>> rd = _RecursiveDict({"a": {"b": [1, 2]}, "c": {1, 2, 3}, "d": 2.})
            >>> nt = rd.as_namedtuple()
            >>> nt.a.b
            [1, 2]
            >>> nt.c == {1, 2, 3} and nt.d == 2.
            True
        ```
        """

        def helper(dct):
            keys, vals = [], []
            for k, v in dct.items():
                keys.append(k)
                if isinstance(v, dict):
                    vals.append(helper(v))
                else:
                    vals.append(v)
            # keys and vals are filled in lockstep above, so the field names
            # and the values passed to the namedtuple are in matching order.
            return namedtuple("NT_" + self.__class__.__name__, keys)(*vals)

        return helper(self._data)


class Domain(_RecursiveDict):
    """Defines the optimisation domain of the objective function. It can be a
    continuous interval or a discrete set of numeric or non-numeric values.
    The latter is also designated as a categorical domain. It is represented as
    a Python dict object with the keys naming the variables and the values defining
    the set of allowed values. A :class:`Domain` can also be recursively
    specified. That is, a key can name a subdomain represented as a Python dict.

    For continuous sets use Python list to define an interval in the form
    [a, b], a < b. For discrete sets use Python sets, e.g. {1, 2, 5, -0.1}
    or {"option_a", "option_b"}.

    Examples:
        >>> simple_domain = {"x": {0, 1},
        ...                  "y": [-1, 1],
        ...                  "z": {-1, 2, 4}}
        >>> nested_domain = {"discrete": {"x": {1, 2, 3}, "y": {4, 5, 6}},
        ...                  "continuous": {"x": [-4, 4], "y": [0, 1]},
        ...                  "categorical": {"opt1", "opt2"}}
    """
    # Domain types
    Continuous = 1
    Discrete = 2
    Categorical = 3
    Invalid = 4

    def __init__(self, dct, seed=None):
        """Initialise the :class:`Domain`.

        Args:
            dct: :obj:`dict`. The mapping of variable names to sets of
                allowed values.
            seed: (optional) :obj:`int`. Seed for the randomised sampling.
        """
        super(Domain, self).__init__(dct)
        self._validate()
        self._rng = random.Random(seed)
        self._is_continuous = False
        for _, val in _deepiter_dict(self._data):
            if isinstance(val, list):
                self._is_continuous = True

    def __iter__(self):
        """Iterate over the domain if it is fully discrete.

        The iterations are over the Cartesian product of all 1-dim discrete
        subdomains.

        Raises:
            :class:`DomainNotIterableError`: if the domain has at least one
            continuous subdomain.
        """
        if self._is_continuous:
            raise DomainNotIterableError(
                "The domain has a continuous subdomain and cannot be iterated."
            )

        def cartesian_walk(dct):
            if dct:
                key, vals = dct.popitem()
                if isinstance(vals, set):
                    for v in vals:
                        yield from (
                            dict(**rem, **{key: v})
                            for rem in cartesian_walk(copy.deepcopy(dct))
                        )
                elif isinstance(vals, dict):
                    for sub_v in cartesian_walk(copy.deepcopy(vals)):
                        yield from (
                            dict(**rem, **{key: sub_v})
                            for rem in cartesian_walk(copy.deepcopy(dct))
                        )
                else:
                    raise TypeError(
                        f"Unexpected subdomain of type {type(vals)}."
                    )
            else:
                yield {}

        yield from map(Sample, cartesian_walk(copy.deepcopy(self._data)))

    def _validate(self):
        """Check for invalid domain specifications."""
        for keys, values in _deepiter_dict(self._data):
            if not (all(map(lambda x: isinstance(x, str), keys))
                    and isinstance(values, (set, list, dict))):
                raise DomainSpecificationError(
                    "Keys must be of type string and values "
                    "must be either of type set, list or dict."
                )
            if (isinstance(values, list)
                    and (len(values) != 2 or values[0] >= values[1])):
                raise DomainSpecificationError(
                    "Interval must be specified by two numbers: [a, b], a < b."
                )

    def sample(self):
        """Draw a sample from the domain. All subdomains are sampled uniformly.

        Returns:
            A :class:`Sample` object.
        """

        def sample_dict(dct):
            sample = {}
            for key, vals in dct.items():
                if isinstance(vals, set):
                    sample[key] = self._rng.choice(list(vals))
                elif isinstance(vals, list):
                    sample[key] = self._rng.uniform(*vals)
                else:
                    sample[key] = sample_dict(vals)
            return sample

        return Sample(sample_dict(self._data))

    @property
    def is_continuous(self):
        """Return `True` if at least one subdomain is continuous."""
        return self._is_continuous

    @classmethod
    def get_type(cls, subdomain):
        """Return the type of the set of values in a subdomain.

        Args:
            subdomain: one of :obj:`dict`, :obj:`list` or :obj:`set`. The
                subdomain to get the type for.

        Returns:
            One of `Domain.Continuous`, `Domain.Discrete`, `Domain.Categorical`
            or `Domain.Invalid`.
        """

        def is_numeric(x):
            try:
                float(x)
            except (TypeError, ValueError):
                return False
            return True

        if isinstance(subdomain, list):
            return Domain.Continuous
        if isinstance(subdomain, set):
            if all(map(is_numeric, subdomain)):
                return Domain.Discrete
            return Domain.Categorical
        return Domain.Invalid

    def split_by_type(self) -> Tuple['Domain', 'Domain', 'Domain']:
        """Split the domain into discrete, categorical and continuous
        subdomains respectively.

        Returns:
            A tuple of three :class:`Domain` objects for the discrete
            numerical, categorical and continuous subdomains.
        """
        discrete, categorical, continuous = [], [], []
        for keys, vals in self.flatten().items():
            if Domain.get_type(vals) == Domain.Continuous:
                continuous.append((keys, vals))
            elif Domain.get_type(vals) == Domain.Categorical:
                categorical.append((keys, vals))
            elif Domain.get_type(vals) == Domain.Discrete:
                discrete.append((keys, vals))
            else:
                raise ValueError("Encountered an invalid subdomain.")
        return (
            Domain.from_list(discrete),
            Domain.from_list(categorical),
            Domain.from_list(continuous)
        )


class DomainNotIterableError(TypeError):
    """Alias for the :obj:`TypeError` raised during iteration of a
    (partially) continuous :class:`Domain` object.
    """
    pass


class DomainSpecificationError(ValueError):
    """Alias for the :obj:`ValueError` raised during :class:`Domain` object
    creation from an invalid set of values.
    """
    pass


class Sample(_RecursiveDict):
    """Defines a sample from the optimisation domain.

    It has the same recursive structure as a :class:`Domain` object, however
    each dimension is represented by one value only. The keys are exactly the
    keys of the respective domain.

    Examples:
        >>> domain = Domain({"x": {"y": {0, 1, 2}}, "z": [3, 4]})
        >>> domain.sample()
        {'x': {'y': 0}, 'z': 3.1415926535897932}
    """

    def __init__(self, dct):
        """Initialise the :class:`Sample` object from a dict."""
        super(Sample, self).__init__(dct)

    def __iter__(self):
        """Iterate over all values in the sample.

        Yields:
            A tuple of keys and a single value, where the keys are a tuple
            of strings.
        """
        yield from self.flatten().items()


def _deepiter_dict(dct):
    """Iterate over all key, value pairs of a (possibly nested) dictionary.
    In this case, all keys of the nested dicts are summarised in a tuple.

    Args:
        dct: dict object to iterate.

    Yields:
        Tuple of keys (itself a tuple) and the corresponding value.

    Examples:
        >>> list(_deepiter_dict({"a": {"b": 1, "c": 2}, "d": 3}))
        [(('a', 'b'), 1), (('a', 'c'), 2), (('d',), 3)]
    """

    def chained_keys_iter(prefix_keys, dct_tmp):
        for key, val in dct_tmp.items():
            chained_keys = prefix_keys + (key,)
            if isinstance(val, dict):
                yield from chained_keys_iter(chained_keys, val)
            else:
                yield chained_keys, val

    yield from chained_keys_iter((), dct)


================================================
FILE: hypertunity/optimisation/__init__.py
================================================
from .base import *
from .bo import *
from .exhaustive import *
from .random import *


================================================
FILE: hypertunity/optimisation/base.py
================================================
"""Defines the API of every optimiser and implements common logic."""

import abc
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

from hypertunity.domain import Domain, Sample

__all__ = [
    "EvaluationScore",
    "HistoryPoint",
    "Optimiser",
    "Optimizer",
    "ExhaustedSearchSpaceError"
]


@dataclass(frozen=True, order=True)
class EvaluationScore:
    """A tuple of the evaluation value of the objective function
    and a variance if known.
    """
    value: float
    variance: float = 0.0

    def __str__(self):
        return f"{self.value:.3f} ± {math.sqrt(self.variance):.1f}"


@dataclass(frozen=True)
class HistoryPoint:
    """A tuple of a :class:`Sample` at which the objective has been evaluated
    and the corresponding metrics. The metrics are supplied as :obj:`dict`
    mapping of a :obj:`str` metric name to an :class:`EvaluationScore`.
    """
    sample: Sample
    metrics: Dict[str, EvaluationScore]


class Optimiser:
    """Abstract class :class:`Optimiser` for all optimisers.

    It must be implemented by all subclasses in this package.

    Every :class:`Optimiser` instance can be run for one single step using the
    :py:meth:`run_step` method. The :class:`Optimiser` does not perform the
    evaluation of the objective function but only proposes values from its
    domain. Therefore an evaluation history must be supplied via the
    :py:meth:`update` method. The history can be erased and the
    :class:`Optimiser` brought to the initial state via the :py:meth:`reset`
    method.
    """

    DEFAULT_METRIC_NAME = "score"

    def __init__(self, domain: Domain):
        """Initialise the optimiser with a domain.

        Args:
            domain: :class:`Domain`. The domain of the objective function.
        """
        self.domain = domain
        self._history: List[HistoryPoint] = []

    @property
    def history(self):
        """Return the accumulated optimisation history."""
        return self._history

    @history.setter
    def history(self, history: List[HistoryPoint]):
        """Set the optimiser history.

        This method can be used to warm-start an optimiser.

        Args:
            history: :obj:`List[HistoryPoint]`. New history which will
                **overwrite** the old one.
        """
        self.reset()
        for hp in history:
            self.update(hp.sample, hp.metrics)

    @abc.abstractmethod
    def run_step(self, batch_size, *args: Any, **kwargs: Any) -> List[Sample]:
        """Perform one step of optimisation and suggest the next sample to
        evaluate.

        Args:
            batch_size: (optional) :obj:`int`. The number of samples to
                suggest at once.
            *args: optional arguments for the Optimiser.
            **kwargs: optional keyword arguments for the Optimiser.

        Returns:
            A :obj:`List[Sample]` with the suggested samples to evaluate.
        """
        raise NotImplementedError

    def update(self, x, fx, **kwargs):
        """Update the optimiser's history with new points.

        Args:
            x: :class:`Sample` or :obj:`List[Sample]`. The samples at which the
                objective function has been evaluated.
            fx: :class:`EvaluationScore`, :obj:`float`, :obj:`dict` or a list
                thereof. The evaluation scores at the corresponding samples.
                A :obj:`dict` must map metric names to scores.
            **kwargs: unused by the base class.
        """
        if isinstance(x, Sample):
            self._update_history(x, fx)
        elif (isinstance(x, Sequence)
              and isinstance(fx, Sequence)
              and len(x) == len(fx)):
            for i, j in zip(x, fx):
                self._update_history(i, j)
        else:
            raise ValueError("Update values for `x` and `f(x)` must be either "
                             "a `Sample` and an evaluation or a list thereof.")

    def _update_history(self, x, fx):
        if isinstance(fx, (float, int)):
            history_point = HistoryPoint(
                sample=x,
                metrics={self.DEFAULT_METRIC_NAME: EvaluationScore(fx)}
            )
        elif isinstance(fx, EvaluationScore):
            history_point = HistoryPoint(
                sample=x, metrics={self.DEFAULT_METRIC_NAME: fx})
        elif isinstance(fx, dict):
            metrics = {}
            for key, val in fx.items():
                if isinstance(val, (float, int)):
                    metrics[key] = EvaluationScore(val)
                else:
                    metrics[key] = val
            history_point = HistoryPoint(sample=x, metrics=metrics)
        else:
            raise TypeError(
                "Cannot update history for one sample and multiple evaluations."
                " Use batched update instead and provide a list of samples and "
                "a list of evaluation metrics.")
        self.history.append(history_point)

    def reset(self):
        """Reset the optimiser to the initial state."""
        self._history.clear()


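class ExhaustedSearchSpaceError(Exception):
    """Raised when an optimiser cannot suggest any new samples because the
    search space has been fully explored.
    """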


Optimizer = Optimiser


================================================
FILE: hypertunity/optimisation/bo.py
================================================
"""Bayesian Optimisation using Gaussian Process regression."""

from multiprocessing import cpu_count
from typing import Any, Dict, List, Sequence, Tuple, Type, TypeVar, Union

import GPy
import GPyOpt
import numpy as np
from GPyOpt.core import errors as gpyopt_err

from hypertunity import utils
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import (
    EvaluationScore,
    ExhaustedSearchSpaceError,
    Optimiser
)

__all__ = [
    "BayesianOptimisation",
    "BayesianOptimization"
]

GPyOptSample = TypeVar("GPyOptSample", List[List], np.ndarray)
GPyOptDomain = List[Dict[str, Any]]
GPyOptCategoricalValueMapper = Dict[str, Dict[Any, int]]
GPyOptDiscreteTypeMapper = Dict[str, Dict[Any, type]]


class BayesianOptimisation(Optimiser):
    """Bayesian Optimiser using `GPyOpt` as a backend."""

    CONTINUOUS_TYPE = "continuous"
    DISCRETE_TYPE = "discrete"
    CATEGORICAL_TYPE = "categorical"

    def __init__(self, domain, seed=None):
        """Initialise the optimiser's domain.

        Args:
            domain: :class:`Domain`. The domain of the objective function.
            seed: (optional) :obj:`int`. The seed of the optimiser. Used for
                reproducibility purposes.
        """
        np.random.seed(seed)
        domain = Domain(domain.as_dict(), seed=seed)
        super(BayesianOptimisation, self).__init__(domain)
        converted_and_mappers = self._convert_to_gpyopt_domain(self.domain)
        (
            self.gpyopt_domain,
            self._categorical_value_mapper,
            self._discrete_type_mapper
        ) = converted_and_mappers
        self._inv_categorical_value_mapper = {
            name: {v: k for k, v in mapping.items()}
            for name, mapping in self._categorical_value_mapper.items()
        }
        self._data_x = np.array([[]])
        self._data_fx = np.array([[]])
        self.__is_empty_data = True

    @staticmethod
    def _convert_to_gpyopt_domain(
            orig_domain: Domain
    ) -> Tuple[GPyOptDomain,
               GPyOptCategoricalValueMapper,
               GPyOptDiscreteTypeMapper]:
        """Convert a :class:`Domain` type object to :obj:`GPyOptDomain`.

        Args:
            orig_domain: :class:`Domain` to convert.

        Returns:
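            A tuple of the converted :obj:`GPyOptDomain` object, a value
            mapper and a type mapper. The value mapper assigns each
            categorical value to an integer (0, 1, 2, 3 ...). This is done to
            abstract away the type of the categorical domain from the `GPyOpt`
            internals and thus arbitrary types are supported. The type mapper
            stores the original type of each discrete value, so that it can be
            restored when converting back from a `GPyOpt` sample.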

        Notes:
            The categorical options must be hashable. This behaviour may change
            in the future.
        """
        gpyopt_domain = []
        value_mapper = {}
        type_mapper = {}
        flat_domain = orig_domain.flatten()
        for names, vals in flat_domain.items():
            dim_name = utils.join_strings(names)
            domain_type = Domain.get_type(vals)
            if domain_type == Domain.Continuous:
                dim_type = BayesianOptimisation.CONTINUOUS_TYPE
            elif domain_type == Domain.Discrete:
                dim_type = BayesianOptimisation.DISCRETE_TYPE
                type_mapper[dim_name] = {v: type(v) for v in vals}
            elif domain_type == Domain.Categorical:
                dim_type = BayesianOptimisation.CATEGORICAL_TYPE
                value_mapper[dim_name] = {v: i for i, v in enumerate(vals)}
                vals = tuple(range(len(vals)))
            else:
                raise ValueError(
                    f"Badly specified subdomain {names} with values {vals}."
                )
            gpyopt_domain.append({
                "name": dim_name,
                "type": dim_type,
                "domain": tuple(vals)
            })
        assert len(gpyopt_domain) == len(orig_domain), \
            "Mismatching dimensionality after domain conversion."
        return gpyopt_domain, value_mapper, type_mapper

    def _convert_to_gpyopt_sample(self, orig_sample: Sample) -> GPyOptSample:
        """Convert a sample of type :class:`Sample` to type
        :obj:`GPyOptSample`.

        The inverse conversion is done by `self._convert_from_gpyopt_sample`.

        Args:
            orig_sample: :class:`Sample` type object to be converted.

        Returns:
            A :obj:`GPyOptSample` type object with the same values as
            `orig_sample`.
        """
        gpyopt_sample = []
        # iterate in the order of the GPyOpt domain names
        for dim in self.gpyopt_domain:
            keys = utils.split_string(dim["name"])
            val = orig_sample[keys]
            if dim["type"] == BayesianOptimisation.CATEGORICAL_TYPE:
                val = self._categorical_value_mapper[dim["name"]][val]
            gpyopt_sample.append(val)
        return np.asarray(gpyopt_sample)

    def _convert_from_gpyopt_sample(self, gpyopt_sample: GPyOptSample) -> Sample:
        """Convert :obj:`GPyOptSample` type object to the corresponding
        :class:`Sample` type.

        Args:
            gpyopt_sample: :obj:`GPyOptSample` object to be converted.

        Returns:
            A :class:`Sample` type object with the same values as
                `gpyopt_sample`.
        """
        if len(self.gpyopt_domain) != len(gpyopt_sample):
            raise ValueError(
                "Cannot convert sample with mismatching dimensionality. "
                f"The original space has {len(self.gpyopt_domain)} dimensions "
                f"and the sample {len(gpyopt_sample)} dimensions."
            )
        orig_sample = {}
        for dim, value in zip(self.gpyopt_domain, gpyopt_sample):
            names = utils.split_string(dim["name"])
            sub_dim = orig_sample
            for name in names[:-1]:
                if name not in sub_dim:
                    sub_dim[name] = {}
                sub_dim = sub_dim[name]
            if dim["type"] == BayesianOptimisation.CATEGORICAL_TYPE:
                sub_dim[names[-1]] = self._inv_categorical_value_mapper[dim["name"]][value]
            elif dim["type"] == BayesianOptimisation.DISCRETE_TYPE:
                sub_dim[names[-1]] = self._discrete_type_mapper[dim["name"]][value](value)
            else:
                sub_dim[names[-1]] = value
        return Sample(orig_sample)

    @utils.support_american_spelling
    def run_step(
            self,
            batch_size: int = 1,
            minimise: bool = False,
            **kwargs
    ) -> List[Sample]:
        """Run one step of Bayesian optimisation with a GP regression surrogate
        model.

        The first sample of the domain is chosen at random. Only after the model
        has been updated with at least one (data point, evaluation score)-pair
        are the GPs built and the acquisition function computed and optimised.

        Args:
            batch_size: (optional) :obj:`int`. The number of samples to suggest
                at once. If larger than one, there is no guarantee for the
                optimality of the number of probes.
            minimise: (optional) :obj:`bool`. Whether the objective should be
                minimised.
            **kwargs: optional keyword arguments which will be passed to the
                backend `GPyOpt.methods.BayesianOptimization` optimiser.

        Keyword Args:
            model: :obj:`str` or :obj:`GPy.Model` object. The surrogate model
                used by the backend optimiser.
            kernel: :obj:`GPy.kern.Kern` object. The kernel used by the model.
            variance: :obj:`float`. The variance of the objective function.

        Returns:
            A list of `batch_size`-many :class:`Sample` instances at which the
            objective should be evaluated next.

        Raises:
            :class:`ExhaustedSearchSpaceError`: if the domain is discrete and
            gets exhausted.
        """
        if self.__is_empty_data:
            next_samples = [self.domain.sample() for _ in range(batch_size)]
        else:
            assert len(self._data_x) > 0 and len(self._data_fx) > 0, \
                "Cannot initialise BO from empty data."
            default_kwargs = {
                "num_cores": max(1, min(batch_size, cpu_count() - 1)),
                "normalize_Y": True,
                "acquisition_type": "EI",
                "de_duplication": True,
                "model_type": "GP",
                "evaluator_type": "local_penalization" if batch_size > 1 else "sequential"
            }
            if "model" in kwargs:
                model = kwargs.pop("model")
                # NOTE: Remove this test for model type after the bug in GPyOpt
                #  is fixed: https://github.com/SheffieldML/GPyOpt/issues/183
                if (isinstance(model, str)
                        and model.lower() == "gp_mcmc"
                        and batch_size > 1):
                    raise ValueError(
                        "GP_MCMC model cannot be used with a batch size > 1 "
                        "due to a bug in GPyOpt: "
                        "https://github.com/SheffieldML/GPyOpt/issues/183"
                    )
                kernel = kwargs.pop("kernel", None)
                variance = kwargs.pop("variance", None)
                default_kwargs["model"] = self._build_model(
                    model, kernel, variance
                )
                if (variance is not None
                        and all(np.atleast_1d(np.isclose(variance, 0.0)))):
                    default_kwargs["exact_feval"] = True
            default_kwargs = _overwrite_dict(default_kwargs, kwargs)

            # NOTE: as of GPyOpt 1.2.5 adding new data to an existing model is
            #  not yet possible, hence the object recreation. This behaviour
            #  might be changed in future versions. In this case the code should
            #  be refactored such that `bo` is initialised once and `update`
            #  takes care of the extension of the (X, Y) samples.
            bo = GPyOpt.methods.BayesianOptimization(
                f=None, domain=self.gpyopt_domain,
                maximize=not minimise,
                X=self._data_x,
                # NOTE: the following hack is necessary due to a bug in GPyOpt.
                #  The code should be updated once this gets fixed:
                #  https://github.com/SheffieldML/GPyOpt/issues/180
                Y=(-1 + 2 * minimise) * self._data_fx,
                initial_design_numdata=len(self._data_x),
                batch_size=batch_size,
                **default_kwargs)
            try:
                gpyopt_samples = bo.suggest_next_locations()
            except gpyopt_err.FullyExploredOptimizationDomainError as err:
                raise ExhaustedSearchSpaceError from err
            next_samples = [self._convert_from_gpyopt_sample(s)
                            for s in gpyopt_samples]
        return next_samples

    def _build_model(self, model: Union[str, Type[GPy.Model]] = "GP",
                     kernel: GPy.kern.Kern = None,
                     variance: float = None):
        """Build the surrogate model for the GPyOpt BayesianOptimization.

        The default model is 'gp'. In case of a large number of already
        evaluated samples, a 'sparse_gp' is used to speed up computation.

        Args:
            model: :obj:`str` or :obj:`GPy.Model`, the GP regression model.
            kernel: :obj:`GPy.kern.Kern`, the kernel of the GP regression model.
            variance: :obj:`float`, the variance of the evaluations
                (used only if supported by the model).

        Returns:
            A :obj:`GPy.Model` instance.
        """
        if isinstance(model, GPy.Model):
            return model
        if isinstance(model, str):
            model = model.lower()
            if model == "gp":
                return GPyOpt.models.GPModel(kernel=kernel, noise_var=variance,
                                             sparse=len(self._data_x) > 25)
            if model == "gp_mcmc":
                return GPyOpt.models.GPModel_MCMC(
                    kernel=kernel,
                    noise_var=variance
                )
            raise ValueError(
                f"Unknown model {model}. When supplying a custom kernel or "
                f"the variance of the objective function, the model has to be "
                f"one from {{'GP', 'GP_MCMC'}}. Otherwise you should supply a "
                f"custom `GPy.Model` instance."
            )
        raise TypeError("Argument `model` must be of type str or `GPy.Model`.")

    def update(self, x, fx, **kwargs):
        """Update the surrogate model with the domain sample `x` and the
        function evaluation `fx`.

        Args:
            x: :class:`Sample` or :obj:`List[Sample]`. One or more samples of
                the domain of the objective function.
            fx: a :obj:`float`, an :class:`EvaluationScore` or a :obj:`dict`.
                The evaluation scores of the objective evaluated at `x`. If
                given as :obj:`dict` then it must be a mapping from metric names
                to :class:`EvaluationScore` or :obj:`float` results.
            **kwargs: unused by this model.
        """
        super(BayesianOptimisation, self).update(x, fx)
        # both `converted_x` and `array_fx` must be 2dim arrays
        if isinstance(x, Sample):
            converted_x, array_fx = self._convert_evaluation_sample(x, fx)
        elif (isinstance(x, Sequence)
              and isinstance(fx, Sequence)
              and len(x) == len(fx)):
            # append each history point to the tracked history and
            # convert to numpy arrays
            converted_x, array_fx = map(
                np.concatenate, zip(*[self._convert_evaluation_sample(i, j)
                                      for i, j in zip(x, fx)]))
        else:
            raise ValueError(
                "Update values for `x` and `f(x)` must be either "
                "`Sample` and an evaluation or a list thereof."
            )

        if self._data_x.size == 0:
            self._data_x = converted_x
            self._data_fx = array_fx
        else:
            self._data_x = np.concatenate([self._data_x, converted_x])
            self._data_fx = np.concatenate([self._data_fx, array_fx])
        self.__is_empty_data = False

    def _convert_evaluation_sample(self, x, fx):
        if isinstance(fx, (float, int)):
            array_fx = np.array([[fx]])
        elif isinstance(fx, EvaluationScore):
            array_fx = np.array([[fx.value]])
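        elif isinstance(fx, dict):
            if not len(fx) == 1:
                raise NotImplementedError(
                    "Currently only evaluations with a single metric are supported."
                )
            # the metric can be either an `EvaluationScore` or a plain number
            score = next(iter(fx.values()))
            if isinstance(score, EvaluationScore):
                score = score.value
            array_fx = np.array([[score]])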
        else:
            raise TypeError(
                "Cannot update history for one sample and multiple evaluations."
                " Use batched update instead and provide a list of samples and "
                "a list of evaluation metrics."
            )
        converted_x = self._convert_to_gpyopt_sample(x).reshape(1, -1)
        return converted_x, array_fx

    def reset(self):
        """Reset the optimiser for a fresh start."""
        super(BayesianOptimisation, self).reset()
        self._data_x = np.array([])
        self._data_fx = np.array([])
        self.__is_empty_data = True


BayesianOptimization = BayesianOptimisation


def _overwrite_dict(old_dict, new_dict):
    """Return a new dict with the items of `old_dict` overwritten and
    extended by the items of `new_dict`. Neither input is modified.
    """
    return {**old_dict, **new_dict}


================================================
FILE: hypertunity/optimisation/exhaustive.py
================================================
"""Optimisation by exhaustive search, aka grid search."""

from typing import List

from hypertunity.domain import Domain, DomainNotIterableError, Sample
from hypertunity.optimisation.base import ExhaustedSearchSpaceError, Optimiser

__all__ = [
    "GridSearch"
]


class GridSearch(Optimiser):
    """Grid search pseudo-optimiser."""

    def __init__(self,
                 domain: Domain,
                 sample_continuous: bool = False,
                 seed: int = None):
        """Initialise the :class:`GridSearch` optimiser from a discrete domain.

        If the domain contains continuous subspaces, then they can be sampled
        uniformly if `sample_continuous` is enabled.

        Args:
            domain: :class:`Domain`. The domain to iterate over.
            sample_continuous: (optional) :obj:`bool`. Whether to sample the
                continuous subspaces of the domain.
            seed: (optional) :obj:`int`. Seed for the sampling of the continuous
                subspace if necessary.
        """
        if domain.is_continuous and not sample_continuous:
            raise DomainNotIterableError(
                "Cannot perform grid search on (partially) continuous domain. "
                "To enable grid search in this case, set the argument "
                "'sample_continuous' to True."
            )
        super(GridSearch, self).__init__(domain)
        (
            discrete_domain,
            categorical_domain,
            continuous_domain
        ) = domain.split_by_type()
        # unify the discrete and the categorical into one,
        # as they can be iterated:
        self.discrete_domain = discrete_domain + categorical_domain
        if seed is not None:
            self.continuous_domain = Domain(
                continuous_domain.as_dict(), seed=seed
            )
        else:
            self.continuous_domain = continuous_domain
        self._discrete_domain_iter = iter(self.discrete_domain)
        self._is_exhausted = len(self.discrete_domain) == 0
        self.__exhausted_err = ExhaustedSearchSpaceError(
            "The domain has been exhausted. Reset the optimiser to start again."
        )

    def run_step(self, batch_size: int = 1, **kwargs) -> List[Sample]:
        """Get the next `batch_size` samples from the Cartesian-product walk
        over the domain.

        Args:
            batch_size: (optional) :obj:`int`. The number of samples to suggest
                at once.

        Returns:
            A list of :class:`Sample` instances from the domain.

        Raises:
            :class:`ExhaustedSearchSpaceError`: if the (discrete part of the)
                domain is fully exhausted and no samples can be generated.

        Notes:
            This method does not guarantee that the returned list of
            :class:`Sample` objects will be of length `batch_size`. This is due
            to the size of the domain and the fact that samples will not be
            repeated.
        """
        if self._is_exhausted:
            raise self.__exhausted_err

        samples = []
        for _ in range(batch_size):
            try:
                discrete = next(self._discrete_domain_iter)
            except StopIteration:
                self._is_exhausted = True
                break
            if self.continuous_domain:
                continuous = self.continuous_domain.sample()
                samples.append(discrete + continuous)
            else:
                samples.append(discrete)
        if samples:
            return samples
        raise self.__exhausted_err

    def reset(self):
        """Reset the optimiser to the beginning of the Cartesian-product walk."""
        super(GridSearch, self).reset()
        self._discrete_domain_iter = iter(self.discrete_domain)
        self._is_exhausted = len(self.discrete_domain) == 0


================================================
FILE: hypertunity/optimisation/random.py
================================================
"""Optimisation by a uniformly random search."""

from typing import List

from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import Optimiser

__all__ = [
    "RandomSearch"
]


class RandomSearch(Optimiser):
    """Uniform random sampling pseudo-optimiser."""

    def __init__(self, domain: Domain, seed: int = None):
        """Initialise the :class:`RandomSearch` search space.

        Args:
            domain: :class:`Domain`. The domain of the objective function.
                It will be sampled uniformly using the :py:meth:`sample()`
                method of the :class:`Domain`.
            seed: (optional) :obj:`int`. The seed for the domain sampling.
        """
        if seed is not None:
            domain = Domain(domain.as_dict(), seed=seed)
        super(RandomSearch, self).__init__(domain)

    def run_step(self, batch_size=1, **kwargs) -> List[Sample]:
        """Sample the domain uniformly `batch_size` times.

        Args:
            batch_size: (optional) :obj:`int`. The number of samples to return
                at one step.

        Returns:
            A list of `batch_size` many :class:`Sample` instances.
        """
        return [self.domain.sample() for _ in range(batch_size)]


================================================
FILE: hypertunity/optimisation/tests/__init__.py
================================================


================================================
FILE: hypertunity/optimisation/tests/_common.py
================================================
import numpy as np

from hypertunity.optimisation import EvaluationScore

CONT_1D_ARGMAX = 3.989333
CONT_1D_MAX = 5.958363


def continuous_1d(x):
    """Compute x * sin(2x) + 2 if x in [0, 5] else 0."""
    fx = np.atleast_1d(x * np.sin(2 * x) + 2)
    fx[np.atleast_1d(np.logical_or(x < 0, x > 5))] = 0.
    return fx


CONT_HETEROSCED_1D_ARGMAX = 0.0
CONT_HETEROSCED_1D_MAX = 2.0


def continuous_heteroscedastic_1d(x):
    """Compute 0.2 * x^4 - x^2 + 2 + eps
    where eps ~ N(0, |0.2 * x| + 1e-7) and x in [-2., 2]
    """
    rng = np.random.RandomState(7)
    noise = rng.normal(0., 0.2 * np.abs(x) + 1e-7)
    fx = np.atleast_1d(0.2 * x**4 - x**2 + 2 + noise)
    fx[np.atleast_1d(np.logical_or(x < -2., x > 2.))] = 0.
    return fx


HETEROGEN_3D_ARGMAX = (6.0, "sqr", 0)
HETEROGEN_3D_MAX = 36.0


def heterogeneous_3d(x, y, z):
    """Compute `continuous_1d` + z if y == 'sin', else return x**2 - 3 * z
    where x is continuous, y is categorical ("sin", "sqr"), z is discrete.

    Args:
        x: float or np.ndarray, continuous variable         [-5.0, 6.0]
        y: str, categorical variable                        ("sin", "sqr")
        z: float or int or np.ndarray, discrete variable    (0, 1, 2, 3)
    """
    if y == "sin":
        return (continuous_1d(x) + z)[0]
    elif y == "sqr" and z in [0, 1, 2, 3]:
        return x**2 - 3 * z
    else:
        raise ValueError("`y` can only be 'sin' or 'sqr' and z [0, 1, 2, 3].")


DISCRETE_3D_ARGMAX = (4, 5, "large")
DISCRETE_3D_MAX = 3.0


def discrete_3d(x, y, z):
    """Compute c * x * y where c = 0.1 if z == "small" else 0.15.

    `x` and `y` are discrete numerical values, z is categorical.

    Args:
        x: int, discrete variable                           (1, 2, 3, 4)
        y: int, discrete variable                           (-3, 2, 5)
        z: str, categorical variable                        ("small", "large")
    """
    # reject the input if any single argument is outside its allowed set
    if (x not in {1, 2, 3, 4}
            or y not in {-3, 2, 5}
            or z not in {"small", "large"}):
        raise ValueError("Outside the allowed domain.")
    if z == "small":
        return 0.1 * x * y
    return 0.15 * x * y


def evaluate_continuous_1d(opt, batch_size, n_steps, **kwargs):
    all_samples = []
    all_evaluations = []
    for i in range(n_steps):
        samples = opt.run_step(batch_size, minimise=False, **kwargs)
        evaluations = continuous_1d(np.array([s["x"] for s in samples]))
        opt.update(samples, [EvaluationScore(ev) for ev in evaluations])
        # gather the samples and evaluations for later assessment
        all_samples.extend([s["x"] for s in samples])
        all_evaluations.extend(evaluations)
    best_eval_index = int(np.argmax(all_evaluations))
    best_sample = all_samples[best_eval_index]
    best_eval = all_evaluations[best_eval_index]
    assert np.isclose(best_sample, CONT_1D_ARGMAX, atol=1e-1)
    assert np.isclose(best_eval, CONT_1D_MAX, atol=1e-1)


def evaluate_heterogeneous_3d(opt, batch_size, n_steps):
    all_samples = []
    all_evaluations = []
    for i in range(n_steps):
        samples = opt.run_step(batch_size, minimise=False)
        evaluations = [heterogeneous_3d(s["x"], s["y"], s["z"])
                       for s in samples]
        opt.update(samples, [EvaluationScore(ev) for ev in evaluations])
        # gather the samples and evaluations for later assessment
        all_samples.extend([(s["x"], s["y"], s["z"]) for s in samples])
        all_evaluations.extend(evaluations)
    best_eval_index = int(np.argmax(all_evaluations))
    best_sample = all_samples[best_eval_index]
    best_eval = all_evaluations[best_eval_index]
    assert np.isclose(best_sample[0], HETEROGEN_3D_ARGMAX[0], atol=1.0)
    assert best_sample[1:] == HETEROGEN_3D_ARGMAX[1:]
    assert np.isclose(best_eval, HETEROGEN_3D_MAX, atol=1.0)


def evaluate_discrete_3d(opt, batch_size, n_steps):
    all_samples = []
    all_evaluations = []
    for i in range(n_steps):
        samples = opt.run_step(batch_size, minimise=False)
        evaluations = [discrete_3d(s["x"], s["y"], s["z"]) for s in samples]
        opt.update(samples, [EvaluationScore(ev) for ev in evaluations])
        # gather the samples and evaluations for later assessment
        all_samples.extend([(s["x"], s["y"], s["z"]) for s in samples])
        all_evaluations.extend(evaluations)
    best_eval_index = int(np.argmax(all_evaluations))
    best_sample = all_samples[best_eval_index]
    best_eval = all_evaluations[best_eval_index]
    assert best_sample == DISCRETE_3D_ARGMAX
    assert best_eval == DISCRETE_3D_MAX


================================================
FILE: hypertunity/optimisation/tests/test_bo.py
================================================
import GPy
import numpy as np
import pytest

from hypertunity.domain import Domain
from hypertunity.optimisation import base, bo

from . import _common as test_utils


def test_bo_update_and_reset():
    domain = Domain({"a": {"b": [2, 3], "d": {"f": [3, 4]}}, "c": [0, 0.1]})
    bayes_opt = bo.BayesianOptimisation(domain, seed=7)
    samples = []
    n_reps = 3
    for i in range(n_reps):
        samples.extend(bayes_opt.run_step(batch_size=1, minimise=False))
        bayes_opt.update(samples[-1], base.EvaluationScore(2. * i))
    assert len(bayes_opt._data_x) == n_reps
    assert len(bayes_opt._data_fx) == n_reps
    assert np.all(
        bayes_opt._data_x == np.array([bayes_opt._convert_to_gpyopt_sample(s)
                                       for s in samples])
    )
    assert np.all(
        bayes_opt._data_fx == 2. * np.arange(n_reps).reshape(n_reps, 1)
    )
    bayes_opt.reset()
    assert len(bayes_opt.history) == 0


def test_bo_set_history():
    n_samples = 10
    domain = Domain({"a": {"b": [2, 3]}, "c": [0, 0.1]})
    history = [
        base.HistoryPoint(
            domain.sample(),
            {"score": base.EvaluationScore(float(i))}
        )
        for i in range(n_samples)
    ]
    bayes_opt = bo.BayesianOptimisation(domain, seed=7)
    bayes_opt.history = history
    assert bayes_opt.history == history
    assert len(bayes_opt._data_x) == len(bayes_opt._data_fx) == len(history)


@pytest.mark.slow
def test_bo_simple_continuous():
    domain = Domain({"x": [-1., 6.]})
    bayes_opt = bo.BayesianOptimization(domain=domain, seed=7)
    test_utils.evaluate_continuous_1d(bayes_opt, batch_size=2, n_steps=7)


@pytest.mark.slow
def test_bo_simple_mixed():
    domain = Domain({"x": [-5., 6.], "y": {"sin", "sqr"}, "z": set(range(4))})
    bayes_opt = bo.BayesianOptimization(domain=domain, seed=7)
    test_utils.evaluate_heterogeneous_3d(bayes_opt, batch_size=7, n_steps=3)


@pytest.mark.slow
def test_bo_custom_model():
    domain = Domain({"x": [-2., 2.]})
    bayes_opt = bo.BayesianOptimisation(domain=domain, seed=7)
    kernel = GPy.kern.RBF(1) + GPy.kern.Bias(1)
    n_steps = 3
    batch_size = 3
    all_samples = []
    all_evaluations = []
    first_samples = bayes_opt.run_step(batch_size=batch_size, minimise=False)
    xs = np.atleast_2d([s["x"] for s in first_samples])
    ys = np.atleast_2d(test_utils.continuous_heteroscedastic_1d(
        np.array([s["x"] for s in first_samples]))
    )
    for i in range(n_steps):
        custom_model = GPy.models.GPHeteroscedasticRegression(xs, ys, kernel)
        samples = bayes_opt.run_step(
            batch_size,
            minimise=False,
            model=custom_model
        )
        evaluations = test_utils.continuous_heteroscedastic_1d(
            np.array([s["x"] for s in samples])
        )
        bayes_opt.update(
            samples, [base.EvaluationScore(ev) for ev in evaluations]
        )
        xs = np.concatenate(
            [xs, np.atleast_2d([s["x"] for s in samples])], axis=0
        )
        ys = np.concatenate([ys, np.atleast_2d(evaluations)], axis=0)
        # gather the samples and evaluations for later assessment
        all_samples.extend([s["x"] for s in samples])
        all_evaluations.extend(evaluations)
    best_eval_index = int(np.argmax(all_evaluations))
    best_sample = all_samples[best_eval_index]
    assert np.isclose(
        best_sample, test_utils.CONT_HETEROSCED_1D_ARGMAX, atol=1e-1
    )


@pytest.mark.skip("Due to https://github.com/SheffieldML/GPyOpt/issues/260"
                  " using GP_MCMC model can not be tested yet.")
@pytest.mark.slow
def test_bo_gp_mcmc_model():
    domain = Domain({"x": [-1., 6.]})
    bayes_opt = bo.BayesianOptimization(domain=domain, seed=7)
    test_utils.evaluate_continuous_1d(
        bayes_opt,
        batch_size=1,
        n_steps=7,
        model="GP_MCMC",
        evaluator_type="sequential"
    )


================================================
FILE: hypertunity/optimisation/tests/test_exhaustive.py
================================================
import pytest

from hypertunity.domain import Domain
from hypertunity.optimisation import exhaustive

from . import _common as test_utils


def test_grid_simple_discrete():
    domain = Domain({
        "x": {1, 2, 3, 4},
        "y": {-3, 2, 5},
        "z": {"small", "large"}
    })
    gs = exhaustive.GridSearch(domain=domain)
    test_utils.evaluate_discrete_3d(gs, batch_size=4, n_steps=3 * 2)
    with pytest.raises(exhaustive.ExhaustedSearchSpaceError):
        gs.run_step(batch_size=4)
    gs.reset()
    assert len(gs.run_step(batch_size=4)) == 4


def test_grid_simple_mixed():
    domain = Domain({"x": [-5., 6.], "y": {"sin", "sqr"}, "z": set(range(4))})
    with pytest.raises(exhaustive.DomainNotIterableError):
        _ = exhaustive.GridSearch(domain)
    gs = exhaustive.GridSearch(domain, sample_continuous=True, seed=93)
    assert len(gs.run_step(batch_size=8)) == 8


def test_update():
    domain = Domain({"x": {-5., 6.}})
    gs = exhaustive.GridSearch(domain)
    gs.update([domain.sample() for _ in range(10)], list(range(10)))
    gs.update(domain.sample(), {"score": 23.0})
    gs.update(domain.sample(), 2.0)
    assert len(gs.history) == 12


================================================
FILE: hypertunity/optimisation/tests/test_random.py
================================================
from hypertunity.domain import Domain
from hypertunity.optimisation import random

from . import _common as test_utils


def test_random_simple_continuous():
    domain = Domain({"x": [-1., 6.]})
    rs = random.RandomSearch(domain=domain, seed=7)
    test_utils.evaluate_continuous_1d(rs, batch_size=50, n_steps=2)


def test_random_simple_mixed():
    domain = Domain({"x": [-5., 6.], "y": {"sin", "sqr"}, "z": set(range(4))})
    rs = random.RandomSearch(domain=domain, seed=1)
    test_utils.evaluate_heterogeneous_3d(rs, batch_size=50, n_steps=25)


def test_update():
    domain = Domain({"x": [-5., 6.]})
    rs = random.RandomSearch(domain)
    rs.update([domain.sample() for _ in range(4)], list(range(4)))
    rs.update(domain.sample(), {"score": 23.0})
    rs.update(domain.sample(), 2.0)
    assert len(rs.history) == 6
    rs.reset()
    assert len(rs.history) == 0


================================================
FILE: hypertunity/reports/__init__.py
================================================
from .base import Reporter
from .table import Table


================================================
FILE: hypertunity/reports/base.py
================================================
import abc
import datetime
import os
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

import tinydb

from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import EvaluationScore, HistoryPoint

__all__ = [
    "Reporter"
]

HistoryEntryType = Union[
    HistoryPoint,
    Tuple[Sample, Union[float, Dict[str, float], Dict[str, EvaluationScore]]]
]


class Reporter:
    """Abstract class :class:`Reporter` for result visualisation."""

    def __init__(self, domain: Domain,
                 metrics: List[str],
                 primary_metric: str = "",
                 database_path: str = None):
        """Initialise the base reporter with domain and metrics.

        Args:
            domain: A :class:`Domain` from which all evaluated samples are drawn.
            metrics: :obj:`List[str]` with names of the metrics used during
                evaluation.
            primary_metric: (optional) :obj:`str` primary metric from `metrics`.
                This is used to determine the best sample. Defaults to the first one.
            database_path: (optional) :obj:`str` path to the database for
                storing experiment history on disk. Defaults to in-memory storage.
        """
        self.domain = domain
        if not metrics:
            self.metrics = ["score"]
        else:
            self.metrics = metrics
        if not primary_metric:
            self.primary_metric = self.metrics[0]
        else:
            self.primary_metric = primary_metric

        self._default_table_name = f"trial_{datetime.datetime.now().isoformat()}"
        if database_path is not None:
            if not os.path.exists(database_path):
                os.makedirs(database_path)
            db_path = os.path.join(database_path, "db.json")
            self._db = tinydb.TinyDB(
                db_path,
                sort_keys=True,
                indent=4,
                separators=(',', ': ')
            )
        else:
            from tinydb.storages import MemoryStorage
            self._db = tinydb.TinyDB(storage=MemoryStorage,
                                     default_table=self._default_table_name)
        self._db_default_table = self._db.table(self._default_table_name)

    @property
    def database(self):
        """Return the logging database."""
        return self._db

    @property
    def default_database_table(self):
        """Return the default database table name."""
        return self._default_table_name

    def log(self, entry: HistoryEntryType, **kwargs: Any):
        """Create an entry for an optimisation history point in the
        :class:`Reporter`.

        Args:
            entry: :class:`HistoryPoint` or :obj:`Tuple[Sample, Dict]`.
                The history point to log. If given as a tuple of :class:`Sample`
                instance and a mapping from metric names to results, the
                variance of the evaluation noise can be supplied by adding
                an entry in the dict with the metric name and the suffix '_var'.
            **kwargs: (optional) :obj:`Any`. Additional arguments for the
                logging implementation in a subclass.

        Keyword Args:
            meta: (optional) additional information to be logged in the database
                for this entry.
        """
        if isinstance(entry, Tuple):
            log_fn = self._log_tuple
        elif isinstance(entry, HistoryPoint):
            self._add_to_db(entry, kwargs.pop("meta", None))
            log_fn = self._log_history_point
        else:
            raise TypeError(
                "The history point can be either a tuple or a "
                "`HistoryPoint` type object."
            )
        log_fn(entry, **kwargs)

    def _log_tuple(self, entry: Tuple, **kwargs):
        """Helper function to convert the history entry from tuple to
        :class:`HistoryPoint` and then log it using the overridden method
        :py:meth:`_log_history_point`.
        """
        if not (len(entry) == 2 and isinstance(entry[0], Sample)
                and isinstance(entry[1], (Dict, EvaluationScore, float))):
            raise ValueError(f"Malformed history entry tuple: {entry}.")
        sample, metrics_obj = entry
        if isinstance(metrics_obj, (float, EvaluationScore)):
            # use default name for score column
            metrics_obj = {self.primary_metric: metrics_obj}
        metrics = {}
        # create a properly formatted metrics dict of type Dict[str, EvaluationScore]
        for name, val in metrics_obj.items():
            if name in metrics:
                continue
            if name.endswith("_var"):
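                # slice off the suffix; str.rstrip("_var") would strip any
                # trailing '_', 'v', 'a' or 'r' characters instead
                metric_name = name[:-len("_var")]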
                if (metric_name not in metrics_obj
                        or not isinstance(metrics_obj[metric_name], float)):
                    raise ValueError(
                        f"Metrics dict does not contain a proper value "
                        f"for metric {metric_name}."
                    )
                metrics[metric_name] = EvaluationScore(
                    value=metrics_obj[metric_name],
                    variance=val
                )
            elif isinstance(val, EvaluationScore):
                metrics[name] = val
            elif isinstance(val, float):
                metrics[name] = EvaluationScore(
                    value=val,
                    variance=metrics_obj.get(f"{name}_var", 0.0)
                )
        entry = HistoryPoint(sample=sample, metrics=metrics)
        self._add_to_db(entry, kwargs.pop("meta", None))
        self._log_history_point(entry, **kwargs)

    @abc.abstractmethod
    def _log_history_point(self, entry: HistoryPoint, **kwargs: Any):
        """Abstract method to override.

        Log the :class:`HistoryPoint` entry into the reporter.

        Args:
            entry: :class:`HistoryPoint`. The sample and evaluation metrics to log.
        """
        raise NotImplementedError

    def _add_to_db(self, entry: HistoryPoint, meta: Any = None):
        document = self._convert_history_to_doc(entry)
        if meta is not None:
            document["meta"] = meta
        self._db_default_table.insert(document)

    def get_best(self, criterion: Union[str, Callable] = "max") -> Optional[Dict[str, Any]]:
        """Return the entry from the database which corresponds to the best
        scoring experiment.

        Args:
            criterion: :obj:`str` or :obj:`Callable`. The function used to
                determine whether the highest or lowest score is requested. If
                several evaluation metrics are present, then a custom `criterion`
                must be supplied.

        Returns:
            JSON object or `None` if the database is empty. The content of the
            database for the best experiment.
        """
        if not self._db_default_table:
            return None
        if isinstance(criterion, str):
            predefined = {"max": max, "min": min}
            if criterion not in predefined:
                raise ValueError(
                    f"Unknown criterion for finding best experiment. "
                    f"Select one from {list(predefined.keys())} "
                    f"or supply a custom function."
                )
            selection_fn = predefined[criterion]
        elif isinstance(criterion, Callable):
            selection_fn = criterion
        else:
            raise TypeError("The criterion must be of type str or Callable.")
        return self._get_best_from_db(selection_fn)

    def _get_best_from_db(self, selection_fn: Callable):
        best_entry = self._db_default_table.get(doc_id=1)
        best_score = best_entry["metrics"][self.primary_metric]["value"]
        for entry in self._db_default_table:
            current_score = entry["metrics"][self.primary_metric]["value"]
            new_score = selection_fn(current_score, best_score)
            if new_score != best_score:
                best_entry = entry
                best_score = new_score
        return best_entry

    def from_history(self, history: List[HistoryEntryType]):
        """Load the reporter with data from a history of evaluations.

        Args:
            history: :obj:`List[HistoryPoint]` or :obj:`List[Tuple]`. The
                sequence of evaluations comprised of samples and metrics.
        """
        for h in history:
            self.log(h)

    def from_database(self, database: Union[str, tinydb.TinyDB], table: str = None):
        """Load history from a database supplied as a path to a file or a
        :obj:`tinydb.TinyDB` object.

        Args:
            database: :obj:`str` or :obj:`tinydb.TinyDB`. The database to load.
            table: (optional) :obj:`str`. The table to load from the database.
                This argument is not required if the database has only one table.

        Raises:
            :class:`ValueError`: if the database contains more than one table
                and `table` is not given.
        """
        if isinstance(database, str):
            db = tinydb.TinyDB(database, sort_keys=True, indent=4, separators=(',', ': '))
        elif isinstance(database, tinydb.TinyDB):
            db = database
        else:
            raise TypeError("The database must be of type str or tinydb.TinyDB.")
        if len(db.tables()) > 1 and table is None:
            raise ValueError(
                "Ambiguous database with multiple tables. "
                "Specify a table name."
            )
        if table is None:
            table = list(db.tables())[0]
        self._db = db
        self._db_default_table = self._db.table(table)

    def to_history(self, table: str = None) -> List[HistoryPoint]:
        """Export the reporter logged history from a database table to an
        optimiser-friendly history.

        Args:
            table: (optional) :obj:`str`. The name of the table to export.
                Defaults to the one created during reporter initialisation.

        Returns:
            A list of :class:`HistoryPoint` objects which can be loaded into
            an :class:`Optimiser` instance.
        """
        history = []
        if table is None:
            default_table = self._db_default_table
        else:
            default_table = self._db.table(table)
        for doc in default_table:
            history.append(self._convert_doc_to_history(doc))
        return history

    @staticmethod
    def _convert_history_to_doc(entry: HistoryPoint) -> Dict:
        db_entry = {
            "sample": entry.sample.as_dict(),
            "metrics": {k: {
                "value": v.value,
                "variance": v.variance
            } for k, v in entry.metrics.items()}
        }
        return db_entry

    @staticmethod
    def _convert_doc_to_history(document: Dict) -> HistoryPoint:
        hist_point = HistoryPoint(
            sample=Sample(document["sample"]),
            metrics={k: EvaluationScore(v["value"], v["variance"])
                     for k, v in document["metrics"].items()}
        )
        return hist_point


================================================
FILE: hypertunity/reports/table.py
================================================
from typing import Any, List, Union

import beautifultable as bt
import numpy as np
import tinydb

from hypertunity import utils
from hypertunity.domain import Domain
from hypertunity.optimisation.base import HistoryPoint

from .base import Reporter

__all__ = [
    "Table"
]


class Table(Reporter):
    """A :class:`Reporter` subclass to print and store a formatted table of
    the results.
    """

    def __init__(self, domain: Domain,
                 metrics: List[str],
                 primary_metric: str = "",
                 database_path: str = None):
        """Initialise the table reporter with domain and metrics.

        Args:
            domain: A :class:`Domain` from which all evaluated samples are drawn.
            metrics: :obj:`List[str]` with names of the metrics used during evaluation.
            primary_metric: (optional) :obj:`str` primary metric from `metrics`.
                This is used to determine the best sample. Defaults to the first one.
            database_path: (optional) :obj:`str` path to the database for
                storing experiment history on disk. Defaults to in-memory storage.
        """
        super(Table, self).__init__(
            domain, metrics, primary_metric, database_path
        )
        self._table = bt.BeautifulTable()
        self._table.set_style(bt.STYLE_SEPARATED)
        dim_names = [".".join(dns) for dns in self.domain.flatten()]
        self._table.column_headers = ["No.", *dim_names, *self.metrics]

    def __str__(self):
        """Return the string representation of the table."""
        return str(self._table)

    @property
    def data(self) -> np.ndarray:
        """Return the table as a numpy array."""
        return np.array(self._table)

    def _log_history_point(self, entry: HistoryPoint, **kwargs: Any):
        """Create an entry for a :class:`HistoryPoint` in the table.

        Args:
            entry: :class:`HistoryPoint`. The history point to log. The row
                contains a running number, the flattened sample values and
                the metric scores.
        """
        id_ = len(self._table)
        row = [id_ + 1,
               *entry.sample.flatten().values(),
               *entry.metrics.values()]
        self._table.append_row(row)

    @utils.support_american_spelling
    def format(self, order: str = "none", emphasise: bool = False) -> str:
        """Format the table and return it as a string.

        Supported formatting is sorting and emphasising of the best result.

        Args:
            order: (optional) :obj:`str`. The order of sorting by the primary
                metric. Can be "none", "ascending" or "descending".
                Defaults to "none".
            emphasise: (optional) :obj:`bool`. Whether to emphasise the best
                experiment by marking it in yellow and blinking if supported.
                Defaults to `False`.

        Returns:
            :obj:`str` of the formatted table.
        """
        table_copy = self._table.copy()
        if order not in ["none", "descending", "ascending"]:
            raise ValueError(
                "`order` argument can only be 'none', 'ascending' "
                "or 'descending'."
            )
        if order != "none":
            table_copy.sort(
                key=self.primary_metric,
                reverse=order == "descending"
            )
        if emphasise:
            best_row_ind = int(np.argmax(
                list(table_copy.get_column(self.primary_metric))
            ))
            emphasised_best_row = map(
                lambda x: f"\033[33;5;7m{x}\033[0m", table_copy[best_row_ind]
            )
            table_copy.update_row(best_row_ind, emphasised_best_row)
        return str(table_copy)

    def from_database(self, database: Union[str, tinydb.TinyDB], table: str = None):
        """Load history from a database supplied as a path to a file or a
        :obj:`tinydb.TinyDB` object.

        Args:
            database: :obj:`str` or :obj:`tinydb.TinyDB`. The database to load.
            table: (optional) :obj:`str`. The table to load from the database.
                This argument is not required if the database has only one table.

        Raises:
            :class:`ValueError`: if the database contains more than one table
            and `table` is not given.
        """
        super(Table, self).from_database(database, table)
        for doc in self._db_default_table:
            history_point = self._convert_doc_to_history(doc)
            self._log_history_point(history_point)


================================================
FILE: hypertunity/reports/tensorboard.py
================================================
import os
import sys
from typing import Any, Dict, List, Union

import tinydb

from hypertunity import utils
from hypertunity.domain import Domain, Sample
from hypertunity.optimisation.base import HistoryPoint

from .base import Reporter

try:
    import tensorflow as tf
    from tensorboard.plugins.hparams import api as hp
except ImportError as err:
    raise ImportError("Install tensorflow>=1.14 and tensorboard>=1.14 "
                      "to support the HParams plugin.") from err


__all__ = [
    "Tensorboard"
]

EAGER_MODE = tf.executing_eagerly()
session_builder = tf.compat.v1.Session
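# compare the major version numerically; a string comparison would
# misorder versions such as "10.0" and "2.0"
if int(str(tf.version.VERSION).split(".")[0]) < 2: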
    summary_file_writer = tf.compat.v2.summary.create_file_writer
    summary_scalar = tf.compat.v2.summary.scalar
else:
    summary_file_writer = tf.summary.create_file_writer
    summary_scalar = tf.summary.scalar


class Tensorboard(Reporter):
    """A :class:`Reporter` subclass to visualise the results in Tensorboard.

    It utilises Tensorboard's HParams plugin as a dashboard for the summary of
    the optimisation. This class prepares and creates entries with the scalar
    data of the experiment trials, containing the domain sample and the
    corresponding metrics.

    Notes:
        The user is responsible for launching TensorBoard in the browser.
    """

    def __init__(self, domain: Domain, metrics: List[str], logdir: str,
                 primary_metric: str = "",
                 database_path: str = None):
        """Initialise the TensorBoard reporter.

        Args:
            domain: :class:`Domain`. The domain to which all evaluated samples belong.
            metrics: :obj:`List[str]`. The names of the metrics.
            logdir: :obj:`str`. Path to a folder for storing the Tensorboard events.
            primary_metric: (optional) :obj:`str`. Primary metric from `metrics`.
                This is used to determine the best sample. Default is the
                first one.
            database_path: (optional) :obj:`str`. The path to the database for
                storing experiment history on disk. Default is in-memory storage.
        """
        super(Tensorboard, self).__init__(
            domain, metrics, primary_metric, database_path
        )
        self._hparams_domain = self._convert_to_hparams_domain(self.domain)
        if not os.path.exists(logdir):
            os.makedirs(logdir)
        self._logdir = logdir
        self._experiment_counter = 0
        self._set_up()
        print(f"Run 'tensorboard --logdir={logdir}' to launch "
              f"the visualisation in TensorBoard", file=sys.stderr)

    @staticmethod
    def _convert_to_hparams_domain(domain: Domain) -> Dict[str, hp.HParam]:
        hparams = {}
        for var_name, dim in domain.flatten().items():
            dim_type = Domain.get_type(dim)
            joined_name = utils.join_strings(var_name, join_char="/")
            if dim_type == Domain.Continuous:
                hp_dim_type = hp.RealInterval
                vals = list(map(float, dim))
            elif dim_type in [Domain.Discrete, Domain.Categorical]:
                hp_dim_type = hp.Discrete
                vals = (dim,)
            else:
                raise TypeError(
                    f"Cannot map subdomain of type {dim_type} "
                    f"to a known HParams domain."
                )
            hparams[joined_name] = hp.HParam(joined_name, hp_dim_type(*vals))
        return hparams

    def _convert_to_hparams_sample(self, sample: Sample) -> Dict[hp.HParam, Any]:
        hparams = {}
        for name, val in sample:
            joined_name = utils.join_strings(name, join_char="/")
            hparams[self._hparams_domain[joined_name]] = val
        return hparams

    def _set_up(self):
        with summary_file_writer(self._logdir).as_default():
            hp.hparams_config(
                hparams=self._hparams_domain.values(),
                metrics=[hp.Metric(m) for m in self.metrics])

    @staticmethod
    def _log_tf_eager_mode(params, metrics, full_experiment_dir):
        """Log in eager mode."""
        with summary_file_writer(full_experiment_dir).as_default():
            hp.hparams(params)
            for metric_name, metric_value in metrics.items():
                summary_scalar(metric_name, metric_value.value, step=1)

    @staticmethod
    def _log_tf_graph_mode(params, metrics, full_experiment_dir):
        """Log in legacy graph execution mode with session creation."""
        with summary_file_writer(full_experiment_dir).as_default() as fw, session_builder() as sess:
            sess.run(fw.init())
            sess.run(hp.hparams(params))
            for metric_name, metric_value in metrics.items():
                sess.run(summary_scalar(metric_name, metric_value.value, step=1))
            sess.run(fw.flush())

    def _log_history_point(self, entry: HistoryPoint, experiment_dir: str = None):
        """Create an entry for a :class:`HistoryPoint` in Tensorboard.

        Args:
            entry: :class:`HistoryPoint`. The sample and evaluation metrics to log.
            experiment_dir: (optional) :obj:`str`. The directory name where to
                store all experiment related data. It will be prefixed by the
                `logdir` path which is provided on initialisation of the
                :class:`Tensorboard` object. Default is 'experiment_[number]'.
        """
        converted = self._convert_to_hparams_sample(entry.sample)
        if not experiment_dir:
            experiment_dir = f"experiment_{self._experiment_counter}"
            self._experiment_counter += 1
        full_experiment_dir = os.path.join(self._logdir, experiment_dir)
        if EAGER_MODE:
            self._log_tf_eager_mode(converted, entry.metrics, full_experiment_dir)
        else:
            self._log_tf_graph_mode(converted, entry.metrics, full_experiment_dir)

    def from_database(self, database: Union[str, tinydb.TinyDB], table: str = None):
        """Load history from a database supplied as a path to a file or a
        :obj:`tinydb.TinyDB` object.

        Args:
            database: :obj:`str` or :obj:`tinydb.TinyDB`. The database to load.
            table: (optional) :obj:`str`. The table to load from the database.
                This argument is not required if the database has only one table.

        Raises:
            :class:`ValueError`: if the database contains more than one table
            and `table` is not given.
        """
        super(Tensorboard, self).from_database(database, table)
        for doc in self._db_default_table:
            history_point = self._convert_doc_to_history(doc)
            self._log_history_point(history_point)


================================================
FILE: hypertunity/reports/tests/__init__.py
================================================


================================================
FILE: hypertunity/reports/tests/conftest.py
================================================
import pytest

from hypertunity.domain import Domain
from hypertunity.optimisation.base import EvaluationScore, HistoryPoint


@pytest.fixture(scope="session")
def generated_history():
    domain = Domain({
        "x": [-5., 6.],
        "y": {"sin", "sqr"},
        "z": set(range(4))
    }, seed=7)
    n_samples = 10
    history = [HistoryPoint(sample=domain.sample(),
                            metrics={"metric_1": EvaluationScore(float(i)),
                                     "metric_2": EvaluationScore(i * 2.)})
               for i in range(n_samples)]
    if len(history) == 1:
        history = history[0]
    return history, domain


================================================
FILE: hypertunity/reports/tests/test_table.py
================================================
import os
import tempfile

from hypertunity.optimisation.base import EvaluationScore

from ..table import Table


def test_from_to_history(generated_history):
    history, domain = generated_history
    rep = Table(
        domain,
        metrics=["metric_1", "metric_2"],
        primary_metric="metric_1"
    )
    rep.from_history(history)
    data_history = [
        [i + 1, *list(h.sample.flatten().values()), *list(h.metrics.values())]
        for i, h in enumerate(history)
    ]
    assert rep.data.tolist() == data_history
    assert rep.to_history() == history


def test_from_tuple_and_history_point(generated_history):
    history, domain = generated_history
    hist_point = history[0]
    rep = Table(
        domain,
        metrics=["metric_1", "metric_2"],
        primary_metric="metric_1"
    )
    rep.log(hist_point)
    sample = domain.sample()
    rep.log((sample, {"metric_1": 1.0, "metric_2": 2.0, "metric_2_var": 3.0}))
    assert rep.data.tolist() == [
        [1, *list(hist_point.sample.flatten().values()),
         *list(hist_point.metrics.values())],
        [2, *list(sample.flatten().values()),
         EvaluationScore(1.0), EvaluationScore(2.0, 3.0)]
    ]


def test_database_and_get_best(generated_history):
    history, domain = generated_history
    with tempfile.TemporaryDirectory() as db_dir:
        rep = Table(
            domain,
            metrics=["metric_1", "metric_2"],
            database_path=db_dir
        )
        best_meta, best_metrics, best_sample = {}, {}, {}
        best_score = float("-inf")
        for i, hp in enumerate(history):
            rep.log(hp, meta={"id": i})
            if hp.metrics["metric_1"].value > best_score:
                best_meta = {"id": i}
                best_metrics = {k: {"value": v.value, "variance": v.variance}
                                for k, v in hp.metrics.items()}
                best_sample = hp.sample.as_dict()
                best_score = hp.metrics["metric_1"].value

        assert len(rep.database.table(rep.default_database_table)) == len(history)
        best_entry = rep.get_best(criterion="max")
        assert best_entry["meta"] == best_meta
        assert best_entry["metrics"] == best_metrics
        assert best_entry["sample"] == best_sample

        rep2 = Table(domain, metrics=["metric_1", "metric_2"])
        rep2.from_database(rep.database, table=rep.default_database_table)
        rep3 = Table(domain, metrics=["metric_1", "metric_2"])
        rep3.from_database(os.path.join(db_dir, "db.json"),
                           table=rep.default_database_table)

        assert str(rep) == str(rep2) == str(rep3)
        assert rep.get_best() == rep2.get_best() == rep3.get_best()


================================================
FILE: hypertunity/reports/tests/test_tensorboard.py
================================================
import os
import tempfile

from ..tensorboard import Tensorboard


def test_from_to_history(generated_history):
    history, domain = generated_history
    with tempfile.TemporaryDirectory() as tmp_dir:
        rep = Tensorboard(
            domain,
            metrics=["metric_1", "metric_2"],
            logdir=tmp_dir
        )
        rep.from_history(history)
        assert len([dirname for dirname in os.listdir(tmp_dir)
                    if dirname.startswith("experiment_")]) == len(history)
        for root, dirs, files in os.walk(tmp_dir):
            assert all(map(lambda x: x.startswith("events.out.tfevents"), files))
        assert rep.to_history() == history


def test_from_tuple_and_history_point(generated_history):
    history, domain = generated_history
    hist_point = history[0]
    with tempfile.TemporaryDirectory() as tmp_dir:
        rep = Tensorboard(
            domain,
            metrics=["metric_1", "metric_2"],
            logdir=tmp_dir
        )
        rep.log(hist_point)
        rep.log((domain.sample(),
                 {"metric_1": 1.0, "metric_2": 2.0, "metric_2_var": 3.0}))
        assert len([dirname for dirname in os.listdir(tmp_dir)
                    if dirname.startswith("experiment_")]) == 2
        for root, dirs, files in os.walk(tmp_dir):
            assert all(map(lambda x: x.startswith("events.out.tfevents"), files))


================================================
FILE: hypertunity/scheduling/__init__.py
================================================
from .jobs import *
from .scheduler import *


================================================
FILE: hypertunity/scheduling/jobs.py
================================================
"""Definition of `Job` and `Result` classes used to encapsulate an experiment
and the corresponding outcomes.
"""

import enum
import importlib
import os
import pickle
import re
import subprocess
import sys
import tempfile
import time
from dataclasses import dataclass, field
from functools import partial
from typing import Any, Callable, Dict, List, Tuple, Union

__all__ = [
    "Job",
    "SlurmJob",
    "Result"
]

# Global registries to control the job and result id assignment
_JOB_REGISTRY = set()
_RESULT_REGISTRY = set()
_ID_COUNTER = -1


def reset_registry():
    """Reset the global job and result registries.

    Notes:
        This function should be used with care as it will allow for jobs with
        repeating IDs to be created. As a consequence, two or more
        :class:`Result` objects might coexist and make the actual experiment
        outcome ambiguous.
    """
    global _ID_COUNTER
    _JOB_REGISTRY.clear()
    _RESULT_REGISTRY.clear()
    _ID_COUNTER = -1


def generate_id():
    """Generate a new, unused integer job id."""
    global _ID_COUNTER
    _ID_COUNTER += 1
    return _ID_COUNTER


def import_script(path):
    """Import a module or script by a given path.

    Args:
        path: :obj:`str`, can be either a module import of the form
            [package.]*[module] if the outer most package is in the
            `PYTHONPATH`, or a path to an arbitrary python script.

    Returns:
        The loaded python script as a module.
    """
    try:
        module = importlib.import_module(path)
    except ModuleNotFoundError:
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Cannot find script {path}.")
        if not os.path.basename(path).endswith(".py"):
            raise ValueError(
                f"Expected a python script ending with *.py, "
                f"found {os.path.basename(path)}.")
        import_path = os.path.dirname(os.path.abspath(path))
        sys.path.append(import_path)
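        # use splitext instead of rstrip('.py'): rstrip removes any trailing
        # '.', 'p' or 'y' characters and would mangle names like 'happy.py'
        module = importlib.import_module(
            os.path.splitext(os.path.basename(path))[0],
            package=f"{os.path.basename(import_path)}"
        )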
        sys.path.pop()
    return module


def run_command(cmd: List[str]) -> str:
    """Execute a command in the shell.

    Args:
        cmd: :obj:`List[str]`. The command with its arguments to execute.

    Returns:
        The standard output of the command.

    Raises:
        :obj:`OSError`: if the standard error stream is not empty.
    """
    ps = subprocess.run(args=cmd, capture_output=True)
    if ps.stderr:
        raise OSError(f"Failed running {' '.join(cmd)} with error message: "
                      f"{ps.stderr.decode('utf-8')}.")
    return ps.stdout.decode("utf-8")


def get_callable_from_script(script_path: str, func_name: str = "main") -> Callable:
    """Create a callable which imports a script or module and calls one of
    its functions.

    Args:
        script_path: str, the file path to the python script to run. It can
            either be given as a module i.e. in the [package.]*[module] form,
            or as a path to a *.py file in case it is not added into the
            PYTHONPATH environment variable.
        func_name: str, the name of the function to run.

    Returns:
        The wrapper which calls a function from the script module.

    Raises:
        `AttributeError` if the script does not define a `func_name` function.
    """

    def wrapper(*args):
        module = import_script(script_path)
        if not hasattr(module, func_name):
            raise AttributeError(
                f"Cannot find {func_name} function in {script_path}."
            )
        return getattr(module, func_name)(*args)

    return wrapper


def run_script_with_args(binary: str, script_path: str, *args: Any, **kwargs: Any):
    """Run script using a binary and command line arguments.

    Args:
        binary: str, the binary to run the script with, e.g. 'python'.
        script_path: str, the path to the script.
        *args: Any, a collection of arguments which will be converted to string
            and passed on to the run command.
        **kwargs: Any, keyword arguments which will be converted to named script
            arguments.

    Returns:
        The contents of the results, which the script is assumed to store,
        given an output file path as an argument.

    Raises:
        FileNotFoundError if the script cannot be found.

    Notes:
        It assumes that the script will store the results on disk using the
        path provided by the last of the command line arguments.
    """
    if not os.path.isfile(script_path):
        raise FileNotFoundError(f"Cannot find script {script_path}.")
    with tempfile.TemporaryDirectory() as tmpdir:
        output_file = os.path.join(tmpdir, "results.pkl")
        args_as_str, kwargs_as_str = [], []
        if args:
            args_as_str.extend([*map(str, args), output_file])
        if kwargs:
            kwargs_as_str.extend([
                str(item) for k_v in kwargs.items() for item in k_v
            ])
            kwargs_as_str.extend(["--output_file", output_file])
        run_command([binary, script_path, *args_as_str, *kwargs_as_str])
        return fetch_result(output_file)


def fetch_result(output_file, n_trials: int = 5, waiting_time: float = 1.0) -> Any:
    """Load the output file.

    Args:
        output_file: str, a path to the output file.
        n_trials: int, optional number of attempts to find the file. If the
            file is still missing afterwards, None is returned.
        waiting_time: float, time in seconds to wait before retrying to load
            the file.

    Returns:
        The unpickled output file if found, else None.
    """
    if output_file is None:
        return None
    for _ in range(n_trials):
        if os.path.isfile(output_file):
            break
        time.sleep(waiting_time)
    else:
        return None
    with open(output_file, 'rb') as fp:
        return pickle.load(fp)


@dataclass(frozen=True)
class Job:
    """Default :class:`Job` class defining an experiment as a runnable task on
    the local machine.

    The job is defined by a callable function or a script task. In the case of
    the former the `args` will be passed directly to it upon calling. Otherwise
    either a module will be run as a script with command line arguments or a
    function, attribute of the module, will be called with the `args` as input.
    In both cases a :class:`Result` object will be returned.

    Attributes:
        id: :obj:`int`. The job identifier. Must be unique.
        args: :obj:`tuple` or :obj:`dict`. The arguments or keyword arguments
            for the callable function or script.
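        task: :obj:`Callable` or :obj:`str`, a python function to run or a
            file path to a python script.
        meta: :obj:`Any`. Optional additional information about the job. If it
            is a :obj:`dict` with a "binary" key, the value is used as the
            binary to run a script task with.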
    """
    task: Union[Callable, str]
    args: Union[Tuple, Dict] = ()
    id: int = field(default_factory=generate_id)
    meta: Any = None

    # job related constants
    _JOB_SCRIPT_FUNC_SEPARATOR = ":"
    _JOB_DEFAULT_BINARY = "source"
    _JOB_SCRIPT_FUNC_SEPARATION_REGEX = r"[^\w\/\.]+"

    def __post_init__(self):
        if not isinstance(self.task, (Callable, str)):
            raise ValueError(
                "Job's task must be either a callable function "
                "or a path to a script."
            )
        if self.id in _JOB_REGISTRY:
            raise ValueError(
                f"Job with an ID {self.id} is already created. "
                f"Reusing IDs is prohibited."
            )
        _JOB_REGISTRY.add(self.id)

    def __hash__(self):
        return hash(str(self.id))

    def __call__(self, *args, **kwargs) -> 'Result':
        all_args = args
        all_kwargs = kwargs
        if isinstance(self.args, Tuple):
            all_args += self.args
        else:
            all_kwargs = dict(**kwargs, **self.args)
        if isinstance(self.task, Callable):
            runnable = self.task
        else:
            runnable = self._build_callable()
        return Result(id=self.id, data=runnable(*all_args, **all_kwargs))

    def _build_callable(self):
        """Create a function from a string task.

        If the task is of the form /path/to/script.py::func_to_run, split the
        path from the func and return a script.func_to_run callable.
        If the task is of the form /path/to/script.py, then return a
        python /path/to/script.py callable.
        """
        if self._JOB_SCRIPT_FUNC_SEPARATOR in self.task:
            # split the task string by the [:]+ marker
            script_path, func_name = re.split(
                self._JOB_SCRIPT_FUNC_SEPARATION_REGEX, self.task
            )
            assert script_path and func_name, \
                f"Empty path {script_path} or function name {func_name}"
            runnable = get_callable_from_script(script_path, func_name)
        else:
            binary = self._infer_binary()
            runnable = partial(run_script_with_args, binary, self.task)
        return runnable

    def _infer_binary(self):
        if isinstance(self.meta, dict) and "binary" in self.meta:
            return self.meta["binary"]
        if self.task.endswith(".py"):
            return "python"
        if self.task.endswith(".sh"):
            return "bash"
        return self._JOB_DEFAULT_BINARY


class SlurmJobState(enum.Enum):
    """Some of the most frequently encountered slurm job statuses."""

    PENDING = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3
    CANCELLED = 4
    UNKNOWN = 5

    @classmethod
    def from_string(cls, state: str):
        if state == "running":
            return cls.RUNNING
        if state == "pending":
            return cls.PENDING
        if state == "completed":
            return cls.COMPLETED
        if state == "failed":
            return cls.FAILED
        if state == "cancelled":
            return cls.CANCELLED
        return cls.UNKNOWN


@dataclass(frozen=True)
class SlurmJob(Job):
    """A :class:`Job` subclass to schedule tasks on Slurm.

    Runs an 'sbatch' command in the shell with the script.

    Attributes:
        output_file: (optional) :obj:`str`. Path to the file where the executed
            script will dump the results. It is passed to the script as the
            last positional argument or as the `--output_file` named argument.
    """

    output_file: str = None

    # slurm shell commands
    _SLURM_CMD_PUSH = ["sbatch"]
    _SLURM_CMD_KILL = ["scancel"]
    _SLURM_CMD_INFO = ["scontrol", "show", "job"]

    # slurm script elements
    _SLURM_SCRIPT_PREAMBLE = "#!/bin/bash"
    _SLURM_SCRIPT_LINE_PREFIX = "#SBATCH"
    _SLURM_SCRIPT_JOB_NAME = "--job-name"
    _SLURM_SCRIPT_OUT_NAME = "--output"
    _SLURM_SCRIPT_RESOURCES_MEM = "--mem"
    _SLURM_SCRIPT_RESOURCES_TIME = "--time"
    _SLURM_SCRIPT_RESOURCES_CPU = "--cpus-per-task"
    _SLURM_SCRIPT_RESOURCES_GPU = "--gres"

    # other macros
    _SLURM_JOB_STATE_REGEX = r"JobState=(RUNNING|PENDING|COMPLETED|FAILED|CANCELLED)"

    def __post_init__(self):
        if not isinstance(self.task, str):
            raise ValueError("Slurm job must be defined with a script to run.")
        super(SlurmJob, self).__post_init__()

    def __call__(self) -> 'Result':
        res = self._execute_job()
        return Result(id=self.id, data=res)

    def _execute_job(self) -> Any:
        with tempfile.NamedTemporaryFile(mode="w+t", suffix=".sh") as fp:
            contents = self._create_slurm_script()
            fp.writelines(contents)
            fp.seek(0)
            response = run_command(self._SLURM_CMD_PUSH + [f"{fp.name}"])
        slurm_id = int(re.search(r"[\d]+", response).group())
        while True:
            slurm_status = self._query_job_status(slurm_id)
            if slurm_status in [SlurmJobState.RUNNING, SlurmJobState.PENDING]:
                time.sleep(1)
            elif slurm_status in [SlurmJobState.CANCELLED, SlurmJobState.FAILED]:
                return None
            elif slurm_status == SlurmJobState.COMPLETED:
                return fetch_result(self.output_file)
            else:
                raise RuntimeError(f"Unknown state of slurm job {slurm_id}.")

    def _create_slurm_script(self) -> List[str]:
        if not self.meta:
            raise ValueError(f"Cannot infer slurm job parameters. "
                             f"Fill in meta dict in job {self.id}.")
        else:
            # Preamble, job name and output log filename definitions
            content_lines = [
                f"{self._SLURM_SCRIPT_PREAMBLE}\n",
                f"{self._SLURM_SCRIPT_LINE_PREFIX} "
                f"{self._SLURM_SCRIPT_JOB_NAME}=job_{self.id}\n",
                f"{self._SLURM_SCRIPT_LINE_PREFIX} "
                f"{self._SLURM_SCRIPT_OUT_NAME}=log_%j.txt\n"]
            # Resources specification
            n_cpus = int(self.meta.get("resources", {}).get("cpu", 1))
            if n_cpus >= 1:
                content_lines.append(
                    f"{self._SLURM_SCRIPT_LINE_PREFIX} "
                    f"{self._SLURM_SCRIPT_RESOURCES_CPU}={n_cpus}\n"
                )
            gpus = str(self.meta.get("resources", {}).get("gpu", ""))
            if gpus:
                if gpus.isnumeric():
                    gpus = f"gpu:{gpus}"
                content_lines.append(
                    f"{self._SLURM_SCRIPT_LINE_PREFIX} "
                    f"{self._SLURM_SCRIPT_RESOURCES_GPU}={gpus}\n"
                )
            mem = str(self.meta.get("resources", {}).get("memory", ""))
            if mem:
                content_lines.append(
                    f"{self._SLURM_SCRIPT_LINE_PREFIX} "
                    f"{self._SLURM_SCRIPT_RESOURCES_MEM}={mem}\n"
                )
            limit_time = str(self.meta.get("resources", {}).get("time", ""))
            if limit_time:
                content_lines.append(
                    f"{self._SLURM_SCRIPT_LINE_PREFIX} "
                    f"{self._SLURM_SCRIPT_RESOURCES_TIME}={limit_time}\n"
                )
            # Task specification
            binary = self.meta.get("binary", "python")
            if isinstance(self.args, Tuple):
                # build positional arguments
                script_args = ' '.join([*map(str, self.args), self.output_file])
            else:
                # build named arguments
                script_args = ' '.join([
                    *(str(item)
                      for key_val in self.args.items()
                      for item in key_val),
                    "--output_file", self.output_file
                ])
            content_lines.append(f"{binary} {self.task} {script_args}")
        return content_lines

    def _query_job_status(self, slurm_id: int) -> SlurmJobState:
        response = run_command(self._SLURM_CMD_INFO + [str(slurm_id)])
        job_state = re.search(self._SLURM_JOB_STATE_REGEX, response)
        if job_state is not None:
            job_state = job_state.group(1).lower()
            return SlurmJobState.from_string(job_state)
        # no recognisable state in the response
        return SlurmJobState.UNKNOWN


@dataclass(frozen=True)
class Result:
    """A :class:`Result` class to store the output of the executed :class:`Job`.

    It shares the same id as the job which generated it.

    Attributes:
        id: :obj:`int`. The identifier of the `Result` object which corresponds
            to the job that has been run.
        data: :obj:`Any`. The output data of the job.
    """
    data: Any
    id: int

    def __post_init__(self):
        if self.id in _RESULT_REGISTRY:
            raise ValueError(
                f"Result with an ID {self.id} is already created. "
                f"Reusing IDs is prohibited."
            )
        _RESULT_REGISTRY.add(self.id)


================================================
FILE: hypertunity/scheduling/scheduler.py
================================================
"""A scheduler for running jobs locally in a parallel manner using joblib as
a backend.
"""

import multiprocessing as mp
import time
from typing import List

import joblib

from hypertunity import utils

from .jobs import Job, Result

__all__ = [
    "Scheduler"
]


class Scheduler:
    """A manager for parallel execution of jobs.

    A job must be of type :class:`Job` which produces a :class:`Result`
    object upon successful completion. The scheduler maintains a job queue
    and a result queue.

    Notes:
        This class should be used as a context manager.
    """

    def __init__(self, n_parallel: int = None):
        """Set up the job and result queues.

        Args:
            n_parallel: (optional) :obj:`int`. The number of jobs that can be
                run in parallel. Defaults to `None` in which case all but one
                available CPUs will be used.
        """
        self._job_queue = mp.Queue()
        self._result_queue = mp.Queue()
        self._is_queue_closed = False

        if n_parallel is None:
            self.n_parallel = -2  # using all CPUs but 1
        else:
            self.n_parallel = max(n_parallel, 1)
        self._servant = mp.Process(target=self._run_servant)
        self._interrupt_event = mp.Event()
        self._servant.start()

    def __del__(self):
        """Clean up subprocesses on object deletion.

        Close the queues and join all subprocesses before the object is deleted.
        """
        if not self._is_queue_closed:
            self.exit()
        if self._servant.is_alive():
            self._servant.terminate()

    def __enter__(self):
        """Enter the context manager."""
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Exit the context manager."""
        self.exit()

    def _run_servant(self):
        """Run the pool of workers on the dispatched jobs, fetched from the job
        queue and collect the results into the result queue.

        Notes:
            The runner waits for all jobs fetched from the job queue to finish
            before any results are written to the result queue.
        """
        # TODO: Switch backend back to default "loky", after the leakage
        #  of semaphores is fixed
        with joblib.Parallel(n_jobs=self.n_parallel,
                             backend="multiprocessing") as parallel:
            while not self._interrupt_event.is_set():
                current_jobs = utils.drain_queue(self._job_queue)
                if not current_jobs:
                    continue
                # the order of the results corresponds to that of the jobs
                # and the IDs don't need to be shuffled.
                ids = [job.id for job in current_jobs]
                # TODO: in a future version of joblib, this could be a generator
                #  and then the inputs would be stored immediately in the results
                #  queue. Be ready to update whenever this PR gets merged:
                #  https://github.com/joblib/joblib/pull/588
                results = parallel(joblib.delayed(job)() for job in current_jobs)
                assert len(ids) == len(results)
                for res in results:
                    self._result_queue.put_nowait(res)

    def dispatch(self, jobs: List[Job]):
        """Dispatch the jobs for parallel execution.

        This method is non-blocking.

        Args:
            jobs: :obj:`List[Job]`. A list of jobs to run whenever resources
                are available.

        Notes:
            Although the jobs are scheduled to run immediately, the actual
            execution may take place after indefinite delay if the job runner
            is occupied with older jobs.
        """
        for job in jobs:
            self._job_queue.put_nowait(job)

    def collect(self, n_results: int, timeout: float = None) -> List[Result]:
        """Collect all the available results or wait until they become available.

        Args:
            n_results: :obj:`int`, number of results to wait for.
                If `n_results` ≤ 0 then all available results will be returned.
            timeout: (optional) :obj:`float`, number of seconds to wait for
                results to appear. If None (default) then it will wait until
                all `n_results` are collected.

        Returns:
            A list of :class:`Result` objects. If `n_results` > 0, the list
            has exactly `n_results` items.

        Notes:
            If `n_results` is overestimated and timeout is None, then this
            method will hang forever. Therefore it is recommended that a timeout
            is set.

        Raises:
            :obj:`queue.Empty`: if more than `timeout` seconds elapse before a
            :class:`Result` is collected.
        """
        if n_results > 0:
            results = []
            for i in range(n_results):
                results.append(self._result_queue.get(block=True, timeout=timeout))
        else:
            results = utils.drain_queue(self._result_queue)
        return results

    def interrupt(self):
        """Interrupt the scheduler and all running jobs."""
        self._interrupt_event.set()

    def exit(self):
        """Exit the scheduler by closing the queues and terminating the
        servant process.
        """
        if not self._is_queue_closed:
            utils.drain_queue(self._job_queue, close_queue=True)
            self._job_queue.join_thread()
            utils.drain_queue(self._result_queue, close_queue=True)
            self._result_queue.join_thread()
            self._is_queue_closed = True
        self.interrupt()
        # wait a bit for the subprocess to exit gracefully
        n_retries = 3
        while self._servant.is_alive() and n_retries > 0:
            n_retries -= 1
            time.sleep(0.05)
        self._servant.terminate()


================================================
FILE: hypertunity/scheduling/tests/__init__.py
================================================


================================================
FILE: hypertunity/scheduling/tests/script.py
================================================
import argparse
import os
import pickle
import sys


class DoNotReplaceAction(argparse.Action):
    def __call__(self, parser, namespace, values, option_string=None):
        if getattr(namespace, self.dest) is None:
            setattr(namespace, self.dest, values)


def parse_args(args):
    parser = argparse.ArgumentParser()
    parser.add_argument("x", nargs='?', type=int, action=DoNotReplaceAction)
    parser.add_argument("--x", type=int)
    parser.add_argument("y", nargs='?', type=float, action=DoNotReplaceAction)
    parser.add_argument("--y", type=float)
    parser.add_argument("z", nargs='?', type=str, action=DoNotReplaceAction)
    parser.add_argument("--z", type=str)
    parser.add_argument("output_file", nargs='?', type=str, action=DoNotReplaceAction)
    parser.add_argument("--output_file", type=str)
    return parser.parse_args(args)


def main(x: int, y: float, z: str) -> float:
    if z.endswith(tuple("0123456789")):
        return y * x
    return y * x**2


if __name__ == '__main__':
    parsed_args = parse_args(sys.argv[1:])
    result = main(parsed_args.x, parsed_args.y, parsed_args.z)
    print(result)
    output_dir = os.path.dirname(parsed_args.output_file)
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    with open(parsed_args.output_file, 'wb') as fp:
        pickle.dump(result, fp)


================================================
FILE: hypertunity/scheduling/tests/test_jobs.py
================================================
import pytest

from ..jobs import Job


def test_repeating_id():
    _ = Job(task=sum, args=(), id=-100)
    with pytest.raises(ValueError):
        _ = Job(task=max, args=(), id=-100)
    _ = Job(task=sum, args=(), id=-99)


def test_callable_job():
    job_args = (131212, 123123123)
    job = Job(task=lambda x, y: x + y, args=job_args)
    result = job()
    assert result.data == sum(job_args)


================================================
FILE: hypertunity/scheduling/tests/test_scheduler.py
================================================
import os
import tempfile

import pytest

from hypertunity.domain import Domain, Sample
from hypertunity.optimisation import base

from ..jobs import Job, SlurmJob
from ..scheduler import Scheduler
from . import script


@pytest.fixture(scope="module")
def shared_slurm_tmp_dir():
    return "/tmp"


def square(sample: Sample) -> base.EvaluationScore:
    return base.EvaluationScore(sample["x"]**2)


def run_jobs(jobs):
    with Scheduler(n_parallel=2) as scheduler:
        scheduler.dispatch(jobs)
        results = scheduler.collect(n_results=len(jobs), timeout=60.0)
    assert len(results) == len(jobs)
    assert all([r.id == j.id for r, j in zip(results, jobs)])
    return results


@pytest.mark.timeout(10.0)
def test_local_from_script_and_function():
    domain = Domain({
        "x": {0, 1, 2, 3},
        "y": [-1., 1.],
        "z": {"123", "abc"}
    }, seed=7)
    jobs = [Job(task="hypertunity/scheduling/tests/script.py::main",
                args=(*domain.sample().as_namedtuple(),)) for _ in range(10)]
    results = run_jobs(jobs)
    assert all([r.data == script.main(*j.args) for r, j in zip(results, jobs)])


@pytest.mark.timeout(10.0)
def test_local_from_script_and_cmdline_args():
    domain = Domain({
        "x": {0, 1, 2, 3},
        "y": [-1., 1.],
        "z": {"123", "abc"}
    }, seed=7)
    jobs = [Job(task="hypertunity/scheduling/tests/script.py",
                args=(*domain.sample().as_namedtuple(),),
                meta={"binary": "python"}) for _ in range(10)]
    results = run_jobs(jobs)
    assert all([r.data == script.main(*j.args) for r, j in zip(results, jobs)])


@pytest.mark.timeout(10.0)
def test_local_from_script_and_cmdline_named_args():
    domain = Domain({
        "--x": {0, 1, 2, 3},
        "--y": [-1., 1.],
        "--z": {"acb123", "abc"}
    }, seed=7)
    jobs = [Job(task="hypertunity/scheduling/tests/script.py",
                args=domain.sample().as_dict(),
                meta={"binary": "python"}) for _ in range(10)]
    results = run_jobs(jobs)
    assert all([
        r.data == script.main(**{k.lstrip("-"): v for k, v in j.args.items()})
        for r, j in zip(results, jobs)
    ])


@pytest.mark.timeout(10.0)
def test_local_from_fn():
    domain = Domain({"x": [0., 1.]}, seed=7)
    jobs = [Job(task=square, args=(domain.sample(),)) for _ in range(10)]
    results = run_jobs(jobs)
    assert all([r.data.value == square(*j.args).value
                for r, j in zip(results, jobs)])


@pytest.mark.slurm
@pytest.mark.timeout(60.0)
def test_slurm_from_script(shared_slurm_tmp_dir):
    domain = Domain({
        "x": {0, 1, 2, 3},
        "y": [-1., 1.],
        "z": {"123", "abc"}
    }, seed=7)
    jobs, dirs = [], []
    n_jobs = 4
    for i in range(n_jobs):
        sample = domain.sample()
        dirs.append(tempfile.TemporaryDirectory(dir=shared_slurm_tmp_dir))
        jobs.append(SlurmJob(
            task="hypertunity/scheduling/tests/script.py",
            args=(*sample.as_namedtuple(),),
            output_file=os.path.join(dirs[-1].name, "results.pkl"),
            meta={"binary": "python", "resources": {"cpu": 1}}
        ))
    results = run_jobs(jobs)
    assert all([r.data == script.main(*j.args) for r, j in zip(results, jobs)])
    # clean-up the temporary dirs
    for d in dirs:
        d.cleanup()


================================================
FILE: hypertunity/tests/__init__.py
================================================


================================================
FILE: hypertunity/tests/test_domain.py
================================================
from collections import namedtuple

import pytest

from hypertunity.domain import (
    Domain,
    DomainNotIterableError,
    DomainSpecificationError,
    Sample
)


@pytest.mark.parametrize("domain,expectation", [
    ({1: {"b": [2, 3]}, "c": [0, 0.1]},
     pytest.raises(DomainSpecificationError)),
    ({"a": {"b": (1, 2, 3, 4)}, "c": [0, 0.1]},
     pytest.raises(DomainSpecificationError)),
    ({"a": {"b": lambda x: x}, "c": [0, 0.1]},
     pytest.raises(DomainSpecificationError)),
    # this one should fail from the ast.literal_eval parsing
    ('{"a": {"b": lambda x: x}, "c": [0, 0.1]}',
     pytest.raises(ValueError))
])
def test_invalid_domain(domain, expectation):
    with expectation:
        Domain(domain)


@pytest.mark.parametrize("domain", [
    {"a": {"b": {0, 1}}, "c": [0, 0.1]},
    '{"a": {"b": {0, 1}}, "c": [0, 0.1]}'
])
def test_valid_domain(domain):
    Domain(domain)


def test_eq():
    d1 = Domain({"a": {"b": [2, 3]}, "c": [0, 0.1]})
    d2 = Domain({"a": {"b": [2, 3]}, "c": [0, 0.1]})
    assert d1 == d2


def test_flatten():
    dom = Domain({"a": {"b": [0, 1]}, "c": {0, 0.1}})
    assert dom.flatten() == {("a", "b"): [0, 1], ("c",): {0, 0.1}}


def test_addition():
    domain_all = Domain({
        "a": [1, 2],
        "b": {"c": {1, 2, 3}, "d": {"o1", "o2"}},
        "e": {3, 4, 5}
    })
    domain_1 = Domain({"a": [1, 2], "b": {"c": {1, 2, 3}}})
    domain_2 = Domain({"b": {"d": {"o1", "o2"}}})
    domain_3 = Domain({"e": {3, 4, 5}})
    assert domain_1 + domain_2 + domain_3 == domain_all
    with pytest.raises(ValueError):
        _ = domain_1 + domain_1


def test_serialisation():
    domain = Domain({"a": [1, 2], "b": {"c": {1, 2, 3}, "d": {"o1", "o2"}}})
    serialised = domain.serialise()
    deserialised = Domain.deserialise(serialised)
    assert deserialised == domain


def test_as_dict():
    dict_domain = {"a": {"b": [2, 3]}, "c": [0, 0.1]}
    domain = Domain(dict_domain)
    assert domain.as_dict() == dict_domain


def test_as_namedtuple():
    domain = Domain({"a": {"b": {2, 3, 4}}, "c": [0, 0.1]})
    nt = domain.as_namedtuple()
    assert nt.a == namedtuple("_", "b")({2, 3, 4})
    assert nt.a.b == {2, 3, 4}
    assert nt.c == [0, 0.1]


def test_from_list():
    lst = [
        (("a", "b"), {2, 3, 4}),
        (("c",), {0, 0.1}),
        (("d", "e", "f"), {0, 1}),
        (("d", "g"), {2, 3})
    ]
    domain_true = Domain({
        "a": {"b": {2, 3, 4}},
        "c": {0, 0.1},
        "d": {"e": {"f": {0, 1}}, "g": {2, 3}}
    })
    domain_from_list = Domain.from_list(lst)
    assert domain_true == domain_from_list
    assert lst == list(domain_true.flatten().items())


def test_fail_iter_cont_domain():
    with pytest.raises(DomainNotIterableError):
        list(iter(Domain({"a": {"b": {2, 3, 4}}, "c": [0, 0.1]})))


def test_iter():
    discrete_domain = Domain({
        "a": {"b": {2, 3, 4}, "j": {"d": {5, 6}, "f": {"g": {7}}}},
        "c": {"op1", 0.1}
    })
    all_samples = set(iter(discrete_domain))
    assert all_samples == {
        Sample({'a': {'b': 2, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 'op1'}),
        Sample({'a': {'b': 3, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 'op1'}),
        Sample({'a': {'b': 4, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 'op1'}),
        Sample({'a': {'b': 2, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 'op1'}),
        Sample({'a': {'b': 3, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 'op1'}),
        Sample({'a': {'b': 4, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 'op1'}),
        Sample({'a': {'b': 2, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 0.1}),
        Sample({'a': {'b': 3, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 0.1}),
        Sample({'a': {'b': 4, 'j': {'d': 5, 'f': {'g': 7}}}, 'c': 0.1}),
        Sample({'a': {'b': 2, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 0.1}),
        Sample({'a': {'b': 3, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 0.1}),
        Sample({'a': {'b': 4, 'j': {'d': 6, 'f': {'g': 7}}}, 'c': 0.1})
    }


def test_sampling():
    domain = Domain({"a": {"b": {2, 3, 4}}, "c": [0, 0.1]})
    for i in range(10):
        sample = domain.sample()
        assert sample["a"]["b"] in {2, 3, 4} and 0. <= sample["c"] <= 0.1


def test_split_by_type():
    domain = Domain({"x": [1, 2], "y": {-3, 2, 5}, "z": {"small", 1, 0.1}})
    discr, cat, cont = domain.split_by_type()
    assert sum(domain.split_by_type(), Domain({})) == domain
    assert discr == Domain({"y": {-3, 2, 5}})
    assert cat == Domain({"z": {"small", 1, 0.1}})
    assert cont == Domain({"x": [1, 2]})


================================================
FILE: hypertunity/tests/test_trial.py
================================================
import pytest

from hypertunity import Domain, Trial
from hypertunity.optimisation import RandomSearch
from hypertunity.reports import Table
from hypertunity.scheduling import Job
from hypertunity.scheduling.tests.test_scheduler import run_jobs


def foo(x, y, z):
    return x**2 + y**2 - z**3


@pytest.mark.timeout(60.0)
def test_trial_with_callable():
    domain = Domain({"x": [-1., 1.], "y": [-2, 2], "z": {1, 2, 3, 4}})
    trial = Trial(objective=foo, domain=domain,
                  optimiser="random_search",
                  database_path=None,
                  seed=7, metrics=["score"])
    n_steps = 10
    batch_size = 2
    trial.run(n_steps, batch_size=batch_size, n_parallel=batch_size)

    rs = RandomSearch(domain=domain, seed=7)
    rep = Table(domain, metrics=["score"])
    for i in range(n_steps):
        samples = rs.run_step(batch_size=batch_size, minimise=False)
        results = [foo(*s.as_namedtuple()) for s in samples]
        for sample_eval in zip(samples, results):
            rep.log(sample_eval)

    assert len(trial.optimiser.history) == n_steps * batch_size
    assert str(rep.format(order="ascending")) == str(
        trial.reporter.format(order="ascending")
    )


@pytest.mark.timeout(60.0)
def test_trial_with_script():
    domain = Domain({
        "--x": {0, 1, 2, 3},
        "--y": [-1., 1.],
        "--z": {"123", "abc"}
    })
    trial = Trial(objective="hypertunity/scheduling/tests/script.py",
                  domain=domain,
                  optimiser="random_search",
                  database_path=None,
                  seed=7, metrics=["score"])
    batch_size = 4
    trial.run(n_steps=1, batch_size=batch_size, n_parallel=batch_size)

    rs = RandomSearch(domain=domain, seed=7)
    samples = rs.run_step(batch_size=batch_size)
    jobs = [Job(task="hypertunity/scheduling/tests/script.py",
                args=s.as_dict(),
                meta={"binary": "python"}) for s in samples]
    results = [r.data for r in run_jobs(jobs)]
    assert results == [h.metrics["score"].value
                       for h in trial.optimiser.history]


================================================
FILE: hypertunity/tests/test_utils.py
================================================
import queue

import pytest

from .. import utils

try:
    from contextlib import nullcontext
except ImportError:
    from contextlib import contextmanager

    @contextmanager
    def nullcontext():
        yield


def test_support_american_spelling():

    @utils.support_american_spelling
    def gb_spelling_func(minimise, optimise, maximise):
        return minimise, optimise, maximise

    expected = (True, 1, None)
    assert gb_spelling_func(minimise=True, optimise=1, maximise=None) == expected
    assert gb_spelling_func(minimize=True, optimize=1, maximize=None) == expected


@pytest.mark.parametrize("test_input,expectation", [
    (("vxc", "", "", "___"), nullcontext()),
    (("_", "_", ""), nullcontext()),
    (("asd",), nullcontext()),
    (("asd", "dxcv"), nullcontext()),
    (("asd", "\\", "\n"), pytest.raises(ValueError))
])
def test_split_and_join_strings(test_input, expectation):
    with expectation:
        assert test_input == utils.split_string(
            utils.join_strings(test_input, join_char="_"),
            split_char="_"
        )


def test_drain_queue():
    q = queue.Queue(10)
    elems = list(range(10))
    for i in elems:
        q.put(i)
    items = utils.drain_queue(q)
    assert items == elems
    with pytest.raises(queue.Empty):
        q.get_nowait()


================================================
FILE: hypertunity/trial.py
================================================
"""A wrapper class for conducting multiple experiments, scheduling jobs and
saving results.
"""

from typing import Callable, Type, Union

from hypertunity import optimisation, reports, utils
from hypertunity.domain import Domain
from hypertunity.optimisation import Optimiser
from hypertunity.reports import Reporter
from hypertunity.scheduling import Job, Scheduler, SlurmJob

__all__ = [
    "Trial"
]

OptimiserTypes = Union[str, Type[Optimiser], Optimiser]
ReporterTypes = Union[str, Type[Reporter], Reporter]


class Trial:
    """High-level API class for running hyperparameter optimisation.
    This class encapsulates optimiser querying, job building, scheduling and
    results collection as well as checkpointing and report generation.
    """

    @utils.support_american_spelling
    def __init__(self, objective: Union[Callable, str],
                 domain: Domain,
                 optimiser: OptimiserTypes = "bo",
                 reporter: ReporterTypes = "table",
                 device: str = "local",
                 **kwargs):
        """Initialise the :class:`Trial` experiment manager.

        Args:
            objective: :obj:`Callable` or :obj:`str`. The objective function or
                script to run.
            domain: :class:`Domain`. The optimisation domain of the objective
                function.
            optimiser: :class:`Optimiser` or :obj:`str`. The optimiser method
                for domain sampling.
            reporter: :class:`Reporter` or :obj:`str`. The reporting method for
                the results.
            device: :obj:`str`. The host device running the evaluations. Can be
                'local' or 'slurm'.
            **kwargs: additional parameters for the optimiser, reporter and
                scheduler.

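        Keyword Args:
            timeout: :obj:`float`. The number of seconds to wait for a
                :class:`Job` instance to finish. Default is 259200 seconds,
                i.e. 3 days.
            seed: (optional) the random seed forwarded to the optimiser.
                Ignored if `optimiser` is already an instance.
            metrics: (optional) :obj:`list`. The names of the metrics
                forwarded to the reporter. Default is `["score"]`.
            database_path: (optional) the database path forwarded to the
                reporter. Default is `"."`.
            logdir: (optional) :obj:`str`. The logging directory forwarded to
                any reporter that is not a :class:`Table`. Default is
                `"tensorboard/"`.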
        """
        self.objective = objective
        self.domain = domain
        self.optimiser = self._init_optimiser(optimiser, **kwargs)
        self.reporter = self._init_reporter(reporter, **kwargs)
        self.scheduler = Scheduler
        # 259200 is the number of seconds contained in 3 days
        self._timeout = kwargs.get("timeout", 259200.0)
        self._job = self._init_job(device)

    def _init_optimiser(self, optimiser: OptimiserTypes, **kwargs) -> Optimiser:
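        if isinstance(optimiser, Optimiser):
            # an already constructed instance is used as is; it must be
            # checked first because issubclass() raises on non-class arguments
            return optimiser
        if isinstance(optimiser, str):
            optimiser_class = get_optimiser(optimiser)
        elif isinstance(optimiser, type) and issubclass(optimiser, Optimiser):
            optimiser_class = optimiser
        else:
            raise TypeError(
                "An optimiser must be either a string, "
                "an Optimiser type or an Optimiser instance."
            )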
        opt_kwargs = {}
        if "seed" in kwargs:
            opt_kwargs["seed"] = kwargs["seed"]
        return optimiser_class(self.domain, **opt_kwargs)

    def _init_reporter(self, reporter: ReporterTypes, **kwargs) -> Reporter:
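        if isinstance(reporter, Reporter):
            # an already constructed instance is used as is; it must be
            # checked first because issubclass() raises on non-class arguments
            return reporter
        if isinstance(reporter, str):
            reporter_class = get_reporter(reporter)
        elif isinstance(reporter, type) and issubclass(reporter, Reporter):
            reporter_class = reporter
        else:
            raise TypeError("A reporter must be either a string, "
                            "a Reporter type or a Reporter instance.")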
        rep_kwargs = {"metrics": kwargs.get("metrics", ["score"]),
                      "database_path": kwargs.get("database_path", ".")}
        if not issubclass(reporter_class, reports.Table):
            rep_kwargs["logdir"] = kwargs.get("logdir", "tensorboard/")
        return reporter_class(self.domain, **rep_kwargs)

    @staticmethod
    def _init_job(device: str) -> Type[Job]:
        device = device.lower()
        if device == "local":
            return Job
        if device == "slurm":
            return SlurmJob
        raise ValueError(
            f"Unknown device {device}. Select one from {{'local', 'slurm'}}."
        )

    def run(self, n_steps: int, n_parallel: int = 1, **kwargs):
        """Run the optimisation and objective function evaluation.

        Args:
            n_steps: :obj:`int`. The total number of optimisation steps.
            n_parallel: (optional) :obj:`int`. The number of jobs that can be
                scheduled at once.
            **kwargs: additional keyword arguments for the optimisation,
                supplied to the :py:meth:`run_step` method of the
                :class:`Optimiser` instance.

        Keyword Args:
            batch_size: (optional) :obj:`int`. The number of samples that are
                suggested at once. Default is 1.
            minimise: (optional) :obj:`bool`. If the optimiser is
                :class:`BayesianOptimisation` then this flag tells whether the
                objective function is being minimised or maximised. Otherwise
                it has no effect. Default is `False`.
        """
        batch_size = kwargs.get("batch_size", 1)
        n_parallel = min(n_parallel, batch_size)
        with self.scheduler(n_parallel=n_parallel) as scheduler:
            for i in range(n_steps):
                samples = self.optimiser.run_step(
                    batch_size=batch_size,
                    minimise=kwargs.get("minimise", False)
                )
                jobs = [
                    self._job(task=self.objective, args=s.as_dict())
                    for s in samples
                ]
                scheduler.dispatch(jobs)
                evaluations = [
                    r.data for r in scheduler.collect(
                        n_results=batch_size, timeout=self._timeout
                    )
                ]
                self.optimiser.update(samples, evaluations)
                for s, e, j in zip(samples, evaluations, jobs):
                    self.reporter.log((s, e), meta={"job_id": j.id})


def get_optimiser(name: str) -> Type[Optimiser]:
    name = name.lower()
    if name.startswith(("bayes", "bo")):
        return optimisation.BayesianOptimisation
    if name.startswith("random"):
        return optimisation.RandomSearch
    if name.startswith(("grid", "exhaustive")):
        return optimisation.GridSearch
    raise ValueError(
        f"Unknown optimiser {name}. Select one from "
        f"{{'bayesian_optimisation', 'random_search', 'grid_search'}}."
    )


def get_reporter(name: str) -> Type[Reporter]:
    name = name.lower()
    if name.startswith("table"):
        return reports.Table
    if name.startswith(("tensor", "tb")):
        # imported lazily, as tensorboard is an optional dependency
        from hypertunity.reports import tensorboard as tb
        return tb.Tensorboard
    raise ValueError(
        f"Unknown reporter {name}. Select one from {{'table', 'tensorboard'}}."
    )


================================================
FILE: hypertunity/utils.py
================================================
import queue
from functools import wraps

GB_US_SPELLING = {
    "minimise": "minimize",
    "maximise": "maximize",
    "optimise": "optimize",
    "optimiser": "optimizer",
    "emphasise": "emphasize"
}

US_GB_SPELLING = {us: gb for gb, us in GB_US_SPELLING.items()}


def support_american_spelling(func):
    """Convert American spelling keyword arguments to British
    (default for hypertunity).

    Args:
        func: a Python callable to decorate.

    Returns:
        The decorated function which supports American keyword arguments.
    """

    # using functools.wraps(func) enables automated documentation generation
    # for more information see: https://github.com/sphinx-doc/sphinx/issues/3783
    @wraps(func)
    def british_spelling_func(*args, **kwargs):
        gb_kwargs = {US_GB_SPELLING.get(kw, kw): val
                     for kw, val in kwargs.items()}
        return func(*args, **gb_kwargs)

    return british_spelling_func


def join_strings(strings, join_char="_"):
    """Join list of strings with an underscore.

    Every string must be printable according to `str.isprintable`, otherwise
    an exception is raised. If one of the strings already contains the join
    character, it will be replaced by a null character.

    Args:
        strings: iterable of strings.
        join_char: str, the character to join with.

    Returns:
        The joined string with an underscore character.

    Examples:
    ```python
        >>> join_strings(['asd', '', '_xcv__'])
        'asd__\x00xcv\x00\x00'
    ```

    Raises:
        ValueError if a string contains an unprintable character.
    """
    all_cleaned = []
    for s in strings:
        if not s.isprintable():
            raise ValueError(
                "Encountered unexpected name containing non-printable characters."
            )
        all_cleaned.append(s.replace(join_char, "\0"))
    return join_char.join(all_cleaned)


def split_string(joined, split_char="_"):
    """Split joined string and substitute back the null characters with an
    underscore if necessary.

    Inverse function of `join_strings(strings)`.

    Args:
        joined: str, the joined representation of the substrings.
        split_char: str, the character to split by.

    Returns:
        A tuple of the original strings, with the separators removed and the
        null characters restored to `split_char`.

    Examples:
    ```python
        >>> split_string('asd__\x00xcv\x00\x00')
        ('asd', '', '_xcv__')
    ```
    """
    strings = joined.split(split_char)
    strings_copy = []
    for s in strings:
        strings_copy.append(s.replace("\0", split_char))
    return tuple(strings_copy)


def drain_queue(q, close_queue=False):
    """Get all items from a queue until an `Empty` exception is raised.

    Args:
        q: `Queue`, the queue to drain.
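        close_queue: bool, whether to close the queue afterwards, such that
            no other object can be put in. Requires a queue with a `close()`
            method, e.g. `multiprocessing.Queue`; `queue.Queue` has none.
            Default is False.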

    Returns:
        List of all items from the queue.
    """
    items = []
    while True:
        try:
            it = q.get_nowait()
        except queue.Empty:
            break
        items.append(it)
    if close_queue:
        q.close()
    return items


================================================
FILE: setup.py
================================================
import re

from setuptools import setup, find_packages

with open("hypertunity/__init__.py", "r", encoding="utf8") as f:
    version = re.search(r"__version__ = [\'\"](.*?)[\'\"]", f.read()).group(1)

with open("README.md", "r", encoding="utf8") as f:
    readme = f.read()

required_packages = [
    "beautifultable>=0.7.0",
    "dataclasses;python_version<'3.7'",
    "gpy>=1.9.8",
    "gpyopt==1.2.5",
    "joblib>=0.13.2",
    "matplotlib>=3.0",
    "numpy>=1.16",
    "tinydb>=3.13.0"
]

extras = {
    "tensorboard": ["tensorflow>=1.14.0", "tensorboard>=1.14.0"],
    "tests": ["pytest>=4.6.3", "pytest-timeout>=1.3.3"],
    "docs": ["sphinx>=2.2.0", "sphinx_rtd_theme>=0.4.3"]
}

classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "Intended Audience :: Education",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
    "Topic :: Software Development :: Libraries",
    "Topic :: Software Development :: Libraries :: Python Modules"
]

setup(
    name="hypertunity",
    version=version,
    author="Georgi Dikov",
    author_email="gvdikov@gmail.com",
    url="https://github.com/gdikov/hypertunity",
    description="A toolset for distributed black-box hyperparameter optimisation.",
    long_description=readme,
    long_description_content_type='text/markdown',
    packages=find_packages(exclude=["*.tests", "*.tests.*", "tests.*", "tests"]),
    python_requires=">=3.6",
    install_requires=required_packages,
    extras_require=extras,
    classifiers=classifiers
)