Repository: stefanch/sGDML
Branch: master
Commit: a6ae5e86f88c
Files: 28
Total size: 380.5 KB

Directory structure:
gitextract_7dj7aa45/

├── .gitignore
├── LICENSE.txt
├── README.md
├── pyproject.toml
├── scripts/
│   ├── sgdml_dataset_from_aims.py
│   ├── sgdml_dataset_from_extxyz.py
│   ├── sgdml_dataset_from_ipi.py
│   ├── sgdml_dataset_to_extxyz.py
│   ├── sgdml_dataset_via_ase.py
│   └── sgdml_datasets_from_model.py
├── setup.cfg
├── setup.py
└── sgdml/
    ├── __init__.py
    ├── cli.py
    ├── get.py
    ├── intf/
    │   ├── __init__.py
    │   └── ase_calc.py
    ├── predict.py
    ├── solvers/
    │   ├── __init__.py
    │   ├── analytic.py
    │   └── iterative.py
    ├── torchtools.py
    ├── train.py
    └── utils/
        ├── __init__.py
        ├── desc.py
        ├── io.py
        ├── perm.py
        └── ui.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================

.DS_Store

# Compiled python modules.
*.pyc

# Setuptools distribution folder.
/dist/

# Python egg metadata, regenerated from source files by setuptools.
/*.egg-info
/*.egg
sgdml/_bmark_cache.npz


================================================
FILE: LICENSE.txt
================================================
MIT License

Copyright (c) 2018-2022 Stefan Chmiela

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

================================================
FILE: README.md
================================================
# Symmetric Gradient Domain Machine Learning (sGDML)

For more details visit: [sgdml.org](http://sgdml.org/)  
Documentation can be found here: [docs.sgdml.org](http://docs.sgdml.org/)

#### Requirements:
- Python 3.7+
- PyTorch (>=1.8)
- NumPy (>=1.19)
- SciPy (>=1.1)

#### Optional:
- ASE (>=3.16.2) (to run atomistic simulations)

## Getting started

### Stable release

Most systems come with the default package manager for Python ``pip`` already preinstalled. Install ``sgdml`` by simply calling:

```
$ pip install sgdml
```

The ``sgdml`` command-line interface and the corresponding Python API can now be used from anywhere on the system.

### Development version

#### (1) Clone the repository

```
$ git clone https://github.com/stefanch/sGDML.git
$ cd sGDML
```

...or update your existing local copy with

```
$ git pull origin master
```

#### (2) Install

```
$ pip install -e .
```

Using the flag ``--user``, you can tell ``pip`` to install the package to the current user's home directory instead of system-wide. This option might require you to update your system's ``PATH`` variable accordingly.
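With ``--user``, console scripts such as ``sgdml`` are placed in a per-user scripts directory. The following sketch uses only the standard library's `sysconfig` module and is not part of sGDML. It prints that directory and checks whether it is already on `PATH`. The helper names are hypothetical, and the scheme-name fallback for Python < 3.10 is an assumption that may not match every platform (for example, macOS framework builds).

```python
import os
import sysconfig


def user_scripts_dir():
    """Return the directory where `pip install --user` places console scripts."""
    try:
        # Python 3.10+ reports the user install scheme of the running interpreter.
        scheme = sysconfig.get_preferred_scheme('user')
    except AttributeError:
        # Assumed fallback for older interpreters; macOS framework builds
        # use 'osx_framework_user' instead.
        scheme = 'nt_user' if os.name == 'nt' else 'posix_user'
    return sysconfig.get_path('scripts', scheme)


def is_on_path(directory):
    """Check whether `directory` is one of the entries of the PATH variable."""
    target = os.path.normcase(os.path.abspath(directory))
    for entry in os.environ.get('PATH', '').split(os.pathsep):
        if entry and os.path.normcase(os.path.abspath(entry)) == target:
            return True
    return False


if __name__ == '__main__':
    scripts = user_scripts_dir()
    print(scripts, 'is on PATH' if is_on_path(scripts) else 'is NOT on PATH')
```

If the directory is reported as missing from `PATH`, append it in your shell profile.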


### Optional dependencies

Some functionality of this package relies on third-party libraries that are not installed by default. These optional dependencies (or "package extras") are specified during installation using the "square bracket syntax":

```
$ pip install sgdml[<optional1>]
```

#### Atomic Simulation Environment (ASE)

If you are interested in interfacing with [ASE](https://wiki.fysik.dtu.dk/ase/) to perform atomistic simulations (see [here](http://docs.sgdml.org/applications.html) for examples), use the ``ase`` keyword:

```
$ pip install sgdml[ase]
```
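The conversion scripts in ``scripts/`` guard their ASE import with ``try``/``except ImportError`` and point the user to ``pip install sgdml[ase]``. A similar check can be done up front without importing the optional package. This standard-library sketch uses `importlib.util.find_spec`; the `EXTRAS` mapping and the `missing_extras` helper are hypothetical and not part of the sGDML API.

```python
import importlib.util

# Hypothetical mapping from pip extras to the top-level module each one provides.
EXTRAS = {'ase': 'ase'}


def missing_extras(extras=EXTRAS):
    """Return the extras whose module cannot be found in the current environment."""
    return [
        extra
        for extra, module in extras.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == '__main__':
    for extra in missing_extras():
        print("Optional dependency '{0}' not found; run: pip install sgdml[{0}]".format(extra))
```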

## Reconstruct your first force field

Download one of the example datasets:

```
$ sgdml-get dataset ethanol_dft
```

Train a force field model:

```
$ sgdml all ethanol_dft.npz 200 1000 5000
```
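Dataset files such as ``ethanol_dft.npz`` are ordinary NumPy ``.npz`` archives. Judging from the conversion scripts in this repository, they contain at least ``R`` (geometries, shape ``(M, N, 3)``), ``z`` (atomic numbers, shape ``(N,)``) and ``F`` (forces, same shape as ``R``), plus optional energies ``E`` with one value per geometry. The sketch below sanity-checks those shapes before training. The function name `check_dataset` is hypothetical, and the list of required keys is inferred from the scripts rather than taken from an official specification.

```python
import numpy as np

REQUIRED_KEYS = ('R', 'z', 'F')  # inferred from the dataset scripts in this repository


def check_dataset(path):
    """Load an sGDML-style .npz dataset and sanity-check its array shapes.

    Returns (n_geometries, n_atoms); raises ValueError on inconsistencies.
    """
    with np.load(path) as data:
        missing = [key for key in REQUIRED_KEYS if key not in data.files]
        if missing:
            raise ValueError('missing required arrays: {}'.format(missing))
        R, z, F = data['R'], data['z'], data['F']
        if R.ndim != 3 or R.shape[2] != 3:
            raise ValueError('R must have shape (M, N, 3), got {}'.format(R.shape))
        n_geoms, n_atoms = R.shape[0], R.shape[1]
        if z.shape != (n_atoms,):
            raise ValueError('z must have shape ({},), got {}'.format(n_atoms, z.shape))
        if F.shape != R.shape:
            raise ValueError('F must have the same shape as R, got {}'.format(F.shape))
        # Energies are optional; the scripts store them as (M,) or (M, 1).
        if 'E' in data.files and data['E'].size != n_geoms:
            raise ValueError('E must contain exactly one energy per geometry')
    return n_geoms, n_atoms
```

For example, `check_dataset('ethanol_dft.npz')` should report 9 atoms per geometry for ethanol.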

## Query a force field

```python
import numpy as np
from sgdml.predict import GDMLPredict
from sgdml.utils import io

r, _ = io.read_xyz('geometries/ethanol.xyz')  # 9 atoms
print(r.shape)  # (1,27)

model = np.load('models/ethanol.npz')
gdml = GDMLPredict(model)
e, f = gdml.predict(r)
print(e.shape)  # (1,)
print(f.shape)  # (1,27)
```
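The example above queries a single geometry as a flat ``(1, 3N)`` array, whereas the dataset files written by the conversion scripts store geometries and forces as ``(M, N, 3)``. The helpers below are hypothetical and not part of ``sgdml``. They convert between the two layouts so that several geometries can be passed to `gdml.predict` at once. That `predict` accepts an ``(M, 3N)`` batch is an assumption extrapolated from the ``(1, 27)`` example.

```python
import numpy as np


def to_flat(R):
    """(M, N, 3) Cartesian coordinates -> (M, 3N) rows, one geometry per row."""
    R = np.asarray(R)
    return R.reshape(R.shape[0], -1)


def to_atoms(flat, n_atoms):
    """(M, 3N) rows -> (M, N, 3) array, e.g. to obtain per-atom force vectors."""
    flat = np.asarray(flat)
    return flat.reshape(flat.shape[0], n_atoms, 3)


# Usage sketch (requires a trained model, so shown as comments only):
# dataset = np.load('ethanol_dft.npz')
# e, f = gdml.predict(to_flat(dataset['R'][:10]))  # assumed: e -> (10,), f -> (10, 27)
# f_per_atom = to_atoms(f, n_atoms=9)             # -> (10, 9, 3)
```

Because both helpers use row-major reshapes, the coordinate order is x1, y1, z1, x2, and so on, matching the flat layout in the example above.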

## Authors

* Stefan Chmiela
* Jan Hermann

We appreciate and welcome contributions and would like to thank the following people for participating in this project:

* Huziel Sauceda
* Igor Poltavsky
* Luis Gálvez
* Danny Panknin
* Grégory Fonseca
* Anton Charkin-Gorbulin

## References

* [1] Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., Müller, K.-R.,
*Machine Learning of Accurate Energy-conserving Molecular Force Fields.*
Science Advances, 3(5), e1603015 (2017)   
[10.1126/sciadv.1603015](http://dx.doi.org/10.1126/sciadv.1603015)

* [2] Chmiela, S., Sauceda, H. E., Müller, K.-R., Tkatchenko, A.,
*Towards Exact Molecular Dynamics Simulations with Machine-Learned Force Fields.*
Nature Communications, 9(1), 3887 (2018)   
[10.1038/s41467-018-06169-2](https://doi.org/10.1038/s41467-018-06169-2)

* [3] Chmiela, S., Sauceda, H. E., Poltavsky, I., Müller, K.-R., Tkatchenko, A.,
*sGDML: Constructing Accurate and Data Efficient Molecular Force Fields Using Machine Learning.*
Computer Physics Communications, 240, 38-45 (2019)
[10.1016/j.cpc.2019.02.007](https://doi.org/10.1016/j.cpc.2019.02.007)

* [4] Chmiela, S., Vassilev-Galindo, V., Unke, O. T., Kabylda, A., Sauceda, H. E., Tkatchenko, A., Müller, K.-R.,
*Accurate Global Machine Learning Force Fields for Molecules With Hundreds of Atoms.*
Science Advances, 9(2), eadf0873 (2023)
[10.1126/sciadv.adf0873](https://doi.org/10.1126/sciadv.adf0873)

================================================
FILE: pyproject.toml
================================================
[tool.black]
skip-string-normalization = true
skip-numeric-underscore-normalization = true


================================================
FILE: scripts/sgdml_dataset_from_aims.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import sys

import numpy as np

from sgdml.utils import io, ui


def read_reference_data(f):  # noqa C901
    # 1 eV = 0.036749326 Ha and 1 kcal/mol = 0.0015936011 Ha, i.e. 1 eV ~ 23.0605 kcal/mol.
    eV_to_kcalmol = 0.036749326 / 0.0015936011

    e_next, f_next, geo_next = False, False, False
    n_atoms = None
    R, z, E, F = [], [], [], []

    geo_idx = 0
    for line in f:
        if n_atoms:
            cols = line.split()
            if e_next:
                E.append(float(cols[5]))
                e_next = False
            elif f_next:
                a = int(cols[1]) - 1
                F.append(list(map(float, cols[2:5])))
                if a == n_atoms - 1:
                    f_next = False
            elif geo_next:
                if 'atom' in cols:
                    a_count += 1  # noqa: F821
                    R.append(list(map(float, cols[1:4])))

                    if geo_idx == 0:
                        z.append(io._z_str_to_z_dict[cols[4]])

                    if a_count == n_atoms:
                        geo_next = False
                        geo_idx += 1
            elif 'Energy and forces in a compact form:' in line:
                e_next = True
            elif 'Total atomic forces (unitary forces cleaned) [eV/Ang]:' in line:
                f_next = True
            elif (
                'Atomic structure (and velocities) as used in the preceding time step:'
                in line
            ):
                geo_next = True
                a_count = 0
        elif 'The structure contains' in line and 'atoms,  and a total of' in line:
            n_atoms = int(line.split()[3])
            print('Number atoms per geometry:      {:>7d}'.format(n_atoms))
            continue

        if geo_idx > 0 and geo_idx % 1000 == 0:
            sys.stdout.write("\rNumber geometries found so far: {:>7d}".format(geo_idx))
            sys.stdout.flush()
    sys.stdout.write("\rNumber geometries found so far: {:>7d}".format(geo_idx))
    sys.stdout.flush()
    print(
        '\n'
        + ui.color_str('[INFO]', bold=True)
        + ' Energies and forces have been converted from eV to kcal/mol(/Ang)'
    )

    R = np.array(R).reshape(-1, n_atoms, 3)
    z = np.array(z)
    E = np.array(E) * eV_to_kcalmol
    F = np.array(F).reshape(-1, n_atoms, 3) * eV_to_kcalmol

    f.close()
    return (R, z, E, F)


parser = argparse.ArgumentParser(description='Creates a dataset from FHI-aims format.')
parser.add_argument(
    'dataset',
    metavar='<dataset>',
    type=argparse.FileType('r'),
    help='path to FHI-aims output file',
)
parser.add_argument(
    '-o',
    '--overwrite',
    dest='overwrite',
    action='store_true',
    help='overwrite existing dataset file',
)
args = parser.parse_args()
dataset = args.dataset

name = os.path.splitext(os.path.basename(dataset.name))[0]
dataset_file_name = name + '.npz'

dataset_exists = os.path.isfile(dataset_file_name)
if dataset_exists and args.overwrite:
    print(ui.color_str('[INFO]', bold=True) + ' Overwriting existing dataset file.')
if not dataset_exists or args.overwrite:
    print('Writing dataset to \'%s\'...' % dataset_file_name)
else:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True) + ' Dataset \'%s\' already exists.' % dataset_file_name
    )

R, z, E, F = read_reference_data(dataset)

# Prune all arrays to same length.
n_mols = min(min(R.shape[0], F.shape[0]), E.shape[0])
if n_mols != R.shape[0] or n_mols != F.shape[0] or n_mols != E.shape[0]:
    print(
        ui.color_str('[WARN]', fore_color=ui.YELLOW, bold=True)
        + ' Incomplete output detected: Final dataset was pruned to %d points.' % n_mols
    )
R = R[:n_mols, :, :]
F = F[:n_mols, :, :]
E = E[:n_mols]

# Base variables contained in every dataset file.
base_vars = {
    'type': 'd',
    'R': R,
    'z': z,
    'E': E[:, None],
    'F': F,
    'e_unit': 'kcal/mol',
    'r_unit': 'Ang',
    'name': name,
    'theory': 'unknown',
}

base_vars['F_min'], base_vars['F_max'] = np.min(F.ravel()), np.max(F.ravel())
base_vars['F_mean'], base_vars['F_var'] = np.mean(F.ravel()), np.var(F.ravel())

base_vars['E_min'], base_vars['E_max'] = np.min(E), np.max(E)
base_vars['E_mean'], base_vars['E_var'] = np.mean(E), np.var(E)

base_vars['md5'] = io.dataset_md5(base_vars)

np.savez_compressed(dataset_file_name, **base_vars)
print(ui.color_str('[DONE]', fore_color=ui.GREEN, bold=True))


================================================
FILE: scripts/sgdml_dataset_from_extxyz.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import sys

try:
    from ase.io import read
except ImportError:
    raise ImportError('Optional ASE dependency not found! Please run \'pip install sgdml[ase]\' to install it.')

import numpy as np

from sgdml import __version__
from sgdml.utils import io, ui

if sys.version[0] == '3':
    raw_input = input


# Note: assumes that the atoms in each molecule are in the same order.
def read_nonstd_ext_xyz(f):
    n_atoms = None

    R, z, E, F = [], [], [], []
    for i, line in enumerate(f):
        line = line.strip()
        if not n_atoms:
            n_atoms = int(line)
            print('Number atoms per geometry: {:,}'.format(n_atoms))

        file_i, line_i = divmod(i, n_atoms + 2)

        if line_i == 1:
            try:
                e = float(line)
            except ValueError:
                pass
            else:
                E.append(e)

        cols = line.split()
        if line_i >= 2:
            R.append(list(map(float, cols[1:4])))
            if file_i == 0:  # first molecule
                z.append(io._z_str_to_z_dict[cols[0]])
            F.append(list(map(float, cols[4:7])))

        if file_i % 1000 == 0:
            sys.stdout.write('\rNumber geometries found so far: {:,}'.format(file_i))
            sys.stdout.flush()
    sys.stdout.write('\rNumber geometries found so far: {:,}'.format(file_i))
    sys.stdout.flush()
    print()

    R = np.array(R).reshape(-1, n_atoms, 3)
    z = np.array(z)
    E = None if not E else np.array(E)
    F = np.array(F).reshape(-1, n_atoms, 3)

    if F.shape[0] != R.shape[0]:
        sys.exit(
            ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
            + ' Force labels are missing from dataset or are incomplete!'
        )

    f.close()
    return (R, z, E, F)

# Extracts info string for each frame.
def extract_info_from_extxyz(file_path):
    infos = []

    with open(file_path) as f:
        lines = f.readlines()

    i = 0
    while i < len(lines):
        try:
            n_atoms = int(lines[i])
        except ValueError:
            raise ValueError(f"Invalid atom count at line {i + 1}")

        if i + 1 >= len(lines):
            break

        comment_line = lines[i + 1].strip()
        info = {}
        for token in comment_line.split():
            if "=" in token:
                key, val = token.split("=", 1)
                val = val.strip('"')
                try:
                    val = float(val)
                except ValueError:
                    pass
                info[key] = val
        infos.append(info)

        i += 2 + n_atoms

    return infos


parser = argparse.ArgumentParser(
    description='Creates a dataset from extended XYZ format.'
)
parser.add_argument(
    'dataset',
    metavar='<dataset>',
    type=argparse.FileType('r'),
    help='path to extended xyz dataset file',
)
parser.add_argument(
    '-o',
    '--overwrite',
    dest='overwrite',
    action='store_true',
    help='overwrite existing dataset file',
)
args = parser.parse_args()
dataset = args.dataset


name = os.path.splitext(os.path.basename(dataset.name))[0]
dataset_file_name = name + '.npz'

dataset_exists = os.path.isfile(dataset_file_name)
if dataset_exists and args.overwrite:
    print(ui.color_str('[INFO]', bold=True) + ' Overwriting existing dataset file.')
if not dataset_exists or args.overwrite:
    print('Writing dataset to \'{}\'...'.format(dataset_file_name))
else:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
        + ' Dataset \'{}\' already exists.'.format(dataset_file_name)
    )

lattice, R, z, E, F = None, None, None, None, None

mols = read(dataset.name, format='extxyz', index=':')
#calc = mols[0].get_calculator() # deprecated
calc = mols[0].calc
is_extxyz = calc is not None
if is_extxyz:

    print("\rNumber geometries found: {:,}\n".format(len(mols)))

    if 'forces' not in calc.results:
        sys.exit(
            ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
            + ' Forces are missing in the input file!'
        )

    lattice = np.array(mols[0].get_cell().T)
    if not np.any(lattice): # all zeros
        print(
            ui.color_str('[INFO]', bold=True)
            + ' No lattice vectors specified in extended XYZ file.'
        )
        lattice = None

    Z = np.array([mol.get_atomic_numbers() for mol in mols])
    all_z_the_same = (Z == Z[0]).all()
    if not all_z_the_same:
        sys.exit(
            ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
            + ' Order of atoms changes across dataset.'
        )

    R = np.array([mol.get_positions() for mol in mols])
    z = Z[0]

    # ASE did not parse info string. Try doing it manually.
    if not mols[0].info:

        print(
            ui.color_str('[INFO]', bold=True)
            + ' ASE did not parse info string completely. Trying to parse it manually.'
        )

        infos = extract_info_from_extxyz(dataset.name)
        for mol, info in zip(mols, infos):
            mol.info.update(info)

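    if 'energy' in mols[0].info:
        E = np.array([mol.info['energy'] for mol in mols])
    elif 'Energy' in mols[0].info:
        E = np.array([mol.info['Energy'] for mol in mols])
    elif 'energy' in calc.results:
        # ASE's extxyz reader may move the per-frame energy into the calculator results.
        E = np.array([mol.get_potential_energy() for mol in mols])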
    F = np.array([mol.get_forces() for mol in mols])

else:  # legacy non-standard XYZ format

    with open(dataset.name) as f:
        R, z, E, F = read_nonstd_ext_xyz(f)

# Base variables contained in every dataset file.
base_vars = {
    'type': 'd',
    'code_version': __version__,
    'name': name,
    'theory': 'unknown',
    'R': R,
    'z': z,
    'F': F,
}

base_vars['F_min'], base_vars['F_max'] = np.min(F.ravel()), np.max(F.ravel())
base_vars['F_mean'], base_vars['F_var'] = np.mean(F.ravel()), np.var(F.ravel())

print('Please provide a description of the length unit used in your input file, e.g. \'Ang\' or \'au\': ')
print('Note: This string will be stored in the dataset file and passed on to model files for later reference.')
r_unit = raw_input('> ').strip()
if r_unit != '':
    base_vars['r_unit'] = r_unit

print('Please provide a description of the energy unit used in your input file, e.g. \'kcal/mol\' or \'eV\': ')
print('Note: This string will be stored in the dataset file and passed on to model files for later reference.')
e_unit = raw_input('> ').strip()
if e_unit != '':
    base_vars['e_unit'] = e_unit

if E is not None:
    base_vars['E'] = E
    base_vars['E_min'], base_vars['E_max'] = np.min(E), np.max(E)
    base_vars['E_mean'], base_vars['E_var'] = np.mean(E), np.var(E)
else:
    print(ui.color_str('[INFO]', bold=True) + ' No energy labels found in dataset.')

if lattice is not None:
    base_vars['lattice'] = lattice

base_vars['md5'] = io.dataset_md5(base_vars)
np.savez_compressed(dataset_file_name, **base_vars)
print(ui.color_str('[DONE]', fore_color=ui.GREEN, bold=True))


================================================
FILE: scripts/sgdml_dataset_from_ipi.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import sys

import numpy as np

from sgdml.utils import io, ui


def raw_input_float(prompt):
    while True:
        try:
            return float(input(prompt))
        except ValueError:
            print(ui.color_str('[FAIL]', fore_color=ui.RED, bold=True) + ' That is not a valid float.')


# Assumes that the atoms in each molecule are in the same order.
def read_concat_xyz(f):
    n_atoms = None

    R, z = [], []
    for i, line in enumerate(f):
        line = line.strip()
        if not n_atoms:
            n_atoms = int(line)
            print('Number atoms per geometry:      {:>7d}'.format(n_atoms))

        file_i, line_i = divmod(i, n_atoms + 2)

        cols = line.split()
        if line_i >= 2:
            if file_i == 0:  # first molecule
                z.append(io._z_str_to_z_dict[cols[0]])
            R.append(list(map(float, cols[1:4])))

        if file_i % 1000 == 0:
            sys.stdout.write("\rNumber geometries found so far: {:>7d}".format(file_i))
            sys.stdout.flush()
    sys.stdout.write("\rNumber geometries found so far: {:>7d}\n".format(file_i))
    sys.stdout.flush()

    # Only keep complete entries.
    R = R[: int(n_atoms * np.floor(len(R) / float(n_atoms)))]

    R = np.array(R).reshape(-1, n_atoms, 3)
    z = np.array(z)

    f.close()
    return (R, z)


def read_out_file(f, col):

    E = []
    for i, line in enumerate(f):
        line = line.strip()
        if line and not line.startswith('#'):  # Ignore blank lines and comments.
            E.append(float(line.split()[col]))
        if i % 1000 == 0:
            sys.stdout.write("\rNumber lines processed so far:  {:>7d}".format(len(E)))
            sys.stdout.flush()
    sys.stdout.write("\rNumber lines processed so far:  {:>7d}\n".format(len(E)))
    sys.stdout.flush()

    return np.array(E)


parser = argparse.ArgumentParser(
    description='Creates a dataset from i-PI output files.'
)
parser.add_argument(
    'geometries',
    metavar='<geometries>',
    type=argparse.FileType('r'),
    help='path to XYZ geometry file',
)
parser.add_argument(
    'forces',
    metavar='<forces>',
    type=argparse.FileType('r'),
    help='path to XYZ force file',
)
parser.add_argument(
    'energies',
    metavar='<energies>',
    type=argparse.FileType('r'),
    help='path to energy output file',
)
parser.add_argument(
    'energy_col',
    metavar='<energy_col>',
    type=lambda x: io.is_strict_pos_int(x),
    help='which column to parse from energy file (zero based)',
    nargs='?',
    default=0,
)
parser.add_argument(
    '-o',
    '--overwrite',
    dest='overwrite',
    action='store_true',
    help='overwrite existing dataset file',
)
args = parser.parse_args()
geometries = args.geometries
forces = args.forces
energies = args.energies
energy_col = args.energy_col

name = os.path.splitext(os.path.basename(geometries.name))[0]
dataset_file_name = name + '.npz'

dataset_exists = os.path.isfile(dataset_file_name)
if dataset_exists and args.overwrite:
    print(ui.color_str('[INFO]', bold=True) + ' Overwriting existing dataset file.')
if not dataset_exists or args.overwrite:
    print('Writing dataset to \'%s\'...' % dataset_file_name)
else:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True) + ' Dataset \'%s\' already exists.' % dataset_file_name
    )


print('Reading geometries...')
R, z = read_concat_xyz(geometries)

print('Reading forces...')
F, _ = read_concat_xyz(forces)

print('Reading energies from column %d...' % energy_col)
E = read_out_file(energies, energy_col)

# Prune all arrays to same length.
n_mols = min(min(R.shape[0], F.shape[0]), E.shape[0])
if n_mols != R.shape[0] or n_mols != F.shape[0] or n_mols != E.shape[0]:
    print(
        ui.color_str('[WARN]', fore_color=ui.YELLOW, bold=True)
        + ' Incomplete output detected: Final dataset was pruned to %d points.' % n_mols
    )
R = R[:n_mols, :, :]
F = F[:n_mols, :, :]
E = E[:n_mols]

print(
    ui.color_str('[INFO]', bold=True)
    + ' Geometries, forces and energies must have consistent units.'
)
R_conv_fact = raw_input_float('Unit conversion factor for geometries: ')
R = R * R_conv_fact
F_conv_fact = raw_input_float('Unit conversion factor for forces: ')
F = F * F_conv_fact
E_conv_fact = raw_input_float('Unit conversion factor for energies: ')
E = E * E_conv_fact

# Base variables contained in every dataset file.
base_vars = {
    'type': 'd',
    'R': R,
    'z': z,
    'E': E[:, None],
    'F': F,
    'name': name,
    'theory': 'unknown',
}
base_vars['md5'] = io.dataset_md5(base_vars)

np.savez_compressed(dataset_file_name, **base_vars)
print(ui.color_str('[DONE]', fore_color=ui.GREEN, bold=True))


================================================
FILE: scripts/sgdml_dataset_to_extxyz.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2019 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import sys

import numpy as np

from sgdml.utils import io, ui


parser = argparse.ArgumentParser(
    description='Converts a native dataset file to extended XYZ format.'
)
parser.add_argument(
    'dataset',
    metavar='<dataset>',
    type=lambda x: io.is_file_type(x, 'dataset'),
    help='path to dataset file',
)
parser.add_argument(
    '-o',
    '--overwrite',
    dest='overwrite',
    action='store_true',
    help='overwrite existing xyz dataset file',
)

args = parser.parse_args()
dataset_path, dataset = args.dataset

name = os.path.splitext(os.path.basename(dataset_path))[0]
dataset_file_name = name + '.xyz'

xyz_exists = os.path.isfile(dataset_file_name)
if xyz_exists and args.overwrite:
    print(ui.color_str('[INFO]', bold=True) + ' Overwriting existing xyz dataset file.')
if not xyz_exists or args.overwrite:
    print(ui.color_str('[INFO]', bold=True) + ' Writing dataset to \'{}\'...'.format(dataset_file_name))
else:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True) + ' Dataset \'{}\' already exists.'.format(dataset_file_name)
    )

R = dataset['R']
z = dataset['z']
F = dataset['F']

lattice = dataset['lattice'] if 'lattice' in dataset else None

try:
    with open(dataset_file_name, 'w') as file:

        n = R.shape[0]
        for i, r in enumerate(R):

            e = np.squeeze(dataset['E'][i]) if 'E' in dataset else None
            f = F[i, :, :]
            ext_xyz_str = io.generate_xyz_str(r, z, e=e, f=f, lattice=lattice) + '\n'

            file.write(ext_xyz_str)

            ui.callback(i + 1, n, disp_str='Exporting %d data points...' % n)
except IOError:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True) + ' Writing xyz file failed.'
    )

print()


================================================
FILE: scripts/sgdml_dataset_via_ase.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import sys

try:
    from ase.io import read
except ImportError:
    raise ImportError('Optional ASE dependency not found! Please run \'pip install sgdml[ase]\' to install it.')

import numpy as np

from sgdml import __version__
from sgdml.utils import io, ui

if sys.version[0] == '3':
    raw_input = input


parser = argparse.ArgumentParser(
    description='Creates a dataset from any input format supported by ASE.'
)
parser.add_argument(
    'dataset',
    metavar='<dataset>',
    type=argparse.FileType('r'),
    help='path to input dataset file',
)
parser.add_argument(
    '-o',
    '--overwrite',
    dest='overwrite',
    action='store_true',
    help='overwrite existing dataset file',
)
args = parser.parse_args()
dataset = args.dataset


name = os.path.splitext(os.path.basename(dataset.name))[0]
dataset_file_name = name + '.npz'

dataset_exists = os.path.isfile(dataset_file_name)
if dataset_exists and args.overwrite:
    print(ui.color_str('[INFO]', bold=True) + ' Overwriting existing dataset file.')
if not dataset_exists or args.overwrite:
    print('Writing dataset to \'{}\'...'.format(dataset_file_name))
else:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
        + ' Dataset \'{}\' already exists.'.format(dataset_file_name)
    )

mols = read(dataset.name, index=':')

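# Filter out incomplete outputs (frames without calculator results) from the trajectory.
mols = [mol for mol in mols if mol.calc is not None]
if len(mols) == 0:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
        + ' No geometries with calculator results found in the input file!'
    )

lattice, R, z, E, F = None, None, None, None, None

calc = mols[0].calc

print('Number of geometries: {:,}'.format(len(mols)))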
#print("\rAvailable properties: " + ', '.join(calc.results))
print()

if 'forces' not in calc.results:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
        + ' Forces are missing in the input file!'
    )

lattice = np.array(mols[0].get_cell().T)
if not np.any(lattice):
    print(
        ui.color_str('[INFO]', bold=True)
        + ' No lattice vectors specified.'
    )
    lattice = None

Z = np.array([mol.get_atomic_numbers() for mol in mols])
all_z_the_same = (Z == Z[0]).all()
if not all_z_the_same:
    sys.exit(
        ui.color_str('[FAIL]', fore_color=ui.RED, bold=True)
        + ' Order of atoms changes across dataset.'
    )

R = np.array([mol.get_positions() for mol in mols])
z = Z[0]

if 'Energy' in mols[0].info:
    E = np.array([float(mol.info['Energy']) for mol in mols])
else:
    E = np.array([mol.get_potential_energy() for mol in mols])
F = np.array([mol.get_forces() for mol in mols])

print('Please provide a name for this dataset. Otherwise the original filename will be reused.')
custom_name = raw_input('> ').strip()
if custom_name != '':
    name = custom_name

print('Please provide a descriptor for the level of theory used to create this dataset.')
theory = raw_input('> ').strip()
if theory == '':
    theory = 'unknown'

# Base variables contained in every dataset file.
base_vars = {
    'type': 'd',
    'code_version': __version__,
    'name': name,
    'theory': theory,
    'R': R,
    'z': z,
    'F': F,
}

base_vars['F_min'], base_vars['F_max'] = np.min(F.ravel()), np.max(F.ravel())
base_vars['F_mean'], base_vars['F_var'] = np.mean(F.ravel()), np.var(F.ravel())

print('If you want to convert your original length unit, please provide a conversion factor (default: 1.0): ')
R_to_new_unit = raw_input('> ').strip()
if R_to_new_unit != '':
    R_to_new_unit = float(R_to_new_unit)
else:
    R_to_new_unit = 1.0

print('If you want to convert your original energy unit, please provide a conversion factor (default: 1.0): ')
E_to_new_unit = raw_input('> ').strip()
if E_to_new_unit != '':
    E_to_new_unit = float(E_to_new_unit)
else:
    E_to_new_unit = 1.0

print('Please provide a description of the length unit, e.g. \'Ang\' or \'au\': ')
print('Note: This string will be stored in the dataset file and passed on to model files for later reference.')
r_unit = raw_input('> ').strip()
if r_unit != '':
    base_vars['r_unit'] = r_unit

print('Please provide a description of the energy unit, e.g. \'kcal/mol\' or \'eV\': ')
print('Note: This string will be stored in the dataset file and passed on to model files for later reference.')
e_unit = raw_input('> ').strip()
if e_unit != '':
    base_vars['e_unit'] = e_unit

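if E is not None:
    E = E * E_to_new_unit
    base_vars['E'] = E
    base_vars['E_min'], base_vars['E_max'] = np.min(E), np.max(E)
    base_vars['E_mean'], base_vars['E_var'] = np.mean(E), np.var(E)
else:
    print(ui.color_str('[INFO]', bold=True) + ' No energy labels found in dataset.')

# Convert positions (length) and forces (energy/length) to the new units.
base_vars['R'] *= R_to_new_unit
base_vars['F'] *= E_to_new_unit / R_to_new_unit

# Recompute the force statistics so that they refer to the converted unit.
F = base_vars['F']
base_vars['F_min'], base_vars['F_max'] = np.min(F.ravel()), np.max(F.ravel())
base_vars['F_mean'], base_vars['F_var'] = np.mean(F.ravel()), np.var(F.ravel())

if lattice is not None:
    # Lattice vectors are lengths, so they are converted like the positions.
    base_vars['lattice'] = lattice * R_to_new_unit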

base_vars['md5'] = io.dataset_md5(base_vars)
np.savez_compressed(dataset_file_name, **base_vars)
print(ui.color_str('[DONE]', fore_color=ui.GREEN, bold=True))


================================================
FILE: scripts/sgdml_datasets_from_model.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import sys

import numpy as np

from sgdml.utils import io, ui

parser = argparse.ArgumentParser(
    description='Extracts the training and validation subsets from a dataset that were used to construct a model.'
)
parser.add_argument(
    'model',
    metavar='<model_file>',
    type=lambda x: io.is_file_type(x, 'model'),
    help='path to model file',
)
parser.add_argument(
    'dataset',
    metavar='<dataset_file>',
    type=lambda x: io.is_file_type(x, 'dataset'),
    help='path to dataset file referenced in model',
)
parser.add_argument(
    '-o',
    '--overwrite',
    dest='overwrite',
    action='store_true',
    help='overwrite existing files',
)
args = parser.parse_args()

model_path, model = args.model
dataset_path, dataset = args.dataset


for s in ['train', 'valid']:

    if dataset['md5'] != model['md5_' + s]:
        sys.exit(
            ui.fail_str('[FAIL]')
            + ' Dataset fingerprint does not match the one referenced in model for \'%s\'.'
            % s
        )

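    idxs = model['idxs_' + s]
    R = dataset['R'][idxs, :, :]
    F = dataset['F'][idxs, :, :]

    base_vars = {
        'type': 'd',
        'name': dataset['name'].astype(str),
        'theory': dataset['theory'].astype(str),
        'z': dataset['z'],
        'R': R,
        'F': F,
    }

    # Energy labels, lattice vectors and unit strings are optional; copy them if present.
    if 'E' in dataset:
        base_vars['E'] = dataset['E'][idxs]
    for key in ['lattice', 'r_unit', 'e_unit']:
        if key in dataset:
            base_vars[key] = dataset[key]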
    base_vars['md5'] = io.dataset_md5(base_vars)

    subset_file_name = '%s_%s.npz' % (
        os.path.splitext(os.path.basename(dataset_path))[0],
        s,
    )
    file_exists = os.path.isfile(subset_file_name)
    if file_exists and args.overwrite:
        print(ui.info_str('[INFO]') + ' Overwriting existing dataset file.')
    if not file_exists or args.overwrite:
        np.savez_compressed(subset_file_name, **base_vars)
        ui.callback(1, disp_str='Extracted %s dataset saved to \'%s\'' % (s, subset_file_name)) # DONE
    else:
        print(
            ui.warn_str('[WARN]')
            + ' %s dataset \'%s\' already exists.' % (s.capitalize(), subset_file_name)
            + '\n       Run \'python %s -o %s %s\' to overwrite.\n'
            % (os.path.basename(__file__), model_path, dataset_path)
        )
        sys.exit()


================================================
FILE: setup.cfg
================================================
[flake8]
max-complexity = 12
ignore = E501,W503,E741
select = C,E,F,W

[isort]
multi_line_output = 3
include_trailing_comma = 1
line_length = 85
sections = FUTURE,STDLIB,TYPING,THIRDPARTY,FIRSTPARTY,LOCALFOLDER
known_typing = typing, typing_extensions
no_lines_before = TYPING


================================================
FILE: setup.py
================================================
import os
import re
from io import open
from setuptools import setup, find_packages


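def get_property(property, package):
    # Read the file inside a context manager so the handle is closed again.
    with open(package + '/__init__.py', encoding='utf8') as f:
        result = re.search(
            r'{}\s*=\s*[\'"]([^\'"]*)[\'"]'.format(property),
            f.read(),
        )
    return result.group(1)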


this_dir = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(this_dir, 'README.md'), encoding='utf8') as f:
    long_description = f.read()

# Scripts
scripts = []
for dirname, dirnames, filenames in os.walk('scripts'):
    for filename in filenames:
        if filename.endswith('.py'):
            scripts.append(os.path.join(dirname, filename))

setup(
    name='sgdml',
    version=get_property('__version__', 'sgdml'),
    description='Reference implementation of the GDML and sGDML force field models.',
    long_description=long_description,
    long_description_content_type='text/markdown',
    classifiers=[
        'Development Status :: 4 - Beta',
        'Environment :: Console',
        'Intended Audience :: Science/Research',
        'Intended Audience :: Education',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: MIT License',
        'Operating System :: MacOS :: MacOS X',
        'Operating System :: POSIX :: Linux',
        'Programming Language :: Python :: 3.7',
        'Topic :: Scientific/Engineering :: Chemistry',
        'Topic :: Scientific/Engineering :: Physics',
        'Topic :: Software Development :: Libraries :: Python Modules',
    ],
    url='http://www.sgdml.org',
    author='Stefan Chmiela',
    author_email='sgdml@chmiela.com',
    license='LICENSE.txt',
    packages=find_packages(),
    install_requires=['torch >= 1.8', 'numpy >= 1.19.0', 'scipy >= 1.1.0', 'psutil', 'future'],
    entry_points={
        'console_scripts': ['sgdml=sgdml.cli:main', 'sgdml-get=sgdml.get:main']
    },
    extras_require={'ase': ['ase >= 3.16.2']},
    scripts=scripts,
    include_package_data=True,
    zip_safe=False,
)


================================================
FILE: sgdml/__init__.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2019-2025 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

__version__ = '1.0.3'

MAX_PRINT_WIDTH = 100
LOG_LEVELNAME_WIDTH = 7  # do not modify

# more descriptive callback status
DONE = 1
NOT_DONE = 0


# Logging

import copy
import logging
import re
import textwrap

from .utils import ui


class ColoredFormatter(logging.Formatter):

    LEVEL_COLORS = {
        'DEBUG': (ui.CYAN, ui.BLACK),
        'INFO': (ui.WHITE, ui.BLACK),
        'DONE': (ui.GREEN, ui.BLACK),
        'WARNING': (ui.YELLOW, ui.BLACK),
        'ERROR': (ui.RED, ui.BLACK),
        'CRITICAL': (ui.BLACK, ui.RED),
    }

    LEVEL_NAMES = {
        'DEBUG': '[DEBG]',
        'INFO': '[INFO]',
        'DONE': '[DONE]',
        'WARNING': '[WARN]',
        'ERROR': '[FAIL]',
        'CRITICAL': '[CRIT]',
    }

    def __init__(self, msg, use_color=True):

        logging.Formatter.__init__(self, msg)
        self.use_color = use_color

    def format(self, record):

        _record = copy.copy(record)
        levelname = _record.levelname
        msg = _record.msg

        levelname = ui.color_str(
            self.LEVEL_NAMES[levelname],
            self.LEVEL_COLORS[levelname][0],
            self.LEVEL_COLORS[levelname][1],
            bold=True,
        )

        if _record.levelname != 'CRITICAL':
            # Wrap long messages, except for CRITICAL ones (i.e. exceptions),
            # since those already carry a pre-formatted traceback string.
            msg = ui.wrap_str(msg)

        # indent multiline strings after the first line
        msg = ui.indent_str(msg, LOG_LEVELNAME_WIDTH)[LOG_LEVELNAME_WIDTH:]

        _record.levelname = levelname
        _record.msg = msg
        return logging.Formatter.format(self, _record)


class ColoredLogger(logging.Logger):
    def __init__(self, name):

        logging.Logger.__init__(self, name, logging.DEBUG)

        # add 'DONE' logging level
        logging.DONE = logging.INFO + 1
        logging.addLevelName(logging.DONE, 'DONE')

        # only display levelname and message
        formatter = ColoredFormatter('%(levelname)s %(message)s')

        # this handler will write to sys.stderr by default
        hd = logging.StreamHandler()
        hd.setFormatter(formatter)
        hd.setLevel(logging.INFO)  # control logging level here

        self.addHandler(hd)

    def done(self, msg, *args, **kwargs):

        if self.isEnabledFor(logging.DONE):
            self._log(logging.DONE, msg, args, **kwargs)


logging.setLoggerClass(ColoredLogger)


================================================
FILE: sgdml/cli.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import logging
import multiprocessing as mp
import argparse
import os
import shutil
import psutil
import sys
import traceback
import time
from functools import partial

import numpy as np
import scipy as sp

try:
    import torch
except ImportError:
    _has_torch = False
else:
    _has_torch = True

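try:
    _torch_mps_is_available = torch.backends.mps.is_available()
except (AttributeError, NameError):  # NameError: torch is not installed
    _torch_mps_is_available = False
_torch_mps_is_available = False  # override: MPS support is disabled regardless of availability

try:
    _torch_cuda_is_available = torch.cuda.is_available()
except (AttributeError, NameError):  # NameError: torch is not installed
    _torch_cuda_is_available = False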

try:
    import ase
except ImportError:
    _has_ase = False
else:
    _has_ase = True

from . import __version__, DONE, NOT_DONE, MAX_PRINT_WIDTH
from .predict import GDMLPredict
from .train import GDMLTrain
from .utils import io, ui

# BASE_DIR = os.path.dirname(os.path.abspath(__file__))
PACKAGE_NAME = 'sgdml'

log = logging.getLogger(__name__)


class AssistantError(Exception):
    pass


def _print_splash(max_memory, max_processes, use_torch):

    logo_str = r"""         __________  __  _____
   _____/ ____/ __ \/  |/  / /
  / ___/ / __/ / / / /|_/ / /
 (__  ) /_/ / /_/ / /  / / /___
/____/\____/_____/_/  /_/_____/"""

    can_update, latest_version = _check_update()

    version_str = __version__
    version_str += (
        ' '
        + ui.color_str(
            ' Latest: ' + latest_version + ' ',
            fore_color=ui.BLACK,
            back_color=ui.YELLOW,
            bold=True,
        )
        if can_update
        else ''
    )

    max_memory_str = '{:d} GB(s) memory'.format(max_memory)
    max_processes_str = '{:d} CPU(s)'.format(max_processes)
    hardware_str = 'using {}, {}'.format(max_memory_str, max_processes_str)

    if use_torch and _has_torch:

        if _torch_cuda_is_available:
            num_gpu = torch.cuda.device_count()
            if num_gpu > 0:
                hardware_str += ', {:d} GPU(s)'.format(num_gpu)
        elif _torch_mps_is_available:
            hardware_str += ', MPS enabled'

    logo_str_split = logo_str.splitlines()
    print('\n'.join(logo_str_split[:-1]))
    ui.print_two_column_str(logo_str_split[-1] + '  ' + version_str, hardware_str)

    # Print update notice.
    if can_update:
        print(
            '\n'
            + ui.color_str(
                ' UPDATE AVAILABLE ',
                fore_color=ui.BLACK,
                back_color=ui.YELLOW,
                bold=True,
            )
            + '\n'
            + '-' * MAX_PRINT_WIDTH
        )
        print(
            'A new stable release version {} of this software is available.'.format(
                latest_version
            )
        )
        print(
            'You can update your installation by running \'pip install sgdml --upgrade\'.'
        )

    _print_billboard()


def _check_update():

    try:
        from urllib.request import urlopen
    except ImportError:
        from urllib2 import urlopen

    base_url = 'http://api.sgdml.org/'
    url = '{}update.php?v={}'.format(base_url, __version__)

    can_update, must_update = '0', '0'
    latest_version = ''
    try:
        response = urlopen(url, timeout=1)
        can_update, must_update, latest_version = response.read().decode().split(',')
        response.close()
    except Exception:
        pass

    return can_update == '1', latest_version


def _print_billboard():

    try:
        from urllib.request import urlopen
    except ImportError:
        from urllib2 import urlopen

    base_url = 'http://api.sgdml.org/'
    url = '{}billboard.php'.format(base_url)

    resp_str = ''
    try:
        response = urlopen(url, timeout=1)
        resp_str = response.read().decode()
        response.close()
    except Exception:
        pass

    bbs = None
    try:
        import json

        bbs = json.loads(resp_str)
    except Exception:
        pass

    if bbs is not None:

        for bb in bbs:

            back_color = ui.WHITE
            if bb['color'] == 'YELLOW':
                back_color = ui.YELLOW
            elif bb['color'] == 'GREEN':
                back_color = ui.GREEN
            elif bb['color'] == 'RED':
                back_color = ui.RED
            elif bb['color'] == 'CYAN':
                back_color = ui.CYAN

            print(
                '\n'
                + ui.color_str(
                    ' {} '.format(bb['title']),
                    fore_color=ui.BLACK,
                    back_color=back_color,
                    bold=True,
                )
                + '\n'
                + '-' * MAX_PRINT_WIDTH
            )

            print(ui.wrap_str(bb['text'], width=MAX_PRINT_WIDTH - 2))


def _print_dataset_properties(dataset, title_str='Dataset properties'):

    print(ui.color_str(title_str, bold=True))

    n_mols, n_atoms, _ = dataset['R'].shape
    print('  {:<18} \'{}\''.format('Name:', ui.unicode_str(dataset['name'])))
    print('  {:<18} \'{}\''.format('Theory level:', ui.unicode_str(dataset['theory'])))
    print('  {:<18} {:<d}'.format('Atoms:', n_atoms))

    print('  {:<18} {:,} data points'.format('Size:', n_mols))

    ui.print_lattice(dataset['lattice'] if 'lattice' in dataset else None)

    if 'perms' in dataset:
        ui.print_two_column_str(
            '  {:<18} {}'.format('Symmetries:', len(dataset['perms'])),
            'This dataset contains precomputed permutations.',
        )

    if 'E' in dataset:

        e_unit = 'unknown unit'
        if 'e_unit' in dataset:
            e_unit = ui.unicode_str(dataset['e_unit'])

        print('  Energies [{}]'.format(e_unit))
        if 'E_min' in dataset and 'E_max' in dataset:
            E_min, E_max = dataset['E_min'], dataset['E_max']
        else:
            E_min, E_max = np.min(dataset['E']), np.max(dataset['E'])
        E_range_str = ui.gen_range_str(E_min, E_max)
        ui.print_two_column_str(
            '    {:<16} {}'.format('Range:', E_range_str), 'min |-- range --| max'
        )

        E_mean = dataset['E_mean'] if 'E_mean' in dataset else np.mean(dataset['E'])
        print('    {:<16} {:<.3f}'.format('Mean:', E_mean))

        E_var = dataset['E_var'] if 'E_var' in dataset else np.var(dataset['E'])
        print('    {:<16} {:<.3f}'.format('Variance:', E_var))
    else:
        print('  {:<18} {}'.format('Energies:', 'n/a'))

    f_unit = 'unknown unit'
    if 'r_unit' in dataset and 'e_unit' in dataset:
        f_unit = (
            ui.unicode_str(dataset['e_unit']) + '/' + ui.unicode_str(dataset['r_unit'])
        )

    print('  Forces [{}]'.format(f_unit))

    if 'F_min' in dataset and 'F_max' in dataset:
        F_min, F_max = dataset['F_min'], dataset['F_max']
    else:
        F_min, F_max = np.min(dataset['F'].ravel()), np.max(dataset['F'].ravel())
    F_range_str = ui.gen_range_str(F_min, F_max)
    ui.print_two_column_str(
        '    {:<16} {}'.format('Range:', F_range_str), 'min |-- range --| max'
    )

    F_mean = dataset['F_mean'] if 'F_mean' in dataset else np.mean(dataset['F'].ravel())
    print('    {:<16} {:<.3f}'.format('Mean:', F_mean))

    F_var = dataset['F_var'] if 'F_var' in dataset else np.var(dataset['F'].ravel())
    print('    {:<16} {:<.3f}'.format('Variance:', F_var))

    print('  {:<18} {}'.format('Fingerprint:', ui.unicode_str(dataset['md5'])))

    # if 'code_version' in dataset:
    #    print('  {:<18} sGDML {}'.format('Created with:', ui.unicode_str(dataset['code_version'])))

    idx = np.random.choice(n_mols, 1)[0]
    r = dataset['R'][idx, :, :]
    e = np.squeeze(dataset['E'][idx]) if 'E' in dataset else None
    f = dataset['F'][idx, :, :]
    lattice = dataset['lattice'] if 'lattice' in dataset else None

    print(
        '\n'
        + ui.color_str('Example geometry', fore_color=ui.WHITE, bold=True)
        + ' (point no. {:,}, chosen randomly)'.format(idx + 1)
    )

    xyz_info_str = 'Copy & paste the string below into Jmol (www.jmol.org), Avogadro (www.avogadro.cc), etc. to visualize one of the geometries from this dataset. A new example will be drawn on each run.'
    xyz_info_str = ui.wrap_str(xyz_info_str, width=MAX_PRINT_WIDTH - 2)
    xyz_info_str = ui.indent_str(xyz_info_str, 2)
    print(xyz_info_str + '\n')

    xyz_str = io.generate_xyz_str(r, dataset['z'], e=e, f=f, lattice=lattice)
    xyz_str = ui.indent_str(xyz_str, 2)

    cut_str = '---- COPY HERE '
    cut_str_reps = int(np.floor((MAX_PRINT_WIDTH - 6) / len(cut_str)))
    cutline_str = ui.color_str(
        '  -' + cut_str * cut_str_reps + '-----', fore_color=ui.GRAY
    )

    print(cutline_str)
    print(xyz_str)
    print(cutline_str)


def _print_task_properties_reduced(
    use_sym, use_E, use_E_cstr, title_str='Task properties'
):

    print(ui.color_str(title_str, bold=True))

    energy_fix_str = (
        (
            'pointwise energy constraints'
            if use_E_cstr
            else 'global integration constant'
        )
        if use_E
        else 'none'
    )
    print('  {:<16} {}'.format('Energy offset:', energy_fix_str))

    print(
        '  {:<16} {}'.format(
            'Symmetries:', 'include (sGDML)' if use_sym else 'ignore (GDML)'
        )
    )


def _print_task_properties(task, title_str='Task properties'):

    print(ui.color_str(title_str, bold=True))

    print('  {:<18}'.format('Dataset'))
    print('    {:<16} \'{}\''.format('Name:', ui.unicode_str(task['dataset_name'])))
    print(
        '    {:<16} \'{}\''.format(
            'Theory level:', ui.unicode_str(task['dataset_theory'])
        )
    )

    n_atoms = len(task['z'])
    print('    {:<16} {:<d}'.format('Atoms:', n_atoms))

    ui.print_lattice(task['lattice'] if 'lattice' in task else None, inset=True)

    print('  {:<18} {:<d}'.format('Symmetries:', len(task['perms'])))

    print('  {:<18}'.format('Hyper-parameters'))
    print('    {:<16} {:<d}'.format('Length scale:', task['sig']))

    if 'lam' in task:
        print('    {:<16} {:<.0e}'.format('Regularization:', task['lam']))

    # if 'solver_name' in task:
    #     print('  {:<18}'.format('Solver configuration'))
    #     print('    {:<16} \'{}\''.format('Type:', task['solver_name']))

    #     if task['solver_name'] == 'cg':

    #         if 'solver_tol' in task:
    #             print('    {:<16} {:<.0e}'.format('Tolerance:', task['solver_tol']))

    #         if 'n_inducing_pts_init' in task:
    #             print(
    #                 '    {:<16} {:<d}'.format(
    #                     'Inducing points:', task['n_inducing_pts_init']
    #                 )
    #             )
    # else:
    #     print('  {:<18} {}'.format('Solver:', 'unknown'))

    n_train = len(task['idxs_train'])
    ui.print_two_column_str(
        '  {:<18} {:,} points'.format('Train on:', n_train),
        'from \'' + ui.unicode_str(task['md5_train']) + '\'',
    )

    n_valid = len(task['idxs_valid'])
    ui.print_two_column_str(
        '  {:<18} {:,} points'.format('Validate on:', n_valid),
        'from \'' + ui.unicode_str(task['md5_valid']) + '\'',
    )

    # print('  {:<18}'.format('Estimated memory requirement (min.)'))

    # mem_kernel_mat_const = 0
    # mem_precond_const = 0
    # print(
    #    '    {:<16} {}'.format(
    #        'CPU:', ui.gen_memory_str(mem_kernel_mat_const + mem_precond_const)
    #    )
    # )
    # print('      {:<14} {}'.format('Kernel matrix:', ui.gen_memory_str(mem_kernel_mat_const))
    # print('      {:<14} {}'.format('Precond. factor:', ui.gen_memory_str(mem_precond_const)))

    # mem_torch_assemble = 0
    # mem_torch_eval = 0
    # print(
    #    '    {:<16} {}'.format(
    #        'GPU:', ui.gen_memory_str(mem_torch_assemble + mem_torch_eval)
    #    )
    # )
    # print('      {:<14} {}'.format('Kernel matrix assembly:', ui.gen_memory_str(mem_torch_assemble)))
    # print('      {:<14} {}'.format('Model evaluation:', ui.gen_memory_str(mem_torch_eval)))


def _print_model_properties(model, title_str='Model properties'):

    print(ui.color_str(title_str, bold=True))

    print('  {:<18}'.format('Dataset'))
    print('    {:<16} \'{}\''.format('Name:', ui.unicode_str(model['dataset_name'])))
    print(
        '    {:<16} \'{}\''.format(
            'Theory level:', ui.unicode_str(model['dataset_theory'])
        )
    )

    n_atoms = len(model['z'])
    print('    {:<16} {:<d}'.format('Atoms:', n_atoms))

    ui.print_lattice(model['lattice'] if 'lattice' in model else None, inset=True)

    print('  {:<18} {:<d}'.format('Symmetries:', len(model['perms'])))

    print('  {:<18}'.format('Hyper-parameters'))
    print('    {:<16} {:<d}'.format('Length scale:', model['sig']))

    if 'lam' in model:
        print('    {:<16} {:<.0e}'.format('Regularization:', model['lam']))

    if 'solver_name' in model:
        print('  {:<18}'.format('Solver'))
        print('    {:<16} \'{}\''.format('Type:', model['solver_name']))

        if model['solver_name'] == 'cg':

            if 'solver_tol' in model:
                ui.print_two_column_str(
                    '    {:<16} {:<.0e}'.format('Tolerance:', model['solver_tol']),
                    'iterate until: norm(K*alpha - y) <= tol*norm(y) = {:<.0e}'.format(
                        model['solver_tol'] * model['norm_y_train']
                    ),
                )

                if 'solver_resid' in model:
                    is_conv = (
                        model['solver_resid']
                        <= model['solver_tol'] * model['norm_y_train']
                    )
                    print(
                        '    {:<16} {:<.0e}{}'.format(
                            'Converged to:',
                            model['solver_resid'],
                            '' if is_conv else ' (NOT CONVERGED)',
                        )
                    )

            if 'solver_iters' in model:
                print('    {:<16} {:<d}'.format('Iterations:', model['solver_iters']))

            if 'inducing_pts_idxs' in model:
                n_inducing_pts = len(model['inducing_pts_idxs']) // (3 * n_atoms)
                ui.print_two_column_str(
                    '    {:<16} {:<d}'.format('Inducing points:', n_inducing_pts),
                    'inducing columns: {:<d} (multiplied by DOF)'.format(
                        n_inducing_pts * n_atoms * 3
                    ),
                )
    else:
        print('  {:<18} {}'.format('Solver:', 'unknown'))

    n_train = len(model['idxs_train'])
    ui.print_two_column_str(
        '  {:<18} {:,} points'.format('Trained on:', n_train),
        'from \'' + ui.unicode_str(model['md5_train']) + '\'',
    )

    use_E_cstr = 'alphas_E' in model
    print(
        '    {:<16} {}'.format(
            'Energy offset:',
            '[{}] global integration constant'.format('x' if not use_E_cstr else ' '),
        )
    )
    ui.print_two_column_str(
        '                     {:<16}'.format(
            '[{}] pointwise energy constraints'.format('x' if use_E_cstr else ' ')
        ),
        'using \'--E_cstr\'',
    )

    if model['use_E']:
        e_err = model['e_err'].item()
    f_err = model['f_err'].item()

    n_valid = len(model['idxs_valid'])
    is_valid = not np.isnan(f_err['mae']) and not np.isnan(f_err['rmse'])
    ui.print_two_column_str(
        '  {:<18} {}{:,} points'.format(
            'Validated on:', '' if is_valid else '[pending] ', n_valid
        ),
        'from \'' + ui.unicode_str(model['md5_valid']) + '\'',
    )

    n_test = int(model['n_test'])
    is_test = n_test > 0
    if is_test:
        ui.print_two_column_str(
            '  {:<18} {:,} points'.format('Tested on:', n_test),
            'from \'' + ui.unicode_str(model['md5_test']) + '\'',
        )
    else:
        print('  {:<18} {}'.format('Test:', '[pending]'))

    e_unit = 'unknown unit'
    f_unit = 'unknown unit'
    if 'r_unit' in model and 'e_unit' in model:
        e_unit = model['e_unit']
        f_unit = ui.unicode_str(model['e_unit']) + '/' + ui.unicode_str(model['r_unit'])

    if is_valid:
        action_str = 'Validation' if not is_valid else 'Expected test'
        print('  {:<18}'.format('{} errors (MAE/RMSE)'.format(action_str)))
        if model['use_E']:
            print(
                '    {:<16} {:>.4f}/{:>.4f} [{}]'.format(
                    'Energy:', e_err['mae'], e_err['rmse'], e_unit
                )
            )
        print(
            '    {:<16} {:>.4f}/{:>.4f} [{}]'.format(
                'Forces:', f_err['mae'], f_err['rmse'], f_unit
            )
        )


def _print_next_step(
    prev_step, task_dir=None, model_dir=None, model_files=None, dataset_path=None
):

    if prev_step == 'create':

        assert task_dir is not None

        ui.print_step_title(
            'NEXT STEP',
            '{} train {} <valid_dataset_file>'.format(PACKAGE_NAME, task_dir),
            underscore=False,
        )

    elif prev_step == 'train' or prev_step == 'validate' or prev_step == 'resume':

        assert model_dir is not None and model_files is not None

        if dataset_path is None:
            dataset_path = '<test_dataset_file>'

        n_models = len(model_files)
        if n_models == 1:
            model_file_path = os.path.join(model_dir, model_files[0])
            ui.print_step_title(
                'NEXT STEP',
                '{} test {} {} [<n_test>]'.format(
                    PACKAGE_NAME, model_file_path, dataset_path
                ),
                underscore=False,
            )
        else:
            ui.print_step_title(
                'NEXT STEP',
                '{} select {}'.format(PACKAGE_NAME, model_dir),
                underscore=False,
            )

    elif prev_step == 'select':

        assert model_files is not None

        ui.print_step_title(
            'NEXT STEP',
            '{} test {} <test_dataset_file> [<n_test>]'.format(
                PACKAGE_NAME, model_files[0]
            ),
            underscore=False,
        )

    else:
        raise AssistantError('Unexpected previous step string.')


def all(
    dataset,
    valid_dataset,
    test_dataset,
    n_train,
    n_valid,
    n_test,
    sigs,
    gdml,
    use_E,
    use_E_cstr,
    lazy_training,
    overwrite,
    max_memory,
    max_processes,
    use_torch,
    task_dir=None,
    model_file=None,
    perms_from_arg=None,
    **kwargs
):

    print(
        '\n'
        + ui.color_str(' STEP 0 ', fore_color=ui.BLACK, back_color=ui.WHITE, bold=True)
        + ' Dataset(s)\n'
        + '-' * MAX_PRINT_WIDTH
    )

    _, dataset_extracted = dataset
    _print_dataset_properties(dataset_extracted, title_str='Properties')

    if valid_dataset is None:
        valid_dataset = dataset
    else:
        _, valid_dataset_extracted = valid_dataset
        print()
        _print_dataset_properties(
            valid_dataset_extracted, title_str='Properties (validation dataset)'
        )

        if not np.array_equal(dataset_extracted['z'], valid_dataset_extracted['z']):
            raise AssistantError(
                'Atom composition or order in validation dataset does not match the one in bulk dataset.'
            )

    if test_dataset is None:
        test_dataset = dataset
    else:
        _, test_dataset_extracted = test_dataset
        _print_dataset_properties(
            test_dataset_extracted, title_str='Properties (test dataset)'
        )

        if not np.array_equal(dataset_extracted['z'], test_dataset_extracted['z']):
            raise AssistantError(
                'Atom composition or order in test dataset does not match the one in bulk dataset.'
            )

    ui.print_step_title('STEP 1', 'Cross-validation task creation')
    task_dir = create(
        dataset,
        valid_dataset,
        n_train,
        n_valid,
        sigs,
        gdml,
        use_E,
        use_E_cstr,
        overwrite,
        task_dir,
        perms_from_arg=perms_from_arg,
        **kwargs
    )

    ui.print_step_title('STEP 2', 'Training and validation')
    task_dir_arg = io.is_dir_with_file_type(task_dir, 'task')
    model_dir_or_file_path = train(
        task_dir_arg,
        valid_dataset,
        lazy_training,
        overwrite,
        max_memory,
        max_processes,
        use_torch,
        **kwargs
    )

    model_dir_arg = io.is_dir_with_file_type(
        model_dir_or_file_path, 'model', or_file=True
    )

    _, model_file_names = model_dir_arg
    if len(model_file_names) == 0:
        raise AssistantError(
            'No trained models found!'
            + ('\nTry turning off \'--lazy\'-mode.' if lazy_training else '')
        )

    ui.print_step_title('STEP 3', 'Hyper-parameter selection')
    model_file_name = select(model_dir_arg, overwrite, model_file, **kwargs)

    # Have all tasks been trained?
    _, task_file_names = task_dir_arg
    if len(task_file_names) > len(model_file_names):
        log.warning(
            'Not all training tasks have been completed! The model selected here might not be optimal.'
            + ('\nTry turning off \'--lazy\'-mode.' if lazy_training else '')
        )

    ui.print_step_title('STEP 4', 'Testing')
    model_dir_arg = io.is_dir_with_file_type(model_file_name, 'model', or_file=True)
    test(
        model_dir_arg,
        test_dataset,
        n_test,
        overwrite=False,
        max_memory=max_memory,
        max_processes=max_processes,
        use_torch=use_torch,
        **kwargs
    )

    print(
        '\n'
        + ui.color_str('  DONE  ', fore_color=ui.BLACK, back_color=ui.GREEN, bold=True)
        + ' Training assistant finished successfully.'
    )
    print('         This is your model file: \'{}\''.format(model_file_name))


# if training job exists and is a subset of the requested cv range, add new tasks
# otherwise, if new range is different or smaller, fail
def create(  # noqa: C901
    dataset,
    valid_dataset,
    n_train,
    n_valid,
    sigs,
    gdml,
    use_E,
    use_E_cstr,
    overwrite,
    task_dir=None,
    perms_from_arg=None,
    command=None,
    **kwargs
):

    has_valid_dataset = not (valid_dataset is None or valid_dataset == dataset)

    dataset_path, dataset = dataset
    n_data = dataset['F'].shape[0]

    func_called_directly = (
        command == 'create'
    )  # has this function been called from command line or from 'all'?
    if func_called_directly:
        ui.print_step_title('TASK CREATION')
        _print_dataset_properties(dataset)
        print()

    _print_task_properties_reduced(use_sym=not gdml, use_E=use_E, use_E_cstr=use_E_cstr)
    print()

    if n_data < n_train:
        raise AssistantError(
            'Dataset only contains {} points, cannot train on {}.'.format(
                n_data, n_train
            )
        )

    if not has_valid_dataset:
        valid_dataset_path, valid_dataset = dataset_path, dataset
        if n_data - n_train < n_valid:
            raise AssistantError(
                'Dataset only contains {} points, cannot train on {} and validate on {}.'.format(
                    n_data, n_train, n_valid
                )
            )
    else:
        valid_dataset_path, valid_dataset = valid_dataset
        n_valid_data = valid_dataset['R'].shape[0]
        if n_valid_data < n_valid:
            raise AssistantError(
                'Validation dataset only contains {} points, cannot validate on {}.'.format(
                    n_valid_data, n_valid
                )
            )

    if sigs is None:
        log.info(
            'Kernel hyper-parameter sigma (length scale) was automatically set to 10, 20, ..., 90.'
        )
        sigs = list(range(10, 100, 10))  # default range

    if task_dir is None:
        task_dir = io.train_dir_name(
            dataset,
            n_train,
            use_sym=not gdml,
            use_E=use_E,
            use_E_cstr=use_E_cstr,
        )

    task_file_names = []
    if os.path.exists(task_dir):
        if overwrite:
            log.info('Overwriting existing training directory')
            shutil.rmtree(task_dir, ignore_errors=True)
            os.makedirs(task_dir)
        else:
            if io.is_task_dir_resumeable(
                task_dir, dataset, valid_dataset, n_train, n_valid, sigs, gdml
            ):
                log.info(
                    'Resuming existing hyper-parameter search in \'{}\'.'.format(
                        task_dir
                    )
                )

                # Get all task file names.
                try:
                    _, task_file_names = io.is_dir_with_file_type(task_dir, 'task')
                except Exception:
                    pass
            else:
                raise AssistantError(
                    'Unfinished hyper-parameter search found in \'{}\'.\n'.format(
                        task_dir
                    )
                    + 'Run \'%s %s -o %s %d %d -s %s\' to overwrite.'
                    % (
                        PACKAGE_NAME,
                        command,
                        dataset_path,
                        n_train,
                        n_valid,
                        ' '.join(str(s) for s in sigs),
                    )
                )
    else:
        os.makedirs(task_dir)

    if task_file_names:

        with np.load(
            os.path.join(task_dir, task_file_names[0]), allow_pickle=True
        ) as task:
            tmpl_task = dict(task)
    else:
        if not use_E:
            log.info(
                'Energy labels will be ignored for training.\n'
                + 'Note: If available in the dataset file, the energy labels will however still be used to generate stratified training, test and validation datasets. Otherwise a random sampling is used.'
            )

        if 'E' not in dataset:
            log.warning(
                'Training dataset will be sampled with no guidance from energy labels (i.e. randomly)!'
            )

        if 'E' not in valid_dataset:
            log.warning(
                'Validation dataset will be sampled with no guidance from energy labels (i.e. randomly)!\n'
                + 'Note: Larger validation datasets are recommended due to slower convergence of the error.'
            )

        if ('lattice' in dataset) ^ ('lattice' in valid_dataset):
            log.error('One of the datasets specifies lattice vectors and one does not!')
            # TODO: stop program?

        if 'lattice' in dataset or 'lattice' in valid_dataset:
            log.info(
                'Lattice vectors found in dataset: applying periodic boundary conditions.'
            )

        perms = None
        if perms_from_arg is not None:

            _, perms_from = perms_from_arg
            if 'perms' in perms_from:
                perms = perms_from['perms']
            else:
                raise AssistantError(
                    'Provided permutation file does not contain any permutations (looking for the \'perms\' key).'
                )

        # No process-count or memory restrictions are necessary here.
        gdml_train = GDMLTrain()
        try:
            tmpl_task = gdml_train.create_task(
                dataset,
                n_train,
                valid_dataset,
                n_valid,
                sig=1,
                perms=perms,
                use_sym=not gdml,
                use_E=use_E,
                use_E_cstr=use_E_cstr,
                callback=ui.callback,
            )  # template task
        except:
            print()
            log.critical(traceback.format_exc())
            print()
            os._exit(1)

    n_written = 0
    for sig in sigs:
        tmpl_task['sig'] = sig
        task_file_name = io.task_file_name(tmpl_task)
        task_path = os.path.join(task_dir, task_file_name)

        if os.path.isfile(task_path):
            log.info('Skipping existing task \'{}\'.'.format(task_file_name))
        else:
            np.savez_compressed(task_path, **tmpl_task)
            n_written += 1
    if n_written > 0:
        log.done(
            'Writing {:d}/{:d} task(s) with m={} training points each'.format(
                n_written, len(sigs), tmpl_task['R_train'].shape[0]
            )
        )

    if func_called_directly:
        _print_next_step('create', task_dir=task_dir)

    return task_dir


def train(
    task_dir,
    valid_dataset,
    lazy_training,
    overwrite,
    max_memory,
    max_processes,
    use_torch,
    command=None,
    **kwargs
):

    task_dir, task_file_names = task_dir
    n_tasks = len(task_file_names)

    func_called_directly = (
        command == 'train'
    )  # Has this function been called from command line or from 'all'?
    if func_called_directly:
        ui.print_step_title('MODEL TRAINING')

    def save_progr_callback(
        unconv_model, unconv_model_path=None
    ):  # Saves current (unconverged) model during iterative training

        if unconv_model_path is None:
            log.critical(
                'Path for unconverged model not set in \'save_progr_callback\'.'
            )
            print()
            os._exit(1)

        np.savez_compressed(unconv_model_path, **unconv_model)

    try:
        gdml_train = GDMLTrain(
            max_memory=max_memory, max_processes=max_processes, use_torch=use_torch
        )
    except:
        print()
        log.critical(traceback.format_exc())
        print()
        os._exit(1)

    prev_valid_err = -1
    has_converged_once = False

    for i, task_file_name in enumerate(task_file_names):

        task_file_path = os.path.join(task_dir, task_file_name)
        with np.load(task_file_path, allow_pickle=True) as task:

            if n_tasks > 1:
                if i > 0:
                    print()

                n_train = len(task['idxs_train'])
                n_valid = len(task['idxs_valid'])
                ui.print_two_column_str(
                    ui.color_str('Task {:d} of {:d}'.format(i + 1, n_tasks), bold=True),
                    '{:,} + {:,} points (training + validation), sigma (length scale): {}'.format(
                        n_train, n_valid, task['sig']
                    ),
                )

            model_file_name = io.model_file_name(task, is_extended=False)
            model_file_path = os.path.join(task_dir, model_file_name)

            if not overwrite and os.path.isfile(
                model_file_path
            ):  # Trained model found; validate if necessary.
                log.info(
                    'Model \'{}\' already exists.'.format(model_file_name)
                    + (
                        '\nRun \'{} train -o {}\' to overwrite.'.format(
                            PACKAGE_NAME, task_file_path
                        )
                        if func_called_directly
                        else ''
                    )
                )

                model_path = os.path.join(task_dir, model_file_name)
                _, model = io.is_file_type(model_path, 'model')

                e_err = {'mae': 0.0, 'rmse': 0.0}
                if model['use_E']:
                    e_err = model['e_err'].item()
                f_err = model['f_err'].item()

                is_conv = True
                if 'solver_resid' in model:
                    is_conv = (
                        model['solver_resid']
                        <= model['solver_tol'] * model['norm_y_train']
                    )

                is_model_validated = not (
                    np.isnan(f_err['mae']) or np.isnan(f_err['rmse'])
                )
                if is_model_validated:

                    disp_str = (
                        'energy %.3f/%.3f, ' % (e_err['mae'], e_err['rmse'])
                        if model['use_E']
                        else ''
                    )
                    disp_str += 'forces %.3f/%.3f' % (f_err['mae'], f_err['rmse'])
                    disp_str = 'Validation errors (MAE/RMSE): ' + disp_str
                    ui.callback(1, 1, disp_str=disp_str)

                    valid_errs = [f_err['rmse']]

            else:  # Train and validate model

                # Check if training this task has been attempted before.
                if lazy_training and n_tasks > 1:
                    if 'tried_training' in task and task['tried_training']:
                        log.warning(
                            'Skipping task, because it has been tried before (without success).'
                        )
                        continue

                # Record in task file that there was a training attempt.
                task = dict(task)
                task['tried_training'] = True
                np.savez_compressed(task_file_path, **task)

                n_train, n_atoms = task['R_train'].shape[:2]

                unconv_model_file = '_unconv_{}'.format(model_file_name)
                unconv_model_path = os.path.join(task_dir, unconv_model_file)

                try:
                    model = gdml_train.train(
                        task,
                        partial(
                            save_progr_callback, unconv_model_path=unconv_model_path
                        ),
                        ui.callback,
                    )
                except:
                    print()
                    log.critical(traceback.format_exc())
                    print()
                    os._exit(1)
                else:
                    if func_called_directly:
                        log.done('Writing model to file \'{}\''.format(model_file_path))
                    np.savez_compressed(model_file_path, **model)

                    # Delete temporary model, if one exists.
                    unconv_model_exists = os.path.isfile(unconv_model_path)
                    if unconv_model_exists:
                        os.remove(unconv_model_path)

                is_model_validated = False

            if not is_model_validated:

                if (
                    n_tasks == 1
                ):  # Only validate if there is more than one training task.
                    log.info(
                        'Skipping validation step as there is only one model to validate.'
                    )
                    break

                # Validate model.
                model_dir = (task_dir, [model_file_name])
                valid_errs = test(
                    model_dir,
                    valid_dataset,
                    -1,  # n_test = -1 -> validation mode
                    overwrite,
                    max_memory,
                    max_processes,
                    use_torch,
                    command,
                    **kwargs
                )

                is_conv = True
                if 'solver_resid' in model:
                    is_conv = (
                        model['solver_resid']
                        <= model['solver_tol'] * model['norm_y_train']
                    )

            has_converged_once = has_converged_once or is_conv
            if (
                has_converged_once
                and prev_valid_err != -1
                and prev_valid_err < valid_errs[0]
            ):
                print()
                log.info(
                    'Skipping remaining training tasks, as validation error is rising again.'
                )
                break

            prev_valid_err = valid_errs[0]

    model_dir_or_file_path = model_file_path if n_tasks == 1 else task_dir
    if func_called_directly:

        model_dir_arg = io.is_dir_with_file_type(
            model_dir_or_file_path, 'model', or_file=True
        )
        model_dir, model_files = model_dir_arg
        _print_next_step('train', model_dir=model_dir, model_files=model_files)

    return model_dir_or_file_path  # model directory or file


def _batch(iterable, n=1):
    """Yield consecutive slices of `iterable` containing at most `n` items each."""
    n_items = len(iterable)
    for ndx in range(0, n_items, n):
        yield iterable[ndx : min(ndx + n, n_items)]


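def _online_err(err, size, n, mae_n_sum, rmse_n_sum):
    """Update running MAE/RMSE estimates with one batch of errors.

    `err` holds all error components of the current batch, `size` is the
    number of components per sample (e.g. 3 * n_atoms for force components)
    and `n` is the total number of samples processed so far, including this
    batch. `mae_n_sum` and `rmse_n_sum` accumulate the per-sample mean
    absolute and mean squared errors, respectively.

    Returns the updated tuple (mae, mae_n_sum, rmse, rmse_n_sum).
    """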

    err = np.abs(err)

    mae_n_sum += np.sum(err) / size
    mae = mae_n_sum / n

    rmse_n_sum += np.sum(err**2) / size
    rmse = np.sqrt(rmse_n_sum / n)

    return mae, mae_n_sum, rmse, rmse_n_sum


def resume(
    model,
    dataset,
    valid_dataset,
    overwrite,
    max_memory,
    max_processes,
    use_torch,
    command=None,
    **kwargs
):

    model_path, model = model
    dataset_path, dataset = dataset

    valid_dataset_arg = valid_dataset
    valid_dataset_path, valid_dataset = valid_dataset

    ui.print_step_title('RESUME TRAINING')
    _print_model_properties(model, title_str='Model properties (initial)')
    print()

    if dataset['md5'] != model['md5_train']:
        raise AssistantError(
            'Fingerprint of provided training dataset does not match the one specified in model file.'
        )
    if valid_dataset['md5'] != model['md5_valid']:
        raise AssistantError(
            'Fingerprint of provided validation dataset does not match the one specified in model file.'
        )

    if model['solver_name'] == 'analytic':
        raise AssistantError(
            'This model was trained using a matrix decomposition method and has thus already converged to the highest possible accuracy! Resuming training does not make sense in this case.'
        )
    elif 'solver_resid' in model and 'solver_tol' in model:
        if model['solver_resid'] > model['solver_tol'] * model['norm_y_train']:

            gdml_train = GDMLTrain(
                max_memory=max_memory, max_processes=max_processes, use_torch=use_torch
            )
            try:
                task = gdml_train.create_task_from_model(
                    model,
                    dataset,
                )
            except:
                print()
                log.critical(traceback.format_exc())
                print()
                os._exit(1)
            del gdml_train

            def save_progr_callback(
                unconv_model,
            ):  # saves current (unconverged) model during iterative training
                np.savez_compressed(model_path, **unconv_model)

            try:
                gdml_train = GDMLTrain(
                    max_memory=max_memory,
                    max_processes=max_processes,
                    use_torch=use_torch,
                )
            except:
                print()
                log.critical(traceback.format_exc())
                print()
                os._exit(1)

            try:
                model = gdml_train.train(
                    task, save_progr_callback=save_progr_callback, callback=ui.callback
                )
            except:
                print()
                log.critical(traceback.format_exc())
                print()
                os._exit(1)
            else:
                log.done('Model parameters have been updated.')
                np.savez_compressed(model_path, **model)

        else:
            log.warning('Model is already converged to the specified tolerance.')

    # Validate model.
    model_dir, model_file_name = os.path.split(model_path)
    model_dir_arg = (model_dir, [model_file_name])

    valid_errs = test(
        model_dir_arg,
        valid_dataset_arg,
        -1,  # n_test = -1 -> validation mode
        overwrite,
        max_memory,
        max_processes,
        use_torch,
        command,
        **kwargs
    )

    _print_next_step('resume', model_dir=model_dir, model_files=[model_file_name])


def validate(
    model_dir,
    valid_dataset,
    overwrite,
    max_memory,
    max_processes,
    use_torch,
    command=None,
    **kwargs
):

    dataset_path_extracted, dataset_extracted = valid_dataset

    func_called_directly = (
        command == 'validate'
    )  # has this function been called from command line or from 'all'?
    if func_called_directly:
        ui.print_step_title('MODEL VALIDATION')
        _print_dataset_properties(dataset_extracted)

    test(
        model_dir,
        valid_dataset,
        -1,  # n_test = -1 -> validation mode
        overwrite,
        max_memory,
        max_processes,
        use_torch,
        command,
        **kwargs
    )

    if func_called_directly:

        model_dir, model_files = model_dir
        n_models = len(model_files)
        _print_next_step('validate', model_dir=model_dir, model_files=model_files)


def test(
    model_dir,
    test_dataset,
    n_test,
    overwrite,
    max_memory,
    max_processes,
    use_torch,
    command=None,
    **kwargs
):  # noqa: C901

    # NOTE: This function runs a validation if n_test < 0, and a test on all unused points if n_test == 0.

    model_dir, model_file_names = model_dir
    n_models = len(model_file_names)

    n_test = 0 if n_test is None else n_test
    is_validation = n_test < 0
    is_test = n_test >= 0

    dataset_path, dataset = test_dataset

    func_called_directly = (
        command == 'test'
    )  # has this function been called from command line or from 'all'?
    if func_called_directly:
        ui.print_step_title('MODEL TEST')
        _print_dataset_properties(dataset)

    F_rmse = []

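    # Debugging aid: set to True to dump test results to extended XYZ files
    # in the current working directory.
    DEBUG_WRITE = False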

    if DEBUG_WRITE:
        if os.path.exists('test_pred.xyz'):
            os.remove('test_pred.xyz')
        if os.path.exists('test_ref.xyz'):
            os.remove('test_ref.xyz')
        if os.path.exists('test_diff.xyz'):
            os.remove('test_diff.xyz')

    num_workers, batch_size = -1, -1
    gdml_train = None
    for i, model_file_name in enumerate(model_file_names):

        model_path = os.path.join(model_dir, model_file_name)
        _, model = io.is_file_type(model_path, 'model')

        if i == 0 and command != 'all':
            print()
            _print_model_properties(model)
            print()

        if not np.array_equal(model['z'], dataset['z']):
            raise AssistantError(
                'Atom composition or order in dataset does not match the one in model.'
            )

        if ('lattice' in model) != ('lattice' in dataset):
            if 'lattice' in model:
                raise AssistantError(
                    'Model contains lattice vectors, but dataset does not.'
                )
            elif 'lattice' in dataset:
                raise AssistantError(
                    'Dataset contains lattice vectors, but model does not.'
                )

        if model['use_E']:
            e_err = model['e_err'].item()
        f_err = model['f_err'].item()

        is_model_validated = not (np.isnan(f_err['mae']) or np.isnan(f_err['rmse']))

        if n_models > 1:
            if i > 0:
                print()
            print(
                ui.color_str(
                    '%s model %d of %d'
                    % ('Testing' if is_test else 'Validating', i + 1, n_models),
                    bold=True,
                )
            )

        if is_validation:
            if is_model_validated and not overwrite:
                log.info(
                    'Skipping already validated model \'{}\'.'.format(model_file_name)
                    + (
                        '\nRun \'{} validate -o {} {}\' to overwrite.'.format(
                            PACKAGE_NAME, model_path, dataset_path
                        )
                        if command == 'validate'
                        else ''
                    )
                )
                continue

            if dataset['md5'] != model['md5_valid']:
                raise AssistantError(
                    'Fingerprint of provided validation dataset does not match the one specified in model file.'
                )

        test_idxs = model['idxs_valid']
        if is_test:

            # Exclude training and/or validation points from the test set if necessary.
            excl_idxs = np.empty((0,), dtype=np.uint)
            if dataset['md5'] == model['md5_train']:
                excl_idxs = np.concatenate([excl_idxs, model['idxs_train']]).astype(
                    np.uint
                )
            if dataset['md5'] == model['md5_valid']:
                excl_idxs = np.concatenate([excl_idxs, model['idxs_valid']]).astype(
                    np.uint
                )

            n_data = dataset['F'].shape[0]
            n_data_eff = n_data - len(excl_idxs)

            if (
                n_test == 0 and n_data_eff != 0
            ):  # Test on all data points that have not been used for training or validation.
                n_test = n_data_eff
                log.info(
                    'Test set size was automatically set to {:,} points.'.format(n_test)
                )

            if n_test == 0 or n_data_eff == 0:
                log.warning('Skipping! No unused points left for testing in the provided dataset.')
                return
            elif n_data_eff < n_test:
                n_test = n_data_eff
                log.warning(
                    'Test size reduced to {:d}. Not enough unused points in the provided dataset.'.format(
                        n_test
                    )
                )

            if 'E' in dataset:
                if gdml_train is None:
                    gdml_train = GDMLTrain(
                        max_memory=max_memory, max_processes=max_processes
                    )
                test_idxs = gdml_train.draw_strat_sample(
                    dataset['E'], n_test, excl_idxs=excl_idxs
                )
            else:
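                # Draw n_test points uniformly at random from the unused ones.
                test_idxs = np.random.choice(
                    np.delete(np.arange(n_data), excl_idxs), n_test, replace=False
                )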

                log.warning(
                    'Test dataset will be sampled with no guidance from energy labels (randomly)!\n'
                    + 'Note: Larger test datasets are recommended due to slower convergence of the error.'
                )
        # shuffle to improve convergence of online error
        np.random.shuffle(test_idxs)

        # In debug mode, keep the original dataset order (undo the shuffle).
        if DEBUG_WRITE:
            test_idxs = np.sort(test_idxs)

        z = dataset['z']
        R = dataset['R'][test_idxs, :, :]
        F = dataset['F'][test_idxs, :, :]

        if model['use_E']:
            E = dataset['E'][test_idxs]

        try:
            gdml_predict = GDMLPredict(
                model,
                max_memory=max_memory,
                max_processes=max_processes,
                use_torch=use_torch,
            )
        except:
            print()
            log.critical(traceback.format_exc())
            print()
            os._exit(1)

        b_size = min(1000, len(test_idxs))

        if not use_torch:
            if num_workers == -1 or batch_size == -1:
                ui.callback(NOT_DONE, disp_str='Optimizing parallelism')

                gps, is_from_cache = gdml_predict.prepare_parallel(
                    n_bulk=b_size, return_is_from_cache=True
                )
                num_workers, chunk_size, bulk_mp = (
                    gdml_predict.num_workers,
                    gdml_predict.chunk_size,
                    gdml_predict.bulk_mp,
                )

                sec_disp_str = 'no chunking'
                if chunk_size != gdml_predict.n_train:
                    sec_disp_str = 'chunks of {:d}'.format(chunk_size)

                if num_workers == 0:
                    sec_disp_str = 'no workers / ' + sec_disp_str
                else:
                    sec_disp_str = (
                        '{:d} workers {}/ '.format(
                            num_workers, '[MP] ' if bulk_mp else ''
                        )
                        + sec_disp_str
                    )

                ui.callback(
                    DONE,
                    disp_str='Optimizing parallelism'
                    + (' (from cache)' if is_from_cache else ''),
                    sec_disp_str=sec_disp_str,
                )
            else:
                gdml_predict._set_num_workers(num_workers)
                gdml_predict._set_chunk_size(chunk_size)
                gdml_predict._set_bulk_mp(bulk_mp)

        n_atoms = z.shape[0]

        if model['use_E']:
            e_mae_sum, e_rmse_sum = 0, 0
        f_mae_sum, f_rmse_sum = 0, 0
        cos_mae_sum, cos_rmse_sum = 0, 0
        mag_mae_sum, mag_rmse_sum = 0, 0

        n_done = 0
        t = time.time()
        for b_range in _batch(list(range(len(test_idxs))), b_size):

            n_done_step = len(b_range)
            n_done += n_done_step

            r = R[b_range].reshape(n_done_step, -1)
            e_pred, f_pred = gdml_predict.predict(r)

            # energy error
            if model['use_E']:
                e = E[b_range]
                e_mae, e_mae_sum, e_rmse, e_rmse_sum = _online_err(
                    np.squeeze(e) - e_pred, 1, n_done, e_mae_sum, e_rmse_sum
                )

            # force component error
            f = F[b_range].reshape(n_done_step, -1)
            f_mae, f_mae_sum, f_rmse, f_rmse_sum = _online_err(
                f - f_pred, 3 * n_atoms, n_done, f_mae_sum, f_rmse_sum
            )

            # magnitude error
            f_pred_mags = np.linalg.norm(f_pred.reshape(-1, 3), axis=1)
            f_mags = np.linalg.norm(f.reshape(-1, 3), axis=1)
            mag_mae, mag_mae_sum, mag_rmse, mag_rmse_sum = _online_err(
                f_pred_mags - f_mags, n_atoms, n_done, mag_mae_sum, mag_rmse_sum
            )

            # normalized cosine error
            f_pred_norm = f_pred.reshape(-1, 3) / f_pred_mags[:, None]
            f_norm = f.reshape(-1, 3) / f_mags[:, None]
            cos_err = (
                np.arccos(np.clip(np.einsum('ij,ij->i', f_pred_norm, f_norm), -1, 1))
                / np.pi
            )
            cos_mae, cos_mae_sum, cos_rmse, cos_rmse_sum = _online_err(
                cos_err, n_atoms, n_done, cos_mae_sum, cos_rmse_sum
            )

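            # Optional debug output (enabled via DEBUG_WRITE): append predicted,
            # reference and residual (reference - prediction) geometries to
            # extended XYZ files in the working directory.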

            if is_test and DEBUG_WRITE:

                try:
                    with open('test_pred.xyz', 'a') as file:

                        n = r.shape[0]
                        for i, ri in enumerate(r):

                            r_out = ri.reshape(-1, 3)
                            e_out = e_pred[i]
                            f_out = f_pred[i].reshape(-1, 3)

                            ext_xyz_str = (
                                io.generate_xyz_str(r_out, model['z'], e=e_out, f=f_out)
                                + '\n'
                            )

                            file.write(ext_xyz_str)

                except IOError:
                    sys.exit("ERROR: Writing xyz file failed.")

                try:
                    with open('test_ref.xyz', 'a') as file:

                        n = r.shape[0]
                        for i, ri in enumerate(r):

                            r_out = ri.reshape(-1, 3)
                            e_out = (
                                None
                                if not model['use_E']
                                else np.squeeze(E[b_range][i])
                            )
                            f_out = f[i].reshape(-1, 3)

                            ext_xyz_str = (
                                io.generate_xyz_str(r_out, model['z'], e=e_out, f=f_out)
                                + '\n'
                            )
                            file.write(ext_xyz_str)

                except IOError:
                    sys.exit("ERROR: Writing xyz file failed.")

                try:
                    with open('test_diff.xyz', 'a') as file:

                        n = r.shape[0]
                        for i, ri in enumerate(r):

                            r_out = ri.reshape(-1, 3)
                            e_out = (
                                None
                                if not model['use_E']
                                else (np.squeeze(E[b_range][i]) - e_pred[i])
                            )
                            f_out = (f[i] - f_pred[i]).reshape(-1, 3)

                            ext_xyz_str = (
                                io.generate_xyz_str(r_out, model['z'], e=e_out, f=f_out)
                                + '\n'
                            )
                            file.write(ext_xyz_str)

                except IOError:
                    sys.exit("ERROR: Writing xyz file failed.")

            sps = n_done / (time.time() - t)  # examples per second
            disp_str = 'energy %.3f/%.3f, ' % (e_mae, e_rmse) if model['use_E'] else ''
            disp_str += 'forces %.3f/%.3f' % (f_mae, f_rmse)
            disp_str = (
                '{} errors (MAE/RMSE): '.format('Test' if is_test else 'Validation')
                + disp_str
            )
            sec_disp_str = '@ %.1f geo/s' % sps if b_range is not None else ''

            ui.callback(
                n_done,
                len(test_idxs),
                disp_str=disp_str,
                sec_disp_str=sec_disp_str,
                newline_when_done=False,
            )

        if is_test:
            ui.callback(
                DONE,
                disp_str='Testing on {:,} points'.format(n_test),
                sec_disp_str=sec_disp_str,
            )
        else:
            ui.callback(DONE, disp_str=disp_str, sec_disp_str=sec_disp_str)

        if model['use_E']:
            e_rmse_pct = (e_rmse / e_err['rmse'] - 1.0) * 100
        f_rmse_pct = (f_rmse / f_err['rmse'] - 1.0) * 100

        if is_test and n_models == 1:
            n_train = len(model['idxs_train'])
            n_valid = len(model['idxs_valid'])
            print()
            ui.print_two_column_str(
                ui.color_str('Test errors (MAE/RMSE)', bold=True),
                '{:,} + {:,} points (training + validation), sigma (length scale): {}'.format(
                    n_train, n_valid, model['sig']
                ),
            )

            r_unit = 'unknown unit'
            e_unit = 'unknown unit'
            f_unit = 'unknown unit'
            if 'r_unit' in dataset and 'e_unit' in dataset:
                r_unit = dataset['r_unit']
                e_unit = dataset['e_unit']
                f_unit = str(dataset['e_unit']) + '/' + str(dataset['r_unit'])

            format_str = '  {:<18} {:>.4f}/{:>.4f} [{}]'
            if model['use_E']:
                ui.print_two_column_str(
                    format_str.format('Energy:', e_mae, e_rmse, e_unit),
                    'relative to expected: {:+.1f}%'.format(e_rmse_pct),
                )

            ui.print_two_column_str(
                format_str.format('Forces:', f_mae, f_rmse, f_unit),
                'relative to expected: {:+.1f}%'.format(f_rmse_pct),
            )

            print(format_str.format('  Magnitude:', mag_mae, mag_rmse, f_unit))
            ui.print_two_column_str(
                format_str.format('  Angle:', cos_mae, cos_rmse, '0-1'),
                'lower is better',
            )
            print()

        model_mutable = dict(model)
        model.close()
        model = model_mutable

        model_needs_update = (
            overwrite
            or (is_test and model['n_test'] < len(test_idxs))
            or (is_validation and not is_model_validated)
        )
        if model_needs_update:

            if is_validation and overwrite:
                model['n_test'] = 0  # flag the model as not tested

            if is_test:
                model['n_test'] = len(test_idxs)
                model['md5_test'] = dataset['md5']

            if model['use_E']:
                model['e_err'] = {
                    'mae': e_mae.item(),
                    'rmse': e_rmse.item(),
                }

            model['f_err'] = {'mae': f_mae.item(), 'rmse': f_rmse.item()}
            np.savez_compressed(model_path, **model)

            if is_test and model['n_test'] > 0:
                log.info('Expected errors were updated in model file.')

        else:
            add_info_str = (
                'the same number of'
                if model['n_test'] == len(test_idxs)
                else 'only {:,}'.format(len(test_idxs))
            )
            log.warning(
                'This model has previously been tested on {:,} points, which is why the errors for the current test run with {} points have NOT been used to update the model file.\n'.format(
                    model['n_test'], add_info_str
                )
                + 'Run \'{} test -o {} {} {}\' to overwrite.'.format(
                    PACKAGE_NAME, os.path.relpath(model_path), dataset_path, n_test
                )
            )

        F_rmse.append(f_rmse)

    return F_rmse
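

# Illustrative sketch (hypothetical helpers, not used by the CLI): how the
# batched error statistics reported by test() can be accumulated. The real
# `_online_err` is defined elsewhere in this module; this version only assumes
# the call signature used above, i.e. it keeps running sums of the per-sample
# mean absolute and mean squared errors and divides by the number of samples
# seen so far (`n`), so the result equals the MAE/RMSE over all batches.
import numpy as np


def _example_online_err(err, size, n, mae_n_sum, rmse_n_sum):
    err = np.abs(np.asarray(err, dtype=float))
    # Each sample contributes the mean over its `size` components
    # (1 for energies, 3 * n_atoms for force components).
    mae_n_sum += np.sum(err) / size
    rmse_n_sum += np.sum(err**2) / size
    mae = mae_n_sum / n
    rmse = np.sqrt(rmse_n_sum / n)
    return mae, mae_n_sum, rmse, rmse_n_sum


def _example_angle_err(f_pred, f_ref):
    # Per-atom angle between predicted and reference force vectors, mapped
    # from [0, pi] to [0, 1] (0: parallel, 1: antiparallel), as in test().
    # Zero-length force vectors yield NaN, just like the code above.
    f_pred = np.asarray(f_pred, dtype=float).reshape(-1, 3)
    f_ref = np.asarray(f_ref, dtype=float).reshape(-1, 3)
    u = f_pred / np.linalg.norm(f_pred, axis=1)[:, None]
    v = f_ref / np.linalg.norm(f_ref, axis=1)[:, None]
    cos = np.clip(np.einsum('ij,ij->i', u, v), -1.0, 1.0)
    return np.arccos(cos) / np.pi
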


def select(model_dir, overwrite, model_file=None, command=None, **kwargs):  # noqa: C901

    func_called_directly = (
        command == 'select'
    )  # has this function been called from command line or from 'all'?
    if func_called_directly:
        ui.print_step_title('MODEL SELECTION')

    any_model_not_validated = False
    any_model_is_tested = False

    model_dir, model_file_names = model_dir
    if len(model_file_names) > 1:

        use_E = True

        rows = []
        data_names = ['sig', 'MAE', 'RMSE', 'MAE', 'RMSE']
        for i, model_file_name in enumerate(model_file_names):
            model_path = os.path.join(model_dir, model_file_name)
            _, model = io.is_file_type(model_path, 'model')

            use_E = model['use_E']

            if i == 0:
                idxs_train = set(model['idxs_train'])
                md5_train = model['md5_train']
                idxs_valid = set(model['idxs_valid'])
                md5_valid = model['md5_valid']
            else:
                if (
                    md5_train != model['md5_train']
                    or md5_valid != model['md5_valid']
                    or idxs_train != set(model['idxs_train'])
                    or idxs_valid != set(model['idxs_valid'])
                ):
                    raise AssistantError(
                        '{} contains models trained or validated on different datasets.'.format(
                            model_dir
                        )
                    )

            e_err = {'mae': 0.0, 'rmse': 0.0}
            if model['use_E']:
                e_err = model['e_err'].item()
            f_err = model['f_err'].item()

            is_model_validated = not (np.isnan(f_err['mae']) or np.isnan(f_err['rmse']))
            if not is_model_validated:
                any_model_not_validated = True

            is_model_tested = model['n_test'] > 0
            if is_model_tested:
                any_model_is_tested = True

            rows.append(
                [model['sig'], e_err['mae'], e_err['rmse'], f_err['mae'], f_err['rmse']]
            )

            model.close()

        if any_model_not_validated:
            log.warning(
                'One or more models in the given directory have not been validated.'
            )
            print()

        if any_model_is_tested:
            log.error(
                'One or more models in the given directory have already been tested. This means that their recorded expected errors are test errors, not validation errors. However, one should never perform model selection based on the test error!\n'
                + 'Please run the validation command (again) with the overwrite option \'-o\', then this selection command.'
            )
            return

        f_rmse_col = [row[4] for row in rows]
        best_idx = f_rmse_col.index(min(f_rmse_col))  # idx of row with lowest f_rmse
        best_sig = rows[best_idx][0]

        rows = sorted(rows, key=lambda col: col[0])  # sort according to sigma
        print(ui.color_str('Cross-validation errors', bold=True))
        print(' ' * 7 + 'Energy' + ' ' * 6 + 'Forces')
        print((' {:>3} ' + '{:>5} ' * 4).format(*data_names))
        print(' ' + '-' * 27)
        format_str = ' {:>3} ' + '{:5.2f} ' * 4
        format_str_no_E = ' {:>3}     -     - ' + '{:5.2f} ' * 2
        for row in rows:
            if use_E:
                row_str = format_str.format(*row)
            else:
                row_str = format_str_no_E.format(*[row[0], row[3], row[4]])

            if row[0] != best_sig:
                row_str = ui.color_str(row_str, fore_color=ui.GRAY)
            print(row_str)
        print()

        sig_col = [row[0] for row in rows]
        if best_sig == min(sig_col) or best_sig == max(sig_col):
            log.warning(
                'The optimal sigma (length scale) lies on the boundary of the search grid.\n'
                + 'Model performance might improve if the search grid is extended in direction sigma {} {:d}.'.format(
                    '<' if best_sig == min(sig_col) else '>', best_sig
                )
            )

    else:  # only one model available
        log.info('Skipping model selection step as there is only one model to select.')

        best_idx = 0

    best_model_path = os.path.join(model_dir, model_file_names[best_idx])

    if model_file is None:

        # generate model file name based on model properties
        best_model = np.load(best_model_path, allow_pickle=True)
        model_file = io.model_file_name(best_model, is_extended=True)
        best_model.close()

    model_exists = os.path.isfile(model_file)
    if model_exists and overwrite:
        log.info('Overwriting existing model file.')

    if not model_exists or overwrite:
        if func_called_directly:
            log.done('Writing model file \'{}\''.format(model_file))

        shutil.copy(best_model_path, model_file)
        shutil.rmtree(model_dir, ignore_errors=True)
    else:
        log.warning(
            'Model \'{}\' already exists.\n'.format(model_file)
            + 'Run \'{} select -o {}\' to overwrite.'.format(
                PACKAGE_NAME, os.path.relpath(model_dir)
            )
        )

    if func_called_directly:
        _print_next_step('select', model_files=[model_file])

    return model_file
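

# Illustrative sketch (hypothetical helper, not used by the CLI): the selection
# rule applied by select(). The model with the lowest validation force RMSE
# wins; if its sigma lies on the edge of the search grid, the grid should be
# extended towards smaller values ('<', best sigma is the minimum) or larger
# values ('>', best sigma is the maximum). The direction is derived from the
# sigma values themselves, so it does not depend on the order of the models.


def _example_select_sigma(sigs, f_rmses):
    # Returns (best_sig, direction), where direction is '<', '>' or None.
    # Ties are resolved in favor of the first model, like list.index(min(...)).
    best_idx = min(range(len(f_rmses)), key=lambda i: f_rmses[i])
    best_sig = sigs[best_idx]
    if len(sigs) > 1 and best_sig == min(sigs):
        return best_sig, '<'
    if len(sigs) > 1 and best_sig == max(sigs):
        return best_sig, '>'
    return best_sig, None
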


def show(file, command=None, **kwargs):

    ui.print_step_title('SHOW DETAILS')
    file_path, file = file

    if file['type'].astype(str) == 'd':
        _print_dataset_properties(file)

    if file['type'].astype(str) == 't':
        _print_task_properties(file)

    if file['type'].astype(str) == 'm':
        _print_model_properties(file)


def reset(command=None, **kwargs):

    if ui.yes_or_no('\nDo you really want to purge all caches and temporary files?'):

        pkg_dir = os.path.dirname(os.path.abspath(__file__))
        bmark_file = '_bmark_cache.npz'
        bmark_path = os.path.join(pkg_dir, bmark_file)

        if os.path.exists(bmark_path):
            try:
                os.remove(bmark_path)
            except OSError:
                print()
                log.critical('Exception: unable to delete benchmark cache.')
                print()
                os._exit(1)

            log.done('Benchmark cache deleted.')
        else:
            log.info('Benchmark cache was already empty.')
    else:
        print(' Cancelled.')


def main():
    def _add_argument_sample_size(parser, subset_str):
        parser.add_argument(
            'n_%s' % subset_str,
            metavar='<n_%s>' % subset_str,
            type=io.is_strict_pos_int,
            help='%s sample size' % subset_str,
        )

    def _add_argument_dir_with_file_type(parser, type, or_file=False):
        parser.add_argument(
            '%s_dir' % type,
            metavar='<%s_dir%s>' % (type, '_or_file' if or_file else ''),
            type=lambda x: io.is_dir_with_file_type(x, type, or_file=or_file),
            help='path to %s directory%s' % (type, ' or file' if or_file else ''),
        )

    # Available resources
    total_memory = psutil.virtual_memory().total // 2**30
    total_cpus = mp.cpu_count()

    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--version',
        action='version',
        version='%(prog)s '
        + __version__
        + ' [Python {}, NumPy {}, SciPy {}'.format(
            '.'.join(map(str, sys.version_info[:3])), np.__version__, sp.__version__
        )
        + ', PyTorch {}'.format(torch.__version__ if _has_torch else 'N/A')
        + ', ASE {}'.format(ase.__version__ if _has_ase else 'N/A')
        + ']',
    )

    parent_parser = argparse.ArgumentParser(add_help=False)

    subparsers = parser.add_subparsers(title='commands', dest='command')
    subparsers.required = True
    parser_all = subparsers.add_parser(
        'all',
        help='reconstruct a force field from beginning to end',
        parents=[parent_parser],
    )
    parser_create = subparsers.add_parser(
        'create', help='create training task(s)', parents=[parent_parser]
    )
    parser_train = subparsers.add_parser(
        'train', help='train model(s) from task(s)', parents=[parent_parser]
    )
    parser_resume = subparsers.add_parser(
        'resume', help='resume training of a model', parents=[parent_parser]
    )
    parser_valid = subparsers.add_parser(
        'validate', help='validate model(s)', parents=[parent_parser]
    )
    parser_select = subparsers.add_parser(
        'select', help='select best performing model', parents=[parent_parser]
    )
    parser_test = subparsers.add_parser(
        'test', help='test a model', parents=[parent_parser]
    )
    parser_show = subparsers.add_parser(
        'show',
        help='print details about a dataset, task or model file',
        parents=[parent_parser],
    )
    subparsers.add_parser(
        'reset', help='delete all caches and temporary files', parents=[parent_parser]
    )

    for subparser in [parser_all, parser_create]:

        subparser.add_argument(
            'dataset',
            metavar='<dataset_file>',
            type=lambda x: io.is_file_type(x, 'dataset'),
            help='path to dataset file (train/validation/test subsets are sampled from here if no separate datasets are specified)',
        )

        _add_argument_sample_size(subparser, 'train')
        _add_argument_sample_size(subparser, 'valid')
        subparser.add_argument(
            '-v',
            '--validation_dataset',
            metavar='<valid_dataset_file>',
            dest='valid_dataset',
            type=lambda x: io.is_file_type(x, 'dataset'),
            help='path to separate validation dataset file',
        )
        subparser.add_argument(
            '-t',
            '--test_dataset',
            metavar='<test_dataset_file>',
            dest='test_dataset',
            type=lambda x: io.is_file_type(x, 'dataset'),
            help='path to separate test dataset file',
        )
        subparser.add_argument(
            '-s',
            '--sig',
            metavar=('<s1>', '<s2>'),
            dest='sigs',
            type=io.parse_list_or_range,
            help='integer list and/or range <start>:[<step>:]<stop> for the kernel hyper-parameter sigma (length scale)',
            nargs='+',
        )

        group = subparser.add_mutually_exclusive_group()
        group.add_argument(
            '--gdml',
            action='store_true',
            help='don\'t include symmetries in the model (GDML)',
        )

        group.add_argument(
            '--perms_from',
            metavar='<file>',
            dest='perms_from_arg',
            type=lambda x: io.is_valid_file_type(x),
            help='path to file to take permutations from (key: \'perms\')',
        )

        group = subparser.add_mutually_exclusive_group()
        group.add_argument(
            '--no_E',
            dest='use_E',
            action='store_false',
            help='only reconstruct force field w/o potential energy surface',
        )
        group.add_argument(
            '--E_cstr',
            dest='use_E_cstr',
            action='store_true',
            help='include pointwise energy constraints',
        )

        subparser.add_argument(
            '--task_dir',
            metavar='<task_dir>',
            dest='task_dir',
            help='user-defined task output dir name',
        )

    for subparser in [parser_all, parser_select]:
        subparser.add_argument(
            '--model_file',
            metavar='<model_file>',
            dest='model_file',
            help='user-defined model output file name',
        )

    for subparser in [parser_all, parser_train]:
        subparser.add_argument(
            '--lazy',
            dest='lazy_training',
            action='store_true',
            help='give up on unfinished tasks (if more than one)',
        )

    for subparser in [parser_valid, parser_test]:
        _add_argument_dir_with_file_type(subparser, 'model', or_file=True)

    parser_valid.add_argument(
        'valid_dataset',
        metavar='<valid_dataset_file>',
        type=lambda x: io.is_file_type(x, 'dataset'),
        help='path to validation dataset file',
    )
    parser_test.add_argument(
        'test_dataset',
        metavar='<test_dataset_file>',
        type=lambda x: io.is_file_type(x, 'dataset'),
        help='path to test dataset file',
    )

    for subparser in [parser_all, parser_test]:
        subparser.add_argument(
            'n_test',
            metavar='<n_test>',
            type=io.is_strict_pos_int,
            help='test sample size',
            nargs='?',
            default=None,
        )

    parser_resume.add_argument(
        'model',
        metavar='<model_file>',
        type=lambda x: io.is_file_type(x, 'model'),
        help='path to model file to complete training for',
    )
    parser_resume.add_argument(
        'dataset',
        metavar='<train_dataset_file>',
        type=lambda x: io.is_file_type(x, 'dataset'),
        help='path to original training dataset file',
    )

    _add_argument_dir_with_file_type(parser_train, 'task', or_file=True)

    for subparser in [parser_train, parser_resume]:
        subparser.add_argument(
            'valid_dataset',
            metavar='<valid_dataset_file>',
            type=lambda x: io.is_file_type(x, 'dataset'),
            help='path to validation dataset file',
        )

    _add_argument_dir_with_file_type(parser_select, 'model')

    parser_show.add_argument(
        'file',
        metavar='<file>',
        type=lambda x: io.is_valid_file_type(x),
        help='path to dataset, task or model file',
    )

    for subparser in [
        parser_all,
        parser_train,
        parser_resume,
        parser_valid,
        parser_test,
    ]:

        subparser.add_argument(
            '-m',
            '--max_memory',
            metavar='<max_memory>',
            type=int,
            help='limit memory usage (whenever possible) [GB]',
            choices=range(1, total_memory + 1),
            default=total_memory,
        )

        subparser.add_argument(
            '-p',
            '--max_processes',
            metavar='<max_processes>',
            type=int,
            help='limit number of processes',
            choices=range(1, total_cpus + 1),
            default=total_cpus,
        )

        subparser.add_argument(
            '--cpu',
            dest='use_torch',
            action='store_false',
            help='use CPU implementation (no PyTorch dependency)',
        )

    for subparser in [
        parser_all,
        parser_create,
        parser_train,
        parser_resume,
        parser_valid,
        parser_select,
        parser_test,
    ]:
        subparser.add_argument(
            '-o',
            '--overwrite',
            dest='overwrite',
            action='store_true',
            help='overwrite existing files',
        )

    args = parser.parse_args()

    # Post-processing for optional sig argument
    if 'sigs' in args and args.sigs is not None:
        args.sigs = np.hstack(
            args.sigs
        ).tolist()  # Flatten list, if (part of it) was generated using the range syntax
        args.sigs = sorted(list(set(args.sigs)))  # remove potential duplicates

    # Post-processing for optional model output file argument
    if 'model_file' in args and args.model_file is not None:
        if not args.model_file.endswith('.npz'):
            args.model_file += '.npz'

    # Check PyTorch GPU support.
    if ('use_torch' in args and args.use_torch) or 'use_torch' not in args:
        if _has_torch:
            if not (_torch_cuda_is_available or _torch_mps_is_available):
                print()  # TODO: print only if log level includes warning
                log.warning(
                    'Your PyTorch installation does not see any GPU(s) on your system and will thus run all calculations on the CPU! If this is what you want, we recommend bypassing PyTorch using \'--cpu\' for improved performance.'
                )
        else:
            print()
            log.critical(
                'PyTorch dependency not found! Please install it or use \'--cpu\' to bypass PyTorch and run everything on the CPU.'
            )
            print()
            os._exit(1)

    args = vars(args)

    _print_splash(
        args['max_memory'] if 'max_memory' in args else total_memory,
        args['max_processes'] if 'max_processes' in args else total_cpus,
        args['use_torch'] if 'use_torch' in args else True,
    )

    try:
        getattr(sys.modules[__name__], args['command'])(**args)
    except AssistantError as err:
        log.error(str(err))
        print()
        os._exit(1)
    except Exception:
        log.critical(traceback.format_exc())
        print()
        os._exit(1)
    print()
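

# Illustrative sketch (hypothetical helper, not part of sgdml.utils.io): one way
# to expand the '-s/--sig' arguments described in the help text into the sorted,
# duplicate-free integer list that main() produces. The actual parsing is done
# by `io.parse_list_or_range`; treating <stop> as inclusive and defaulting
# <step> to 1 are assumptions of this sketch.


def _example_expand_sigs(tokens):
    sigs = []
    for tok in tokens:
        parts = [int(p) for p in tok.split(':')]
        if len(parts) == 1:  # plain integer
            sigs.append(parts[0])
        elif len(parts) in (2, 3):  # <start>:<stop> or <start>:<step>:<stop>
            start, stop = parts[0], parts[-1]
            step = parts[1] if len(parts) == 3 else 1
            if step <= 0:
                raise ValueError('step must be positive: {}'.format(tok))
            sigs.extend(range(start, stop + 1, step))
        else:
            raise ValueError('invalid sigma specification: {}'.format(tok))
    # Same post-processing as main(): flatten, remove duplicates, sort.
    return sorted(set(sigs))
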


if __name__ == "__main__":
    main()


================================================
FILE: sgdml/get.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2023 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import argparse
import os
import re
import sys

from . import __version__
from .utils import ui

if sys.version_info[0] >= 3:
    raw_input = input

try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen


def download(command, file_name):

    base_url = 'http://www.quantum-machine.org/gdml/' + (
        'data/npz/' if command == 'dataset' else 'models/'
    )
    request = urlopen(base_url + file_name)
    filesize = int(request.headers['Content-Length'])

    size = 0
    block_sz = 1024
    # Stream the response to disk in fixed-size blocks, updating the progress bar.
    with open(file_name, 'wb') as file:
        while True:
            buffer = request.read(block_sz)
            if not buffer:
                break
            size += len(buffer)
            file.write(buffer)

            ui.callback(
                size,
                filesize,
                disp_str='Downloading: {}'.format(file_name),
                sec_disp_str='{:,} bytes'.format(filesize),
            )
    request.close()

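
# Illustrative sketch (hypothetical helper, not used by sgdml-get): the
# block-wise copy performed by download(), factored out so that it works with
# any readable/writable file-like objects. The optional
# `on_progress(done, total)` callback stands in for the ui.callback() progress
# display; the function returns the number of bytes copied.


def _example_copy_blocks(src, dst, total, on_progress=None, block_sz=1024):
    done = 0
    while True:
        buffer = src.read(block_sz)
        if not buffer:  # an empty read signals the end of the stream
            break
        dst.write(buffer)
        done += len(buffer)
        if on_progress is not None:
            on_progress(done, total)
    return done
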

def main():

    base_url = 'http://www.quantum-machine.org/gdml/'

    parser = argparse.ArgumentParser()

    parent_parser = argparse.ArgumentParser(add_help=False)
    parent_parser.add_argument(
        '-o',
        '--overwrite',
        dest='overwrite',
        action='store_true',
        help='overwrite existing files',
    )

    subparsers = parser.add_subparsers(title='commands', dest='command')
    subparsers.required = True
    parser_dataset = subparsers.add_parser(
        'dataset', help='download benchmark dataset', parents=[parent_parser]
    )
    parser_model = subparsers.add_parser(
        'model', help='download pre-trained model', parents=[parent_parser]
    )

    for subparser in [parser_dataset, parser_model]:
        subparser.add_argument(
            'name',
            metavar='<name>',
            type=str,
            help='item name',
            nargs='?',
            default=None,
        )

    args = parser.parse_args()

    print("Contacting server (%s)..." % base_url)

    if args.name is not None:

        url = '%sget.php?version=%s&%s=%s' % (
            base_url,
            __version__,
            args.command,
            args.name,
        )
        response = urlopen(url)
        match, score = response.read().decode().split(',')
        response.close()

        if int(score) == 0 or ui.yes_or_no('Do you mean \'%s\'?' % match):
            download(args.command, match + '.npz')
            return

    response = urlopen(
        '%sget.php?version=%s&%s' % (base_url, __version__, args.command)
    )
    line = response.readlines()
    response.close()

    print()
    print('Available %ss:' % args.command)

    print('{:<2} {:<31}    {:>4}'.format('ID', 'Name', 'Size'))
    print('-' * 42)

    items = line[0].split(b';')
    for i, item in enumerate(items):
        name, size = item.split(b',')
        size = int(size) / 1024**2  # Bytes to MBytes

        print('{:>2d} {:<30} {:>5.1f} MB'.format(i, name.decode("utf-8"), size))
    print()

    down_list = raw_input(
        'Please list which %ss to download (e.g. 0 1 2 6) or type \'all\': '
        % args.command
    )
    down_idxs = []
    if 'all' in down_list.lower():
        down_idxs = list(range(len(items)))
    elif re.match(
        "^ *[0-9][0-9 ]*$", down_list
    ):  # only digits and spaces, at least one digit
        down_idxs = [int(idx) for idx in re.split(r'\s+', down_list.strip())]
        down_idxs = list(set(down_idxs))
    else:
        print(ui.color_str('ABORTED', fore_color=ui.RED, bold=True))

    for idx in down_idxs:
        if idx not in range(len(items)):
            print(
                ui.color_str('[WARN]', fore_color=ui.YELLOW, bold=True)
                + ' Index '
                + str(idx)
                + ' out of range, skipping.'
            )
        else:
            name = items[idx].split(b',')[0].decode("utf-8")
            if os.path.exists(name + '.npz'):
                print("'%s' exists, skipping." % (name + '.npz'))
                continue

            download(args.command, name + '.npz')
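

# Illustrative sketch (hypothetical helper, not used by sgdml-get): the rules
# main() applies to the user's download selection. 'all' selects every item, a
# string of digits and spaces selects those indices (duplicates dropped), and
# anything else aborts (returned as None). Unlike main(), this sketch sorts the
# indices and returns out-of-range ones separately instead of printing a
# warning for each.
import re


def _example_parse_selection(text, n_items):
    if 'all' in text.lower():
        return list(range(n_items)), []
    # Same pattern as main(): only digits and spaces, at least one digit.
    if not re.match(r'^ *[0-9][0-9 ]*$', text):
        return None, []
    idxs = sorted({int(i) for i in text.split()})
    valid = [i for i in idxs if i < n_items]
    invalid = [i for i in idxs if i >= n_items]
    return valid, invalid
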


if __name__ == "__main__":
    main()


================================================
FILE: sgdml/intf/__init__.py
================================================


================================================
FILE: sgdml/intf/ase_calc.py
================================================
# MIT License
#
# Copyright (c) 2018-2020 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import logging
import numpy as np

try:
    from ase.calculators.calculator import Calculator
    from ase.units import kcal, mol
except ImportError:
    raise ImportError(
        'Optional ASE dependency not found! Please run \'pip install sgdml[ase]\' to install it.'
    )

from ..predict import GDMLPredict


class SGDMLCalculator(Calculator):

    implemented_properties = ['energy', 'forces']

    def __init__(
        self,
        model_path,
        E_to_eV=kcal / mol,
        F_to_eV_Ang=kcal / mol,
        use_torch=False,
        *args,
        **kwargs
    ):
        """
        ASE calculator for the sGDML force field.

        A calculator takes atomic numbers and atomic positions from an Atoms object and calculates the energy and forces.

        Note
        ----
        ASE uses eV and Angstrom as energy and length unit, respectively. Unless the parameters `E_to_eV` and `F_to_eV_Ang` are specified, the sGDML model is assumed to use kcal/mol and Angstrom, and the conversion factors are set accordingly.
        Conversion factors for other units can be built from the constants listed in `ASE units <https://wiki.fysik.dtu.dk/ase/ase/units.html>`_.

        Parameters
        ----------
                model_path : :obj:`str`
                        Path to an sGDML model file
                E_to_eV : float, optional
                        Conversion factor from the energy unit used by the model to eV. By default, this parameter converts from kcal/mol.
                F_to_eV_Ang : float, optional
                        Conversion factor from the force unit used by the model to eV/Angstrom. By default, this parameter converts from kcal/mol/Angstrom (i.e. the length unit is assumed to be Angstrom). The ratio `F_to_eV_Ang / E_to_eV` is used to convert positions from Angstrom to the length unit of the model.
                use_torch : boolean, optional
                        Use PyTorch to calculate predictions
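
        Example
        -------
        A minimal sketch of how the two factors combine for a hypothetical
        model trained in kcal/mol and Bohr. The constants are approximate
        values hard-coded here (assumption) instead of being imported from
        `ase.units`, so the snippet runs without ASE. Dividing the force
        factor by the energy factor leaves the pure length factor that
        `calculate` applies to the ASE positions.

        ```python
        KCAL_MOL_IN_EV = 0.0433641  # 1 kcal/mol in eV (approximate)
        BOHR_IN_ANG = 0.529177  # 1 Bohr in Angstrom (approximate)

        # Factors one would pass for a hypothetical kcal/mol + Bohr model.
        E_to_eV = KCAL_MOL_IN_EV
        F_to_eV_Ang = KCAL_MOL_IN_EV / BOHR_IN_ANG  # kcal/mol/Bohr -> eV/Ang

        # Same expression as `self.Ang_to_R` in `__init__`: Ang -> Bohr.
        Ang_to_R = F_to_eV_Ang / E_to_eV

        r_ang = 1.0  # a distance of 1 Angstrom, as stored by ASE
        r_model = r_ang * Ang_to_R  # the same distance in Bohr
        ```

        With ASE available, the corresponding arguments would be
        `E_to_eV=kcal / mol` and `F_to_eV_Ang=kcal / mol / Bohr`
        (all three constants are defined in `ase.units`).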
        """

        super(SGDMLCalculator, self).__init__(*args, **kwargs)

        self.log = logging.getLogger(__name__)

        model = np.load(model_path, allow_pickle=True)
        self.gdml_predict = GDMLPredict(model, use_torch=use_torch)
        self.gdml_predict.prepare_parallel(n_bulk=1)

        self.log.warning(
            'Please remember to specify the proper conversion factors if your model does not use \'kcal/mol\' and \'Ang\' as units.'
        )

        # Converts energy from the unit used by the sGDML model to eV.
        self.E_to_eV = E_to_eV

        # Converts length from Angstrom to the unit used by the sGDML model.
        self.Ang_to_R = F_to_eV_Ang / E_to_eV

        # Converts force from the unit used by the sGDML model to eV/Ang.
        self.F_to_eV_Ang = F_to_eV_Ang

    def calculate(self, atoms=None, *args, **kwargs):

        super(SGDMLCalculator, self).calculate(atoms, *args, **kwargs)

        # convert ASE positions (Ang) to the length unit used by the model
        r = self.atoms.get_positions() * self.Ang_to_R

        e, f = self.gdml_predict.predict(r.ravel())

        # convert model units to ASE default units (eV and Ang)
        e *= self.E_to_eV
        f *= self.F_to_eV_Ang

        self.results = {'energy': e, 'forces': f.reshape(-1, 3)}


================================================
FILE: sgdml/predict.py
================================================
"""
This module contains all routines for evaluating GDML and sGDML models.
"""

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela, Gregory Fonseca
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import sys
import logging
import os
import psutil

import multiprocessing as mp

Pool = mp.get_context('fork').Pool

import timeit
from functools import partial

try:
    import torch
except ImportError:
    _has_torch = False
else:
    _has_torch = True

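try:
    _torch_mps_is_available = torch.backends.mps.is_available()
except (NameError, AttributeError):  # NameError: torch is not installed
    _torch_mps_is_available = False
_torch_mps_is_available = False  # MPS support is currently disabled (overrides the check above)

try:
    _torch_cuda_is_available = torch.cuda.is_available()
except (NameError, AttributeError):  # NameError: torch is not installed
    _torch_cuda_is_available = False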

import numpy as np

from . import __version__
from .utils.desc import Desc


def share_array(arr_np):
    """
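    Copy a NumPy array of type `float` into a ctypes array of doubles
    allocated from shared memory.

    Parameters
    ----------
            arr_np : :obj:`numpy.ndarray`
                    NumPy array.

    Returns
    -------
            :obj:`multiprocessing.RawArray`
                    Flat shared-memory copy of `arr_np`.
            tuple of int
                    Shape of `arr_np`, needed to restore the array with
                    `np.frombuffer(arr).reshape(shape)`.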
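
    Example
    -------
    A minimal sketch of the round trip used in this module: the parent
    process shares the flat buffer, and `_predict_wkr` restores a
    zero-copy NumPy view with `np.frombuffer(...).reshape(shape)`. The
    array `A` is made-up example data.

    ```python
    import multiprocessing as mp

    import numpy as np

    A = np.arange(6, dtype=float).reshape(2, 3)

    # Same two steps as `share_array`: flat copy into shared memory + shape.
    buf, shape = mp.RawArray('d', A.ravel()), A.shape

    # Reader side: a view onto the shared buffer (no copy is made).
    A_view = np.frombuffer(buf).reshape(shape)
    A_view[0, 0] = 42.0  # writes go straight into the shared buffer
    ```

    The original `A` is left untouched, since `RawArray` copies the
    initializer values into newly allocated shared memory.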
    """

    arr = mp.RawArray('d', arr_np.ravel())
    return arr, arr_np.shape


def _predict_wkr(
    r, r_desc_d_desc, lat_and_inv, glob_id, wkr_start_stop=None, chunk_size=None
):
    """
    Compute (part of) a prediction.

    Every prediction is a linear combination involving the training points used for
    this model. This function evaluates that combination for the range specified by
    `wkr_start_stop`. This workload can optionally be processed in chunks,
    which can be faster as it requires less memory to be allocated at once.

    Note
    ----
        It is sufficient to provide either the parameter `r` or `r_desc_d_desc`.
        The other one can be set to `None`.

    Parameters
    ----------
            r : :obj:`numpy.ndarray`
                    An array of size 3N containing the Cartesian
                    coordinates of each atom in the molecule.
            r_desc_d_desc : tuple of :obj:`numpy.ndarray`
                    A tuple made up of:
                        (1) An array of size D containing the descriptor
                        of dimension D for the molecule.
                        (2) An array of size D x 3N containing the
                        descriptor Jacobian for the molecule, i.e. the
                        partial derivatives of each of the D descriptor
                        entries with respect to the 3N Cartesian coordinates.
            lat_and_inv : tuple of :obj:`numpy.ndarray`
                    Tuple of a 3 x 3 matrix containing the lattice vectors
                    as columns and its inverse.
            glob_id : int
                    Identifier of the global namespace that this
                    function is supposed to be using (zero if only one
                    instance of this class exists at the same time).
            wkr_start_stop : tuple of int, optional
                    Range defined by the indices of first and last (exclusive)
                    sum element. The full prediction is generated if this parameter
                    is not specified.
            chunk_size : int, optional
                    Chunk size. The whole linear combination is evaluated in a large
                    vector operation instead of looping over smaller chunks if this
                    parameter is left unspecified.

    Returns
    -------
            :obj:`numpy.ndarray`
                    Partial prediction of the energy (first element),
                    followed by all force components.
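
    Example
    -------
    A minimal sketch (not part of sGDML; `matern52` and `matern52_grad` are
    hypothetical helpers) that checks with central finite differences that
    the factor accumulated in `mat52_base`, i.e.
    5 / (3 sig^3) * (d + sig) * exp(-d / sig) with d = sqrt(5) * |x - x'|,
    is the negative gradient coefficient of the Matern-5/2 kernel whose
    value appears as `K_ee` in the body of this function.

    ```python
    import numpy as np

    def matern52(x, xp, sig):
        # Matern-5/2 kernel value, same form as `K_ee` (d includes sqrt(5)).
        d = np.sqrt(5.0) * np.linalg.norm(x - xp)
        return (1.0 + d / sig + d**2 / (3.0 * sig**2)) * np.exp(-d / sig)

    def matern52_grad(x, xp, sig):
        diff = x - xp
        d = np.sqrt(5.0) * np.linalg.norm(diff)
        # Scalar factor that `mat52_base` holds after `*= norm_ab_perms + sig`.
        coeff = 5.0 / (3.0 * sig**3) * np.exp(-d / sig) * (d + sig)
        return -coeff * diff

    rng = np.random.default_rng(0)
    x, xp, sig = rng.normal(size=4), rng.normal(size=4), 2.0

    # Central finite differences of the kernel value with respect to x.
    eps = 1e-6
    fd = np.array([
        (matern52(x + eps * e, xp, sig) - matern52(x - eps * e, xp, sig)) / (2 * eps)
        for e in np.eye(4)
    ])
    ```

    Folding sqrt(5) into `norm_ab_perms` once is what lets the loop above
    reuse the same exponential for the force and energy terms.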
    """

    global globs
    glob = globs[glob_id]
    sig, n_perms = glob['sig'], glob['n_perms']

    desc_func = glob['desc_func']

    R_desc_perms = np.frombuffer(glob['R_desc_perms']).reshape(
        glob['R_desc_perms_shape']
    )
    R_d_desc_alpha_perms = np.frombuffer(glob['R_d_desc_alpha_perms']).reshape(
        glob['R_d_desc_alpha_perms_shape']
    )

    if 'alphas_E_lin' in glob:
        alphas_E_lin = np.frombuffer(glob['alphas_E_lin']).reshape(
            glob['alphas_E_lin_shape']
        )

    r_desc, r_d_desc = r_desc_d_desc or desc_func.from_R(
        r, lat_and_inv, max_processes=1
    )  # no additional forking during parallelization

    n_train = int(R_desc_perms.shape[0] / n_perms)

    wkr_start, wkr_stop = (0, n_train) if wkr_start_stop is None else wkr_start_stop
    if chunk_size is None:
        chunk_size = n_train

    dim_d = desc_func.dim
    dim_i = desc_func.dim_i
    dim_c = chunk_size * n_perms

    # Pre-allocate memory.
    diff_ab_perms = np.empty((dim_c, dim_d))
    a_x2 = np.empty((dim_c,))
    mat52_base = np.empty((dim_c,))

    # Precompute reciprocals and constant factors to avoid (slower) divisions in the loop.
    sig_inv = 1.0 / sig
    mat52_base_fact = 5.0 / (3 * sig**3)
    diag_scale_fact = 5.0 / sig
    sqrt5 = np.sqrt(5.0)

    E_F = np.zeros((dim_d + 1,))
    F = E_F[1:]

    wkr_start *= n_perms
    wkr_stop *= n_perms

    b_start = wkr_start
    for b_stop in list(range(wkr_start + dim_c, wkr_stop, dim_c)) + [wkr_stop]:

        rj_desc_perms = R_desc_perms[b_start:b_stop, :]
        rj_d_desc_alpha_perms = R_d_desc_alpha_perms[b_start:b_stop, :]

        # Resize pre-allocated memory for last iteration, if chunk_size is not a divisor of the training set size.
        # Note: It's faster to process equally sized chunks.
        c_size = b_stop - b_start
        if c_size < dim_c:
            diff_ab_perms = diff_ab_perms[:c_size, :]
            a_x2 = a_x2[:c_size]
            mat52_base = mat52_base[:c_size]

        np.subtract(
            np.broadcast_to(r_desc, rj_desc_perms.shape),
            rj_desc_perms,
            out=diff_ab_perms,
        )
        norm_ab_perms = sqrt5 * np.linalg.norm(diff_ab_perms, axis=1)

        np.exp(-norm_ab_perms * sig_inv, out=mat52_base)
        mat52_base *= mat52_base_fact
        np.einsum(
            'ji,ji->j', diff_ab_perms, rj_d_desc_alpha_perms, out=a_x2
        )  # row-wise dot product

        F += (a_x2 * mat52_base).dot(diff_ab_perms) * diag_scale_fact
        mat52_base *= norm_ab_perms + sig
        F -= mat52_base.dot(rj_d_desc_alpha_perms)

        # Note: Energies are automatically predicted with a flipped sign here (because -E are trained, instead of E)
        E_F[0] += a_x2.dot(mat52_base)

        # Contributions of the energy coefficients 'alphas_E' (same sign convention as above).
        if 'alphas_E_lin' in glob:

            K_fe = diff_ab_perms * mat52_base[:, None]
            F += alphas_E_lin[b_start:b_stop].dot(K_fe)

            K_ee = (
                1 + (norm_ab_perms * sig_inv) * (1 + norm_ab_perms / (3 * sig))
            ) * np.exp(-norm_ab_perms * sig_inv)

            E_F[0] += K_ee.dot(alphas_E_lin[b_start:b_stop])

        b_start = b_stop

    out = E_F[: dim_i + 1]

    # Descriptor has fewer entries than 3N, need to extend size of the 'E_F' array.
    if dim_d < dim_i:
        out = np.empty((dim_i + 1,))
        out[0] = E_F[0]

    out[1:] = desc_func.vec_dot_d_desc(
        r_d_desc,
        F,
    )  # 'r_d_desc.T.dot(F)' for our special representation of 'r_d_desc'

    return out


class GDMLPredict(object):
    def __init__(
        self,
        model,
        batch_size=None,
        num_workers=None,
        max_memory=None,
        max_processes=None,
        use_torch=False,
        log_level=None,
    ):
        """
        Query trained sGDML force fields.

        This class is used to load a trained model and make energy and
        force predictions for new geometries. GPU support is provided
        through PyTorch (requires optional `torch` dependency to be
        installed).

        Note
        ----
                The parameters `batch_size` and `num_workers` are only
                relevant if this code runs on a CPU. Both can be set
                automatically via the function `prepare_parallel`.
                Running calculations via PyTorch is only recommended
                with available GPU hardware; CPU calculations are faster
                with our NumPy implementation.

        Parameters
        ----------
                model : :obj:`dict`
                        Data structure that holds all parameters of the
                        trained model. This object is the output of
                        `GDMLTrain.train`
                batch_size : int, optional
                        Chunk size for processing parallel tasks
                num_workers : int, optional
                        Number of parallel workers (in addition to the main
                        process)
                max_memory : int, optional
                        Limit the max. memory usage [GB]. This is only a
                        soft limit that cannot always be enforced.
                max_processes : int, optional
                        Limit the max. number of processes. Otherwise
                        all CPU cores are used. This parameter has no
                        effect if `use_torch=True`
                use_torch : boolean, optional
                        Use PyTorch to calculate predictions
                log_level : optional
                        Set custom logging level (e.g. `logging.CRITICAL`)
        """

        global globs
        if 'globs' not in globals():
            globs = []

        # Create a personal global space for this model at a new index.
        # Note: do not delete entries from this list, since 'self.glob_id' is
        # static. Instead, setting them to None preserves positions while still
        # freeing up memory.
        globs.append({})
        self.glob_id = len(globs) - 1
        glob = globs[self.glob_id]

        self.log = logging.getLogger(__name__)
        if log_level is not None:
            self.log.setLevel(log_level)

        total_memory = psutil.virtual_memory().total // 2**30  # bytes to GB (rounded down)
        self.max_memory = (
            min(max_memory, total_memory) if max_memory is not None else total_memory
        )

        total_cpus = mp.cpu_count()
        self.max_processes = (
            min(max_processes, total_cpus) if max_processes is not None else total_cpus
        )

        if 'type' not in model or not (model['type'] == 'm' or model['type'] == b'm'):
            self.log.critical('The provided data structure is not a valid model.')
            sys.exit()

        self.n_atoms = model['z'].shape[0]

        self.desc = Desc(self.n_atoms, max_processes=max_processes)
        glob['desc_func'] = self.desc

        # Cache for iterative training mode.
        self.R_desc = None
        self.R_d_desc = None

        self.lat_and_inv = (
            (model['lattice'], np.linalg.inv(model['lattice']))
            if 'lattice' in model
            else None
        )

        self.n_train = model['R_desc'].shape[1]
        glob['sig'] = model['sig']

        self.std = model['std'] if 'std' in model else 1.0
        self.c = model['c']

        n_perms = model['perms'].shape[0]
        glob['n_perms'] = n_perms

        self.tril_perms_lin = model['tril_perms_lin']

        self.torch_predict = None
        self.use_torch = use_torch
        if use_torch:

            if not _has_torch:
                raise ImportError(
                    'Optional PyTorch dependency not found! Please run \'pip install sgdml[torch]\' to install it or disable the PyTorch option.'
                )

            from .torchtools import GDMLTorchPredict

            self.torch_predict = GDMLTorchPredict(
                model,
                self.lat_and_inv,
                max_memory=max_memory,
                max_processes=max_processes,
                log_level=self.log.level,
            )

            # Enable data parallelism
            n_gpu = torch.cuda.device_count()
            if n_gpu > 1:
                self.torch_predict = torch.nn.DataParallel(self.torch_predict)

            # Send model to device
            # self.torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
            if _torch_cuda_is_available:
                self.torch_device = 'cuda'
            elif _torch_mps_is_available:
                self.torch_device = 'mps'
            else:
                self.torch_device = 'cpu'

            while True:
                try:
                    self.torch_predict.to(self.torch_device)
                except RuntimeError as e:
                    if 'out of memory' in str(e):

                        if _torch_cuda_is_available:
                            torch.cuda.empty_cache()

                        model = self.torch_predict
                        if isinstance(self.torch_predict, torch.nn.DataParallel):
                            model = model.module

                        if (
                            model.get_n_perm_batches() == 1
                        ):  # model caches the permutations, this could be why it is too large
                            model.set_n_perm_batches(
                                model.get_n_perm_batches() + 1
                            )  # uncache
                            # self.torch_predict.to( # NOTE!
                            #    self.torch_device
                            # )  # try sending to device again
                            pass
                        else:
                            self.log.critical(
                                'Not enough memory on device (RAM or GPU memory). There is no hope!'
                            )
                            print()
                            os._exit(1)
                    else:
                        raise e
                else:
                    break
        else:

            # Precompute permuted training descriptors and their first derivatives multiplied with the coefficients.

            R_desc_perms = (
                np.tile(model['R_desc'].T, n_perms)[:, self.tril_perms_lin]
                .reshape(self.n_train, n_perms, -1, order='F')
                .reshape(self.n_train * n_perms, -1)
            )
            glob['R_desc_perms'], glob['R_desc_perms_shape'] = share_array(R_desc_perms)

            R_d_desc_alpha_perms = (
                np.tile(model['R_d_desc_alpha'], n_perms)[:, self.tril_perms_lin]
                .reshape(self.n_train, n_perms, -1, order='F')
                .reshape(self.n_train * n_perms, -1)
            )
            (
                glob['R_d_desc_alpha_perms'],
                glob['R_d_desc_alpha_perms_shape'],
            ) = share_array(R_d_desc_alpha_perms)

            if 'alphas_E' in model:
                alphas_E_lin = np.tile(model['alphas_E'][:, None], (1, n_perms)).ravel()
                glob['alphas_E_lin'], glob['alphas_E_lin_shape'] = share_array(
                    alphas_E_lin
                )

            # Parallel processing configuration

            self.bulk_mp = False  # Bulk predictions with multiple processes?

            self.pool = None

            # How many workers in addition to main process?
            num_workers = num_workers or (
                self.max_processes - 1
            )  # exclude main process
            self._set_num_workers(num_workers, force_reset=True)

            # Size of chunks in which each parallel task will be processed (unit: number of training samples)
            # This parameter should be as large as possible, but it depends on the size of available memory.
            self._set_chunk_size(batch_size)

    def __del__(self):

        global globs

        try:
            self.pool.terminate()
            self.pool.join()
            self.pool = None
        except:
            pass

        if 'globs' in globals() and globs is not None and self.glob_id < len(globs):
            globs[self.glob_id] = None

    ## Public ##

    # def set_R(self, R):
    #     """
    #     Store a reference to the training geometries.
    #     This function is used to avoid unnecessary copies of the
    #     training geometries when evaluating the training error
    #     (= gradient of the model's loss function).

    #     This routine is used during iterative model training.

    #     Parameters
    #     ----------
    #     R : :obj:`numpy.ndarray`
    #         Array containing the geometry for each training point.
    #     """

    #     # Add singleton dimension if input is (,3N).
    #     if R.ndim == 1:
    #         R = R[None, :]

    #     self.R = R

    #     # if self.use_torch:
    #     #     model = self.torch_predict
    #     #     if isinstance(self.torch_predict, torch.nn.DataParallel):
    #     #         model = model.module

    #     #     R_torch = torch.from_numpy(R.reshape(-1, self.n_atoms, 3)).to(self.torch_device)
    #     #     model.set_R(R_torch)

    def set_R_desc(self, R_desc):
        """
        Store a reference to the training geometry descriptors.

        This can accelerate iterative model training.

        Parameters
        ----------
            R_desc : :obj:`numpy.ndarray`
                    A 2D array of size M x D containing the
                    descriptors of dimension D for M
                    molecules.
        """

        self.R_desc = R_desc

    def set_R_d_desc(self, R_d_desc):
        """
        Store a reference to the training geometry descriptor Jacobians.
        This function must be called before `set_alphas()` can be used.

        This routine is used during iterative model training.

        Parameters
        ----------
            R_d_desc : :obj:`numpy.ndarray`
                    An array of size M x D x 3N containing the
                    descriptor Jacobians for M molecules. The descriptor
                    has dimension D with 3N partial derivatives with
                    respect to the 3N Cartesian coordinates of each atom.
        """

        self.R_d_desc = R_d_desc

        if self.use_torch:
            model = self.torch_predict
            if isinstance(self.torch_predict, torch.nn.DataParallel):
                model = model.module

            model.set_R_d_desc(R_d_desc)

    def set_alphas(self, alphas_F, alphas_E=None):
        """
        Reconfigure the current model with a new set of regression parameters.
        `R_d_desc` needs to be set for this function to work.

        This routine is used during iterative model training.

        Parameters
        ----------
                alphas_F : :obj:`numpy.ndarray`
                    1D array containing the new model parameters.
                alphas_E : :obj:`numpy.ndarray`, optional
                    1D array containing the additional new model parameters, if
                    energy constraints are used in the kernel (`use_E_cstr=True`)
        """

        if self.use_torch:

            model = self.torch_predict
            if isinstance(self.torch_predict, torch.nn.DataParallel):
                model = model.module

            model.set_alphas(alphas_F, alphas_E=alphas_E)

        else:

            assert self.R_d_desc is not None

            global globs
            glob = globs[self.glob_id]

            dim_i = self.desc.dim_i
            R_d_desc_alpha = self.desc.d_desc_dot_vec(
                self.R_d_desc, alphas_F.reshape(-1, dim_i)
            )

            R_d_desc_alpha_perms_new = np.tile(R_d_desc_alpha, glob['n_perms'])[
                :, self.tril_perms_lin
            ].reshape(self.n_train, glob['n_perms'], -1, order='F')

            R_d_desc_alpha_perms = np.frombuffer(glob['R_d_desc_alpha_perms'])
            np.copyto(R_d_desc_alpha_perms, R_d_desc_alpha_perms_new.ravel())

            if alphas_E is not None:

                alphas_E_lin_new = np.tile(
                    alphas_E[:, None], (1, glob['n_perms'])
                ).ravel()

                alphas_E_lin = np.frombuffer(glob['alphas_E_lin'])
                np.copyto(alphas_E_lin, alphas_E_lin_new)

    def _set_num_workers(
        self, num_workers=None, force_reset=False
    ):  # TODO: complain if chunk or worker parameters do not fit training data (this causes issues with the caching)!!
        """
        Set number of processes to use during prediction.

        If bulk_mp == True, each worker generates complete predictions on its own (this is for querying multiple geometries at once).
        If bulk_mp == False, each worker may handle only a part of a prediction (the parts are defined in 'wkr_starts_stops'). In that
        scenario, multiple processes share the work of generating a single prediction.

        This number should not exceed the number of available CPU cores.

        Note
        ----
                This parameter can be optimally determined using
                `prepare_parallel`.

        Parameters
        ----------
                num_workers : int, optional
                    Number of processes (maximum value is set if `None`).
                force_reset : bool, optional
                    Force applying the new setting.
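
        Example
        -------
        A minimal sketch of how the 'wkr_starts_stops' ranges are derived
        from the number of training points (`worker_ranges` is a
        hypothetical helper that mirrors the code of this method).

        ```python
        import math

        def worker_ranges(n_train, num_workers, bulk_mp=False):
            if bulk_mp or num_workers < 2:
                starts = [0]  # a single range covering all training points
            else:
                # Step size is rounded up, so the last range may be shorter.
                step = int(math.ceil(float(n_train) / num_workers))
                starts = list(range(0, n_train, step))
            stops = starts[1:] + [n_train]
            return list(zip(starts, stops))

        ranges = worker_ranges(10, 3)  # 10 training points, 3 workers
        ```

        Because the step size is rounded up, fewer ranges than workers can
        result (10 training points and 6 workers give 5 ranges). In non-bulk
        mode, `prepare_parallel` only benchmarks worker counts that divide
        the number of training points, which avoids such uneven splits.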
        """

        if force_reset or self.num_workers != num_workers:

            if self.pool is not None:
                self.pool.terminate()
                self.pool.join()
                self.pool = None

            self.num_workers = 0
            if num_workers is None or num_workers > 0:
                self.pool = Pool(num_workers)
                self.num_workers = (
                    self.pool._processes
                )  # number of actual workers (not max_processes)

        # Data ranges for processes
        if self.bulk_mp or self.num_workers < 2:
            # wkr_starts = [self.n_train]
            wkr_starts = [0]
        else:
            wkr_starts = list(
                range(
                    0,
                    self.n_train,
                    int(np.ceil(float(self.n_train) / self.num_workers)),
                )
            )
        wkr_stops = wkr_starts[1:] + [self.n_train]

        self.wkr_starts_stops = list(zip(wkr_starts, wkr_stops))

    def _set_chunk_size(self, chunk_size=None):

        # TODO: complain if chunk or worker parameters do not fit training data (this causes issues with the caching)!!
        """
        Set chunk size for each worker process.

        Every prediction is generated as a linear combination of the training
        points that the model is comprised of. If multiple workers are available
        (and bulk mode is disabled), each one processes an (approximately equal)
        part of those training points. The chunk size then determines how much of
        a process's workload is passed to NumPy's underlying low-level routines at
        once. If the chunk size is smaller than the number of points the worker is
        supposed to process, it processes them in multiple steps using a loop. This
        can sometimes be faster, depending on the available hardware.

        Note
        ----
                This parameter can be optimally determined using
                `prepare_parallel`.

        Parameters
        ----------
                chunk_size : int
                        Chunk size (maximum value is set if `None`).
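
        Example
        -------
        A minimal sketch of how a worker range is split into chunks,
        mirroring the loop in `_predict_wkr` (there, both the range and the
        chunk size are additionally multiplied by the number of
        permutations). `chunk_bounds` is a hypothetical helper.

        ```python
        def chunk_bounds(wkr_start, wkr_stop, chunk):
            # Same list of chunk ends as the loop in `_predict_wkr`.
            stops = list(range(wkr_start + chunk, wkr_stop, chunk)) + [wkr_stop]
            # Each chunk starts where the previous one ended.
            starts = [wkr_start] + stops[:-1]
            return list(zip(starts, stops))

        bounds = chunk_bounds(0, 10, 4)  # 10 points, chunks of 4
        ```

        The last chunk is shorter whenever the chunk size does not divide the
        range, which is why `_predict_wkr` shrinks its pre-allocated buffers
        for the final iteration.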
        """

        if chunk_size is None:
            chunk_size = self.n_train

        self.chunk_size = chunk_size

    def _set_batch_size(self, batch_size=None):  # deprecated
        """

        Warning
        -------
        Deprecated! Please use the function `_set_chunk_size` in future projects.

        Set chunk size for each worker process. A chunk is a subset
        of the training data points whose linear combination needs to
        be evaluated in order to generate a prediction.

        The chunk size determines how much of a process's workload will
        be passed to NumPy's underlying low-level routines at once.
        This parameter is highly hardware dependent.

        Note
        ----
                This parameter can be optimally determined using
                `prepare_parallel`.

        Parameters
        ----------
                batch_size : int
                        Chunk size (maximum value is set if `None`).
        """

        self._set_chunk_size(batch_size)

    def _set_bulk_mp(self, bulk_mp=False):
        """
        Toggles bulk prediction mode.

        If bulk prediction is enabled, the prediction is parallelized across
        input geometries, i.e. each worker generates the complete prediction for
        one query. Otherwise (depending on the number of available CPU cores) the
        input geometries are processed sequentially, but every one of them may be
        processed by multiple workers at once (in chunks).

        Note
        ----
                This parameter can be optimally determined using
                `prepare_parallel`.

        Parameters
        ----------
                bulk_mp : bool, optional
                        Enable or disable bulk prediction mode.
        """

        bulk_mp = bool(bulk_mp)
        if self.bulk_mp is not bulk_mp:
            self.bulk_mp = bulk_mp

            # Reset data ranges for processes stored in 'wkr_starts_stops'
            self._set_num_workers(self.num_workers)

    def set_opt_num_workers_and_batch_size_fast(self, n_bulk=1, n_reps=1):  # deprecated
        """
        Warning
        -------
        Deprecated! Please use the function `prepare_parallel` in future projects.

        Parameters
        ----------
                n_bulk : int, optional
                        Number of geometries that will be passed to the
                        `predict` function in each call (performance
                        will be optimized for that exact use case).
                n_reps : int, optional
                        Number of repetitions (bigger value: more
                        accurate, but also slower).

        Returns
        -------
                int
                        Force and energy prediction speed in geometries
                        per second.
        """

        return self.prepare_parallel(n_bulk, n_reps)

    def prepare_parallel(
        self, n_bulk=1, n_reps=1, return_is_from_cache=False
    ):  # noqa: C901
        """
        Find and set the optimal parallelization parameters for the
        currently loaded model, running on a particular system. The result
        also depends on the number of geometries `n_bulk` that will be
        passed at once when calling the `predict` function.

        This function runs a benchmark in which the prediction routine is
        repeatedly called `n_reps`-times (default: 1) with varying parameter
        configurations, while the runtime is measured for each one. The
        optimal parameters are then cached for fast retrieval in future
        calls of this function.

        We recommend calling this function after initialization of this
        class, as it will drastically increase the performance of the
        `predict` function.

        Note
        ----
                Depending on the parameter `n_reps`, this routine may take
                some seconds/minutes to complete. However, once a
                statistically significant number of benchmark results has
                been gathered for a particular configuration, it starts
                returning almost instantly.

        Parameters
        ----------
                n_bulk : int, optional
                        Number of geometries that will be passed to the
                        `predict` function in each call (performance
                        will be optimized for that exact use case).
                n_reps : int, optional
                        Number of repetitions (bigger value: more
                        accurate, but also slower).
                return_is_from_cache : bool, optional
                        If enabled, this function returns a second value
                        indicating if the returned results were obtained
                        from cache.

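        Returns
        -------
                int
                        Force and energy prediction speed in geometries
                        per second (`None` if PyTorch is used, since no
                        benchmark is run in that case).
                boolean, optional
                        Whether this function obtained the results from
                        cache (only returned if `return_is_from_cache=True`).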
        """

        # global globs
        # glob = globs[self.glob_id]
        # n_perms = glob['n_perms']

        # No benchmarking necessary if prediction is running via PyTorch (e.g. on GPUs).
        if self.use_torch:
            self.log.info(
                'Skipping multi-CPU benchmark, since torch is enabled.'
            )  # TODO: clarity!
            return

        # Retrieve cached benchmark results, if available.
        bmark_result = self._load_cached_bmark_result(n_bulk)
        if bmark_result is not None:

            num_workers, chunk_size, bulk_mp, gps = bmark_result

            self._set_chunk_size(chunk_size)
            self._set_num_workers(num_workers)
            self._set_bulk_mp(bulk_mp)

            if return_is_from_cache:
                is_from_cache = True
                return gps, is_from_cache
            else:
                return gps

        warm_up_done = False

        best_results = []
        last_i = None

        best_gps = 0
        gps_min = 0.0

        best_params = None

        r_dummy = np.random.rand(n_bulk, self.n_atoms * 3)

        def _dummy_predict():
            self.predict(r_dummy)

        bulk_mp_rng = [True, False] if n_bulk > 1 else [False]
        for bulk_mp in bulk_mp_rng:
            self._set_bulk_mp(bulk_mp)

            if bulk_mp is False:
                last_i = 0

            num_workers_rng = list(range(0, self.max_processes))
            if bulk_mp:
                num_workers_rng.reverse()  # benchmark converges faster this way

            # num_workers_rng_sizes = [batch_size for batch_size in batch_size_rng if min_batch_size % batch_size == 0]

            # for num_workers in range(min_num_workers,self.max_processes+1):
            for num_workers in num_workers_rng:
                if not bulk_mp and num_workers != 0 and self.n_train % num_workers != 0:
                    continue

                self._set_num_workers(num_workers)

                best_gps = 0
                gps_rng = (np.inf, 0.0)  # min and max per num_workers

                min_chunk_size = (
                    min(self.n_train, n_bulk)
                    if bulk_mp or num_workers < 2
                    else int(np.ceil(self.n_train / num_workers))
                )
                chunk_size_rng = list(range(min_chunk_size, 0, -1))

                chunk_size_rng_sizes = [
                    chunk_size
                    for chunk_size in chunk_size_rng
                    if min_chunk_size % chunk_size == 0
                ]

                # print('batch_size_rng_sizes ' + str(bulk_mp))
                # print(batch_size_rng_sizes)

                i_done = 0
                i_dir = 1
                i = 0 if last_i is None else last_i
                # i = 0

                # print(batch_size_rng_sizes)
                while i >= 0 and i < len(chunk_size_rng_sizes):

                    chunk_size = chunk_size_rng_sizes[i]
                    self._set_chunk_size(chunk_size)

                    i_done += 1

                    if not warm_up_done:
                        timeit.timeit(_dummy_predict, number=10)
                        warm_up_done = True

                    gps = n_bulk * n_reps / timeit.timeit(_dummy_predict, number=n_reps)

                    # print(
                    #  '{:2d}@{:d} {:d} | {:7.2f} gps'.format(
                    #      num_workers, chunk_size, bulk_mp, gps
                    #  )
                    # )

                    gps_rng = (
                        min(gps_rng[0], gps),
                        max(gps_rng[1], gps),
                    )  # min and max per num_workers

                    # gps_min_max = min(gps_min_max[0], gps), max(gps_min_max[1], gps)

                    # print('     best_gps ' + str(best_gps))

                    # NEW

                    # if gps > best_gps and gps > gps_min: # gps is still going up, everything is good
                    #     best_gps = gps
                    #     best_params = num_workers, batch_size, bulk_mp
                    # else:
                    #     break

                    # if gps > best_gps: # gps is still going up, everything is good
                    #     best_gps = gps
                    #     best_params = num_workers, batch_size, bulk_mp
                    # else: # gps did not go up wrt. to previous step

                    #     # can we switch the search direction?
                    #     #   did we already?
                    #     #   we checked two consecutive configurations
                    #     #   are bigger batch sizes possible?

                    #     print(batch_size_rng_sizes)

                    #     turn_search_dir = i_dir > 0 and i_done == 2 and batch_size != batch_size_rng_sizes[1]

                    #     # only turn, if the current gps is not lower than the lowest overall
                    #     if turn_search_dir and gps >= gps_min:
                    #         i -= 2 * i_dir
                    #         i_dir = -1
                    #         print('><')
                    #         continue
                    #     else:
                    #         print('>>break ' + str(i_done))
                    #         break

                    # NEW

                    # gps still going up?
                    # AND: gps not lower than the lowest overall?
                    # if gps < best_gps and gps >= gps_min:
                    if gps < best_gps:
                        # Speed dropped: turn the search direction once (right
                        # after the second probe), unless this is the second
                        # chunk size in the range, where turning is pointless.
                        if (
                            i_dir > 0
                            and i_done == 2
                            and chunk_size != chunk_size_rng_sizes[1]
                        ):
                            i -= 2 * i_dir
                            i_dir = -1
                            # print('><')
                            continue
                        else:
                            if chunk_size == chunk_size_rng_sizes[1]:
                                i -= 1 * i_dir
                            # print('>>break ' + str(i_done))
                            break
                    else:
                        best_gps = gps
                        best_params = num_workers, chunk_size, bulk_mp

                    # Stop the search early when bulk predictions are possible
                    # (n_bulk > 1) but the non-bulk mode is being tested: if
                    # this chunk size is slower than the lowest speed recorded
                    # so far, stop right here.
                    if not bulk_mp and n_bulk > 1:
                        if gps < gps_min:
                            # print('breaking here')
                            break

                    i += 1 * i_dir

                last_i = i - 1 * i_dir
                i_dir = 1

                if len(best_results) > 0:
                    overall_best_gps = max(best_results, key=lambda x: x[1])[1]
                    if best_gps < overall_best_gps:
                        # print('breaking, because best of last test was worse than overall best so far')
                        break

                    # if best_gps < gps_min:
                    #    print('breaking here3')
                    #    break

                gps_min = gps_rng[0]  # FIX me: is this the overall min?
                # print ('gps_min ' + str(gps_min))

                # print ('best_gps')
                # print (best_gps)

                best_results.append(
                    (best_params, best_gps)
                )  # best results per num_workers

        (num_workers, chunk_size, bulk_mp), gps = max(best_results, key=lambda x: x[1])

        # Cache benchmark results.
        self._save_cached_bmark_result(n_bulk, num_workers, chunk_size, bulk_mp, gps)

        self._set_chunk_size(chunk_size)
        self._set_num_workers(num_workers)
        self._set_bulk_mp(bulk_mp)

        if return_is_from_cache:
            is_from_cache = False
            return gps, is_from_cache
        else:
            return gps

    def _save_cached_bmark_result(self, n_bulk, num_workers, chunk_size, bulk_mp, gps):

        pkg_dir = os.path.dirname(os.path.abspath(__file__))
        bmark_file = '_bmark_cache.npz'
        bmark_path = os.path.join(pkg_dir, bmark_file)

        bkey = '{}-{}-{}-{}'.format(
            self.n_atoms, self.n_train, n_bulk, self.max_processes
        )

        if os.path.exists(bmark_path):

            with np.load(bmark_path, allow_pickle=True) as bmark:
                bmark = dict(bmark)

                bmark['runs'] = np.append(bmark['runs'], bkey)
                bmark['num_workers'] = np.append(bmark['num_workers'], num_workers)
                bmark['batch_size'] = np.append(bmark['batch_size'], chunk_size)
                bmark['bulk_mp'] = np.append(bmark['bulk_mp'], bulk_mp)
                bmark['gps'] = np.append(bmark['gps'], gps)
        else:
            bmark = {
                'code_version': __version__,
                'runs': [bkey],
                'gps': [gps],
                'num_workers': [num_workers],
                'batch_size': [chunk_size],
                'bulk_mp': [bulk_mp],
            }

        np.savez_compressed(bmark_path, **bmark)

    def _load_cached_bmark_result(self, n_bulk):

        pkg_dir = os.path.dirname(os.path.abspath(__file__))
        bmark_file = '_bmark_cache.npz'
        bmark_path = os.path.join(pkg_dir, bmark_file)

        bkey = '{}-{}-{}-{}'.format(
            self.n_atoms, self.n_train, n_bulk, self.max_processes
        )

        if not os.path.exists(bmark_path):
            return None

        with np.load(bmark_path, allow_pickle=True) as bmark:

            # Keep collecting benchmark runs, until we have at least three.
            run_idxs = np.where(bmark['runs'] == bkey)[0]
            if len(run_idxs) >= 3:

                config_keys = []
                for run_idx in run_idxs:
                    config_keys.append(
                        '{}-{}-{}'.format(
                            bmark['num_workers'][run_idx],
                            bmark['batch_size'][run_idx],
                            bmark['bulk_mp'][run_idx],
                        )
                    )

                config_keys = np.array(config_keys)
                values, first_idxs = np.unique(config_keys, return_index=True)

                best_mean = -1
                best_gps = 0
                for i, config_key in enumerate(values):
                    # Positions in `config_keys` are relative to `run_idxs`,
                    # so map them back to indices into the full cache.
                    mean_gps = np.mean(
                        bmark['gps'][run_idxs[config_keys == config_key]]
                    )

                    if best_gps == 0 or best_gps < mean_gps:
                        best_mean = i
                        best_gps = mean_gps

                best_idx = run_idxs[first_idxs[best_mean]]
                num_workers = bmark['num_workers'][best_idx]
                chunk_size = bmark['batch_size'][best_idx]
                bulk_mp = bmark['bulk_mp'][best_idx]

                return num_workers, chunk_size, bulk_mp, best_gps

        return None

    def get_GPU_batch(self):
        """
        Get batch size used by the GPU implementation to process bulk
        predictions (predictions for multiple input geometries at once).

        This value is determined on-the-fly depending on the available GPU
        memory. Returns `None` if PyTorch is not enabled.
        """

        if self.use_torch:

            model = self.torch_predict
            if isinstance(model, torch.nn.DataParallel):
                model = model.module

            return model._batch_size()

    def predict(self, R=None, return_E=True):
        """
        Predict energy and forces for multiple geometries. This function
        can run on the GPU, if the optional PyTorch dependency is
        installed and `use_torch=True` was specified during
        initialization of this class.

        If the descriptors and descriptor Jacobians of the training
        geometries are already available from previous calculations,
        they can be set via `set_R_desc()` and `set_R_d_desc()`, so that
        predictions for the training geometries do not recompute them
        (see parameter `R`).

        Note
        ----
                The order of the atoms in `R` is not arbitrary and must
                be the same as used for training the model.

        Parameters
        ----------
                R : :obj:`numpy.ndarray`, optional
                        A 2D array of size M x 3N containing the
                        Cartesian coordinates of each atom of M
                        molecules (a 1D array of size 3N is treated as a
                        single molecule). If this parameter is omitted,
                        predictions for the training geometries are
                        returned. For this to work, the training
                        descriptors and Jacobians need to be set right
                        after initialization using `set_R_desc()` and
                        `set_R_d_desc()` (with PyTorch, only
                        `set_R_d_desc()` is required).
                return_E : boolean, optional
                        If false (default: true), only the forces are returned.

        Returns
        -------
                :obj:`numpy.ndarray`
                        Energies stored in a 1D array of size M (only if
                        `return_E` is true).
                :obj:`numpy.ndarray`
                        Forces stored in a 2D array of size M x 3N.
        """

        # Add singleton dimension if input is (3N,).
        if R is not None and R.ndim == 1:
            R = R[None, :]

        if self.use_torch:  # multi-GPU (or CPU if no GPUs are available)

            R_torch = torch.arange(self.n_train)
            if R is None:
                if self.R_d_desc is None:
                    self.log.critical(
                        'A reference to the training geometry descriptors needs to be set (using \'set_R_d_desc()\') for this function to work without arguments (using PyTorch).'
                    )
                    print()
                    os._exit(1)
            else:
                R_torch = (
                    torch.from_numpy(R.reshape(-1, self.n_atoms, 3))
                    .type(torch.float32)
                    .to(self.torch_device)
                )

            model = self.torch_predict
            if R_torch.shape[0] < torch.cuda.device_count() and isinstance(
                model, torch.nn.DataParallel
            ):
                model = self.torch_predict.module
            E_torch_F_torch = model.forward(R_torch, return_E=return_E)

            if return_E:
                E_torch, F_torch = E_torch_F_torch
                E = E_torch.cpu().numpy()
            else:
                (F_torch,) = E_torch_F_torch

            F = F_torch.cpu().numpy().reshape(-1, 3 * self.n_atoms)

        else:  # multi-CPU

            # Use precomputed descriptors in training mode.
            is_desc_in_cache = self.R_desc is not None and self.R_d_desc is not None

            if R is None and not is_desc_in_cache:
                self.log.critical(
                    'A reference to the training geometry descriptors and Jacobians needs to be set for this function to work without arguments.'
                )
                print()
                os._exit(1)

            assert is_desc_in_cache or R is not None

            dim_i = 3 * self.n_atoms
            n_pred = self.R_desc.shape[0] if R is None else R.shape[0]

            E_F = np.empty((n_pred, dim_i + 1))

            if (
                self.bulk_mp and self.num_workers > 0
            ):  # One whole prediction per worker (and multiple workers).

                _predict_wo_r_or_desc = partial(
                    _predict_wkr,
                    lat_and_inv=self.lat_and_inv,
                    glob_id=self.glob_id,
                    wkr_start_stop=None,
                    chunk_size=self.chunk_size,
                )

                for i, e_f in enumerate(
                    self.pool.imap(
                        partial(_predict_wo_r_or_desc, None)
                        if is_desc_in_cache
                        else partial(_predict_wo_r_or_desc, r_desc_d_desc=None),
                        zip(self.R_desc, self.R_d_desc) if is_desc_in_cache else R,
                    )
                ):
                    E_F[i, :] = e_f

            else:  # Multiple workers per prediction (or just one worker).

                for i in range(n_pred):

                    if is_desc_in_cache:
                        r_desc, r_d_desc = self.R_desc[i], self.R_d_desc[i]
                    else:
                        r_desc, r_d_desc = self.desc.from_R(R[i], self.lat_and_inv)

                    _predict_wo_wkr_starts_stops = partial(
                        _predict_wkr,
                        None,
                        (r_desc, r_d_desc),
                        self.lat_and_inv,
                        self.glob_id,
                        chunk_size=self.chunk_size,
                    )

                    if self.num_workers == 0:
                        E_F[i, :] = _predict_wo_wkr_starts_stops()
                    else:
                        E_F[i, :] = sum(
                            self.pool.imap_unordered(
                                _predict_wo_wkr_starts_stops, self.wkr_starts_stops
                            )
                        )

            E_F *= self.std
            F = E_F[:, 1:]
            E = E_F[:, 0] + self.c

        ret = (F,)
        if return_E:
            ret = (E,) + ret

        return ret

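
# Illustrative sketch (hypothetical helpers, not part of the sGDML API): the
# bookkeeping behind `prepare_parallel` and `_load_cached_bmark_result`,
# reduced to plain Python so it can be checked in isolation. The function
# names below are assumptions; the methods above operate on the live worker
# pool and on the `_bmark_cache.npz` file instead.


def _sketch_chunk_size_candidates(min_chunk_size):
    # Chunk sizes that evenly divide `min_chunk_size`, largest first
    # (the same rule that builds `chunk_size_rng_sizes`).
    return [c for c in range(min_chunk_size, 0, -1) if min_chunk_size % c == 0]


def _sketch_best_cached_config(runs, configs, gps, bkey, min_runs=3):
    # runs:    cache keys 'n_atoms-n_train-n_bulk-max_processes', one per run
    # configs: (num_workers, chunk_size, bulk_mp) tuples, one per run
    # gps:     measured speed (geometries per second), one per run
    # Returns (config, mean_gps) for the configuration that is fastest on
    # average, or None while fewer than `min_runs` runs exist for `bkey`.
    idxs = [i for i, run in enumerate(runs) if run == bkey]
    if len(idxs) < min_runs:
        return None

    speeds = {}
    for i in idxs:
        speeds.setdefault(tuple(configs[i]), []).append(gps[i])

    return max(
        ((cfg, sum(s) / len(s)) for cfg, s in speeds.items()),
        key=lambda item: item[1],
    )


# Example: _sketch_chunk_size_candidates(12) returns [12, 6, 4, 3, 2, 1].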

================================================
FILE: sgdml/solvers/__init__.py
================================================


================================================
FILE: sgdml/solvers/analytic.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2020-2022 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import os
import sys
import logging
import warnings
from functools import partial

import numpy as np
import scipy as sp
import timeit

from .. import DONE, NOT_DONE


class Analytic(object):
    def __init__(self, gdml_train, desc, callback=None):

        self.log = logging.getLogger(__name__)

        self.gdml_train = gdml_train
        self.desc = desc

        self.callback = callback

    # from memory_profiler import profile
    # @profile
    def solve(self, task, R_desc, R_d_desc, tril_perms_lin, y):

        sig = task['sig']
        lam = task['lam']
        use_E_cstr = task['use_E_cstr']

        n_train, dim_d = R_d_desc.shape[:2]
        n_atoms = int((1 + np.sqrt(8 * dim_d + 1)) / 2)
        dim_i = 3 * n_atoms

        if self.callback is not None:
            self.callback = partial(
                self.callback,
                disp_str='Assembling kernel matrix',
            )

        K = -self.gdml_train._assemble_kernel_mat(
            R_desc,
            R_d_desc,
            tril_perms_lin,
            sig,
            self.desc,
            use_E_cstr=use_E_cstr,
            callback=self.callback,
        )  # Flip sign so that K is positive semi-definite (convex problem)

        start = timeit.default_timer()

        with warnings.catch_warnings():
            warnings.simplefilter('ignore')

            if K.shape[0] == K.shape[1]:

                K[np.diag_indices_from(K)] += lam  # Regularize

                if self.callback is not None:
                    self.callback = partial(
                        self.callback,
                        disp_str='Solving linear system (Cholesky factorization)',
                    )
                    self.callback(NOT_DONE)

                try:

                    # Cholesky (do not overwrite K in case we need to retry)
                    L, lower = sp.linalg.cho_factor(
                        K, overwrite_a=False, check_finite=False
                    )
                    alphas = -sp.linalg.cho_solve(
                        (L, lower), y, overwrite_b=False, check_finite=False
                    )

                except np.linalg.LinAlgError:  # Try a solver that makes fewer assumptions

                    if self.callback is not None:
                        self.callback = partial(
                            self.callback,
                            disp_str='Solving linear system (LU factorization)      ',  # Keep whitespaces!
                        )
                        self.callback(NOT_DONE)

                    try:
                        # LU
                        alphas = -sp.linalg.solve(
                            K, y, overwrite_a=True, overwrite_b=True, check_finite=False
                        )
                    except MemoryError:
                        self.log.critical(
                            'Not enough memory to train this system using a closed form solver.'
                        )
                        print()
                        os._exit(1)

                except MemoryError:
                    self.log.critical(
                        'Not enough memory to train this system using a closed form solver.'
                    )
                    print()
                    os._exit(1)
            else:

                if self.callback is not None:
                    self.callback = partial(
                        self.callback,
                        disp_str='Solving over-determined linear system (least squares approximation)',
                    )
                    self.callback(NOT_DONE)

                # Least squares for non-square K
                alphas = -np.linalg.lstsq(K, y, rcond=-1)[0]

        stop = timeit.default_timer()

        if self.callback is not None:
            dur_s = stop - start
            sec_disp_str = 'took {:.1f} s'.format(dur_s) if dur_s >= 0.1 else ''
            self.callback(
                DONE,
                disp_str='Training on {:,} points'.format(n_train),
                sec_disp_str=sec_disp_str,
            )

        return alphas

    @staticmethod
    def est_memory_requirement(n_train, n_atoms):

        est_bytes = 3 * (n_train * 3 * n_atoms) ** 2 * 8  # K + factor(s) of K
        est_bytes += (n_train * 3 * n_atoms) * 8  # alpha

        return est_bytes

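
# Illustrative sketch (hypothetical helper, not part of sGDML): the solver
# cascade of `Analytic.solve` on a small dense system. It assumes `K_psd` is
# the already sign-flipped (positive semi-definite) kernel matrix; the method
# above additionally reports progress via callbacks and negates the result to
# undo the sign flip. Non-square systems use least squares (`rcond=None`
# here, whereas the method passes `rcond=-1`).


def _sketch_regularized_solve(K_psd, y, lam):
    import numpy as np
    import scipy.linalg

    K = np.array(K_psd, dtype=float)  # copy, so the caller's matrix is untouched
    if K.shape[0] != K.shape[1]:
        return np.linalg.lstsq(K, y, rcond=None)[0]

    K[np.diag_indices_from(K)] += lam  # regularize the diagonal
    try:
        # Cholesky: cheapest option, needs K + lam * I to be positive definite.
        c_and_lower = scipy.linalg.cho_factor(K, check_finite=False)
        return scipy.linalg.cho_solve(c_and_lower, y, check_finite=False)
    except np.linalg.LinAlgError:
        # LU: makes fewer assumptions, at roughly twice the cost.
        return scipy.linalg.solve(K, y, check_finite=False)


# For scale: `Analytic.est_memory_requirement(100, 10)` evaluates to
# 3 * 3000**2 * 8 + 3000 * 8 = 216,024,000 bytes (about 216 MB), since the
# kernel matrix has 100 * 3 * 10 = 3000 rows and columns.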

================================================
FILE: sgdml/solvers/iterative.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2020-2025 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import os
import logging
from functools import partial
import inspect
import multiprocessing as mp

import numpy as np
import scipy as sp
import timeit
import collections

from .. import DONE, NOT_DONE
from ..utils import ui
from ..predict import GDMLPredict

try:
    import torch
except ImportError:
    _has_torch = False
else:
    _has_torch = True


CG_STEPS_HIST_LEN = (
    100  # number of past steps to consider when calculating solver effectiveness
)
EFF_RESTART_THRESH = 0  # if solver effectiveness is below this percentage after 'CG_STEPS_HIST_LEN' steps, a solver restart is triggered (with a stronger preconditioner)

MAX_NUM_RESTARTS = 6

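
# Illustrative sketch (hypothetical helper, not part of sGDML): the
# `is_primed` flags in `_init_precon_operator` and `_init_kernel_operator`
# presumably exist because SciPy's `LinearOperator`, when constructed without
# a `dtype`, calls `matvec` once on a zero vector to infer the dtype. Passing
# `dtype` explicitly avoids that probe, as this helper counts.


def _sketch_count_matvec_probes(dtype=None):
    from scipy.sparse.linalg import LinearOperator

    calls = []

    def matvec(v):
        calls.append(v.copy())  # record every call made during construction
        return 2.0 * v

    LinearOperator((3, 3), matvec=matvec, dtype=dtype)
    return len(calls)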

class CGRestartException(Exception):
    pass

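
# Illustrative sketch (hypothetical helper, not part of sGDML): a dense NumPy
# version of what `_nystroem_cholesky_factor` and `_init_precon_operator`
# compute for a small positive semi-definite kernel `K_psd`. It assumes plain
# lower Cholesky factors plus a tiny jitter on K_mm; the module instead uses
# `_cho_factor_stable`, blockwise in-place triangular solves and a QR
# fallback. `apply_inv` approximates (K_psd + lam I)^-1 via the Woodbury
# identity; `_P_vec` returns its negative, matching the sign-flipped kernel
# operator built in `_init_kernel_operator`.


def _sketch_nystroem_precon(K_psd, col_idxs, lam, jitter=1e-10):
    import numpy as np
    import scipy.linalg

    K_nm = K_psd[:, col_idxs]  # n x m block of inducing columns
    K_mm = K_psd[np.ix_(col_idxs, col_idxs)]  # m x m inducing block
    m = K_mm.shape[0]

    # Q = K_nm L_mm^-T, so that Q Q^T = K_nm K_mm^-1 K_mn (Nystroem approximation).
    L_mm = np.linalg.cholesky(K_mm + jitter * np.eye(m))
    Q = scipy.linalg.solve_triangular(L_mm, K_nm.T, lower=True).T

    # A = L^-1 Q^T with L L^T = Q^T Q + lam I, hence
    # (Q Q^T + lam I)^-1 = (I - A^T A) / lam  (Woodbury identity).
    L = np.linalg.cholesky(Q.T @ Q + lam * np.eye(m))
    A = scipy.linalg.solve_triangular(L, Q.T, lower=True)

    # Squared column norms of A, i.e. diag(A^T A), as in the module.
    lev_scores = np.einsum('i...,i...->...', A, A)

    def apply_inv(v):
        return (v - A.T @ (A @ v)) / lam

    return apply_inv, lev_scores


# With every column as an inducing point, the approximation is exact up to the
# jitter, and `lev_scores` equal the ridge leverage scores diag(K (K + lam I)^-1).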

class Iterative(object):
    def __init__(
        self,
        gdml_train,
        desc,
        max_memory,
        max_processes,
        use_torch,
        callback=None,
    ):

        self.log = logging.getLogger(__name__)

        self.gdml_train = gdml_train
        self.gdml_predict = None
        self.desc = desc

        self.callback = callback

        self._max_memory = max_memory
        self._max_processes = max_processes
        self._use_torch = use_torch

    def _init_precon_operator(
        self, task, R_desc, R_d_desc, tril_perms_lin, inducing_pts_idxs, callback=None
    ):

        lam = task['lam']
        lam_inv = 1.0 / lam

        sig = task['sig']

        use_E_cstr = task['use_E_cstr']

        L_inv_K_mn = self._nystroem_cholesky_factor(
            R_desc,
            R_d_desc,
            tril_perms_lin,
            sig,
            lam,
            use_E_cstr=use_E_cstr,
            col_idxs=inducing_pts_idxs,
            callback=callback,
        )

        L_inv_K_mn = np.ascontiguousarray(L_inv_K_mn)

        # Leverage scores (squared column norms of the factor) come
        # essentially for free once the factor is available.
        lev_scores = np.einsum('i...,i...->...', L_inv_K_mn, L_inv_K_mn)

        m, n = L_inv_K_mn.shape

        if self._use_torch and False:  # TURNED OFF!
            _torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
            L_inv_K_mn_torch = torch.from_numpy(L_inv_K_mn).to(_torch_device)

        global is_primed
        is_primed = False

        def _P_vec(v):

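            global is_primed
            if not is_primed:
                # Presumably SciPy's dtype probe (matvec on a zero vector)
                # during LinearOperator construction: skip the real product.
                is_primed = True
                return v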

            if self._use_torch and False:  # TURNED OFF!

                v_torch = torch.from_numpy(v).to(_torch_device)[:, None]
                return (
                    L_inv_K_mn_torch.t().mm(L_inv_K_mn_torch.mm(v_torch)) - v_torch
                ).cpu().numpy() * lam_inv

            else:

                ret = L_inv_K_mn.T.dot(L_inv_K_mn.dot(v))
                ret -= v
                ret *= lam_inv

                return ret

        return sp.sparse.linalg.LinearOperator((n, n), matvec=_P_vec), lev_scores

    def _init_kernel_operator(
        self, task, R_desc, R_d_desc, tril_perms_lin, lam, n, callback=None
    ):

        n_train = R_desc.shape[0]

        # dummy alphas
        v_F = np.zeros((n - n_train, 1)) if task['use_E_cstr'] else np.zeros((n, 1))
        v_E = np.zeros((n_train, 1)) if task['use_E_cstr'] else None

        # Note: The standard deviation is set to 1.0, because we are predicting normalized labels here.
        model = self.gdml_train.create_model(
            task, 'cg', R_desc, R_d_desc, tril_perms_lin, 1.0, v_F, alphas_E=v_E
        )

        self.gdml_predict = GDMLPredict(
            model,
            max_memory=self._max_memory,
            max_processes=self._max_processes,
            use_torch=self._use_torch,
        )

        self.gdml_predict.set_R_desc(R_desc)  # only needed on CPU
        self.gdml_predict.set_R_d_desc(R_d_desc)

        if not self._use_torch:

            if callback is not None:
                callback = partial(callback, disp_str='Optimizing CPU parallelization')
                callback(NOT_DONE)

            self.gdml_predict.prepare_parallel(n_bulk=n_train)

            if callback is not None:
                callback(DONE)

        global is_primed
        is_primed = False

        def _K_vec(v):

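            global is_primed
            if not is_primed:
                # Presumably SciPy's dtype probe (matvec on a zero vector)
                # during LinearOperator construction: skip the expensive predict.
                is_primed = True
                return v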

            v_F, v_E = v, None
            if task['use_E_cstr']:
                v_F, v_E = v[:-n_train], v[-n_train:]

            self.gdml_predict.set_alphas(v_F, alphas_E=v_E)

            pred = self.gdml_predict.predict(return_E=task['use_E_cstr'])
            if task['use_E_cstr']:
                e_pred, f_pred = pred
                pred = np.hstack((f_pred.ravel(), -e_pred))
            else:
                pred = pred[0].ravel()

            pred -= lam * v
            return pred

        return sp.sparse.linalg.LinearOperator((n, n), matvec=_K_vec)

    def _nystroem_cholesky_factor(
        self,
        R_desc,
        R_d_desc,
        tril_perms_lin,
        sig,
        lam,
        use_E_cstr,
        col_idxs,
        callback_task_name='',
        callback=None,
    ):

        if callback_task_name != '':
            callback_task_name = ' ({})'.format(callback_task_name)

        if callback is not None:
            callback = partial(
                callback,
                disp_str='Assembling kernel [m x k]{}'.format(callback_task_name),
            )

        dim_d = R_desc.shape[1]
        n_atoms = int((1 + np.sqrt(8 * dim_d + 1)) / 2)
        n = R_desc.shape[0] * n_atoms * 3 + (R_desc.shape[0] if use_E_cstr else 0)
        m = len(
            range(*col_idxs.indices(n)) if isinstance(col_idxs, slice) else col_idxs
        )

        K_nmm = self.gdml_train._assemble_kernel_mat(
            R_desc,
            R_d_desc,
            tril_perms_lin,
            sig,
            self.desc,
            use_E_cstr=use_E_cstr,
            col_idxs=col_idxs,
            alloc_extra_rows=m,
            callback=callback,
        )

        # Store (psd) copy of K_mm in lower part of this oversized K_(n+m)m matrix.
        K_nmm[-m:, :] = -K_nmm[col_idxs, :]

        K_nm = K_nmm[:-m, :]
        K_mm = K_nmm[-m:, :]

        if callback is not None:
            callback = partial(
                callback,
                disp_str='Cholesky fact. (1/2) [k x k]{}'.format(callback_task_name),
            )
            callback(NOT_DONE)

        # Additional regularization is almost always necessary here (hence pre_reg=True).
        K_mm, lower = self._cho_factor_stable(K_mm, pre_reg=True)  # overwrites input!
        L_mm = K_mm
        # del K_mm

        if callback is not None:
            callback(DONE)
            callback = partial(
                callback,
                disp_str='m tri. solves (1/2) [k x k]{}'.format(callback_task_name),
            )
            callback(0, n)

        b_start, b_size = 0, int(n / 4)  # report progress in steps of 25%
        for b_stop in list(range(b_size, n, b_size)) + [n]:

            K_nm[b_start:b_stop, :] = sp.linalg.solve_triangular(
                L_mm,
                K_nm[b_start:b_stop, :].T,
                lower=lower,
                trans='T',
                overwrite_b=True,
                check_finite=False,
            ).T
            b_start = b_stop

            if callback is not None:
                callback(b_stop, n)

        del L_mm

        K_nmm[-m:, :] = K_nm.T.dot(K_nm)
        K_nmm[-m:, :][np.diag_indices_from(K_nmm[-m:, :])] += lam
        inner = K_nmm[-m:, :]

        if callback is not None:
            callback = partial(
                callback,
                disp_str='Cholesky fact. (2/2) [k x k]{}'.format(callback_task_name),
            )
            callback(NOT_DONE)

        L_lower = self._cho_factor_stable(
            inner, eps_mag_max=-14
        )  # Do not regularize more than 1e-14.
        if L_lower is not None:
            K_nmm[-m:, :], lower = L_lower
            L = K_nmm[-m:, :]
            del inner
        else:

            if callback is not None:
                callback = partial(
                    callback,
                    disp_str='QR fact. (alt.) [k x k]{}'.format(callback_task_name),
                )
                callback(NOT_DONE)

            K_nmm[-m:, :] = 0
            K_nmm[-m:, :][np.diag_indices(m)] = np.sqrt(lam)

            K_nmm[-m:, :] = np.linalg.qr(K_nmm, mode='r')
            L = K_nmm[-m:, :]
            lower = False

        if callback is not None:
            callback(DONE)
            callback = partial(
                callback,
                disp_str='m tri. solves (2/2) [k x k]{}'.format(callback_task_name),
            )
            callback(0, n)

        b_start, b_size = 0, max(1, n // 4)  # update in 25% steps (at least one row per step)
        for b_stop in list(range(b_size, n, b_size)) + [n]:

            K_nm[b_start:b_stop, :] = sp.linalg.solve_triangular(
                L,
                K_nm[b_start:b_stop, :].T,
                lower=lower,
                trans='T',
                overwrite_b=True,
                check_finite=False,
            ).T  # Note: Overwrites K_nm to save memory
            b_start = b_stop

            if callback is not None:
                callback(b_stop, n)
        del L

        return K_nm.T

    def _lev_scores(
        self,
        R_desc,
        R_d_desc,
        tril_perms_lin,
        sig,
        lam,
        use_E_cstr,
        n_inducing_pts,
        callback=None,
    ):

        n_train, dim_d = R_d_desc.shape[:2]
        # dim_d = n_atoms * (n_atoms - 1) / 2 (pairwise distances), solved for n_atoms.
        dim_i = 3 * int((1 + np.sqrt(8 * dim_d + 1)) / 2)

        # Number of kernel columns used for the leverage score estimate:
        # the columns of at most 10 training points (dim_i columns each).
        # dim_m = (
        #    np.maximum(1, n_inducing_pts // 4) * dim_i
        # )  # only use 1/4 of inducing points for leverage score estimate
        dim_m = dim_i * min(n_inducing_pts, 10)

        # Which columns to use for leverage score approximation?
        lev_approx_idxs = np.sort(
            np.random.choice(
                n_train * dim_i + (n_train if use_E_cstr else 0), dim_m, replace=False
            )
        )  # random subset of columns
        # lev_approx_idxs = np.sort(np.random.choice(n_train*dim_i, dim_m, replace=False)) # random subset of columns

        # lev_approx_idxs = np.s_[
        #    :dim_m
        # ]  # first 'dim_m' columns (faster kernel construction)

        L_inv_K_mn = self._nystroem_cholesky_factor(
            R_desc,
            R_d_desc,
            tril_perms_lin,
            sig,
            lam,
            use_E_cstr=use_E_cstr,
            col_idxs=lev_approx_idxs,
            callback_task_name='lev. scores',
            callback=callback,
        )

        # Squared column norms of L^{-1} K_mn, i.e. approximate (ridge) leverage scores.
        lev_scores = np.einsum('i...,i...->...', L_inv_K_mn, L_inv_K_mn)
        return lev_scores

    def inducing_pts_from_lev_scores(self, lev_scores, N):

        # Sample 'N' columns with probabilities proportional to the leverage scores.
        inducing_pts_idxs = np.random.choice(
            np.arange(lev_scores.size),
            N,
            replace=False,
            p=lev_scores / lev_scores.sum(),
        )

        return np.sort(inducing_pts_idxs)

    # Performs a Cholesky decomposition of a matrix, but regularizes the matrix (if needed) until it is positive definite.
    def _cho_factor_stable(self, M, pre_reg=False, eps_mag_max=1):
        """
        Performs a Cholesky decomposition of a matrix, but regularizes
        it as needed until it is positive definite. If the factorization
        still fails after the largest shift, an error is logged and the
        process exits.

        Parameters
        ----------
            M : :obj:`numpy.ndarray`
                Matrix to factorize. Its diagonal is modified in place
                whenever regularization is applied.
            pre_reg : boolean, optional
                Regularize M right away (machine precision), before
                trying to factorize it (default: False).
            eps_mag_max : int, optional
                Decimal order of magnitude of the largest diagonal shift
                to try, i.e. up to 10**eps_mag_max (default: 1).

        Returns
        -------
            :obj:`numpy.ndarray`
                Matrix whose upper or lower triangle contains the Cholesky factor of M. Other parts of the matrix contain arbitrary data.
            boolean
                Flag indicating whether the factor is in the lower or upper triangle.
        """

        eps = np.finfo(float).eps
        eps_mag = int(np.floor(np.log10(eps)))

        if pre_reg:
            M[np.diag_indices_from(M)] += eps
            eps_mag += 1  # if additional regularization is necessary, start from the next order of magnitude

        for reg in 10.0 ** np.arange(
            eps_mag, eps_mag_max + 1
        ):  # regularize more and more aggressively (strongest shift: 10**eps_mag_max; shifts accumulate)
            try:

                L, lower = sp.linalg.cho_factor(
                    M, overwrite_a=False, check_finite=False
                )

            except np.linalg.LinAlgError as e:

                if 'not positive definite' in str(e):
                    self.log.debug(
                        'Cholesky solver needs more aggressive regularization (adding {} to diagonal)'.format(
                            reg
                        )
                    )
                    M[np.diag_indices_from(M)] += reg
                else:
                    raise e
            else:
                return L, lower

        self.log.critical(
            'Failed to factorize despite strong regularization (max: {})!\nYou could try a larger sigma.'.format(
                10.0**eps_mag_max
            )
        )
        print()
        os._exit(1)

    def solve(
        self,
        task,
        R_desc,
        R_d_desc,
        tril_perms_lin,
        y,
        y_std,
        tol=1e-4,
        save_progr_callback=None,
    ):

        global num_iters, start, resid, avg_tt, m  # , P_t

        n_train, n_atoms = task['R_train'].shape[:2]
        dim_i = 3 * n_atoms

        sig = task['sig']
        lam = task['lam']

        # these keys are only present if the task was created from an existing model
        alphas0_F = task['alphas0_F'] if 'alphas0_F' in task else None
        alphas0_E = task['alphas0_E'] if 'alphas0_E' in task else None
        num_iters0 = task['solver_iters'] if 'solver_iters' in task else 0

        # Number of inducing points to use for Nystrom approximation.
        max_memory_bytes = self._max_memory * 1024**3
        max_n_inducing_pts = Iterative.max_n_inducing_pts(
            n_train, n_atoms, max_memory_bytes
        )
        n_inducing_pts = min(n_train, max_n_inducing_pts)
        n_inducing_pts_init = (
            len(task['inducing_pts_idxs']) // (3 * n_atoms)
            if 'inducing_pts_idxs' in task
            else None
        )

        if self.callback is not None:
            self.callback = partial(
                self.callback,
                disp_str='Building preconditioner (k={} ind. point{})'.format(
                    n_inducing_pts, 's' if n_inducing_pts > 1 else ''
                ),
            )
        subtask_callback = (
            partial(ui.sec_callback, main_callback=self.callback)
            if self.callback is not None
            else None
        )

        lev_scores = None
        if n_inducing_pts_init is not None and n_inducing_pts_init == n_inducing_pts:
            inducing_pts_idxs = task['inducing_pts_idxs']  # reuse old inducing points
        else:
            # Determine good inducing points.
            lev_scores = self._lev_scores(
                R_desc,
                R_d_desc,
                tril_perms_lin,
                sig,
                lam,
                task['use_E_cstr'],
                n_inducing_pts,
                callback=subtask_callback,
            )

            dim_m = n_inducing_pts * dim_i
            inducing_pts_idxs = self.inducing_pts_from_lev_scores(lev_scores, dim_m)

        start = timeit.default_timer()
        P_op, lev_scores = self._init_precon_operator(
            task,
            R_desc,
            R_d_desc,
            tril_perms_lin,
            inducing_pts_idxs,
            callback=subtask_callback,
        )
        stop = timeit.default_timer()

        if self.callback is not None:
            dur_s = stop - start
            sec_disp_str = 'took {:.1f} s'.format(dur_s) if dur_s >= 0.1 else ''
            self.callback(DONE, sec_disp_str=sec_disp_str)

            self.callback = partial(
                self.callback,
                disp_str='Initializing solver',
            )
        subtask_callback = (
            partial(ui.sec_callback, main_callback=self.callback)
            if self.callback is not None
            else None
        )

        n = P_op.shape[0]
        K_op = self._init_kernel_operator(
            task, R_desc, R_d_desc, tril_perms_lin, lam, n, callback=subtask_callback
        )

        num_iters = int(num_iters0)

        if self.callback is not None:

            num_devices = (
                mp.cpu_count() if self._max_processes is None else self._max_processes
            )
            if self._use_torch:
                num_devices = (
                    torch.cuda.device_count()
                    if torch.cuda.is_available()
                    else torch.get_num_threads()
                )
            hardware_str = '{:d} {}{}{}'.format(
                num_devices,
                'GPU' if self._use_torch and torch.cuda.is_available() else 'CPU',
                's' if num_devices > 1 else '',
                '[PyTorch]' if self._use_torch else '',
            )

            self.callback(NOT_DONE, sec_disp_str=None)

        start = 0
        resid = 0
        avg_tt = 0

        global alpha_t, eff, steps_hist, callback_disp_str

        alpha_t = None
        if alphas0_F is not None:  # TODO: improve me: this will not work with E_cstr
            alpha_t = -alphas0_F

        if alphas0_E is not None:
            alpha_t = np.hstack((alpha_t, -alphas0_E))

        steps_hist = collections.deque(
            maxlen=CG_STEPS_HIST_LEN
        )  # moving average window for step history

        callback_disp_str = 'Initializing solver'

        def _cg_status(xk):

            global num_iters, start, resid, alpha_t, avg_tt, eff, steps_hist, callback_disp_str, P_t

            stop = timeit.default_timer()
            tt = 0.0 if start == 0 else (stop - start)
            avg_tt += tt
            start = timeit.default_timer()

            old_resid = resid
            try:

                # Can we extract the residual from the solver?
                f_locals = inspect.currentframe().f_back.f_locals
                if 'resid' in f_locals:
                    resid = f_locals['resid']
                elif 'r' in f_locals:
                    resid = np.linalg.norm(f_locals['r'])
                else:
                    raise KeyError

            except KeyError:

                # Fallback: compute residual from scratch (slower)
                rk = y + K_op @ xk
                resid = np.linalg.norm(rk)

            step = 0 if num_iters == num_iters0 else resid - old_resid
            steps_hist.append(step)

            steps_hist_arr = np.array(steps_hist)
            steps_hist_all = np.abs(steps_hist_arr).sum()
            steps_hist_ratio = (
                (-steps_hist_arr.clip(max=0).sum() / steps_hist_all)
                if steps_hist_all > 0
                else 1
            )
            eff = (
                0 if num_iters == num_iters0 else (int(100 * steps_hist_ratio) - 50) * 2
            )

            if tt > 0.0 and num_iters % int(np.ceil(1.0 / tt)) == 0:  # once per second

                train_rmse = resid / np.sqrt(len(y))
                if self.callback is not None:
                    callback_disp_str = 'Training error (RMSE): forces {:.4f}'.format(
                        train_rmse
                    )
                    self.callback(
                        NOT_DONE,
                        disp_str=callback_disp_str,
                        sec_disp_str=(
                            '{:d} iter @ {} iter/s [eff: {:d}%], k={:d}'.format(
                                num_iters,
                                '{:.2f}'.format(1.0 / tt),
                                eff,
                                n_inducing_pts,
                            )
                        ),
                    )

            # Write out current solution as a model file once every 2 minutes (give or take).
            if (
                tt > 0.0
                and num_iters % int(np.ceil(2 * 60.0 / tt)) == 0
                and num_iters % 10 == 0
            ):

                self.log.debug('Saving model checkpoint.')

                # TODO: support for +E constraints (done?)
                alphas_F, alphas_E = -xk, None
                if task['use_E_cstr']:
                    n_train = task['R_train'].shape[0]
                    alphas_F, alphas_E = -xk[:-n_train], -xk[-n_train:]

                unconv_model = self.gdml_train.create_model(
                    task,
                    'cg',
                    R_desc,
                    R_d_desc,
                    tril_perms_lin,
                    y_std,
                    alphas_F,
                    alphas_E=alphas_E,
                )

                solver_keys = {
                    'solver_tol': tol,
                    'solver_iters': num_iters + 1,  # number of iterations performed (cg solver)
                    'solver_resid': resid,  # residual of solution
                    'norm_y_train': np.linalg.norm(y),
                    'inducing_pts_idxs': inducing_pts_idxs,
                }

                unconv_model.update(solver_keys)

                # recover integration constant
                self.gdml_predict.set_alphas(alphas_F, alphas_E=alphas_E)
                E_pred, _ = self.gdml_predict.predict()

                E_pred *= y_std

                unconv_model['c'] = 0
                if 'E_train' in task:
                    E_ref = np.squeeze(task['E_train'])
                    unconv_model['c'] = np.mean(E_ref - E_pred)

                if save_progr_callback is not None:
                    save_progr_callback(unconv_model)

            num_iters += 1

            n_train = task['idxs_train'].shape[0]
            if (
                len(steps_hist) == CG_STEPS_HIST_LEN
                and eff <= EFF_RESTART_THRESH
                and n_inducing_pts < n_train
            ):
                alpha_t = xk
                raise CGRestartException

        num_restarts = 0
        while True:
            try:
                alphas, info = sp.sparse.linalg.cg(
                    -K_op,
                    y,
                    x0=alpha_t,
                    M=P_op,
                    rtol=tol,  # norm(residual) <= max(rtol*norm(b), atol)
                    atol=0,
                    # Allow 10x as many iterations as theoretically needed (at perfect precision).
                    maxiter=3 * n_atoms * n_train * 10,
                    callback=_cg_status,
                )
                alphas = -alphas

            except CGRestartException:

                num_restarts += 1
                steps_hist.clear()

                if num_restarts == MAX_NUM_RESTARTS:
                    info = 1  # convergence to tolerance not achieved
                    alphas = alpha_t
                    break
                else:
                    num_restarts_left = MAX_NUM_RESTARTS - num_restarts - 1
                    self.log.debug(
                        'Restarts left before giving up: {}{}.'.format(
                            num_restarts_left,
                            ' (final trial)' if num_restarts_left == 0 else '',
                        )
                    )

                # TODO: keep using same number of points

                # Grow the number of inducing points by 20% per restart (ignoring memory limits).
                n_inducing_pts = min(int(np.ceil(1.2 * n_inducing_pts)), n_train)

                subtask_callback = (
                    partial(
                        ui.sec_callback,
                        main_callback=partial(
                            self.callback, disp_str=callback_disp_str
                        ),
                    )
                    if self.callback is not None
                    else None
                )

                dim_m = n_inducing_pts * dim_i
                inducing_pts_idxs = self.inducing_pts_from_lev_scores(lev_scores, dim_m)

                del P_op
                P_op, lev_scores = self._init_precon_operator(
                    task,
                    R_desc,
                    R_d_desc,
                    tril_perms_lin,
                    inducing_pts_idxs,
                    callback=subtask_callback,
                )

            else:
                break

        is_conv = info == 0

        if self.callback is not None:

            is_conv_warn_str = '' if is_conv else ' (NOT CONVERGED)'
            self.callback(
                DONE,
                disp_str='Training on {:,} points{}'.format(n_train, is_conv_warn_str),
                sec_disp_str=(
                    '{:d} iter @ {} iter/s'.format(
                        num_iters,
                        '{:.2f}'.format(num_iters / avg_tt) if avg_tt > 0 else '--',
                    )
                ),
                done_with_warning=not is_conv,
            )

        train_rmse = resid / np.sqrt(len(y))

        return alphas, tol, num_iters, resid, train_rmse, inducing_pts_idxs, is_conv

    @staticmethod
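    def max_n_inducing_pts(n_train, n_atoms, max_memory_bytes):

        SQUARE_FACT = 5
        LINEAR_FACT = 4

        # Bytes for one (3 * n_atoms) x (3 * n_atoms) kernel block in float64.
        blk_bytes = (3 * n_atoms) ** 2 * 8

        # Memory model (cf. est_memory_requirement):
        #   SQUARE_FACT * blk_bytes * k**2 + LINEAR_FACT * n_train * blk_bytes * k <= max_memory_bytes
        # The largest k is the floor of the positive root of this quadratic.
        lin_coeff = LINEAR_FACT * n_train * blk_bytes
        quad_coeff = SQUARE_FACT * blk_bytes

        n_inducing_pts = (
            np.sqrt(lin_coeff**2 + 4.0 * quad_coeff * max_memory_bytes) - lin_coeff
        ) / (2 * quad_coeff)
        n_inducing_pts = int(n_inducing_pts)

        return min(n_inducing_pts, n_train)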

    @staticmethod
    def est_memory_requirement(n_train, n_inducing_pts, n_atoms):

        SQUARE_FACT = 5
        LINEAR_FACT = 4

        # est_bytes = n_train * n_inducing_pts * (3 * n_atoms) ** 2 * 8  # P_op
        # est_bytes += 2 * (n_inducing_pts * 3 * n_atoms) ** 2 * 8  # P_op [cho_factor]
        # est_bytes += (n_train * 3 * n_atoms) * 8  # lev_scores
        # est_bytes += (n_train * 3 * n_atoms) * 8  # alpha

        est_bytes = LINEAR_FACT * n_train * n_inducing_pts * (3 * n_atoms) ** 2 * 8

        est_bytes += (
            SQUARE_FACT * n_inducing_pts * n_inducing_pts * (3 * n_atoms) ** 2 * 8
        )

        # est_bytes += (n_train * 3 * n_atoms) * 8  # lev_scores
        # est_bytes += (n_train * 3 * n_atoms) * 8  # alpha

        return est_bytes
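

# Illustrative sketch (hypothetical helpers, not part of the sGDML API):
# Iterative.max_n_inducing_pts() appears to invert the memory model of
# Iterative.est_memory_requirement(). With c = (3 * n_atoms)**2 * 8 bytes per
# float64 kernel block, the estimate is quadratic in the number k of inducing
# points,
#     mem(k) = SQUARE_FACT * c * k**2 + LINEAR_FACT * n_train * c * k,
# so the largest admissible k is the floor of the positive root of
# mem(k) = max_memory_bytes (capped at n_train). The factors 5 and 4 are
# taken from the methods above.
import numpy as np


def _sketch_mem_estimate(n_train, k, n_atoms, square_fact=5, linear_fact=4):
    # Hypothetical helper: same model as Iterative.est_memory_requirement().
    c = (3 * n_atoms) ** 2 * 8
    return linear_fact * n_train * k * c + square_fact * k * k * c


def _sketch_max_k(n_train, n_atoms, max_memory_bytes, square_fact=5, linear_fact=4):
    # Hypothetical helper: positive root of a*k**2 + b*k - max_memory_bytes = 0.
    c = (3 * n_atoms) ** 2 * 8
    a = square_fact * c  # quadratic coefficient
    b = linear_fact * n_train * c  # linear coefficient
    k = (np.sqrt(b**2 + 4.0 * a * max_memory_bytes) - b) / (2 * a)
    return min(int(k), n_train)


# Usage: for 1000 training points of a 12-atom molecule and an 8 GiB budget,
# k = _sketch_max_k(1000, 12, 8 * 1024**3) satisfies
# _sketch_mem_estimate(1000, k, 12) <= 8 * 1024**3 < _sketch_mem_estimate(1000, k + 1, 12).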

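
# Illustrative sketch (hypothetical helpers, not part of the sGDML API): a
# dense, small-scale version of what _nystroem_cholesky_factor() and
# _lev_scores() above appear to compute, for a positive definite kernel
# matrix K (the sign conventions of the sGDML kernel are omitted). With m
# sampled columns, the upper Cholesky factor U of K_mm (K_mm = U^T U) and
# B = K_nm U^{-1}, the Nystroem approximation of K is B B^T. Factorizing
# B^T B + lam * I = V^T V, the squared column norms of V^{-T} B^T equal the
# ridge leverage scores diag(B B^T (B B^T + lam * I)^{-1}).
import numpy as np
import scipy.linalg


def _sketch_ridge_lev_scores(K, col_idxs, lam):
    # Hypothetical helper; returns the scores and the Nystroem factor B.
    K_nm = K[:, col_idxs]
    K_mm = K[np.ix_(col_idxs, col_idxs)]

    U = scipy.linalg.cholesky(K_mm, lower=False)
    # Solve U^T X = K_nm^T for X, so that B = X^T = K_nm U^{-1}.
    B = scipy.linalg.solve_triangular(U, K_nm.T, trans='T', lower=False).T

    V = scipy.linalg.cholesky(B.T @ B + lam * np.eye(len(col_idxs)), lower=False)
    X = scipy.linalg.solve_triangular(V, B.T, trans='T', lower=False)  # V^{-T} B^T

    # Column-wise squared norms (same contraction as in _lev_scores()).
    return np.einsum('i...,i...->...', X, X), B


def _sketch_sample_inducing_pts(lev_scores, n, rng):
    # Hypothetical helper: draw n distinct columns with probabilities
    # proportional to the scores, like inducing_pts_from_lev_scores(), but
    # using a numpy Generator instead of the global random state.
    p = lev_scores / lev_scores.sum()
    return np.sort(rng.choice(lev_scores.size, n, replace=False, p=p))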

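
# Illustrative sketch (hypothetical helper, not part of the sGDML API): the
# escalating diagonal regularization of Iterative._cho_factor_stable() above,
# except that it works on a copy and returns None instead of exiting the
# process when even the largest shift fails. A None return is what the QR
# fallback in _nystroem_cholesky_factor() tests for. As in the original, the
# shifts accumulate: every failed attempt adds the next power of ten to the
# diagonal before retrying.
import numpy as np
import scipy.linalg


def _sketch_cho_factor_jitter(M, eps_mag_max=1):
    M = np.array(M, dtype=float)  # copy, so the caller's matrix is untouched
    eps_mag = int(np.floor(np.log10(np.finfo(float).eps)))  # -16 for float64
    total_shift = 0.0
    for reg in 10.0 ** np.arange(eps_mag, eps_mag_max + 1):
        try:
            c, lower = scipy.linalg.cho_factor(M, check_finite=False)
        except np.linalg.LinAlgError:
            M[np.diag_indices_from(M)] += reg
            total_shift += reg
        else:
            return c, lower, total_shift
    return None  # hypothetical behavior: let the caller fall back (e.g. to QR)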
================================================
FILE: sgdml/torchtools.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2019-2023 Stefan Chmiela, Jan Hermann
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import os
import sys
import logging
from functools import partial
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

try:
    _torch_mps_is_available = torch.backends.mps.is_available()
except AttributeError:
    _torch_mps_is_available = False
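# MPS is disabled unconditionally, overriding the check above (presumably
# because the MPS backend lacks float64 support, which is used throughout).
_torch_mps_is_available = False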

try:
    _torch_cuda_is_available = torch.cuda.is_available()
except AttributeError:
    _torch_cuda_is_available = False


from .utils.desc import Desc
from .utils import ui

_dtype = torch.float64


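def _next_batch_size(n_total, batch_size):
    """
    Return the smallest integer larger than 'batch_size' that divides 'n_total'
    evenly. Used to increase the number of batches (not the number of items per
    batch) after running out of memory.
    """
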
    batch_size += 1
    while n_total % batch_size != 0:
        batch_size += 1

    return batch_size
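

# Illustrative sketch (hypothetical helpers, not part of the sGDML API): the
# out-of-memory strategy of GDMLTorchAssemble.forward() below. Work over
# n_total items starts as a single batch; whenever a batch does not fit
# (simulated here with MemoryError instead of a CUDA "out of memory"
# RuntimeError), the number of batches is raised to the next divisor of
# n_total, as _next_batch_size() does, so all batches have equal size.
import numpy as np


def _sketch_next_divisor(n_total, n_batches):
    # Hypothetical helper, same rule as _next_batch_size().
    n_batches += 1
    while n_total % n_batches != 0:
        n_batches += 1
    return n_batches


def _sketch_run_batched(n_total, process_batch, n_batches=1):
    # Hypothetical helper: retry with more (smaller) batches until all fit.
    while True:
        try:
            results = [
                process_batch(idxs)
                for idxs in np.array_split(np.arange(n_total), n_batches)
            ]
        except MemoryError:
            if n_batches >= n_total:
                raise  # batches of one item still do not fit
            n_batches = _sketch_next_divisor(n_total, n_batches)
        else:
            return results, n_batches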


class GDMLTorchAssemble(nn.Module):
    """
    PyTorch version of the kernel assembly routines in :class:`~train.GDMLTrain`.
    Derives from :class:`torch.nn.Module`. Contains no trainable parameters.
    """

    def __init__(
        self,
        J,
        tril_perms_lin,
        sig,
        use_E_cstr,
        R_desc_torch,
        R_d_desc_torch,
        out,
        callback=None,
    ):

        global _n_batches, _n_perm_batches

        super(GDMLTorchAssemble, self).__init__()

        self._log = logging.getLogger(__name__)

        self.callback = callback

        self.n_train, self.dim_d = R_d_desc_torch.shape[:2]
        # dim_d = n_atoms * (n_atoms - 1) / 2 (pairwise distances), solved for n_atoms.
        self.n_atoms = int((1 + np.sqrt(8 * self.dim_d + 1)) / 2)
        self.dim_i = 3 * self.n_atoms

        self.sig = float(sig)
        self.tril_perms_lin = tril_perms_lin
        self.n_perms = len(self.tril_perms_lin) // self.dim_d

        self.use_E_cstr = use_E_cstr

        self.R_desc_torch = nn.Parameter(R_desc_torch.type(_dtype), requires_grad=False)
        self.R_d_desc_torch = nn.Parameter(
            R_d_desc_torch.type(_dtype), requires_grad=False
        )

        self._desc = Desc(self.n_atoms)

        self.J = J
        _n_batches = 1
        _n_perm_batches = 1

        self.out = out

    def _forward(
        self,
        j,
    ):

        global _n_batches, _n_perm_batches

        if type(j) is tuple:  # selective/"fancy" indexing
            (
                K_j,
                j,
                keep_idxs_3n,
            ) = j  # (block index in final K, block index global, indices of partials within block)
            blk_j_len = len(keep_idxs_3n)
            blk_j = slice(K_j, K_j + blk_j_len)

        else:  # sequential indexing
            blk_j_len = self.dim_i
            K_j = (
                j * self.dim_i
                if j < self.n_train
                else self.n_train * self.dim_i + (j % self.n_train)
            )
            blk_j = (
                slice(K_j, K_j + self.dim_i)
                if j < self.n_train
                else slice(K_j, K_j + 1)
            )
            keep_idxs_3n = slice(None)  # same as [:]

        q = np.sqrt(5) / self.sig

        if (
            j < self.n_train
        ):  # This column only contains second and first derivative constraints.

            # Create a decompressed 'rj_d_desc'.
            rj_d_desc_decomp_torch = self._desc.d_desc_from_comp(
                self.R_d_desc_torch[j % self.n_train, :, :]
            )[0][:, keep_idxs_3n]

            n_perms_done = 0
            for perm_batch in np.array_split(
                np.arange(self.n_perms), min(_n_perm_batches, self.n_perms)
            ):

                tril_perms_lin_batch = (
                    self.tril_perms_lin.reshape(-1, self.n_perms)[:, perm_batch]
                    - n_perms_done * self.dim_d
                ).ravel()  # index shift

                n_perms_batch = len(perm_batch)
                n_perms_done += n_perms_batch

                # Create a permuted 'rj_desc'.
                rj_desc_perms_torch = torch.reshape(
                    torch.tile(self.R_desc_torch[j, :], (n_perms_batch,))[
                        tril_perms_lin_batch
                    ],
                    (-1, n_perms_batch),
                ).T

                # Create a permuted 'rj_d_desc'.
                rj_d_desc_perms_torch = torch.reshape(
                    torch.tile(rj_d_desc_decomp_torch.T, (n_perms_batch,))[
                        :, tril_perms_lin_batch
                    ],
                    (-1, self.dim_d, n_perms_batch),
                )

                for i_batch in np.array_split(np.arange(self.n_train), _n_batches):

                    x_diffs = q * (
                        self.R_desc_torch[i_batch, None, :]
                        - rj_desc_perms_torch[None, :, :]
                    )  # N, n_perms, d

                    x_dists = x_diffs.norm(dim=-1)  # N, n_perms

                    exp_xs = torch.exp(-x_dists) * (q**2) / 3  # N, n_perms
                    exp_xs_1_x_dists = exp_xs * (1 + x_dists)  # N, n_perms

                    del x_dists  # E_cstr

                    diff_ab_outer_perms_torch = torch.einsum(
                        '...ki,...kj->...ij',  # (slow)
                        x_diffs * exp_xs[:, :, None],  # N, n_perms, d
                        torch.einsum(
                            '...ki,jik -> ...kj',
                            x_diffs,
                            rj_d_desc_perms_torch,
                        ),  # N, n_perms, a*3
                    )  # N, d, a*3
                    del exp_xs

                    if not self.use_E_cstr:
                        del x_diffs

                    diff_ab_outer_perms_torch -= torch.einsum(
                        'ikj,...j->...ki',
                        rj_d_desc_perms_torch,
                        exp_xs_1_x_dists,
                    )

                    if not self.use_E_cstr:
                        del exp_xs_1_x_dists

                    R_d_desc_decomp_torch = self._desc.d_desc_from_comp(
                        self.R_d_desc_torch[i_batch, :, :]
                    )

                    k = torch.einsum(
                        '...ij,...ik->...kj',
                        diff_ab_outer_perms_torch,  # N, d, 3*a
                        R_d_desc_decomp_torch,
                    )
                    del diff_ab_outer_perms_torch
                    del R_d_desc_decomp_torch

                    blk_i = slice(
                        i_batch[0] * self.dim_i, (i_batch[-1] + 1) * self.dim_i
                    )

                    k_np = k.cpu().numpy().reshape(-1, blk_j_len)
                    if (
                        n_perms_done == n_perms_batch
                    ):  # first permutation batch iteration
                        self.out[blk_i, blk_j] = k_np
                    else:
                        self.out[blk_i, blk_j] = self.out[blk_i, blk_j] + k_np
                    del k

                    # First derivative constraints
                    if self.use_E_cstr:

                        K_fe = (x_diffs / q) * exp_xs_1_x_dists[:, :, None]
                        del x_diffs
                        del exp_xs_1_x_dists

                        K_fe = -torch.einsum(
                            '...ik,jki -> ...j', K_fe, rj_d_desc_perms_torch
                        )

                        E_off_i = self.n_train * self.dim_i
                        i_batch_off = i_batch + E_off_i
                        self.out[
                            i_batch_off[0] : (i_batch_off[-1] + 1), blk_j
                        ] = K_fe.cpu().numpy()

                del rj_desc_perms_torch
                del rj_d_desc_perms_torch

        else:

            if self.use_E_cstr:

                n_perms_done = 0
                for perm_batch in np.array_split(
                    np.arange(self.n_perms), min(_n_perm_batches, self.n_perms)
                ):

                    tril_perms_lin_batch = (
                        self.tril_perms_lin.reshape(-1, self.n_perms)[:, perm_batch]
                        - n_perms_done * self.dim_d
                    ).ravel()  # index shift

                    n_perms_batch = len(perm_batch)
                    n_perms_done += n_perms_batch

                    for i_batch in np.array_split(np.arange(self.n_train), _n_batches):

                        ri_desc_perms_torch = torch.reshape(
                            torch.tile(
                                self.R_desc_torch[i_batch, :], (1, n_perms_batch)
                            )[:, tril_perms_lin_batch],
                            (len(i_batch), -1, n_perms_batch),
                        )

                        # Create a decompressed 'ri_d_desc'.
                        ri_d_desc_decomp_torch = self._desc.d_desc_from_comp(
                            self.R_d_desc_torch[i_batch, :, :]
                        )

                        ri_d_desc_perms_torch = torch.reshape(
                            torch.tile(ri_d_desc_decomp_torch, (1, n_perms_batch, 1))[
                                :, tril_perms_lin_batch, :
                            ],
                            (len(i_batch), self.dim_d, n_perms_batch, -1),
                        )
                        # del ri_d_desc_decomp_torch

                        x_diffs = q * (
                            self.R_desc_torch[j % self.n_train, None, :, None]
                            - ri_desc_perms_torch
                        )

                        x_dists = x_diffs.norm(dim=1)

                        exp_xs = torch.exp(-x_dists) * (q**2) / 3
                        exp_xs_1_x_dists = exp_xs * (1 + x_dists)

                        K_fe = x_diffs / q * exp_xs_1_x_dists[:, None, :]
                        K_fe = -torch.einsum(
                            '...ik,...ikj -> ...j', K_fe, ri_d_desc_perms_torch
                        ).ravel()
                        k_fe = K_fe.cpu().numpy()

                        k_ee = -torch.einsum(
                            '...i,...i -> ...',
                            1 + x_dists * (1 + x_dists / 3),
                            torch.exp(-x_dists),
                        )
                        k_ee = k_ee.cpu().numpy()

                        E_off_i = self.n_train * self.dim_i  # Account for 'alloc_extra_rows'.
                        blk_i_full = slice(
                            i_batch[0] * self.dim_i, (i_batch[-1] + 1) * self.dim_i
                        )
                        if (
                            n_perms_done == n_perms_batch
                        ):  # first permutation batch iteration
                            self.out[blk_i_full, K_j] = k_fe
                            self.out[E_off_i + i_batch, K_j] = k_ee
                        else:
                            self.out[blk_i_full, K_j] = self.out[blk_i_full, K_j] + k_fe
                            self.out[E_off_i + i_batch, K_j] = (
                                self.out[E_off_i + i_batch, K_j] + k_ee
                            )

        return blk_j.stop - blk_j.start

    def forward(self, J_indx):

        global _n_batches, _n_perm_batches

        for i in J_indx:
            while True:
                try:
                    done = self._forward(self.J[i])
                except RuntimeError as e:
                    if 'out of memory' in str(e):
                        if _torch_cuda_is_available:
                            torch.cuda.empty_cache()

                        if _n_batches < self.n_train:
                            _n_batches = _next_batch_size(self.n_train, _n_batches)

                            self._log.debug(
                                'Assembling each kernel column in {} batches, i.e. {} points/batch ({} points in total).'.format(
                                    _n_batches,
                                    self.n_train // _n_batches,
                                    self.n_train,
                                )
                            )

                        elif _n_perm_batches < self.n_perms:
                            _n_perm_batches = _next_batch_size(
                                self.n_perms, _n_perm_batches
                            )

                            self._log.debug(
                                'Generating permutations in {} batches, i.e. {} permutations/batch ({} permutations in total).'.format(
                                    _n_perm_batches,
                                    self.n_perms // _n_perm_batches,
                                    self.n_perms,
                                )
                            )

                        else:
                            self._log.critical(
                                'Could not allocate enough memory to assemble kernel matrix, even block-by-block and/or handling perms in batches.'
                            )
                            print()
                            os._exit(1)
                    else:
                        raise e
                else:
                    if self.callback is not None:
                        self.callback(done)

                    break
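

# Sketch (illustrative only; not called anywhere in sGDML): the out-of-memory
# handling in 'GDMLTorchAssemble.forward' above retries the same work with
# progressively finer batching until it fits. The hypothetical helper
# '_oom_backoff_sketch' isolates that pattern. 'work' stands in for a method
# such as '_forward' and is assumed to raise a RuntimeError containing
# 'out of memory' when a batch is too large. For simplicity, it increments the
# number of batches by one, whereas the code above picks the next count via
# '_next_batch_size'.
def _oom_backoff_sketch(work, n_total, n_batches=1):
    while True:
        try:
            return work(n_batches)
        except RuntimeError as e:
            if 'out of memory' not in str(e):
                raise  # Unrelated error: propagate unchanged.
            if n_batches >= n_total:
                raise  # Already one item per batch: cannot shrink further.
            n_batches += 1  # Split the work into more, smaller batches.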


class GDMLTorchPredict(nn.Module):
    """
    PyTorch version of :class:`~predict.GDMLPredict`. Derives from
    :class:`torch.nn.Module`. Contains no trainable parameters.
    """

    def __init__(
        self,
        model,
        lat_and_inv=None,
        batch_size=None,
        n_perm_batches=1,
        max_memory=None,
        max_processes=None,
        log_level=None,
    ):
        """
        Parameters
        ----------
        model : Mapping
            Obtained from :meth:`~train.GDMLTrain.train`.
        lat_and_inv : tuple of :obj:`numpy.ndarray`
            Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.
        batch_size : int, optional
            Maximum batch size of geometries for prediction. Calculated from
            :paramref:`max_mem` if not given.
        n_perm_batches : int, optional
            Divide the processing of all symmetries for each point into smaller
            batches or precompute all in the beginning (needs  more memmory, but faster)?
        max_memory : float, optional
            (unit GB) Maximum allowed CPU memory for prediction (GPU memory always unlimited)
        """

        global _batch_size, _n_perm_batches

        super(GDMLTorchPredict, self).__init__()

        self._log = logging.getLogger(__name__)
        if log_level is not None:
            self._log.setLevel(log_level)

        model = dict(model)

        self._lat_and_inv = (
            None
            if lat_and_inv is None
            else (
                torch.tensor(lat_and_inv[0], dtype=_dtype),
                torch.tensor(lat_and_inv[1], dtype=_dtype),
            )
        )

        self.dim_d, self.n_train = model['R_desc'].shape[:2]
        self.dim_i = 3 * int((1 + np.sqrt(8 * self.dim_d + 1)) / 2)
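        # Worked example: the descriptor holds one entry per atom pair, so
        # dim_d = n_atoms * (n_atoms - 1) / 2, which inverts to
        # n_atoms = (1 + sqrt(8 * dim_d + 1)) / 2. For the 118-atom system in the
        # shape notes of 'est_mem_requirement', dim_d = 118 * 117 / 2 = 6903 and
        # (1 + sqrt(55225)) / 2 = (1 + 235) / 2 = 118, hence dim_i = 3 * 118 = 354.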
        self.n_perms, self.n_atoms = model['perms'].shape

        # Check for duplicates in permutation list.
        if model['perms'].shape[0] != np.unique(model['perms'], axis=0).shape[0]:
            self._log.warning('Model contains duplicate permutations')

        # Find index of identity permutation.
        self.idx_id_perm = np.where(
            (model['perms'] == np.arange(self.n_atoms)).all(axis=1)
        )[0]

        # No identity permutation found.
        if len(self.idx_id_perm) == 0:
            self._log.critical('Identity permutation is missing!')
            print()
            os._exit(1)

        # Identity permutation not at index zero.
        if len(self.idx_id_perm) > 0 and self.idx_id_perm[0] != 0:
            self._log.debug(
                'Identity is not at first position in permutation list (found at index {})'.format(
                    self.idx_id_perm[0]
                )
            )

        self.idx_id_perm = self.idx_id_perm[0]

        self._sig = int(model['sig'])
        self._c = float(model['c'])
        self._std = float(model.get('std', 1))

        self.tril_indices = np.tril_indices(self.n_atoms, k=-1)

        if _torch_cuda_is_available:  # Ignore limits and take whatever the GPU has.
            max_memory = (
                min(
                    [
                        torch.cuda.get_device_properties(i).total_memory
                        for i in range(torch.cuda.device_count())
                    ]
                )
                // 2**30
            )  # bytes to GB
        else:  # TODO: what about MPS?
            default_cpu_max_mem = 32
            if max_memory is None:
                self._log.warning(
                    'PyTorch CPU memory budget is limited to {} by default, which may impact performance.\n'.format(
                        ui.gen_memory_str(2**30 * default_cpu_max_mem)
                    )
                    + 'If necessary, adjust memory limit with option \'-m\'.'
                )
            max_memory = (
                max_memory or default_cpu_max_mem
            )  # 32 GB as default (hardcoded for now...)
        max_memory = int(2**30 * max_memory)  # GB to bytes

        min_const_mem, min_per_sample_mem = self.est_mem_requirement(return_min=True)

        log_type = (
            self._log.warning
            if min_const_mem + min_per_sample_mem >= max_memory
            else self._log.info
        )
        log_type(
            '{} memory report: max./avail. {}, min. req. (const./per-sample) ~{}/~{}'.format(
                'GPU'
                if (_torch_cuda_is_available or _torch_mps_is_available)
                else 'CPU',
                ui.gen_memory_str(max_memory),
                ui.gen_memory_str(min_const_mem),
                ui.gen_memory_str(min_per_sample_mem),
            )
        )

        self.max_processes = max_processes

        self.R_d_desc = None
        self._xs_train = nn.Parameter(
            torch.tensor(model['R_desc'], dtype=_dtype).t(), requires_grad=False
        )
        self._Jx_alphas = nn.Parameter(
            torch.tensor(np.array(model['R_d_desc_alpha']), dtype=_dtype),
            requires_grad=False,
        )

        self._alphas_E = None
        if 'alphas_E' in model:
            self._alphas_E = nn.Parameter(
                torch.tensor(model['alphas_E'], dtype=_dtype), requires_grad=False
            )

        self.perm_idxs = (
            torch.tensor(model['tril_perms_lin'], dtype=torch.long)
            .view(-1, self.n_perms)
            .t()
        )

        i, j = self.tril_indices
        self.register_buffer(
            'agg_mat', torch.zeros((self.n_atoms, self.dim_d), dtype=torch.int8)
        )
        self.agg_mat[i, range(self.dim_d)] = -1
        self.agg_mat[j, range(self.dim_d)] = 1

        # Try to cache all permuted variants of 'self._xs_train' and 'self._Jx_alphas'
        try:
            self.set_n_perm_batches(n_perm_batches)
        except RuntimeError as e:
            if 'out of memory' in str(e):
                if _torch_cuda_is_available:
                    torch.cuda.empty_cache()

                if n_perm_batches == 1:
                    # Fall back to 2 permutation batches, the first (and fastest)
                    # configuration that does not cache the permutations.
                    self.set_n_perm_batches(2)
                else:
                    self._log.critical(
                        'Could not allocate enough memory to store model parameters on GPU. There is no hope!'
                    )
                    print()
                    os._exit(1)
            else:
                raise e

        const_mem, per_sample_mem = self.est_mem_requirement(return_min=False)
        _batch_size = (
            max((max_memory - const_mem) // per_sample_mem, 1)
            if batch_size is None
            else batch_size
        )
        max_batch_size = (
            self.n_train // torch.cuda.device_count()
            if _torch_cuda_is_available
            else self.n_train
        )
        _batch_size = min(_batch_size, max_batch_size)

        self._log.debug(
            'Setting batch size to {}/{} points.'.format(_batch_size, self.n_train)
        )

        self.desc = Desc(self.n_atoms, max_processes=max_processes)

    def get_n_perm_batches(self):

        global _n_perm_batches
        return _n_perm_batches

    def set_n_perm_batches(self, n_perm_batches):

        global _n_perm_batches

        self._log.debug(
            'Setting permutation batch size to {}/{}{}.'.format(
                self.n_perms // n_perm_batches,
                self.n_perms,
                ' (no caching)' if n_perm_batches > 1 else '',
            )
        )

        _n_perm_batches = n_perm_batches
        if n_perm_batches == 1 and self.n_perms > 1:
            self.cache_perms()
        else:
            self.uncache_perms()

    def apply_perms_to_obj(self, xs, perm_idxs=None):

        n_perms = 1 if perm_idxs is None else perm_idxs.numel() // self.dim_d
        perm_idxs = (
            slice(None) if perm_idxs is None else perm_idxs
        )  # slice(None) same as [:]

        # Might run out of memory here, which is handled by the caller.
        return xs.repeat(1, n_perms)[:, perm_idxs].reshape(-1, self.dim_d)

    def remove_perms_from_obj(self, xs):

        return xs.reshape(self.n_train, -1, self.dim_d)[:, self.idx_id_perm, :].reshape(
            -1, self.dim_d
        )

    def uncache_perms(self):

        xs_train_n_perms = self._xs_train.numel() // (self.n_train * self.dim_d)
        if xs_train_n_perms != 1:  # Permuted copies still cached?
            self._xs_train = nn.Parameter(
                self.remove_perms_from_obj(self._xs_train), requires_grad=False
            )

        Jx_alphas_n_perms = self._Jx_alphas.numel() // (self.n_train * self.dim_d)
        if Jx_alphas_n_perms != 1:  # Permuted copies still cached?
            self._Jx_alphas = nn.Parameter(
                self.remove_perms_from_obj(self._Jx_alphas), requires_grad=False
            )

    def cache_perms(self):

        xs_train_n_perms = self._xs_train.numel() // (self.n_train * self.dim_d)
        if xs_train_n_perms == 1:  # Not cached yet?
            self._xs_train = nn.Parameter(
                self.apply_perms_to_obj(self._xs_train, perm_idxs=self.perm_idxs),
                requires_grad=False,
            )

        Jx_alphas_n_perms = self._Jx_alphas.numel() // (self.n_train * self.dim_d)
        if Jx_alphas_n_perms == 1:  # Not cached yet?
            self._Jx_alphas = nn.Parameter(
                self.apply_perms_to_obj(self._Jx_alphas, perm_idxs=self.perm_idxs),
                requires_grad=False,
            )

    def est_mem_requirement(self, return_min=False):
        """
        Calculate an estimate for the maximum/minimum memory needed to generate
        a prediction for a single geometry.

        Parameters
        ----------
        return_min : boolean, optional
            Return a minimum estimate instead.

        Returns
        -------
        const_mem : int
            Constant memory overhead (bytes) (allocated upon instantiation of the class)
        per_sample_mem : int
            Memory requirement for a single prediction (bytes)
        """

        n_perms_mem = 1 if return_min else self.n_perms

        # Constant memory requirement (bytes)
        const_mem = self.n_train * self.n_atoms * 3  # Rs (all)
        const_mem += n_perms_mem * self.dim_d  # perm_idxs
        const_mem += (
            n_perms_mem * self.n_train * self.dim_d * 2
        )  # _xs_train and _Jx_alphas
        const_mem += self.n_atoms * self.dim_d  # agg_mat
        const_mem *= 8
        const_mem = int(const_mem)
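        # Worked example: with the shapes listed at the end of this method
        # (n_train = 60, n_atoms = 118, n_perms = 96, dim_d = 6903) and
        # return_min=False, the constant part is
        #   60*118*3 + 96*6903 + 96*60*6903*2 + 118*6903
        #   = 21240 + 662688 + 79522560 + 814554 = 81021042 values,
        # i.e. 81021042 * 8 = 648168336 bytes (about 618 MiB), dominated by
        # the cached permuted copies of '_xs_train' and '_Jx_alphas'.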

        # Peak memory requirement (bytes)
        per_sample_mem = 2 * self.n_atoms * 3  # Rs (batch), # Fs (batch)
        per_sample_mem += self.n_atoms  # Es (batch)
        per_sample_mem += self.n_atoms**2 * 3  # diffs
        per_sample_mem += self.dim_d  # xs
        per_sample_mem += self.dim_d * n_perms_mem * self.n_train  # x_diffs
        per_sample_mem += (
            4 * n_perms_mem * self.n_train
        )  # x_dists, exp_xs, dot_x_diff_Jx_alphas, exp_xs_1_x_dists
        per_sample_mem *= 8
        # HACK: assume twice the estimated requirement. This seems to work
        # better in practice, possibly because of memory fragmentation.
        per_sample_mem = int(2 * per_sample_mem)

        # <class 'torch.Tensor'> torch.Size([21, 118, 3]) # Fs
        # <class 'torch.Tensor'> torch.Size([21]) # Es
        # <class 'torch.Tensor'> torch.Size([21, 118, 3]) # Rs (batch)
        # <class 'torch.Tensor'> torch.Size([21, 118, 118, 3]) # diffs
        # <class 'torch.Tensor'> torch.Size([21, 6903]) # xs
        # <class 'torch.Tensor'> torch.Size([21, 5760, 6903]) # x_diffs
        # <class 'torch.Tensor'> torch.Size([21, 5760]) # x_dists
        # <class 'torch.Tensor'> torch.Size([21, 5760]) # exp_xs
        # <class 'torch.Tensor'> torch.Size([21, 5760]) # dot_x_diff_Jx_alphas
        # <class 'torch.Tensor'> torch.Size([21, 5760]) # exp_xs_1_x_dists
        # <class 'torch.Tensor'> torch.Size([96, 6903]) # perm_idxs
        # <class 'torch.nn.parameter.Parameter'> torch.Size([5760, 6903]) # _xs_train
        # <class 'torch.nn.parameter.Parameter'> torch.Size([5760, 6903]) # _Jx_alphas
        # <class 'torch.Tensor'> torch.Size([60, 118, 3]) # Rs (all)

        return const_mem, per_sample_mem

    def set_R_d_desc(self, R_d_desc):
        """
        Set reference to training descriptor Jacobians. They are needed when the
        alpha coefficients are updated during iterative model training.

        This routine will try to move them to the GPU memory, if enough is available.

        Parameters
        ----------
        R_d_desc : :obj:`numpy.ndarray`
            Array containing the Jacobian of the descriptor for
            each training point.
        """

        self.R_d_desc = torch.from_numpy(R_d_desc).type(_dtype)

        # Try moving to GPU memory.
        if _torch_cuda_is_available or _torch_mps_is_available:
            try:
                R_d_desc = self.R_d_desc.to(self._xs_train.device)
            except RuntimeError as e:
                if 'out of memory' in str(e):

                    if _torch_cuda_is_available:
                        torch.cuda.empty_cache()

                    self._log.debug('Failed to cache \'R_d_desc\' on GPU.')
                else:
                    raise e
            else:
                self.R_d_desc = R_d_desc

    def set_alphas(self, alphas, alphas_E=None):
        """
        Reconfigure the current model with a new set of regression parameters.

        This routine is used during iterative model training.

        Parameters
        ----------
        alphas : :obj:`numpy.ndarray`
            1D array containing the new model parameters.
        alphas_E : :obj:`numpy.ndarray`, optional
            1D array containing the additional new model parameters, if
            energy constraints are used in the kernel (`use_E_cstr=True`).
        """

        global _n_perm_batches

        if self.R_d_desc is None:
            self._log.critical(
                'The function \'set_alphas()\' requires \'R_d_desc\' to be set beforehand!'
            )
            print()
            os._exit(1)

        if alphas_E is not None:
            self._alphas_E = nn.Parameter(
                torch.from_numpy(alphas_E).to(self._xs_train.device).type(_dtype),
                requires_grad=False,
            )

        del self._Jx_alphas
        while True:
            try:

                alphas_torch = (
                    torch.from_numpy(alphas).type(_dtype).to(self.R_d_desc.device)
                )  # Send to whatever device 'R_d_desc' is on, first.
                xs = self.desc.d_desc_dot_vec(
                    self.R_d_desc, alphas_torch.reshape(-1, self.dim_i)
                )
                del alphas_torch

                if (_torch_cuda_is_available and not xs.is_cuda) or (
                    _torch_mps_is_available and not xs.is_mps
                ):
                    xs = xs.to(
                        self._xs_train.device
                    )  # Only now send it to the GPU ('_xs_train' will be for sure, if GPUs are available)

            except RuntimeError as e:
                if 'out of memory' in str(e):

                    if _torch_cuda_is_available or _torch_mps_is_available:

                        if _torch_cuda_is_available:
                            torch.cuda.empty_cache()

                        self.R_d_desc = self.R_d_desc.cpu()

                        self._log.debug(
                            'Failed to \'set_alphas()\': \'R_d_desc\' was moved back from GPU to CPU'
                        )

                    else:
                        self._log.critical(
                            'Not enough memory to cache \'R_d_desc\'! There is nothing we can do...'
                        )
                        print()
                        os._exit(1)

                else:
                    raise e
            else:
                break

        try:

            perm_idxs = self.perm_idxs if _n_perm_batches == 1 else None
            self._Jx_alphas = nn.Parameter(
                self.apply_perms_to_obj(xs, perm_idxs=perm_idxs), requires_grad=False
            )

        except RuntimeError as e:
            if 'out of memory' in str(e):
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()

                if _n_perm_batches < self.n_perms:

                    _n_perm_batches += 1  # Do NOT change me to use 'self.set_n_perm_batches(_n_perm_batches + 1)'!

                    self._log.debug(
                        'Setting permutation batch size to {}/{}{}.'.format(
                            self.n_perms // _n_perm_batches,
                            self.n_perms,
                            ' (no caching)' if _n_perm_batches > 1 else '',
                        )
                    )
                    self._xs_train = nn.Parameter(
                        self.remove_perms_from_obj(self._xs_train), requires_grad=False
                    )  # Remove any permutations from 'self._xs_train'.
                    self._Jx_alphas = nn.Parameter(
                        self.apply_perms_to_obj(xs, perm_idxs=None), requires_grad=False
                    )  # Set 'self._Jx_alphas' without applying permutations.

                else:
                    self._log.critical(
                        'Could not allocate enough memory to set new alphas in model.'
                    )
                    print()
                    os._exit(1)
            else:
                raise e

    def _forward(self, Rs_or_train_idxs, return_E=True):

        global _n_perm_batches

        q = np.sqrt(5) / self._sig
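        # Background (derived from the expressions below): with x = q * r and
        # q = sqrt(5) / sig, the Matern (nu = 5/2) kernel is
        #   k(x) = (1 + x + x**2 / 3) * exp(-x)              (cf. 'K_ee'),
        # and its gradient w.r.t. the descriptor difference d (with r = |d|) is
        #   grad_d k = -(q**2 / 3) * (1 + x) * exp(-x) * d,
        # which is why 'exp_xs_1_x_dists' = (q**2 / 3) * exp(-x) * (1 + x).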
        i, j = self.tril_indices

        is_train_pred = Rs_or_train_idxs.dim() == 1
        if not is_train_pred:  # Rs

            Rs = Rs_or_train_idxs.type(_dtype)
            diffs = Rs[:, :, None, :] - Rs[:, None, :, :]  # N, a, a, 3
            diffs = diffs[:, i, j, :]  # N, d, 3

            if self._lat_and_inv is not None:

                diffs_shape = diffs.shape
                # diffs = self.desc.pbc_diff(diffs.reshape(-1, 3), self._lat_and_inv).reshape(
                #    diffs_shape
                # )

                lat, lat_inv = self._lat_and_inv
                if lat.device != Rs.device:
                    lat = lat.to(Rs.device)
                    lat_inv = lat_inv.to(Rs.device)

                diffs = diffs.reshape(-1, 3)

                c = lat_inv.mm(diffs.t())
                diffs -= lat.mm(c.round()).t()

                diffs = diffs.reshape(diffs_shape)
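                # The two statements above apply the minimum-image convention:
                # 'c' holds the differences in fractional (lattice) coordinates,
                # and subtracting the nearest whole lattice translation maps each
                # difference into the primary cell. E.g. for a cubic cell of
                # edge 10, a difference of 9 along x gives c = 0.9, which rounds
                # to 1, so the corrected difference is 9 - 10 = -1.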

            xs = 1 / diffs.norm(dim=-1)  # N, d

            diffs *= xs[:, :, None] ** 3
            Jxs = diffs
            del diffs

        else:  # xs_train

            train_idxs = Rs_or_train_idxs
            
            # Get index of identity permutation, depending on caching configuration.
            xs_train_n_perms = self._xs_train.numel() // (self.n_train * self.dim_d)
            idx_id_perm = 0 if xs_train_n_perms == 1 else self.idx_id_perm

            xs = self._xs_train.reshape(self.n_train, -1, self.dim_d)[
                train_idxs, idx_id_perm, :
            ]  # ignore permutations

            # 'train_idxs' must live on the same device as 'R_d_desc'.
            train_idxs = train_idxs.to(self.R_d_desc.device)

            Jxs = self.R_d_desc[train_idxs, :, :].to(
                xs.device
            )  # 'R_d_desc' might be living on the CPU...

        # current:
        # Jxs: N, d, 3
        # xs: N, d

        Fs_x = torch.zeros(xs.shape, device=xs.device, dtype=xs.dtype)
        Es = (
            torch.zeros((xs.shape[0],), device=xs.device, dtype=xs.dtype)
            if return_E
            else None
        )

        n_perms_done = 0
        for perm_batch in np.array_split(np.arange(self.n_perms), _n_perm_batches):

            if _n_perm_batches == 1:
                xs_train_perm_split = self._xs_train
                Jx_alphas_perm_split = self._Jx_alphas
            else:
                perm_idxs_batch = (
                    self.perm_idxs[perm_batch, :] - n_perms_done * self.dim_d
                )  # index shift
                xs_train_perm_split = self.apply_perms_to_obj(
                    self._xs_train, perm_idxs=perm_idxs_batch
                )
                Jx_alphas_perm_split = self.apply_perms_to_obj(
                    self._Jx_alphas, perm_idxs=perm_idxs_batch
                )

            n_perms_done += len(perm_batch)

            x_diffs = q * (
                xs[:, None, :] - xs_train_perm_split
            )  # N, n_perms*N_train, d
            x_dists = x_diffs.norm(dim=-1)  # N, n_perms*N_train

            exp_xs = torch.exp(-x_dists) * (q**2) / 3  # N, n_perms*N_train
            exp_xs_1_x_dists = exp_xs * (1 + x_dists)  # N, n_perms*N_train

            if self._alphas_E is None:
                del x_dists

            dot_x_diff_Jx_alphas = torch.einsum(
                'ij...,j...->ij', x_diffs, Jx_alphas_perm_split
            )  # N, n_perms*N_train

            # Fs_x = ((exp_xs * dot_x_diff_Jx_alphas)[..., None] * x_diffs).sum(dim=1)
            Fs_x += torch.einsum(  # NOTE ! Fs_x = Fs_x + torch.einsum(
                '...j,...j,...jk', exp_xs, dot_x_diff_Jx_alphas, x_diffs
            )  # N, d
            del exp_xs

            if self._alphas_E is None:
                del x_diffs

            # current:
            # Jxs: N, d, 3
            # xs: N, d
            # x_diffs: N, n_perms*N_train, d (only kept if energy constraints are used)
            # x_dists: N, n_perms*N_train (only kept if energy constraints are used)
            # dot_x_diff_Jx_alphas: N, n_perms*N_train
            # exp_xs_1_x_dists: N, n_perms*N_train
            # Fs_x: N, d

            Fs_x -= exp_xs_1_x_dists.mm(Jx_alphas_perm_split)  # N, d

            if return_E:
                Es += (
                    torch.einsum('...j,...j', exp_xs_1_x_dists, dot_x_diff_Jx_alphas)
                    / q
                )

            del dot_x_diff_Jx_alphas

            if self._alphas_E is None:
                del exp_xs_1_x_dists

            # Note: Energies are automatically predicted with a flipped sign here (because -E are trained, instead of E)
            if self._alphas_E is not None:

                K_fe = (x_diffs / q) * exp_xs_1_x_dists[:, :, None]
                del exp_xs_1_x_dists
                del x_diffs

                K_fe = K_fe.reshape(-1, self.n_train, len(perm_batch), self.dim_d)
                Fs_x += torch.einsum('j,...jkl->...l', self._alphas_E, K_fe)
                del K_fe

                K_ee = (1 + x_dists * (1 + x_dists / 3)) * torch.exp(-x_dists)
                del x_dists

                K_ee = K_ee.reshape(-1, self.n_train, len(perm_batch))
                Es += torch.einsum('j,...jk->...', self._alphas_E, K_ee)
                del K_ee

        # current:
        # Jxs: N, d, 3
        # xs: N, d
        # Fs_x: N, d
        # Es: N (if return_E)

        Fs = torch.einsum('ji,...ik,...i->...jk', self.agg_mat.to(Fs_x.dtype), Jxs, Fs_x)

        if not is_train_pred:  # TODO: set std to zero in training mode?
            Fs *= self._std

        if return_E:
            Es *= self._std
            Es += self._c

        return Es, Fs

    def forward(self, Rs_or_train_idxs, return_E=True):
        """
        Predict energy and forces for a batch of geometries.

        Parameters
        ----------
        Rs_or_train_idxs : :obj:`torch.Tensor`
            (dims M x N x 3) Cartesian coordinates of M molecules composed of N atoms or
            (dims N) index list of training points to evaluate. Note that `self.R_d_desc`
            needs to be set for the latter to work.
        return_E : boolean, optional
            If false (default: true), only the forces are returned.

        Returns
        -------
        E : :obj:`torch.Tensor`
            (dims M) Molecular energies (unless `return_E == False`)
        F : :obj:`torch.Tensor`
            (dims M x N x 3) Nuclear gradients of the energy
        """

        global _batch_size, _n_perm_batches

        # if Rs_or_train_idxs.dim() == 1:
        #    # contains index list. return predictions for these training points
        #    dtype = self.R_d_desc.dtype
        # elif Rs_or_train_idxs.dim() == 3:
        # this is real data

        #    assert Rs_or_train_idxs.shape[1:] == (self.n_atoms, 3)
        #    Rs_or_train_idxs = Rs_or_train_idxs.double()
        #    dtype = Rs_or_train_idxs.dtype

        # else:
        #    # unknown input
        #    self._log.critical('Invalid input for \'Rs_or_train_idxs\'.')
        #    print()
        #    os._exit(1)

        while True:
            try:
                Es, Fs = zip(
                    *map(
                        partial(self._forward, return_E=return_E),
                        DataLoader(Rs_or_train_idxs, batch_size=_batch_size),
                    )
                )
            except RuntimeError as e:
                if 'out of memory' in str(e):
                    if torch.cuda.is_available():
                        torch.cuda.empty_cache()

                    if _batch_size > 1:

                        _batch_size -= 1
                        self._log.debug(
                            'Setting batch size to {}/{} points.'.format(
                                _batch_size, self.n_train
                            )
                        )

                    elif _n_perm_batches < self.n_perms:
                        n_perm_batches = _next_batch_size(self.n_perms, _n_perm_batches)
                        self.set_n_perm_batches(n_perm_batches)

                    else:
                        self._log.critical(
                            'Could not allocate enough (GPU) memory to evaluate model, despite reducing batch size.'
                        )
                        print()
                        os._exit(1)
                else:
                    raise e
            else:
                break

        ret = (torch.cat(Fs),)
        if return_E:
            ret = (torch.cat(Es),) + ret

        return ret
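

# Sketch (illustrative only; not called anywhere in sGDML): 'apply_perms_to_obj'
# permutes descriptors for all symmetries at once by tiling them 'n_perms'
# times and gathering with one combined index array. The hypothetical helper
# '_tiled_perms_sketch' reproduces this idea in NumPy for a single descriptor,
# assuming (as the index shift in '_forward' suggests) that the indices of
# permutation p point into the p-th tile, i.e. are offset by p * dim_d.
# For a single vector, the result equals plain fancy indexing 'x[perms]'.
import numpy as np  # Already imported above; repeated to keep the sketch self-contained.


def _tiled_perms_sketch(x, perms):
    # x: (dim_d,) descriptor; perms: (n_perms, dim_d) index permutations.
    n_perms, dim_d = perms.shape
    # Offset each permutation so that it indexes its own copy of 'x'.
    lin = (perms + dim_d * np.arange(n_perms)[:, None]).ravel()
    # Tile once, gather once, then reshape to one row per permutation.
    return np.tile(x, n_perms)[lin].reshape(n_perms, dim_d)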


================================================
FILE: sgdml/train.py
================================================
"""
This module contains all routines for training GDML and sGDML models.
"""

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import sys
import os
import logging
import psutil

import multiprocessing as mp

Pool = mp.get_context('fork').Pool

import timeit
from functools import partial

import numpy as np

try:
    import torch
except ImportError:
    _has_torch = False
else:
    _has_torch = True

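try:
    _torch_mps_is_available = torch.backends.mps.is_available()
except (NameError, AttributeError):  # PyTorch missing or without MPS support
    _torch_mps_is_available = False
_torch_mps_is_available = False  # NOTE: overrides the check above, so MPS is never used.

try:
    _torch_cuda_is_available = torch.cuda.is_available()
except (NameError, AttributeError):  # PyTorch missing
    _torch_cuda_is_available = False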

from . import __version__, DONE, NOT_DONE
from .solvers.analytic import Analytic

# TODO: remove exception handling once iterative solver ships
try:
    from .solvers.iterative import Iterative
except ImportError:
    pass

from .predict import GDMLPredict
from .utils.desc import Desc
from .utils import io, perm, ui


def _share_array(arr_np, typecode_or_type):
    """
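    Return a ctypes array allocated from shared memory with data from a
    NumPy array, together with the shape of that array.

    Parameters
    ----------
        arr_np : :obj:`numpy.ndarray`
            NumPy array.
        typecode_or_type : char or :obj:`ctype`
            Either a ctypes type or a one character typecode of the
            kind used by the Python array module.

    Returns
    -------
        arr : array of :obj:`ctype`
            Flattened copy of `arr_np` in shared memory.
        shape : tuple of int
            Shape of `arr_np`, needed to restore the original layout.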
    """

    arr = mp.RawArray(typecode_or_type, arr_np.ravel())
    return arr, arr_np.shape
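

# Sketch (illustrative only; not called anywhere in sGDML): '_share_array'
# copies a NumPy array into an unsynchronized (lock-free in the sense of "no
# lock attached") shared-memory buffer and also returns its shape; worker
# functions such as '_assemble_kernel_mat_wkr' then wrap the buffer again with
# 'np.frombuffer(...).reshape(shape)' without copying. The hypothetical helper
# '_shared_roundtrip_sketch' shows this round trip for float64 data
# (typecode 'd'), which matches the default dtype of 'np.frombuffer'.
import multiprocessing as mp  # Already imported above; repeated to keep the sketch self-contained.
import numpy as np


def _shared_roundtrip_sketch(arr_np):
    raw = mp.RawArray('d', arr_np.ravel())  # Shared copy of the data.
    view = np.frombuffer(raw).reshape(arr_np.shape)  # Zero-copy NumPy view.
    return raw, view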


def _assemble_kernel_mat_wkr(
    j, tril_perms_lin, sig, use_E_cstr=False, exploit_sym=False, cols_m_limit=None
):
    r"""
    Compute one row and column of the force field kernel matrix.

    The Hessian of the Matern kernel with :math:`\nu = 5/2` (i.e. n = 2,
    twice differentiable) is used. Each row and column consists of matrix-valued
    blocks, which encode the interaction of one training point with all
    others. The result is stored in shared memory (a global variable).

    Parameters
    ----------
        j : int
            Index of training point.
        tril_perms_lin : :obj:`numpy.ndarray`
            1D array (int) containing all recovered permutations
            expanded as one large permutation to be applied to a tiled
            copy of the object to be permuted.
        sig : int
            Hyper-parameter :math:`\sigma`.
        use_E_cstr : bool, optional
            True: include energy constraints in the kernel,
            False: default (s)GDML kernel.
        exploit_sym : boolean, optional
            Do not create symmetric entries of the kernel matrix twice
            (this only works for specific inputs for `cols_m_limit`)
        cols_m_limit : int, optional
            Limit the number of columns (include training points 1-`M`).
            Note that each training point spans multiple columns.

    Returns
    -------
        int
            Number of kernel matrix blocks created, divided by 2
            (symmetric blocks are always created together).
    """

    global glob

    R_desc = np.frombuffer(glob['R_desc']).reshape(glob['R_desc_shape'])
    R_d_desc = np.frombuffer(glob['R_d_desc']).reshape(glob['R_d_desc_shape'])
    K = np.frombuffer(glob['K']).reshape(glob['K_shape'])

    desc_func = glob['desc_func']

    n_train, dim_d = R_d_desc.shape[:2]
    n_atoms = int((1 + np.sqrt(8 * dim_d + 1)) / 2)
    dim_i = 3 * n_atoms
    n_perms = int(len(tril_perms_lin) / dim_d)

    if type(j) is tuple:  # Selective/"fancy" indexing
        (
            K_j,
            j,
            keep_idxs_3n,
        ) = j  # (block index in final K, block index global, indices of partials within block)
        blk_j = slice(K_j, K_j + len(keep_idxs_3n))

    else:  # Sequential indexing
        K_j = j * dim_i if j < n_train else n_train * dim_i + (j % n_train)
        blk_j = slice(K_j, K_j + dim_i) if j < n_train else slice(K_j, K_j + 1)
        keep_idxs_3n = slice(None)  # same as [:]

    # Note: The modulo-operator wraps around the index pointer on the training points when
    # energy constraints are used in the kernel. In that case each point is accessed twice.

    # Create permuted variants of 'rj_desc' and 'rj_d_desc'.
    rj_desc_perms = np.reshape(
        np.tile(R_desc[j % n_train, :], n_perms)[tril_perms_lin],
        (n_perms, -1),
        order='F',
    )

    rj_d_desc = desc_func.d_desc_from_comp(R_d_desc[j % n_train, :, :])[0][
        :, keep_idxs_3n
    ]  # convert descriptor back to full representation

    rj_d_desc_perms = np.reshape(
        np.tile(rj_d_desc.T, n_perms)[:, tril_perms_lin], (-1, dim_d, n_perms)
    )

    mat52_base_div = 3 * sig**4
    sqrt5 = np.sqrt(5.0)
    sig_pow2 = sig**2

    dim_i_keep = rj_d_desc.shape[1]
    diff_ab_outer_perms = np.empty((dim_d, dim_i_keep))
    diff_ab_perms = np.empty((n_perms, dim_d))
    ri_d_desc = np.zeros((1, dim_d, dim_i))  # must be zeros!
    k = np.empty((dim_i, dim_i_keep))

    if (
        j < n_train
    ):  # This column only contains second and first derivative constraints.

        # for i in range(j if exploit_sym else 0, n_train):
        for i in range(0, n_train):

            blk_i = slice(i * dim_i, (i + 1) * dim_i)

            # diff_ab_perms = R_desc[i, :] - rj_desc_perms
            np.subtract(R_desc[i, :], rj_desc_perms, out=diff_ab_perms)

            norm_ab_perms = sqrt5 * np.linalg.norm(diff_ab_perms, axis=1)
            mat52_base_perms = np.exp(-norm_ab_perms / sig) / mat52_base_div * 5

            # diff_ab_outer_perms = 5 * np.einsum(
            #    'ki,kj->ij',
            #    diff_ab_perms * mat52_base_perms[:, None],
            #    np.einsum('ik,jki -> ij', diff_ab_perms, rj_d_desc_perms)
            # )
            np.einsum(
                'ki,kj->ij',
                diff_ab_perms * mat52_base_perms[:, None] * 5,
                np.einsum('ki,jik -> kj', diff_ab_perms, rj_d_desc_perms),
                out=diff_ab_outer_perms,
            )

            diff_ab_outer_perms -= np.einsum(
                'ikj,j->ki',
                rj_d_desc_perms,
                (sig_pow2 + sig * norm_ab_perms) * mat52_base_perms,
            )

            # ri_d_desc = desc_func.d_desc_from_comp(R_d_desc[i, :, :])[0]
            desc_func.d_desc_from_comp(R_d_desc[i, :, :], out=ri_d_desc)

            # K[blk_i, blk_j] = ri_d_desc[0].T.dot(diff_ab_outer_perms)
            np.dot(ri_d_desc[0].T, diff_ab_outer_perms, out=k)
            K[blk_i, blk_j] = k

            if exploit_sym and (
                cols_m_limit is None or i < cols_m_limit
            ):  # this will never be called with 'keep_idxs_3n' set to anything other than [:]
                K[blk_j, blk_i] = K[blk_i, blk_j].T

            # First derivative constraints
            if use_E_cstr:

                K_fe = (
                    5
                    * diff_ab_perms
                    / (3 * sig**3)
                    * (norm_ab_perms[:, None] + sig)
                    * np.exp(-norm_ab_perms / sig)[:, None]
                )

                K_fe = -np.einsum('ik,jki -> j', K_fe, rj_d_desc_perms)

                E_off_i = n_train * dim_i  # , K.shape[1] - n_train
                K[E_off_i + i, blk_j] = K_fe

    else:

        if use_E_cstr:

            # rj_d_desc = desc_func.d_desc_from_comp(R_d_desc[j % n_train, :, :])[0][
            #    :, :
            # ]  # convert descriptor back to full representation

            # rj_d_desc_perms = np.reshape(
            #    np.tile(rj_d_desc.T, n_perms)[:, tril_perms_lin], (-1, dim_d, n_perms)
            # )

            E_off_i = n_train * dim_i  # Account for 'alloc_extra_rows'!
            # blk_j_full = slice((j % n_train) * dim_i, ((j % n_train) + 1) * dim_i)
            # for i in range((j % n_train) if exploit_sym else 0, n_train):
            for i in range(0, n_train):

                ri_desc_perms = np.reshape(
                    np.tile(R_desc[i, :], n_perms)[tril_perms_lin],
                    (n_perms, -1),
                    order='F',
                )

                ri_d_desc = desc_func.d_desc_from_comp(R_d_desc[i, :, :])[
                    0
                ]  # convert descriptor back to full representation
                ri_d_desc_perms = np.reshape(
                    np.tile(ri_d_desc.T, n_perms)[:, tril_perms_lin],
                    (-1, dim_d, n_perms),
                )

                diff_ab_perms = R_desc[j % n_train, :] - ri_desc_perms

                norm_ab_perms = sqrt5 * np.linalg.norm(diff_ab_perms, axis=1)

                K_fe = (
                    5
                    * diff_ab_perms
                    / (3 * sig**3)
                    * (norm_ab_perms[:, None] + sig)
                    * np.exp(-norm_ab_perms / sig)[:, None]
                )

                K_fe = -np.einsum('ik,jki -> j', K_fe, ri_d_desc_perms)

                blk_i_full = slice(i * dim_i, (i + 1) * dim_i)
                K[blk_i_full, K_j] = K_fe  # vertical

                K[E_off_i + i, K_j] = -(
                    1 + (norm_ab_perms / sig) * (1 + norm_ab_perms / (3 * sig))
                ).dot(np.exp(-norm_ab_perms / sig))

    return blk_j.stop - blk_j.start

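
# Illustrative sketch (hypothetical helper, not part of the sGDML API): the
# scalar Matern kernel with nu = 5/2 on a descriptor difference d,
#     k(d) = (1 + a / sig + a**2 / (3 * sig**2)) * exp(-a / sig),
#     a = sqrt(5) * ||d||   ('norm_ab_perms' in the worker above),
# and its gradient with respect to the first argument,
#     grad k = -5 / (3 * sig**3) * (a + sig) * exp(-a / sig) * d.
# Up to sign and the sum over permutations, these match the energy-energy
# and force-energy ('K_fe') expressions in the worker above.
def _sketch_matern52(diff, sig):
    import numpy as np

    diff = np.asarray(diff, dtype=float)
    a = np.sqrt(5.0) * np.linalg.norm(diff)
    e = np.exp(-a / sig)
    k = (1.0 + a / sig + a**2 / (3.0 * sig**2)) * e
    grad = -5.0 / (3.0 * sig**3) * (a + sig) * e * diff
    return k, grad
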

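
# Illustrative sketch (hypothetical helper, not part of the sGDML API) of the
# energy-free sampling path in GDMLTrain.create_task: when training and
# validation data come from the same dataset (identical MD5), validation
# indices are drawn only from points not used for training. The stratified,
# energy-based sampling ('draw_strat_sample') is not reproduced here, and a
# seeded Generator replaces the global 'np.random.choice' for reproducibility.
def _sketch_disjoint_split(n_points, n_train, n_valid, seed=None):
    import numpy as np

    rng = np.random.default_rng(seed)
    idxs_train = rng.choice(n_points, n_train, replace=False)
    # Validation candidates: every index not already used for training.
    idxs_valid_cands = np.setdiff1d(
        np.arange(n_points), idxs_train, assume_unique=True
    )
    idxs_valid = rng.choice(idxs_valid_cands, n_valid, replace=False)
    return idxs_train, idxs_valid
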
class GDMLTrain(object):
    def __init__(self, max_memory=None, max_processes=None, use_torch=False):
        """
        Train sGDML force fields.

        This class is used to train models using different closed-form
        and numerical solvers. For some solvers, GPU support is provided
        through PyTorch (requires the optional `torch` dependency to be
        installed).

        Parameters
        ----------
            max_memory : int, optional
                Limit the max. memory usage [GB]. This is only a soft
                limit that cannot always be enforced.
            max_processes : int, optional
                Limit the max. number of processes. Otherwise, all CPU
                cores are used. This parameter has no effect if
                `use_torch=True`.
            use_torch : bool, optional
                Use PyTorch to calculate predictions (if supported by the
                solver).

        Raises
        ------
            Exception
                If multiple instances of this class are created.
            ImportError
                If the optional PyTorch dependency is missing, but PyTorch features are used.
        """

        global glob
        if 'glob' not in globals():  # Don't allow more than one instance of this class.
            glob = {}
        else:
            raise Exception(
                'You can not create multiple instances of this class. Please reuse your first one.'
            )

        self.log = logging.getLogger(__name__)

        total_memory = psutil.virtual_memory().total // 2**30  # bytes to GB
        self._max_memory = (
            min(max_memory, total_memory) if max_memory is not None else total_memory
        )

        total_cpus = mp.cpu_count()
        self._max_processes = (
            min(max_processes, total_cpus) if max_processes is not None else total_cpus
        )

        self._use_torch = use_torch

        if use_torch and not _has_torch:
            raise ImportError(
                'Optional PyTorch dependency not found! Please run \'pip install sgdml[torch]\' to install it or disable the PyTorch option.'
            )

    def __del__(self):

        global glob

        if 'glob' in globals():
            del glob

    def create_task(
        self,
        train_dataset,
        n_train,
        valid_dataset,
        n_valid,
        sig,
        lam=1e-10,
        perms=None,
        use_sym=True,
        use_E=True,
        use_E_cstr=False,
        callback=None,  # TODO: document me
    ):
        """
        Create a data structure of custom type `task`.

        These data structures serve as recipes for model creation,
        summarizing the configuration of one particular training run.
        Training and validation points are sampled from the provided
        datasets without replacement. If the same dataset is given for
        training and validation, the subsets are drawn without overlap.

        Each task also contains a choice for the hyper-parameters of the
        training process and the MD5 fingerprints of the used datasets.

        Parameters
        ----------
            train_dataset : :obj:`dict`
                Data structure of custom type :obj:`dataset` containing
                train dataset.
            n_train : int
                Number of training points to sample.
            valid_dataset : :obj:`dict`
                Data structure of custom type :obj:`dataset` containing
                validation dataset.
            n_valid : int
                Number of validation points to sample.
            sig : int
                Hyper-parameter (kernel length scale).
            lam : float, optional
                Hyper-parameter lambda (regularization strength).
            perms : :obj:`numpy.ndarray`, optional
                A 2D array of size P x N containing P possible permutations
                of the N atoms in the system. This argument takes priority
                over the permutations provided in the training dataset. No
                automatic discovery is run when this argument is provided.
            use_sym : bool, optional
                True: include symmetries (sGDML), False: GDML.
            use_E : bool, optional
                True: reconstruct force field with corresponding potential energy surface,
                False: ignore energy during training, even if energy labels are available
                       in the dataset. The trained model will still be able to predict
                       energies up to an unknown integration constant. Note that the
                       accuracy of the energy predictions will not be tested.
            use_E_cstr : bool, optional
                True: include energy constraints in the kernel,
                False: default (s)GDML.
            callback : callable, optional
                Progress callback function that takes three
                arguments:
                    current : int
                        Current progress.
                    total : int
                        Task size.
                    done_str : :obj:`str`, optional
                        Once complete, this string is shown.

        Returns
        -------
            dict
                Data structure of custom type :obj:`task`.

        Raises
        ------
            ValueError
                If a reconstruction of the potential energy surface is requested,
                but the energy labels are missing in the dataset.
        """

        if use_E and 'E' not in train_dataset:
            raise ValueError(
                'No energy labels found in dataset!\n'
                + 'By default, force fields are always reconstructed including the\n'
                + 'corresponding potential energy surface (this can be turned off).\n'
                + 'However, the energy labels are missing in the provided dataset.\n'
            )

        use_E_cstr = use_E and use_E_cstr

        n_atoms = train_dataset['R'].shape[1]

        if callback is not None:
            callback = partial(callback, disp_str='Hashing dataset(s)')
            callback(NOT_DONE)

        md5_train = io.dataset_md5(train_dataset)
        md5_valid = io.dataset_md5(valid_dataset)

        if callback is not None:
            callback(DONE)

        if callback is not None:
            callback = partial(
                callback, disp_str='Sampling training and validation subsets'
            )
            callback(NOT_DONE)

        if 'E' in train_dataset:
            idxs_train = self.draw_strat_sample(train_dataset['E'], n_train)
        else:
            idxs_train = np.random.choice(
                np.arange(train_dataset['F'].shape[0]),
                n_train,
                replace=False,
            )

        excl_idxs = (
            idxs_train if md5_train == md5_valid else np.array([], dtype=np.uint)
        )

        if 'E' in valid_dataset:
            idxs_valid = self.draw_strat_sample(
                valid_dataset['E'],
                n_valid,
                excl_idxs=excl_idxs,
            )
        else:
            idxs_valid_cands = np.setdiff1d(
                np.arange(valid_dataset['F'].shape[0]), excl_idxs, assume_unique=True
            )
            idxs_valid = np.random.choice(idxs_valid_cands, n_valid, replace=False)

        if callback is not None:
            callback(DONE)

        R_train = train_dataset['R'][idxs_train, :, :]
        task = {
            'type': 't',
            'code_version': __version__,
            'dataset_name': train_dataset['name'].astype(str),
            'dataset_theory': train_dataset['theory'].astype(str),
            'z': train_dataset['z'],
            'R_train': R_train,
            'F_train': train_dataset['F'][idxs_train, :, :],
            'idxs_train': idxs_train,
            'md5_train': md5_train,
            'idxs_valid': idxs_valid,
            'md5_valid': md5_valid,
            'sig': sig,
            'lam': lam,
            'use_E': use_E,
            'use_E_cstr': use_E_cstr,
            'use_sym': use_sym,
        }

        if use_E:
            task['E_train'] = train_dataset['E'][idxs_train]

        lat_and_inv = None
        if 'lattice' in train_dataset:
            task['lattice'] = train_dataset['lattice']

            try:
                lat_and_inv = (task['lattice'], np.linalg.inv(task['lattice']))
            except np.linalg.LinAlgError:
                raise ValueError(  # TODO: Document me
                    'Provided dataset contains invalid lattice vectors (not invertible). Note: Only rank 3 lattice vector matrices are supported.'
                )

        if 'r_unit' in train_dataset and 'e_unit' in train_dataset:
            task['r_unit'] = train_dataset['r_unit']
            task['e_unit'] = train_dataset['e_unit']

        if use_sym:

            # No permutations provided externally.
            if perms is None:

                if (
                    'perms' in train_dataset
                ):  # take perms from training dataset, if available

                    n_perms = train_dataset['perms'].shape[0]
                    self.log.info(
                        'Using {:d} permutations included in dataset.'.format(n_perms)
                    )

                    task['perms'] = train_dataset['perms']

                else:  # find perms from scratch

                    n_train = R_train.shape[0]
                    R_train_sync_mat = R_train
                    if n_train > 1000:
                        R_train_sync_mat = R_train[
                            np.random.choice(n_train, 1000, replace=False), :, :
                        ]
                        self.log.info(
                            'Symmetry search has been restricted to a random subset of 1000/{:d} training points for faster convergence.'.format(
                                n_train
                            )
                        )

                    # TODO: PBCs disabled when matching (for now).
                    # task['perms'] = perm.find_perms(
                    #    R_train_sync_mat, train_dataset['z'], lat_and_inv=lat_and_inv, max_processes=self._max_processes,
                    # )
                    task['perms'] = perm.find_perms(
                        R_train_sync_mat,
                        train_dataset['z'],
                        # lat_and_inv=None,
                        lat_and_inv=lat_and_inv,
                        callback=callback,
                        max_processes=self._max_processes,
                    )

                    # NEW

                    USE_EXTRA_PERMS = False

                    if USE_EXTRA_PERMS:
                        task['perms'] = perm.find_extra_perms(
                            R_train_sync_mat,
                            train_dataset['z'],
                            # lat_and_inv=None,
                            lat_and_inv=lat_and_inv,
                            callback=callback,
                            max_processes=self._max_processes,
                        )

                    # NEW

                    # NEW

                    USE_FRAG_PERMS = False

                    if USE_FRAG_PERMS:
                        frag_perms = perm.find_frag_perms(
                            R_train_sync_mat,
                            train_dataset['z'],
                            lat_and_inv=lat_and_inv,
                            max_processes=self._max_processes,
                        )
                        task['perms'] = np.vstack((task['perms'], frag_perms))
                        task['perms'] = np.unique(task['perms'], axis=0)

                        print(
                            '| Keeping '
                            + str(task['perms'].shape[0])
                            + ' unique permutations.'
                        )

                    # NEW

            else:  # use provided perms

                n_atoms = len(task['z'])
                n_perms, perms_len = perms.shape

                if perms_len != n_atoms:
                    raise ValueError(  # TODO: Document me
                        'Provided permutations do not match the number of atoms in dataset.'
                    )
                else:

                    self.log.info(
                        'Using {:d} externally provided permutations.'.format(n_perms)
                    )

                    task['perms'] = perms

        else:
            task['perms'] = np.arange(train_dataset['R'].shape[1])[
                None, :
            ]  # no symmetries

        return task

    def create_task_from_model(self, model, dataset):
        """
        Create a data structure of custom type `task` from an existing
        data structure of custom type `model`. This method is used to
        resume training of unconverged models.

        All hyperparameters (including all symmetry permutations) in the
        provided model file are reused without further optimization. The
        current linear coefficients are used as the starting point for
        the iterative training procedure.

        Parameters
        ----------
            model : :obj:`dict`
                Data structure of custom type :obj:`model` based on which
                to create the training task.
            dataset : :obj:`dict`
                Data structure of custom type :obj:`dataset` containing
                the original dataset from which the provided model emerged.

        Returns
        -------
            dict
                Data structure of custom type :obj:`task`.
        """

        idxs_train = model['idxs_train']
        R_train = dataset['R'][idxs_train, :, :]
        F_train = dataset['F'][idxs_train, :, :]

        use_E = 'e_err' in model
        use_E_cstr = 'alphas_E' in model
        use_sym = model['perms'].shape[0] > 1

        task = {
            'type': 't',
            'code_version': __version__,
            'dataset_name': model['dataset_name'],
            'dataset_theory': model['dataset_theory'],
            'z': model['z'],
            'R_train': R_train,
            'F_train': F_train,
            'idxs_train': idxs_train,
            'md5_train': model['md5_train'],
            'idxs_valid': model['idxs_valid'],
            'md5_valid': model['md5_valid'],
            'sig': model['sig'],
            'lam': model['lam'],
            'use_E': model['use_E'],
            'use_E_cstr': use_E_cstr,
            'use_sym': use_sym,
            'perms': model['perms'],
        }

        if use_E:
            task['E_train'] = dataset['E'][idxs_train]

        if 'lattice' in model:
            task['lattice'] = model['lattice']

        if 'r_unit' in model and 'e_unit' in model:
            task['r_unit'] = model['r_unit']
            task['e_unit'] = model['e_unit']

        if 'alphas_F' in model:
            task['alphas0_F'] = model['alphas_F']

        if 'alphas_E' in model:
            task['alphas0_E'] = model['alphas_E']

        if 'solver_iters' in model:
            task['solver_iters'] = model['solver_iters']

        if 'inducing_pts_idxs' in model:
            task['inducing_pts_idxs'] = model['inducing_pts_idxs']

        return task

    def create_model(
        self,
        task,
        solver,
        R_desc,
        R_d_desc,
        tril_perms_lin,
        std,
        alphas_F,
        alphas_E=None,
    ):
        """
        Create a data structure of custom type `model`.

        These data structures contain the trained model and everything
        that is needed to generate predictions for new inputs.

        Each model also contains the MD5 fingerprints of the used datasets.

        Parameters
        ----------
            task : :obj:`dict`
                Data structure of custom type :obj:`task` from which
                the model emerged.
            solver : :obj:`str`
                Identifier string for the solver that has been used to
                train this model.
            R_desc : :obj:`numpy.ndarray`
                A 2D array of size M x D containing the descriptors of
                dimension D for M molecules.
            R_d_desc : :obj:`numpy.ndarray`
                A 3D array of size M x D x 3N containing the descriptor
                Jacobians for M molecules. The descriptor has dimension D
                with 3N partial derivatives with respect to the 3N
                Cartesian coordinates of the atoms.
            tril_perms_lin : :obj:`numpy.ndarray`
                1D array containing all recovered permutations
                expanded as one large permutation to be applied to a
                tiled copy of the object to be permuted.
            std : float
                Standard deviation of the training labels.
            alphas_F : :obj:`numpy.ndarray`
                A 1D array of size 3NM containing the linear
                coefficients that correspond to the force constraints.
            alphas_E : :obj:`numpy.ndarray`, optional
                A 1D array of size M containing the linear
                coefficients that correspond to the energy constraints.

        Returns
        -------
            dict
                Data structure of custom type :obj:`model`.
        """

        n_train, dim_d = R_d_desc.shape[:2]
        n_atoms = int((1 + np.sqrt(8 * dim_d + 1)) / 2)

        desc = Desc(
            n_atoms,
            max_processes=self._max_processes,
        )

        dim_i = desc.dim_i
        R_d_desc_alpha = desc.d_desc_dot_vec(R_d_desc, alphas_F.reshape(-1, dim_i))

        model = {
            'type': 'm',
            'code_version': __version__,
            'dataset_name': task['dataset_name'],
            'dataset_theory': task['dataset_theory'],
            'solver_name': solver,
            'z': task['z'],
            'idxs_train': task['idxs_train'],
            'md5_train': task['md5_train'],
            'idxs_valid': task['idxs_valid'],
            'md5_valid': task['md5_valid'],
            'n_test': 0,
            'md5_test': None,
            'f_err': {'mae': np.nan, 'rmse': np.nan},
            'R_desc': R_desc.T,
            'R_d_desc_alpha': R_d_desc_alpha,
            'c': 0.0,
            'std': std,
            'sig': task['sig'],
            'lam': task['lam'],
            'alphas_F': alphas_F,
            'perms': task['perms'],
            'tril_perms_lin': tril_perms_lin,
            'use_E': task['use_E'],
        }

        if task['use_E']:
            model['e_err'] = {'mae': np.nan, 'rmse': np.nan}

            if task['use_E_cstr']:
                model['alphas_E'] = alphas_E

        if 'lattice' in task:
            model['lattice'] = task['lattice']

        if 'r_unit' in task and 'e_unit' in task:
            model['r_unit'] = task['r_unit']
            model['e_unit'] = task['e_unit']

        return model

    # from memory_profiler import profile
    # @profile
    def train(  # noqa: C901
        self,
        task,
        save_progr_callback=None,  # TODO: document me
        callback=None,
    ):
        """
        Train a model based on a training task.

        Parameters
        ----------
            task : :obj:`dict`
                Data structure of custom type :obj:`task`.
            save_progr_callback : callable, optional
                Callback for saving intermediate training progress. It is
                only forwarded to the iterative solver.
            callback : callable, optional
                Progress callback function that is forwarded to the
                descriptor generation and to the solver. It takes the
                following arguments:
                    current : int
                        Current progress.
                    total : int
                        Task size.
                    done_str : :obj:`str`, optional
                        Once complete, this string is shown.

        Returns
        -------
            :obj:`dict`
                Data structure of custom type :obj:`model`.

        Raises
        ------
            ValueError
                If the provided dataset contains invalid lattice
                vectors.
        """

        task = dict(task)  # make mutable

        n_train, n_atoms = task['R_train'].shape[:2]

        desc = Desc(
            n_atoms,
            max_processes=self._max_processes,
        )

        n_perms = task['perms'].shape[0]
        tril_perms = np.array([Desc.perm(p) for p in task['perms']])

        dim_i = 3 * n_atoms
        dim_d = desc.dim

        perm_offsets = np.arange(n_perms)[:, None] * dim_d
        tril_perms_lin = (tril_perms + perm_offsets).flatten('F')

        # TODO: check if all atoms are in span of lattice vectors, otherwise suggest that
        # rows and columns might have been switched.
        lat_and_inv = None
        if 'lattice' in task:
            try:
                lat_and_inv = (task['lattice'], np.linalg.inv(task['lattice']))
            except np.linalg.LinAlgError:
                raise ValueError(  # TODO: Document me
                    'Provided dataset contains invalid lattice vectors (not invertible). Note: Only rank 3 lattice vector matrices are supported.'
                )

            # # TODO: check if all atoms are within unit cell
            # for r in task['R_train']:
            #    r_lat = lat_and_inv[1].dot(r.T)
            #    if not (r_lat >= 0).all():
            #         raise ValueError( # TODO: Document me
            #            'Some atoms appear outside of the unit cell! Please check lattice vectors in dataset file.'
            #         )
            #        #pass

        R = task['R_train'].reshape(n_train, -1)
        R_desc, R_d_desc = desc.from_R(
            R,
            lat_and_inv=lat_and_inv,
            callback=partial(
                callback, disp_str='Generating descriptors and their Jacobians'
            )
            if callback is not None
            else None,
        )

        # Generate label vector.
        E_train_mean = None
        y = task['F_train'].ravel().copy()
        if task['use_E'] and task['use_E_cstr']:
            E_train = task['E_train'].ravel().copy()
            E_train_mean = np.mean(E_train)

            y = np.hstack((y, -E_train + E_train_mean))

        y_std = np.std(y)
        y /= y_std

        max_memory_bytes = self._max_memory * 1024**3

        # Memory cost of analytic solver
        est_bytes_analytic = Analytic.est_memory_requirement(n_train, n_atoms)

        # Memory overhead (solver independent)
        est_bytes_overhead = y.nbytes
        est_bytes_overhead += R.nbytes
        est_bytes_overhead += R_desc.nbytes
        est_bytes_overhead += R_d_desc.nbytes

        solver_keys = {}

        use_analytic_solver = (
            est_bytes_analytic + est_bytes_overhead
        ) < max_memory_bytes

        # Fall back to analytic solver, if iterative solver file is missing.
        base_path = os.path.dirname(os.path.abspath(__file__))
        iter_solver_path = os.path.join(base_path, 'solvers/iterative.py')
        if not os.path.exists(iter_solver_path):
            self.log.debug('Iterative solver not installed.')
            use_analytic_solver = True

        # use_analytic_solver = True  # remove me!

        if use_analytic_solver:

            self.log.info(
                'Using analytic solver (expected memory use: ~{})'.format(
                    ui.gen_memory_str(est_bytes_analytic + est_bytes_overhead)
                )
            )

            analytic = Analytic(self, desc, callback=callback)
            alphas = analytic.solve(task, R_desc, R_d_desc, tril_perms_lin, y)

        else:

            max_n_inducing_pts = Iterative.max_n_inducing_pts(
                n_train, n_atoms, max_memory_bytes
            )
            est_bytes_iterative = Iterative.est_memory_requirement(
                n_train, max_n_inducing_pts, n_atoms
            )

            self.log.info(
                'Using iterative solver (expected memory use: ~{})'.format(
                    ui.gen_memory_str(est_bytes_iterative + est_bytes_overhead)
                )
            )

            alphas_F = task['alphas0_F'] if 'alphas0_F' in task else None
            alphas_E = task['alphas0_E'] if 'alphas0_E' in task else None

            iterative = Iterative(
                self,
                desc,
                self._max_memory,
                self._max_processes,
                self._use_torch,
                callback=callback,
            )
            (
                alphas,
                solver_keys['solver_tol'],
                solver_keys[
                    'solver_iters'
                ],  # number of iterations performed (cg solver)
                solver_keys['solver_resid'],  # residual of solution
                train_rmse,
                solver_keys['inducing_pts_idxs'],
                is_conv,
            ) = iterative.solve(
                task,
                R_desc,
                R_d_desc,
                tril_perms_lin,
                y,
                y_std,
                save_progr_callback=save_progr_callback,
            )

            solver_keys['norm_y_train'] = np.linalg.norm(y)

            if not is_conv:
                self.log.warning(
                    'Iterative solver did not converge!\n'
                    + 'The optimization problem underlying this force field reconstruction task seems to be highly ill-conditioned.\n\n'
                    + ui.color_str('Troubleshooting tips:\n', bold=True)
                    + ui.wrap_indent_str(
                        '(1) ',
                        'Are the provided geometries highly correlated (i.e. very similar to each other)?',
                    )
                    + '\n'
                    + ui.wrap_indent_str(
                        '(2) ', 'Try a larger length scale (sigma) parameter.'
                    )
                    + '\n\n'
                    + ui.color_str('Note:', bold=True)
                    + ' We will continue with this unconverged model, but its accuracy is likely to be poor.'
                )

        alphas_E = None
        alphas_F = alphas
        if task['use_E_cstr']:
            alphas_E = alphas[-n_train:]
            alphas_F = alphas[:-n_train]

        model = self.create_model(
            task,
            'analytic' if use_analytic_solver else 'cg',
            R_desc,
            R_d_desc,
            tril_perms_lin,
            y_std,
            alphas_F,
            alphas_E=alphas_E,
        )
        model.update(solver_keys)

        # Recover integration constant.
        # Note: if energy constraints are included in the kernel (via 'use_E_cstr'), do not
        # compute the integration constant, but simply set it to the mean of the training energies
        # (which was subtracted from the labels before training).
        if model['use_E']:
            c = (
                self._recov_int_const(model, task, R_desc=R_desc, R_d_desc=R_d_desc)
                if E_train_mean is None
                else E_train_mean
            )
            # if c is None:
            #    # Something does not seem right. Turn off energy predictions for this model, only output force predictions.
            #    model['use_E'] = False
            # else:
            #    model['c'] = c

            model['c'] = c

        return model

    def _recov_int_const(
        self, model, task, R_desc=None, R_d_desc=None
    ):  # TODO: document e_err_inconsist return
        """
        Estimate the integration constant for a force field model.

        The offset between the energies predicted for the original training
        data and the true energy labels is computed in the least-squares
        sense. In addition, common issues with user-provided datasets are
        diagnosed here.

        Parameters
        ----------
            model : :obj:`dict`
                Data structure of custom type :obj:`model`.
            task : :obj:`dict`
                Data structure of custom type :obj:`task`.
            R_desc : :obj:`numpy.ndarray`, optional
                A 2D array of size M x D containing the
                descriptors of dimension D for M
                molecules.
            R_d_desc : :obj:`numpy.ndarray`, optional
                A 3D array of size M x D x 3N containing the
                descriptor Jacobians for M molecules. The descriptor
                has dimension D with 3N partial derivatives with
                respect to the 3N Cartesian coordinates of each atom.

        Returns
        -------
            float
                Estimate for the integration constant.

        Notes
        -----
            Dataset issues do not raise exceptions. Instead, a warning is
            logged if
                - the sign of the force labels in the dataset from which the
                  model emerged appears to be switched (e.g. gradients
                  instead of forces),
                - inconsistent/corrupted energy labels are detected in the
                  provided dataset, or
                - potentially inconsistent scales in energy vs. force labels
                  are detected in the provided dataset.
        """

        gdml_predict = GDMLPredict(
            model,
            max_memory=self._max_memory,
            max_processes=self._max_processes,
            use_torch=self._use_torch,
            log_level=logging.CRITICAL,
        )

        gdml_predict.set_R_desc(R_desc)
        gdml_predict.set_R_d_desc(R_d_desc)

        E_pred, _ = gdml_predict.predict()
        E_ref = np.squeeze(task['E_train'])

        e_fact = np.linalg.lstsq(
            np.column_stack((E_pred, np.ones(E_ref.shape))), E_ref, rcond=-1
        )[0][0]
        corrcoef = np.corrcoef(E_ref, E_pred)[0, 1]

        if np.sign(e_fact) == -1:
            self.log.warning(
                'It looks like the provided dataset may contain gradients instead of force labels (flipped sign).\n\n'
                + ui.color_str('Troubleshooting tips:\n', bold=True)
                + ui.wrap_indent_str(
                    '(1) ',
                    'Verify the sign of your force labels.',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(2) ',
                    'This issue might simply be a symptom of using too little training data, in which case your labels may well be correct.',
                )
            )

        if corrcoef < 0.95:
            self.log.warning(
                'Potentially inconsistent energy labels detected!\n'
                + 'The predicted energies for the training data are only weakly correlated with the reference labels (correlation coefficient {:.2f}). Note that correlation is independent of scale, which indicates that the issue is most likely not just a unit conversion error.\n\n'.format(
                    corrcoef
                )
                + ui.color_str('Troubleshooting tips:\n', bold=True)
                + ui.wrap_indent_str(
                    '(1) ',
                    'Verify the correct correspondence between geometries and labels in the provided dataset.',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(2) ',
                    'This issue might simply be a symptom of using too little training data, in which case your labels may well be correct.',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(3) ', 'Verify the consistency between energy and force labels.'
                )
                + '\n'
                + ui.wrap_indent_str(
                    '    - ', 'Correspondence between force and energy labels correct?'
                )
                + '\n'
                + ui.wrap_indent_str(
                    '    - ',
                    'Accuracy of forces (convergence of your ab-initio calculations)?',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '    - ',
                    'Was the same level of theory used to compute forces and energies?',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(4) ',
                    'Is the training data spread too broadly (i.e. weakly sampled transitions between example clusters)?',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(5) ', 'Are there duplicate geometries in the training data?'
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(6) ', 'Are there any corrupted data points (e.g. parsing errors)?'
                )
            )

        if np.abs(e_fact - 1) > 1e-1:
            self.log.warning(
                'Potentially inconsistent scales in energy vs. force labels detected!\n'
                + 'The integrated force predictions differ from the reference energy labels by a factor of ~{:.2f} (for the training data), meaning that this model will likely fail to predict energies accurately in real-world use.\n\n'.format(
                    e_fact
                )
                + ui.color_str('Troubleshooting tips:\n', bold=True)
                + ui.wrap_indent_str(
                    '(1) ', 'Verify consistency of units in energy and force labels.'
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(2) ',
                    'This issue might simply be a symptom of using too little training data, in which case your labels may well be correct.',
                )
                + '\n'
                + ui.wrap_indent_str(
                    '(3) ',
                    'Is the training data spread too broadly (i.e. weakly sampled transitions between example clusters)?',
                )
            )

        # Least squares estimate for integration constant.
        return np.sum(E_ref - E_pred) / E_ref.shape[0]

    def _assemble_kernel_mat(
        self,
        R_desc,
        R_d_desc,
        tril_perms_lin,
        sig,
        desc,
        use_E_cstr=False,
        col_idxs=np.s_[:],
        alloc_extra_rows=0,
        callback=None,
    ):
        r"""
        Compute force field kernel matrix.

        The Hessian of the Matern kernel is used with n = 2 (twice
        differentiable). Each row and column consists of matrix-valued blocks,
        which encode the interaction of one training point with all others.
        When multiprocessing is used, the result is assembled in shared memory
        (a global variable).

        Parameters
        ----------
            R_desc : :obj:`numpy.ndarray`
                Array containing the descriptor for each training point.
            R_d_desc : :obj:`numpy.ndarray`
                Array containing the gradient of the descriptor for
                each training point.
            tril_perms_lin : :obj:`numpy.ndarray`
                1D array containing all recovered permutations
                expanded as one large permutation to be applied to a
                tiled copy of the object to be permuted.
            sig : int
                Hyper-parameter :math:`\sigma` (kernel length scale).
            desc : :obj:`Desc`
                Descriptor object, made available to the worker processes.
            use_E_cstr : bool, optional
                True: include energy constraints in the kernel,
                False: default (s)GDML kernel.
            col_idxs : slice or :obj:`numpy.ndarray`, optional
                Columns of the kernel matrix to generate (default: all).
                Index arrays must be sorted in ascending order and must not
                contain duplicates.
            alloc_extra_rows : int, optional
                Number of additional rows to allocate below the kernel
                matrix. These rows are not filled by this function.
            callback : callable, optional
                Kernel assembly progress function. It is called with the
                current progress and the task size (number of completed
                and total columns). Once complete, it is called with
                `DONE` and the keyword argument `sec_disp_str`, a string
                with the time it took to assemble the kernel (empty if
                negligible).

        Returns
        -------
            :obj:`numpy.ndarray`
                Force field kernel matrix.
        """

        global glob

        # Note: Index arrays must be sorted in ascending order (unsorted arrays are not supported).
        # if not isinstance(col_idxs, slice):
        #    assert np.array_equal(col_idxs, np.sort(col_idxs))

        n_train, dim_d = R_d_desc.shape[:2]
        dim_i = 3 * int((1 + np.sqrt(8 * dim_d + 1)) / 2)

        # Determine size of kernel matrix.
        K_n_rows = n_train * dim_i

        # Account for additional rows (and columns) due to energy constraints in the kernel matrix.
        if use_E_cstr:
            K_n_rows += n_train

        if isinstance(col_idxs, slice):  # indexed by slice
            K_n_cols = len(range(*col_idxs.indices(K_n_rows)))
        else:  # indexed by list or array

            col_idxs = np.asarray(col_idxs)

            if len(col_idxs) != len(np.unique(col_idxs)):
                raise ValueError('Column indices must not contain duplicates.')

            # Note: Unsorted index arrays are not supported.
            if not np.array_equal(col_idxs, np.sort(col_idxs)):
                raise ValueError('Column indices must be sorted in ascending order.')

            K_n_cols = len(col_idxs)

        # Sanity check: the full kernel matrix is square, so requesting more
        # columns than rows means that indices beyond the valid range were given.
        if K_n_cols > K_n_rows:
            raise ValueError('Columns indexed beyond range.')

        exploit_sym = False
        cols_m_limit = None

        # Check if range is a subset of training points (as opposed to a subset of partials of multiple points).
        is_M_subset = (
            isinstance(col_idxs, slice)
            and (col_idxs.start is None or col_idxs.start % dim_i == 0)
            and (col_idxs.stop is None or col_idxs.stop % dim_i == 0)
            and col_idxs.step is None
        )
        if is_M_subset:
            M_slice_start = (
                None if col_idxs.start is None else int(col_idxs.start / dim_i)
            )
            M_slice_stop = None if col_idxs.stop is None else int(col_idxs.stop / dim_i)
            M_slice = slice(M_slice_start, M_slice_stop)

            J = range(*M_slice.indices(n_train + (n_train if use_E_cstr else 0)))

            if M_slice_start is None:
                exploit_sym = True
                cols_m_limit = M_slice_stop

        else:

            if isinstance(col_idxs, slice):
                # Use an index array, so that the boolean masking below works.
                col_idxs = np.arange(*col_idxs.indices(K_n_rows))

            # Separate column indices of force-force and force-energy constraints.
            cond = col_idxs >= (n_train * dim_i)
            ff_col_idxs, fe_col_idxs = col_idxs[~cond], col_idxs[cond]

            # M: number of training points, N: number of atoms

            n_idxs = np.concatenate(
                [np.mod(ff_col_idxs, dim_i), np.zeros(fe_col_idxs.shape, dtype=int)]
            )  # Column indices that go beyond force-force correlations need a different treatment.

            m_idxs = np.concatenate([np.array(ff_col_idxs) // dim_i, fe_col_idxs])
            m_idxs_uniq = np.unique(m_idxs)  # which points to include?

            m_n_idxs = [
                list(n_idxs[np.where(m_idxs == m_idx)]) for m_idx in m_idxs_uniq
            ]
            m_n_idxs_lens = [len(m_n_idx) for m_n_idx in m_n_idxs]

            m_n_idxs_lens.insert(0, 0)
            blk_start_idxs = list(
                np.cumsum(m_n_idxs_lens[:-1])
            )  # index within K at which each block starts

            # tuples: (block index in final K, global block index, indices of partials within block)
            J = list(zip(blk_start_idxs, m_idxs_uniq, m_n_idxs))

        if callback is not None:
            callback(0, 100)  # 0%

        if self._use_torch:
            if not _has_torch:
                raise ImportError(
                    'Optional PyTorch dependency not found! Please run \'pip install sgdml[torch]\' to install it or disable the PyTorch option.'
                )

            K = np.empty((K_n_rows + alloc_extra_rows, K_n_cols))

            if not isinstance(J, list):
                J = list(J)

            global torch_assemble_done
            torch_assemble_todo, torch_assemble_done = K_n_cols, 0

            def progress_callback(done):

                global torch_assemble_done
                torch_assemble_done += done

                if callback is not None:
                    callback(
                        torch_assemble_done,
                        torch_assemble_todo,
                        newline_when_done=False,
                    )

            start = timeit.default_timer()

            if _torch_cuda_is_available:
                torch_device = 'cuda'
            elif _torch_mps_is_available:
                torch_device = 'mps'
            else:
                torch_device = 'cpu'

            R_desc_torch = torch.from_numpy(R_desc).to(torch_device)  # N, d
            R_d_desc_torch = torch.from_numpy(R_d_desc).to(torch_device)

            from .torchtools import GDMLTorchAssemble

            torch_assemble = GDMLTorchAssemble(
                J,
                tril_perms_lin,
                sig,
                use_E_cstr,
                R_desc_torch,
                R_d_desc_torch,
                out=K[:K_n_rows, :],
                callback=progress_callback,
            )

            # Enable data parallelism
            n_gpu = torch.cuda.device_count()
            if n_gpu > 1:
                torch_assemble = torch.nn.DataParallel(torch_assemble)
            torch_assemble.to(torch_device)

            torch_assemble.forward(torch.arange(len(J)))
            del torch_assemble

            del R_desc_torch
            del R_d_desc_torch

            stop = timeit.default_timer()

            if callback is not None:
                dur_s = stop - start
                sec_disp_str = 'took {:.1f} s'.format(dur_s) if dur_s >= 0.1 else ''
                callback(DONE, sec_disp_str=sec_disp_str)

            return K

        K = mp.RawArray('d', (K_n_rows + alloc_extra_rows) * K_n_cols)
        glob['K'], glob['K_shape'] = K, (K_n_rows + alloc_extra_rows, K_n_cols)
        glob['R_desc'], glob['R_desc_shape'] = _share_array(R_desc, 'd')
        glob['R_d_desc'], glob['R_d_desc_shape'] = _share_array(R_d_desc, 'd')

        glob['desc_func'] = desc

        start = timeit.default_timer()

        pool = None
        map_func = map
        if self._max_processes != 1 and mp.cpu_count() > 1:
            pool = Pool(
                (self._max_processes or mp.cpu_count()) - 1
            )  # exclude main process
            map_func = pool.imap_unordered

        todo, done = K_n_cols, 0
        for done_wkr in map_func(
            partial(
                _assemble_kernel_mat_wkr,
                tril_perms_lin=tril_perms_lin,
                sig=sig,
                use_E_cstr=use_E_cstr,
                exploit_sym=exploit_sym,
                cols_m_limit=cols_m_limit,
            ),
            J,
        ):
            done += done_wkr

            if callback is not None:
                callback(done, todo, newline_when_done=False)

        if pool is not None:
            pool.close()
            pool.join()  # Wait for the worker processes to terminate (to measure total runtime correctly).
            pool = None

        stop = timeit.default_timer()

        if callback is not None:
            dur_s = stop - start
            sec_disp_str = 'took {:.1f} s'.format(dur_s) if dur_s >= 0.1 else ''
            callback(DONE, sec_disp_str=sec_disp_str)

        # Release some memory.
        glob.pop('K', None)
        glob.pop('R_desc', None)
        glob.pop('R_d_desc', None)

        return np.frombuffer(K).reshape((K_n_rows + alloc_extra_rows), K_n_cols)

    def draw_strat_sample(self, T, n, excl_idxs=None):
        """
        Draw sample from dataset that preserves its original distribution.

        The distribution is estimated from a histogram where the bin size is
        determined using the Freedman-Diaconis rule. This rule is designed to
        minimize the difference between the area under the empirical
        probability distribution and the area under the theoretical
        probability distribution. A reduced histogram is then constructed by
        sampling uniformly in each bin. It is intended to populate all bins
        with at least one sample in the reduced histogram, even for small
        training sizes.

        Parameters
        ----------
            T : :obj:`numpy.ndarray`
                Dataset to sample from.
            n : int
                Number of examples.
            excl_idxs : :obj:`numpy.ndarray`, optional
                Array of indices to exclude from sample.

        Returns
        -------
            :obj:`numpy.ndarray`
                Array of indices that form the sample.
        """

        if excl_idxs is None or len(excl_idxs) == 0:
            excl_idxs = None

        if n == 0:
            return np.array([], dtype=np.uint)

        if T.size == n:  # TODO: this only works if excl_idxs=None
            assert excl_idxs is None
            return np.arange(n)

        if n == 1:
            idxs_all_non_excl = np.setdiff1d(
                np.arange(T.size), excl_idxs, assume_unique=True
            )
            return np.array([np.random.choice(idxs_all_non_excl)])

        # Freedman-Diaconis rule
        h = 2 * np.subtract(*np.percentile(T, [75, 25])) / np.cbrt(n)
        n_bins = int(np.ceil((np.max(T) - np.min(T)) / h)) if h > 0 else 1
        n_bins = min(
            n_bins, int(n / 2)
        )  # Limit number of bins to half of requested subset size.

        bins = np.linspace(np.min(T), np.max(T), n_bins, endpoint=False)
        idxs = np.digitize(T, bins)

        # Exclude restricted indices.
        if excl_idxs is not None and excl_idxs.size > 0:
            idxs[excl_idxs] = n_bins + 1  # Impossible bin.

        uniq_all, cnts_all = np.unique(idxs, return_counts=True)

        # Remove restricted bin.
        if excl_idxs is not None and excl_idxs.size > 0:
            excl_bin_idx = np.where(uniq_all == n_bins + 1)
            cnts_all = np.delete(cnts_all, excl_bin_idx)
            uniq_all = np.delete(uniq_all, excl_bin_idx)

        # Compute reduced bin counts.
        reduced_cnts = np.ceil(cnts_all / np.sum(cnts_all, dtype=float) * n).astype(int)
        reduced_cnts = np.minimum(
            reduced_cnts, cnts_all
        )  # limit reduced_cnts to what is available in cnts_all

        # Reduce/increase bin counts to desired total number of points.
        reduced_cnts_delta = n - np.sum(reduced_cnts)

        while np.abs(reduced_cnts_delta) > 0:

            # How many members can we remove from an arbitrary bucket, without any bucket with more than one member going to zero?
            max_bin_reduction = np.min(reduced_cnts[np.where(reduced_cnts > 1)]) - 1

            # Generate additional bin members to fill up/drain bucket counts of subset. This array contains (repeated) bucket IDs.
            outstanding = np.random.choice(
                uniq_all,
                min(max_bin_reduction, np.abs(reduced_cnts_delta)),
                p=(reduced_cnts - 1) / np.sum(reduced_cnts - 1, dtype=float),
                replace=True,
            )
            uniq_outstanding, cnts_outstanding = np.unique(
                outstanding, return_counts=True
            )  # Aggregate bucket IDs.

            outstanding_bucket_idx = np.where(
                np.isin(uniq_all, uniq_outstanding, assume_unique=True)
            )[0]  # Bucket IDs to indices.
            reduced_cnts[outstanding_bucket_idx] += (
                np.sign(reduced_cnts_delta) * cnts_outstanding
            )
            reduced_cnts_delta = n - np.sum(reduced_cnts)

        # Draw examples for each bin.
        idxs_train = np.empty((0,), dtype=int)
        for uniq_idx, bin_cnt in zip(uniq_all, reduced_cnts):
            idx_in_bin_all = np.where(idxs.ravel() == uniq_idx)[0]
            idxs_train = np.append(
                idxs_train, np.random.choice(idx_in_bin_all, bin_cnt, replace=False)
            )

        return idxs_train

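# Illustrative sketch (not part of the sGDML API): the two least-squares fits
# used by the `_recov_int_const` method above. Fitting E_ref ~ a * E_pred + b
# yields the scale factor a (`e_fact`), which should be close to 1; a negative
# value hints at flipped force signs. The integration constant is the mean
# residual, i.e. the least-squares estimate of a constant offset c in
# E_ref ~ E_pred + c. The helper name and its return tuple are hypothetical.
def _example_int_const_and_scale(E_pred, E_ref):
    E_pred = np.asarray(E_pred, dtype=float)
    E_ref = np.asarray(E_ref, dtype=float)
    A = np.column_stack((E_pred, np.ones(E_ref.shape)))
    e_fact = np.linalg.lstsq(A, E_ref, rcond=None)[0][0]  # slope of the affine fit
    corrcoef = np.corrcoef(E_ref, E_pred)[0, 1]  # scale-independent consistency check
    c = np.sum(E_ref - E_pred) / E_ref.shape[0]  # offset with the slope fixed to 1
    return c, e_fact, corrcoef


# Usage (hypothetical values): predictions that are offset by -100 from the
# labels give c = -100, e_fact = 1 and a correlation of 1.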

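# Illustrative sketch (not part of the sGDML API): the bin count computed at
# the start of the `draw_strat_sample` method above. The Freedman-Diaconis bin
# width is h = 2 * IQR / n^(1/3); as in that method, n is the requested sample
# size rather than the size of `T`, and the number of bins is capped at n / 2.
# The helper name is hypothetical.
def _example_fd_n_bins(T, n):
    T = np.asarray(T, dtype=float)
    h = 2 * np.subtract(*np.percentile(T, [75, 25])) / np.cbrt(n)  # bin width
    n_bins = int(np.ceil((np.max(T) - np.min(T)) / h)) if h > 0 else 1
    return min(n_bins, int(n / 2))  # at most half of the requested sample size
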
================================================
FILE: sgdml/utils/__init__.py
================================================


================================================
FILE: sgdml/utils/desc.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2022 Stefan Chmiela, Luis Galvez
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import numpy as np
import scipy as sp
from scipy import spatial

import multiprocessing as mp

Pool = mp.get_context('fork').Pool

from functools import partial
import timeit

try:
    import torch
except ImportError:
    _has_torch = False
else:
    _has_torch = True


def _pbc_diff(diffs, lat_and_inv, use_torch=False):
    """
    Wrap difference vectors into the supercell (minimum-image convention).

    Parameters
    ----------
        diffs : :obj:`numpy.ndarray`
            N x 3 matrix of N pairwise differences between vectors `u - v`
        lat_and_inv : tuple of :obj:`numpy.ndarray`
            Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.
        use_torch : boolean, optional
            Enable, if the inputs are PyTorch objects.

    Returns
    -------
        :obj:`numpy.ndarray`
            N x 3 matrix of wrapped differences (`diffs` is modified in place).
    """

    lat, lat_inv = lat_and_inv

    if use_torch and not _has_torch:
        raise ImportError(
            'Optional PyTorch dependency not found! Please run \'pip install sgdml[torch]\' to install it or disable the PyTorch option.'
        )

    if use_torch:
        c = lat_inv.mm(diffs.t())
        diffs -= lat.mm(c.round()).t()
    else:
        c = lat_inv.dot(diffs.T)
        diffs -= lat.dot(np.around(c)).T

    return diffs

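# Illustrative sketch (not part of the sGDML API): the minimum-image convention
# applied by `_pbc_diff`, written out for a single difference vector. The
# lattice vectors are stored as columns, as in `_pbc_diff`. The hypothetical
# helper maps the difference to fractional coordinates, removes whole lattice
# translations by rounding, and maps the result back to Cartesian coordinates.
def _example_min_image(diff, lat):
    lat_inv = np.linalg.inv(lat)
    frac = lat_inv.dot(diff)  # fractional coordinates of the difference
    return diff - lat.dot(np.around(frac))  # subtract nearest lattice translation


# Usage (hypothetical values): in a cubic 10 x 10 x 10 cell, atoms at x = 9.5
# and x = 0.5 are 1.0 apart across the boundary, not 9.0:
# _example_min_image(np.array([9.0, 0.0, 0.0]), 10.0 * np.eye(3)) -> [-1, 0, 0]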

def _pdist(r, lat_and_inv=None):
    """
    Compute pairwise Euclidean distance matrix between all atoms.

    Parameters
    ----------
        r : :obj:`numpy.ndarray`
            Array of size 3N containing the Cartesian coordinates of
            each atom.
        lat_and_inv : tuple of :obj:`numpy.ndarray`, optional
            Tuple of 3x3 matrix containing lattice vectors as columns and its inverse.

    Returns
    -------
        :obj:`numpy.ndarray`
            Array of size N(N-1)/2 containing the lower triangle (in row-major
            order) of the pairwise distance matrix between atoms.
    """

    r = r.reshape(-1, 3)
    n_atoms = r.shape[0]

    if lat_and_inv is None:
        pdist = sp.spatial.distance.pdist(r, 'euclidean')
    else:
        pdist = sp.spatial.distance.pdist(
            r, lambda u, v: np.linalg.norm(_pbc_diff(u - v, lat_and_inv))
        )

    tril_idxs = np.tril_indices(n_atoms, k=-1)
    return sp.spatial.distance.squareform(pdist, checks=False)[tril_idxs]

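# Illustrative sketch (not part of the sGDML API): `_pdist` returns distances
# in the row-major order of `np.tril_indices(n_atoms, k=-1)`, i.e. for the pairs
# (1, 0), (2, 0), (2, 1), (3, 0), ... This differs from the order used by
# `scipy.spatial.distance.pdist` ((0, 1), (0, 2), (0, 3), (1, 2), ...) as soon
# as there are more than three atoms. The hypothetical helper below computes
# the inverse-distance descriptor (cf. `_r_to_desc`) directly in the
# lower-triangle order, assuming no periodic boundary conditions.
def _example_inv_dist_desc(r):
    r = np.asarray(r, dtype=float).reshape(-1, 3)
    i, j = np.tril_indices(r.shape[0], k=-1)  # lower triangle, row-major
    dists = np.linalg.norm(r[i] - r[j], axis=1)
    return 1.0 / dists
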

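def _squareform(vec_or_mat):
    """
    Convert between the condensed and the square form of a symmetric matrix.

    A 1D input of size N(N-1)/2 is interpreted as the lower triangle (in the
    row-major order of `np.tril_indices(N, k=-1)`) and expanded to a
    symmetric N x N matrix with zero diagonal. A square 2D input is reduced
    to its lower triangle in the same order.
    """
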
    # vector to matrix representation
    if vec_or_mat.ndim == 1:

        n_tril = vec_or_mat.size
        n = int((1 + np.sqrt(8 * n_tril + 1)) / 2)

        i, j = np.tril_indices(n, k=-1)

        mat = np.zeros((n, n))
        mat[i, j] = vec_or_mat
        mat[j, i] = vec_or_mat

        return mat

    else:  # matrix to vector

        assert vec_or_mat.shape[0] == vec_or_mat.shape[1]  # matrix is square

        n = vec_or_mat.shape[0]
        i, j = np.tril_indices(n, k=-1)

        return vec_or_mat[i, j]


def _r_to_desc(r, pdist):
    """
    Generate descriptor for a set of atom positions in Cartesian
    coordinates.

    Parameters
    ----------
        r : :obj:`numpy.ndarray`
            Array of size 3N containing the Cartesian coordinates of
            each atom.
        pdist : :obj:`numpy.ndarray`
            Array of size N(N-1)/2 containing the Euclidean distance
            (2-norm) for each pair of atoms, as returned by `_pdist`.

    Returns
    -------
        :obj:`numpy.ndarray`
            Descriptor representation as 1D array of size N(N-1)/2
    """

    # Add singleton dimension if input is (,3N).
    if r.ndim == 1:
        r = r[None, :]

    return 1.0 / pdist


def _r_to_d_desc(r, pdist, lat_and_inv=None):
    """
    Generate descriptor Jacobian for a set of atom positions in
    Cartesian coordinates.

    This method can apply the minimum-image convention as periodic
    boundary condition for distances between atoms, given the lattice vectors.

    Parameters
    ----------
        r : :obj:`numpy.ndarray`
            Array of size 3N containing the Cartesian coordinates of
            each atom.
        pdist : :obj:`numpy.ndarray`
            Array of size N(N-1)/2 containing the Euclidean distance
            (2-norm) for each pair of atoms, as returned by `_pdist`.
        lat_and_inv : tuple of :obj:`numpy.ndarray`, optional
            Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.

    Returns
    -------
        :obj:`numpy.ndarray`
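            Array of size N(N-1)/2 x 3 containing the nonzero partial
            derivatives of the descriptor: for each atom pair (i, j) with
            i > j, the gradient of 1/r_ij with respect to the position of
            atom j. The gradient with respect to atom i has the opposite sign.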
    """

    r = r.reshape(-1, 3)
    pdiff = r[:, None] - r[None, :]  # pairwise differences ri - rj

    n_atoms = r.shape[0]
    i, j = np.tril_indices(n_atoms, k=-1)

    pdiff = pdiff[i, j, :]  # lower triangular

    if lat_and_inv is not None:
        pdiff = _pbc_diff(pdiff, lat_and_inv)

    d_desc_elem = pdiff / (pdist**3)[:, None]

    return d_desc_elem

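# Illustrative sketch (not part of the sGDML API): each row of the array
# returned by `_r_to_d_desc` is (r_i - r_j) / |r_i - r_j|^3 for a pair (i, j)
# with i > j. This equals the gradient of the descriptor entry 1 / |r_i - r_j|
# with respect to the position of atom j; the gradient with respect to atom i
# has the opposite sign. The hypothetical helper below checks this against a
# central finite difference for a single atom pair (no periodic boundaries).
def _example_check_inv_dist_grad(r_i, r_j, h=1e-6):
    r_i = np.asarray(r_i, dtype=float)
    r_j = np.asarray(r_j, dtype=float)
    diff = r_i - r_j
    analytic = diff / np.linalg.norm(diff) ** 3  # same expression as `_r_to_d_desc`
    numeric = np.empty(3)
    for k in range(3):  # perturb each Cartesian coordinate of atom j
        step = np.zeros(3)
        step[k] = h
        f_plus = 1.0 / np.linalg.norm(r_i - (r_j + step))
        f_minus = 1.0 / np.linalg.norm(r_i - (r_j - step))
        numeric[k] = (f_plus - f_minus) / (2.0 * h)
    return analytic, numeric
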

def _from_r(r, lat_and_inv=None):
    """
    Generate descriptor and its Jacobian for one molecular geometry
    in Cartesian coordinates.

    Parameters
    ----------
        r : :obj:`numpy.ndarray`
            Array of size 3N containing the Cartesian coordinates of
            each atom.
        lat_and_inv : tuple of :obj:`numpy.ndarray`, optional
            Tuple of 3 x 3 matrix containing lattice vectors as columns and its inverse.

    Returns
    -------
        :obj:`numpy.ndarray`
            Descriptor representation as 1D array of size N(N-1)/2
        :obj:`numpy.ndarray`
            Array of size N(N-1)/2 x 3 containing the nonzero partial
            derivatives of the descriptor (see `_r_to_d_desc`).
    """

    # Add singleton dimension if input is (,3N).
    if r.ndim == 1:
        r = r[None, :]

    pd = _pdist(r, lat_and_inv)

    r_desc = _r_to_desc(r, pd)
    r_d_desc = _r_to_d_desc(r, pd, lat_and_inv)

    return r_desc, r_d_desc


class Desc(object):
    # def __init__(self, n_atoms, interact_cut_off=None, max_processes=None):
    def __init__(self, n_atoms, max_processes=None):
        """
        Generate descriptors and their Jacobians for molecular geometries,
        including support for periodic boundary conditions.

        Parameters
        ----------
            n_atoms : int
                Number of atoms in the represented system.
            max_processes : int, optional
                Limit the max. number of processes. Otherwise
                all CPU cores are used.
        """

        self.n_atoms = n_atoms
        self.dim_i = 3 * n_atoms

        # Size of the resulting descriptor vector.
        self.dim = (n_atoms * (n_atoms - 1)) // 2

        self.tril_indices = np.tril_indices(n_atoms, k=-1)

        # Precompute indices for nonzero entries in descriptor derivatives:
        # row a lists the N-1 descriptor elements that depend on atom a.
        rows, cols = self.tril_indices
        self.d_desc_mask = np.zeros((n_atoms, n_atoms - 1), dtype=int)
        for a in range(n_atoms):  # for each partial derivative
            self.d_desc_mask[a, :] = np.concatenate(
                [np.where(rows == a)[0], np.where(cols == a)[0]]
            )

        self.dim_range = np.arange(self.dim)  # [0, 1, ..., dim-1]

        # Precompute (M, A) index pairs that address all off-diagonal entries
        # of an N x N matrix (used by `d_desc_to_comp`).

        self.M = np.arange(1, n_atoms)  # indexes matrix row-wise, skipping diagonal
        for a in range(1, n_atoms):
            self.M = np.concatenate((self.M, np.delete(np.arange(n_atoms), a)))

        self.A = np.repeat(
            np.arange(n_atoms), n_atoms - 1
        )  # [0, 0, ..., 1, 1, ..., 2, 2, ...]

        self.max_processes = max_processes

    def from_R(self, R, lat_and_inv=None, max_processes=None, callback=None):
        """
        Generate descriptor and its Jacobian for multiple molecular geometries
        in Cartesian coordinates.

        Parameters
        ----------
            R : :obj:`numpy.ndarray`
                Array of size M x 3N containing the Cartesian coordinates of
                each atom.
            lat_and_inv : tuple of :obj:`numpy.ndarray`, optional
                Tuple of a 3 x 3 matrix containing the lattice vectors as
                columns and its inverse.
            max_processes : int, optional
                Limit the max. number of processes. Otherwise
                all CPU cores are used. This parameter overrides the global
                setting chosen during initialization.
            callback : callable, optional
                Descriptor and descriptor Jacobian generation status.
                    current : int
                        Current progress (number of completed descriptors).
                    total : int
                        Task size (total number of descriptors to create).
                    sec_disp_str : :obj:`str`, optional
                        Once complete, this string contains the
                        time it took to complete this task (seconds).

        Returns
        -------
            :obj:`numpy.ndarray`
                Array of size M x N(N-1)/2 containing the descriptor representation
                for each geometry.
            :obj:`numpy.ndarray`
                Array of size M x N(N-1)/2 x 3 containing the compressed
                descriptor Jacobian for each geometry (see `d_desc_from_comp`).
        """

        # Add singleton dimension if input is (3N,).
        if R.ndim == 1:
            R = R[None, :]

        M = R.shape[0]
        if M == 1:
            return _from_r(R, lat_and_inv)

        R_desc = np.empty([M, self.dim])
        R_d_desc = np.empty([M, self.dim, 3])

        # Generate descriptor and their Jacobians
        start = timeit.default_timer()

        pool = None
        map_func = map
        max_processes = max_processes or self.max_processes
        if max_processes != 1 and mp.cpu_count() > 1:
            pool = Pool((max_processes or mp.cpu_count()) - 1)  # exclude main process
            map_func = pool.imap

        for i, r_desc_r_d_desc in enumerate(
            map_func(partial(_from_r, lat_and_inv=lat_and_inv), R)
        ):
            R_desc[i, :], R_d_desc[i, :, :] = r_desc_r_d_desc

            if callback is not None and i < M - 1:
                callback(i, M - 1)

        if pool is not None:
            pool.close()
            pool.join()  # Wait for the worker processes to terminate (to measure total runtime correctly).
            pool = None

        stop = timeit.default_timer()

        if callback is not None:
            dur_s = stop - start
            sec_disp_str = 'took {:.1f} s'.format(dur_s) if dur_s >= 0.1 else ''
            callback(M, M, sec_disp_str=sec_disp_str)

        return R_desc, R_d_desc

    # Multiplies the compressed descriptor Jacobian(s) with 3N-vector(s) from the right.
    def d_desc_dot_vec(self, R_d_desc, vecs, overwrite_vecs=False):

        if R_d_desc.ndim == 2:
            R_d_desc = R_d_desc[None, ...]

        if vecs.ndim == 1:
            vecs = vecs[None, ...]

        i, j = self.tril_indices

        vecs = vecs.reshape(vecs.shape[0], -1, 3)

        einsum = np.einsum
        if _has_torch and torch.is_tensor(R_d_desc):
            assert torch.is_tensor(vecs)
            einsum = torch.einsum

        return einsum('...ij,...ij->...i', R_d_desc, vecs[:, j, :] - vecs[:, i, :])

    # Multiplies the compressed descriptor Jacobian(s) with N(N-1)/2-vector(s) from the left.
    def vec_dot_d_desc(self, R_d_desc, vecs, out=None):

        if R_d_desc.ndim == 2:
            R_d_desc = R_d_desc[None, ...]

        if vecs.ndim == 1:
            vecs = vecs[None, ...]

        # Either multiple Jacobians or multiple vectors at once, not both
        # (unless both counts match, in which case the products are taken pairwise).
        assert (
            R_d_desc.shape[0] == 1
            or vecs.shape[0] == 1
            or R_d_desc.shape[0] == vecs.shape[0]
        )

        n = np.max((R_d_desc.shape[0], vecs.shape[0]))
        i, j = self.tril_indices

        out = np.zeros((n, self.n_atoms, self.n_atoms, 3))
        out[:, i, j, :] = R_d_desc * vecs[..., None]
        out[:, j, i, :] = -out[:, i, j, :]
        return out.sum(axis=1).reshape(n, -1)

        # if out is None or out.shape != (n, self.n_atoms*3):
        #    out = np.zeros((n, self.n_atoms*3))

        # R_d_desc_full = np.zeros((self.n_atoms, self.n_atoms, 3))
        # for a in range(n):

        #   R_d_desc_full[i, j, :] = R_d_desc * vecs[a, :, None]
        #    R_d_desc_full[j, i, :] = -R_d_desc_full[i, j, :]
        #    out[a,:] = R_d_desc_full.sum(axis=0).ravel()

        # return out

    def d_desc_from_comp(self, R_d_desc, out=None):
        """
        Convert a compressed representation of a descriptor Jacobian back
        to its full representation.

        The compressed representation stores only the nonzero entries: every
        descriptor element depends on exactly two atoms, whose partial
        derivatives differ only in sign, so each Jacobian takes
        N(N-1)/2 x 3 values instead of N(N-1)/2 x 3N.

        Parameters
        ----------
            R_d_desc : :obj:`numpy.ndarray` or :obj:`torch.tensor`
                Array of size M x N(N-1)/2 x 3 containing the compressed
                descriptor Jacobian.
            out : :obj:`numpy.ndarray` or :obj:`torch.tensor`, optional
                Output argument. It must have the same type and size as the
                array that would be returned if it was not used.

        Note
        ----
            If used, the output argument must be initialized with zeros!

        Returns
        -------
            :obj:`numpy.ndarray` or :obj:`torch.tensor`
                Array of size M x N(N-1)/2 x 3N containing the full
                representation.
        """

        if R_d_desc.ndim == 2:
            R_d_desc = R_d_desc[None, ...]

        n = R_d_desc.shape[0]
        i, j = self.tril_indices

        if out is None:
            if _has_torch and torch.is_tensor(R_d_desc):
                device = R_d_desc.device
                dtype = R_d_desc.dtype
                out = torch.zeros(
                    (n, self.dim, self.n_atoms, 3), device=device, dtype=dtype
                )
            else:
                out = np.zeros((n, self.dim, self.n_atoms, 3))
        else:
            out = out.reshape(n, self.dim, self.n_atoms, 3)

        out[:, self.dim_range, j, :] = R_d_desc
        out[:, self.dim_range, i, :] = -R_d_desc

        return out.reshape(-1, self.dim, self.dim_i)

    def d_desc_to_comp(self, R_d_desc):
        """
        Convert a descriptor Jacobian to a compressed representation.

        The compressed representation stores only the nonzero entries: every
        descriptor element depends on exactly two atoms, whose partial
        derivatives differ only in sign, so each Jacobian takes
        N(N-1)/2 x 3 values instead of N(N-1)/2 x 3N.

        Parameters
        ----------
            R_d_desc : :obj:`numpy.ndarray`
                Array of size M x N(N-1)/2 x 3N containing the descriptor
                Jacobian.

        Returns
        -------
            :obj:`numpy.ndarray`
                Array of size M x N(N-1)/2 x 3 containing the compressed
                representation.
        """

        # Add singleton dimension for single inputs.
        if R_d_desc.ndim == 2:
            R_d_desc = R_d_desc[None, ...]

        n = R_d_desc.shape[0]
        n_atoms = R_d_desc.shape[2] // 3

        R_d_desc = R_d_desc.reshape(n, -1, n_atoms, 3)

        ret = np.zeros((n, n_atoms, n_atoms, 3))
        ret[:, self.M, self.A, :] = R_d_desc[:, self.d_desc_mask.ravel(), self.A, :]

        # Keep the strictly lower triangle (same pair ordering as the descriptor).
        i, j = self.tril_indices
        return ret[:, i, j, :]

    @staticmethod
    def perm(perm):
        """
        Convert atom permutation to descriptor permutation.

        A permutation of N atoms is converted to a permutation that acts on
        the corresponding descriptor representation. Applying the converted
        permutation to a descriptor is equivalent to permuting the atoms
        first and then generating the descriptor.

        Parameters
        ----------
            perm : :obj:`numpy.ndarray`
                Array of size N containing the atom permutation.

        Returns
        -------
            :obj:`numpy.ndarray`
                Array of size N(N-1)/2 containing the corresponding
                descriptor permutation.
        """

        n = len(perm)

        rest = np.zeros((n, n))
        rest[np.tril_indices(n, -1)] = list(range((n**2 - n) // 2))
        rest = rest + rest.T
        rest = rest[perm, :]
        rest = rest[:, perm]

        return rest[np.tril_indices(n, -1)].astype(int)
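

# Illustrative sketch (hypothetical helpers, not part of the sGDML API): checks
# the property documented in `Desc.perm`, i.e. that permuting the atoms and then
# computing the descriptor equals permuting the descriptor of the original
# geometry with the converted index array. It assumes the inverse-distance
# descriptor and mirrors the symmetric lookup-table idea of `Desc.perm`, using
# an integer table instead of a float one.
import numpy as np


def _sketch_desc_perm(perm):
    n = len(perm)
    rows, cols = np.tril_indices(n, k=-1)
    table = np.zeros((n, n), dtype=int)
    table[rows, cols] = np.arange(rows.size)  # pair (a, b) -> descriptor index
    table = table + table.T  # symmetric lookup; the diagonal is never used
    return table[np.ix_(perm, perm)][rows, cols]


def _sketch_inv_dist(xyz):
    rows, cols = np.tril_indices(xyz.shape[0], k=-1)
    return 1.0 / np.linalg.norm(xyz[rows] - xyz[cols], axis=1)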


================================================
FILE: sgdml/utils/io.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2021 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import argparse
import hashlib
import os
import re
import sys

import numpy as np

from . import ui

_z_str_to_z_dict = {
    'H': 1,
    'He': 2,
    'Li': 3,
    'Be': 4,
    'B': 5,
    'C': 6,
    'N': 7,
    'O': 8,
    'F': 9,
    'Ne': 10,
    'Na': 11,
    'Mg': 12,
    'Al': 13,
    'Si': 14,
    'P': 15,
    'S': 16,
    'Cl': 17,
    'Ar': 18,
    'K': 19,
    'Ca': 20,
    'Sc': 21,
    'Ti': 22,
    'V': 23,
    'Cr': 24,
    'Mn': 25,
    'Fe': 26,
    'Co': 27,
    'Ni': 28,
    'Cu': 29,
    'Zn': 30,
    'Ga': 31,
    'Ge': 32,
    'As': 33,
    'Se': 34,
    'Br': 35,
    'Kr': 36,
    'Rb': 37,
    'Sr': 38,
    'Y': 39,
    'Zr': 40,
    'Nb': 41,
    'Mo': 42,
    'Tc': 43,
    'Ru': 44,
    'Rh': 45,
    'Pd': 46,
    'Ag': 47,
    'Cd': 48,
    'In': 49,
    'Sn': 50,
    'Sb': 51,
    'Te': 52,
    'I': 53,
    'Xe': 54,
    'Cs': 55,
    'Ba': 56,
    'La': 57,
    'Ce': 58,
    'Pr': 59,
    'Nd': 60,
    'Pm': 61,
    'Sm': 62,
    'Eu': 63,
    'Gd': 64,
    'Tb': 65,
    'Dy': 66,
    'Ho': 67,
    'Er': 68,
    'Tm': 69,
    'Yb': 70,
    'Lu': 71,
    'Hf': 72,
    'Ta': 73,
    'W': 74,
    'Re': 75,
    'Os': 76,
    'Ir': 77,
    'Pt': 78,
    'Au': 79,
    'Hg': 80,
    'Tl': 81,
    'Pb': 82,
    'Bi': 83,
    'Po': 84,
    'At': 85,
    'Rn': 86,
    'Fr': 87,
    'Ra': 88,
    'Ac': 89,
    'Th': 90,
    'Pa': 91,
    'U': 92,
    'Np': 93,
    'Pu': 94,
    'Am': 95,
    'Cm': 96,
    'Bk': 97,
    'Cf': 98,
    'Es': 99,
    'Fm': 100,
    'Md': 101,
    'No': 102,
    'Lr': 103,
    'Rf': 104,
    'Db': 105,
    'Sg': 106,
    'Bh': 107,
    'Hs': 108,
    'Mt': 109,
    'Ds': 110,
    'Rg': 111,
    'Cn': 112,
    'Uuq': 114,
    'Uuh': 116,
}
_z_to_z_str_dict = {v: k for k, v in _z_str_to_z_dict.items()}


def z_str_to_z(z_str):
    return np.array([_z_str_to_z_dict[x] for x in z_str])


def z_to_z_str(z):
    return [_z_to_z_str_dict[int(x)] for x in z]


def train_dir_name(dataset, n_train, use_sym, use_E, use_E_cstr):

    theory_level_str = re.sub(r'[^\w\-_\.]', '.', str(dataset['theory']))
    theory_level_str = re.sub(r'\.\.', '.', theory_level_str)

    sym_str = '-sym' if use_sym else ''
    # cprsn_str = '-cprsn' if use_cprsn else ''
    noE_str = '-noE' if not use_E else ''
    Ecstr_str = '-Ecstr' if use_E_cstr else ''

    return 'sgdml_cv_%s-%s-train%d%s%s%s' % (
        dataset['name'].astype(str),
        theory_level_str,
        n_train,
        sym_str,
        # cprsn_str,
        noE_str,
        Ecstr_str,
    )


def task_file_name(task):

    n_train = task['idxs_train'].shape[0]
    n_perms = task['perms'].shape[0]
    sig = np.squeeze(task['sig'])

    return 'task-train%d-sym%d-sig%04d.npz' % (n_train, n_perms, sig)


def model_file_name(task_or_model, is_extended=False):

    n_train = task_or_model['idxs_train'].shape[0]
    n_perms = task_or_model['perms'].shape[0]
    sig = np.squeeze(task_or_model['sig'])

    if is_extended:
        dataset = np.squeeze(task_or_model['dataset_name'])
        theory_level_str = re.sub(
            r'[^\w\-_\.]', '.', str(np.squeeze(task_or_model['dataset_theory']))
        )
        theory_level_str = re.sub(r'\.\.', '.', theory_level_str)
        return '%s-%s-train%d-sym%d.npz' % (dataset, theory_level_str, n_train, n_perms)
    return 'model-train%d-sym%d-sig%04d.npz' % (n_train, n_perms, sig)
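

# Illustrative sketch (hypothetical helper, not part of sGDML): the file-name
# sanitization used by `train_dir_name` and `model_file_name`. Every character
# outside [A-Za-z0-9_.-] becomes '.', then '..' pairs are collapsed. A single
# pass of re.sub(r'\.\.', '.', ...) only halves longer runs of dots; the
# `collapse_all` variant (an assumption, not what sGDML does) uses r'\.{2,}'.
import re


def _sketch_sanitize_theory(theory, collapse_all=False):
    s = re.sub(r'[^\w\-_\.]', '.', str(theory))
    return re.sub(r'\.{2,}' if collapse_all else r'\.\.', '.', s)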


def dataset_md5(dataset):

    md5_hash = hashlib.md5()

    keys = ['z', 'R']
    if 'E' in dataset:
        keys.append('E')
    keys.append('F')

    # only include new extra keys in fingerprint for 'modern' dataset files
    # 'code_version' was included from 0.4.0.dev1
    # opt_keys = ['lattice', 'e_unit', 'E_min', 'E_max', 'E_mean', 'E_var', 'f_unit', 'F_min', 'F_max', 'F_mean', 'F_var']
    # for k in opt_keys:
    #    if k in dataset:
    #        keys.append(k)

    for k in keys:
        d = dataset[k]
        if type(d) is np.ndarray:
            d = d.ravel()
        md5_hash.update(hashlib.md5(d).digest())

    return md5_hash.hexdigest().encode('utf-8')
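

# Illustrative sketch (hypothetical helper, not part of sGDML): the
# digest-of-digests scheme used by `dataset_md5`. Each array is flattened, its
# raw bytes are hashed, and the per-key digests feed an outer MD5. Because raw
# bytes are hashed, the fingerprint depends on the dtype as well as the values.
# Unlike `dataset_md5`, this sketch returns a str and ignores the optional 'E'.
import hashlib

import numpy as np


def _sketch_fingerprint(arrays, keys=('z', 'R', 'F')):
    outer = hashlib.md5()
    for k in keys:
        d = np.ascontiguousarray(np.asarray(arrays[k]).ravel())  # C-contiguous buffer
        outer.update(hashlib.md5(d).digest())
    return outer.hexdigest()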


### FILES

# Read geometry file (xyz format).
# R: (n_geo, 3*n_atoms)
# z: (n_atoms,)
def read_xyz(file_path):

    with open(file_path, 'r') as f:
        n_atoms = None

        R, z = [], []
        for i, line in enumerate(f):
            line = line.strip()
            if not n_atoms:
                n_atoms = int(line)

            cols = line.split()
            file_i, line_i = divmod(i, n_atoms + 2)
            if line_i >= 2:
                R.append(list(map(float, cols[1:4])))
                if file_i == 0:  # first molecule
                    z.append(_z_str_to_z_dict[cols[0]])

        R = np.array(R).reshape(-1, 3 * n_atoms)
        z = np.array(z)

    return R, z
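

# Illustrative sketch (hypothetical helper, not part of sGDML): the frame
# layout that `read_xyz` relies on. A multi-frame XYZ file is a sequence of
# blocks of N + 2 lines (atom count, comment, then N lines "symbol x y z"), so
# line i belongs to frame i // (N + 2) at offset i % (N + 2). This version
# parses an in-memory string and returns element symbols instead of atomic
# numbers to stay self-contained.
import numpy as np


def _sketch_parse_xyz(text):
    """Parse multi-frame XYZ text into (R, symbols); R has shape (n_frames, 3N)."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    block = n_atoms + 2  # count line + comment line + N atom lines
    n_frames = len(lines) // block
    R, symbols = [], []
    for frame in range(n_frames):
        for line in lines[frame * block + 2 : (frame + 1) * block]:
            cols = line.split()
            R.append([float(x) for x in cols[1:4]])
            if frame == 0:  # species are read from the first frame only
                symbols.append(cols[0])
    return np.array(R).reshape(n_frames, 3 * n_atoms), symbols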


# Write geometry file (xyz format).
def write_geometry(filename, r, z, comment_str=''):

    r = np.squeeze(r)
    try:
        with open(filename, 'w') as f:
            f.write(str(len(r)) + '\n' + comment_str)
            for i, atom in enumerate(r):
                f.write('\n' + _z_to_z_str_dict[z[i]] + '\t')
                f.write('\t'.join(str(x) for x in atom))
    except IOError:
        sys.exit("ERROR: Writing xyz file failed.")


# Generate an extended XYZ (extxyz) string, optionally with energy, forces and lattice.
def generate_xyz_str(r, z, e=None, f=None, lattice=None):

    comment_str = ''
    if lattice is not None:
        comment_str += 'Lattice=\"{}\" '.format(
            ' '.join(['{:.12g}'.format(l) for l in lattice.T.ravel()])
        )
    if e is not None:
        comment_str += 'Energy={:.12g} '.format(e)
    comment_str += 'Properties=species:S:1:pos:R:3'
    if f is not None:
        comment_str += ':forces:R:3'

    species_str = '\n'.join([_z_to_z_str_dict[z_i] for z_i in z])

    r_f_str = ui.gen_mat_str(r)[0]
    if f is not None:
        r_f_str = ui.merge_col_str(r_f_str, ui.gen_mat_str(f)[0])

    xyz_str = str(len(r)) + '\n' + comment_str + '\n'
    xyz_str += ui.merge_col_str(species_str, r_f_str)

    return xyz_str


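def lattice_vec_to_par(lat):
    """
    Convert a lattice matrix to cell parameters.

    Parameters
    ----------
        lat : :obj:`numpy.ndarray`
            3 x 3 matrix containing the lattice vectors as columns.

    Returns
    -------
        (:obj:`list` of float, :obj:`list` of float)
            Lattice vector lengths and the angles (alpha, beta, gamma) in
            degrees between the vector pairs (b, c), (a, c) and (a, b).
    """

    lat = lat.T  # rows are now the lattice vectors
    lengths = [np.linalg.norm(v) for v in lat]

    angles = []
    for i in range(3):
        j = i - 1
        k = i - 2

        ll = lengths[j] * lengths[k]
        if ll > 1e-16:
            # Clip to guard against round-off pushing the cosine outside [-1, 1].
            x = np.clip(np.dot(lat[j], lat[k]) / ll, -1.0, 1.0)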
            angle = 180.0 / np.pi * np.arccos(x)
        else:
            angle = 90.0
        angles.append(angle)

    return lengths, angles
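

# Illustrative sketch (hypothetical helper, not part of sGDML): the convention
# of `lattice_vec_to_par`, written with named vectors. The lattice matrix holds
# a, b, c as columns; the angles are (alpha, beta, gamma) in degrees, i.e. the
# angles between (b, c), (a, c) and (a, b).
import numpy as np


def _sketch_cell_par(lat):
    a, b, c = np.asarray(lat, dtype=float).T  # rows of lat.T are the vectors

    def angle(u, v):
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    lengths = [np.linalg.norm(v) for v in (a, b, c)]
    return lengths, [angle(b, c), angle(a, c), angle(a, b)]


# Usage: a hexagonal cell (a = b = 1, c = 2, gamma = 120 degrees).
# _sketch_cell_par(np.array([[1.0, -0.5, 0.0], [0.0, 3**0.5 / 2, 0.0], [0.0, 0.0, 2.0]]))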


### FILE HANDLING


def is_file_type(arg, type):
    """
    Validate file path and check if the file is of the specified type.

    Parameters
    ----------
        arg : :obj:`str`
            File path.
        type : {'dataset', 'task', 'model'}
            Possible file types.

    Returns
    -------
        (:obj:`str`, :obj:`dict`)
            Tuple of file path (as provided) and data stored in the
            file. The returned instance of NpzFile class must be
            closed to avoid leaking file descriptors.

    Raises
    ------
        ArgumentTypeError
            If the provided file path does not lead to a NpzFile.
        ArgumentTypeError
            If the file is not readable.
        ArgumentTypeError
            If the file is of wrong type.
        ArgumentTypeError
            If path/fingerprint is provided, but the path is not valid.
        ArgumentTypeError
            If fingerprint could not be resolved.
        ArgumentTypeError
            If multiple files with the same fingerprint exist.

    """

    # Replace MD5 dataset fingerprint with file name, if necessary.
    if type == 'dataset' and not arg.endswith('.npz') and not os.path.isdir(arg):
        dir = '.'
        if re.search(r'^[a-f0-9]{32}$', arg):  # arg looks similar to MD5 hash string
            md5_str = arg
        else:  # is it a path with a MD5 hash at the end?
            md5_str = os.path.basename(os.path.normpath(arg))
            dir = os.path.dirname(os.path.normpath(arg))

            if dir == '':  # it is only a filename after all, hence not the right type
                raise argparse.ArgumentTypeError('{0} is not a .npz file'.format(arg))

            if re.search(r'^[a-f0-9]{32}$', md5_str) and not os.path.isdir(
                dir
            ):  # path has MD5 hash string at the end, but directory is not valid
                raise argparse.ArgumentTypeError('{0} is not a directory'.format(dir))

        file_names = filter_file_type(dir, type, md5_match=md5_str)

        if not len(file_names):
            raise argparse.ArgumentTypeError(
                "No {0} files with fingerprint '{1}' found in '{2}'".format(
                    type, md5_str, dir
                )
            )
        elif len(file_names) > 1:
            error_str = (
                "Multiple {0} files with fingerprint '{1}' found in '{2}'".format(
                    type, md5_str, dir
                )
            )
            for file_name in file_names:
                error_str += '\n       {0}'.format(file_name)

            raise argparse.ArgumentTypeError(error_str)
        else:
            arg = os.path.join(dir, file_names[0])

    if not arg.endswith('.npz'):
        raise argparse.ArgumentTypeError('{0} is not a .npz file'.format(arg))

    try:
        file = np.load(arg, allow_pickle=True)
    except Exception:
        raise argparse.ArgumentTypeError('{0} is not readable'.format(arg))

    if 'type' not in file or file['type'].astype(str) != type[0]:
        file.close()  # do not leak the file descriptor on failure
        raise argparse.ArgumentTypeError('{0} is not a {1} file'.format(arg, type))

    return arg, file


def filter_file_type(dir, type, md5_match=None):
    """
    Filters all files from a directory that match a given type and (optionally)
    a given fingerprint.

    Parameters
    ----------
        dir : :obj:`str`
            Directory path.
        type : {'dataset', 'task', 'model'}
            Possible file types.
        md5_match : :obj:`str`, optional
            Fingerprint string.

    Returns
    -------
        :obj:`list` of :obj:`str`
            List of file names that match the specified type and fingerprint
            (if provided).

    Raises
    ------
        ArgumentTypeError
            If the directory contains unreadable .npz files.

    """

    file_names = []
    for file_name in sorted(os.listdir(dir)):
        if file_name.endswith('.npz'):
            file_path = os.path.join(dir, file_name)
            try:
                file = np.load(file_path, allow_pickle=True)
            except Exception:
                raise argparse.ArgumentTypeError(
                    '{0} contains unreadable .npz files'.format(dir)
                )

            if 'type' in file and file['type'].astype(str) == type[0]:

                if md5_match is None:
                    file_names.append(file_name)
                elif 'md5' in file and file['md5'] == md5_match:
                    file_names.append(file_name)

            file.close()

    return file_names


def is_valid_file_type(arg_in):
    """
    Check if file is either a valid dataset, task or model file.

    Parameters
    ----------
        arg_in : :obj:`str`
            File path.

    Returns
    -------
        (:obj:`str`, :obj:`dict`)
            Tuple of file path (as provided) and data stored in the
            file. The returned instance of NpzFile class must be
            closed to avoid leaking file descriptors.

    Raises
    ------
        ArgumentTypeError
            If the provided file path does not point to a supported
            file type.

    """

    arg, file = None, None
    try:
        arg, file = is_file_type(arg_in, 'dataset')
    except argparse.ArgumentTypeError:
        pass

    if file is None:
        try:
            arg, file = is_file_type(arg_in, 'task')
        except argparse.ArgumentTypeError:
            pass

    if file is None:
        try:
            arg, file = is_file_type(arg_in, 'model')
        except argparse.ArgumentTypeError:
            pass

    if file is None:
        raise argparse.ArgumentTypeError(
            '{0} is neither a dataset, task, nor model file'.format(arg_in)
        )

    return arg, file


def is_dir_with_file_type(arg, type, or_file=False):
    """
    Validate directory path and check if it contains files of the specified type.

    Note
    ----
        If `or_file` is set and a file path is provided, this function treats
        it like a directory containing just that file.

    Parameters
    ----------
        arg : :obj:`str`
            Directory path (or file path, if `or_file` is set).
        type : {'dataset', 'task', 'model'}
            Possible file types.
        or_file : bool
            If `arg` contains a file path, act like it's a directory
            with just a single file inside.

    Returns
    -------
        (:obj:`str`, :obj:`list` of :obj:`str`)
            Tuple of directory path (as provided) and a list of
            contained file names of the specified type.

    Raises
    ------
        ArgumentTypeError
            If the provided directory path does not lead to a directory.
        ArgumentTypeError
            If directory contains unreadable .npz files.
    """

    if or_file and os.path.isfile(arg):  # arg: file path
        _, file = is_file_type(
            arg, type
        )  # raises exception if there is a problem with the file
        file.close()
        file_name = os.path.basename(arg)
        file_dir = os.path.dirname(arg)
        return file_dir, [file_name]
    else:  # arg: dir

        if not os.path.isdir(arg):
            raise argparse.ArgumentTypeError('{0} is not a directory'.format(arg))

        file_names = filter_file_type(arg, type)

        # if not len(file_names):
        #    raise argparse.ArgumentTypeError(
        #        '{0} contains no {1} files'.format(arg, type)
        #    )

        return arg, file_names


def is_task_dir_resumeable(
    train_dir, train_dataset, test_dataset, n_train, n_test, sigs, gdml
):
    r"""
    Check if a directory contains `task` and/or `model` files that
    match the configuration of a training process specified in the
    remaining arguments.

    Check if the training and test datasets in each task match
    `train_dataset` and `test_dataset`, if the number of training and
    test points matches and if the choices for the kernel
    hyper-parameter :math:`\sigma` are contained in the list. Check
    also, if the existing tasks/models contain symmetries and if
    that's consistent with the flag `gdml`. This function is useful
    for determining if a training process can be resumed using the
    existing files or not.

    Parameters
    ----------
        train_dir : :obj:`str`
            Path to training directory.
        train_dataset : :obj:`dataset`
            Dataset from which training points are sampled.
        test_dataset : :obj:`test_dataset`
            Dataset from which test points are sampled (may be the
            same as `train_dataset`).
        n_train : int
            Number of training points to sample.
        n_test : int
            Number of test points to sample.
        sigs : :obj:`list` of int
            List of :math:`\sigma` kernel hyper-parameter choices
            (usually: the hyper-parameter search grid)
        gdml : bool
            If `True`, don't include any symmetries in model (GDML),
            otherwise do (sGDML).

    Returns
    -------
        bool
            `False` if any of the task or model files in the directory do
            not match the training configuration, `True` otherwise.
    """

    for file_name in sorted(os.listdir(train_dir)):
        if file_name.endswith('.npz'):
            file_path = os.path.join(train_dir, file_name)

            # Use a context manager so the NpzFile is closed on every path.
            with np.load(file_path, allow_pickle=True) as file:

                if 'type' not in file:
                    continue
                elif file['type'] == 't' or file['type'] == 'm':

                    if (
                        file['md5_train'] != train_dataset['md5']
                        or file['md5_valid'] != test_dataset['md5']
                        or len(file['idxs_train']) != n_train
                        or len(file['idxs_valid']) != n_test
                        or (gdml and file['perms'].shape[0] > 1)
                        or file['sig'] not in sigs
                    ):
                        return False

    return True


### ARGUMENT VALIDATION


def is_strict_pos_int(arg):
    """
    Validate strictly positive integer input.

    Parameters
    ----------
        arg : :obj:`str`
            Integer as string.

    Returns
    -------
        int
            Parsed integer.

    Raises
    ------
        ArgumentTypeError
            If integer is not > 0.
    """
    x = int(arg)
    if x <= 0:
        raise argparse.ArgumentTypeError('must be strictly positive')
    return x


def parse_list_or_range(arg):
    """
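    Parse a string that represents either an integer or an inclusive range
    in the notation ``<start>:<stop>`` or ``<start>:<step>:<stop>``.

    Parameters
    ----------
        arg : :obj:`str`
            Integer or range string.

    Returns
    -------
        int or :obj:`list` of int
            The parsed integer, or the list of integers in the range
            (`<stop>` is included if the step reaches it).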

    Raises
    ------
        ArgumentTypeError
            If input can neither be interpreted as an integer nor a valid range.
    """

    if re.match(r'^\d+:\d+:\d+$', arg) or re.match(r'^\d+:\d+$', arg):
        rng_params = list(map(int, arg.split(':')))

        step = 1
        if len(rng_params) == 2:  # start, stop
            start, stop = rng_params
        else:  # start, step, stop
            start, step, stop = rng_params

        rng = list(range(start, stop + 1, step))  # include last stop-element in range
        if len(rng) == 0:
            raise argparse.ArgumentTypeError('{0} is an empty range'.format(arg))

        return rng
    elif re.match(r'^\d+$', arg):
        return int(arg)

    raise argparse.ArgumentTypeError(
        '{0} is neither an integer nor a valid range in the form <start>:[<step>:]<stop>'.format(
            arg
        )
    )
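

# Illustrative sketch (hypothetical helpers, not part of sGDML): how a
# validator like `parse_list_or_range` plugs into argparse via `type=`. Unlike
# Python's range(), the stop value is inclusive, so '10:10:30' yields
# [10, 20, 30]. The minimal parser below re-implements only that rule; the
# option name '--sig' is an assumption chosen for the example.
import argparse
import re


def _sketch_int_or_range(arg):
    m = re.fullmatch(r'(\d+):(?:(\d+):)?(\d+)', arg)
    if m:
        start, stop = int(m.group(1)), int(m.group(3))
        step = int(m.group(2) or 1)
        rng = list(range(start, stop + 1, step))  # inclusive stop
        if not rng:
            raise argparse.ArgumentTypeError('{0} is an empty range'.format(arg))
        return rng
    if re.fullmatch(r'\d+', arg):
        return int(arg)
    raise argparse.ArgumentTypeError('{0} is not an integer or range'.format(arg))


def _sketch_build_parser():
    parser = argparse.ArgumentParser(prog='sketch')
    parser.add_argument('-s', '--sig', type=_sketch_int_or_range, default=20)
    return parser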


================================================
FILE: sgdml/utils/perm.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2021 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function

import multiprocessing as mp

Pool = mp.get_context('fork').Pool

import sys
import timeit
from functools import partial

import numpy as np
import scipy.optimize
import scipy.spatial.distance
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

from .. import DONE, NOT_DONE
from .desc import Desc
from . import ui

glob = {}


def share_array(arr_np, typecode):
    arr = mp.RawArray(typecode, arr_np.ravel())
    return arr, arr_np.shape


def _bipartite_match_wkr(i, n_train, same_z_cost):

    global glob

    adj_set = np.frombuffer(glob['adj_set']).reshape(glob['adj_set_shape'])
    v_set = np.frombuffer(glob['v_set']).reshape(glob['v_set_shape'])
    match_cost = np.frombuffer(glob['match_cost']).reshape(glob['match_cost_shape'])

    adj_i = scipy.spatial.distance.squareform(adj_set[i, :])
    v_i = v_set[i, :, :]

    match_perms = {}
    for j in range(i + 1, n_train):

        adj_j = scipy.spatial.distance.squareform(adj_set[j, :])
        v_j = v_set[j, :, :]

        cost = -np.fabs(v_i).dot(np.fabs(v_j).T)
        cost += same_z_cost * np.max(np.abs(cost))

        _, perm = scipy.optimize.linear_sum_assignment(cost)

        adj_i_perm = adj_i[:, perm]
        adj_i_perm = adj_i_perm[perm, :]

        score_before = np.linalg.norm(adj_i - adj_j)
        score = np.linalg.norm(adj_i_perm - adj_j)

        match_cost[i, j] = score
        if score >= score_before:
            match_cost[i, j] = score_before
        elif not np.isclose(score_before, score):  # otherwise perm is identity
            match_perms[i, j] = perm

    return match_perms
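

# Illustrative sketch (hypothetical, not the sGDML method): recovering the atom
# correspondence between two geometries with
# scipy.optimize.linear_sum_assignment, as `_bipartite_match_wkr` does. A
# simpler, swapped-in cost is used here: each atom is described by its sorted
# distances to all atoms, and a large penalty forbids matching atoms of
# different species (mirroring `same_z_cost`). sGDML itself builds the cost
# from the `v_set` arrays instead.
import numpy as np
import scipy.optimize
import scipy.spatial.distance


def _sketch_match_atoms(xyz_a, xyz_b, z):
    dist_a = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(xyz_a))
    dist_b = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(xyz_b))
    # Sorted distance profile per atom (rotation- and translation-invariant).
    prof_a, prof_b = np.sort(dist_a, axis=1), np.sort(dist_b, axis=1)
    cost = scipy.spatial.distance.cdist(prof_a, prof_b)
    cost = cost + 1e6 * (z[:, None] != z[None, :])  # forbid mixing species
    _, perm = scipy.optimize.linear_sum_assignment(cost)
    return perm  # xyz_b[perm] lines up atom-by-atom with xyz_a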


def bipartite_match(R, z, lat_and_inv=None, max_processes=None, callback=None):

    global glob

    n_train, n_atoms, _ = R.shape

    # penalty matrix for mixing atom species
    same_z_cost = np.repeat(z[:, None], len(z), axis=1) - z
    same_z_cost[same_z_cost != 0] = 1

    # NEW

    # penalty matrix for mixing differently bonded atoms
    # NOTE: needs ASE, expects R to be in angstrom, does not support bond breaking

    # from ase import Atoms
    # from ase.geometry.analysis import Analysis

    # atoms = Atoms(
    #     z, positions=R[0]
    # )  # only use first molecule in dataset to find connected components (fix me later, maybe) # *0.529177249

    # bonds = Analysis(atoms).all_bonds[0]
    # #n_bonds = np.array([len(bonds_i) for bonds_i in bonds])

    # same_bonding_cost = np.zeros((n_atoms, n_atoms))
    # for i in range(n_atoms):
    #     bi = bonds[i]
    #     z_bi = z[bi]
    #     for j in range(i+1,n_atoms):
    #         bj = bonds[j]
    #         z_bj = z[bj]

    #         if set(z_bi) == set(z_bj):
    #             same_bonding_cost[i,j] = 1

    # same_bonding_cost += same_bonding_cost.T

    # same_bonding_cost[np.diag_indices(n_atoms)] = 1
    # same_bonding_cost = 1-same_bonding_cost

    # set(a) & set(b)

    # same_bonding_cost = np.repeat(n_bonds[:, None], len(n_bonds), axis=1) - n_bonds
    # same_bonding_cost[same_bonding_cost != 0] = 1

    # NEW

    match_cost = np.zeros((n_train, n_train))

    desc = Desc(n_atoms, max_processes=max_processes)

    adj_set = np.empty((n_train, desc.dim))
    v_set = np.empty((n_train, n_atoms, n_atoms))
    for i in range(n_train):
        r = np.squeeze(R[i, :, :])

        if lat_and_inv is None:
            adj = scipy.spatial.distance.pdist(r, 'euclidean')

            # from ase import Atoms
            # from ase.geometry.analysis import Analysis

            # atoms = Atoms(
            #     z, positions=r
            # )  # only use first molecule in dataset to find connected components (fix me later, maybe) # *0.529177249

            # bonds = Analysis(atoms).all_bonds[0]

            # adj = scipy.spatial.distance.squareform(adj)

            # bonded = np.zeros((z.size, z.size))

            # for j, bonded_to in enumerate(bonds):
            # inv_bonded_to = np.arange(n_atoms)
            # inv_bonded_to[bonded_to] = 0

            # adj[j, inv_bonded_to] = 0

            #    bonded[j, bonded_to] = 1

            # bonded = bonded + bonded.T

            # print(bonded)

        else:

            from .desc import _pdist, _squareform

            adj_tri = _pdist(r, lat_and_inv)
            adj = _squareform(adj_tri)  # our vectorized format to full matrix
            adj = scipy.spatial.distance.squareform(
                adj
            )  # full matrix to numpy vectorized format

        w, v = np.linalg.eig(scipy.spatial.distance.squareform(adj))
        v = v[:, w.argsort()[::-1]]

        adj_set[i, :] = adj
        v_set[i, :, :] = v

    glob['adj_set'], glob['adj_set_shape'] = share_array(adj_set, 'd')
    glob['v_set'], glob['v_set_shape'] = share_array(v_set, 'd')
    glob['match_cost'], glob['match_cost_shape'] = share_array(match_cost, 'd')

    if callback is not None:
        callback = partial(callback, disp_str='Bi-partite matching')

    start = timeit.default_timer()

    pool = None
    map_func = map
    if max_processes != 1 and mp.cpu_count() > 1:
        pool = Pool((max_processes or mp.cpu_count()) - 1)  # exclude main process
        map_func = pool.imap_unordered

    match_perms_all = {}
    for i, match_perms in enumerate(
        map_func(
            partial(_bipartite_match_wkr, n_train=n_train, same_z_cost=same_z_cost),
            list(range(n_train)),
        )
    ):
        match_perms_all.update(match_perms)

        if callback is not None:
            callback(i, n_train)

    if pool is not None:
        pool.close()
        pool.join()  # Wait for the worker processes to terminate (to measure total runtime correctly).
        pool = None

    stop = timeit.default_timer()

    dur_s = stop - start
    sec_disp_str = 'took {:.1f} s'.format(dur_s) if dur_s >= 0.1 else ''
    if callback is not None:
        callback(n_train, n_train, sec_disp_str=sec_disp_str)

    match_cost = np.frombuffer(glob['match_cost']).reshape(glob['match_cost_shape'])
    match_cost = match_cost + match_cost.T
    match_cost[np.diag_indices_from(match_cost)] = np.inf
    match_cost = csr_matrix(match_cost)

    return match_perms_all, match_cost


def sync_perm_mat(match_perms_all, match_cost, n_atoms, callback=None):

    if callback is not None:
        callback = partial(
            callback, disp_str='Multi-partite matching (permutation synchronization)'
        )
        callback(NOT_DONE)

    tree = minimum_spanning_tree(match_cost, overwrite=True)

    perms = np.arange(n_atoms, dtype=int)[None, :]
    rows, cols = tree.nonzero()
    for com in zip(rows, cols):
        perm = match_perms_all.get(com)
        if perm is not None:
            perms = np.vstack((perms, perm))
    perms = np.unique(perms, axis=0)

    if callback is not None:
        callback(DONE)

    return perms
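

# Illustrative sketch (hypothetical helper, not part of the sGDML API) of the
# synchronization step in sync_perm_mat(): only the permutations stored on the
# edges of a minimum spanning tree of the pairwise matching-cost graph are
# kept, so every geometry is linked to the others through its cheapest
# matchings. 'match_perms' is assumed to map (i, j) index pairs to
# permutations, like the dict returned by bipartite_match(); the lookup
# mirrors the one above and only uses the (row, col) order of each tree edge.
def _example_sync_perms(match_perms, cost, n_atoms):
    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import minimum_spanning_tree

    tree = minimum_spanning_tree(csr_matrix(np.asarray(cost, dtype=float)))

    perms = [np.arange(n_atoms)]
    for i, j in zip(*tree.nonzero()):
        perm = match_perms.get((i, j))
        if perm is not None:
            perms.append(np.asarray(perm))
    return np.unique(np.vstack(perms), axis=0)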


# convert permutation to disjoint cycles
def to_cycles(perm):
    pi = {i: perm[i] for i in range(len(perm))}
    cycles = []

    while pi:
        elem0 = next(iter(pi))  # arbitrary starting element
        this_elem = pi[elem0]
        next_item = pi[this_elem]

        cycle = []
        while True:
            cycle.append(this_elem)
            del pi[this_elem]
            this_elem = next_item
            if next_item in pi:
                next_item = pi[next_item]
            else:
                break

        cycles.append(cycle)

    return cycles
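

# Illustrative sketch (hypothetical helper, not part of the sGDML API): the
# order of a permutation, i.e. how often it must be applied to get back to the
# identity, is the least common multiple of its disjoint cycle lengths (the
# quantity the commented-out np.lcm.reduce code in salvage_subgroup()
# computes). This version walks the cycles with a visited mask instead of
# calling to_cycles().
def _example_perm_order(perm):
    import numpy as np

    perm = np.asarray(perm)
    seen = np.zeros(perm.size, dtype=bool)
    cycle_lens = []
    for start in range(perm.size):
        if seen[start]:
            continue
        length, k = 0, start
        while not seen[k]:  # follow the cycle until it closes
            seen[k] = True
            k = perm[k]
            length += 1
        cycle_lens.append(length)
    return int(np.lcm.reduce(cycle_lens))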


# Salvage a subset of the permutations when the transitive closure fails: drop every
# permutation with a non-trivial cycle that overlaps a longer cycle of another permutation.
def salvage_subgroup(perms):

    n_perms, n_atoms = perms.shape

    all_long_cycles = []
    for i in range(n_perms):
        long_cycles = [cy for cy in to_cycles(list(perms[i, :])) if len(cy) > 1]
        all_long_cycles += long_cycles

    # print(all_long_cycles)
    # print('--------------')

    def _cycle_intersects_with_larger_one(cy):

        for ac in all_long_cycles:
            if len(cy) < len(ac):
                if not set(cy).isdisjoint(ac):
                    return True

        return False

    lcms = []
    keep_idx_many = []
    for i in range(n_perms):

        # print(to_cycles(list(perms[i, :])))

        # is this permutation valid?
        # remove permutations that contain cycles that share elements with larger cycles in other perms
        long_cycles = [cy for cy in to_cycles(list(perms[i, :])) if len(cy) > 1]

        # print('long cycles:')
        # print(long_cycles)

        ignore_perm = any(list(map(_cycle_intersects_with_larger_one, long_cycles)))

        if not ignore_perm:
            keep_idx_many.append(i)

        # print(ignore_perm)

        # print()

        # cy_lens = [len(cy) for cy in to_cycles(list(perms[i, :]))]
        # lcm = np.lcm.reduce(cy_lens)
        # lcms.append(lcm)
    # keep_idx = np.argmax(lcms)
    # perms = np.vstack((np.arange(n_atoms), perms[keep_idx,:]))
    perms = perms[keep_idx_many, :]

    # print(perms)

    return perms


def complete_sym_group(
    perms, n_perms_max=None, disp_str='Permutation group completion', callback=None
):

    if callback is not None:
        callback = partial(callback, disp_str=disp_str)
        callback(NOT_DONE)

    perm_added = True
    while perm_added:
        perm_added = False
        n_perms = perms.shape[0]
        for i in range(n_perms):
            for j in range(n_perms):

                new_perm = perms[i, perms[j, :]]
                if not (new_perm == perms).all(axis=1).any():
                    perm_added = True
                    perms = np.vstack((perms, new_perm))

                    # Transitive closure is not converging! Give up and return identity permutation.
                    if n_perms_max is not None and perms.shape[0] == n_perms_max:

                        if callback is not None:
                            callback(
                                DONE,
                                sec_disp_str='transitive closure has failed',
                                done_with_warning=True,
                            )
                        return None

    if callback is not None:
        callback(
            DONE,
            sec_disp_str='found {:d} symmetries'.format(perms.shape[0]),
        )

    return perms
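

# Illustrative sketch (hypothetical helper, not part of the sGDML API):
# complete_sym_group() stops once every composition perms[i, perms[j, :]] is
# already contained in the set, i.e. once the set is closed under composition.
# For a finite set of permutations that contains the identity, this closure
# property is what makes it a group. The same condition can be checked
# directly:
def _example_is_closed(perms):
    import numpy as np

    perms = np.asarray(perms)
    rows = {tuple(p) for p in perms.tolist()}
    return all(tuple(p[q].tolist()) in rows for p in perms for q in perms)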


def find_perms(R, z, lat_and_inv=None, callback=None, max_processes=None):

    m, n_atoms = R.shape[:2]

    # Find matching for all pairs.
    match_perms_all, match_cost = bipartite_match(
        R, z, lat_and_inv, max_processes, callback=callback
    )

    # Remove inconsistencies.
    match_perms = sync_perm_mat(match_perms_all, match_cost, n_atoms, callback=callback)

    # Complete symmetric group.
    # Give up if the transitive closure reaches 100 unique permutations.
    sym_group_perms = complete_sym_group(
        match_perms, n_perms_max=100, callback=callback
    )

    # If the closure failed, drop conflicting permutations (see salvage_subgroup) and retry to recover at least some symmetries.
    if sym_group_perms is None:
        match_perms_subset = salvage_subgroup(match_perms)
        sym_group_perms = complete_sym_group(
            match_perms_subset,
            n_perms_max=100,
            disp_str='Closure disaster recovery',
            callback=callback,
        )

    return sym_group_perms


def find_extra_perms(R, z, lat_and_inv=None, callback=None, max_processes=None):

    m, n_atoms = R.shape[:2]

    # NEW

    # catcher
    # p = np.arange(n_atoms)
    # plane_3idxs = [19,17,47] # left to right
    # perm = find_perms_via_reflection(R[0], z, np.arange(n_atoms), plane_3idxs, lat_and_inv=None, max_processes=None)
    # perms = np.vstack((p[None,:], perm))
    # plane_3idxs = [(4,5),(2,1),(34,33)]  # top to bottom
    # perm = find_perms_via_reflection(R[0], z, np.arange(n_atoms), plane_3idxs, lat_and_inv=None, max_processes=None)
    # perms = np.vstack((perm[None,:], perms))
    # sym_group_perms = complete_sym_group(perms, n_perms_max=100, callback=callback)

    # nanotube
    R = R.copy()
    frags = find_frags(R[0], z, lat_and_inv=lat_and_inv)
    print(frags)

    perms = np.arange(n_atoms)[None, :]

    plane_3idxs = [280, 281, 273]  # half outer
    add_perms = find_perms_via_reflection(
        R[0], z, frags[1], plane_3idxs, lat_and_inv=None, max_processes=None
    )
    perms = np.vstack((perms, add_perms))

    # rotate inner
    # add_perms = find_perms_via_alignment(R[0], frags[0], [214, 215, 210, 211], [209, 208, 212, 213], z, lat_and_inv=lat_and_inv, max_processes=max_processes)
    # perms = np.vstack((perms, add_perms))
    # sym_group_perms = complete_sym_group(perms, callback=callback)

    # rotate outer
    # add_perms = find_perms_via_alignment(R[0], frags[1], [361, 360, 368, 369], [363, 362, 356, 357], z, lat_and_inv=lat_and_inv, max_processes=max_processes)
    # perms = np.vstack((perms, add_perms))
    # sym_group_perms = complete_sym_group(perms, callback=callback)

    perms = np.unique(perms, axis=0)
    sym_group_perms = complete_sym_group(perms, callback=callback)
    print(sym_group_perms.shape)

    return sym_group_perms

    # buckycatcher
    R = R.copy()  # *0.529177
    frags = find_frags(R[0], z, lat_and_inv=lat_and_inv)

    perms = np.arange(n_atoms)[None, :]

    # syms of catcher
    plane_3idxs = [54, 47, 17]  # left to right
    add_perms = find_perms_via_reflection(
        R[0], z, frags[0], plane_3idxs, lat_and_inv=None, max_processes=None
    )
    perms = np.vstack((perms, add_perms))

    plane_3idxs = [(33, 34), (31, 30), (5, 4)]  # top to bottom
    add_perms = find_perms_via_reflection(
        R[0], z, frags[0], plane_3idxs, lat_and_inv=None, max_processes=None
    )
    perms = np.vstack((perms, add_perms))

    # move cells
    # add_perms = find_perms_via_alignment(R[0], frags[1], [128, 129, 127], [133, 132, 134], z, lat_and_inv=lat_and_inv, max_processes=max_processes)
    # perms = np.vstack((perms, add_perms))
    # sym_group_perms = complete_sym_group(perms, callback=callback)

    # print(sym_group_perms.shape)

    # rotate cells
    add_perms = find_perms_via_alignment(
        R[0],
        frags[1],
        [129, 128, 127],
        [128, 127, 135],
        z,
        lat_and_inv=lat_and_inv,
        max_processes=max_processes,
    )
    perms = np.vstack((perms, add_perms))
    # print(add_perms.shape)
    # sym_group_perms = complete_sym_group(perms, callback=callback)

    # rotate cells (triangle)
    # add_perms = find_perms_via_alignment(R[0], frags[1], [132, 129, 134], [129, 134, 132], z, lat_and_inv=lat_and_inv, max_processes=max_processes)
    # perms = np.vstack((perms, add_perms))
    sym_group_perms = complete_sym_group(perms, callback=callback)

    # print(perms.shape)
    print(sym_group_perms.shape)

    # frag 1: bucky ball
    # perms = find_perms_in_frag(R, z, frags[1], lat_and_inv=lat_and_inv, max_processes=max_processes)
    # perms = np.vstack((p[None,:], perms))

    # print('perms')
    # print(perms.shape)

    # perms = np.unique(perms, axis=0)
    # perms = complete_sym_group(perms, callback=callback)

    # print('perms')
    # print(perms.shape)
    # print(sym_group_perms.shape)

    return sym_group_perms

    # NEW


def find_frags(r, z, lat_and_inv=None):

    from ase import Atoms
    from ase.geometry.analysis import Analysis
    from scipy.sparse.csgraph import connected_components

    print('Finding permutable non-bonded fragments... (assumes Ang!)')

    lat = None
    if lat_and_inv:
        lat = lat_and_inv[0]

    n_atoms = r.shape[0]
    atoms = Atoms(
        z, positions=r, cell=lat, pbc=lat is not None
    )  # only use first molecule in dataset to find connected components (fix me later, maybe) # *0.529177249

    adj = Analysis(atoms).adjacency_matrix[0]
    _, labels = connected_components(csgraph=adj, directed=False, return_labels=True)

    # frags = []
    # for label in np.unique(labels):
    #    frags.append(np.where(labels == label)[0])
    frags = [np.where(labels == label)[0] for label in np.unique(labels)]
    n_frags = len(frags)

    if n_frags == n_atoms:
        print(
            'Skipping fragment symmetry search (something went wrong, e.g. length unit not in Angstroms, etc.)'
        )
        return None

    print('| Found ' + str(n_frags) + ' disconnected fragments.')

    return frags
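

# Illustrative sketch (hypothetical helper, not part of the sGDML API) of the
# fragment detection in find_frags(). Assumption: it swaps the ASE-based
# adjacency matrix for a plain distance cutoff (default 1.6, in the same
# length unit as the positions) and ignores periodic boundaries. The fragments
# are then the connected components of the bond graph, as above.
def _example_find_frags_by_cutoff(r, cutoff=1.6):
    import numpy as np
    import scipy.spatial.distance
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    dists = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(r))
    adj = csr_matrix(((dists > 0) & (dists < cutoff)).astype(float))
    _, labels = connected_components(csgraph=adj, directed=False, return_labels=True)
    return [np.where(labels == label)[0] for label in np.unique(labels)]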


def find_frag_perms(R, z, lat_and_inv=None, callback=None, max_processes=None):

    from ase import Atoms
    from ase.geometry.analysis import Analysis
    from scipy.sparse.csgraph import connected_components

    # TODO: positions must be in Angstrom for this to work!!

    n_train, n_atoms = R.shape[:2]
    lat = lat_and_inv[0] if lat_and_inv is not None else None

    atoms = Atoms(
        z, positions=R[0], cell=lat, pbc=lat is not None
    )  # only use first molecule in dataset to find connected components (fix me later, maybe) # *0.529177249

    adj = Analysis(atoms).adjacency_matrix[0]
    _, labels = connected_components(csgraph=adj, directed=False, return_labels=True)

    # frags = []
    # for label in np.unique(labels):
    #    frags.append(np.where(labels == label)[0])
    frags = [np.where(labels == label)[0] for label in np.unique(labels)]
    n_frags = len(frags)

    if n_frags == n_atoms:
        print(
            'Skipping fragment symmetry search (something went wrong, e.g. length unit not in Angstroms, etc.)'
        )
        return [range(n_atoms)]

    # print(labels)

    # from . import ui, io
    # xyz_str = io.generate_xyz_str(R[0][np.where(labels == 0)[0], :]*0.529177249, z[np.where(labels == 0)[0]])
    # xyz_str = ui.indent_str(xyz_str, 2)
    # sprint(xyz_str)

    # NEW

    # uniq_labels = np.unique(labels)
    # R_cg = np.empty((R.shape[0], len(uniq_labels), R.shape[2]))
    # z_frags = []
    # z_cg = []
    # for label in uniq_labels:
    #     frag_idxs = np.where(labels == label)[0]

    #     R_cg[:,label,:] = np.mean(R[:,frag_idxs,:], axis=1)
    #     z_frag = np.sort(z[frag_idxs])

    #     z_frag_label = 0
    #     if len(z_frags) == 0:
    #         z_frags.append(z_frag)
    #     else:
    #         z_frag_label = np.where(np.all(z_frags == z_frag, axis=1))[0]

    #         if len(z_frag_label) == 0: # not found
    #             z_frag_label = len(z_frags)
    #             z_frags.append(z_frag)
    #         else:
    #             z_frag_label = z_frag_label[0]

    #     z_cg.append(z_frag_label)

    # print(z_cg)
    # print(R_cg.shape)

    # perms = find_perms(R_cg, np.array(z_cg), lat_and_inv=lat_and_inv, max_processes=max_processes)

    # print('cg perms')
    # print(perms)

    # NEW

    # print(n_frags)

    print('| Found ' + str(n_frags) + ' disconnected fragments.')

    # ufrags = np.unique([np.sort(z[frag]) for frag in frags])
    # print(ufrags)

    # sys.exit()

    # n_frags_unique = 0 # number of unique fragments

    # match fragments to find identical ones (allows permutations of fragments)
    swap_perms = [np.arange(n_atoms)]
    for f1 in range(n_frags):
        for f2 in range(f1 + 1, n_frags):

            sort_idx_f1 = np.argsort(z[frags[f1]])
            sort_idx_f2 = np.argsort(z[frags[f2]])
            inv_sort_idx_f2 = inv_perm(sort_idx_f2)

            z1 = z[frags[f1]][sort_idx_f1]
            z2 = z[frags[f2]][sort_idx_f2]

            if np.array_equal(z1, z2):  # fragments have the same composition

                for ri in range(
                    min(10, R.shape[0])
                ):  # only use (at most) the first 10 geometries for matching (fix me later)

                    R_match1 = R[ri, frags[f1], :]
                    R_match2 = R[ri, frags[f2], :]

                    # if np.array_equal(z1, z2):

                    R_pair = np.concatenate(
                        (R_match1[None, sort_idx_f1, :], R_match2[None, sort_idx_f2, :])
                    )

                    perms = find_perms(
                        R_pair, z1, lat_and_inv=lat_and_inv, max_processes=max_processes
                    )

                    # embed local permutation into global context
                    for p in perms:

                        match_perm = sort_idx_f1[p][inv_sort_idx_f2]

                        swap_perm = np.arange(n_atoms)
                        swap_perm[frags[f1]] = frags[f2][match_perm]
                        swap_perm[frags[f2][match_perm]] = frags[f1]
                        swap_perms.append(swap_perm)

            # else:
            #    n_frags_unique += 1

    swap_perms = np.unique(np.array(swap_perms), axis=0)

    # print(swap_perms)

    # print('| Found ' + str(n_frags_unique) + ' (likely to be) *unique* disconnected fragments.')

    # complete symmetric group
    sym_group_perms = complete_sym_group(swap_perms)
    print(
        '| Found '
        + str(sym_group_perms.shape[0])
        + ' fragment permutations after closure.'
    )

    # return sym_group_perms

    # match fragments with themselves (to find symmetries in each fragment)

    def _frag_perm_to_perm(n_atoms, frag_idxs, frag_perms):

        # frag_idxs - indices of the fragment (one fragment!)
        # frag_perms - N fragment permutations (N x len(frag_idxs))

        perms = np.arange(n_atoms)[None, :]
        for fp in frag_perms:

            p = np.arange(n_atoms)
            p[frag_idxs] = frag_idxs[fp]
            perms = np.vstack((p[None, :], perms))

        return perms

    if n_frags > 1:
        print('| Finding symmetries in individual fragments.')
        for f in range(n_frags):

            R_frag = R[:, frags[f], :]
            z_frag = z[frags[f]]

            frag_perms = find_perms(
                R_frag, z_frag, lat_and_inv=lat_and_inv, max_processes=max_processes
            )

            perms = _frag_perm_to_perm(n_atoms, frags[f], frag_perms)
            sym_group_perms = np.vstack((perms, sym_group_perms))

            print('{:d} perms'.format(perms.shape[0]))

        sym_group_perms = np.unique(sym_group_perms, axis=0)
    sym_group_perms = complete_sym_group(sym_group_perms, callback=callback)

    return sym_group_perms

    # f = 0
    # perms = find_perms_via_alignment(R[0, :, :], frags[f], [215, 214, 210, 211], [209, 208, 212, 213], z, lat_and_inv=lat_and_inv, max_processes=max_processes)
    # #perms = find_perms_via_alignment(R[0, :, :], frags[f], [214, 215, 210, 211], [209, 208, 212, 213], z, lat_and_inv=lat_and_inv, max_processes=max_processes)
    # sym_group_perms = np.vstack((perms[None,:], sym_group_perms))
    # sym_group_perms = complete_sym_group(sym_group_perms, callback=callback)

    # #print(sym_group_perms.shape)

    # #import sys
    # #sys.exit()

    # return sym_group_perms
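

# Illustrative sketch (hypothetical helper, not part of the sGDML API) of how
# find_frag_perms() embeds a swap of two identical fragments into a
# permutation of the whole system: the atoms of 'frag_a' are mapped onto their
# matched partners in 'frag_b' and vice versa, all other atoms stay in place.
# 'match' plays the role of 'match_perm' above; the identity matching is
# assumed when it is not given.
def _example_embed_frag_swap(n_atoms, frag_a, frag_b, match=None):
    import numpy as np

    frag_a = np.asarray(frag_a)
    frag_b = np.asarray(frag_b)
    if match is None:
        match = np.arange(frag_a.size)

    swap_perm = np.arange(n_atoms)
    swap_perm[frag_a] = frag_b[match]
    swap_perm[frag_b[match]] = frag_a
    return swap_perm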


def _frag_perm_to_perm(n_atoms, frag_idxs, frag_perms):

    # frag_idxs - indices of the fragment (one fragment!)
    # frag_perms - N fragment permutations (N x len(frag_idxs))

    perms = np.arange(n_atoms)[None, :]
    for fp in frag_perms:

        p = np.arange(n_atoms)
        p[frag_idxs] = frag_idxs[fp]
        perms = np.vstack((p[None, :], perms))

    return perms


def find_perms_in_frag(R, z, frag_idxs, lat_and_inv=None, max_processes=None):

    n_atoms = R.shape[1]

    R_frag = R[:, frag_idxs, :]
    z_frag = z[frag_idxs]

    frag_perms = find_perms(
        R_frag, z_frag, lat_and_inv=lat_and_inv, max_processes=max_processes
    )

    perms = _frag_perm_to_perm(n_atoms, frag_idxs, frag_perms)

    return perms


def find_perms_via_alignment(
    pts_full,
    frag_idxs,
    align_a_idxs,
    align_b_idxs,
    z,
    lat_and_inv=None,
    max_processes=None,
):

    # 1. find rotation that aligns points (Nx3 matrix) in 'align_a_idxs' with points in 'align_b_idxs'
    # 2. rotate the whole fragment
    # 3. find perms by matching those two structures (match atoms that are closest after transformation)

    # align_a_ctr = np.mean(align_a_pts, axis=0)
    # align_b_ctr = np.mean(align_b_pts, axis=0)

    # alignment indices are included in fragment
    assert np.isin(align_a_idxs, frag_idxs).all()
    assert np.isin(align_b_idxs, frag_idxs).all()

    assert len(align_a_idxs) == len(align_b_idxs)

    # align_a_frag_idxs = np.where(np.in1d(frag_idxs, align_a_idxs))[0]
    # align_b_frag_idxs = np.where(np.in1d(frag_idxs, align_b_idxs))[0]

    pts = pts_full[frag_idxs, :]

    align_a_pts = pts_full[align_a_idxs, :]
    align_b_pts = pts_full[align_b_idxs, :]

    ctr = np.mean(pts, axis=0)
    align_a_pts -= ctr
    align_b_pts -= ctr

    ab_cov = align_a_pts.T.dot(align_b_pts)
    u, s, vh = np.linalg.svd(ab_cov)
    R = u.dot(vh)

    if np.linalg.det(R) < 0:
        vh[2, :] *= -1  # multiply 3rd column of V by -1
        R = u.dot(vh)

    pts -= ctr
    pts_R = pts.copy()

    pts_R = R.dot(pts_R.T).T

    pts += ctr
    pts_R += ctr

    pts_full_R = pts_full.copy()
    pts_full_R[frag_idxs, :] = pts_R

    R_pair = np.vstack((pts_full[None, :, :], pts_full_R[None, :, :]))

    # from . import io

    # xyz_str = io.generate_xyz_str(pts_full, z)
    # print(xyz_str)

    # xyz_str = io.generate_xyz_str(pts_full_R, z)
    # print(xyz_str)

    # z_frag = z[frag_idxs]

    adj = scipy.spatial.distance.cdist(R_pair[0], R_pair[1], 'euclidean')
    _, perm = scipy.optimize.linear_sum_assignment(adj)

    # score_before = np.linalg.norm(adj)

    # adj_perm = scipy.spatial.distance.cdist(R_pair[0,:], R_pair[0, perm], 'euclidean')
    # score = np.linalg.norm(adj_perm)

    # print(score_before)
    # print(score)

    # print('---')

    # print('data \'model example\'', '|', end='')
    # rint('testing', '|', end='')
    # n_atoms = pts_full.shape[1]
    # print(n_atoms)

    # for p in pts_full[:,:]:
    #    print('H {:.5f} {:.5f} {:.5f}'.format(*p), '|', end='')

    # print('end \'model example\';show data')

    # draw selection
    if False:

        print('---')

        from matplotlib import pyplot as plt

        cmap = plt.get_cmap('prism')
        colors = cmap(np.linspace(0, 1, len(align_a_idxs)))

        for i, idx in enumerate(align_a_idxs):
            color_str = (
                '['
                + str(int(colors[i, 0] * 255))
                + ','
                + str(int(colors[i, 1] * 255))
                + ','
                + str(int(colors[i, 2] * 255))
                + ']'
            )
            print('select atomno=' + str(idx + 1) + '; color ' + color_str)

        for i, idx in enumerate(align_b_idxs):
            color_str = (
                '['
                + str(int(colors[i, 0] * 255))
                + ','
                + str(int(colors[i, 1] * 255))
                + ','
                + str(int(colors[i, 2] * 255))
                + ']'
            )
            print('select atomno=' + str(idx + 1) + '; color ' + color_str)
        print('---')

    return perm
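

# Illustrative sketch (hypothetical helper, not part of the sGDML API) of the
# SVD-based alignment (Kabsch algorithm) used in find_perms_via_alignment().
# It returns the proper rotation R with R @ a_i ~ b_i after both point sets
# are centered, including the determinant correction against reflections.
# Note that the code above builds its rotation as u.dot(vh), i.e. U V^T,
# which is the transpose (inverse) of the V U^T returned here.
def _example_kabsch(a, b):
    import numpy as np

    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)

    u, _, vh = np.linalg.svd(a.T.dot(b))  # cross-covariance H = A^T B
    d = np.sign(np.linalg.det(vh.T.dot(u.T)))  # -1 would indicate a reflection
    return vh.T.dot(np.diag([1.0, 1.0, d])).dot(u.T)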


def find_perms_via_reflection(
    r, z, frag_idxs, plane_3idxs, lat_and_inv=None, max_processes=None
):

    # plane_3idxs can be tuples of atoms (to take their center) or atom indices

    # pts = pts_full[frag_idxs, :]
    # pts = r.copy()

    # compute normal of plane defined by atoms in 'plane_3idxs'

    is_plane_defined_by_bond_centers = type(plane_3idxs[0]) is tuple
    if is_plane_defined_by_bond_centers:
        a = (r[plane_3idxs[0][0], :] + r[plane_3idxs[0][1], :]) / 2
        b = (r[plane_3idxs[1][0], :] + r[plane_3idxs[1][1], :]) / 2
        c = (r[plane_3idxs[2][0], :] + r[plane_3idxs[2][1], :]) / 2
    else:
        a = r[plane_3idxs[0], :]
        b = r[plane_3idxs[1], :]
        c = r[plane_3idxs[2], :]

    ab = b - a
    ab /= np.linalg.norm(ab)

    ac = c - a
    ac /= np.linalg.norm(ac)

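    # 'ab' and 'ac' are generally not orthogonal, so their cross product must be
    # normalized for the Householder matrix below to be a reflection.
    normal = np.cross(ab, ac)
    normal /= np.linalg.norm(normal)
    normal = normal[:, None]

    # compute reflection matrix (Householder transformation)
    reflection = np.eye(3) - 2 * normal.dot(normal.T)

    # reflect about the plane through 'a' (not the parallel plane through the origin)
    r_R = r.copy()
    r_R[frag_idxs, :] = reflection.dot((r[frag_idxs, :] - a).T).T + a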

    # R_pair = np.vstack((r[None,:,:], r_R[None,:,:]))

    adj = scipy.spatial.distance.cdist(r, r_R, 'euclidean')
    _, perm = scipy.optimize.linear_sum_assignment(adj)

    print_perm_colors(perm, r, plane_3idxs)

    # score_before = np.linalg.norm(adj)

    # adj_perm = scipy.spatial.distance.cdist(R_pair[0,:], R_pair[0, perm], 'euclidean')
    # score = np.linalg.norm(adj_perm)

    return perm
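

# Illustrative sketch (hypothetical helper, not part of the sGDML API) of a
# reflection-based permutation search. It takes three explicit points on the
# mirror plane instead of atom indices, reflects all points with a normalized
# Householder matrix about the plane through the first point, and then pairs
# each atom with its closest mirror image via linear_sum_assignment.
def _example_reflection_perm(r, plane_pts):
    import numpy as np
    import scipy.optimize
    import scipy.spatial.distance

    r = np.asarray(r, dtype=float)
    a, b, c = (np.asarray(p, dtype=float) for p in plane_pts)

    normal = np.cross(b - a, c - a)
    normal /= np.linalg.norm(normal)
    householder = np.eye(3) - 2.0 * np.outer(normal, normal)

    # H is symmetric, so right-multiplying the row vectors applies H to each point.
    r_refl = (r - a).dot(householder) + a

    cost = scipy.spatial.distance.cdist(r, r_refl, 'euclidean')
    _, perm = scipy.optimize.linear_sum_assignment(cost)
    return perm, r_refl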


def print_perm_colors(perm, pts, plane_3idxs=None):

    idx_done = []
    c = -1
    for i in range(perm.shape[0]):
        if i not in idx_done and perm[i] not in idx_done:
            c += 1
            idx_done += [i]
            idx_done += [perm[i]]

    from matplotlib import pyplot as plt

    cmap = plt.get_cmap('prism')
    colors = cmap(np.linspace(0, 1, c + 1))

    print('---')
    print('select all; color [255,255,255]')

    if plane_3idxs is not None:

        def pts_str(x):
            return '{' + str(x[0]) + ', ' + str(x[1]) + ', ' + str(x[2]) + '}'

        is_plane_defined_by_bond_centers = type(plane_3idxs[0]) is tuple
        if is_plane_defined_by_bond_centers:
            a = (pts[plane_3idxs[0][0], :] + pts[plane_3idxs[0][1], :]) / 2
            b = (pts[plane_3idxs[1][0], :] + pts[plane_3idxs[1][1], :]) / 2
            c = (pts[plane_3idxs[2][0], :] + pts[plane_3idxs[2][1], :]) / 2
        else:
            a = pts[plane_3idxs[0], :]
            b = pts[plane_3idxs[1], :]
            c = pts[plane_3idxs[2], :]

        print(
            'draw plane1 300 PLANE '
            + pts_str(a)
            + ' '
            + pts_str(b)
            + ' '
            + pts_str(c)
            + ';color $plane1 green'
        )

    idx_done = []
    c = -1
    for i in range(perm.shape[0]):
        if i not in idx_done and perm[i] not in idx_done:

            c += 1
            color_str = (
                '['
                + str(int(colors[c, 0] * 255))
                + ','
                + str(int(colors[c, 1] * 255))
                + ','
                + str(int(colors[c, 2] * 255))
                + ']'
            )

            if i != perm[i]:
                print('select atomno=' + str(i + 1) + '; color ' + color_str)
                print('select atomno=' + str(perm[i] + 1) + '; color ' + color_str)
            idx_done += [i]
            idx_done += [perm[i]]

    print('---')


def inv_perm(perm):

    inv_perm = np.empty(perm.size, perm.dtype)
    inv_perm[perm] = np.arange(perm.size)

    return inv_perm
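

# Illustrative sketch (hypothetical helper, not part of the sGDML API): for a
# permutation array, np.argsort yields the same inverse as inv_perm() above,
# because sorting the values 0..n-1 recovers the positions they came from.
# The defining property is perm[inv[i]] == i (equivalently inv[perm[i]] == i).
def _example_inv_perm_argsort(perm):
    import numpy as np

    return np.argsort(np.asarray(perm), kind='stable')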


================================================
FILE: sgdml/utils/ui.py
================================================
#!/usr/bin/python

# MIT License
#
# Copyright (c) 2018-2021 Stefan Chmiela
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import print_function
from functools import partial

from .. import __version__, MAX_PRINT_WIDTH, LOG_LEVELNAME_WIDTH
import textwrap
import re
import sys

if sys.version[0] == '3':
    raw_input = input

import numpy as np


def yes_or_no(question):
    """
    Ask for yes/no user input on a question.

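    Any response that does not start with ``y`` (case-insensitive)
    yields a negative answer.

    Parameters
    ----------
        question : :obj:`str`
            User question.

    Returns
    -------
        bool
            ``True`` if the reply starts with ``y``, ``False`` otherwise.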
    """

    reply = raw_input(question + ' (y/n): ').lower().strip()
    if not reply or reply[0] != 'y':
        return False
    else:
        return True


last_callback_pct = 0


def callback(
    current,
    total=1,
    disp_str='',
    sec_disp_str=None,
    done_with_warning=False,
    newline_when_done=True,
):
    """
    Print progress or toggle bar.

    Example (progress):
    ``[ 45%] Task description (secondary string)``

    Example (toggle, not done):
    ``[ .. ] Task description (secondary string)``

    Example (toggle, done):
    ``[DONE] Task description (secondary string)``

    Parameters
    ----------
        current : int
            Number of items processed so far.
        total : int, optional
            Total number of items. If ``total == 1``,
            the toggle style is used.
        disp_str : :obj:`str`, optional
            Task description.
        sec_disp_str : :obj:`str`, optional
            Additional string shown in gray.
        done_with_warning : bool, optional
            Indicate that the process did not
            finish successfully (toggle style only).
        newline_when_done : bool, optional
            Finish with a newline character once
            ``current == total`` (default: True).
    """

    global last_callback_pct

    is_toggle = total == 1
    is_done = np.isclose(current - total, 0.0)

    bold_color_str = partial(color_str, bold=True)

    if is_toggle:

        if is_done:
            if done_with_warning:
                flag_str = bold_color_str('[WARN]', fore_color=YELLOW)
            else:
                flag_str = bold_color_str('[DONE]', fore_color=GREEN)

        else:
            flag_str = bold_color_str('[' + blink_str(' .. ') + ']')
    else:

        # Only show progress in 10 percent steps when not printing to terminal.
        pct = int(float(current) * 100 / total)
        pct = int(np.ceil(pct / 10.0)) * 10 if not sys.stdout.isatty() else pct

        # Skip printing if the displayed percentage has not changed.
        if not is_done and pct == last_callback_pct:
            return
        else:
            last_callback_pct = pct

        flag_str = bold_color_str(
            '[{:3d}%]'.format(pct), fore_color=GREEN if is_done else WHITE
        )

    sys.stdout.write('\r{} {}'.format(flag_str, disp_str))

    if sec_disp_str is not None:
        w = MAX_PRINT_WIDTH - LOG_LEVELNAME_WIDTH - len(disp_str) - 1
        sys.stdout.write(
            color_str(' {:>{width}}'.format(sec_disp_str, width=w), fore_color=GRAY)
        )

    if is_done and newline_when_done:
        sys.stdout.write('\n')

    sys.stdout.flush()
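

# Hypothetical illustration, not part of sGDML's API: the throttling rule used
# by `callback` above, rewritten as a pure function so it can be checked in
# isolation. It returns the percentages that would actually be printed for a
# sequence of progress counts. Assumptions: `last_pct` starts at 0 on every call
# (sGDML keeps it in the global `last_callback_pct` across calls) and "done" is
# tested with `==` instead of `np.isclose`. With redirected output, 100 steps
# yield only 11 updates: 10 %, 20 %, ..., 100 %, plus the final "done" update.
def _example_printed_pcts(currents, total, is_tty):
    import math

    printed = []
    last_pct = 0
    for current in currents:
        pct = int(float(current) * 100 / total)
        if not is_tty:
            # Round up to the next 10 % step when output is not a terminal.
            pct = int(math.ceil(pct / 10.0)) * 10
        is_done = current == total
        if not is_done and pct == last_pct:
            continue  # same percentage as last time: nothing to print
        last_pct = pct
        printed.append(pct)
    return printed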


# Use this to report the progress of a subtask through an existing callback
# function, e.g. 'subtask_callback = partial(ui.sec_callback, main_callback=self.callback)'
def sec_callback(
    current, total=1, disp_str=None, sec_disp_str=None, main_callback=None, **kwargs
):
    global last_callback_pct

    assert main_callback is not None

    is_toggle = total == 1
    is_done = np.isclose(current - total, 0.0)

    sec_disp_str = disp_str
    if is_toggle:
        sec_disp_str = '{} | {}'.format(disp_str, 'DONE' if is_done else ' .. ')
    else:

        # Only show progress in 10 percent steps when not printing to terminal.
        pct = int(float(current) * 100 / total)
        pct = int(np.ceil(pct / 10.0)) * 10 if not sys.stdout.isatty() else pct

        # Skip printing if the displayed percentage has not changed.
        if pct == last_callback_pct:
            return

        last_callback_pct = pct
        sec_disp_str = '{} | {:3d}%'.format(disp_str, pct)

    main_callback(0, sec_disp_str=sec_disp_str, **kwargs)
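

# Hypothetical illustration of the wiring suggested above `sec_callback`:
# `functools.partial` binds the main callback once, so the subtask only reports
# `(current, total, disp_str)`. `_toy_sec_callback` is a simplified stand-in
# (toggle style only) and `fake_main_callback` records what it receives instead
# of printing; neither is sGDML code.
def _example_partial_wiring():
    from functools import partial

    received = []

    def fake_main_callback(current, sec_disp_str=None, **kwargs):
        received.append(sec_disp_str)  # record instead of printing

    def _toy_sec_callback(current, total=1, disp_str=None, main_callback=None):
        status = 'DONE' if current == total else ' .. '
        main_callback(0, sec_disp_str='{} | {}'.format(disp_str, status))

    subtask_callback = partial(_toy_sec_callback, main_callback=fake_main_callback)
    subtask_callback(0, disp_str='Loading')  # still running
    subtask_callback(1, disp_str='Loading')  # done
    return received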


# COLORS

BLACK, RED, GREEN, YELLOW, BLUE, MAGENTA, CYAN, WHITE, GRAY = list(range(8)) + [60]
COLOR_SEQ, RESET_SEQ = '\033[{:d};{:d};{:d}m', '\033[0m'

# Only emit ANSI color codes when running in a real terminal (not piped/redirected).
ENABLE_COLORED_OUTPUT = sys.stdout.isatty()


def color_str(str, fore_color=WHITE, back_color=BLACK, bold=False):

    if ENABLE_COLORED_OUTPUT:

        # foreground is set with 30 plus the number of the color, background with 40
        return (
            COLOR_SEQ.format(1 if bold else 0, 30 + fore_color, 40 + back_color)
            + str
            + RESET_SEQ
        )
    else:
        return str


def blink_str(str):

    return '\x1b[5m' + str + '\x1b[0m' if ENABLE_COLORED_OUTPUT else str


def unicode_str(s):

    if sys.version[0] == '3':
        s = str(s, 'utf-8', 'ignore')
    else:
        s = str(s)

    return s.rstrip('\x00')  # remove null-characters


def gen_memory_str(bytes):

    pwr = 1024
    n = 0
    pwr_strs = {0: '', 1: 'K', 2: 'M', 3: 'G', 4: 'T'}
    while bytes >= pwr and n < 4:
        bytes /= pwr
        n += 1

    return '{:.{num_dec_pts}f} {}B'.format(
        bytes, pwr_strs[n], num_dec_pts=max(0, n - 2)
    )  # 1 decimal point for GB, 2 for TB


def gen_lattice_str(lat):

    lat_str, col_widths = gen_mat_str(lat)
    desc_str = (' '.join([('{:' + str(w) + '}') for w in col_widths])).format(
        'a', 'b', 'c'
    ) + '\n'

    lat_str = indent_str(lat_str, 21)

    return desc_str + lat_str


def str_plen(str):
    """
    Returns the printable length of a string. This function can only account for invisible characters added by styling with ``color_str``.

    Parameters
    ----------
        str : :obj:`str`
            String.

    Returns
    -------
        int
            Number of visible characters.

    """

    num_colored_subs = str.count(RESET_SEQ)
    return len(str) - (
        14 * num_colored_subs
    )  # 14: length of invisible characters per colored segment
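

# Hypothetical alternative, not what sGDML uses: `str_plen` subtracts a fixed 14
# characters per RESET_SEQ, which matches `color_str` output with the default
# black background (a 10-character sequence such as '\x1b[1;32;40m' plus the
# 4-character reset '\x1b[0m'), but over-counts for `blink_str`, whose codes
# total only 8 characters. Removing every SGR escape sequence with a regular
# expression handles both cases.
def _example_plen_regex(s):
    import re

    # SGR sequences: ESC '[' followed by digits/semicolons and a final 'm'.
    return len(re.sub(r'\x1b\[[0-9;]*m', '', s))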


def wrap_str(str, width=MAX_PRINT_WIDTH - LOG_LEVELNAME_WIDTH):
    """
    Wrap multiline string after a given number of characters. The default maximum line width already accounts for the indentation due to the logging level label.

    Parameters
    ----------
        str : :obj:`str`
            Multiline string.
        width : int, optional
            Max number of characters in a line.

    Returns
    -------
        :obj:`str`

    """

    return '\n'.join(
        [
            '\n'.join(
                textwrap.wrap(
                    line,
                    width + (len(line) - str_plen(line)),
                    break_long_words=False,
                    replace_whitespace=False,
                )
            )
            for line in str.splitlines()
        ]
    )


def indent_str(str, indent):
    """
    Indents all lines of a multiline string right by a given number of
    characters.

    Parameters
    ----------
        str : :obj:`str`
            Multiline string.
        indent : int
            Number of characters added in front of each line.

    Returns
    -------
        :obj:`str`

    """

    return re.sub('^', ' ' * indent, str, flags=re.MULTILINE)


def wrap_indent_str(label, str, width=MAX_PRINT_WIDTH - LOG_LEVELNAME_WIDTH):
    """
    Wraps and indents a multiline string to arrange it with the provided label in two columns. The default maximum line width already accounts for the indentation due to the logging level label.

    Example:
    ``<label><multiline string>``

    Parameters
    ----------
        label : :obj:`str`
            Label shown in front of the first line.
        str : :obj:`str`
            Multiline string.
        width : int, optional
            Max number of characters in a line, including the label.

    Returns
    -------
        :obj:`str`

    """

    label_len = str_plen(label)

    str = wrap_str(str, width - label_len)
    str = indent_str(str, label_len)

    return label + str[label_len:]
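

# Hypothetical illustration: for a plain (uncolored) label and a single
# paragraph, the two-column layout of `wrap_indent_str` roughly corresponds to
# the standard library's `textwrap.fill` with `initial_indent` set to the label
# and `subsequent_indent` set to the same number of spaces. `wrap_indent_str`
# additionally preserves existing line breaks and ignores color codes when
# measuring the label width.
def _example_label_column(label, text, width):
    import textwrap

    return textwrap.fill(
        text,
        width,
        initial_indent=label,
        subsequent_indent=' ' * len(label),
        break_long_words=False,
    )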


def merge_col_str(col_str1, col_str2):
    """
    Merges two multiline strings that represent columns in a table by
    concatenating each pair of lines.

    Note
    ----
        Both strings must have the same number of lines.

    Parameters
    ----------
        col_str1 : :obj:`str`
            First multiline string.
        col_str2 : :obj:`str`
            Second multiline string.

    Returns
    -------
        :obj:`str`

    """

    return '\n'.join(
        [
            ' '.join([c1, c2])
            for c1, c2 in zip(col_str1.split('\n'), col_str2.split('\n'))
        ]
    )


def gen_mat_str(mat):
    """
    Converts a matrix to a multiline string such that the decimal points
    align in each column. Trailing zeros are replaced with spaces.

    Parameters
    ----------
        mat : :obj:`numpy.ndarray`
            Two-dimensional array.

    Returns
    -------
        :obj:`str`
            String representation of matrix.
        :obj:`list` of int
            Width of each column in characters.

    """

    # Length of the string representation before the decimal point (including sign).
    def _int_len(x):
        return len(str(int(abs(x)))) + (0 if x >= 0 else 1)

    # Length of the string representation after the decimal point.
    def _dec_len(x):
        x_str_split = '{:g}'.format(x).split('.')
        return len(x_str_split[1]) if len(x_str_split) > 1 else 0

    # Max. length of the string representation before the decimal point in a column.
    def _max_int_len_for_col(mat, col):
        col_min = np.min(mat[:, col])
        col_max = np.max(mat[:, col])
        return max(_int_len(col_min), _int_len(col_max))

    # Max. length of the string representation after the decimal point in a column.
    def _max_dec_len_for_col(mat, col):
        return max([_dec_len(cell) for cell in mat[:, col]])

    n_cols = mat.shape[1]
    col_int_widths = [_max_int_len_for_col(mat, i) for i in range(n_cols)]
    col_dec_widths = [_max_dec_len_for_col(mat, i) for i in range(n_cols)]
    col_widths = [iw + cd + 1 for iw, cd in zip(col_int_widths, col_dec_widths)]

    mat_str = ''
    for row in mat:
        if mat_str != '':
            mat_str += '\n'
        mat_str += ' '.join(
            ' ' * max(col_int_widths[j] - _int_len(x), 0)
            + ('{: <' + str(_int_len(x) + col_dec_widths[j] + 1) + 'g}').format(x)
            for j, x in enumerate(row)
        )

    return mat_str, col_widths
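

# Hypothetical single-column sketch of the alignment idea in `gen_mat_str`:
# split each '{:g}' representation at the decimal point, right-align the integer
# parts and left-align the fractional parts so that all decimal points share one
# column. Assumes no value is printed in exponent notation by '{:g}'.
def _example_align_decimals(values):
    parts = []
    for v in values:
        int_part, _, dec_part = '{:g}'.format(v).partition('.')
        parts.append((int_part, dec_part))

    int_w = max(len(ip) for ip, _ in parts)
    dec_w = max(len(dp) for _, dp in parts)

    rows = []
    for ip, dp in parts:
        row = ip.rjust(int_w)
        if dec_w > 0:
            # Integers get a blank instead of '.', padded like the other cells.
            row += ('.' if dp else ' ') + dp.ljust(dec_w)
        rows.append(row)
    return rows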


def gen_range_str(min, max):
    """
    Generates a string that shows a minimum and maximum value, as well as the range.

    Example:
    ``<min> |-- <range> --| <max>``

    Parameters
    ----------
        min : float
            Minimum value.
        max : float
            Maximum value.

    Returns
    -------
        :obj:`str`

    """

    return '{:<.3f} |-- {:^8.3f} --| {:<9.3f}'.format(min, max - min, max)


def print_step_title(title_str, sec_title_str='', underscore=True):

    if sec_title_str != '':
        sec_title_str = ' ' + sec_title_str

    underscore_str = '\n' + '-' * MAX_PRINT_WIDTH if underscore else ''

    print(
        '\n'
        + color_str(
            ' ' + title_str + ' ', fore_color=BLACK, back_color=WHITE, bold=True
        )
        + sec_title_str
        + underscore_str
    )


def print_two_column_str(str, sec_str=''):

    sec_str = color_str(
        '{:>{width}}'.format(sec_str, width=MAX_PRINT_WIDTH - str_plen(str) - 1),
        fore_color=GRAY,
    )
    print('{} {}'.format(str, sec_str))



def print_lattice(lat=None, inset=False):

    from . import io

    lat_str = 'n/a'
    if lat is not None:
        lat_str = gen_lattice_str(lat)
        lengths, angles = io.lattice_vec_to_par(lat)

    if inset:
        print('    {:<16} {}'.format('Lattice:', lat_str))
    else:
        print('  {:<18} {}'.format('Lattice:', lat_str))
    if lat is not None:
        print('    {:<16} a = {:g}, b = {:g}, c = {:g}'.format('Lengths:', *lengths))
        print(
            '    {:<16} alpha = {:g}, beta = {:g}, gamma = {:g}'.format(
                'Angles [deg]:', *angles
            )
        )