Repository: uber-research/EvoGrad
Branch: master
Commit: 53afad88074d
Files: 14
Total size: 26.5 KB

Directory structure:
gitextract_9ie9iprt/
├── .gitignore
├── LICENSE
├── README.md
├── demos/
│   ├── cartpole.py
│   ├── max_ent_interference.py
│   ├── max_var_interference.py
│   ├── standard_interference.py
│   └── standard_quadratic.py
├── evograd/
│   ├── __init__.py
│   ├── distributions.py
│   ├── expectation.py
│   └── noise.py
├── requirements.txt
└── setup.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
venv
__pycache__
*.pyo
*.pyc
build
dist
*.egg-info
*.swp

================================================
FILE: LICENSE
================================================
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by the text below.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under this License.

This License governs use of the accompanying Work, and your use of the Work constitutes acceptance of this License.

You may use this Work for any non-commercial purpose, subject to the restrictions in this License. Some purposes which can be non-commercial are teaching, academic research, and personal experimentation. You may also distribute this Work with books or other teaching materials, or publish the Work on websites, that are intended to teach the use of the Work.

You may not use or distribute this Work, or any derivative works, outputs, or results from the Work, in any form for commercial purposes. Non-exhaustive examples of commercial purposes would be running business operations, licensing, leasing, or selling the Work, or distributing the Work for use with commercial products.

You may modify this Work and distribute the modified Work for non-commercial purposes, however, you may not grant rights to the Work or derivative works that are broader than or in conflict with those provided by this License. For example, you may not distribute modifications of the Work under terms that would permit commercial use, or under terms that purport to require the Work or derivative works to be sublicensed to others.

In return, we require that you agree:

1. Not to remove any copyright or other notices from the Work.

2. That if you distribute the Work in Source or Object form, you will include a verbatim copy of this License.

3. That if you distribute derivative works of the Work in Source form, you do so only under a license that includes all of the provisions of this License and is not in conflict with this License, and if you distribute derivative works of the Work solely in Object form you do so only under a license that complies with this License.

4. That if you have modified the Work or created derivative works from the Work, and distribute such modifications or derivative works, you will cause the modified files to carry prominent notices so that recipients know that they are not receiving the original Work. Such notices must state: (i) that you have changed the Work; and (ii) the date of any changes.

5. If you publicly use the Work or any output or result of the Work, you will provide a notice with such use that provides any person who uses, views, accesses, interacts with, or is otherwise exposed to the Work (i) with information of the nature of the Work, (ii) with a link to the Work, and (iii) a notice that the Work is available under this License.

6. THAT THE WORK COMES "AS IS", WITH NO WARRANTIES. THIS MEANS NO EXPRESS, IMPLIED OR STATUTORY WARRANTY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR ANY WARRANTY OF TITLE OR NON-INFRINGEMENT. ALSO, YOU MUST PASS THIS DISCLAIMER ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS.

7. THAT NEITHER UBER TECHNOLOGIES, INC. NOR ANY OF ITS AFFILIATES, SUPPLIERS, SUCCESSORS, NOR ASSIGNS WILL BE LIABLE FOR ANY DAMAGES RELATED TO THE WORK OR THIS LICENSE, INCLUDING DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL OR INCIDENTAL DAMAGES, TO THE MAXIMUM EXTENT THE LAW PERMITS, NO MATTER WHAT LEGAL THEORY IT IS BASED ON. ALSO, YOU MUST PASS THIS LIMITATION OF LIABILITY ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS.

8. That if you sue anyone over patents that you think may apply to the Work or anyone's use of the Work, your license to the Work ends automatically.

9. That your rights under the License end automatically if you breach it in any way.

10. Uber Technologies, Inc. reserves all rights not expressly granted to you in this License.

================================================
FILE: README.md
================================================
# EvoGrad

EvoGrad is a lightweight tool for differentiating through expectation, built on top of PyTorch.

Tools that enable fast and flexible experimentation democratize and accelerate machine learning research. However, one field that so far has not been greatly impacted by automatic differentiation tools is evolutionary computation. The reason is that most evolutionary algorithms are gradient-free: they do not follow any explicit mathematical gradient (i.e., the mathematically optimal local direction of improvement), and instead proceed through a generate-and-test heuristic. In other words, they create new variants, test them out, and keep the best.

Recent and exciting research in evolutionary algorithms for deep reinforcement learning, however, has highlighted how a specific class of evolutionary algorithms can benefit from auto-differentiation. Work from OpenAI demonstrated that a form of Natural Evolution Strategies (NES) is massively scalable and competitive with modern deep reinforcement learning algorithms. EvoGrad enables fast prototyping of NES-like algorithms. We believe there are many interesting algorithms yet to be discovered in this vein, and we hope this library will help to catalyze progress in the machine learning community.
## Examples

### Natural Evolution Strategies

As a first example, we'll implement the simplified NES algorithm of [Salimans et al. (2017)](https://openai.com/blog/evolution-strategies/) in EvoGrad. EvoGrad provides several probability distributions that may be used in the expectation function. We will use a normal distribution because it is the most common choice in practice.

Let's consider the problem of finding a fitness peak in a simple 1-D search space. We can define our population distribution over this search space to be initially centered at 1.0, with a fixed standard deviation of 0.05, with the following Python code:

```python
mu = torch.tensor([1.0], requires_grad=True)
p = Normal(mu, 0.05)
```

Next, let's define a simple fitness function that rewards individuals for approaching the location 5.0 in the search space:

```python
def fitness(x):
    return -(x - 5.0) ** 2
```

Each generation of evolution in NES takes samples from the population distribution and evaluates the fitness of each of those individual samples. Here we sample and evaluate 100 individuals from the distribution:

```python
sample = p.sample(n=100)
fitnesses = fitness(sample)
```

Optionally, we can apply a [whitening transformation](https://en.wikipedia.org/wiki/Whitening_transformation) to the fitnesses (a form of pre-processing that often increases NES performance), like this:

```python
fitnesses = (fitnesses - fitnesses.mean()) / fitnesses.std()
```

Now we can use these calculated fitness values to estimate the mean fitness over our population distribution:

```python
mean = expectation(fitnesses, sample, p=p)
```

Although we could have estimated the mean value directly with the snippet `mean = fitnesses.mean()`, what we gain by instead using the EvoGrad `expectation` function is the ability to backpropagate through `mean`. We can then use the resulting auto-differentiated gradients to optimize the center of the 1-D Gaussian population distribution (`mu`) through gradient ascent, here increasing the expected fitness value of the population:

```python
mean.backward()
with torch.no_grad():
    mu += alpha * mu.grad
    mu.grad.zero_()
```
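Putting the pieces together, here is a minimal end-to-end sketch of the loop, assembled from the snippets above; the learning rate `alpha` and the generation count are illustrative choices, and the structure mirrors `demos/standard_quadratic.py`:

```python
import torch
from evograd import expectation
from evograd.distributions import Normal


def fitness(x):
    return -(x - 5.0) ** 2


mu = torch.tensor([1.0], requires_grad=True)  # population mean
p = Normal(mu, 0.05)                          # population distribution
alpha = 0.03                                  # learning rate (assumed)

for t in range(200):
    sample = p.sample(n=100)                  # draw 100 individuals
    fitnesses = fitness(sample)
    fitnesses = (fitnesses - fitnesses.mean()) / fitnesses.std()  # whiten
    mean = expectation(fitnesses, sample, p=p)
    mean.backward()                           # NES gradient flows into mu.grad
    with torch.no_grad():
        mu += alpha * mu.grad                 # gradient ascent step
        mu.grad.zero_()

print(float(mu))  # should end up near 5.0
```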
### Maximizing Variance

As a more sophisticated example, rather than maximizing the mean fitness, we can maximize the variance of behaviors in the population. While fitness is a measure of quality for a fixed task, in some situations we want to prepare for the unknown, and instead might want our population to contain a diversity of behaviors that can easily be adapted to solve a wide range of possible future tasks.

To do so, we need a quantification of behavior, which we can call a behavior characterization. Similarly to how you can evaluate an individual parameter vector drawn from the population distribution to establish its fitness (e.g., how far does this controller cause a robot to walk?), you could evaluate such a draw and return some quantification of its behavior (e.g., what position does a robot controlled by this neural network locomote to?).

For this example, let's choose a simple but illustrative 1-D behavior characterization, namely, the product of two sine waves (one with a much higher frequency than the other):

```python
def behavior(x):
    return 5 * torch.sin(0.2 * x) * torch.sin(20 * x)
```

Now, instead of estimating the mean fitness, we can calculate a statistic that reflects the diversity of sampled behaviors. The variance of a distribution is one metric of diversity, and one variant of evolvability ES measures and optimizes such variance of behaviors sampled from the population distribution:

```python
sample = p.sample(n=100)
behaviors = behavior(sample)
zscore = (behaviors - behaviors.mean()) / behaviors.std()
variance = expectation(zscore ** 2, sample, p=p)
```

### Maximizing Entropy

In the previous example, the gradient would be relatively straightforward to compute by hand. However, sometimes we may need to maximize objectives whose derivatives would be much more challenging to derive. In particular, this final example will seek to maximize the entropy of the distribution of behaviors (a variant of evolvability ES). Note that for this example you'll also have to install `scipy` from pip.

To create a differentiable estimate of entropy, we first compute the pairwise squared Euclidean distances between the different behaviors. Next, we create a smooth probability distribution by fitting a [kernel density estimate](https://en.wikipedia.org/wiki/Kernel_density_estimation), where `k_sigma` is the kernel bandwidth:

```python
dists = scipy.spatial.distance.squareform(
    scipy.spatial.distance.pdist(behaviors.numpy(), "sqeuclidean")
)
kernel = torch.tensor(np.exp(-dists / k_sigma ** 2), dtype=torch.float32)
p_x = expectation(kernel, sample, p=p)
```

Then, we can use these probabilities to estimate the [entropy of the distribution](https://en.wikipedia.org/wiki/Entropy_(information_theory)), and run gradient ascent on it as before:

```python
entropy = expectation(-torch.log(p_x), sample, p=p)
```

Full code for these examples can be found in the `demos` directory of this repository.

## Installation

Either install EvoGrad from pip:

```
pip install evograd
```

Or from the source code in this repository:

```
git clone https://github.com/uber-research/EvoGrad
cd EvoGrad
pip install -r requirements.txt
pip install -e .
```
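Either way, a quick smoke test confirms that the package imports and differentiates (a minimal sketch; the particular numbers are arbitrary):

```python
import torch
from evograd import expectation
from evograd.distributions import Normal

mu = torch.tensor([0.0], requires_grad=True)
p = Normal(mu, 1.0)
sample = p.sample(10)
est = expectation(sample ** 2, sample, p=p)  # crude estimate of E[x^2]
est.backward()
print("evograd OK; mu.grad =", mu.grad)
```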
## About

Development of EvoGrad was led by [Alex Gajewski](https://github.com/agajews) as a summer intern at Uber AI Labs.

================================================
FILE: demos/cartpole.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
import numpy as np
from evograd import expectation
from evograd.distributions import Normal
import gym


def simulate_single(weights):
    # Average episode return of a linear policy over num_run episodes.
    total_reward = 0.0
    num_run = 10
    for t in range(num_run):
        observation = env.reset()
        for i in range(300):
            action = 1 if np.dot(weights, observation) > 0 else 0
            observation, reward, done, info = env.step(action)
            total_reward += reward
            if done:
                break
    return total_reward / num_run


def simulate(batch_weights):
    rewards = []
    for weights in batch_weights:
        rewards.append(simulate_single(weights.numpy()))
    return torch.tensor(rewards)


mu = torch.randn(4, requires_grad=True)  # population mean
npop = 50  # population size
std = 0.5  # noise standard deviation
alpha = 0.03  # learning rate
p = Normal(mu, std)
env = gym.make("CartPole-v0")

for t in range(2000):
    sample = p.sample(npop)
    fitnesses = simulate(sample)
    scaled_fitnesses = (fitnesses - fitnesses.mean()) / fitnesses.std()
    mean = expectation(scaled_fitnesses, sample, p=p)
    mean.backward()

    with torch.no_grad():
        mu += alpha * mu.grad
        mu.grad.zero_()

    print("step: {}, mean fitness: {:0.5}".format(t, float(fitnesses.mean())))
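A note on dependencies: this demo additionally requires `gym`, which is not pinned in `requirements.txt`. A quick, hypothetical pre-flight check that the environment is available (a sketch assuming the classic `gym` API that the demo itself uses):

```python
# Hypothetical pre-flight check for the CartPole demo
# (gym is an extra dependency, not listed in requirements.txt).
import gym

env = gym.make("CartPole-v0")
observation = env.reset()
print(observation.shape)  # (4,), matching the 4-dimensional mu in the demo
```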
================================================
FILE: demos/max_ent_interference.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import scipy
import scipy.spatial
import torch
import numpy as np
from evograd import expectation
from evograd.distributions import Normal


def fun(x):
    return 5 * torch.sin(0.2 * x) * torch.sin(20 * x)


mu = torch.tensor([1.0], requires_grad=True)
npop = 500  # population size
std = 0.5  # noise standard deviation
k_sigma = 1.0  # kernel standard deviation
alpha = 0.10  # learning rate
p = Normal(mu, std)

for t in range(2000):
    sample = p.sample(npop)
    novelties = fun(sample)
    novelties = (novelties - novelties.mean()) / novelties.std()

    # Pairwise squared Euclidean distances between behaviors.
    dists = scipy.spatial.distance.squareform(
        scipy.spatial.distance.pdist(novelties.numpy(), "sqeuclidean")
    )
    # Gaussian kernel density estimate of the behavior distribution.
    kernel = torch.tensor(np.exp(-dists / k_sigma ** 2), dtype=torch.float32)
    p_x = expectation(kernel, sample, p=p)
    entropy = expectation(-torch.log(p_x), sample, p=p)
    entropy.backward()

    with torch.no_grad():
        mu += alpha * mu.grad
        mu.grad.zero_()

    print("step: {}, estimated entropy: {:0.5}".format(t, float(entropy)))

================================================
FILE: demos/max_var_interference.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
import numpy as np
from evograd import expectation
from evograd.distributions import Normal


def fun(x):
    return 5 * torch.sin(0.2 * x) * torch.sin(20 * x)


mu = torch.tensor([1.0], requires_grad=True)
npop = 500  # population size
std = 0.5  # noise standard deviation
alpha = 0.03  # learning rate
p = Normal(mu, std)

for t in range(2000):
    sample = p.sample(npop)
    behaviors = fun(sample)
    zscores = (behaviors - behaviors.mean()) / behaviors.std()
    variance = expectation(zscores ** 2, sample, p=p)
    variance.backward()

    with torch.no_grad():
        mu += alpha * mu.grad
        mu.grad.zero_()

    print("step: {}, mu: {:0.5}".format(t, float(mu)))

================================================
FILE: demos/standard_interference.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
from evograd import expectation
from evograd.distributions import Normal


def fun(x):
    return 5 * torch.sin(0.2 * x) * torch.sin(20 * x)


mu = torch.tensor([1.0], requires_grad=True)
npop = 500  # population size
std = 0.5  # noise standard deviation
alpha = 0.03  # learning rate
p = Normal(mu, std)

for t in range(2000):
    sample = p.sample(npop)
    fitnesses = fun(sample)
    fitnesses = (fitnesses - fitnesses.mean()) / fitnesses.std()
    mean = expectation(fitnesses, sample, p=p)
    mean.backward()

    with torch.no_grad():
        mu += alpha * mu.grad
        mu.grad.zero_()

    print("step: {}, mu: {:0.5}".format(t, float(mu)))

================================================
FILE: demos/standard_quadratic.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import torch
from evograd import expectation
from evograd.distributions import Normal


def fun(x):
    return -(x - 5.0) ** 2


mu = torch.tensor([1.0], requires_grad=True)
npop = 500  # population size
std = 0.5  # noise standard deviation
alpha = 0.03  # learning rate
p = Normal(mu, std)

for t in range(2000):
    sample = p.sample(npop)
    fitnesses = fun(sample)
    fitnesses = (fitnesses - fitnesses.mean()) / fitnesses.std()
    mean = expectation(fitnesses, sample, p=p)
    mean.backward()

    with torch.no_grad():
        mu += alpha * mu.grad
        mu.grad.zero_()

    print("step: {}, mu: {:0.5}".format(t, float(mu)))

================================================
FILE: evograd/__init__.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

from .expectation import expectation
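An editorial note before the distribution classes below: `ratio()` returns a vector of ones, so it leaves the forward value of `expectation` unchanged; its custom `backward`, however, implements the score-function (NES/REINFORCE) estimator. For `x ~ N(mu, sigma^2)`, the gradient of `E[f(x)]` with respect to `mu` is `E[f(x) * (x - mu) / sigma^2]`, which is the per-sample quantity `NormalProbRatio.backward` computes. A minimal sanity-check sketch of this behavior (illustrative only, not part of the library):

```python
import torch
from evograd import expectation
from evograd.distributions import Normal

# For x ~ N(mu, sigma^2), d/dmu E[x] = 1, so the estimated gradient
# of E[x] with respect to mu should come out close to 1.
mu = torch.tensor([0.0], requires_grad=True)
p = Normal(mu, 0.5)
sample = p.sample(10000)
est = expectation(sample.squeeze(1), sample, p=p)  # Monte Carlo estimate of E[x]
est.backward()
print(mu.grad)  # expected: roughly tensor([1.])
```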
================================================
FILE: evograd/distributions.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
import torch

from .noise import noise


class NormalProbRatio(torch.autograd.Function):
    @staticmethod
    def forward(ctx, mu, sigma, descriptors, decode_fn):
        ctx.save_for_backward(mu)
        ctx.sigma = sigma
        ctx.descriptors = descriptors
        ctx.decode_fn = decode_fn
        res = torch.ones(len(descriptors), dtype=torch.float32)
        res.requires_grad = True
        return res

    @staticmethod
    def backward(ctx, grad_output):
        mu, = ctx.saved_tensors
        theta = ctx.decode_fn(ctx.descriptors)
        # Score-function gradient for an isotropic Gaussian: the gradient of
        # log p(theta) with respect to mu is (theta - mu) / sigma^2.
        grad = (theta - mu) / ctx.sigma ** 2 * grad_output.unsqueeze(1)
        return (grad, None, None, None)


class MixNormalProbRatio(torch.autograd.Function):
    # Mixture-of-Gaussians analogue of NormalProbRatio.
    @staticmethod
    def forward(ctx, mus, sigma, descriptors, decode_fn):
        ctx.save_for_backward(mus)
        ctx.sigma = sigma
        ctx.descriptors = descriptors
        ctx.decode_fn = decode_fn
        res = torch.ones(len(descriptors), dtype=torch.float32)
        res.requires_grad = True
        return res

    @staticmethod
    def backward(ctx, grad_output):
        mus, = ctx.saved_tensors
        thetas = ctx.decode_fn(ctx.descriptors)
        epsilons = [thetas - mu for mu in mus]
        grads = torch.stack(
            [
                (epsilon / ctx.sigma ** 2)
                / (
                    1
                    + sum(
                        torch.exp(
                            -0.5
                            * (other.dot(other) - epsilon.dot(epsilon))
                            / ctx.sigma ** 2
                        )
                        for other in epsilons
                        if other is not epsilon
                    )
                )
                * grad_output
                for epsilon in epsilons
            ]
        )
        return (grads, None, None, None)


class Distribution:
    def __init__(self, device, random_state=None):
        if random_state is None:
            random_state = np.random.RandomState()  # pylint: disable=no-member
        self.random_state = random_state
        self.device = device


class Normal(Distribution):
    def __init__(self, mu, sigma, random_state=None):
        """
        mu: torch.tensor
        sigma: torch.tensor or float
        """
        super().__init__(mu.device, random_state)
        self.mu = mu
        self.sigma = sigma

    def ratio(self, descriptors):
        return NormalProbRatio.apply(self.mu, self.sigma, descriptors, self.decode)

    def sample(self, n, encode=False):
        n_epsilons = n
        noise_inds = np.asarray(
            [
                noise.sample_index(self.random_state, len(self.mu))
                for _ in range(n_epsilons)
            ],
            dtype="int",
        )
        descriptors = [(idx, 1) for idx in noise_inds]
        if encode:
            return descriptors
        thetas = torch.stack([self.decode(descriptor) for descriptor in descriptors])
        return thetas

    def decode(self, descriptor):
        # A descriptor is either an already-decoded tensor or an
        # (index, direction) pair into the shared noise table.
        if not isinstance(descriptor, tuple):
            # assert isinstance(descriptor, torch.tensor)
            return descriptor
        noise_idx, direction = descriptor
        epsilon = torch.tensor(noise.get(noise_idx, len(self.mu)), device=self.device)
        with torch.no_grad():
            return self.mu + direction * self.sigma * epsilon


class PairedNormal(Normal):
    def sample(self, n, encode=False):
        assert n % 2 == 0
        n_epsilons = n // 2
        noise_inds = np.asarray(
            [
                noise.sample_index(self.random_state, len(self.mu))
                for _ in range(n_epsilons)
            ],
            dtype="int",
        )
        # Mirrored (antithetic) sampling: each noise vector is used in both
        # the positive and the negative direction.
        descriptors = [(idx, 1) for idx in noise_inds] + [
            (idx, -1) for idx in noise_inds
        ]
        if encode:
            return descriptors
        thetas = torch.stack([self.decode(descriptor) for descriptor in descriptors])
        return thetas


class MixNormal(Distribution):
    def __init__(self, mus, sigma, random_state=None):
        """
        mus: list of torch.tensor
        sigma: torch.tensor or float
        """
        super().__init__(mus[0].device, random_state)
        self.mus = [mu for mu in mus]
        self.sigma = sigma

    def ratio(self, descriptor):
        return MixNormalProbRatio.apply(self.mus, self.sigma, descriptor, self.decode)

    def sample(self, n, encode=False):
        n_epsilons = n
        noise_ids = np.asarray(
            [
                noise.sample_index(self.random_state, len(self.mus[0]))
                for _ in range(n_epsilons)
            ],
            dtype="int",
        )
        mu_ids = np.random.randint(len(self.mus), size=n)
        descriptors = [
            (noise_id, mu_id, 1) for noise_id, mu_id in zip(noise_ids, mu_ids)
        ]
        if encode:
            return descriptors
        thetas = torch.stack([self.decode(descriptor) for descriptor in descriptors])
        return thetas

    def decode(self, descriptor):
        if not isinstance(descriptor, tuple):
            # assert isinstance(descriptor, torch.tensor)
            return descriptor
        noise_id, mu_id, direction = descriptor
        epsilon = torch.tensor(
            noise.get(noise_id, len(self.mus[0])), device=self.device
        )
        with torch.no_grad():
            return self.mus[mu_id] + direction * self.sigma * epsilon

================================================
FILE: evograd/expectation.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import torch


def unsqueeze_as(x, y):
    # Append trailing singleton dimensions to x until it has as many
    # dimensions as y, so that it broadcasts against y.
    assert len(y.shape) >= len(x.shape)
    for _ in range(len(y.shape) - len(x.shape)):
        x = torch.unsqueeze(x, -1)
    return x


def expectation(vals, sample, p):
    # Monte Carlo estimate of E[vals] under p. The probability ratio is 1 in
    # the forward pass; its custom backward carries gradients to p's parameters.
    ratio = unsqueeze_as(p.ratio(sample), vals)
    prod = vals * ratio
    return prod.mean(dim=0)
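For intuition about `unsqueeze_as`: when `vals` carries extra trailing dimensions (as with the n-by-n kernel matrix in `demos/max_ent_interference.py`), the per-sample ratio is broadcast across them and the mean is taken over the sample dimension (`dim=0`). A small illustrative sketch (the shapes here are hypothetical):

```python
import torch
from evograd import expectation
from evograd.distributions import Normal

mu = torch.tensor([0.0], requires_grad=True)
p = Normal(mu, 1.0)
sample = p.sample(4)     # shape (4, 1)

vals = torch.rand(4, 4)  # e.g. a pairwise kernel matrix
out = expectation(vals, sample, p=p)
print(out.shape)         # torch.Size([4]): averaged over the 4 samples
```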
================================================
FILE: evograd/noise.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import logging

import numpy as np

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

debug = True


class SharedNoiseTable:
    def __init__(self):
        import ctypes
        import multiprocessing

        seed = 42
        # 1 gigabyte of 32-bit numbers. Will actually sample 2 gigabytes below.
        count = 250000000 if not debug else 10000000
        logger.info("Sampling {} random numbers with seed {}".format(count, seed))
        self._shared_mem = multiprocessing.Array(ctypes.c_float, count)
        self.noise = np.ctypeslib.as_array(self._shared_mem.get_obj())
        assert self.noise.dtype == np.float32
        self.noise[:] = np.random.RandomState(seed).randn(  # pylint: disable=no-member
            count
        )  # 64-bit to 32-bit conversion here
        logger.info("Sampled {} bytes".format(self.noise.size * 4))

    def get(self, i, dim):
        # A noise vector is a contiguous dim-length slice of the shared table.
        return self.noise[i : i + dim]

    def sample_index(self, stream, dim):
        # Uniformly choose a start index such that a dim-length slice fits.
        return stream.randint(0, len(self.noise) - dim + 1)


noise = SharedNoiseTable()

================================================
FILE: requirements.txt
================================================
numpy==1.16.4
torch==1.1.0.post2

================================================
FILE: setup.py
================================================
# Copyright (c) 2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="evograd",
    version="0.1.2",
    author="Alex Gajewski",
    author_email="agajews@gmail.com",
    description="A lightweight tool for differentiating through expectations",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/uber-research/EvoGrad",
    packages=setuptools.find_packages(),
    install_requires=["numpy", "torch"],
    classifiers=[
        "Programming Language :: Python :: 3",
        # The project ships the Uber Non-Commercial License (see LICENSE), not MIT.
        "License :: Other/Proprietary License",
        "Operating System :: OS Independent",
    ],
)