Repository: mohammadasghari/dqn-multi-agent-rl
Branch: master
Commit: 6f392154b850
Files: 19
Total size: 63.0 KB
Directory structure:
gitextract_yem4qnp2/
├── LICENSE
├── README.md
├── agents_landmarks_multiagent.py
├── brain.py
├── dqn_agent.py
├── environments/
│ ├── __init__.py
│ ├── agents_landmarks/
│ │ ├── __init__.py
│ │ └── env.py
│ └── predators_prey/
│ ├── __init__.py
│ └── env.py
├── predators_prey_multiagent.py
├── prioritized_experience_replay.py
├── results_predators_prey/
│ ├── rewards_files/
│ │ └── 5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1.csv
│ ├── timesteps_files/
│ │ └── 5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1.csv
│ └── weights_files/
│ ├── 5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1_0.h5
│ ├── 5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1_1.h5
│ └── 5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1_2.h5
├── sum_tree.py
└── uniform_experience_replay.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2019 mohammadasghari
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Deep Q-learning (DQN) for Multi-agent Reinforcement Learning (RL)
DQN implementation for two multi-agent environments: `agents_landmarks` and `predators_prey` (See [details.pdf](https://github.com/mohammadasghari/dqn-multi-agent-rl/blob/master/details.pdf) for a detailed description of these environments).
## Code structure
- `./environments/`: folder where the two environments (`agents_landmarks` and `predators_prey`) are stored.
  1) `./environments/agents_landmarks`: in this environment, ***n*** agents must cooperate through their actions to reach a set of ***n*** landmarks in a two-dimensional discrete ***k***-by-***k*** grid.
  2) `./environments/predators_prey`: in this environment, ***n*** agents (called predators) must cooperate with each other to capture a single prey in a two-dimensional discrete ***k***-by-***k*** grid.
- `./dqn_agent.py`: contains code for the implementation of DQN and its extensions (Double DQN, Dueling DQN, DQN with Prioritized Experience Replay) (See [details.pdf](https://github.com/mohammadasghari/dqn-multi-agent-rl/blob/master/details.pdf) for a detailed description of the DQN and its extensions).
- `./brain.py`: contains code for the implementation of neural networks required for DQN (See [details.pdf](https://github.com/mohammadasghari/dqn-multi-agent-rl/blob/master/details.pdf) for a detailed description of the neural network implementation).
- `./uniform_experience_replay.py`: contains code for the implementation of Uniform Experience Replay (UER) which can be used in DQN.
- `./prioritized_experience_replay.py`: contains code for the implementation of Prioritized Experience Replay (PER) which can be used in DQN.
- `./sum_tree.py`: contains code for the implementation of the sum tree data structure used in Prioritized Experience Replay (PER).
- `./agents_landmarks_multiagent.py`: contains code for applying DQN to the `agents_landmarks` environment.
- `./predators_prey_multiagent.py`: contains code for applying DQN to the `predators_prey` environment.
- `./results_agents_landmarks/`: folder where the results (neural net weights, rewards of the episodes, videos, figures, etc.) for the `agents_landmarks` environment are stored.
- `./results_predators_prey/`: folder where the results (neural net weights, rewards of the episodes, videos, figures, etc.) for the `predators_prey` environment are stored.
- `./details.pdf`: a pdf file including a detailed description of the DQN and its extensions, the environments, and the neural network implementation.
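As a rough illustration of how a sum tree supports the priority-proportional sampling that PER needs, here is a minimal sketch (a simplified stand-in, not the repository's actual `sum_tree.py`; all names are hypothetical):

```python
class SumTree:
    """Minimal binary sum tree: leaves hold priorities, internal nodes hold
    the sum of their children, so a value drawn uniformly from [0, total())
    maps to a leaf with probability proportional to its priority."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes followed by leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next leaf slot to overwrite (ring buffer)

    def total(self):
        return self.tree[0]  # root holds the sum of all priorities

    def add(self, priority, sample):
        idx = self.write + self.capacity - 1    # leaf index in the tree array
        self.data[self.write] = sample
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                         # propagate the change up to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def get(self, s):
        """Descend from the root, returning (tree_index, priority, sample)."""
        idx = 0
        while True:
            left = 2 * idx + 1
            if left >= len(self.tree):          # reached a leaf
                return idx, self.tree[idx], self.data[idx - self.capacity + 1]
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
```

Sampling a uniform value in `[0, total())` and walking down the tree retrieves each stored transition with probability proportional to its priority, in O(log n) per sample and per priority update.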
## Results
#### Predators and Prey Environment
In this environment, the prey is captured when one predator moves onto the prey's cell while the other predators occupy the neighboring cells of the prey's location for support.
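That capture rule can be sketched as a simple check (illustrative only; `is_captured` and its arguments are hypothetical names, not the environment's actual API):

```python
def is_captured(prey, predators):
    """Capture rule sketch: the prey is caught when exactly one predator sits
    on the prey's cell and every remaining predator occupies one of the four
    orthogonally neighboring cells (Manhattan distance 1)."""
    dists = [abs(px - prey[0]) + abs(py - prey[1]) for px, py in predators]
    on_prey = dists.count(0)
    supporting = sum(1 for d in dists if d == 1)
    return on_prey == 1 and supporting == len(predators) - 1
```

For example, with three predators, one must stand on the prey's cell and the other two must each be in an adjacent cell for the episode to terminate.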
##### Fixed prey (mode 0)
<img src="/results_predators_prey/videos/prey_mode_0.gif" height="400px" width="400px" >
##### Random prey (mode 1)
<img src="/results_predators_prey/videos/prey_mode_1.gif" height="400px" width="400px" >
##### Random escaping prey (mode 2)
<img src="/results_predators_prey/videos/prey_mode_2.gif" height="400px" width="400px" >
#### Agents and Landmarks Environment
##### 10 agents and 10 landmarks
<img src="/results_agents_landmarks/videos/10_10.gif" height="400px" width="400px" >
##### 16 agents and 16 landmarks
<img src="/results_agents_landmarks/videos/16_16.gif" height="400px" width="400px" >
### Todos
- Write required dependencies and installation steps
- ...
================================================
FILE: agents_landmarks_multiagent.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import numpy as np
import os
import random
import argparse
import pandas as pd
from environments.agents_landmarks.env import agentslandmarks
from dqn_agent import Agent
import glob
ARG_LIST = ['learning_rate', 'optimizer', 'memory_capacity', 'batch_size', 'target_frequency', 'maximum_exploration',
'max_timestep', 'first_step_memory', 'replay_steps', 'number_nodes', 'target_type', 'memory',
'prioritization_scale', 'dueling', 'agents_number', 'grid_size', 'game_mode', 'reward_mode']
def get_name_brain(args, idx):
file_name_str = '_'.join([str(args[x]) for x in ARG_LIST])
return './results_agents_landmarks/weights_files/' + file_name_str + '_' + str(idx) + '.h5'
def get_name_rewards(args):
file_name_str = '_'.join([str(args[x]) for x in ARG_LIST])
return './results_agents_landmarks/rewards_files/' + file_name_str + '.csv'
def get_name_timesteps(args):
file_name_str = '_'.join([str(args[x]) for x in ARG_LIST])
return './results_agents_landmarks/timesteps_files/' + file_name_str + '.csv'
class Environment(object):
def __init__(self, arguments):
current_path = os.path.dirname(__file__) # Where your .py file is located
self.env = agentslandmarks(arguments, current_path)
self.episodes_number = arguments['episode_number']
self.render = arguments['render']
self.recorder = arguments['recorder']
self.max_ts = arguments['max_timestep']
self.test = arguments['test']
self.filling_steps = arguments['first_step_memory']
self.steps_b_updates = arguments['replay_steps']
self.max_random_moves = arguments['max_random_moves']
self.num_agents = arguments['agents_number']
self.num_landmarks = self.num_agents
self.game_mode = arguments['game_mode']
self.grid_size = arguments['grid_size']
def run(self, agents, file1, file2):
total_step = 0
rewards_list = []
timesteps_list = []
max_score = -10000
        for episode_num in range(self.episodes_number):
state = self.env.reset()
if self.render:
self.env.render()
random_moves = random.randint(0, self.max_random_moves)
# create randomness in initial state
            for _ in range(random_moves):
                actions = [4 for _ in range(len(agents))]
state, _, _ = self.env.step(actions)
if self.render:
self.env.render()
# converting list of positions to an array
state = np.array(state)
state = state.ravel()
done = False
reward_all = 0
time_step = 0
while not done and time_step < self.max_ts:
# if self.render:
# self.env.render()
actions = []
for agent in agents:
actions.append(agent.greedy_actor(state))
next_state, reward, done = self.env.step(actions)
# converting list of positions to an array
next_state = np.array(next_state)
next_state = next_state.ravel()
if not self.test:
for agent in agents:
agent.observe((state, actions, reward, next_state, done))
if total_step >= self.filling_steps:
agent.decay_epsilon()
if time_step % self.steps_b_updates == 0:
agent.replay()
agent.update_target_model()
total_step += 1
time_step += 1
state = next_state
reward_all += reward
if self.render:
self.env.render()
rewards_list.append(reward_all)
timesteps_list.append(time_step)
print("Episode {p}, Score: {s}, Final Step: {t}, Goal: {g}".format(p=episode_num, s=reward_all,
t=time_step, g=done))
if self.recorder:
os.system("ffmpeg -r 2 -i ./results_agents_landmarks/snaps/%04d.png -b:v 40000 -minrate 40000 -maxrate 4000k -bufsize 1835k -c:v mjpeg -qscale:v 0 "
+ "./results_agents_landmarks/videos/{a1}_{a2}_{a3}_{a4}.avi".format(a1=self.num_agents,
a2=self.num_landmarks,
a3=self.game_mode,
a4=self.grid_size))
files = glob.glob('./results_agents_landmarks/snaps/*')
for f in files:
os.remove(f)
if not self.test:
if episode_num % 100 == 0:
df = pd.DataFrame(rewards_list, columns=['score'])
df.to_csv(file1)
df = pd.DataFrame(timesteps_list, columns=['steps'])
df.to_csv(file2)
if total_step >= self.filling_steps:
if reward_all > max_score:
for agent in agents:
agent.brain.save_model()
max_score = reward_all
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# DQN Parameters
parser.add_argument('-e', '--episode-number', default=1000000, type=int, help='Number of episodes')
parser.add_argument('-l', '--learning-rate', default=0.00005, type=float, help='Learning rate')
parser.add_argument('-op', '--optimizer', choices=['Adam', 'RMSProp'], default='RMSProp',
help='Optimization method')
parser.add_argument('-m', '--memory-capacity', default=1000000, type=int, help='Memory capacity')
parser.add_argument('-b', '--batch-size', default=64, type=int, help='Batch size')
parser.add_argument('-t', '--target-frequency', default=10000, type=int,
help='Number of steps between the updates of target network')
parser.add_argument('-x', '--maximum-exploration', default=100000, type=int, help='Maximum exploration step')
parser.add_argument('-fsm', '--first-step-memory', default=0, type=float,
help='Number of initial steps for just filling the memory')
parser.add_argument('-rs', '--replay-steps', default=4, type=float, help='Steps between updating the network')
parser.add_argument('-nn', '--number-nodes', default=256, type=int, help='Number of nodes in each layer of NN')
parser.add_argument('-tt', '--target-type', choices=['DQN', 'DDQN'], default='DDQN')
parser.add_argument('-mt', '--memory', choices=['UER', 'PER'], default='PER')
parser.add_argument('-pl', '--prioritization-scale', default=0.5, type=float, help='Scale for prioritization')
    parser.add_argument('-du', '--dueling', action='store_true', help='Enable the dueling architecture')
parser.add_argument('-gn', '--gpu-num', default='2', type=str, help='Number of GPU to use')
    parser.add_argument('-test', '--test', action='store_true', help='Enable the test phase')
# Game Parameters
parser.add_argument('-k', '--agents-number', default=5, type=int, help='The number of agents')
parser.add_argument('-g', '--grid-size', default=10, type=int, help='Grid size')
parser.add_argument('-ts', '--max-timestep', default=100, type=int, help='Maximum number of timesteps per episode')
parser.add_argument('-gm', '--game-mode', choices=[0, 1], type=int, default=1, help='Mode of the game, '
'0: landmarks and agents fixed, '
'1: landmarks and agents random ')
    parser.add_argument('-rw', '--reward-mode', choices=[0, 1, 2], type=int, default=1, help='Mode of the reward, '
                        '0: only terminal rewards, '
                        '1: partial rewards '
                        '(number of unoccupied landmarks), '
                        '2: full rewards '
                        '(sum of distances of agents to landmarks)')
parser.add_argument('-rm', '--max-random-moves', default=0, type=int,
help='Maximum number of random initial moves for the agents')
# Visualization Parameters
    parser.add_argument('-r', '--render', action='store_false', help='Turn off visualization')
    parser.add_argument('-re', '--recorder', action='store_true', help='Store the visualization as a movie')
args = vars(parser.parse_args())
os.environ['CUDA_VISIBLE_DEVICES'] = args['gpu_num']
env = Environment(args)
state_size = env.env.state_size
action_space = env.env.action_space()
all_agents = []
    for b_idx in range(args['agents_number']):
brain_file = get_name_brain(args, b_idx)
all_agents.append(Agent(state_size, action_space, b_idx, brain_file, args))
rewards_file = get_name_rewards(args)
timesteps_file = get_name_timesteps(args)
env.run(all_agents, rewards_file, timesteps_file)
================================================
FILE: brain.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import os
from keras.models import Sequential, Model
from keras.layers import Dense, Lambda, Input, Concatenate
from keras.optimizers import Adam, RMSprop
import tensorflow as tf
from keras import backend as K
HUBER_LOSS_DELTA = 1.0
def huber_loss(y_true, y_predict):
err = y_true - y_predict
cond = K.abs(err) < HUBER_LOSS_DELTA
L2 = 0.5 * K.square(err)
L1 = HUBER_LOSS_DELTA * (K.abs(err) - 0.5 * HUBER_LOSS_DELTA)
loss = tf.where(cond, L2, L1)
return K.mean(loss)
class Brain(object):
def __init__(self, state_size, action_size, brain_name, arguments):
self.state_size = state_size
self.action_size = action_size
self.weight_backup = brain_name
self.batch_size = arguments['batch_size']
self.learning_rate = arguments['learning_rate']
self.test = arguments['test']
self.num_nodes = arguments['number_nodes']
self.dueling = arguments['dueling']
self.optimizer_model = arguments['optimizer']
self.model = self._build_model()
self.model_ = self._build_model()
def _build_model(self):
if self.dueling:
x = Input(shape=(self.state_size,))
# a series of fully connected layer for estimating V(s)
y11 = Dense(self.num_nodes, activation='relu')(x)
y12 = Dense(self.num_nodes, activation='relu')(y11)
y13 = Dense(1, activation="linear")(y12)
# a series of fully connected layer for estimating A(s,a)
y21 = Dense(self.num_nodes, activation='relu')(x)
y22 = Dense(self.num_nodes, activation='relu')(y21)
y23 = Dense(self.action_size, activation="linear")(y22)
w = Concatenate(axis=-1)([y13, y23])
# combine V(s) and A(s,a) to get Q(s,a)
            z = Lambda(lambda a: K.expand_dims(a[:, 0], axis=-1) + a[:, 1:] - K.mean(a[:, 1:], axis=-1, keepdims=True),
                       output_shape=(self.action_size,))(w)
else:
x = Input(shape=(self.state_size,))
# a series of fully connected layer for estimating Q(s,a)
y1 = Dense(self.num_nodes, activation='relu')(x)
y2 = Dense(self.num_nodes, activation='relu')(y1)
z = Dense(self.action_size, activation="linear")(y2)
model = Model(inputs=x, outputs=z)
if self.optimizer_model == 'Adam':
optimizer = Adam(lr=self.learning_rate, clipnorm=1.)
elif self.optimizer_model == 'RMSProp':
optimizer = RMSprop(lr=self.learning_rate, clipnorm=1.)
        else:
            raise ValueError('Invalid optimizer!')
model.compile(loss=huber_loss, optimizer=optimizer)
if self.test:
if not os.path.isfile(self.weight_backup):
                print('Error: weight file {} not found'.format(self.weight_backup))
else:
model.load_weights(self.weight_backup)
return model
def train(self, x, y, sample_weight=None, epochs=1, verbose=0): # x is the input to the network and y is the output
self.model.fit(x, y, batch_size=len(x), sample_weight=sample_weight, epochs=epochs, verbose=verbose)
def predict(self, state, target=False):
if target: # get prediction from target network
return self.model_.predict(state)
else: # get prediction from local network
return self.model.predict(state)
def predict_one_sample(self, state, target=False):
        return self.predict(state.reshape(1, self.state_size), target=target).flatten()
def update_target_model(self):
self.model_.set_weights(self.model.get_weights())
def save_model(self):
self.model.save(self.weight_backup)
================================================
FILE: dqn_agent.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import numpy as np
import random
from brain import Brain
from uniform_experience_replay import Memory as UER
from prioritized_experience_replay import Memory as PER
MAX_EPSILON = 1.0
MIN_EPSILON = 0.01
MIN_BETA = 0.4
MAX_BETA = 1.0
class Agent(object):
epsilon = MAX_EPSILON
beta = MIN_BETA
def __init__(self, state_size, action_size, bee_index, brain_name, arguments):
self.state_size = state_size
self.action_size = action_size
self.bee_index = bee_index
self.learning_rate = arguments['learning_rate']
self.gamma = 0.95
self.brain = Brain(self.state_size, self.action_size, brain_name, arguments)
self.memory_model = arguments['memory']
if self.memory_model == 'UER':
self.memory = UER(arguments['memory_capacity'])
elif self.memory_model == 'PER':
self.memory = PER(arguments['memory_capacity'], arguments['prioritization_scale'])
        else:
            raise ValueError('Invalid memory model!')
self.target_type = arguments['target_type']
self.update_target_frequency = arguments['target_frequency']
self.max_exploration_step = arguments['maximum_exploration']
self.batch_size = arguments['batch_size']
self.step = 0
self.test = arguments['test']
if self.test:
self.epsilon = MIN_EPSILON
def greedy_actor(self, state):
if np.random.rand() <= self.epsilon:
return random.randrange(self.action_size)
else:
return np.argmax(self.brain.predict_one_sample(state))
def find_targets_per(self, batch):
batch_len = len(batch)
states = np.array([o[1][0] for o in batch])
states_ = np.array([o[1][3] for o in batch])
p = self.brain.predict(states)
p_ = self.brain.predict(states_)
pTarget_ = self.brain.predict(states_, target=True)
x = np.zeros((batch_len, self.state_size))
y = np.zeros((batch_len, self.action_size))
errors = np.zeros(batch_len)
for i in range(batch_len):
o = batch[i][1]
s = o[0]
a = o[1][self.bee_index]
r = o[2]
s_ = o[3]
done = o[4]
t = p[i]
old_value = t[a]
if done:
t[a] = r
else:
if self.target_type == 'DDQN':
t[a] = r + self.gamma * pTarget_[i][np.argmax(p_[i])]
elif self.target_type == 'DQN':
t[a] = r + self.gamma * np.amax(pTarget_[i])
else:
print('Invalid type for target network!')
x[i] = s
y[i] = t
errors[i] = np.abs(t[a] - old_value)
return [x, y, errors]
def find_targets_uer(self, batch):
batch_len = len(batch)
states = np.array([o[0] for o in batch])
states_ = np.array([o[3] for o in batch])
p = self.brain.predict(states)
p_ = self.brain.predict(states_)
pTarget_ = self.brain.predict(states_, target=True)
x = np.zeros((batch_len, self.state_size))
y = np.zeros((batch_len, self.action_size))
errors = np.zeros(batch_len)
for i in range(batch_len):
o = batch[i]
s = o[0]
a = o[1][self.bee_index]
r = o[2]
s_ = o[3]
done = o[4]
t = p[i]
old_value = t[a]
if done:
t[a] = r
else:
if self.target_type == 'DDQN':
t[a] = r + self.gamma * pTarget_[i][np.argmax(p_[i])]
elif self.target_type == 'DQN':
t[a] = r + self.gamma * np.amax(pTarget_[i])
else:
print('Invalid type for target network!')
x[i] = s
y[i] = t
errors[i] = np.abs(t[a] - old_value)
return [x, y]
def observe(self, sample):
if self.memory_model == 'UER':
self.memory.remember(sample)
elif self.memory_model == 'PER':
_, _, errors = self.find_targets_per([[0, sample]])
self.memory.remember(sample, errors[0])
else:
print('Invalid memory model!')
def decay_epsilon(self):
# slowly decrease Epsilon based on our experience
self.step += 1
if self.test:
self.epsilon = MIN_EPSILON
self.beta = MAX_BETA
else:
if self.step < self.max_exploration_step:
self.epsilon = MIN_EPSILON + (MAX_EPSILON - MIN_EPSILON) * (self.max_exploration_step - self.step)/self.max_exploration_step
self.beta = MAX_BETA + (MIN_BETA - MAX_BETA) * (self.max_exploration_step - self.step)/self.max_exploration_step
            else:
                self.epsilon = MIN_EPSILON
                self.beta = MAX_BETA
def replay(self):
if self.memory_model == 'UER':
batch = self.memory.sample(self.batch_size)
x, y = self.find_targets_uer(batch)
self.brain.train(x, y)
elif self.memory_model == 'PER':
[batch, batch_indices, batch_priorities] = self.memory.sample(self.batch_size)
x, y, errors = self.find_targets_per(batch)
normalized_batch_priorities = [float(i) / sum(batch_priorities) for i in batch_priorities]
importance_sampling_weights = [(self.batch_size * i) ** (-1 * self.beta)
for i in normalized_batch_priorities]
normalized_importance_sampling_weights = [float(i) / max(importance_sampling_weights)
for i in importance_sampling_weights]
            sample_weights = [errors[i] * normalized_importance_sampling_weights[i] for i in range(len(errors))]
self.brain.train(x, y, np.array(sample_weights))
self.memory.update(batch_indices, errors)
else:
print('Invalid memory model!')
def update_target_model(self):
if self.step % self.update_target_frequency == 0:
self.brain.update_target_model()
================================================
FILE: environments/__init__.py
================================================
================================================
FILE: environments/agents_landmarks/__init__.py
================================================
================================================
FILE: environments/agents_landmarks/env.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import random
import operator
import numpy as np
import pygame
import sys
import os
# Define some colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
GREEN = (0, 255, 0)
RED = (255, 0, 0)
BLUE = (0, 0, 255)
GRAY = (128, 128, 128)
ORANGE = (255, 128, 0)
# This sets the WIDTH and HEIGHT of each grid location
WIDTH = 60
HEIGHT = 60
# This sets the margin between each cell
MARGIN = 1
class agentslandmarks:
UP = 0
DOWN = 1
LEFT = 2
RIGHT = 3
STAY = 4
A = [UP, DOWN, LEFT, RIGHT, STAY]
A_DIFF = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
def __init__(self, args, current_path):
self.game_mode = args['game_mode']
self.reward_mode = args['reward_mode']
self.num_agents = args['agents_number']
self.num_landmarks = self.num_agents
self.grid_size = args['grid_size']
self.state_size = (self.num_agents + self.num_landmarks) * 2
self.agents_positions = []
self.landmarks_positions = []
self.render_flag = args['render']
self.recorder_flag = args['recorder']
# enables visualizer
if self.render_flag:
[self.screen, self.my_font] = self.gui_setup()
self.step_num = 1
resource_path = os.path.join(current_path, 'environments') # The resource folder path
resource_path = os.path.join(resource_path, 'agents_landmarks') # The resource folder path
image_path = os.path.join(resource_path, 'images') # The image folder path
img = pygame.image.load(os.path.join(image_path, 'agent.jpg')).convert()
self.img_agent = pygame.transform.scale(img, (WIDTH, WIDTH))
img = pygame.image.load(os.path.join(image_path, 'landmark.jpg')).convert()
self.img_landmark = pygame.transform.scale(img, (WIDTH, WIDTH))
img = pygame.image.load(os.path.join(image_path, 'agent_landmark.jpg')).convert()
self.img_agent_landmark = pygame.transform.scale(img, (WIDTH, WIDTH))
img = pygame.image.load(os.path.join(image_path, 'agent_agent_landmark.jpg')).convert()
self.img_agent_agent_landmark = pygame.transform.scale(img, (WIDTH, WIDTH))
img = pygame.image.load(os.path.join(image_path, 'agent_agent.jpg')).convert()
self.img_agent_agent = pygame.transform.scale(img, (WIDTH, WIDTH))
if self.recorder_flag:
self.snaps_path = os.path.join(current_path, 'results_agents_landmarks') # The resource folder path
self.snaps_path = os.path.join(self.snaps_path, 'snaps') # The resource folder path
self.cells = []
self.positions_idx = []
# self.agents_collide_flag = args['collide_flag']
# self.penalty_per_collision = args['penalty_collision']
self.num_episodes = 0
self.terminal = False
def set_positions_idx(self):
cells = [(i, j) for i in range(0, self.grid_size) for j in range(0, self.grid_size)]
positions_idx = []
if self.game_mode == 0:
# first enter the positions for the landmarks and then for the agents. If the grid is n*n, then the
# positions are
# 0 1 2 ... n-1
# n n+1 n+2 ... 2n-1
# 2n 2n+1 2n+2 ... 3n-1
# . . . . .
# . . . . .
# . . . . .
            #  (n-1)*n  (n-1)*n+1  (n-1)*n+2 ... n*n-1
# , e.g.,
# positions_idx = [0, 6, 23, 24] where 0 and 6 are the positions of landmarks and 23 and 24 are positions
# of agents
positions_idx = []
if self.game_mode == 1:
positions_idx = np.random.choice(len(cells), size=self.num_landmarks + self.num_agents,
replace=False)
return [cells, positions_idx]
def reset(self): # initialize the world
self.terminal = False
[self.cells, self.positions_idx] = self.set_positions_idx()
        # separate the generated position indices for landmarks and agents
landmarks_positions_idx = self.positions_idx[0:self.num_landmarks]
agents_positions_idx = self.positions_idx[self.num_landmarks:self.num_landmarks + self.num_agents]
# map generated position indices to positions
self.landmarks_positions = [self.cells[pos] for pos in landmarks_positions_idx]
self.agents_positions = [self.cells[pos] for pos in agents_positions_idx]
initial_state = list(sum(self.landmarks_positions + self.agents_positions, ()))
return initial_state
def step(self, agents_actions):
# update the position of agents
self.agents_positions = self.update_positions(self.agents_positions, agents_actions)
if self.reward_mode == 0:
binary_cover_list = []
for landmark in self.landmarks_positions:
distances = [np.linalg.norm(np.array(landmark) - np.array(agent_pos), 1)
for agent_pos in self.agents_positions]
min_dist = min(distances)
if min_dist == 0:
                    binary_cover_list.append(0)
else:
binary_cover_list.append(1)
# check the terminal case
if sum(binary_cover_list) == 0:
reward = 0
self.terminal = True
else:
reward = -1
self.terminal = False
if self.reward_mode == 1:
binary_cover_list = []
for landmark in self.landmarks_positions:
distances = [np.linalg.norm(np.array(landmark) - np.array(agent_pos), 1)
for agent_pos in self.agents_positions]
min_dist = min(distances)
if min_dist == 0:
binary_cover_list.append(0)
else:
binary_cover_list.append(1)
reward = -1 * sum(binary_cover_list)
# check the terminal case
if reward == 0:
self.terminal = True
else:
self.terminal = False
if self.reward_mode == 2:
# calculate the sum of minimum distances of agents to landmarks
reward = 0
for landmark in self.landmarks_positions:
distances = [np.linalg.norm(np.array(landmark) - np.array(agent_pos), 1)
for agent_pos in self.agents_positions]
reward -= min(distances)
# check the terminal case
if reward == 0:
self.terminal = True
new_state = list(sum(self.landmarks_positions + self.agents_positions, ()))
return [new_state, reward, self.terminal]
def update_positions(self, pos_list, act_list):
positions_action_applied = []
        for idx in range(len(pos_list)):
            if act_list[idx] != 4:
                pos_act_applied = list(map(operator.add, pos_list[idx], self.A_DIFF[act_list[idx]]))
                # check to make sure the new position is inside the grid
                for i in range(2):
if pos_act_applied[i] < 0:
pos_act_applied[i] = 0
if pos_act_applied[i] >= self.grid_size:
pos_act_applied[i] = self.grid_size - 1
positions_action_applied.append(tuple(pos_act_applied))
else:
positions_action_applied.append(pos_list[idx])
final_positions = []
        for pos_idx in range(len(pos_list)):
if positions_action_applied[pos_idx] == pos_list[pos_idx]:
final_positions.append(pos_list[pos_idx])
elif positions_action_applied[pos_idx] not in pos_list and positions_action_applied[
pos_idx] not in positions_action_applied[
0:pos_idx] + positions_action_applied[
pos_idx + 1:]:
final_positions.append(positions_action_applied[pos_idx])
else:
final_positions.append(pos_list[pos_idx])
return final_positions
def action_space(self):
return len(self.A)
def render(self):
pygame.time.delay(500)
pygame.display.flip()
for event in pygame.event.get():
if event.type == pygame.QUIT:
sys.exit()
self.screen.fill(BLACK)
text = self.my_font.render("Step: {0}".format(self.step_num), 1, WHITE)
self.screen.blit(text, (5, 15))
for row in range(self.grid_size):
for column in range(self.grid_size):
pos = (row, column)
frequency = self.find_frequency(pos, self.agents_positions)
if pos in self.landmarks_positions and frequency >= 1:
if frequency == 1:
self.screen.blit(self.img_agent_landmark,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
else:
self.screen.blit(self.img_agent_agent_landmark,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
elif pos in self.landmarks_positions:
self.screen.blit(self.img_landmark,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
elif frequency >= 1:
if frequency == 1:
self.screen.blit(self.img_agent,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
elif frequency > 1:
self.screen.blit(self.img_agent_agent,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
else:
print('Error!')
else:
pygame.draw.rect(self.screen, WHITE,
[(MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50, WIDTH,
HEIGHT])
if self.recorder_flag:
file_name = "%04d.png" % self.step_num
pygame.image.save(self.screen, os.path.join(self.snaps_path, file_name))
if not self.terminal:
self.step_num += 1
def gui_setup(self):
# Initialize pygame
pygame.init()
# Set the HEIGHT and WIDTH of the screen
board_size_x = (WIDTH + MARGIN) * self.grid_size
board_size_y = (HEIGHT + MARGIN) * self.grid_size
window_size_x = int(board_size_x)
window_size_y = int(board_size_y * 1.2)
window_size = [window_size_x, window_size_y]
screen = pygame.display.set_mode(window_size)
# Set title of screen
pygame.display.set_caption("Agents-and-Landmarks Game")
myfont = pygame.font.SysFont("monospace", 30)
return [screen, myfont]
def find_frequency(self, a, items):
freq = 0
for item in items:
if item == a:
freq += 1
return freq
================================================
FILE: environments/predators_prey/__init__.py
================================================
================================================
FILE: environments/predators_prey/env.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import random
import operator
import numpy as np
import pygame
import sys
import os
# Define some colors
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
GREEN = (0, 255, 0)
RED = (255, 0, 0)
BLUE = (0, 0, 255)
GRAY = (128, 128, 128)
ORANGE = (255, 128, 0)
# This sets the WIDTH and HEIGHT of each grid location
WIDTH = 60
HEIGHT = 60
# This sets the margin between each cell
MARGIN = 1
class PredatorsPrey(object):
UP = 0
DOWN = 1
LEFT = 2
RIGHT = 3
STAY = 4
A = [UP, DOWN, LEFT, RIGHT, STAY]
A_DIFF = [(-1, 0), (1, 0), (0, -1), (0, 1), (0,0)]
def __init__(self, args, current_path):
self.num_predators = args['agents_number']
self.num_preys = 1
self.preys_mode = args['preys_mode']
self.num_walls = 0
self.grid_size = args['grid_size']
self.game_mode = args['game_mode']
self.reward_mode = args['reward_mode']
self.state_size = (self.num_preys + self.num_predators + self.num_walls)*2
self.predators_positions = []
self.preys_positions = []
self.walls_positions = []
self.render_flag = args['render']
self.recorder_flag = args['recorder']
# enables visualizer
if self.render_flag:
[self.screen, self.my_font] = self.gui_setup()
self.step_num = 1
resource_path = os.path.join(current_path, 'environments') # The resource folder path
resource_path = os.path.join(resource_path, 'predators_prey') # The resource folder path
image_path = os.path.join(resource_path, 'images') # The image folder path
img = pygame.image.load(os.path.join(image_path, 'predator_prey.jpg')).convert()
self.img_predator_prey = pygame.transform.scale(img, (WIDTH, WIDTH))
img = pygame.image.load(os.path.join(image_path, 'predator.jpg')).convert()
self.img_predator = pygame.transform.scale(img, (WIDTH, WIDTH))
img = pygame.image.load(os.path.join(image_path, 'prey.jpg')).convert()
self.img_prey = pygame.transform.scale(img, (WIDTH, WIDTH))
if self.recorder_flag:
self.snaps_path = os.path.join(current_path, 'results_predators_prey') # The resource folder path
self.snaps_path = os.path.join(self.snaps_path, 'snaps') # The resource folder path
self.cells = []
self.agents_positions_idx = []
self.num_episodes = 0
self.terminal = False
def set_positions_idx(self):
cells = [(i, j) for i in range(0, self.grid_size) for j in range(0, self.grid_size)]
positions_idx = []
if self.game_mode == 0:
# first enter the positions for the agents (predators) and the single prey. If the grid is n*n,
# then the positions are
# 0 1 2 ... n-1
# n n+1 n+2 ... 2n-1
# 2n 2n+1 2n+2 ... 3n-1
# . . . . .
# . . . . .
# . . . . .
# (n-1)*n (n-1)*n+1 (n-1)*n+2 ... n*n-1
# , e.g.,
# positions_idx = [0, 6, 23, 24], where 0, 6, and 23 are the positions of the agents and 24 is the
# position of the prey
positions_idx = []
if self.game_mode == 1:
positions_idx = np.random.choice(len(cells), size=self.num_predators + self.num_preys, replace=False)
return [cells, positions_idx]
def reset(self): # initialize the world
self.terminal = False
self.num_catches = 0
[self.cells, self.agents_positions_idx] = self.set_positions_idx()
# separate the generated position indices for walls, predators, and preys
walls_positions_idx = self.agents_positions_idx[0:self.num_walls]
predators_positions_idx = self.agents_positions_idx[self.num_walls:self.num_walls + self.num_predators]
preys_positions_idx = self.agents_positions_idx[self.num_walls + self.num_predators:]
# map generated position indices to positions
self.walls_positions = [self.cells[pos] for pos in walls_positions_idx]
self.predators_positions = [self.cells[pos] for pos in predators_positions_idx]
self.preys_positions = [self.cells[pos] for pos in preys_positions_idx]
initial_state = list(sum(self.walls_positions + self.predators_positions + self.preys_positions, ()))
return initial_state
def fix_prey(self):
return self.STAY  # the fixed prey never moves
def actor_prey_random(self):
return random.randrange(self.action_space())
def actor_prey_random_escape(self, prey_index):
prey_pos = self.preys_positions[prey_index]
[_, action_to_neighbors] = self.empty_neighbor_finder(prey_pos)
return random.choice(action_to_neighbors)
def neighbor_finder(self, pos):
neighbors_pos = []
action_to_neighbor = []
pos_repeat = [pos for _ in xrange(4)]
for idx in xrange(4):
neighbor_pos = map(operator.add, pos_repeat[idx], self.A_DIFF[idx])
if neighbor_pos[0] in range(0,self.grid_size) and neighbor_pos[1] in range(0,self.grid_size)\
and neighbor_pos not in self.walls_positions:
neighbors_pos.append(neighbor_pos)
action_to_neighbor.append(idx)
neighbors_pos.append(pos)
action_to_neighbor.append(4)
return [neighbors_pos, action_to_neighbor]
def empty_neighbor_finder(self, pos):
neighbors_pos = []
action_to_neighbor = []
pos_repeat = [pos for _ in xrange(4)]
for idx in xrange(4):
neighbor_pos = map(operator.add, pos_repeat[idx], self.A_DIFF[idx])
if neighbor_pos[0] in range(0,self.grid_size) and neighbor_pos[1] in range(0, self.grid_size)\
and neighbor_pos not in self.walls_positions:
neighbors_pos.append(neighbor_pos)
action_to_neighbor.append(idx)
neighbors_pos.append(pos)
action_to_neighbor.append(4)
empty_neighbors_pos = []
action_to_empty_neighbor = []
for idx in xrange(len(neighbors_pos)):
if tuple(neighbors_pos[idx]) not in self.predators_positions:
empty_neighbors_pos.append(neighbors_pos[idx])
action_to_empty_neighbor.append(action_to_neighbor[idx])
return [empty_neighbors_pos, action_to_empty_neighbor]
def step(self, predators_actions):
# update the position of preys
preys_actions = []
for prey_idx in xrange(len(self.preys_positions)):
if self.preys_mode == 0:
preys_actions.append(self.fix_prey())
elif self.preys_mode == 1:
preys_actions.append(self.actor_prey_random_escape(prey_idx))
elif self.preys_mode == 2:
preys_actions.append(self.actor_prey_random())
else:
print('Invalid mode for the prey')
self.preys_positions = self.update_positions(self.preys_positions, preys_actions)
# update the position of predators
self.predators_positions = self.update_positions(self.predators_positions, predators_actions)
# check whether any predator catches any prey
[reward, self.terminal] = self.check_catching()
new_state = list(sum(self.walls_positions + self.predators_positions + self.preys_positions,()))
return [new_state, reward, self.terminal]
def check_catching(self):
new_preys_position = []
terminal_flag = False
# checks whether the position of any prey is the same as the position of any predator
if self.reward_mode == 0:
for prey_pos in self.preys_positions:
new_preys_position.append(prey_pos)
distances = 0
for predator in self.predators_positions:
distances += np.linalg.norm(np.array(predator) - np.array(self.preys_positions[0]), 1)
[prey_empty_neighbours, _] = self.empty_neighbor_finder(self.preys_positions[0])
# check the terminal case
if int(distances) == self.num_predators - 1 or len(prey_empty_neighbours) == 0:
terminal_flag = True
reward = 0
else:
reward = -1
elif self.reward_mode == 1:
for prey_pos in self.preys_positions:
new_preys_position.append(prey_pos)
distances = 0
for predator in self.predators_positions:
distances += np.linalg.norm(np.array(predator) - np.array(self.preys_positions[0]), 1)
[prey_empty_neighbours, _] = self.empty_neighbor_finder(self.preys_positions[0])
# check the terminal case
if int(distances) == self.num_predators - 1 or len(prey_empty_neighbours) == 0:
terminal_flag = True
reward = 0
else:
reward = -1 * distances
else:
print('Invalid reward mode')
self.preys_positions = new_preys_position
return [reward, terminal_flag]
def update_positions(self, pos_list, act_list):
positions_action_applied = []
for idx in xrange(len(pos_list)):
if act_list[idx] != 4:
pos_act_applied = map(operator.add, pos_list[idx], self.A_DIFF[act_list[idx]])
# checks to make sure the new position is inside the grid
for i in xrange(0, 2):
if pos_act_applied[i] < 0:
pos_act_applied[i] = 0
if pos_act_applied[i] >= self.grid_size:
pos_act_applied[i] = self.grid_size - 1
positions_action_applied.append(tuple(pos_act_applied))
else:
positions_action_applied.append(pos_list[idx])
final_positions = []
for pos_idx in xrange(len(pos_list)):
if positions_action_applied[pos_idx] == pos_list[pos_idx]:
final_positions.append(pos_list[pos_idx])
elif (positions_action_applied[pos_idx] not in pos_list and
positions_action_applied[pos_idx] not in
positions_action_applied[0:pos_idx] + positions_action_applied[pos_idx + 1:]):
final_positions.append(positions_action_applied[pos_idx])
else:
final_positions.append(pos_list[pos_idx])
return final_positions
def action_space(self):
return len(self.A)
def render(self):
pygame.time.wait(500)
pygame.display.flip()
for event in pygame.event.get():
if event.type == pygame.QUIT:
sys.exit()
self.screen.fill(BLACK)
text = self.my_font.render("Step: {0}".format(self.step_num), 1, WHITE)
self.screen.blit(text, (5, 15))
# for row in range(self.grid_size):
# for column in range(self.grid_size):
# pos = (row, column)
# if pos in self.predators_positions and pos in self.preys_positions:
# color = ORANGE
# elif pos in self.predators_positions:
# color = BLUE
# elif pos in self.preys_positions:
# color = RED
# else:
# color = WHITE
# pygame.draw.rect(self.screen, color,
# [(MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50, WIDTH,
# HEIGHT])
for row in range(self.grid_size):
for column in range(self.grid_size):
pos = (row, column)
if pos in self.predators_positions and pos in self.preys_positions:
self.screen.blit(self.img_predator_prey,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
elif pos in self.predators_positions:
self.screen.blit(self.img_predator,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
elif pos in self.preys_positions:
self.screen.blit(self.img_prey,
((MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50))
else:
color = WHITE
pygame.draw.rect(self.screen, color,
[(MARGIN + WIDTH) * column + MARGIN, (MARGIN + HEIGHT) * row + MARGIN + 50, WIDTH,
HEIGHT])
if self.recorder_flag:
file_name = "%04d.png" % self.step_num
pygame.image.save(self.screen, os.path.join(self.snaps_path, file_name))
if not self.terminal:
self.step_num += 1
def gui_setup(self):
# Initialize pygame
pygame.init()
# Set the HEIGHT and WIDTH of the screen
board_size_x = (WIDTH + MARGIN) * self.grid_size
board_size_y = (HEIGHT + MARGIN) * self.grid_size
window_size_x = int(board_size_x*1.01)
window_size_y = int(board_size_y * 1.2)
window_size = [window_size_x, window_size_y]
screen = pygame.display.set_mode(window_size, 0, 32)
# Set title of screen
pygame.display.set_caption("Predators-and-Prey Game")
myfont = pygame.font.SysFont("monospace", 30)
return [screen, myfont]
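The collision logic in `update_positions` above can be checked in isolation. The following is a hypothetical standalone sketch of the same rule (helper names like `move` are illustrative, not part of the repository): an agent keeps its proposed cell only if no agent occupied that cell at the start of the step and no other agent proposes it this step; otherwise the move is reverted.

```python
A_DIFF = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]  # UP, DOWN, LEFT, RIGHT, STAY

def move(pos, action, grid_size):
    # Apply the action and clamp the result to the grid, as the env does.
    row = min(max(pos[0] + A_DIFF[action][0], 0), grid_size - 1)
    col = min(max(pos[1] + A_DIFF[action][1], 0), grid_size - 1)
    return (row, col)

def update_positions(pos_list, act_list, grid_size):
    proposed = [move(p, a, grid_size) for p, a in zip(pos_list, act_list)]
    final = []
    for i, new_pos in enumerate(proposed):
        others = proposed[:i] + proposed[i + 1:]
        if new_pos == pos_list[i]:
            final.append(pos_list[i])      # stayed (or was clamped) in place
        elif new_pos not in pos_list and new_pos not in others:
            final.append(new_pos)          # move accepted
        else:
            final.append(pos_list[i])      # collision: move reverted
    return final
```

For example, two agents heading into the same cell are both reverted, while a move into a free, uncontested cell goes through.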
================================================
FILE: predators_prey_multiagent.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import numpy as np
import os
import random
import argparse
import pandas as pd
from environments.predators_prey.env import PredatorsPrey
from dqn_agent import Agent
import glob
ARG_LIST = ['learning_rate', 'optimizer', 'memory_capacity', 'batch_size', 'target_frequency', 'maximum_exploration',
'max_timestep', 'first_step_memory', 'replay_steps', 'number_nodes', 'target_type', 'memory',
'prioritization_scale', 'dueling', 'agents_number', 'grid_size', 'game_mode', 'reward_mode']
def get_name_brain(args, idx):
file_name_str = '_'.join([str(args[x]) for x in ARG_LIST])
return './results_predators_prey/weights_files/' + file_name_str + '_' + str(idx) + '.h5'
def get_name_rewards(args):
file_name_str = '_'.join([str(args[x]) for x in ARG_LIST])
return './results_predators_prey/rewards_files/' + file_name_str + '.csv'
def get_name_timesteps(args):
file_name_str = '_'.join([str(args[x]) for x in ARG_LIST])
return './results_predators_prey/timesteps_files/' + file_name_str + '.csv'
class Environment(object):
def __init__(self, arguments):
current_path = os.path.dirname(__file__) # Where your .py file is located
self.env = PredatorsPrey(arguments, current_path)
self.episodes_number = arguments['episode_number']
self.render = arguments['render']
self.recorder = arguments['recorder']
self.max_ts = arguments['max_timestep']
self.test = arguments['test']
self.filling_steps = arguments['first_step_memory']
self.steps_b_updates = arguments['replay_steps']
self.max_random_moves = arguments['max_random_moves']
self.num_predators = arguments['agents_number']
self.num_preys = 1
self.preys_mode = arguments['preys_mode']
self.game_mode = arguments['game_mode']
self.grid_size = arguments['grid_size']
def run(self, agents, file1, file2):
total_step = 0
rewards_list = []
timesteps_list = []
max_score = -10000
for episode_num in xrange(self.episodes_number):
state = self.env.reset()
if self.render:
self.env.render()
random_moves = random.randint(0, self.max_random_moves)
# create randomness in initial state
for _ in xrange(random_moves):
actions = [4 for _ in xrange(len(agents))]
state, _, _ = self.env.step(actions)
if self.render:
self.env.render()
# converting list of positions to an array
state = np.array(state)
state = state.ravel()
done = False
reward_all = 0
time_step = 0
while not done and time_step < self.max_ts:
# if self.render:
# self.env.render()
actions = []
for agent in agents:
actions.append(agent.greedy_actor(state))
next_state, reward, done = self.env.step(actions)
# converting list of positions to an array
next_state = np.array(next_state)
next_state = next_state.ravel()
if not self.test:
for agent in agents:
agent.observe((state, actions, reward, next_state, done))
if total_step >= self.filling_steps:
agent.decay_epsilon()
if time_step % self.steps_b_updates == 0:
agent.replay()
agent.update_target_model()
total_step += 1
time_step += 1
state = next_state
reward_all += reward
if self.render:
self.env.render()
rewards_list.append(reward_all)
timesteps_list.append(time_step)
print("Episode {p}, Score: {s}, Final Step: {t}, Goal: {g}".format(p=episode_num, s=reward_all,
t=time_step, g=done))
if self.recorder:
os.system("ffmpeg -r 4 -i ./results_predators_prey/snaps/%04d.png -b:v 40000 -minrate 40000 -maxrate 4000k -bufsize 1835k -c:v mjpeg -qscale:v 0 "
+ "./results_predators_prey/videos/{a1}_{a2}_{a3}_{a4}_{a5}.avi".format(a1=self.num_predators,
a2=self.num_preys,
a3=self.preys_mode,
a4=self.game_mode,
a5=self.grid_size))
files = glob.glob('./results_predators_prey/snaps/*')
for f in files:
os.remove(f)
if not self.test:
if episode_num % 100 == 0:
df = pd.DataFrame(rewards_list, columns=['score'])
df.to_csv(file1)
df = pd.DataFrame(timesteps_list, columns=['steps'])
df.to_csv(file2)
if total_step >= self.filling_steps:
if reward_all > max_score:
for agent in agents:
agent.brain.save_model()
max_score = reward_all
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# DQN Parameters
parser.add_argument('-e', '--episode-number', default=1, type=int, help='Number of episodes')
parser.add_argument('-l', '--learning-rate', default=0.00005, type=float, help='Learning rate')
parser.add_argument('-op', '--optimizer', choices=['Adam', 'RMSProp'], default='RMSProp',
help='Optimization method')
parser.add_argument('-m', '--memory-capacity', default=1000000, type=int, help='Memory capacity')
parser.add_argument('-b', '--batch-size', default=64, type=int, help='Batch size')
parser.add_argument('-t', '--target-frequency', default=10000, type=int,
help='Number of steps between the updates of target network')
parser.add_argument('-x', '--maximum-exploration', default=100000, type=int, help='Maximum exploration step')
parser.add_argument('-fsm', '--first-step-memory', default=0, type=float,
help='Number of initial steps for just filling the memory')
parser.add_argument('-rs', '--replay-steps', default=4, type=float, help='Steps between updating the network')
parser.add_argument('-nn', '--number-nodes', default=256, type=int, help='Number of nodes in each layer of NN')
parser.add_argument('-tt', '--target-type', choices=['DQN', 'DDQN'], default='DQN')
parser.add_argument('-mt', '--memory', choices=['UER', 'PER'], default='PER')
parser.add_argument('-pl', '--prioritization-scale', default=0.5, type=float, help='Scale for prioritization')
parser.add_argument('-du', '--dueling', action='store_true', help='Enable the dueling architecture')
parser.add_argument('-gn', '--gpu-num', default='2', type=str, help='Number of GPU to use')
parser.add_argument('-test', '--test', action='store_true', help='Run the test phase (no training)')
# Game Parameters
parser.add_argument('-k', '--agents-number', default=3, type=int, help='The number of agents')
parser.add_argument('-g', '--grid-size', default=5, type=int, help='Grid size')
parser.add_argument('-ts', '--max-timestep', default=100, type=int, help='Maximum number of timesteps per episode')
parser.add_argument('-gm', '--game-mode', choices=[0, 1], type=int, default=1, help='Mode of the game, '
'0: prey and agents (predators) '
'are fixed, '
'1: prey and agents (predators) '
'are random')
parser.add_argument('-rw', '--reward-mode', choices=[0, 1], type=int, default=1, help='Mode of the reward, '
'0: only terminal rewards, '
'1: full rewards '
'(sum of distances of agents '
'to the prey)')
parser.add_argument('-rm', '--max-random-moves', default=0, type=int,
help='Maximum number of random initial moves for agents')
parser.add_argument('-evm', '--preys-mode', choices=[0, 1, 2], type=int, default=2, help='Mode of the prey: '
'0: fixed, '
'1: random escape, '
'2: random')
# Visualization Parameters
parser.add_argument('-r', '--render', action='store_false', help='Rendering is on by default; pass this flag to turn it off')
parser.add_argument('-re', '--recorder', action='store_true', help='Store the visualization as a movie')
args = vars(parser.parse_args())
os.environ['CUDA_VISIBLE_DEVICES'] = args['gpu_num']
env = Environment(args)
state_size = env.env.state_size
action_space = env.env.action_space()
all_agents = []
for b_idx in xrange(args['agents_number']):
brain_file = get_name_brain(args, b_idx)
all_agents.append(Agent(state_size, action_space, b_idx, brain_file, args))
rewards_file = get_name_rewards(args)
timesteps_file = get_name_timesteps(args)
env.run(all_agents, rewards_file, timesteps_file)
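The result-file names under results_predators_prey/ are simply the ARG_LIST values joined with underscores, as the get_name_* helpers show. As a sanity check, joining the argparse defaults above reproduces the stem of the shipped CSV/weights files (note that argparse applies type= only to command-line strings, so the integer defaults 0 and 4 keep their int repr):

```python
# Reconstruct the result-file stem from the argparse defaults above.
defaults = {
    'learning_rate': 0.00005, 'optimizer': 'RMSProp', 'memory_capacity': 1000000,
    'batch_size': 64, 'target_frequency': 10000, 'maximum_exploration': 100000,
    'max_timestep': 100, 'first_step_memory': 0, 'replay_steps': 4,
    'number_nodes': 256, 'target_type': 'DQN', 'memory': 'PER',
    'prioritization_scale': 0.5, 'dueling': False, 'agents_number': 3,
    'grid_size': 5, 'game_mode': 1, 'reward_mode': 1,
}
ARG_LIST = ['learning_rate', 'optimizer', 'memory_capacity', 'batch_size',
            'target_frequency', 'maximum_exploration', 'max_timestep',
            'first_step_memory', 'replay_steps', 'number_nodes', 'target_type',
            'memory', 'prioritization_scale', 'dueling', 'agents_number',
            'grid_size', 'game_mode', 'reward_mode']
stem = '_'.join(str(defaults[k]) for k in ARG_LIST)
# str(0.00005) is '5e-05', which matches the shipped file names.
```

So a weights file like `..._DQN_PER_0.5_False_3_5_1_1_0.h5` is this stem plus the agent index.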
================================================
FILE: prioritized_experience_replay.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import random
from sum_tree import SumTree as ST
class Memory(object):
e = 0.05
def __init__(self, capacity, pr_scale):
self.capacity = capacity
self.memory = ST(self.capacity)
self.pr_scale = pr_scale
self.max_pr = 0
def get_priority(self, error):
return (error + self.e) ** self.pr_scale
def remember(self, sample, error):
p = self.get_priority(error)
self.max_pr = max(self.max_pr, p)  # track the largest priority seen so far
self.memory.add(self.max_pr, sample)
def sample(self, n):
sample_batch = []
sample_batch_indices = []
sample_batch_priorities = []
segment_length = self.memory.total() / n
for i in xrange(n):
left = segment_length * i
right = segment_length * (i + 1)
s = random.uniform(left, right)
idx, pr, data = self.memory.get(s)
sample_batch.append((idx, data))
sample_batch_indices.append(idx)
sample_batch_priorities.append(pr)
return [sample_batch, sample_batch_indices, sample_batch_priorities]
def update(self, batch_indices, errors):
for i in xrange(len(batch_indices)):
p = self.get_priority(errors[i])
self.memory.update(batch_indices[i], p)
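Memory.sample above splits the total priority mass into n equal segments and draws one transition per segment, so higher-priority transitions are drawn more often. A self-contained sketch of the same scheme over a plain priority list (using a cumulative-sum search instead of the sum tree; names are illustrative):

```python
import bisect
import itertools
import random

def prioritized_sample(priorities, n, rng=random):
    """Draw n indices: one uniform draw per equal segment of the total mass."""
    cumulative = list(itertools.accumulate(priorities))
    total = cumulative[-1]
    segment = total / n
    picks = []
    for i in range(n):
        s = rng.uniform(segment * i, segment * (i + 1))
        # First index whose cumulative priority covers s.
        picks.append(bisect.bisect_left(cumulative, s))
    return picks

# A single high-priority transition dominates the draws.
idx = prioritized_sample([0.05, 0.05, 10.0, 0.05], 4)
```

With these toy priorities the middle segments always land on index 2, which is exactly the bias PER relies on.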
================================================
FILE: results_predators_prey/rewards_files/5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1.csv
================================================
,score
0,-984.0
================================================
FILE: results_predators_prey/timesteps_files/5e-05_RMSProp_1000000_64_10000_100000_100_0_4_256_DQN_PER_0.5_False_3_5_1_1.csv
================================================
,steps
0,100
================================================
FILE: sum_tree.py
================================================
import numpy
class SumTree(object):
def __init__(self, capacity):
self.write = 0
self.capacity = capacity
self.tree = numpy.zeros(2*capacity - 1)
self.data = numpy.zeros(capacity, dtype=object)
def _propagate(self, idx, change):
parent = (idx - 1) // 2
self.tree[parent] += change
if parent != 0:
self._propagate(parent, change)
def _retrieve(self, idx, s):
left = 2 * idx + 1
right = left + 1
if left >= len(self.tree):
return idx
if s <= self.tree[left]:
return self._retrieve(left, s)
else:
return self._retrieve(right, s-self.tree[left])
def total(self):
return self.tree[0]
def add(self, p, data):
idx = self.write + self.capacity - 1
self.data[self.write] = data
self.update(idx, p)
self.write += 1
if self.write >= self.capacity:
self.write = 0
def update(self, idx, p):
change = p - self.tree[idx]
self.tree[idx] = p
self._propagate(idx, change)
# def get_real_idx(self, data_idx):
#
# tempIdx = data_idx - self.write
# if tempIdx >= 0:
# return tempIdx
# else:
# return tempIdx + self.capacity
def get(self, s):
idx = self._retrieve(0, s)
dataIdx = idx - self.capacity + 1
# realIdx = self.get_real_idx(dataIdx)
return idx, self.tree[idx], self.data[dataIdx]
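SumTree keeps the binary tree in a flat array of length 2*capacity - 1: internal nodes occupy indices 0..capacity-2 and the leaves occupy the last capacity slots. The index arithmetic used by add, update, and get can be checked by hand with a small standalone sketch (not importing the class above):

```python
capacity = 4
# Flat layout: indices 0..2 are internal nodes, 3..6 are leaves.
tree = [0.0] * (2 * capacity - 1)

def update(idx, p):
    # Write the leaf, then propagate the change up through every ancestor.
    change = p - tree[idx]
    tree[idx] = p
    while idx != 0:
        idx = (idx - 1) // 2
        tree[idx] += change

def retrieve(idx, s):
    # Walk down: go left if s fits in the left subtree, else subtract and go right.
    left = 2 * idx + 1
    if left >= len(tree):
        return idx
    if s <= tree[left]:
        return retrieve(left, s)
    return retrieve(left + 1, s - tree[left])

for write, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    update(write + capacity - 1, p)   # leaf index, as in SumTree.add

# Priorities 1, 2, 3, 4 give cumulative ranges (0,1], (1,3], (3,6], (6,10],
# so s = 3.5 lands on the third leaf.
leaf = retrieve(0, 3.5)
data_idx = leaf - capacity + 1        # as in SumTree.get
```

The root (tree[0]) holds the total mass, which is what SumTree.total returns.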
================================================
FILE: uniform_experience_replay.py
================================================
"""
Created on Wednesday Jan 16 2019
@author: Seyed Mohammad Asghari
@github: https://github.com/s3yyy3d-m
"""
import random
from collections import deque
class Memory(object):
def __init__(self, capacity):
self.capacity = capacity
self.memory = deque(maxlen=self.capacity)
def remember(self, sample):
self.memory.append(sample)
def sample(self, n):
n = min(n, len(self.memory))
sample_batch = random.sample(self.memory, n)
return sample_batch
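The uniform memory above is a bounded deque, so once capacity is reached the oldest transition is evicted automatically on append. A quick standalone check of that behavior:

```python
import random
from collections import deque

memory = deque(maxlen=3)              # capacity 3, as in Memory.__init__
for sample in ['t0', 't1', 't2', 't3']:
    memory.append(sample)             # Memory.remember

# 't0' was evicted when 't3' arrived; sampling is uniform over what remains.
batch = random.sample(memory, min(2, len(memory)))  # Memory.sample
```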
SYMBOL INDEX (73 symbols across 9 files)
FILE: agents_landmarks_multiagent.py
function get_name_brain (line 22) | def get_name_brain(args, idx):
function get_name_rewards (line 29) | def get_name_rewards(args):
function get_name_timesteps (line 36) | def get_name_timesteps(args):
class Environment (line 43) | class Environment(object):
method __init__ (line 45) | def __init__(self, arguments):
method run (line 62) | def run(self, agents, file1, file2):
FILE: brain.py
function huber_loss (line 18) | def huber_loss(y_true, y_predict):
class Brain (line 29) | class Brain(object):
method __init__ (line 31) | def __init__(self, state_size, action_size, brain_name, arguments):
method _build_model (line 44) | def _build_model(self):
method train (line 94) | def train(self, x, y, sample_weight=None, epochs=1, verbose=0): # x i...
method predict (line 98) | def predict(self, state, target=False):
method predict_one_sample (line 104) | def predict_one_sample(self, state, target=False):
method update_target_model (line 107) | def update_target_model(self):
method save_model (line 110) | def save_model(self):
FILE: dqn_agent.py
class Agent (line 22) | class Agent(object):
method __init__ (line 27) | def __init__(self, state_size, action_size, bee_index, brain_name, arg...
method greedy_actor (line 54) | def greedy_actor(self, state):
method find_targets_per (line 60) | def find_targets_per(self, batch):
method find_targets_uer (line 100) | def find_targets_uer(self, batch):
method observe (line 140) | def observe(self, sample):
method decay_epsilon (line 152) | def decay_epsilon(self):
method replay (line 166) | def replay(self):
method update_target_model (line 191) | def update_target_model(self):
FILE: environments/agents_landmarks/env.py
class agentslandmarks (line 32) | class agentslandmarks:
method __init__ (line 41) | def __init__(self, args, current_path):
method set_positions_idx (line 85) | def set_positions_idx(self):
method reset (line 112) | def reset(self): # initialize the world
method step (line 129) | def step(self, agents_actions):
method update_positions (line 196) | def update_positions(self, pos_list, act_list):
method action_space (line 226) | def action_space(self):
method render (line 229) | def render(self):
method gui_setup (line 281) | def gui_setup(self):
method find_frequency (line 303) | def find_frequency(self, a, items):
FILE: environments/predators_prey/env.py
class PredatorsPrey (line 32) | class PredatorsPrey(object):
method __init__ (line 42) | def __init__(self, args, current_path):
method set_positions_idx (line 85) | def set_positions_idx(self):
method reset (line 111) | def reset(self): # initialize the world
method fix_prey (line 131) | def fix_prey(self):
method actor_prey_random (line 134) | def actor_prey_random(self):
method actor_prey_random_escape (line 137) | def actor_prey_random_escape(self, prey_index):
method neighbor_finder (line 143) | def neighbor_finder(self, pos):
method empty_neighbor_finder (line 159) | def empty_neighbor_finder(self, pos):
method step (line 183) | def step(self, predators_actions):
method check_catching (line 205) | def check_catching(self):
method update_positions (line 255) | def update_positions(self, pos_list, act_list):
method action_space (line 284) | def action_space(self):
method render (line 287) | def render(self):
method gui_setup (line 340) | def gui_setup(self):
FILE: predators_prey_multiagent.py
function get_name_brain (line 23) | def get_name_brain(args, idx):
function get_name_rewards (line 30) | def get_name_rewards(args):
function get_name_timesteps (line 37) | def get_name_timesteps(args):
class Environment (line 44) | class Environment(object):
method __init__ (line 46) | def __init__(self, arguments):
method run (line 65) | def run(self, agents, file1, file2):
FILE: prioritized_experience_replay.py
class Memory (line 12) | class Memory(object):
method __init__ (line 15) | def __init__(self, capacity, pr_scale):
method get_priority (line 21) | def get_priority(self, error):
method remember (line 24) | def remember(self, sample, error):
method sample (line 30) | def sample(self, n):
method update (line 48) | def update(self, batch_indices, errors):
FILE: sum_tree.py
class SumTree (line 4) | class SumTree(object):
method __init__ (line 6) | def __init__(self, capacity):
method _propagate (line 12) | def _propagate(self, idx, change):
method _retrieve (line 20) | def _retrieve(self, idx, s):
method total (line 32) | def total(self):
method add (line 35) | def add(self, p, data):
method update (line 45) | def update(self, idx, p):
method get (line 59) | def get(self, s):
FILE: uniform_experience_replay.py
class Memory (line 12) | class Memory(object):
method __init__ (line 14) | def __init__(self, capacity):
method remember (line 18) | def remember(self, sample):
method sample (line 21) | def sample(self, n):