At minimum, install both the "Desktop development with C++" and "Game development with C++" workloads (see the screenshots for the exact versions). You also need the .NET Framework, version 4.6.2.
## 3. Download the Unreal Engine Source Code (unofficial, contains modified source)
https://ageasga-my.sharepoint.com/:u:/g/personal/fuqingxu_yiteam_tech/Ee3lQrUjKNFMjPITm5G-hEgBbeEN6dMOPtKP9ssgONKJcA?e=BavOoJ
## 4. Build the Unreal Engine
1. Extract the source code to a disk with at least 150 GB of free space (a quick way to check free space is sketched after this list).
1. Open your source folder in Explorer and run **Setup.bat**.
This downloads binary content for the engine, installs prerequisites, and sets up Unreal file associations.
On Windows 8, a warning from SmartScreen may appear. Click "More info", then "Run anyway" to continue.
A clean download of the engine binaries is currently 3-4 GB, which may take some time to complete.
Subsequent checkouts only require incremental downloads and will be much quicker.
1. Run **GenerateProjectFiles.bat** to create project files for the engine. It should take less than a minute to complete.
1. Load the project into Visual Studio by double-clicking on the **UE4.sln** file. Set your solution configuration to **Development Editor** and your solution platform to **Win64**, then right-click the **UE4** target and select **Build** (equivalently, after setting the configuration, use Build --> Build UE4 in the top menu). Compiling may take anywhere from 10 minutes to an hour, depending on your system specs.
1. Right-click **UHMP.uproject** in the project folder and, as shown in the figure, switch the Unreal Engine version to **4.27.2-release** to generate the initial project files. Then open **UHMP.sln**, set the solution configuration at the top of the window to **Development Editor** and **Win64**, select **UHMP** in the Solution Explorer on the right, and click Build --> Build UHMP in the top menu. When the build finishes, double-click **UHMP.uproject** to open the project.
6. When the project opens successfully, the Unreal Editor loading screen should look like the figure below.
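
Before extracting the source, you can sanity-check the disk-space requirement from step 1 with a short Python snippet (a minimal sketch; the target drive letter is a placeholder assumption, point it at the disk you plan to extract to):

```python
import shutil

target = "D:\\"  # hypothetical target drive; adjust to where you will extract the engine
free_gb = shutil.disk_usage(target).free / 1024**3
print(f"free space on {target}: {free_gb:.0f} GB")
assert free_gb >= 150, "need at least 150 GB free to extract the engine source"
```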
================================================
FILE: Please_Run_This_First_To_Fetch_Big_Files.py
================================================
import os
import zipfile
from modelscope import snapshot_download
os.makedirs('./TEMP', exist_ok=True)
version = 'unreal-map-v3.4'
model_dir = snapshot_download(f'BinaryHusky/{version}')
zip_file_path = f'./TEMP/{version}.zip'
def combine_file(model_dir, output_file_path, num_parts):
    # stitch the downloaded split parts back into one ZIP archive
    with open(output_file_path, 'wb') as output_file:
for i in range(0, num_parts):
part_file_path = os.path.join(model_dir, "tensor", f"safetensor_{i+1}.pt")
with open(part_file_path, 'rb') as part_file:
output_file.write(part_file.read())
extract_to_path = './'
combine_file(model_dir, output_file_path=zip_file_path, num_parts=5)
# open the ZIP file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    # extract all files to the target directory
    zip_ref.extractall(extract_to_path)
print(f"files unzipped to {extract_to_path}")
print("everything is ready!")
================================================
FILE: PythonExample/README.md
================================================
This demo program can connect to a developing or compiled U-MAP environment to debug your simulation.
This program is a copy of another repo: https://github.com/binary-husky/hmp2g
================================================
FILE: PythonExample/hmp_minimal_modules/.gitattributes
================================================
*.js linguist-detectable=false
================================================
FILE: PythonExample/hmp_minimal_modules/.gitignore
================================================
# Build and Release Folders
bin-debug/
bin-release/
*/__pycache__
[Oo]bj/
[Bb]in/
# Other files and folders
.settings/
# Executables
*.swf
*.air
*.ipa
*.apk
*.so
*.pyc
*.pyd
*.so
# gpu lock
*.glock
*.mp3
*.png
# pytorch model
*.pt
core
!MISSION/collective_assult/malib/core
TODO
__pycache__/
./build/
ALGORITHM/Starcraft/result/
ALGORITHM/Starcraft/model/
ZipResults/
VISUALIZE/train no aWiseAttn
VISUALIZE/train-half-death-reward
ZipResults/Starcraft/
result
# .vscode/
forattack-train/
full-cargo/
T*/
checkpoint/
PROFILE/
RECYCLE/
TEMP/
test_only_*.py
test_only_log
test_only_profile.txt
test_only_profilex.txt
debug_change_self_n_agent.json.profile.txt
debug2-2500pt-test_only_profile.txt
test_only_logdebug_change_self_n_agent.json.log
test_only_profiledebug_change_self_n_agent.json.txt
xx_profile_n_agents2.py
xx_profile_n_agents3.py
xx_profile_n_agents4.py
UTIL/keys.py
private*
fqx*.jsonc
my_*.jsonc
result.prof
ignore
bvrAI.log
ZHECKPOINT/*
THIRDPARTY/pymarl2/test
!ZHECKPOINT/test-50+50
ZHECKPOINT/test-50+50/*
!ZHECKPOINT/test-50+50/model.pt
!ZHECKPOINT/test-50+50/experiment.json
!ZHECKPOINT/test-50+50/test-50+50.jsonc
!ZHECKPOINT/test-50+50/test50.gif
!ZHECKPOINT/test-100+100
ZHECKPOINT/test-100+100/*
!ZHECKPOINT/test-100+100/model.pt
!ZHECKPOINT/test-100+100/experiment.json
!ZHECKPOINT/test-100+100/test-100+100.jsonc
!ZHECKPOINT/50RL-55opp
ZHECKPOINT/50RL-55opp/*
!ZHECKPOINT/50RL-55opp/test-50RL-55opp.jsonc
!ZHECKPOINT/50RL-55opp/model.pt
ZHECKPOINT/50RL-55opp/experiment.json
!ZHECKPOINT/test-cargo50
ZHECKPOINT/test-cargo50/*
!ZHECKPOINT/test-cargo50/model.pt
!ZHECKPOINT/test-cargo50/experiment.json
!ZHECKPOINT/test-cargo50/test-cargo50.jsonc
!ZHECKPOINT/test-cargo50/cargo50.jpg
!ZHECKPOINT/test-cargo50/history_cpt
ZHECKPOINT/test-cargo50/history_cpt/*
!ZHECKPOINT/test-cargo50/history_cpt/init.pkl
!ZHECKPOINT/test-50+50/butterfly.webp
!ZHECKPOINT/test-aii515
ZHECKPOINT/test-aii515/*
!ZHECKPOINT/test-aii515/model.pt
!ZHECKPOINT/test-aii515/experiment.json
!ZHECKPOINT/test-aii515/test-aii515.jsonc
!ZHECKPOINT/test-aii515/aii.jpg
!ZHECKPOINT/test-aii515/history_cpt
ZHECKPOINT/test-aii515/history_cpt/*
!ZHECKPOINT/test-aii515/history_cpt/init.pkl
!ZHECKPOINT/basic-ma-40-demo
ZHECKPOINT/basic-ma-40-demo/*
!ZHECKPOINT/basic-ma-40-demo/trained_model.pt
!ZHECKPOINT/basic-ma-40-demo/train.json
!ZHECKPOINT/basic-ma-40-demo/test.json
!ZHECKPOINT/adca-demo
ZHECKPOINT/adca-demo/*
!ZHECKPOINT/adca-demo/model_trained.pt
!ZHECKPOINT/adca-demo/train.json
!ZHECKPOINT/adca-demo/test.json
!ZHECKPOINT/uhmap_hete10vs10
ZHECKPOINT/uhmap_hete10vs10/backup_files
ZHECKPOINT/uhmap_hete10vs10/logger
!ZHECKPOINT/uhmap_hete10vs10/model_trained.pt
ZHECKPOINT/uhmap_hete10vs10/experiment.jsonc
cmd_io.txt
rec.jpg
detail_reward.jpg
z_*
ALGORITHM/conc_4hist_divtree3
ALGORITHM/conc_4hist_divtree2
example_dca_cs*
PersonalityDevelop.pdf
6vs7Pr-continue-train.jsonc
6vs7Pr.jsonc
6vs7PrTry2-Link.jsonc
6vs7PrTry2.jsonc
7vs7Pr.json
7vs7Pr.jsonc
batch_experiment_backup.py
mcom_buffer_0____starting_session.txt
temp.jpg
temp.jpg.jpg
x.txt
debug*.jsonc
HLT_eval.py
qplex-pad.jsonc
raw_exp.jsonc
info.json
UTIL/mem_watcher.py
ALGORITHM/mirror*
================================================
FILE: PythonExample/hmp_minimal_modules/.gitmodules
================================================
[submodule "THIRDPARTY/pymarl2/pymarl2src"]
path = THIRDPARTY/pymarl2/pymarl2src
url = https://github.com/binary-husky/pymarl-hmap-compat.git
branch = master
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/alg_base.py
================================================
import os, time, torch, traceback
import numpy as np
from config import GlobalConfig
from UTIL.colorful import *
class AlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_thread = n_thread
self.n_agent = n_agent
self.team = team
self.act_space = space['act_space']
self.obs_space = space['obs_space']
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.mcv = mcv
self.device = GlobalConfig.device
def interact_with_env(self, team_intel):
raise NotImplementedError
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/attention.py
================================================
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions.categorical import Categorical
from torch.distributions.multivariate_normal import MultivariateNormal
from UTIL.tensor_ops import my_view
class MultiHeadAttention(nn.Module):
# taken from https://github.com/wouterkool/attention-tsp/blob/master/graph_encoder.py
def __init__(
self,
n_heads,
input_dim,
embed_dim=None,
val_dim=None,
key_dim=None
):
super(MultiHeadAttention, self).__init__()
if val_dim is None:
assert embed_dim is not None, "Provide either embed_dim or val_dim"
val_dim = embed_dim // n_heads
if key_dim is None:
key_dim = val_dim
self.n_heads = n_heads
self.input_dim = input_dim
self.embed_dim = embed_dim
self.val_dim = val_dim
self.key_dim = key_dim
self.norm_factor = 1 / math.sqrt(key_dim) # See Attention is all you need
self.W_query = nn.Parameter(torch.Tensor(n_heads, input_dim, key_dim))
self.W_key = nn.Parameter(torch.Tensor(n_heads, input_dim, key_dim))
self.W_val = nn.Parameter(torch.Tensor(n_heads, input_dim, val_dim))
if embed_dim is not None:
self.W_out = nn.Parameter(torch.Tensor(n_heads, key_dim, embed_dim))
self.init_parameters()
def init_parameters(self):
for param in self.parameters():
stdv = 1. / math.sqrt(param.size(-1))
param.data.uniform_(-stdv, stdv)
def forward(self, q, k=None, v=None, mask=None, return_attn=False, return_attn_weight=False):
if q.dim()<=3:
out = self.forward_(q, k, v, mask, return_attn, return_attn_weight)
if return_attn:
out, attn = out
assert attn.shape[0]==1
attn = attn.squeeze(0)
return out, attn
return out
hyper_dim = q.shape[:-2]
q = my_view(q, [-1, *q.shape[-2:]])
if k is not None:
k = my_view(k, [-1, *k.shape[-2:]])
if v is not None:
v = my_view(v, [-1, *v.shape[-2:]])
if mask is not None: mask = my_view(mask, [-1, *mask.shape[-2:]])
out = self.forward_(q, k, v, mask, return_attn, return_attn_weight)
if return_attn:
out, attn = out
if hyper_dim is not None:
out = out.view(*hyper_dim, *out.shape[-2:])
attn = attn.view(*hyper_dim, *attn.shape[-2:]) #??
return out, attn
else:
if hyper_dim is not None:
out = out.view(*hyper_dim, *q.shape[-2:])
return out
def forward_(self, q, k=None, v=None, mask=None, return_attn=False, return_attn_weight=False):
"""
:param q: queries (batch_size, n_query, input_dim)
:param k: data (batch_size, n_key/graph_size, input_dim)
:param mask: mask (batch_size, n_query, graph_size) or viewable as that (i.e. can be 2 dim if n_query == 1)
Mask should contain 1 if attention is not possible (i.e. mask is negative adjacency)
:return:
"""
if k is None:
k = q # compute self-attention
if v is None:
v = k
# k should be (batch_size, graph_size, input_dim)
batch_size, graph_size, input_dim = k.size()
n_query = q.size(1)
assert q.size(0) == batch_size
assert q.size(2) == input_dim
assert input_dim == self.input_dim, "Wrong embedding dimension of input"
kflat = k.contiguous().view(-1, input_dim)
qflat = q.contiguous().view(-1, input_dim)
vflat = v.contiguous().view(-1, input_dim)
# last dimension can be different for keys and values
shp = (self.n_heads, batch_size, graph_size, -1)
shp_q = (self.n_heads, batch_size, n_query, -1)
        # Calculate queries (n_heads, batch_size, n_query, key_dim)
Q = torch.matmul(qflat, self.W_query).view(shp_q)
# Calculate keys and values (n_heads, batch_size, graph_size, key/val_size)
K = torch.matmul(kflat, self.W_key).view(shp)
V = torch.matmul(vflat, self.W_val).view(shp)
# Calculate compatibility (n_heads, batch_size, n_query, graph_size)
compatibility = self.norm_factor * torch.matmul(Q, K.transpose(2, 3))
if return_attn_weight:
assert self.n_heads == 1
if mask is not None:
mask = mask.view(1, batch_size, n_query, graph_size).expand_as(compatibility)
compatibility[mask.bool()] = -math.inf
return compatibility.squeeze(0)
# Optionally apply mask to prevent attention
if mask is not None: # expand to n_heads
mask = mask.view(1, batch_size, n_query, graph_size).expand_as(compatibility)
compatibility[mask.bool()] = -math.inf
attn = F.softmax(compatibility, dim=-1)
# If there are nodes with no neighbours then softmax returns nan so we fix them to 0
if mask is not None:
attnc = attn.clone()
attnc[mask.bool()] = 0
attn = attnc
        # to avoid 0 * nan = nan here, NaNs in V must be converted to 0 beforehand
heads = torch.matmul(attn, V)
out = torch.mm(
heads.permute(1, 2, 0, 3).contiguous().view(-1, self.n_heads * self.val_dim),
self.W_out.view(-1, self.embed_dim)
).view(batch_size, n_query, self.embed_dim)
if return_attn:
return out, attn
return out
class SimpleAttention(nn.Module):
def __init__(self, h_dim):
super().__init__()
self.W_query = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.W_key = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.W_val = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.init_parameters()
def init_parameters(self):
for param in self.parameters():
stdv = 1. / math.sqrt(param.size(-1))
param.data.uniform_(-stdv, stdv)
def forward(self, k, q, v, mask=None):
Q = torch.matmul(q, self.W_query)
K = torch.matmul(k, self.W_key)
V = torch.matmul(v, self.W_val)
norm_factor = 1 / math.sqrt(Q.shape[-1])
compat = norm_factor * torch.matmul(Q, K.transpose(-1, -2))
if mask is not None: compat[mask.bool()] = -math.inf
        # to avoid 0 * nan = nan here, NaNs in V must be converted to 0 beforehand
score = torch.nan_to_num(F.softmax(compat, dim=-1), 0)
return torch.matmul(score, V)
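# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module). Shapes
# follow the forward_() docstring above; all sizes below are
# illustrative assumptions.
if __name__ == '__main__':
    batch_size, n_query, graph_size, input_dim = 4, 1, 6, 32
    mha = MultiHeadAttention(n_heads=2, input_dim=input_dim, embed_dim=32)
    q = torch.randn(batch_size, n_query, input_dim)     # queries
    k = torch.randn(batch_size, graph_size, input_dim)  # keys (v defaults to k)
    out = mha(q, k)
    print(out.shape)  # -> torch.Size([4, 1, 32])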
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/conc.py
================================================
import math
import torch,time,random
import torch.nn as nn
import torch.nn.functional as F
from UTIL.tensor_ops import my_view, __hash__, __hashn__, pad_at_dim, gather_righthand
class Concentration(nn.Module):
def __init__(self, n_focus_on, h_dim, skip_connect=False, skip_connect_dim=0, adopt_selfattn=False):
super().__init__()
self.n_focus_on = n_focus_on
self.skip_connect = skip_connect
self.skip_dim = h_dim+skip_connect_dim
self.CT_W_query = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.CT_W_key = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.CT_W_val = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.CT_motivate_mlp = nn.Sequential(nn.Linear(h_dim * 2, h_dim), nn.ReLU(inplace=True))
self.AT_forward_mlp = nn.Sequential(nn.Linear((n_focus_on+1)*self.skip_dim, h_dim), nn.ReLU(inplace=True))
self.adopt_selfattn = adopt_selfattn
if self.adopt_selfattn:
            assert False, ('no longer supported')
self.init_parameters()
def init_parameters(self):
for param in self.parameters():
stdv = 1. / math.sqrt(param.size(-1))
param.data.uniform_(-stdv, stdv)
def forward(self, vs, ve, ve_dead, skip_connect_ze=None, skip_connect_zs=None):
mask = ve_dead
Q = torch.matmul(vs, self.CT_W_query)
K = torch.matmul(ve, self.CT_W_key)
norm_factor = 1 / math.sqrt(Q.shape[-1])
compat = norm_factor * torch.matmul(Q, K.transpose(2, 3))
assert compat.shape[-2] == 1
compat = compat.squeeze(-2)
compat[mask.bool()] = -math.inf
score = F.softmax(compat, dim=-1)
        # nodes with no neighbours come out of the softmax as nan; fix them to 0
        score = torch.nan_to_num(score, 0)
        # ----------- motivational branch -------------
Va = torch.matmul(score.unsqueeze(-2), torch.matmul(ve, self.CT_W_val))
v_M = torch.cat((vs, Va), -1).squeeze(-2)
v_M_final = self.CT_motivate_mlp(v_M)
# ----------- forward branch -------------
score_sort_index = torch.argsort(score, dim=-1, descending=True)
score_sort_drop_index = score_sort_index[..., :self.n_focus_on]
if self.skip_connect:
ve = torch.cat((ve, skip_connect_ze), -1)
vs = torch.cat((vs, skip_connect_zs), -1)
ve_C = gather_righthand(src=ve, index=score_sort_drop_index, check=False)
need_padding = (score_sort_drop_index.shape[-1] != self.n_focus_on)
if need_padding:
            print('n_focus_on is larger than the input size; advice: pad the observation instead of padding here')
ve_C = pad_at_dim(ve_C, dim=-2, n=self.n_focus_on)
v_C_stack = torch.cat((vs, ve_C), dim=-2)
if self.adopt_selfattn:
v_C_stack = self.AT_Attention(v_C_stack, mask=None)
v_C_flat = my_view(v_C_stack, [0, 0, -1]); assert v_C_stack.dim()==4
v_C_final = self.AT_forward_mlp(v_C_flat)
return v_C_final, v_M_final
class ConcentrationHete(nn.Module):
def __init__(self, n_focus_on, h_dim, skip_connect=False, skip_connect_dim=0, adopt_selfattn=False):
super().__init__()
self.n_focus_on = n_focus_on
self.skip_connect = skip_connect
self.skip_dim = h_dim+skip_connect_dim
self.AT_W_query = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.AT_W_key = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.AT_W_val = nn.Parameter(torch.Tensor(h_dim, h_dim))
self.AT_motivate_mlp = nn.Sequential(nn.Linear(h_dim * 2, h_dim), nn.ReLU(inplace=True))
self.AT_forward_mlp = nn.Sequential(nn.Linear((n_focus_on+1)*self.skip_dim, h_dim), nn.ReLU(inplace=True))
self.adopt_selfattn = adopt_selfattn
if self.adopt_selfattn:
            assert False, ('no longer supported')
self.init_parameters()
def init_parameters(self):
for param in self.parameters():
stdv = 1. / math.sqrt(param.size(-1))
param.data.uniform_(-stdv, stdv)
def forward(self, vs, ve, ve_dead, skip_connect_ze=None, skip_connect_zs=None):
mask = ve_dead
Q = torch.matmul(vs, self.AT_W_query)
K = torch.matmul(ve, self.AT_W_key)
norm_factor = 1 / math.sqrt(Q.shape[-1])
compat = norm_factor * torch.matmul(Q, K.transpose(2, 3))
assert compat.shape[-2] == 1
compat = compat.squeeze(-2)
compat[mask.bool()] = -math.inf
score = F.softmax(compat, dim=-1)
        # nodes with no neighbours come out of the softmax as nan; fix them to 0
        score = torch.nan_to_num(score, 0)
        # ----------- motivational branch -------------
Va = torch.matmul(score.unsqueeze(-2), torch.matmul(ve, self.AT_W_val))
v_M = torch.cat((vs, Va), -1).squeeze(-2)
v_M_final = self.AT_motivate_mlp(v_M)
# ----------- forward branch -------------
score_sort_index = torch.argsort(score, dim=-1, descending=True)
score_sort_drop_index = score_sort_index[..., :self.n_focus_on]
if self.skip_connect:
ve = torch.cat((ve, skip_connect_ze), -1)
vs = torch.cat((vs, skip_connect_zs), -1)
ve_C = gather_righthand(src=ve, index=score_sort_drop_index, check=False)
need_padding = (score_sort_drop_index.shape[-1] != self.n_focus_on)
if need_padding:
            print('n_focus_on is larger than the input size; advice: pad the observation instead of padding here')
ve_C = pad_at_dim(ve_C, dim=-2, n=self.n_focus_on)
v_C_stack = torch.cat((vs, ve_C), dim=-2)
if self.adopt_selfattn:
v_C_stack = self.AT_Attention(v_C_stack, mask=None)
v_C_flat = my_view(v_C_stack, [0, 0, -1]); assert v_C_stack.dim()==4
v_C_final = self.AT_forward_mlp(v_C_flat)
return v_C_final, v_M_final
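# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module). Shapes are
# illustrative assumptions: vs is each agent's own feature with shape
# (thread, agent, 1, h_dim), ve holds entity features with shape
# (thread, agent, n_entity, h_dim), and ve_dead masks dead entities.
# It also assumes gather_righthand selects entities along dim -2, as
# in the forward pass above.
if __name__ == '__main__':
    n_thread, n_agent, n_entity, h_dim = 2, 3, 8, 16
    ct = Concentration(n_focus_on=4, h_dim=h_dim)
    vs = torch.randn(n_thread, n_agent, 1, h_dim)
    ve = torch.randn(n_thread, n_agent, n_entity, h_dim)
    ve_dead = torch.zeros(n_thread, n_agent, n_entity, dtype=torch.bool)
    v_C, v_M = ct(vs, ve, ve_dead)
    print(v_C.shape, v_M.shape)  # -> torch.Size([2, 3, 16]) for both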
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/dl_pool.py
================================================
"""
Author: Fu Qingxu,CASIA
Description: deep learning sample manager
"""
import torch
import numpy as np
class DeepLearningPool(object):
def __init__(self, pool_size, batch_size) -> None:
super().__init__()
self.x_batch = None
self.y_batch = None
self.size = pool_size
self.batch_size = batch_size
def add_and_sample(self, x, y):
n_sample = x.shape[0]
assert n_sample > 0
if self.x_batch is None:
self.x_batch = np.zeros(shape=(self.size, *x.shape[1:]), dtype=x.dtype)
self.y_batch = np.zeros(shape=(self.size, *y.shape[1:]), dtype=y.dtype)
self.current_idx = 0
self.current_size = 0
idx = self._get_storage_idx(n_sample)
self.x_batch[idx] = x
self.y_batch[idx] = y
return self._sample()
def _get_storage_idx(self, inc=None):
inc = inc or 1
if self.current_idx + inc <= self.size:
idx = np.arange(self.current_idx, self.current_idx + inc)
self.current_idx += inc
elif self.current_idx < self.size:
overflow = inc - (self.size - self.current_idx)
idx_a = np.arange(self.current_idx, self.size)
idx_b = np.arange(0, overflow)
idx = np.concatenate([idx_a, idx_b])
self.current_idx = overflow
else:
idx = np.arange(0, inc)
self.current_idx = inc
self.current_size = min(self.size, self.current_size + inc)
if inc == 1:
idx = idx[0]
return idx
def _sample(self):
idx = np.random.randint(0, self.current_size, self.batch_size)
return self.x_batch[idx], self.y_batch[idx]
if __name__ == '__main__':
dlp = DeepLearningPool(10, 7)
res = dlp.add_and_sample(x=np.random.rand(2,2,3),y=np.array([1,2]))
print(dlp.y_batch,'res',res[1])
res = dlp.add_and_sample(x=np.random.rand(4,2,3),y=np.array([3,4,5,6]))
print(dlp.y_batch,'res',res[1])
res = dlp.add_and_sample(x=np.random.rand(3,2,3),y=np.array([7,8,9]))
print(dlp.y_batch,'res',res[1])
res = dlp.add_and_sample(x=np.random.rand(3,2,3),y=np.array([10,11,12]))
print(dlp.y_batch,'res',res[1])
res = dlp.add_and_sample(x=np.random.rand(3,2,3),y=np.array([13,14,15]))
print(dlp.y_batch,'res',res[1])
res = dlp.add_and_sample(x=np.random.rand(3,2,3),y=np.array([16,17,18]))
print(dlp.y_batch,'res',res[1])
print('end of test')
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/his.py
================================================
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# configure matplotlib to display CJK text and minus signs correctly
matplotlib.rcParams['font.sans-serif'] = ['SimHei']  # use the SimHei font for CJK characters
matplotlib.rcParams['axes.unicode_minus'] = False    # render minus signs properly
# generate 10000 random samples from a standard normal distribution
data = np.random.randn(10000)
"""
Plot a histogram.
data: required, the data to plot
bins: number of bars, optional, default 10
normed: whether to normalize the histogram, optional; default 0 shows raw counts, 1 shows frequencies
facecolor: bar fill color
edgecolor: bar edge color
alpha: transparency
"""
plt.hist(data, bins=40, facecolor="blue", edgecolor="black", alpha=0.7)
# x-axis label
plt.xlabel("bin")
# y-axis label
plt.ylabel("count / frequency")
# figure title
plt.title("Histogram (counts / frequencies)")
plt.show()
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/hyper_net.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from UTIL.tensor_ops import my_view
class HyperNet(nn.Module):
def __init__(self, **kwargs):
super(HyperNet, self).__init__()
self.x_input_dim = kwargs['x_input_dim']
self.embed_dim = kwargs['embed_dim']
self.hyper_input_dim = kwargs['hyper_input_dim']
# hyper w1 b1
self.hyper_w1 = nn.Sequential( nn.Linear(self.hyper_input_dim, self.embed_dim),
nn.ReLU(inplace=True),
nn.Linear(self.embed_dim, self.x_input_dim * self.embed_dim))
self.hyper_b1 = nn.Sequential(nn.Linear(self.hyper_input_dim, self.embed_dim))
# hyper w2 b2
self.hyper_w2 = nn.Sequential(
nn.Linear(self.hyper_input_dim, self.embed_dim),
nn.ReLU(inplace=True),
nn.Linear(self.embed_dim, self.embed_dim * self.embed_dim))
self.hyper_b2 = nn.Sequential(nn.Linear(self.hyper_input_dim, self.embed_dim),
nn.ReLU(inplace=True),
nn.Linear(self.embed_dim, 1))
def forward(self, x, hyper_x):
# x shape (thread/batch, agent, core)
# hyper_x shape (thread/batch, core)
assert hyper_x.dim() == 3
        # reshape w1 into (..., x_input_dim, embed_dim)
w1 = my_view(self.hyper_w1(hyper_x), [0, 0, self.x_input_dim, self.embed_dim])
b1 = self.hyper_b1(hyper_x).unsqueeze(-2) # b1 (thread/batch, core=embed_dim)
# Second layer
w2 = my_view(self.hyper_w2(hyper_x), [0, 0, self.embed_dim, self.embed_dim])
b2 = self.hyper_b2(hyper_x).unsqueeze(-2)
## x shape = (..., x_input_dim)
## w1 shape = (..., x_input_dim, embed_dim)
# x reshape = (..., 1, x_input_dim)
x = x.unsqueeze(-2)
hidden = F.elu(torch.matmul(x, w1) + b1) # b * t, 1, emb
# Forward (batch, 1, 32) * w2(batch, 32, 1) => y(batch, 1)
y = torch.matmul(hidden, w2) + b2 # b * t, 1, 1
return y.squeeze(-2)
class MyHyperNet(nn.Module):
def __init__(self, x_in_dim, hyber_in_dim, layer_out_dims, hyber_hid_dim):
super(MyHyperNet, self).__init__()
self.x_in_dim = x_in_dim
self.layer_out_dims = layer_out_dims
self.hyber_in_dim = hyber_in_dim
self.hyber_hid_dim = hyber_hid_dim
self.n_layer = len(self.layer_out_dims)
self.layer_dim_dict = [(x_in_dim, layer_out_dims[0])] + [(d_in, d_out) for d_in, d_out in zip(layer_out_dims[:-1], layer_out_dims[1:])]
self.weight_each_layer = nn.ModuleList([
nn.Sequential(nn.Linear(self.hyber_in_dim, self.hyber_hid_dim), nn.ReLU(inplace=True), nn.Linear(self.hyber_hid_dim, d_in * d_out))
for d_in, d_out in self.layer_dim_dict
])
self.bias_each_layer = nn.ModuleList([
nn.Sequential(nn.Linear(self.hyber_in_dim, self.hyber_hid_dim), nn.ReLU(inplace=True), nn.Linear(self.hyber_hid_dim, d_out))
for d_in, d_out in self.layer_dim_dict
])
def forward(self, x, hyper_x):
# x shape (thread/batch, agent, core)
# hyper_x shape (thread/batch, core)
assert hyper_x.dim() == 3
x = x.unsqueeze(-2)
for i in range(self.n_layer):
d_in, d_out = self.layer_dim_dict[i]
w = my_view(self.weight_each_layer[i](hyper_x), [0, 0, d_in, d_out])
b = self.bias_each_layer[i](hyper_x).unsqueeze(-2)
x = torch.matmul(x, w) + b
is_last_layer = (i==(self.n_layer-1))
if is_last_layer:
# do NOT use relu at last layer
pass
else:
x = F.relu(x, inplace=True)
return x.squeeze(-2)
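# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module): the hyper
# network generates per-agent MLP weights from hyper_x, then applies
# them to x. All sizes below are illustrative assumptions.
if __name__ == '__main__':
    batch, n_agent = 4, 3
    net = MyHyperNet(x_in_dim=8, hyber_in_dim=6, layer_out_dims=[16, 1], hyber_hid_dim=32)
    x = torch.randn(batch, n_agent, 8)        # per-agent input
    hyper_x = torch.randn(batch, n_agent, 6)  # conditioning input
    y = net(x, hyper_x)
    print(y.shape)  # -> torch.Size([4, 3, 1])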
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/logit2act.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions.categorical import Categorical
from UTIL.tensor_ops import my_view, Args2tensor_Return2numpy, Args2tensor
from UTIL.tensor_ops import pt_inf
"""
logits-to-action helpers for policy networks
"""
class Logit2Act(nn.Module):
def __init__(self, *args, **kwargs):
super().__init__()
def _logit2act_rsn(self, logits_agent_cluster, eval_mode, greedy, eval_actions=None, avail_act=None, eprsn=None):
if avail_act is not None: logits_agent_cluster = torch.where(avail_act>0, logits_agent_cluster, -pt_inf())
act_dist = self.ccategorical.feed_logits(logits_agent_cluster)
if not greedy: act = self.ccategorical.sample(act_dist, eprsn) if not eval_mode else eval_actions
else: act = torch.argmax(act_dist.probs, axis=2)
# the policy gradient loss will feedback from here
actLogProbs = self._get_act_log_probs(act_dist, act)
# sum up the log prob of all agents
distEntropy = act_dist.entropy().mean(-1) if eval_mode else None
return act, actLogProbs, distEntropy, act_dist.probs
def _logit2act(self, logits_agent_cluster, eval_mode, greedy, eval_actions=None, avail_act=None, **kwargs):
if avail_act is not None: logits_agent_cluster = torch.where(avail_act>0, logits_agent_cluster, -pt_inf())
act_dist = Categorical(logits = logits_agent_cluster)
if not greedy: act = act_dist.sample() if not eval_mode else eval_actions
else: act = torch.argmax(act_dist.probs, axis=2)
actLogProbs = self._get_act_log_probs(act_dist, act) # the policy gradient loss will feedback from here
# sum up the log prob of all agents
distEntropy = act_dist.entropy().mean(-1) if eval_mode else None
return act, actLogProbs, distEntropy, act_dist.probs
@staticmethod
def _get_act_log_probs(distribution, action):
return distribution.log_prob(action.squeeze(-1)).unsqueeze(-1)
@Args2tensor_Return2numpy
def act(self, *args, **kargs):
return self._act(*args, **kargs)
@Args2tensor
def evaluate_actions(self, *args, **kargs):
return self._act(*args, **kargs, eval_mode=True)
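# ---------------------------------------------------------------
# Minimal standalone sketch of the masking trick used in _logit2act
# (not part of the original module): unavailable actions receive -inf
# logits, so the Categorical distribution never samples them. Shapes
# (thread, agent, n_action) are illustrative assumptions, and pt_inf()
# is assumed to return a positive-infinity tensor, matching its use above.
if __name__ == '__main__':
    logits = torch.randn(2, 3, 5)
    avail_act = torch.ones(2, 3, 5)
    avail_act[..., 0] = 0  # forbid action 0 everywhere
    masked = torch.where(avail_act > 0, logits, -pt_inf())
    act = Categorical(logits=masked).sample()
    assert (act != 0).all()  # action 0 is never chosen
    print(act)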
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/mlp.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from .norm import DynamicNorm
class SimpleMLP(nn.Module):
def __init__(self, in_dim, out_dim, hidden_dim=128, use_normalization=False):
super().__init__()
activation_func = nn.ReLU
h_dim = hidden_dim
if use_normalization:
print('test DynamicNorm')
self.mlp = nn.Sequential(
DynamicNorm(in_dim, only_for_last_dim=True, exclude_one_hot=True),
nn.Linear(in_dim, h_dim),
activation_func(inplace=True),
nn.Linear(h_dim, out_dim)
)
else:
self.mlp = nn.Sequential(
nn.Linear(in_dim, h_dim),
activation_func(inplace=True),
nn.Linear(h_dim, out_dim)
)
def forward(self,x):
return self.mlp(x)
class ResLinear(nn.Module):
def __init__(self, io_dim, h_dim, need_input_tf=False, input_tf_dim=None, inplace_relu=True) -> None:
super(ResLinear, self).__init__()
self.need_input_tf = need_input_tf
if need_input_tf:
self.f0 = nn.Linear(input_tf_dim, io_dim)
self.f1 = nn.Linear(io_dim, h_dim)
self.lkrelu = nn.ReLU(inplace=True) if inplace_relu else nn.ReLU(inplace=False)
self.f2 = nn.Linear(h_dim, io_dim)
def forward(self, xo):
if self.need_input_tf:
xo = self.f0(xo)
x = self.lkrelu(self.f1(xo))
x = self.f2(x) + xo
x = self.lkrelu(x)
return x
class LinearFinal(nn.Module):
__constants__ = ['in_features', 'out_features']
in_features: int
out_features: int
weight: torch.Tensor
def __init__(self, in_features: int, out_features: int, bias: bool = True) -> None:
super(LinearFinal, self).__init__()
self.in_features = in_features
self.out_features = out_features
self.weight = nn.Parameter(torch.Tensor(out_features, in_features))
if bias:
self.bias = nn.Parameter(torch.Tensor(out_features))
else:
self.register_parameter('bias', None)
def forward(self, input: torch.Tensor) -> torch.Tensor:
return F.linear(input, self.weight, self.bias)
def extra_repr(self) -> str:
return 'in_features={}, out_features={}, bias={}'.format(
self.in_features, self.out_features, self.bias is not None
)
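# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module); sizes are
# illustrative assumptions.
if __name__ == '__main__':
    mlp = SimpleMLP(in_dim=10, out_dim=4)
    res = ResLinear(io_dim=4, h_dim=32)
    x = torch.randn(7, 10)
    print(res(mlp(x)).shape)  # -> torch.Size([7, 4])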
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/net_manifest.py
================================================
import torch.nn as nn
def weights_init(m):
def init_Linear(m, final_layer=False):
nn.init.orthogonal_(m.weight.data)
if final_layer:nn.init.orthogonal_(m.weight.data, gain=0.01)
if m.bias is not None: nn.init.uniform_(m.bias.data, a=-0.02, b=0.02)
initial_fn_dict = {
'Net': None,
'NetCentralCritic': None,
'DataParallel':None,
'BatchNorm1d':None,
'Concentration':None,
'ConcentrationHete':None,
'Pnet':None,
'Sequential':None,
'Tanh':None,
'ModuleList':None,
'ModuleDict':None,
'MultiHeadAttention':None,
'SimpleMLP':None,
'SimpleAttention':None,
'SelfAttention_Module':None,
'ReLU':None,
'Softmax':None,
'DynamicNorm':None,
'DynamicNormFix':None,
'EXTRACT':None,
'LinearFinal':lambda m:init_Linear(m, final_layer=True),
'Linear':init_Linear,
'ResLinear':None,
'LeakyReLU':None,
'HyperNet':None,
'MyHyperNet':None,
'DivTree':None,
}
classname = m.__class__.__name__
assert classname in initial_fn_dict.keys(), ('how to handle the initialization of this class? ', classname)
init_fn = initial_fn_dict[classname]
if init_fn is None: return
init_fn(m)
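# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module): apply the
# initializer recursively via nn.Module.apply(); every submodule class
# must appear in initial_fn_dict, otherwise the assert above fires.
if __name__ == '__main__':
    net = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
    net.apply(weights_init)
    print('initialized:', net)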
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/norm.py
================================================
"""
CASIA, fuqingxu
live vector normalization using pytorch,
therefore the parameter of normalization (mean and var)
can be save together with network parameters
light up exclude_one_hot=True to prevent onehot component being normalized
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions.categorical import Categorical
from torch.distributions.multivariate_normal import MultivariateNormal
from UTIL.tensor_ops import my_view
from UTIL.tensor_ops import Args2tensor_Return2numpy
class DynamicNorm(nn.Module):
# ! warning! this module will mess with multi-gpu setting!!
def __init__(self, input_size, only_for_last_dim, exclude_one_hot=True, exclude_nan=False):
super().__init__()
assert only_for_last_dim
self.exclude_one_hot = exclude_one_hot
self.mean = nn.Parameter(torch.zeros(input_size, requires_grad=False), requires_grad=False)
self.var = nn.Parameter(torch.ones(input_size, requires_grad=False), requires_grad=False)
self.n_sample = nn.Parameter(torch.zeros(1, requires_grad=False, dtype=torch.long), requires_grad=False)
if self.exclude_one_hot:
self.one_hot_filter = nn.Parameter(torch.ones(input_size, requires_grad=False, dtype=torch.bool), requires_grad=False)
self.input_size = input_size
self.exclude_nan = exclude_nan
self.patience = 1000
def forward(self, x, get_mu_var=False):
assert self.input_size == x.shape[-1], ('self.input_size',self.input_size,'x.shape[-1]',x.shape[-1])
_2dx = x.detach().reshape(-1, self.input_size)
if self.exclude_nan: _2dx = _2dx[~torch.isnan(_2dx).any(axis=-1)]
this_batch_size = _2dx.shape[0]
# assert this_batch_size>=1
if this_batch_size<=0:
            print('Warning! An empty batch is being normalized')
x = torch.clip_((x - self.mean) / torch.sqrt_(self.var + 1e-8), -10, 10)
return x
if self.training:
with torch.no_grad():
this_batch_mean = torch.mean(_2dx, dim=0)
this_batch_var = torch.var(_2dx, dim=0, unbiased=False)
if torch.isnan(this_batch_var).any():
                    assert False, ('NaN detected during normalization! You can turn on exclude_nan to drop NaN rows')
assert _2dx.dim() == 2
delta = this_batch_mean - self.mean
tot_count = self.n_sample + this_batch_size
new_mean = self.mean + delta * this_batch_size / tot_count
m_a = self.var * (self.n_sample)
m_b = this_batch_var * (this_batch_size)
M2 = m_a + m_b + torch.square_(delta) * self.n_sample * this_batch_size / (self.n_sample + this_batch_size)
new_var = M2 / (self.n_sample + this_batch_size)
                if self.exclude_one_hot:  # filter out dims that only ever take values -1, 0, or 1 (one-hot components)
self.one_hot_filter.data &= ~(((_2dx != 0) & (_2dx != 1) & (_2dx != -1)).any(dim=0))
self.mean.data = torch.where(self.one_hot_filter, self.mean, new_mean) if self.exclude_one_hot else new_mean # new_mean
new_var_clip = torch.clamp(new_var, min=0.01, max=1000)
self.var.data = torch.where(self.one_hot_filter, self.var, new_var_clip) if self.exclude_one_hot else new_var_clip
self.n_sample.data = tot_count
if get_mu_var:
return self.mean, self.var
x = torch.clip_((x - self.mean) / torch.sqrt_(self.var + 1e-8), -10, 10)
return x
# @Args2tensor_Return2numpy
# def get_mean_var(self, x):
# return self.forward(x, get_mu_var=True)
class DynamicNormFix(nn.Module):
# ! warning! this module will mess with multi-gpu setting!!
def __init__(self, input_size, only_for_last_dim, exclude_one_hot=True, exclude_nan=False):
super().__init__()
assert only_for_last_dim
self.exclude_one_hot = exclude_one_hot
self.mean = nn.Parameter(torch.zeros(input_size, requires_grad=False), requires_grad=False)
self.var = nn.Parameter(torch.ones(input_size, requires_grad=False), requires_grad=False)
self.var_fix = nn.Parameter(torch.ones(input_size, requires_grad=False), requires_grad=False)
self.min = nn.Parameter(torch.ones(input_size, requires_grad=False)+float('inf'), requires_grad=False)
self.max = nn.Parameter(torch.ones(input_size, requires_grad=False)-float('inf'), requires_grad=False)
self.n_sample = nn.Parameter(torch.zeros(1, requires_grad=False, dtype=torch.long), requires_grad=False)
if self.exclude_one_hot:
self.one_hot_filter = nn.Parameter(torch.ones(input_size, requires_grad=False, dtype=torch.bool), requires_grad=False)
self.input_size = input_size
self.exclude_nan = exclude_nan
self.patience = 1000
self.var_fix_wait = 1000
# var fixing, T2 is maximum x abs value after normalization
self.T1 = 5
self.T2 = 10
self.TD = (self.T2**2 - self.T1**2)/self.T2**2
self.first_run = True
self.debug = True
    # numpy-compatible wrapper
@Args2tensor_Return2numpy
def np_forward(self, x, freeze=False, get_mu_var=False):
return self.forward(x, freeze, get_mu_var)
def forward(self, x, freeze=False, get_mu_var=False):
assert self.input_size == x.shape[-1], ('self.input_size',self.input_size,'x.shape[-1]',x.shape[-1])
_2dx = x.detach().reshape(-1, self.input_size)
if self.exclude_nan: _2dx = _2dx[~torch.isnan(_2dx).any(axis=-1)]
_2dx_view = my_view(_2dx, [-1, 0])
this_batch_size = _2dx.shape[0]
# assert this_batch_size>=1
if this_batch_size<=0:
            print('Warning! An empty batch is being normalized')
x = torch.clip_((x - self.mean) / torch.sqrt_(self.var_fix + 1e-8), -10, 10)
return x
if self.training and (not freeze):
with torch.no_grad():
this_batch_mean = torch.mean(_2dx, dim=0)
this_batch_var = torch.var(_2dx, dim=0, unbiased=False)
if torch.isnan(this_batch_var).any():
                    assert False, ('NaN detected during normalization! You can turn on exclude_nan to drop NaN rows')
assert _2dx.dim() == 2
delta = this_batch_mean - self.mean
tot_count = self.n_sample + this_batch_size
new_mean = self.mean + delta * this_batch_size / tot_count
m_a = self.var * (self.n_sample)
m_b = this_batch_var * (this_batch_size)
M2 = m_a + m_b + torch.square_(delta) * self.n_sample * this_batch_size / (self.n_sample + this_batch_size)
new_var = M2 / (self.n_sample + this_batch_size)
                if self.exclude_one_hot:  # filter out dims that only ever take values -1, 0, or 1 (one-hot components)
self.one_hot_filter.data &= ~(((_2dx != 0) & (_2dx != 1) & (_2dx != -1)).any(dim=0))
self.mean.data = torch.where(self.one_hot_filter, self.mean, new_mean) if self.exclude_one_hot else new_mean # new_mean
# if self.patience > 0: self.check_errors(_2dx, new_var)
self.var.data = torch.where(self.one_hot_filter, self.var, new_var) if self.exclude_one_hot else new_var
# begin fix variance
max_tmp, _ = _2dx_view.max(0)
min_tmp, _ = _2dx_view.min(0)
# if self.first_run:
if self.patience > 0:
self.patience -= 1
self.first_run = False
self.max.data = torch.maximum(max_tmp, self.max)
self.min.data = torch.minimum(min_tmp, self.min)
else:
# self.max.data = torch.maximum(max_tmp, self.max)
# self.min.data = torch.minimum(min_tmp, self.min)
self.max.data = self.max + (torch.maximum(max_tmp, self.max)-self.max) * this_batch_size / tot_count
self.min.data = self.min + (torch.minimum(min_tmp, self.min)-self.min) * this_batch_size / tot_count
# # if self.debug: self.mcv.rec(max_tmp.squeeze().item(), 'batch max')
# # if self.debug: self.mcv.rec(min_tmp.squeeze().item(), 'batch min')
# # if self.debug: self.mcv.rec(torch.maximum(max_tmp, self.max).squeeze().item(), 'hist max')
# # if self.debug: self.mcv.rec(torch.minimum(min_tmp, self.min).squeeze().item(), 'hist min')
# if self.debug: self.mcv.rec(self.max.data, 'fixed max')
# if self.debug: self.mcv.rec(self.min.data, 'fixed min')
# if self.debug: self.mcv.rec_show()
dm = torch.maximum((self.max - self.mean), (self.mean - self.min))
# std_th_1 = dm / self.T1
std_threshold_2 = dm / self.T2
# var1 = std_th_1**2
var2 = std_threshold_2**2
leak = self.TD * self.var + var2 # leak = (var1 - var2)/(var1) *self.var + var2
new_var_fix = torch.maximum(self.var, leak)
self.var_fix.data = torch.where(self.one_hot_filter, self.var_fix, new_var_fix) if self.exclude_one_hot else new_var_fix
# if self.debug: self.mcv.rec(self.var.data, 'var')
# if self.debug: self.mcv.rec(self.var_fix.data, 'var fix')
# if self.debug: self.mcv.rec(self.var_fix.data-self.var.data, 'delta var')
# if self.debug: self.mcv.rec((1 - self.mean) / torch.sqrt_(self.var_fix + 1e-8), 'base line +1')
# if self.debug: self.mcv.rec((-1 - self.mean) / torch.sqrt_(self.var_fix + 1e-8), 'base line -1')
# if self.debug: self.mcv.rec((10 - self.mean) / torch.sqrt_(self.var_fix + 1e-8), 'base line +10')
# if self.debug: self.mcv.rec((-10 - self.mean) / torch.sqrt_(self.var_fix + 1e-8), 'base line -10')
# !!! qq = self.var_fix.data-self.var.data
# !!! if self.patience > 0 and self.patience < 800 and (not (qq==0).all()):
# !!! print('[norm.py] Input issue: cannot be well expressed by normal distribution', torch.where(qq!=0))
self.n_sample.data = tot_count
# t = (_2dx_view - self.mean) / torch.sqrt_(self.var_fix + 1e-8)
if get_mu_var:
return self.mean, self.var_fix
return (x - self.mean) / torch.sqrt_(self.var_fix + 1e-8)
# def check_errors(self, _2dx, new_var):
# self.patience -= 1
'''
test script
import torch, time
from ALGORITHM.common.norm import DynamicNormFix
input_size = 1
only_for_last_dim = True
dynamic_norm = DynamicNormFix(input_size, only_for_last_dim, exclude_one_hot=True, exclude_nan=False)
for _ in range(101100):
# mask = (torch.randn(60, 1, out=None) > 0)
# x = torch.where(mask,
# torch.randn(60, 1, out=None)*10,
# torch.randn(60, 1, out=None)*5,
# )
    # left cluster
    std = 0.01; offset = -0.01; num = 5
    x3 = torch.randn(num, 1, out=None) * std + offset
    # middle cluster
    std = 0.01; offset = 0; num = 500
    x2 = torch.randn(num, 1, out=None) * std + offset
    # right cluster
    std = 0.01; offset = 1; num = 5
    x1 = torch.randn(num, 1, out=None) * std + offset
    # # left cluster
    # std = 1; offset = -10; num = 5
    # x3 = torch.randn(num, 1, out=None) * std + offset
    # # middle cluster
    # std = 1; offset = 5; num = 500
    # x2 = torch.randn(num, 1, out=None) * std + offset
    # # right cluster
    # std = 1; offset = 5; num = 5
    # x1 = torch.randn(num, 1, out=None) * std + offset
x = torch.cat((x1,x2,x3), 0)
y = dynamic_norm(x)
print(y)
time.sleep(60)
'''
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/pca.py
================================================
import numpy as np
def pca(samples, target_dim):
    assert len(samples.shape) == 2
    data = samples - np.mean(samples, axis=0)  # center each feature (mean over the batch dim)
    covMat = np.cov(data, rowvar=0)            # feature covariance matrix
    fValue, fVector = np.linalg.eig(covMat)    # eigen-decomposition
    fValueSort = np.argsort(-fValue)           # eigenvalue indices, descending
    fValueTopN = fValueSort[:target_dim]       # keep the top-N principal components
    fvectormat = fVector[:, fValueTopN]
    down_dim_data = np.dot(data, fvectormat)   # project data onto the components
    return down_dim_data
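# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module): project
# 5-dimensional samples onto their top-2 principal components.
if __name__ == '__main__':
    samples = np.random.randn(100, 5)
    reduced = pca(samples, target_dim=2)
    print(reduced.shape)  # -> (100, 2)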
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/ppo_sampler.py
================================================
import torch, math
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from random import randint, sample
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
from UTIL.colorful import *
from UTIL.tensor_ops import _2tensor, __hash__, repeat_at, _2cpu2numpy
from UTIL.tensor_ops import my_view, scatter_with_nan, sample_balance
from config import GlobalConfig as cfg
from UTIL.gpu_share import GpuShareUnit
class TrajPoolSampler():
def __init__(self, n_div, traj_pool, flag, req_dict, req_dict_rename, prevent_batchsize_oom=False, mcv=None):
self.n_pieces_batch_division = n_div
self.prevent_batchsize_oom = prevent_batchsize_oom
self.mcv = mcv
if self.prevent_batchsize_oom:
assert self.n_pieces_batch_division==1, 'self.n_pieces_batch_division should be 1'
self.num_batch = None
self.container = {}
self.warned = False
assert flag=='train'
# req_dict = ['obs', 'state', 'action', 'actionLogProb', 'return', 'reward', 'threat', 'value']
# req_dict_rename = ['obs', 'state', 'action', 'actionLogProb', 'return', 'reward', 'threat', 'state_value']
if cfg.ScenarioConfig.AvailActProvided:
req_dict.append('avail_act')
req_dict_rename.append('avail_act')
return_rename = "return"
value_rename = "state_value"
advantage_rename = "advantage"
# replace 'obs' to 'obs > xxxx'
for key_index, key in enumerate(req_dict):
key_name = req_dict[key_index]
key_rename = req_dict_rename[key_index]
if not hasattr(traj_pool[0], key_name):
real_key_list = [real_key for real_key in traj_pool[0].__dict__ if (key_name+'>' in real_key)]
assert len(real_key_list) > 0, ('check variable provided!', key,key_index)
for real_key in real_key_list:
mainkey, subkey = real_key.split('>')
req_dict.append(real_key)
req_dict_rename.append(key_rename+'>'+subkey)
self.big_batch_size = -1 # vector should have same length, check it!
# load traj into a 'container'
for key_index, key in enumerate(req_dict):
key_name = req_dict[key_index]
key_rename = req_dict_rename[key_index]
if not hasattr(traj_pool[0], key_name): continue
set_item = np.concatenate([getattr(traj, key_name) for traj in traj_pool], axis=0)
if not (self.big_batch_size==set_item.shape[0] or (self.big_batch_size<0)):
print('error')
assert self.big_batch_size==set_item.shape[0] or (self.big_batch_size<0), (key,key_index)
self.big_batch_size = set_item.shape[0]
self.container[key_rename] = set_item # assign value to key_rename
# normalize advantage inside the batch
self.container[advantage_rename] = self.container[return_rename] - self.container[value_rename]
self.container[advantage_rename] = ( self.container[advantage_rename] - self.container[advantage_rename].mean() ) / (self.container[advantage_rename].std() + 1e-5)
# size of minibatch for each agent
self.mini_batch_size = math.ceil(self.big_batch_size / self.n_pieces_batch_division)
# do once
self.do_once_fin = False
def __len__(self):
return self.n_pieces_batch_division
def reminder(self, n_sample):
if not self.do_once_fin:
self.do_once_fin = True
drop_percent = (self.big_batch_size-n_sample) / self.big_batch_size*100
if self.mcv is not None: self.mcv.rec(drop_percent, 'drop percent')
if drop_percent > 20:
print_ = print亮红
                print_('dropping %.1f percent of samples..'%(drop_percent))
assert False, "GPU OOM!"
else:
print_ = print
                print_('dropping %.1f percent of samples..'%(drop_percent))
def get_sampler(self):
if not self.prevent_batchsize_oom:
#
sampler = BatchSampler(SubsetRandomSampler(range(self.big_batch_size)), self.mini_batch_size, drop_last=False)
else:
max_n_sample = self.determine_max_n_sample()
n_sample = min(self.big_batch_size, max_n_sample)
self.reminder(n_sample)
sampler = BatchSampler(SubsetRandomSampler(range(n_sample)), n_sample, drop_last=False)
return sampler
def reset_and_get_iter(self):
self.sampler = self.get_sampler()
for indices in self.sampler:
selected = {}
for key in self.container:
selected[key] = self.container[key][indices]
for key in [key for key in selected if '>' in key]:
# re-combine child key with its parent
mainkey, subkey = key.split('>')
if not mainkey in selected: selected[mainkey] = {}
selected[mainkey][subkey] = selected[key]
del selected[key]
yield selected
def determine_max_n_sample(self):
assert self.prevent_batchsize_oom
if not hasattr(TrajPoolSampler,'MaxSampleNum'):
# initialization
TrajPoolSampler.MaxSampleNum = [int(self.big_batch_size*(i+1)/50) for i in range(50)]
max_n_sample = self.big_batch_size
elif TrajPoolSampler.MaxSampleNum[-1] > 0:
# meaning that oom never happen, at least not yet
# only update when the batch size increases
if self.big_batch_size > TrajPoolSampler.MaxSampleNum[-1]:
TrajPoolSampler.MaxSampleNum.append(self.big_batch_size)
max_n_sample = self.big_batch_size
else:
# meaning that oom already happened, choose TrajPoolSampler.MaxSampleNum[-2] to be the limit
assert TrajPoolSampler.MaxSampleNum[-2] > 0
max_n_sample = TrajPoolSampler.MaxSampleNum[-2]
return max_n_sample
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/rl_alg_base.py
================================================
import time
from UTIL.tensor_ops import __hash__, repeat_at
from UTIL.colorful import *
from .alg_base import AlgorithmBase
# model IO
class RLAlgorithmBase(AlgorithmBase):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
super().__init__(n_agent, n_thread, space, mcv, team)
        # data integrity check
        self._unfi_frag_ = None
        # skip the corrupt-data integrity check after this patience is exhausted
        self.patience = 1000
def interact_with_env(self, team_intel):
raise NotImplementedError
def save_model(self, update_cnt, info=None):
raise NotImplementedError
def process_framedata(self, traj_framedata):
        '''
        This hook is called when the reward and the next-moment observation are ready;
        feed them into the trajectory manager.
        Rollout processor: keys wrapped in underscores must align with shape (self.n_thread, ...),
        details in fn:mask_paused_env()
        '''
# strip info, since it is not array
items_to_pop = ['info', 'Latest-Obs']
for k in items_to_pop:
if k in traj_framedata:
traj_framedata.pop(k)
        # when the scenario provides a single team reward (RewardAsUnity), duplicate it for each agent
if self.ScenarioConfig.RewardAsUnity:
traj_framedata['reward'] = repeat_at(traj_framedata['reward'], insert_dim=-1, n_times=self.n_agent)
# change the name of done to be recognised (by trajectory manager)
traj_framedata['_DONE_'] = traj_framedata.pop('done')
traj_framedata['_TOBS_'] = traj_framedata.pop(
'Terminal-Obs-Echo') if 'Terminal-Obs-Echo' in traj_framedata else None
# mask out pause thread
traj_framedata = self.mask_paused_env(traj_framedata)
# put the frag into memory
self.batch_traj_manager.feed_traj(traj_framedata)
def check_reward_type(self, AlgorithmConfig):
if self.ScenarioConfig.RewardAsUnity != AlgorithmConfig.TakeRewardAsUnity:
assert self.ScenarioConfig.RewardAsUnity
assert not AlgorithmConfig.TakeRewardAsUnity
print亮紫(
                'Warning, the scenario (MISSION) provides `RewardAsUnity`, but AlgorithmConfig does not set `TakeRewardAsUnity`!')
print亮紫(
'If you continue, team reward will be duplicated to serve as individual rewards, wait 3s to proceed...')
time.sleep(3)
def mask_paused_env(self, frag):
running = ~frag['_SKIP_']
if running.all():
return frag
for key in frag:
if not key.startswith('_') and hasattr(frag[key], '__len__') and len(frag[key]) == self.n_thread:
frag[key] = frag[key][running]
return frag
    '''
    Get event from the hmp task runner, called when each test routine is complete.
    '''
def on_notify(self, message, **kargs):
self.save_model(
update_cnt=self.traj_manager.update_cnt,
info=str(kargs)
)
'''
function to be called when reward is received
'''
def commit_traj_frag(self, unfi_frag, req_hook=True):
assert self._unfi_frag_ is None
self._unfi_frag_ = unfi_frag
        self._check_data_hash() # check data integrity
if req_hook:
# leave a hook
return self.traj_waiting_hook
else:
return None
    def traj_waiting_hook(self, new_frag):
        '''
        This hook is called when the reward and the next-moment observation are ready.
        '''
        # do the data-corruption check first, this is important!
        self._check_data_corruption()
        # complete the frame data with the newly fed-in data
        fi_frag = self._unfi_frag_
fi_frag.update(new_frag)
# call upper level function to deal with frame data
self.process_framedata(traj_framedata=fi_frag)
# delete data reference
self._unfi_frag_ = None
def _no_hook(self, new_frag):
return
# protect data from overwriting
def _check_data_hash(self):
if self.patience > 0:
self.patience -= 1
self.hash_db = {}
# for debugging, to detect write protection error
for key in self._unfi_frag_:
item = self._unfi_frag_[key]
if isinstance(item, dict):
self.hash_db[key] = {}
for subkey in item:
subitem = item[subkey]
self.hash_db[key][subkey] = __hash__(subitem)
else:
self.hash_db[key] = __hash__(item)
# protect data from overwriting
    def _check_data_corruption(self):
if self.patience > 0:
self.patience -= 1
assert self._unfi_frag_ is not None
assert self.hash_db is not None
for key in self._unfi_frag_:
item = self._unfi_frag_[key]
if isinstance(item, dict):
for subkey in item:
subitem = item[subkey]
                        assert self.hash_db[key][subkey] == __hash__(subitem), ('Corrupted data!')
else:
                    assert self.hash_db[key] == __hash__(item), ('Corrupted data!')
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/traj.py
================================================
# cython: language_level=3
import numpy as np
from UTIL.colorful import *
from UTIL.tensor_ops import __hash__
class TRAJ_BASE():
key_data_type = {}
key_data_shape = {}
max_mem_length = -1
def __init__(self, traj_limit, env_id):
self.traj_limit = traj_limit
self.env_id = env_id
self.readonly_lock = False
self.key_dict = []
self.time_pointer = 0
self.need_reward_bootstrap = False
self.deprecated_flag = False
# remember something in a time step, add it to trajectory
def remember(self, key, content):
assert not self.readonly_lock
if not (key in self.key_dict) and (content is not None):
self.init_track(key=key, first_content=content)
getattr(self, key)[self.time_pointer] = content
elif not (key in self.key_dict) and (content is None):
self.init_track_none(key=key)
elif (key in self.key_dict) and (content is not None):
getattr(self, key)[self.time_pointer] = content
else:
pass
# duplicate/rename a trajectory
def copy_track(self, origin_key, new_key):
if hasattr(self, origin_key):
origin_handle = getattr(self, origin_key)
setattr(self, new_key, origin_handle.copy())
new_handle = getattr(self, new_key)
self.key_dict.append(new_key)
#return origin_handle, new_handle
else:
real_key_list = [real_key for real_key in self.__dict__ if (origin_key+'>' in real_key)]
assert len(real_key_list)>0, ('this key does not exist (yet), check:', origin_key)
for real_key in real_key_list:
mainkey, subkey = real_key.split('>')
self.copy_track(real_key, (new_key+'>'+subkey))
#return
# make sure dtype is ok
def check_type_shape(self, key, first_content=None):
if first_content is not None:
content_type = first_content.dtype
content_shape = first_content.shape
if key in TRAJ_BASE.key_data_type:
assert TRAJ_BASE.key_data_type[key] == content_type
else:
TRAJ_BASE.key_data_type[key] = content_type
TRAJ_BASE.key_data_shape[key] = content_shape
return content_type, content_shape
assert key in TRAJ_BASE.key_data_type
return TRAJ_BASE.key_data_type[key], TRAJ_BASE.key_data_shape[key]
    # create a track; executed when a key shows up for the first time in 'self.remember'
def init_track(self, key, first_content):
content = first_content
self.check_type_shape(key, first_content)
assert isinstance(content, np.ndarray) or isinstance(content, float), (key, content.__class__)
tensor_size = ((self.traj_limit,) + tuple(content.shape))
set_item = np.zeros(shape=tensor_size, dtype=content.dtype)
set_item[:] = np.nan if np.issubdtype(content.dtype, np.floating) else 0
setattr(self, key, set_item)
self.key_dict.append(key)
    # the key popped up but its content is None;
    # read the dtype from the history dtype dictionary to fill the hole
def init_track_none(self, key):
content_dtype, content_shape = self.check_type_shape(key)
tensor_size = ((self.traj_limit,) + tuple(content_shape))
set_item = np.zeros(shape=tensor_size, dtype=content_dtype)
set_item[:] = np.nan if np.issubdtype(content_dtype, np.floating) else 0
setattr(self, key, set_item)
self.key_dict.append(key)
    # push the time pointer forward before you call 'self.remember' again to fill t+1 data
def time_shift(self):
assert self.time_pointer < self.traj_limit
self.time_pointer += 1
# cut trajectory tail, when the number of episode time step < traj_limit
def cut_tail(self):
TJ = lambda key: getattr(self, key)
self.readonly_lock = True
n_frame = self.time_pointer
        # track the longest trajectory seen so far
if n_frame > TRAJ_BASE.max_mem_length:
TRAJ_BASE.max_mem_length = n_frame
print('max_mem_length:%d, traj_limit:%d'%(TRAJ_BASE.max_mem_length, self.traj_limit))
# clip tail
for key in self.key_dict: setattr(self, key, TJ(key)[:n_frame])
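# ---------------------------------------------------------------
# Minimal usage sketch (not part of the original module): record a
# 3-step trajectory into a buffer pre-allocated for 8 steps, then
# trim the unused tail. All shapes are illustrative assumptions.
if __name__ == '__main__':
    traj = TRAJ_BASE(traj_limit=8, env_id=0)
    for t in range(3):
        traj.remember('obs', np.random.rand(4))
        traj.remember('reward', np.array(0.1 * t))
        traj.time_shift()
    traj.cut_tail()
    print(traj.obs.shape, traj.reward.shape)  # -> (3, 4) (3,)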
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/traj_gae.py
================================================
# cython: language_level=3
import numpy as np
from ALGORITHM.common.traj import TRAJ_BASE
import copy
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, repeat_at, gather_righthand
class trajectory(TRAJ_BASE):
dead_mask_check = True # confirm mask ok
def __init__(self, traj_limit, env_id, alg_cfg):
super().__init__(traj_limit, env_id)
self.agent_alive_reference = 'alive'
self.alg_cfg = alg_cfg
def early_finalize(self):
assert not self.readonly_lock # unfinished traj
self.need_reward_bootstrap = True
def set_terminal_obs(self, tobs):
self.tobs = copy.deepcopy(tobs)
def cut_tail(self):
        # trim the surplus pre-allocated space
super().cut_tail()
TJ = lambda key: getattr(self, key)
        # further, use the NaNs along this trajectory to drop all invalid time steps
agent_alive = getattr(self, self.agent_alive_reference)
        assert len(agent_alive.shape) == 2, "should be 2D (time, agent)/dead_or_alive"
if self.need_reward_bootstrap:
assert False, ('it should not go here if everything goes as expected')
# deprecated if nothing in it
p_valid = agent_alive.any(axis=-1)
p_invalid = ~p_valid
is_fully_valid_traj = (p_valid[-1] == True)
        # assert p_valid[-1] == True; with three teams, one team may be wiped out while the game is still running
if p_invalid.all(): #invalid traj
self.deprecated_flag = True
return
if not is_fully_valid_traj:
# adjust reward position if not fully valid
reward = TJ('reward')
for i in reversed(range(self.time_pointer)):
if p_invalid[i] and i != 0: # invalid, push reward forward
reward[i-1] += reward[i]; reward[i] = np.nan
setattr(self, 'reward', reward)
# clip NaN
for key in self.key_dict: setattr(self, key, TJ(key)[p_valid])
if not is_fully_valid_traj:
# reset time pointer
self.time_pointer = p_valid.sum()
# all done
return
def reward_push_forward(self, dead_mask):
# self.new_reward = self.reward.copy()
if self.alg_cfg.gamma_in_reward_forwarding:
gamma = self.alg_cfg.gamma_in_reward_forwarding_value
for i in reversed(range(self.time_pointer)):
if i==0: continue
self.reward[i-1] += np.where(dead_mask[i], self.reward[i]*gamma, 0) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
self.reward[i] = np.where(dead_mask[i], 0, self.reward[i]) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
else:
for i in reversed(range(self.time_pointer)):
if i==0: continue
self.reward[i-1] += np.where(dead_mask[i], self.reward[i], 0) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
self.reward[i] = np.where(dead_mask[i], 0, self.reward[i]) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
return
# new finalize
def finalize(self):
self.readonly_lock = True
assert not self.deprecated_flag
TJ = lambda key: getattr(self, key)
assert not np.isnan(TJ('reward')).any()
# deadmask
agent_alive = getattr(self, self.agent_alive_reference)
dead_mask = ~agent_alive
if trajectory.dead_mask_check:
trajectory.dead_mask_check = False
if not dead_mask.any():
assert False, "Are you sure agents cannot die? If so, delete this check."
self.reward_push_forward(dead_mask) # push terminal reward forward 38 42 54
threat = np.zeros(shape=dead_mask.shape) - 1
assert dead_mask.shape[0] == self.time_pointer
for i in reversed(range(self.time_pointer)):
            # threat[:(i+1)] does not include threat[(i+1)]
if i+1 < self.time_pointer:
                threat[:(i+1)] += (~(dead_mask[i+1]&dead_mask[i])).astype(int)
            elif i+1 == self.time_pointer:
                threat[:] += (~dead_mask[i]).astype(int)
SAFE_LIMIT = 8
threat = np.clip(threat, -1, SAFE_LIMIT)
setattr(self, 'threat', np.expand_dims(threat, -1))
# ! Use GAE to calculate return
if self.alg_cfg.use_policy_resonance:
self.gae_finalize_return_pr(reward_key='reward', value_key='BAL_value_all_level', new_return_name='BAL_return_all_level')
else:
self.gae_finalize_return(reward_key='reward', value_key='value', new_return_name='return')
return
def gae_finalize_return(self, reward_key, value_key, new_return_name):
# ------- gae parameters -------
gamma = self.alg_cfg.gamma
tau = self.alg_cfg.tau
# ------- -------------- -------
rewards = getattr(self, reward_key)
value = getattr(self, value_key)
# ------- -------------- -------
length = rewards.shape[0]
assert rewards.shape[0]==value.shape[0]
# if dimension not aligned
if rewards.ndim == value.ndim-1: rewards = np.expand_dims(rewards, -1)
        # initialize an extra track for the return
setattr(self, new_return_name, np.zeros_like(value))
self.key_dict.append(new_return_name)
# ------- -------------- -------
returns = getattr(self, new_return_name)
boot_strap = 0 if not self.need_reward_bootstrap else self.boot_strap_value['bootstrap_'+value_key]
for step in reversed(range(length)):
            if step==(length-1): # the last frame
value_preds_delta = rewards[step] + gamma * boot_strap - value[step]
gae = value_preds_delta
else:
value_preds_delta = rewards[step] + gamma * value[step + 1] - value[step]
gae = value_preds_delta + gamma * tau * gae
returns[step] = gae + value[step]
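    # Numeric sketch of the GAE recursion above (toy numbers, an illustration rather than
    # a test): with gamma=0.99, tau=0.95, rewards=[1, 1], value=[0.5, 0.5], no bootstrap:
    #   step 1: delta = 1 + 0.99*0   - 0.5 = 0.5,   gae = 0.5,   return[1] = 1.0
    #   step 0: delta = 1 + 0.99*0.5 - 0.5 = 0.995, gae = 0.995 + 0.99*0.95*0.5 = 1.46525,
    #           return[0] = 1.46525 + 0.5 = 1.96525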
def gae_finalize_return_pr(self, reward_key, value_key, new_return_name):
# ------- gae parameters -------
gamma = self.alg_cfg.gamma
tau = self.alg_cfg.tau
# ------- -------------- -------
BAL_value_all_level = copy.deepcopy(getattr(self, value_key))
# reshape to (batch, agent*distribution_precision, 1)
value = my_view(BAL_value_all_level, [0, -1, 1])
# ------- ------- reshape reward ------- -------
rewards_cp = copy.deepcopy(getattr(self, reward_key))
# if dimension not aligned
if rewards_cp.ndim == value.ndim-1: rewards_cp = np.expand_dims(rewards_cp, -1)
assert rewards_cp.shape[-1] == 1
n_agent = rewards_cp.shape[-2]
assert BAL_value_all_level.shape[-2] == n_agent
assert BAL_value_all_level.shape[-1] == self.alg_cfg.distribution_precision
rewards_cp = repeat_at(rewards_cp.squeeze(-1), -1, self.alg_cfg.distribution_precision)
rewards_cp = my_view(rewards_cp, [0, -1, 1])
# ------- -------------- -------
length = rewards_cp.shape[0]
assert rewards_cp.shape[0]==value.shape[0]
# ------- -------------- -------
returns = np.zeros_like(value)
boot_strap = 0 if not self.need_reward_bootstrap else self.boot_strap_value['bootstrap_'+value_key]
for step in reversed(range(length)):
            if step==(length-1): # the last frame
value_preds_delta = rewards_cp[step] + gamma * boot_strap - value[step]
gae = value_preds_delta
else:
value_preds_delta = rewards_cp[step] + gamma * value[step + 1] - value[step]
gae = value_preds_delta + gamma * tau * gae
returns[step] = gae + value[step]
# ------- -------------- -------
returns = my_view(returns, [0, n_agent, self.alg_cfg.distribution_precision]) # BAL_return_all_level
setattr(self, new_return_name, returns)
self.key_dict.append(new_return_name)
def select_value_level(BAL_all_level, randl):
n_agent = BAL_all_level.shape[1]
tmp_index = np.expand_dims(repeat_at(randl, -1, n_agent), -1)
return gather_righthand(src=BAL_all_level, index=tmp_index, check=False)
self.value_selected = select_value_level(BAL_all_level=self.BAL_value_all_level, randl=self.randl)
self.return_selected = select_value_level(BAL_all_level=self.BAL_return_all_level, randl=self.randl)
'''
Trajectory pool management
'''
class TrajManagerBase(object):
def __init__(self, n_env, traj_limit, alg_cfg):
self.alg_cfg = alg_cfg
self.n_env = n_env
self.traj_limit = traj_limit
self.update_cnt = 0
self.traj_pool = []
self.registered_keys = []
self.live_trajs = [trajectory(self.traj_limit, env_id=i, alg_cfg=self.alg_cfg) for i in range(self.n_env)]
self.live_traj_frame = [0 for _ in range(self.n_env)]
self._traj_lock_buf = None
self.patience = 1000
pass
    def __check_integrity(self, traj_frag):
if self.patience < 0:
return # stop wasting time checking this
self.patience -= 1
for key in traj_frag:
if key not in self.registered_keys and (not key.startswith('_')):
self.registered_keys.append(key)
for key in self.registered_keys:
assert key in traj_frag, ('this key sometimes disappears from the traj_frag:', key)
def batch_update(self, traj_frag):
        self.__check_integrity(traj_frag)
done = traj_frag['_DONE_']; traj_frag.pop('_DONE_') # done flag
skip = traj_frag['_SKIP_']; traj_frag.pop('_SKIP_') # skip/frozen flag
tobs = traj_frag['_TOBS_']; traj_frag.pop('_TOBS_') # terminal obs
        # broadcast a single bool to one flag per env (as an array, so that ~skip below stays valid)
        if isinstance(done, bool): done = np.array([done for _ in range(self.n_env)])
        if isinstance(skip, bool): skip = np.array([skip for _ in range(self.n_env)])
n_active = sum(~skip)
# feed
cnt = 0
for env_i in range(self.n_env):
if skip[env_i]: continue
# otherwise
frag_index = cnt; cnt += 1
env_index = env_i
traj_handle = self.live_trajs[env_index]
for key in traj_frag:
self.traj_remember(traj_handle, key=key, content=traj_frag[key],frag_index=frag_index, n_active=n_active)
self.live_traj_frame[env_index] += 1
traj_handle.time_shift()
if done[env_i]:
assert tobs[env_i] is not None # get the final obs
traj_handle.set_terminal_obs(tobs[env_i])
self.traj_pool.append(traj_handle)
self.live_trajs[env_index] = trajectory(self.traj_limit, env_id=env_index, alg_cfg=self.alg_cfg)
self.live_traj_frame[env_index] = 0
def traj_remember(self, traj, key, content, frag_index, n_active):
if content is None: traj.remember(key, None)
elif isinstance(content, dict):
for sub_key in content:
self.traj_remember(traj, "".join((key , ">" , sub_key)), content=content[sub_key], frag_index=frag_index, n_active=n_active)
else:
assert n_active == len(content), ('length error')
traj.remember(key, content[frag_index]) # *
class BatchTrajManager(TrajManagerBase):
def __init__(self, n_env, traj_limit, trainer_hook, alg_cfg):
super().__init__(n_env, traj_limit, alg_cfg)
self.trainer_hook = trainer_hook
self.traj_limit = traj_limit
    # entry point
def feed_traj(self, traj_frag, require_hook=False):
        if require_hook: raise NotImplementedError("require_hook is not supported anymore")
assert self._traj_lock_buf is None
assert '_DONE_' in traj_frag
assert '_SKIP_' in traj_frag
self.batch_update(traj_frag=traj_frag) # call parent's batch_update()
return
def train_and_clear_traj_pool(self):
print('do update %d'%self.update_cnt)
for traj_handle in self.traj_pool:
traj_handle.cut_tail()
self.traj_pool = list(filter(lambda traj: not traj.deprecated_flag, self.traj_pool))
for traj_handle in self.traj_pool: traj_handle.finalize()
self.trainer_hook(self.traj_pool, 'train')
self.traj_pool = []
self.update_cnt += 1
return self.update_cnt
    def can_exec_training(self):
        num_traj_needed = self.alg_cfg.train_traj_needed
        return len(self.traj_pool) >= num_traj_needed
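# Usage sketch (the payload keys below are hypothetical, inferred from batch_update above):
# each step, feed_traj() expects a fragment dict whose underscore keys are aligned per env
# thread, e.g. for n_env=2:
#   traj_frag = {
#       '_DONE_': np.array([False, True]),   # episode-termination flag per env thread
#       '_SKIP_': np.array([False, False]),  # paused/frozen threads to be skipped
#       '_TOBS_': [None, terminal_obs],      # terminal-obs echo for finished threads
#       'obs': ..., 'action': ..., 'reward': ...,  # payloads, len == number of active threads
#   }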
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/common/traj_manager.py
================================================
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/example_foundation.py
================================================
import numpy as np
import copy
import math
import random
class ExampleFoundation():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_thread = n_thread
self.n_agent = n_agent
self.handler = [None for _ in range(self.n_thread)]
def interact_with_env(self, team_intel):
info = team_intel['Latest-Team-Info']
done = team_intel['Env-Suffered-Reset']
step_cnt = team_intel['Current-Obs-Step']
action_list = np.zeros(shape=(self.n_agent, self.n_thread, 1))
return action_list, team_intel
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/ccategorical.py
================================================
from torch.distributions.categorical import Categorical
import torch
from .foundation import AlgorithmConfig
from UTIL.tensor_ops import repeat_at, _2tensor
from torch.distributions import kl_divergence
EPS = 1e-9
# yita = p_hit = 0.14
def random_process(probs, rsn_flag):
yita = AlgorithmConfig.yita
with torch.no_grad():
max_place = probs.argmax(-1, keepdims=True)
mask_max = torch.zeros_like(probs).scatter_(-1, max_place, 1).bool()
pmax = probs[mask_max]
if rsn_flag:
assert max_place.shape[-1] == 1
return max_place.squeeze(-1)
else:
            # forbid the max-prob action from being chosen, pmax = probs.max(axis=-1)
p_hat = pmax + (pmax-1)/(1/yita-1)
k = 1/(1-yita)
            #!!! in-place write into probs
            probs *= k
            #!!! in-place write into probs
probs[mask_max] = p_hat
# print(probs)
dist = Categorical(probs=probs)
samp = dist.sample()
assert samp.shape[-1] != 1
return samp
def random_process_allow_big_yita(probs, rsn_flag):
yita = AlgorithmConfig.yita
with torch.no_grad():
max_place = probs.argmax(-1, keepdims=True)
mask_max = torch.zeros_like(probs).scatter_(-1, max_place, 1).bool()
pmax = probs[mask_max].reshape(max_place.shape) #probs[max_place].clone()
if rsn_flag:
assert max_place.shape[-1] == 1
return max_place.squeeze(-1)
else:
            # forbid the max-prob action from being chosen
# pmax = probs.max(axis=-1) #probs[max_place].clone()
yita_arr = torch.ones_like(pmax)*yita
yita_arr_clip = torch.minimum(pmax, yita_arr)
# p_hat = pmax + (pmax-1) / (1/yita_arr_clip-1) + 1e-10
p_hat = (pmax-yita_arr_clip)/(1-yita_arr_clip)
k = 1/(1-yita_arr_clip)
probs *= k
probs[mask_max] = p_hat.reshape(-1)
# print(probs)
dist = Categorical(probs=probs)
samp = dist.sample()
assert samp.shape[-1] != 1
return samp #.squeeze(-1)
def random_process_with_clamp3(probs, yita, yita_min_prob, rsn_flag):
with torch.no_grad():
max_place = probs.argmax(-1, keepdims=True)
mask_max = torch.zeros_like(probs).scatter_(dim=-1, index=max_place, value=1).bool()
pmax = probs[mask_max].reshape(max_place.shape)
# act max
assert max_place.shape[-1] == 1
act_max = max_place.squeeze(-1)
# act samp
yita_arr = torch.ones_like(pmax)*yita
# p_hat = pmax + (pmax-1) / (1/yita_arr_clip-1) + 1e-10
p_hat = (pmax-yita_arr)/((1-yita_arr)+EPS)
p_hat = p_hat.clamp(min=yita_min_prob)
k = (1-p_hat)/((1-pmax)+EPS)
probs *= k
probs[mask_max] = p_hat.reshape(-1)
dist = Categorical(probs=probs)
act_samp = dist.sample()
# assert act_samp.shape[-1] != 1
hit_e = _2tensor(rsn_flag)
return torch.where(hit_e, act_max, act_samp)
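# Numeric sketch of the clamp above (illustrative numbers): with pmax=0.6 and yita=0.14,
# p_hat = (0.6-0.14)/(1-0.14) ~= 0.535 and k = (1-0.535)/(1-0.6) ~= 1.163, so the max
# entry is lowered to p_hat while all other probabilities are scaled up by k; the total
# probability mass stays 1 (0.535 + 0.4*1.163 ~= 1.0).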
class CCategorical():
def __init__(self, planner):
self.planner = planner
pass
def sample(self, dist, eprsn):
probs = dist.probs.clone()
return random_process_with_clamp3(probs, self.planner.yita, self.planner.yita_min_prob, eprsn)
def register_rsn(self, rsn_flag):
self.rsn_flag = rsn_flag
def feed_logits(self, logits):
try:
return Categorical(logits=logits)
        except Exception:
            print('[ccategorical.py] failed to build a Categorical distribution from the given logits')
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/cython_func.pyx
================================================
import numpy as np
cimport numpy as np
cimport cython
from cython.parallel import prange
np.import_array()
ctypedef fused DTYPE_t:
np.float32_t
np.float64_t
ctypedef fused DTYPE_intlong_t:
np.int64_t
np.int32_t # to compat Windows
ctypedef np.uint8_t DTYPE_bool_t
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def roll_hisory( DTYPE_t[:,:,:,:] obs_feed_new,
DTYPE_t[:,:,:,:] prev_obs_feed,
DTYPE_bool_t[:,:,:] valid_mask,
DTYPE_intlong_t[:,:] N_valid,
DTYPE_t[:,:,:,:] next_his_pool):
# how many threads
cdef Py_ssize_t vmax = N_valid.shape[0]
# how many agents
cdef Py_ssize_t wmax = N_valid.shape[1]
# how many entity subjects (including self @0)
cdef Py_ssize_t max_obs_entity = obs_feed_new.shape[2]
cdef int n_v, th, a, t, k, pointer
for th in prange(vmax, nogil=True):
# for each thread range -> prange
for a in prange(wmax):
# for each agent
pointer = 0
# step 1 fill next_his_pool[0 ~ (nv-1)] with obs_feed_new[0 ~ max_obs_entity-1]
for k in range(max_obs_entity):
if valid_mask[th,a,k]:
next_his_pool[th, a, pointer] = obs_feed_new[th,a,k]
pointer = pointer + 1
# step 2 fill next_his_pool[nv ~ (max_obs_entity-1)] with prev_obs_feed[0 ~ (max_obs_entity-1-nv)]
n_v = N_valid[th,a]
for k in range(n_v, max_obs_entity):
next_his_pool[th,a,k] = prev_obs_feed[th,a,k-n_v]
return np.asarray(next_his_pool)
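# Pure-Python sketch of the semantics above (an illustration, not part of this module):
# for each (thread th, agent a) with n_v = N_valid[th, a]:
#     fresh = obs_feed_new[th, a][valid_mask[th, a]]   # the n_v valid new entities
#     next_his_pool[th, a] = concat(fresh, prev_obs_feed[th, a][:max_obs_entity - n_v])
# i.e. valid new observations roll in at the front and old history shifts toward the back.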
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/div_tree.py
================================================
import torch
import torch.nn as nn
import numpy as np
from ALGORITHM.common.mlp import LinearFinal
from UTIL.tensor_ops import add_onehot_id_at_last_dim, add_onehot_id_at_last_dim_fixlen, repeat_at, _2tensor, gather_righthand, scatter_righthand
class DivTree(nn.Module): # merge by MLP version
def __init__(self, input_dim, h_dim, n_action):
super().__init__()
# to design a division tree, I need to get the total number of agents
from .foundation import AlgorithmConfig
self.n_agent = AlgorithmConfig.n_agent
self.div_tree = get_division_tree(self.n_agent)
self.n_level = len(self.div_tree)
self.max_level = len(self.div_tree) - 1
self.current_level = 0
self.init_level = AlgorithmConfig.div_tree_init_level
if self.init_level < 0:
self.init_level = self.max_level
self.current_level_floating = 0.0
get_net = lambda: nn.Sequential(
nn.Linear(h_dim+self.n_agent, h_dim),
nn.ReLU(inplace=True),
LinearFinal(h_dim, n_action)
)
        # Note: this does NOT define a separate net for each agent
        # Instead, all agents start from self.nets[0]
self.nets = torch.nn.ModuleList(modules=[
get_net() for i in range(self.n_agent)
])
def set_to_init_level(self, auto_transfer=True):
if self.init_level!=self.current_level:
for i in range(self.current_level, self.init_level):
self.change_div_tree_level(i+1, auto_transfer)
def change_div_tree_level(self, level, auto_transfer=True):
print('performing div tree level change (%d -> %d/%d) \n'%(self.current_level, level, self.max_level))
self.current_level = level
self.current_level_floating = level
        assert len(self.div_tree) > self.current_level, ('Reached max level already!')
if not auto_transfer: return
transfer_list = []
for i in range(self.n_agent):
previous_net_index = self.div_tree[self.current_level-1, i]
post_net_index = self.div_tree[self.current_level, i]
if post_net_index!=previous_net_index:
transfer = (previous_net_index, post_net_index)
if transfer not in transfer_list:
transfer_list.append(transfer)
for transfer in transfer_list:
from_which_net = transfer[0]
to_which_net = transfer[1]
self.nets[to_which_net].load_state_dict(self.nets[from_which_net].state_dict())
            print('transferring model parameters from %d-th net to %d-th net'%(from_which_net, to_which_net))
return
def forward(self, x_in, agent_ids): # x0: shape = (?,...,?, n_agent, core_dim)
if self.current_level == 0:
x0 = add_onehot_id_at_last_dim_fixlen(x_in, fixlen=self.n_agent, agent_ids=agent_ids)
x2 = self.nets[0](x0)
return x2, None
else:
x0 = add_onehot_id_at_last_dim_fixlen(x_in, fixlen=self.n_agent, agent_ids=agent_ids)
res = []
for i in range(self.n_agent):
use_which_net = self.div_tree[self.current_level, i]
res.append(self.nets[use_which_net](x0[..., i, :]))
x2 = torch.stack(res, -2)
# x22 = self.nets[0](x1)
return x2, None
# def forward_try_parallel(self, x0): # x0: shape = (?,...,?, n_agent, core_dim)
# x1 = self.shared_net(x0)
# stream = []
# res = []
# for i in range(self.n_agent):
# stream.append(torch.cuda.Stream())
# torch.cuda.synchronize()
# for i in range(self.n_agent):
# use_which_net = self.div_tree[self.current_level, i]
# with torch.cuda.stream(stream[i]):
# res.append(self.nets[use_which_net](x1[..., i, :]))
# print(res[i])
# # s1 = torch.cuda.Stream()
# # s2 = torch.cuda.Stream()
# # # Wait for the above tensors to initialise.
# # torch.cuda.synchronize()
# # with torch.cuda.stream(s1):
# # C = torch.mm(A, A)
# # with torch.cuda.stream(s2):
# # D = torch.mm(B, B)
# # Wait for C and D to be computed.
# torch.cuda.synchronize()
# # Do stuff with C and D.
# x2 = torch.stack(res, -2)
# return x2
def _2div(arr):
arr_res = arr.copy()
arr_pieces = []
pa = 0
st = 0
needdivcnt = 0
for i, a in enumerate(arr):
if a!=pa:
arr_pieces.append([st, i])
if (i-st)!=1: needdivcnt+=1
pa = a
st = i
arr_pieces.append([st, len(arr)])
if (len(arr)-st)!=1: needdivcnt+=1
offset = range(len(arr_pieces), len(arr_pieces)+needdivcnt)
p=0
for arr_p in arr_pieces:
length = arr_p[1] - arr_p[0]
if length == 1: continue
half_len = int(np.ceil(length / 2))
for j in range(arr_p[0]+half_len, arr_p[1]):
            try:
                arr_res[j] = offset[p]
            except IndexError:
                print('[div_tree.py] offset index out of range')
p+=1
return arr_res
def get_division_tree(n_agents):
agent2divitreeindex = np.arange(n_agents)
np.random.shuffle(agent2divitreeindex)
max_div = np.ceil(np.log2(n_agents)).astype(int)
levels = np.zeros(shape=(max_div+1, n_agents), dtype=int)
for ith, level in enumerate(levels):
if ith == 0: continue
res = _2div(levels[ith-1,:])
levels[ith,:] = res
res_levels = levels.copy()
for i, div_tree_index in enumerate(agent2divitreeindex):
res_levels[:, i] = levels[:, div_tree_index]
return res_levels
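# A small runnable check (illustrative, not part of the original module): with 4 agents
# the unshuffled levels produced by repeated _2div calls are
#   [0,0,0,0] -> [0,0,1,1] -> [0,2,1,3]
# and get_division_tree additionally permutes the agent columns at random.
if __name__ == '__main__':
    print(get_division_tree(4))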
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/foundation.py
================================================
import os, time, torch, traceback, shutil, pickle, io, platform
import numpy as np
from UTIL.colorful import *
from config import GlobalConfig
from UTIL.tensor_ops import repeat_at, _2tensor
from ALGORITHM.common.rl_alg_base import RLAlgorithmBase
class AlgorithmConfig:
'''
AlgorithmConfig: This config class will be 'injected' with new settings from json.
(E.g., override configs with ```python main.py --cfg example.jsonc```)
(please see UTIL.config_args to find out how this advanced trick works out.)
'''
# configuration, open to jsonc modification
gamma = 0.99
tau = 0.95
train_traj_needed = 512
hete_n_alive_frontend = 1
TakeRewardAsUnity = False
use_normalization = True
wait_norm_stable = True
add_prob_loss = False
n_focus_on = 2
n_entity_placeholder = 11
load_checkpoint = False
load_specific_checkpoint = ''
# PPO part
clip_param = 0.2
ppo_epoch = 16
n_pieces_batch_division = 1
value_loss_coef = 0.1
entropy_coef = 0.05
max_grad_norm = 0.5
lr = 1e-4
# prevent GPU OOM
prevent_batchsize_oom = False
gamma_in_reward_forwarding = False
gamma_in_reward_forwarding_value = 0.99
net_hdim = 24
dual_conc = True
n_agent = 'auto load, do not change'
ConfigOnTheFly = True
hete_n_net_placeholder = 5
hete_thread_align = False
hete_same_prob = 0.25
hete_lasted_n = 100
policy_resonance = False
use_avail_act = True
debug = False
ignore_test = False
type_agent_diff_lr = False
hete_exclude_zero_wr = False
policy_matrix_testing = False
test_which_cpk = 1
type_sel_override = False
type_sel_override_list = []
allow_fast_test = True
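# Override sketch (a hypothetical jsonc fragment; the exact 'file->class' key path is an
# assumption based on the injection mechanism described in the docstring above):
#   {
#       "ALGORITHM.hete_league_onenet_fix.foundation.py->AlgorithmConfig": {
#           "train_traj_needed": 256,
#           "gamma_in_reward_forwarding": true
#       }
#   }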
def str_array_to_num(str_arr):
    out_arr = []
    buffer = {}
    for s in str_arr:
        if s not in buffer:
            buffer[s] = len(buffer)
        out_arr.append(buffer[s])
    return out_arr
def itemgetter(*items):
# same with operator.itemgetter
def g(obj): return tuple(obj[item] if item in obj else None for item in items)
return g
class CPU_Unpickler(pickle.Unpickler):
def find_class(self, module, name):
if module == 'torch.storage' and name == '_load_from_bytes':
return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
else:
return super().find_class(module, name)
class ReinforceAlgorithmFoundation(RLAlgorithmBase):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .shell_env import ShellEnvWrapper, ActionConvertLegacy
from .hete_net import HeteNet
super().__init__(n_agent, n_thread, space, mcv, team)
AlgorithmConfig.n_agent = n_agent
self.action_converter = ActionConvertLegacy(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
n_actions = len(self.action_converter.dictionary_args)
# change obs format, e.g., converting dead agent obs into NaN
self.shell_env = ShellEnvWrapper(n_agent, n_thread, space, mcv, self, AlgorithmConfig, GlobalConfig.ScenarioConfig, self.team)
if self.ScenarioConfig.EntityOriented: rawob_dim = self.ScenarioConfig.obs_vec_length
else: rawob_dim = space['obs_space']['obs_shape']
# self.StagePlanner, for policy resonance
from .stage_planner import StagePlanner
self.stage_planner = StagePlanner(mcv=mcv)
# heterogeneous agent types
agent_type_list = [a['type'] for a in GlobalConfig.ScenarioConfig.SubTaskConfig.agent_list]
self.HeteAgentType = str_array_to_num(agent_type_list)
hete_type = np.array(self.HeteAgentType)[self.ScenarioConfig.AGENT_ID_EACH_TEAM[team]]
# initialize policy
self.policy = HeteNet(rawob_dim=rawob_dim, n_action=n_actions, hete_type=hete_type, stage_planner=self.stage_planner)
self.policy = self.policy.to(self.device)
# initialize optimizer and trajectory (batch) manager
from .ppo import PPO
from .trajectory import BatchTrajManager
self.trainer = PPO(self.policy, ppo_config=AlgorithmConfig, mcv=mcv)
self.traj_manager = BatchTrajManager(
n_env=n_thread, traj_limit=int(GlobalConfig.ScenarioConfig.MaxEpisodeStep),
trainer_hook=self.trainer.train_on_traj)
self.stage_planner.trainer = self.trainer
# confirm that reward method is correct
self.check_reward_type(AlgorithmConfig)
# load checkpoints if needed
self.load_model(AlgorithmConfig)
# enable config_on_the_fly ability
if AlgorithmConfig.ConfigOnTheFly:
self._create_config_fly()
if AlgorithmConfig.policy_matrix_testing:
self.threads_test_reward_sum = np.zeros(shape=(n_thread,), dtype=float)
# self.threads_test_reward = []
self.recent_test_rewards = []
self.recent_test_wins = []
self._unfi_frag_matrix_ = None
self.recent_test_hete_gp_summary = []
self.current_hete_gp_summary = None
from VISUALIZE.mcom import mcom
self.mcv_matrix = mcom(
path='%s/logger/matrix/'%GlobalConfig.logdir,
image_path='%s/matrix.jpg'%GlobalConfig.logdir,
draw_mode='Img',
tag='[ppo.py]' )
self.mcv_matrix.rec_init(color='r')
def action_making(self, StateRecall, test_mode):
# make sure hook is cleared
assert ('_hook_' not in StateRecall)
        # read obs etc.
obs, threads_active_flag, avail_act, hete_pick, hete_type, gp_sel_summary, eprsn = \
itemgetter('obs', 'threads_active_flag', 'avail_act', '_hete_pick_', '_hete_type_', '_gp_pick_', '_EpRsn_')(StateRecall)
# make sure obs shape is correct
assert obs is not None, ('Make sure obs is ok')
assert len(obs) == sum(threads_active_flag), ('check batch size')
# make sure avail_act is correct
if AlgorithmConfig.use_avail_act: assert avail_act is not None
# policy resonance flag reshape
eprsn = repeat_at(eprsn, -1, self.n_agent)
thread_index = np.arange(self.n_thread)[threads_active_flag]
# make decision
with torch.no_grad():
action, value, action_log_prob = self.policy.act(obs=obs,
test_mode=test_mode,
avail_act=avail_act,
hete_pick=hete_pick,
hete_type=hete_type,
gp_sel_summary=gp_sel_summary,
thread_index=thread_index,
eprsn=eprsn,
)
# commit obs to buffer, vars named like _x_ are aligned, others are not!
traj_framefrag = {
"_SKIP_": ~threads_active_flag,
"value": value,
"hete_pick": hete_pick,
"hete_type": hete_type,
"gp_sel_summary": gp_sel_summary,
"avail_act": avail_act,
"actionLogProb": action_log_prob,
"obs": obs,
"action": action,
}
if avail_act is not None: traj_framefrag.update({'avail_act': avail_act})
# deal with rollout later when the reward is ready, leave a hook as a callback here
if not test_mode:
StateRecall['_hook_'] = self.commit_traj_frag(traj_framefrag, req_hook = True)
else:
if test_mode and AlgorithmConfig.policy_matrix_testing:
StateRecall['_hook_'] = self.matrix_callback_special(traj_framefrag)
return action.copy(), StateRecall
'''
function to be called when reward is received
'''
def matrix_callback_special(self, framefrag):
assert self._unfi_frag_matrix_ is None
self._unfi_frag_matrix_ = framefrag
return self.matrix_callback_special_callback
def matrix_callback_special_callback(self, new_frag):
fi_frag = self._unfi_frag_matrix_
self._unfi_frag_matrix_ = None
reward = new_frag['reward'].copy()
done = new_frag['done'].copy()
# self.threads_test_reward.append(reward)
self.threads_test_reward_sum += reward * ~fi_frag['_SKIP_']
if not any(fi_frag['_SKIP_']):
self.current_hete_gp_summary = fi_frag["gp_sel_summary"]
if done.all():
self.recent_test_rewards.extend(self.threads_test_reward_sum)
self.recent_test_wins.extend([q['team_ranking'][self.team]==0 for q in new_frag['info']]) # 0 means rank first
self.recent_test_hete_gp_summary.extend(self.current_hete_gp_summary)
self.threads_test_reward_sum *= 0
self.current_hete_gp_summary = None
return None
def interact_with_env(self, StateRecall):
'''
        Interfacing with the MARL runner, a standard method that you must implement
        (redirects to shell_env to help with history rolling)
'''
return self.shell_env.interact_with_env(StateRecall)
def interact_with_env_genuine(self, StateRecall):
'''
        When shell_env finishes the preparation, interact_with_env_genuine is called
        (determines whether or not to run a training routine)
'''
# if not StateRecall['Test-Flag']: self.train() # when needed, train!
return self.action_making(StateRecall, StateRecall['Test-Flag'])
def train(self):
'''
        Get event from hmp task runner: train when enough trajectories have been collected!
'''
if self.traj_manager.can_exec_training():
if self.stage_planner.can_exec_trainning():
self.traj_manager.train_and_clear_traj_pool()
else:
self.traj_manager.clear_traj_pool()
# read configuration
if AlgorithmConfig.ConfigOnTheFly: self._config_on_fly()
#
self.stage_planner.update_plan()
# override parent function
def on_notify(self, message, **kargs):
win_rate = kargs['win_rate']
mean_reward = kargs['mean_reward']
path = self.save_model(
update_cnt=self.traj_manager.update_cnt,
info=str(kargs)
)
# print('[random win rate] ! ! ! ! !')
# win_rate = np.random.rand()
self.policy.register_ckp(win_rate, path, mean_reward)
if AlgorithmConfig.policy_matrix_testing:
from UTIL.data_struct import UniqueList
# self.recent_test_hete_gp_summary_str = self.recent_test_hete_gp_summary # [str(q.tolist()) for q in self.recent_test_hete_gp_summary]
recent_test_hete_gp_summary_ls = [q.tolist() for q in self.recent_test_hete_gp_summary]
ulist = UniqueList(recent_test_hete_gp_summary_ls)
for u in ulist:
feature = self.policy.ph_to_feature[u].squeeze().cpu().numpy().tolist()
feature = "[%.2f,%.2f,%.2f]"%tuple(feature)
mask = [u==uu for uu in recent_test_hete_gp_summary_ls]
r = np.array(self.recent_test_rewards)[mask].mean()
wr = np.array(self.recent_test_wins)[mask].mean()
self.mcv_matrix.rec(self.policy.ckpg_input_cnt, 'time')
self.mcv_matrix.rec(r, 'r of=%s'%feature)
self.mcv_matrix.rec(wr, 'w of=%s'%feature)
self.mcv_matrix.rec(sum(mask), 'n of=%s'%feature)
self.mcv_matrix.rec_show()
self.recent_test_rewards = []
self.recent_test_wins = []
self.recent_test_hete_gp_summary = []
def save_model(self, update_cnt, info=None):
'''
        save model now!
        save is triggered when:
        1. update_cnt = 50, 100, ...
        2. info is given, indicating an hmp command
        3. a flag file is detected, indicating a save command from a human
'''
if not os.path.exists('%s/history_cpt/' % GlobalConfig.logdir):
os.makedirs('%s/history_cpt/' % GlobalConfig.logdir)
# dir 1
pt_path = '%s/model.pt' % GlobalConfig.logdir
print绿('saving model to %s' % pt_path)
torch.save({
'policy': self.policy.state_dict(),
'optimizer': self.trainer.optimizer.state_dict(),
}, pt_path)
# dir 2
info = str(update_cnt) if info is None else ''.join([str(update_cnt), '_', info])
pt_path2 = '%s/history_cpt/model_%s.pt' % (GlobalConfig.logdir, info)
shutil.copyfile(pt_path, pt_path2)
# save ckpg_info
        with open('%s/history_cpt/ckpg_info.pkl'%GlobalConfig.logdir, 'wb') as f:
            pickle.dump((self.policy.ckpg_info, self.policy.ckpg_input_cnt, [(n.feature, n.static, n.ready_to_go) for n in self.policy._nets_flat_placeholder_]), f)
print绿('save_model fin')
return pt_path2
def find_ckp(self, feature):
import glob
list_ckp = glob.glob('%s/history_cpt/*.pt'%GlobalConfig.logdir)
ckp_dir = [ckp for ckp in list_ckp if str(feature[0]) in ckp][0]
cuda_n = 'cpu' if 'cpu' in self.device else self.device
cpt = torch.load(ckp_dir, map_location=cuda_n)
# get previous frontier network
return {k.replace('_nets_flat_placeholder_.0.',''):v for k, v in cpt['policy'].items() if '_nets_flat_placeholder_.0.' in k}
def load_model(self, AlgorithmConfig):
'''
load model now
'''
if AlgorithmConfig.load_checkpoint:
manual_dir = AlgorithmConfig.load_specific_checkpoint
ckpt_dir = '%s/model.pt' % GlobalConfig.logdir if manual_dir == '' else '%s/%s' % (GlobalConfig.logdir, manual_dir)
cuda_n = 'cpu' if 'cpu' in self.device else self.device
strict = True
if not platform.system()=="Linux": assert ':' not in ckpt_dir, ('Windows OS does not allow : in file name')
cpt = torch.load(ckpt_dir, map_location=cuda_n)
self.policy.load_state_dict(cpt['policy'], strict=strict)
# https://github.com/pytorch/pytorch/issues/3852
self.trainer.optimizer.load_state_dict(cpt['optimizer'])
print黄('loaded checkpoint:', ckpt_dir)
if os.path.exists('%s/history_cpt/ckpg_info.pkl'%GlobalConfig.logdir):
with open('%s/history_cpt/ckpg_info.pkl'%GlobalConfig.logdir, 'rb') as f:
self.policy.ckpg_info, self.policy.ckpg_input_cnt, n_flags = CPU_Unpickler(f).load()
for (n, flags) in zip(self.policy._nets_flat_placeholder_, n_flags):
n.feature = flags[0]
n.static = flags[1]
n.ready_to_go = flags[2]
if n.feature!=1:
n.load_state_dict(self.find_ckp(n.feature), strict=True)
self.policy.ph_to_feature = _2tensor(np.array([n.feature for n in self.policy._nets_flat_placeholder_]))
print黄('loaded ckpg_info')
else:
                print('Warning: past policy missing!')
def process_framedata(self, traj_framedata):
'''
hook is called when reward and next moment observation is ready,
now feed them into trajectory manager.
        Rollout processor: prepare to commit the rollout; keys wrapped in underscores must be aligned to (self.n_thread, ...)
        note that keys starting with _ must have shape (self.n_thread, ...), for details see fn:mask_paused_env()
'''
# strip info, since it is not array
items_to_pop = ['info', 'Latest-Obs']
for k in items_to_pop:
if k in traj_framedata:
traj_framedata.pop(k)
        # the agent-wise reward is supposed to be identical, so replicate the team reward for every agent
if self.ScenarioConfig.RewardAsUnity:
traj_framedata['reward'] = repeat_at(traj_framedata['reward'], insert_dim=-1, n_times=self.n_agent)
# change the name of done to be recognised (by trajectory manager)
traj_framedata['_DONE_'] = traj_framedata.pop('done')
traj_framedata['_TOBS_'] = traj_framedata.pop(
'Terminal-Obs-Echo') if 'Terminal-Obs-Echo' in traj_framedata else None
# mask out pause thread
traj_framedata = self.mask_paused_env(traj_framedata)
# put the frag into memory
self.traj_manager.feed_traj_framedata(traj_framedata)
def mask_paused_env(self, frag):
running = ~frag['_SKIP_']
if running.all():
return frag
for key in frag:
if not key.startswith('_') and hasattr(frag[key], '__len__') and len(frag[key]) == self.n_thread:
frag[key] = frag[key][running]
return frag
def _create_config_fly(self):
logdir = GlobalConfig.logdir
self.input_file_dir = '%s/cmd_io.txt' % logdir
if not os.path.exists(self.input_file_dir):
with open(self.input_file_dir, 'w+', encoding='utf8') as f: f.writelines(["# Write cmd at next line: ", ""])
def _config_on_fly(self):
if not os.path.exists(self.input_file_dir): return
with open(self.input_file_dir, 'r', encoding='utf8') as f:
cmdlines = f.readlines()
cmdlines_writeback = []
any_change = False
for cmdline in cmdlines:
if cmdline.startswith('#') or cmdline=="\n" or cmdline==" \n":
cmdlines_writeback.append(cmdline)
else:
any_change = True
try:
print亮绿('[foundation.py] ------- executing: %s ------'%cmdline)
exec(cmdline)
cmdlines_writeback.append('# [execute successfully]\t'+cmdline)
except:
print红(traceback.format_exc())
cmdlines_writeback.append('# [execute failed]\t'+cmdline)
if any_change:
with open(self.input_file_dir, 'w+', encoding='utf8') as f:
f.writelines(cmdlines_writeback)
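# Usage sketch (inferred from the exec() loop above; the parameter chosen is just an
# example): appending a line such as
#   AlgorithmConfig.hete_same_prob = 0.5
# to <logdir>/cmd_io.txt changes the configuration at the next training step; the line
# is then written back commented out with an [execute successfully] or [execute failed] tag.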
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/hete_assignment.py
================================================
import copy
import numpy as np
from UTIL.tensor_ops import my_view, __hash__, repeat_at, gather_righthand
from .foundation import AlgorithmConfig
def random_group(random_select_fn, n_thread, hete_type, n_hete_types, n_group, selected_tps, testing):
n_agent = hete_type.shape[-1]
group_sel_arr = np.zeros(shape=(n_thread, n_agent), dtype=int)
gp_sel_summary = []
for i in range(n_thread):
        group_assignment = np.array([
            random_select_fn(testing)
            if tp not in selected_tps[i] else 0
            for tp in range(n_hete_types)
        ])
assert (group_assignment[selected_tps[i]]==0).all()
gp_sel_summary.append(copy.deepcopy(group_assignment))
for ht, group in enumerate(group_assignment):
mask = (hete_type == ht)
group_sel_arr[i,mask] = group
return group_sel_arr, np.stack(gp_sel_summary).astype(np.int64)
def select_nets_for_shellenv(n_types, policy, hete_type_list, n_thread, n_gp, testing):
if (not testing) or (AlgorithmConfig.policy_matrix_testing):
n_alive_frontend = AlgorithmConfig.hete_n_alive_frontend
tmp = np.arange(n_types)
# select types to use frontier
if not AlgorithmConfig.type_sel_override:
selected_types = np.stack([
np.random.choice(
a=tmp,
size=(n_alive_frontend),
replace=False,
p=None)
for _ in range(n_thread)
])
else:
selected_types = np.stack([
AlgorithmConfig.type_sel_override_list
for _ in range(n_thread)
])
else:
# testing but not policy_matrix_testing: select all types to use frontier
selected_types = np.stack([np.arange(n_types) for _ in range(n_thread)])
# generate a random group selection array
if not AlgorithmConfig.policy_matrix_testing:
random_select_fn = policy.random_select
else:
random_select_fn = policy.random_select_matrix_test
group_sel_arr, gp_sel_summary = random_group(
random_select_fn=random_select_fn, n_thread=n_thread, hete_type=hete_type_list,
n_hete_types=n_types, n_group=n_gp, selected_tps=selected_types, testing=testing)
# group to net index
n_tp = n_types
get_placeholder = lambda type, group: group*n_tp + type
hete_type_arr = repeat_at(hete_type_list, 0, n_thread)
selected_nets = get_placeholder(type=hete_type_arr, group=group_sel_arr)
return selected_nets, gp_sel_summary
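# Worked index example (illustrative): with n_tp=3 heterogeneous types, the placeholder
# index is ph = group*n_tp + type, so (type=1, group=2) -> ph=7; hete_net.py inverts this
# with ph_2_tpgp(7) = (7 % 3, 7 // 3) = (1, 2).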
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/hete_net.py
================================================
import torch, math, copy, pickle
import numpy as np
import torch.nn as nn
from config import GlobalConfig as cfg
from torch.distributions.categorical import Categorical
from UTIL.colorful import print亮绿
from UTIL.tensor_ops import Args2tensor_Return2numpy, Args2tensor, __hashn__, cat_last_dim, __hash__, one_hot_with_nan, repeat_at, scatter_righthand, gather_righthand, _2cpu2numpy, my_view
from .foundation import AlgorithmConfig
from ALGORITHM.common.pca import pca
from ALGORITHM.common.net_manifest import weights_init
from .net import Net, NetCentralCritic
def popgetter(*items):
def g(obj): return tuple(obj.pop(item) if item in obj else None for item in items)
return g
class no_context():
def __enter__(self):
return None
def __exit__(self, exc_type, exc_value, traceback):
return False
def _count_list_type(x):
type_cnt = {}
for xx in x:
if xx not in type_cnt: type_cnt[xx] = 0
type_cnt[xx] += 1
return len(type_cnt)
def _create_tensor_ph_or_fill_(ref, pt, offset, *args):
n_threads, n_agents, mask = args
if pt[offset] is None:
pt[offset] = torch.zeros(size=(n_threads*n_agents, *ref.shape[2:]), device=ref.device, dtype=ref.dtype)
pt[offset][mask] = ref.squeeze(0)
def _tensor_expand_thread_dim_v2_(ref, pt, offset, *args):
# undo dim collapse
n_threads, n_agents = args
v = pt[offset]
pt[offset] = v.view(n_threads, n_agents, *v.shape[1:])
def dfs_create_and_fn(ref, pt, offset, fn, *args):
'''
ref: target to sync
pt: mutable list
offset: mutable list index
fn: function to be executed at leaf nodes
args: anything needed
'''
if ref is None: # there is nothing to sync, instead, do something at leaf node only
ref = pt[offset]
if ref == 'vph':
pt[offset] = 'vph'
return
elif isinstance(ref, tuple) or isinstance(ref, list):
if pt[offset] is None: pt[offset] = [None for item in ref]
for i, item in enumerate(ref):
dfs_create_and_fn(item, pt[offset], i, fn, *args)
elif isinstance(ref, dict):
if pt[offset] is None: pt[offset] = {key:None for key in ref}
for key in ref:
dfs_create_and_fn(ref[key], pt[offset], key, fn, *args)
elif isinstance(ref, torch.Tensor):
fn(ref, pt, offset, *args)
else:
assert False
def _deal_single_in(x, mask_flatten):
if isinstance(x, torch.Tensor):
# collapse first two dims
return x.view(-1, *x.shape[2:])[mask_flatten].unsqueeze(0)
else:
return x
# todo: https://pytorch.org/tutorials/advanced/torch-script-parallelism.html?highlight=parallel
def distribute_compute(fn_arr, mask_arr, **kwargs):
"""compute on each network
Args:
fn_arr : a list of forwarding networks
mask_arr : mask of kwargs
Returns:
tuple tensors: the result of networks
"""
    # Python doesn't have pointers;
    # however, a list is a mutable type in Python, which is exactly what we need here
g_out = [None]
n_threads = mask_arr[0].shape[0]
n_agents = mask_arr[0].shape[1]
# calculated result will be gathered into ret_tuple_gather
ret_tuple_gather = []
# one by one we compute the result
for fn, mask in zip(fn_arr, mask_arr):
assert mask.dim()==2
mask_flatten = mask.flatten()
agent_ids = torch.where(mask)[1]
        agent_ids = agent_ids.unsqueeze(0) # fake an extra dimension
_kwargs = {key:_deal_single_in(kwargs[key], mask_flatten) for key in kwargs}
with torch.no_grad() if fn.static else no_context() as gs: # no_grad is already declared outside in act mode
ret_tuple = fn._act(agent_ids=agent_ids, **_kwargs)
ret_tuple_gather.append(ret_tuple)
# stack ret_tuple_gather into g_out
for ret_tuple, fn, mask in zip(ret_tuple_gather, fn_arr, mask_arr):
mask_flatten = mask.flatten()
dfs_create_and_fn(ret_tuple, g_out, 0, _create_tensor_ph_or_fill_, n_threads, n_agents, mask_flatten)
# reshape the tensor
dfs_create_and_fn(None, g_out, 0, _tensor_expand_thread_dim_v2_, n_threads, n_agents)
return tuple(g_out[0])
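# Shape illustration (assumed sizes): with n_threads=2, n_agents=3 and two nets whose
# masks partition the agents, each net runs once on its flattened masked subset, and
# dfs_create_and_fn scatters the per-subset outputs back into full (2, 3, ...) tensors,
# so the caller sees a single batched result no matter how many networks were involved.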
class HeteNet(nn.Module):
def __init__(self, rawob_dim, n_action, hete_type, **kwargs):
super().__init__()
self.rawob_dim = rawob_dim
self.n_action = n_action
self.hete_type = hete_type
self.n_hete_types = _count_list_type(self.hete_type)
self.hete_n_net_placeholder = AlgorithmConfig.hete_n_net_placeholder
self.use_normalization = AlgorithmConfig.use_normalization
self.n_tp = self.n_hete_types
self.n_gp = self.hete_n_net_placeholder
self.n_agent_each_tp = [sum(self.hete_type==i) for i in range(self.n_hete_types)]
self.n_agents = len(self.hete_type)
        # conversion between placeholder index and type-group index
self.tpgp_2_ph = lambda type, group: group*self.n_tp + type
self.ph_2_tpgp = lambda ph: (ph%self.n_hete_types, ph//self.n_hete_types)
self.ph_2_gp = lambda ph: ph//self.n_hete_types
# initialize net placeholders
self._nets_flat_placeholder_ = torch.nn.ModuleList(modules=[
Net(rawob_dim, n_action, **kwargs) for _ in range(
self.n_gp
)
])
# initialize critic
self._critic_central = NetCentralCritic(rawob_dim, n_action, **kwargs)
# reshape the handle of networks
self.nets = [ [ self._nets_flat_placeholder_[gp] ] for gp in range(self.n_gp)]
# the frontier nets
self.frontend_nets = self.nets[0]
# the static nets
self.static_nets = self.nets[1:]
# heterogeneous feature dimension
self.hete_feature_dim = 1
        # add flags to each net
for gp, n_arr in enumerate(self.nets):
for _, n in enumerate(n_arr):
ph_index = gp
n.gp = gp
# n.lr_div = self.n_agent_each_tp[tp] / self.n_agents
if gp!=0:
# lock static nets: the static nets are not loaded yet
n.feature = np.zeros(self.hete_feature_dim)
n.ready_to_go = False
self.lock_net(ph_index)
else:
# unlock frontier nets: the frontier nets are ready
n.feature = np.ones(self.hete_feature_dim)
n.ready_to_go = True
self.unlock_net(ph_index)
# a list to trace the vital checkpoints
self.ckpg_info = []
        # track the number of checkpoints committed
self.ckpg_input_cnt = 0
# feature array, arranged according to placeholders
self.ph_to_feature = torch.tensor(np.array([n.feature for n in self._nets_flat_placeholder_]), dtype=torch.float, device=cfg.device)
#
# from UTIL.sync_exp import SynWorker
# self.syn_worker = SynWorker('follow')
def lock_net(self, i):
n = self._nets_flat_placeholder_[i]
n.static = True
n.eval()
def unlock_net(self, i):
n = self._nets_flat_placeholder_[i]
n.static = False
n.train()
def register_ckp(self, win_rate, cpk_path, mean_reward):
# deal with new checkpoint
self.ckpg_input_cnt += 1
# get previous win rates
prev_win_rate = [self.ckpg_info[i]['win_rate'] for i in range(len(self.ckpg_info))]
        # if the win rate is not a breakthrough, give up
if len(prev_win_rate)>0 and win_rate <= max(prev_win_rate):
return
if AlgorithmConfig.hete_exclude_zero_wr and win_rate==0:
return
        # record the information about this checkpoint
self.ckpg_info.append({
'win_rate': win_rate,
'mean_reward': mean_reward,
'ckpg_cnt': self.ckpg_input_cnt,
'cpk_path': cpk_path,
'model': copy.deepcopy(self.frontend_nets[0].state_dict()),
'feature': [
win_rate
],
})
# sort according to win rate
self.ckpg_info.sort(key=lambda x:x['win_rate'])
# remove a checkpoint that is too close to its neighbor
self.trim_ckp()
print('ckp register change!')
print([self.ckpg_info[i]['win_rate'] for i in range(len(self.ckpg_info))])
print([self.ckpg_info[i]['ckpg_cnt'] for i in range(len(self.ckpg_info))])
# reload parameters
for i, static_nets in enumerate(self.static_nets):
            # some nets cannot be loaded with parameters yet, because ckpg_info has not collected enough checkpoints
if i >= len(self.ckpg_info): continue
for _, net in enumerate(static_nets):
# load parameters
net.load_state_dict(self.ckpg_info[i]['model'], strict=True)
# the net must be static
assert net.static
# now the net is ready
net.ready_to_go = True
net.feature = self.ckpg_info[i]['feature']
# reload the net features
self.ph_to_feature = torch.tensor(np.array([n.feature for n in self._nets_flat_placeholder_]), dtype=torch.float, device=cfg.device)
print('parameters reloaded')
def random_select(self, testing, *args, **kwargs):
"""randomly select a group index
Args:
            AlgorithmConfig.hete_same_prob: the probability of choosing the frontier net as the teammate
Returns:
int: a group index
"""
assert not testing
if np.random.rand() < AlgorithmConfig.hete_same_prob:
return 0
# choose randomly among existing nets
n_option = len(self.ckpg_info)
if n_option > 0:
if n_option > AlgorithmConfig.hete_lasted_n:
assert AlgorithmConfig.hete_lasted_n != 0
rand_sel = np.random.randint(low=n_option+1-AlgorithmConfig.hete_lasted_n, high=n_option+1)
else:
rand_sel = np.random.randint(low=1, high=n_option+1)
return rand_sel
else:
return 0
if AlgorithmConfig.policy_matrix_testing:
def random_select_matrix_test(self, testing, *args, **kwargs):
if testing:
hete_frontier_prob = 0 # 1 / (AlgorithmConfig.hete_lasted_n+1)
# print('manual selection')
n_option = len(self.ckpg_info)
LAST = AlgorithmConfig.test_which_cpk
# return 0
return (n_option+1) - LAST
else:
hete_frontier_prob = AlgorithmConfig.hete_same_prob
if np.random.rand() < hete_frontier_prob:
return 0
# choose randomly among existing nets
n_option = len(self.ckpg_info)
if n_option > 0:
if AlgorithmConfig.hete_lasted_n == 0:
return 0
if n_option > AlgorithmConfig.hete_lasted_n:
assert AlgorithmConfig.hete_lasted_n != 0
rand_sel = np.random.randint(low=n_option+1-AlgorithmConfig.hete_lasted_n, high=n_option+1)
else:
rand_sel = np.random.randint(low=1, high=n_option+1)
return rand_sel
else:
return 0
# called after training update
def on_update(self, update_cnt):
return
def redirect_to_frontend(self, i):
return i%self.n_tp
def acquire_net(self, i):
tp, gp = self.ph_2_tpgp(i)
return self._nets_flat_placeholder_[gp]
def exe(self, hete_pick=None, **kargs):
# shape
n_thread = hete_pick.shape[0]
n_agents = hete_pick.shape[1]
# pop items from kargs
gp_sel_summary, thread_indices, hete_type = popgetter('gp_sel_summary', 'thread_index', 'hete_type')(kargs)
# get ph_feature
# _012345 = torch.arange(self.n_tp, device=kargs['obs'].device, dtype=torch.int64)
ph_sel = gp_sel_summary # *self.n_tp + repeat_at(_012345, 0, n_thread) # group * self.n_tp + tp
ph_feature = self.ph_to_feature[ph_sel] # my_view(, [0, -1])
ph_feature_cp_raw = repeat_at(ph_feature, 1, n_agents)
agent2tp_onehot = torch.nn.functional.one_hot(hete_type.long(), num_classes=self.n_tp).unsqueeze(-1)
type_gp_mat = repeat_at(gp_sel_summary, -1, self.n_tp)
same_gp = (type_gp_mat == type_gp_mat.transpose(-1,-2)).long()
agent_self_type_mask2 = gather_righthand(same_gp, index=hete_type, check=False).unsqueeze(-1)
assert ph_feature_cp_raw.dim() == 4
ph_feature_cp2 = (ph_feature_cp_raw*(1-agent_self_type_mask2) + agent_self_type_mask2)
ph_feature_cp_obs_ = torch.cat((ph_feature_cp2, agent2tp_onehot), 2)
ph_feature_cp_critic_ = torch.cat((ph_feature_cp_raw, agent2tp_onehot), 2)
ph_feature_cp_obs = my_view(ph_feature_cp_obs_, [0,0,-1]) # ph_feature_cp_obs.shape = torch.Size([n_thread=16, n_agents=10, core_dim=12])
ph_feature_cp_critic = my_view(ph_feature_cp_critic_, [0,0,-1]) # ph_feature_cp_obs.shape = torch.Size([n_thread=16, n_agents=10, core_dim=12])
# add ph_feature to kwargs
kargs['obs_hfeature'] = ph_feature_cp_obs
# get a manifest of running nets
# invo_hete_types = [i for i in range(self.n_tp*self.n_gp) if (i in hete_pick)]
invo_gps = [i for i in range(self.n_gp) if (i in gp_sel_summary)]
running_nets = [self.nets[gp][0] for gp in invo_gps]
        # make sure all nets under testing are frontend / frontier
if 'test_mode' in kargs and kargs['test_mode']:
for net in running_nets:
if not AlgorithmConfig.policy_matrix_testing: assert not net.static
# run actor policy networks
actor_result = distribute_compute(
fn_arr = running_nets,
mask_arr = [(self.ph_2_gp(hete_pick) == gp) for gp in invo_gps],
**kargs
)
# run critic network
kargs.pop('obs_hfeature') # replace h_feature
kargs['obs_hfeature_critic'] = ph_feature_cp_critic
critic_result = self._critic_central.estimate_state(**kargs)
# combine actor_result and critic_result
actor_result = list(actor_result)
for i, item in enumerate(actor_result):
if item=='vph': actor_result[i] = critic_result
# done !
return tuple(actor_result)
@Args2tensor_Return2numpy
def act(self, **kargs):
return self.exe(**kargs)
@Args2tensor
def evaluate_actions(self, **kargs):
return self.exe(**kargs, eval_mode=True)
def trim_ckp(self):
RemoveNew = True
max_static_gp = self.n_gp - 1
if len(self.ckpg_info) <= max_static_gp:
return
else:
assert len(self.ckpg_info) == max_static_gp+1
            # find the two checkpoints with the nearest win rates
winrate_list = np.array([self.ckpg_info[i]['win_rate'] for i in range(len(self.ckpg_info))])
winrate_list = np.abs(winrate_list[1:] - winrate_list[:-1])
index = np.argmin(winrate_list)
old_index = index
new_index = index + 1
if self.ckpg_info[new_index]['ckpg_cnt'] < self.ckpg_info[old_index]['ckpg_cnt']:
new_index, old_index = old_index, new_index
if RemoveNew:
self.ckpg_info.pop(new_index)
else:
self.ckpg_info.pop(old_index)
assert len(self.ckpg_info) == max_static_gp
pass
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/net.py
================================================
import torch, math, copy
import numpy as np
import torch.nn as nn
from torch.distributions.categorical import Categorical
from UTIL.colorful import print亮绿
from UTIL.tensor_ops import Args2tensor_Return2numpy, Args2tensor, __hashn__, my_view
from UTIL.tensor_ops import pt_inf
from UTIL.exp_helper import changed
from .ccategorical import CCategorical
from .foundation import AlgorithmConfig
from ALGORITHM.common.attention import SimpleAttention
from ALGORITHM.common.norm import DynamicNormFix
from ALGORITHM.common.net_manifest import weights_init
from ALGORITHM.common.hyper_net import HyperNet
"""
network initialization
"""
class Net(nn.Module):
def __init__(self, rawob_dim, n_action, **kwargs):
super().__init__()
self.update_cnt = nn.Parameter(
torch.zeros(1, requires_grad=False, dtype=torch.long), requires_grad=False)
self.use_normalization = AlgorithmConfig.use_normalization
self.use_policy_resonance = AlgorithmConfig.policy_resonance
self.n_action = n_action
if self.use_policy_resonance:
self.ccategorical = CCategorical(kwargs['stage_planner'])
self.is_resonance_active = lambda: kwargs['stage_planner'].is_resonance_active()
h_dim = AlgorithmConfig.net_hdim
# observation normalization
if self.use_normalization:
self._batch_norm = DynamicNormFix(rawob_dim, only_for_last_dim=True, exclude_one_hot=True, exclude_nan=True)
n_entity = AlgorithmConfig.n_entity_placeholder
# # # # # # # # # # actor-critic share # # # # # # # # # # # #
self.obs_encoder = nn.Sequential(nn.Linear(rawob_dim, h_dim), nn.ReLU(inplace=True), nn.Linear(h_dim, h_dim))
self.attention_layer = SimpleAttention(h_dim=h_dim)
# # # # # # # # # # actor # # # # # # # # # # # #
_size = n_entity * h_dim
self.hyper_net = HyperNet(embed_dim=h_dim, hyper_input_dim=6, x_input_dim=_size)
self.policy_head = nn.Sequential(
nn.Linear(h_dim, h_dim), nn.ReLU(inplace=True),
nn.Linear(h_dim, self.n_action))
self.is_recurrent = False
self.apply(weights_init)
return
def act(self, *args, **kargs):
return self._act(*args, **kargs)
def evaluate_actions(self, *args, **kargs):
return self._act(*args, **kargs, eval_mode=True)
def _act(self, obs=None, test_mode=None, eval_mode=False, eval_actions=None, avail_act=None, agent_ids=None, eprsn=None, obs_hfeature=None):
assert (self.ready_to_go)
mask_dead = torch.isnan(obs).any(-1) # find dead agents
# if not (obs[..., -3+self.tp][~mask_dead] == -1).all().item():
# assert False
if self.static:
assert self.gp >=1
# if not test_mode: assert not self.ready_to_go
eval_act = eval_actions if eval_mode else None
others = {}
if self.use_normalization:
if torch.isnan(obs).all(): pass
else: obs = self._batch_norm(obs, freeze=(eval_mode or test_mode or self.static))
obs_hfeature_norm = obs_hfeature
mask_dead = torch.isnan(obs).any(-1)
obs = torch.nan_to_num_(obs, 0) # replace dead agents' obs, from NaN to 0
# # # # # # # # # # actor-critic share # # # # # # # # # # # #
baec = self.obs_encoder(obs)
baec = self.attention_layer(k=baec,q=baec,v=baec, mask=mask_dead)
# # # # # # # # # # actor # # # # # # # # # # # #
at_bac = my_view(baec,[0,0,-1])
at_bac_hn = self.hyper_net(at_bac, hyper_x=obs_hfeature_norm)
logits = self.policy_head(at_bac_hn)
# choose action selector
logit2act = self._logit2act_rsn if self.use_policy_resonance and self.is_resonance_active() else self._logit2act
# apply action selector
act, actLogProbs, distEntropy, probs = logit2act( logits,
eval_mode=eval_mode,
greedy=(test_mode or self.static),
eval_actions=eval_act,
avail_act=avail_act,
eprsn=eprsn )
if not eval_mode: return act, 'vph', actLogProbs
else: return 'vph', actLogProbs, distEntropy, probs, others
def _logit2act_rsn(self, logits_agent_cluster, eval_mode, greedy, eval_actions=None, avail_act=None, eprsn=None):
if avail_act is not None: logits_agent_cluster = torch.where(avail_act>0, logits_agent_cluster, -pt_inf())
act_dist = self.ccategorical.feed_logits(logits_agent_cluster)
if not greedy: act = self.ccategorical.sample(act_dist, eprsn) if not eval_mode else eval_actions
else: act = torch.argmax(act_dist.probs, axis=2)
        # the policy gradient loss backpropagates from here
actLogProbs = self._get_act_log_probs(act_dist, act)
# sum up the log prob of all agents
distEntropy = act_dist.entropy().mean(-1) if eval_mode else None
return act, actLogProbs, distEntropy, act_dist.probs
def _logit2act(self, logits_agent_cluster, eval_mode, greedy, eval_actions=None, avail_act=None, **kwargs):
if avail_act is not None: logits_agent_cluster = torch.where(avail_act>0, logits_agent_cluster, -pt_inf())
act_dist = Categorical(logits = logits_agent_cluster)
if not greedy: act = act_dist.sample() if not eval_mode else eval_actions
else: act = torch.argmax(act_dist.probs, axis=2)
        actLogProbs = self._get_act_log_probs(act_dist, act) # the policy gradient loss backpropagates from here
# sum up the log prob of all agents
distEntropy = act_dist.entropy().mean(-1) if eval_mode else None
return act, actLogProbs, distEntropy, act_dist.probs
@staticmethod
def _get_act_log_probs(distribution, action):
return distribution.log_prob(action.squeeze(-1)).unsqueeze(-1)
class NetCentralCritic(nn.Module):
def __init__(self, rawob_dim, n_action, **kwargs):
super().__init__()
self.update_cnt = nn.Parameter(
torch.zeros(1, requires_grad=False, dtype=torch.long), requires_grad=False)
self.use_normalization = AlgorithmConfig.use_normalization
self.use_policy_resonance = AlgorithmConfig.policy_resonance
self.n_action = n_action
if self.use_policy_resonance:
self.ccategorical = CCategorical(kwargs['stage_planner'])
self.is_resonance_active = lambda: kwargs['stage_planner'].is_resonance_active()
h_dim = AlgorithmConfig.net_hdim
# observation normalization
if self.use_normalization:
self._batch_norm = DynamicNormFix(rawob_dim, only_for_last_dim=True, exclude_one_hot=True, exclude_nan=True)
n_entity = AlgorithmConfig.n_entity_placeholder
# # # # # # # # # # actor-critic share # # # # # # # # # # # #
self.obs_encoder = nn.Sequential(nn.Linear(rawob_dim, h_dim), nn.ReLU(inplace=True), nn.Linear(h_dim, h_dim))
self.attention_layer = SimpleAttention(h_dim=h_dim)
# # # # # # # # # # critic # # # # # # # # # # # #
_size = n_entity * h_dim
self.hyper_net = HyperNet(embed_dim=h_dim, hyper_input_dim=6, x_input_dim=_size)
self.ct_encoder = nn.Sequential(nn.Linear(h_dim, h_dim), nn.ReLU(inplace=True), nn.Linear(h_dim, h_dim))
self.ct_attention_layer = SimpleAttention(h_dim=h_dim)
self.get_value = nn.Sequential(nn.Linear(h_dim, h_dim), nn.ReLU(inplace=True),nn.Linear(h_dim, 1))
self.is_recurrent = False
self.apply(weights_init)
return
def estimate_state(self, obs=None, test_mode=None, eval_mode=False, eval_actions=None, avail_act=None, agent_ids=None, eprsn=None, obs_hfeature_critic=None):
if self.use_normalization:
if torch.isnan(obs).all(): pass
else: obs = self._batch_norm(obs, freeze=(eval_mode or test_mode))
obs_hfeature_norm = obs_hfeature_critic
mask_dead = torch.isnan(obs).any(-1)
obs = torch.nan_to_num_(obs, 0) # replace dead agents' obs, from NaN to 0
# # # # # # # # # # actor-critic share # # # # # # # # # # # #
baec = self.obs_encoder(obs)
baec = self.attention_layer(k=baec,q=baec,v=baec, mask=mask_dead)
# # # # # # # # # # critic # # # # # # # # # # # #
ct_bac = my_view(baec,[0,0,-1])
ct_bac_hn = self.hyper_net(ct_bac, hyper_x=obs_hfeature_norm)
ct_bac = self.ct_encoder(ct_bac_hn)
ct_bac = self.ct_attention_layer(k=ct_bac,q=ct_bac,v=ct_bac)
value = self.get_value(ct_bac)
return value
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/ppo.py
================================================
import torch, math, traceback
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from random import randint, sample
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
from UTIL.colorful import *
from UTIL.tensor_ops import _2tensor, __hash__, __hashn__
from config import GlobalConfig as cfg
from UTIL.gpu_share import GpuShareUnit
from .ppo_sampler import TrajPoolSampler
from VISUALIZE.mcom import mcom
class PPO():
def __init__(self, policy_and_critic, ppo_config, mcv=None):
self.policy_and_critic = policy_and_critic
self.clip_param = ppo_config.clip_param
self.ppo_epoch = ppo_config.ppo_epoch
        self.use_avail_act = ppo_config.use_avail_act
self.n_pieces_batch_division = ppo_config.n_pieces_batch_division
self.value_loss_coef = ppo_config.value_loss_coef
self.entropy_coef = ppo_config.entropy_coef
self.max_grad_norm = ppo_config.max_grad_norm
self.add_prob_loss = ppo_config.add_prob_loss
self.prevent_batchsize_oom = ppo_config.prevent_batchsize_oom
# self.freeze_body = ppo_config.freeze_body
self.lr = ppo_config.lr
self.all_parameter = list(policy_and_critic.named_parameters())
self.parameter = [p for p_name, p in self.all_parameter] # 535
# set learning rate differently?
if ppo_config.type_agent_diff_lr:
others_parameters = [v for k,v in self.all_parameter if '_nets_flat' not in k]
adam_lr_list = [
{'params': list(n.parameters()), 'lr':self.lr*n.lr_div} for n in policy_and_critic._nets_flat_placeholder_
] + [{'params': list(policy_and_critic._critic_central.parameters()), 'lr':self.lr}] # 33*3*5 + 40 = 535
assert sum([len(list(d['params'])) for d in adam_lr_list]) == len(self.all_parameter)
self.optimizer = optim.Adam(adam_lr_list, lr=self.lr)
else:
self.optimizer = optim.Adam(self.parameter, lr=self.lr)
self.g_update_delayer = 0
self.g_initial_value_loss = 0
        # alternating (take-turns) training mode
self.mcv = mcv
self.ppo_update_cnt = 0
self.batch_size_reminder = True
self.trivial_dict = {}
assert self.n_pieces_batch_division == 1
self.gpu_share_unit = GpuShareUnit(cfg.device, gpu_party=cfg.gpu_party)
self.mcv2 = mcom( path='%s/logger/ppo/'%cfg.logdir,
image_path='%s/detail_reward.jpg'%cfg.logdir,
rapid_flush=True,
draw_mode=cfg.draw_mode,
tag='[ppo.py]' )
self.mcv2.rec_init(color='g')
    def freeze_body(self):  # NOTE: shadowed by the second freeze_body definition further below
self.freeze_body = True
self.at_parameter = [p for p_name, p in self.all_parameter if 'AT_policy_head' in p_name]
self.at_optimizer = optim.Adam(self.at_parameter, lr=self.lr)
self.ct_parameter = [p for p_name, p in self.all_parameter if 'CT_' in p_name]
self.ct_optimizer = optim.Adam(self.ct_parameter, lr=self.lr*10.0) #(self.lr)
print('change train object')
def train_on_traj(self, traj_pool, task):
while True:
try:
with self.gpu_share_unit:
self.train_on_traj_(traj_pool, task)
                break # reaching this point means GPU memory was sufficient
except RuntimeError as err:
print(traceback.format_exc())
                if self.prevent_batchsize_oom:
                    # if the last entry was already invalidated by a previous OOM, pop it
                    # so that the fallback sample size shrinks one step further
                    if TrajPoolSampler.MaxSampleNum[-1] < 0: TrajPoolSampler.MaxSampleNum.pop(-1)
                    assert TrajPoolSampler.MaxSampleNum[-1] > 0
                    TrajPoolSampler.MaxSampleNum[-1] = -1
                    print亮红('Insufficient GPU memory, falling back to a smaller sample size!')
                else:
                    assert False
torch.cuda.empty_cache()
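    # The loop above implements a shrink-and-retry pattern: a CUDA OOM raised during the
    # forward/backward pass is caught, the sampler's class-level MaxSampleNum record is
    # rolled back to a smaller batch size, the cache is emptied, and training is retried
    # until the batch fits in memory.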
def log_reward_rich(self, traj_pool, mcv2):
tags = {}
for traj in traj_pool:
traj.reward_sum = sum(traj.reward[:,0])
gp_list = traj.gp_sel_summary[0]
if (gp_list==0).all():
tag = 'frontend'
if tag not in tags: tags[tag] = []
tags[tag].append(traj.reward_sum)
else:
gp = max(gp_list)
wr = self.policy_and_critic.ckpg_info[gp-1]['win_rate']
tp = np.argmax(gp_list)
tag = 'tp:%d wr:%.2f'%(tp, wr)
if tag not in tags: tags[tag] = []
tags[tag].append(traj.reward_sum)
tags = dict(sorted(tags.items()))
for k in tags:
mcv2.rec(np.array(tags[k]).mean(), k)
mcv2.rec_show()
def train_on_traj_(self, traj_pool, task):
self.log_reward_rich(traj_pool, self.mcv2)
ppo_valid_percent_list = []
sampler = TrajPoolSampler(n_div=1, traj_pool=traj_pool, flag=task, prevent_batchsize_oom=self.prevent_batchsize_oom, mcv=self.mcv)
# before_training_hash = [__hashn__(t.parameters()) for t in (self.policy_and_critic._nets_flat_placeholder_)]
for e in range(self.ppo_epoch):
sample_iter = sampler.reset_and_get_iter()
self.optimizer.zero_grad()
# ! get traj fragment
sample = next(sample_iter)
# ! build graph, then update network
loss_final, others = self.establish_pytorch_graph(task, sample, e)
loss_final = loss_final*0.5
if e==0: print('[PPO.py] Memory Allocated %.2f GB'%(torch.cuda.memory_allocated()/1073741824))
loss_final.backward()
# log
ppo_valid_percent_list.append(others.pop('PPO valid percent').item())
self.log_trivial(dictionary=others); others = None
nn.utils.clip_grad_norm_(self.parameter, self.max_grad_norm)
self.optimizer.step()
if ppo_valid_percent_list[-1] < 0.70:
print亮黄('policy change too much, epoch terminate early'); break
pass # finish all epoch update
print亮黄(np.array(ppo_valid_percent_list))
self.log_trivial_finalize()
net_updated = [any([p.grad is not None for p in t.parameters()]) for t in (self.policy_and_critic._nets_flat_placeholder_)]
self.optimizer.zero_grad(set_to_none=True)
self.ppo_update_cnt += 1
for updated, net in zip(net_updated, self.policy_and_critic._nets_flat_placeholder_):
if updated:
net.update_cnt.data[0] = self.ppo_update_cnt
self.policy_and_critic.on_update(self.ppo_update_cnt)
torch.cuda.empty_cache()
return self.ppo_update_cnt
def freeze_body(self):
assert False, "function forbidden"
self.freeze_body = True
self.parameter_pv = [p_name for p_name, p in self.all_parameter if not any(p_name.startswith(kw) for kw in ('obs_encoder', 'attention_layer'))]
self.parameter = [p for p_name, p in self.all_parameter if not any(p_name.startswith(kw) for kw in ('obs_encoder', 'attention_layer'))]
self.optimizer = optim.Adam(self.parameter, lr=self.lr)
print('change train object')
def log_trivial(self, dictionary):
for key in dictionary:
if key not in self.trivial_dict: self.trivial_dict[key] = []
item = dictionary[key].item() if hasattr(dictionary[key], 'item') else dictionary[key]
self.trivial_dict[key].append(item)
def log_trivial_finalize(self, print=True):
for key in self.trivial_dict:
self.trivial_dict[key] = np.array(self.trivial_dict[key])
print_buf = ['[ppo.py] ']
for key in self.trivial_dict:
self.trivial_dict[key] = self.trivial_dict[key].mean()
print_buf.append(' %s:%.3f, '%(key, self.trivial_dict[key]))
if self.mcv is not None: self.mcv.rec(self.trivial_dict[key], key)
if print: print紫(''.join(print_buf))
if self.mcv is not None:
self.mcv.rec_show()
self.trivial_dict = {}
def establish_pytorch_graph(self, flag, sample, n):
obs = _2tensor(sample['obs'])
advantage = _2tensor(sample['advantage'])
action = _2tensor(sample['action'])
oldPi_actionLogProb = _2tensor(sample['actionLogProb'])
real_value = _2tensor(sample['return'])
hete_pick = _2tensor(sample['hete_pick'])
hete_type = _2tensor(sample['hete_type'])
gp_sel_summary = _2tensor(sample['gp_sel_summary'])
avail_act = _2tensor(sample['avail_act']) if 'avail_act' in sample else None
# batchsize = advantage.shape[0]#; print亮紫(batchsize)
batch_agent_size = advantage.shape[0]*advantage.shape[1]
assert flag == 'train'
newPi_value, newPi_actionLogProb, entropy, probs, others = \
self.policy_and_critic.evaluate_actions(
obs=obs,
eval_actions=action,
test_mode=False,
avail_act=avail_act,
hete_pick=hete_pick,
hete_type=hete_type,
gp_sel_summary=gp_sel_summary)
entropy_loss = entropy.mean()
n_actions = probs.shape[-1]
if self.add_prob_loss: assert n_actions <= 15 #
penalty_prob_line = (1/n_actions)*0.12
probs_loss = (penalty_prob_line - torch.clamp(probs, min=0, max=penalty_prob_line)).mean()
if not self.add_prob_loss:
probs_loss = torch.zeros_like(probs_loss)
# dual clip ppo core
E = newPi_actionLogProb - oldPi_actionLogProb
E_clip = torch.zeros_like(E)
E_clip = torch.where(advantage > 0, torch.clamp(E, max=np.log(1.0+self.clip_param)), E_clip)
E_clip = torch.where(advantage < 0, torch.clamp(E, min=np.log(1.0-self.clip_param), max=np.log(5) ), E_clip)
ratio = torch.exp(E_clip)
policy_loss = -(ratio*advantage).mean()
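            # Dual-clip PPO note (cf. Ye et al., "Mastering Complex Control in MOBA Games"):
            # clipping is done in log-ratio space. For advantage > 0 the log ratio E is capped
            # at log(1+clip_param), recovering the standard PPO clip; for advantage < 0 it is
            # additionally capped at log(5), so ratio <= 5 and a single far-off-policy sample
            # cannot contribute an unboundedly large (ratio * negative-advantage) term.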
# add all loses
value_loss = 0.5 * F.mse_loss(real_value, newPi_value)
AT_net_loss = policy_loss - entropy_loss*self.entropy_coef # + probs_loss*20
CT_net_loss = value_loss * 1.0
# AE_new_loss = ae_loss * 1.0
loss_final = AT_net_loss + CT_net_loss # + AE_new_loss
ppo_valid_percent = ((E_clip == E).int().sum()/batch_agent_size)
nz_mask = real_value!=0
value_loss_abs = (real_value[nz_mask] - newPi_value[nz_mask]).abs().mean()
others = {
'Value loss Abs': value_loss_abs,
'PPO valid percent': ppo_valid_percent,
'CT_net_loss': CT_net_loss,
'AT_net_loss': AT_net_loss,
}
return loss_final, others
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/ppo_sampler.py
================================================
import torch, math, traceback
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from random import randint, sample
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
from UTIL.colorful import *
from UTIL.tensor_ops import _2tensor, __hash__, repeat_at
from config import GlobalConfig as cfg
from UTIL.gpu_share import GpuShareUnit
class TrajPoolSampler():
def __init__(self, n_div, traj_pool, flag, prevent_batchsize_oom=False, mcv=None):
self.n_pieces_batch_division = n_div
self.prevent_batchsize_oom = prevent_batchsize_oom
self.mcv = mcv
if self.prevent_batchsize_oom:
            assert self.n_pieces_batch_division==1, ('prevent_batchsize_oom requires n_pieces_batch_division == 1')
self.num_batch = None
self.container = {}
self.warned = False
assert flag=='train'
req_dict = ['hete_type', 'gp_sel_summary', 'avail_act', 'obs', 'action', 'actionLogProb', 'return', 'reward', 'hete_pick', 'value']
req_dict_rename = ['hete_type', 'gp_sel_summary', 'avail_act', 'obs', 'action', 'actionLogProb', 'return', 'reward', 'hete_pick', 'state_value']
return_rename = "return"
value_rename = "state_value"
advantage_rename = "advantage"
        # if a key like 'obs' is stored hierarchically, expand it into its sub-keys 'obs>xxxx'
for key_index, key in enumerate(req_dict):
key_name = req_dict[key_index]
key_rename = req_dict_rename[key_index]
if not hasattr(traj_pool[0], key_name):
real_key_list = [real_key for real_key in traj_pool[0].__dict__ if (key_name+'>' in real_key)]
assert len(real_key_list) > 0, ('check variable provided!', key,key_index)
for real_key in real_key_list:
mainkey, subkey = real_key.split('>')
req_dict.append(real_key)
req_dict_rename.append(key_rename+'>'+subkey)
        self.big_batch_size = -1 # all data tracks must share the same batch length; verified in the loop below
# load traj into a 'container'
for key_index, key in enumerate(req_dict):
key_name = req_dict[key_index]
key_rename = req_dict_rename[key_index]
if not hasattr(traj_pool[0], key_name): continue
set_item = np.concatenate([getattr(traj, key_name) for traj in traj_pool], axis=0)
            if not (self.big_batch_size==set_item.shape[0] or (self.big_batch_size<0)):
                print('[ppo_sampler.py] batch length mismatch on key:', key)
            assert self.big_batch_size==set_item.shape[0] or (self.big_batch_size<0), (key,key_index)
self.big_batch_size = set_item.shape[0]
            self.container[key_rename] = set_item # reference assignment (no copy)
# normalize advantage inside the batch
self.container[advantage_rename] = self.container[return_rename] - self.container[value_rename]
self.container[advantage_rename] = ( self.container[advantage_rename] - self.container[advantage_rename].mean() ) / (self.container[advantage_rename].std() + 1e-5)
# size of minibatch for each agent
self.mini_batch_size = math.ceil(self.big_batch_size / self.n_pieces_batch_division)
def __len__(self):
return self.n_pieces_batch_division
def determine_max_n_sample(self):
assert self.prevent_batchsize_oom
if not hasattr(TrajPoolSampler,'MaxSampleNum'):
# initialization
TrajPoolSampler.MaxSampleNum = [int(self.big_batch_size*(i+1)/50) for i in range(50)]
max_n_sample = self.big_batch_size
elif TrajPoolSampler.MaxSampleNum[-1] > 0:
# meaning that oom never happen, at least not yet
# only update when the batch size increases
if self.big_batch_size > TrajPoolSampler.MaxSampleNum[-1]: TrajPoolSampler.MaxSampleNum.append(self.big_batch_size)
max_n_sample = self.big_batch_size
else:
# meaning that oom already happened, choose TrajPoolSampler.MaxSampleNum[-2] to be the limit
assert TrajPoolSampler.MaxSampleNum[-2] > 0
max_n_sample = TrajPoolSampler.MaxSampleNum[-2]
return max_n_sample
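    # Example of how MaxSampleNum evolves (a hypothetical run with big_batch_size=1000):
    #   init:           [20, 40, ..., 980, 1000]
    #   after 1st OOM:  [20, 40, ..., 980, -1]   -> next cap = MaxSampleNum[-2] = 980
    #   after 2nd OOM:  [20, 40, ..., 960, -1]   -> next cap = 960 (ppo.py pops the trailing
    #                   -1 before invalidating the previous entry)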
def reset_and_get_iter(self):
if not self.prevent_batchsize_oom:
self.sampler = BatchSampler(SubsetRandomSampler(range(self.big_batch_size)), self.mini_batch_size, drop_last=False)
else:
max_n_sample = self.determine_max_n_sample()
n_sample = min(self.big_batch_size, max_n_sample)
            if not hasattr(self,'reminded'):
                self.reminded = True
                drop_percent = (self.big_batch_size-n_sample)/self.big_batch_size*100
                if self.mcv is not None:
                    self.mcv.rec(drop_percent, 'drop percent')
                if drop_percent > 20:
                    print_ = print亮红
                    print_('dropping %.1f percent of samples..'%(drop_percent))
                    assert False, "GPU OOM!"
                else:
                    print_ = print
                    print_('dropping %.1f percent of samples..'%(drop_percent))
self.sampler = BatchSampler(SubsetRandomSampler(range(n_sample)), n_sample, drop_last=False)
for indices in self.sampler:
selected = {}
for key in self.container:
selected[key] = self.container[key][indices]
for key in [key for key in selected if '>' in key]:
                # recombine parent/child keys ('main>sub') back into a nested dict
mainkey, subkey = key.split('>')
if not mainkey in selected: selected[mainkey] = {}
selected[mainkey][subkey] = selected[key]
del selected[key]
yield selected
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/shell_env.py
================================================
import numpy as np
from config import GlobalConfig
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__, repeat_at, gather_righthand
from MISSION.uhmap.actset_lookup import encode_action_as_digits
from .foundation import AlgorithmConfig
from .cython_func import roll_hisory
from .hete_assignment import select_nets_for_shellenv
class ShellEnvConfig:
add_avail_act = False
class ActionConvertLegacy():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.SELF_TEAM_ASSUME = SELF_TEAM_ASSUME
self.OPP_TEAM_ASSUME = OPP_TEAM_ASSUME
self.OPP_NUM_ASSUME = OPP_NUM_ASSUME
# (main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None)
self.dictionary_args = [
('N/A', 'N/A', None, None, None, None, None, None), # 0
('Idle', 'DynamicGuard', None, None, None, None, None, None), # 1
('Idle', 'StaticAlert', None, None, None, None, None, None), # 2
            ('Idle', 'AsFarAsPossible', None, None, None, None, None, None),        # 3
            ('Idle', 'StayWhenTargetInRange', None, None, None, None, None, None),  # 4
            ('SpecificMoving', 'Dir+X', None, None, None, None, None, None),        # 5
            ('SpecificMoving', 'Dir+Y', None, None, None, None, None, None),        # 6
            ('SpecificMoving', 'Dir-X', None, None, None, None, None, None),        # 7
            ('SpecificMoving', 'Dir-Y', None, None, None, None, None, None),        # 8
]
for i in range(self.OPP_NUM_ASSUME):
self.dictionary_args.append( ('SpecificAttacking', 'N/A', None, None, None, None, OPP_TEAM_ASSUME, i) )
def convert_act_arr(self, type, a):
if type == 'RLA_UAV_Support':
args = self.dictionary_args[a]
# override wrong actions
if args[0] == 'SpecificAttacking':
return encode_action_as_digits('N/A', 'N/A', None, None, None, None, None, None)
# override incorrect actions
if args[0] == 'Idle':
return encode_action_as_digits('Idle', 'StaticAlert', None, None, None, None, None, None)
return encode_action_as_digits(*args)
else:
return encode_action_as_digits(*self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
for i in range(n_act):
args = self.dictionary_args[i]
# for all kind of agents
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if type == 'RLA_UAV_Support':
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if args[0] == 'SpecificAttacking': ret[i] = DISABLE
if args[0] == 'Idle': ret[i] = DISABLE
if args[1] == 'StaticAlert': ret[i] = ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
assert team == self.SELF_TEAM_ASSUME
        assert self.SELF_TEAM_ASSUME + self.OPP_TEAM_ASSUME == 1
assert opp_agent_num == self.OPP_NUM_ASSUME
def count_list_type(x):
type_cnt = {}
for xx in x:
if xx not in type_cnt: type_cnt[xx] = 0
type_cnt[xx] += 1
return len(type_cnt)
class ShellEnvWrapper(object):
def __init__(self, n_agent, n_thread, space, mcv, rl_functional, alg_config, ScenarioConfig, team):
self.n_agent = n_agent
self.n_thread = n_thread
self.team = team
self.space = space
self.mcv = mcv
self.rl_functional = rl_functional
if GlobalConfig.ScenarioConfig.EntityOriented:
self.core_dim = GlobalConfig.ScenarioConfig.obs_vec_length
else:
self.core_dim = space['obs_space']['obs_shape']
self.n_entity_placeholder = alg_config.n_entity_placeholder
        # whether to use avail_act to block forbidden actions
self.AvailActProvided = False
if hasattr(ScenarioConfig, 'AvailActProvided'):
self.AvailActProvided = ScenarioConfig.AvailActProvided
self.action_converter = ActionConvertLegacy(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
# heterogeneous agent types
agent_type_list = [a['type'] for a in GlobalConfig.ScenarioConfig.SubTaskConfig.agent_list]
opp_type_list = [a['type'] for a in GlobalConfig.ScenarioConfig.SubTaskConfig.agent_list if a['team']!=self.team]
self_type_list = [a['type'] for a in GlobalConfig.ScenarioConfig.SubTaskConfig.agent_list if a['team']==self.team]
        def str_array_to_num(str_arr):
            out_arr = []
            buffer = {}
            for s in str_arr:  # 's' instead of 'str' to avoid shadowing the builtin
                if s not in buffer:
                    buffer[s] = len(buffer)
                out_arr.append(buffer[s])
            return out_arr
self.HeteAgentType = str_array_to_num(agent_type_list)
self.hete_type = np.array(self.HeteAgentType)[GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[team]]
self.n_hete_types = count_list_type(self.hete_type)
# check parameters
assert self.n_agent == len(self_type_list)
self.action_converter.confirm_parameters_are_correct(team, self.n_agent, len(opp_type_list))
self.patience = 2000
self.epsiode_cnt = 0
def cold_start_warmup(self, StateRecall):
self.agent_uid = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
self.agent_type = [agent_meta['type']
for agent_meta in StateRecall['Latest-Team-Info'][0]['dataArr']
if agent_meta['uId'] in self.agent_uid]
if ShellEnvConfig.add_avail_act:
self.avail_act = np.stack(tuple(self.action_converter.get_tp_avail_act(tp) for tp in self.agent_type))
self.avail_act = repeat_at(self.avail_act, insert_dim=0, n_times=self.n_thread)
def interact_with_env(self, StateRecall):
# warm up at first execution
if not hasattr(self, 'agent_type'):
self.cold_start_warmup(StateRecall)
# action init to: -1
        act = np.zeros(shape=(self.n_thread, self.n_agent), dtype=int) - 1
# read and reshape observation
obs = StateRecall['Latest-Obs']
obs = my_view(obs,[0, 0, -1, self.core_dim])
# mask out invalid observation with NaN
obs[(obs==0).all(-1)] = np.nan
# stopped env mask
P = StateRecall['ENV-PAUSE']
# running env mask
R = ~P
# reset env mask
RST = StateRecall['Env-Suffered-Reset']
# when needed, train!
if not StateRecall['Test-Flag']: self.rl_functional.train()
        # if all threads just experienced a full reset, this is the first step of every env thread
if RST.all():
if AlgorithmConfig.allow_fast_test and GlobalConfig.test_only and (self.epsiode_cnt > GlobalConfig.report_reward_interval):
import sys
sys.exit(0)
self.epsiode_cnt += self.n_thread
# policy resonance
eprsn_yita = self.rl_functional.stage_planner.yita if AlgorithmConfig.policy_resonance else 0
EpRsn = np.random.rand(self.n_thread) < eprsn_yita
StateRecall['_EpRsn_'] = EpRsn
# heterogeneous agent identification
StateRecall['_hete_type_'] = repeat_at(self.hete_type, 0, self.n_thread)
# select static/frontier actor network
StateRecall['_hete_pick_'], StateRecall['_gp_pick_'] = select_nets_for_shellenv(
n_types=self.n_hete_types,
policy=self.rl_functional.policy,
hete_type_list=self.hete_type,
n_thread = self.n_thread,
n_gp=AlgorithmConfig.hete_n_net_placeholder,
testing=StateRecall['Test-Flag']
)
            print([(t['win_rate'], t['ckpg_cnt']) for t in self.rl_functional.policy.ckpg_info])  # debug: league checkpoint win rates
# prepare observation for the real RL algorithm
I_StateRecall = {
'obs':obs[R],
'avail_act':self.avail_act[R],
'Test-Flag':StateRecall['Test-Flag'],
'_EpRsn_':StateRecall['_EpRsn_'][R],
'_hete_pick_':StateRecall['_hete_pick_'][R],
'_hete_type_':StateRecall['_hete_type_'][R],
'_gp_pick_':StateRecall['_gp_pick_'][R],
'threads_active_flag':R,
'Latest-Team-Info':StateRecall['Latest-Team-Info'][R],
}
# load available act to limit action space if possible
if self.AvailActProvided:
avail_act = np.array([info['avail-act'] for info in np.array(StateRecall['Latest-Team-Info'][R], dtype=object)])
I_StateRecall.update({'avail_act':avail_act})
# the real RL algorithm ! !
act_active, internal_recall = self.rl_functional.interact_with_env_genuine(I_StateRecall)
# get decision results
act[R] = act_active
# confirm actions are valid (satisfy 'avail-act')
if ShellEnvConfig.add_avail_act and self.patience>0:
self.patience -= 1
assert (gather_righthand(self.avail_act, repeat_at(act, -1, 1), check=False)[R]==1).all()
# translate action into ue4 tuple action
act_converted = np.array([[ self.action_converter.convert_act_arr(self.agent_type[agentid], act) for agentid, act in enumerate(th) ] for th in act])
# swap thread(batch) axis and agent axis
actions_list = np.swapaxes(act_converted, 0, 1)
# register callback hook
if not StateRecall['Test-Flag']:
StateRecall['_hook_'] = internal_recall['_hook_']
assert StateRecall['_hook_'] is not None
else:
if AlgorithmConfig.policy_matrix_testing:
StateRecall['_hook_'] = internal_recall['_hook_']
assert StateRecall['_hook_'] is not None
# all done
return actions_list, StateRecall
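# A brief note on the StateRecall protocol used above (summarized from this wrapper, not an
# official spec): 'Latest-Obs' carries raw observations, 'ENV-PAUSE' masks finished threads,
# 'Env-Suffered-Reset' marks freshly reset threads, 'Test-Flag' switches off training, and
# keys wrapped in underscores (e.g. '_hook_', '_EpRsn_') must stay aligned to the thread axis.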
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/stage_planner.py
================================================
import math
from .foundation import AlgorithmConfig
from UTIL.colorful import *
class PolicyRsnConfig:
resonance_start_at_update = 10
    yita_min_prob = 0.15 # should be >= (1/n_action)
    yita_max = 0.75
    yita_inc_per_update = 0.0075 # (reaches yita_max=0.75 after 100 updates with 'slow-inc')
freeze_critic = False
yita_shift_method = '-sin'
yita_shift_cycle = 1000
class StagePlanner:
def __init__(self, mcv) -> None:
if AlgorithmConfig.policy_resonance:
self.resonance_active = False
self.yita = 0
self.yita_min_prob = PolicyRsnConfig.yita_min_prob
self.freeze_body = False
self.update_cnt = 0
self.mcv = mcv
self.trainer = None
if AlgorithmConfig.wait_norm_stable:
self.wait_norm_stable_cnt = 2
else:
self.wait_norm_stable_cnt = 0
return
def is_resonance_active(self,):
return self.resonance_active
def is_body_freeze(self,):
return self.freeze_body
def get_yita(self):
return self.yita
def get_yita_min_prob(self):
return PolicyRsnConfig.yita_min_prob
def can_exec_trainning(self):
if self.wait_norm_stable_cnt > 0:
            print亮绿('waiting for the initial normalization to stabilize, skipping training!')
self.wait_norm_stable_cnt -= 1
return False
else:
return True
def update_plan(self):
self.update_cnt += 1
if AlgorithmConfig.policy_resonance:
if self.resonance_active:
self.when_pr_active()
elif not self.resonance_active:
self.when_pr_inactive()
return
def activate_pr(self):
self.resonance_active = True
self.freeze_body = True
if PolicyRsnConfig.freeze_critic:
self.trainer.freeze_body()
def when_pr_inactive(self):
assert not self.resonance_active
if PolicyRsnConfig.resonance_start_at_update >= 0:
            # meaning pr needs to be activated at some later update
if self.update_cnt > PolicyRsnConfig.resonance_start_at_update:
# time is up, activate pr
self.activate_pr()
# log
pr = 1 if self.resonance_active else 0
self.mcv.rec(pr, 'resonance')
self.mcv.rec(self.yita, 'self.yita')
def when_pr_active(self):
assert self.resonance_active
self._update_yita()
# log
pr = 1 if self.resonance_active else 0
self.mcv.rec(pr, 'resonance')
self.mcv.rec(self.yita, 'self.yita')
def _update_yita(self):
        '''
        update self.yita according to PolicyRsnConfig.yita_shift_method:
        '-cos'/'-sin' follow a periodic schedule; 'slow-inc' adds yita_inc_per_update per call
        '''
if PolicyRsnConfig.yita_shift_method == '-cos':
self.yita = PolicyRsnConfig.yita_max
t = -math.cos(2*math.pi/PolicyRsnConfig.yita_shift_cycle * self.update_cnt) * PolicyRsnConfig.yita_max
if t<=0:
self.yita = 0
else:
self.yita = t
print亮绿('yita update:', self.yita)
elif PolicyRsnConfig.yita_shift_method == '-sin':
self.yita = PolicyRsnConfig.yita_max
t = -math.sin(2*math.pi/PolicyRsnConfig.yita_shift_cycle * self.update_cnt) * PolicyRsnConfig.yita_max
if t<=0:
self.yita = 0
else:
self.yita = t
print亮绿('yita update:', self.yita)
elif PolicyRsnConfig.yita_shift_method == 'slow-inc':
self.yita += PolicyRsnConfig.yita_inc_per_update
if self.yita > PolicyRsnConfig.yita_max:
self.yita = PolicyRsnConfig.yita_max
print亮绿('yita update:', self.yita)
else:
assert False
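# A quick sanity check of the schedules above (hypothetical numbers): with 'slow-inc',
# yita_inc_per_update=0.0075 and yita_max=0.75, yita rises from 0 to 0.75 over 100 calls and
# then saturates; '-sin'/'-cos' instead oscillate with period yita_shift_cycle, and yita is
# forced to 0 whenever the waveform goes non-positive.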
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/hete_league_onenet_fix/trajectory.py
================================================
# cython: language_level=3
from config import GlobalConfig
import numpy as np
from .foundation import AlgorithmConfig
from ALGORITHM.common.traj import TRAJ_BASE
import copy
from UTIL.colorful import *
from UTIL.tensor_ops import __hash__, my_view, np_one_hot, np_repeat_at, np_softmax, scatter_with_nan
class trajectory(TRAJ_BASE):
def __init__(self, traj_limit, env_id):
super().__init__(traj_limit, env_id)
self.reference_track_name = 'value'
def early_finalize(self):
assert not self.readonly_lock # unfinished traj
self.need_reward_bootstrap = True
def set_terminal_obs(self, tobs):
self.tobs = copy.deepcopy(tobs)
def cut_tail(self):
        # trim the unused preallocated space
super().cut_tail()
TJ = lambda key: getattr(self, key)
        # further, remove all invalid time steps according to the NaNs along this trajectory
reference_track = getattr(self, self.reference_track_name)
if self.need_reward_bootstrap:
            assert False, ('execution should not reach here if everything goes as expected')
            # find the last position that is not NaN
T = np.where(~np.isnan(reference_track.squeeze()))[0][-1]
self.boot_strap_value = {
'bootstrap_value':TJ('value').squeeze()[T].copy(),
}
assert not hasattr(self,'tobs')
self.set_terminal_obs(TJ('g_obs')[T].copy())
reference_track[T] = np.nan
# deprecated if nothing in it
p_invalid = np.isnan(my_view(reference_track, [0, -1])).any(axis=-1)
p_valid = ~p_invalid
if p_invalid.all(): #invalid traj
self.deprecated_flag = True
return
# adjust reward position
reward = TJ('reward')
for i in reversed(range(self.time_pointer)):
if p_invalid[i] and i != 0: # invalid, push reward forward
reward[i-1] += reward[i]; reward[i] = np.nan
setattr(self, 'reward', reward)
# clip NaN
for key in self.key_dict: setattr(self, key, TJ(key)[p_valid])
# all done
return
def reward_push_forward(self, dead_mask):
# self.new_reward = self.reward.copy()
        if AlgorithmConfig.gamma_in_reward_forwarding:
            gamma = AlgorithmConfig.gamma_in_reward_forwarding_value
            # if dead_mask[i] is True, frame i is invalid for that agent: push its reward
            # one step forward (discounted by gamma) and zero out the invalid frame
            for i in reversed(range(self.time_pointer)):
                if i==0: continue
                self.reward[i-1] += np.where(dead_mask[i], self.reward[i]*gamma, 0)
                self.reward[i] = np.where(dead_mask[i], 0, self.reward[i])
        else:
            # same as above, but without discounting
            for i in reversed(range(self.time_pointer)):
                if i==0: continue
                self.reward[i-1] += np.where(dead_mask[i], self.reward[i], 0)
                self.reward[i] = np.where(dead_mask[i], 0, self.reward[i])
return
# new finalize
def finalize(self):
self.readonly_lock = True
assert not self.deprecated_flag
TJ = lambda key: getattr(self, key)
assert not np.isnan(TJ('reward')).any()
# deadmask
tmp = np.isnan(my_view(self.obs, [0,0,-1]))
dead_mask = tmp.all(-1)
# if (True): # check if the mask is correct
# dead_mask_self = np.isnan(my_view(self.obs, [0,0,-1])[:,:,0])
# assert (dead_mask==dead_mask_self).all()
# dead_mask2 = tmp.any(-1)
# assert (dead_mask==dead_mask2).all()
        self.reward_push_forward(dead_mask) # push rewards trapped in dead frames forward
threat = np.zeros(shape=dead_mask.shape) - 1
assert dead_mask.shape[0] == self.time_pointer
for i in reversed(range(self.time_pointer)):
            # threat[:(i+1)] does not include threat[(i+1)]
            if i+1 < self.time_pointer:
                threat[:(i+1)] += (~(dead_mask[i+1]&dead_mask[i])).astype(int)
            elif i+1 == self.time_pointer:
                threat[:] += (~dead_mask[i]).astype(int)
SAFE_LIMIT = 11
threat = np.clip(threat, -1, SAFE_LIMIT)
setattr(self, 'threat', np.expand_dims(threat, -1))
# ! Use GAE to calculate return
self.gae_finalize_return(reward_key='reward', value_key='value', new_return_name='return')
return
def gae_finalize_return(self, reward_key, value_key, new_return_name):
# ------- gae parameters -------
gamma = AlgorithmConfig.gamma
tau = AlgorithmConfig.tau
# ------- -------------- -------
rewards = getattr(self, reward_key)
value = getattr(self, value_key)
length = rewards.shape[0]
assert rewards.shape[0]==value.shape[0]
# if dimension not aligned
if rewards.ndim == value.ndim-1: rewards = np.expand_dims(rewards, -1)
        # initialize an additional track for the computed return
setattr(self, new_return_name, np.zeros_like(value))
self.key_dict.append(new_return_name)
returns = getattr(self, new_return_name)
boot_strap = 0 if not self.need_reward_bootstrap else self.boot_strap_value['bootstrap_'+value_key]
for step in reversed(range(length)):
            if step==(length-1): # last frame
value_preds_delta = rewards[step] + gamma * boot_strap - value[step]
gae = value_preds_delta
else:
value_preds_delta = rewards[step] + gamma * value[step + 1] - value[step]
gae = value_preds_delta + gamma * tau * gae
returns[step] = gae + value[step]
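    # GAE recap (matching the loop above): delta_t = r_t + gamma*V_{t+1} - V_t,
    # A_t = delta_t + gamma*tau*A_{t+1} (the last step uses the bootstrap value), and the
    # stored 'return' is A_t + V_t, i.e. the lambda-return target used by the value loss.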
class TrajPoolManager(object):
def __init__(self):
self.cnt = 0
def absorb_finalize_pool(self, pool):
for traj_handle in pool:
traj_handle.cut_tail()
pool = list(filter(lambda traj: not traj.deprecated_flag, pool))
for traj_handle in pool: traj_handle.finalize()
self.cnt += 1
task = ['train']
return task, pool
'''
Trajectory pool management
'''
class TrajManagerBase(object):
def __init__(self, n_env, traj_limit):
self.n_env = n_env
self.traj_limit = traj_limit
self.update_cnt = 0
self.traj_pool = []
self.registered_keys = []
self.live_trajs = [trajectory(self.traj_limit, env_id=i) for i in range(self.n_env)]
self.live_traj_frame = [0 for _ in range(self.n_env)]
self._traj_lock_buf = None
self.patience = 1000
pass
def __check_integraty(self, traj_frag):
if self.patience < 0:
return # stop wasting time checking this
self.patience -= 1
for key in traj_frag:
if key not in self.registered_keys and (not key.startswith('_')):
self.registered_keys.append(key)
for key in self.registered_keys:
assert key in traj_frag, ('this key sometimes disappears from the traj_frag:', key)
def batch_update(self, traj_frag):
self.__check_integraty(traj_frag)
done = traj_frag['_DONE_']; traj_frag.pop('_DONE_') # done flag
skip = traj_frag['_SKIP_']; traj_frag.pop('_SKIP_') # skip/frozen flag
tobs = traj_frag['_TOBS_']; traj_frag.pop('_TOBS_') # terminal obs
# single bool to list bool
if isinstance(done, bool): done = [done for i in range(self.n_env)]
if isinstance(skip, bool): skip = [skip for i in range(self.n_env)]
n_active = sum(~skip)
# feed
cnt = 0
for env_i in range(self.n_env):
if skip[env_i]: continue
# otherwise
frag_index = cnt; cnt += 1
env_index = env_i
traj_handle = self.live_trajs[env_index]
for key in traj_frag:
self.traj_remember(traj_handle, key=key, content=traj_frag[key],frag_index=frag_index, n_active=n_active)
self.live_traj_frame[env_index] += 1
traj_handle.time_shift()
if done[env_i]:
assert tobs[env_i] is not None # get the final obs
traj_handle.set_terminal_obs(tobs[env_i])
self.traj_pool.append(traj_handle)
self.live_trajs[env_index] = trajectory(self.traj_limit, env_id=env_index)
self.live_traj_frame[env_index] = 0
def traj_remember(self, traj, key, content, frag_index, n_active):
if content is None: traj.remember(key, None)
elif isinstance(content, dict):
for sub_key in content:
self.traj_remember(traj, "".join((key , ">" , sub_key)), content=content[sub_key], frag_index=frag_index, n_active=n_active)
else:
assert n_active == len(content), ('length error')
traj.remember(key, content[frag_index]) # *
class BatchTrajManager(TrajManagerBase):
def __init__(self, n_env, traj_limit, trainer_hook):
super().__init__(n_env, traj_limit)
self.trainer_hook = trainer_hook
self.traj_limit = traj_limit
self.train_traj_needed = AlgorithmConfig.train_traj_needed
self.pool_manager = TrajPoolManager()
def update(self, traj_frag, index):
assert traj_frag is not None
for j, env_i in enumerate(index):
traj_handle = self.live_trajs[env_i]
for key in traj_frag:
if traj_frag[key] is None:
assert False, key
                if isinstance(traj_frag[key], dict): # nested dict needs special handling
for sub_key in traj_frag[key]:
content = traj_frag[key][sub_key][j]
traj_handle.remember(key + ">" + sub_key, content)
else:
content = traj_frag[key][j]
traj_handle.remember(key, content)
self.live_traj_frame[env_i] += 1
traj_handle.time_shift()
return
    # entry point (called each frame to feed trajectory data)
def feed_traj_framedata(self, traj_frag, require_hook=False):
# an unlock hook must be executed before new trajectory feed in
assert self._traj_lock_buf is None
if require_hook:
# the traj_frag is not intact, lock up traj_frag, wait for more
assert '_SKIP_' in traj_frag
assert '_DONE_' not in traj_frag
assert 'reward' not in traj_frag
self._traj_lock_buf = traj_frag
return self.unlock_fn
else:
assert '_DONE_' in traj_frag
assert '_SKIP_' in traj_frag
self.batch_update(traj_frag=traj_frag)
return
def clear_traj_pool(self):
print('do update %d'%self.update_cnt)
_, self.traj_pool = self.pool_manager.absorb_finalize_pool(pool=self.traj_pool)
self.traj_pool = []
# self.update_cnt += 1
# assert ppo_update_cnt == self.update_cnt
return self.update_cnt
def train_and_clear_traj_pool(self):
print('do update %d'%self.update_cnt)
current_task_l, self.traj_pool = self.pool_manager.absorb_finalize_pool(pool=self.traj_pool)
for current_task in current_task_l:
ppo_update_cnt = self.trainer_hook(self.traj_pool, current_task)
self.traj_pool = []
self.update_cnt += 1
# assert ppo_update_cnt == self.update_cnt
return self.update_cnt
def can_exec_training(self):
if len(self.traj_pool) >= self.train_traj_needed: return True
else: return False
def unlock_fn(self, traj_frag):
assert self._traj_lock_buf is not None
traj_frag.update(self._traj_lock_buf)
self._traj_lock_buf = None
assert '_DONE_' in traj_frag
assert '_SKIP_' in traj_frag
self.batch_update(traj_frag=traj_frag)
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/my_ai/foundation.py
================================================
import numpy as np
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__
from config import GlobalConfig
from MISSION.uhmap.actionset import strActionToDigits, ActDigitLen
class AlgorithmConfig:
preserve = ''
class ReinforceAlgorithmFoundation(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
for env_index in range(self.n_thread):
for agent_index in range(self.n_agent):
if np.random.rand() < 0.5:
color_index = np.random.randint(low=0, high=4)
actions[env_index, agent_index] = strActionToDigits(f'ActionSetDemo::ChangeColor;{color_index}')
else:
uid = 11 if agent_index % 2 == 0 else 10
actions[env_index, agent_index] = strActionToDigits(f'ActionSetDemo::FireToWaterdrop;{uid}')
StateRecall['_hook_'] = None
return actions, StateRecall
class DiscreteRLFoundation(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.action_list = [
strActionToDigits('ActionSetDemo::ChangeColor;0'),
strActionToDigits('ActionSetDemo::ChangeColor;1'),
strActionToDigits('ActionSetDemo::ChangeColor;2'),
strActionToDigits('ActionSetDemo::ChangeColor;3'),
strActionToDigits('ActionSetDemo::FireToWaterdrop;10'),
strActionToDigits('ActionSetDemo::FireToWaterdrop;11'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=1.0 Y=0.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=1.0 Y=1.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=0.0 Y=1.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=-1.0 Y=1.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=-1.0 Y=0.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=-1.0 Y=-1.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=0.0 Y=-1.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=1.0 Y=-1.0 Z=0.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=0.0 Y=0.0 Z=1.0'),
strActionToDigits('ActionSetDemo::MoveToDirection;X=0.0 Y=0.0 Z=-1.0'),
]
self.how_many_actions = len(self.action_list)
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
for env_index in range(self.n_thread):
for agent_index in range(self.n_agent):
action_x = np.random.randint(low=0,high=self.how_many_actions)
actions[env_index, agent_index] = self.action_list[action_x]
StateRecall['_hook_'] = None
return actions, StateRecall
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/ccategorical.py
================================================
from torch.distributions.categorical import Categorical
import torch
from .foundation import AlgorithmConfig
from UTIL.tensor_ops import repeat_at, _2tensor
from torch.distributions import kl_divergence
EPS = 1e-9
# yita = p_hit = 0.14
def random_process(probs, rsn_flag):
yita = AlgorithmConfig.yita
with torch.no_grad():
max_place = probs.argmax(-1, keepdims=True)
mask_max = torch.zeros_like(probs).scatter_(-1, max_place, 1).bool()
pmax = probs[mask_max]
if rsn_flag:
assert max_place.shape[-1] == 1
return max_place.squeeze(-1)
else:
            # suppress the max-prob action: reduce its probability from pmax to p_hat
            p_hat = pmax + (pmax-1)/(1/yita-1)
            k = 1/(1-yita)
            # in-place writes below: 'probs' must be a clone, see CCategorical.sample
            probs *= k
            probs[mask_max] = p_hat
# print(probs)
dist = Categorical(probs=probs)
samp = dist.sample()
assert samp.shape[-1] != 1
return samp
def random_process_allow_big_yita(probs, rsn_flag):
yita = AlgorithmConfig.yita
with torch.no_grad():
max_place = probs.argmax(-1, keepdims=True)
mask_max = torch.zeros_like(probs).scatter_(-1, max_place, 1).bool()
pmax = probs[mask_max].reshape(max_place.shape) #probs[max_place].clone()
if rsn_flag:
assert max_place.shape[-1] == 1
return max_place.squeeze(-1)
else:
            # forbid the max-prob action from being chosen deterministically
            # pmax = probs.max(axis=-1) #probs[max_place].clone()
yita_arr = torch.ones_like(pmax)*yita
yita_arr_clip = torch.minimum(pmax, yita_arr)
# p_hat = pmax + (pmax-1) / (1/yita_arr_clip-1) + 1e-10
p_hat = (pmax-yita_arr_clip)/(1-yita_arr_clip)
k = 1/(1-yita_arr_clip)
probs *= k
probs[mask_max] = p_hat.reshape(-1)
# print(probs)
dist = Categorical(probs=probs)
samp = dist.sample()
assert samp.shape[-1] != 1
return samp #.squeeze(-1)
def random_process_with_clamp3(probs, yita, yita_min_prob, rsn_flag):
with torch.no_grad():
max_place = probs.argmax(-1, keepdims=True)
mask_max = torch.zeros_like(probs).scatter_(dim=-1, index=max_place, value=1).bool()
pmax = probs[mask_max].reshape(max_place.shape)
# act max
assert max_place.shape[-1] == 1
act_max = max_place.squeeze(-1)
# act samp
yita_arr = torch.ones_like(pmax)*yita
# p_hat = pmax + (pmax-1) / (1/yita_arr_clip-1) + 1e-10
p_hat = (pmax-yita_arr)/((1-yita_arr)+EPS)
p_hat = p_hat.clamp(min=yita_min_prob)
k = (1-p_hat)/((1-pmax)+EPS)
probs *= k
probs[mask_max] = p_hat.reshape(-1)
dist = Categorical(probs=probs)
act_samp = dist.sample()
# assert act_samp.shape[-1] != 1
hit_e = _2tensor(rsn_flag)
return torch.where(hit_e, act_max, act_samp)
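# Sketch of the math in random_process_with_clamp3 (derived from the code above): the
# probability of the argmax action is lowered from pmax to p_hat=(pmax-yita)/(1-yita)
# (floored at yita_min_prob), and all other probabilities are rescaled by
# k=(1-p_hat)/(1-pmax) so the distribution still sums to 1. E.g. pmax=0.9, yita=0.5
# -> p_hat=0.8, and the remaining 0.1 mass is scaled by k=2 to fill the freed 0.1.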
class CCategorical():
def __init__(self, planner):
self.planner = planner
pass
def sample(self, dist, eprsn):
probs = dist.probs.clone()
return random_process_with_clamp3(probs, self.planner.yita, self.planner.yita_min_prob, eprsn)
def register_rsn(self, rsn_flag):
self.rsn_flag = rsn_flag
def feed_logits(self, logits):
try:
return Categorical(logits=logits)
        except:
            print('[ccategorical.py] failed to construct Categorical from logits')
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/cython_func.pyx
================================================
import numpy as np
cimport numpy as np
cimport cython
from cython.parallel import prange
np.import_array()
ctypedef fused DTYPE_float:
np.float32_t
np.float64_t
ctypedef fused DTYPE_int64_t:
np.int64_t
    np.int32_t # for compatibility with Windows
ctypedef np.uint8_t DTYPE_bool_t
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def roll_hisory( DTYPE_float[:,:,:,:] obs_feed_new,
DTYPE_float[:,:,:,:] prev_obs_feed,
DTYPE_bool_t[:,:,:] valid_mask,
DTYPE_int64_t[:,:] N_valid,
DTYPE_float[:,:,:,:] next_his_pool):
# how many threads
cdef Py_ssize_t vmax = N_valid.shape[0]
# how many agents
cdef Py_ssize_t wmax = N_valid.shape[1]
# how many entity subjects (including self @0)
cdef Py_ssize_t max_obs_entity = obs_feed_new.shape[2]
cdef int n_v, th, a, t, k, pointer
for th in prange(vmax, nogil=True):
# for each thread range -> prange
for a in prange(wmax):
# for each agent
pointer = 0
# step 1 fill next_his_pool[0 ~ (nv-1)] with obs_feed_new[0 ~ max_obs_entity-1]
for k in range(max_obs_entity):
if valid_mask[th,a,k]:
next_his_pool[th,a, pointer] = obs_feed_new[th,a, k]
pointer = pointer + 1
# step 2 fill next_his_pool[nv ~ (max_obs_entity-1)] with prev_obs_feed[0 ~ (max_obs_entity-1-nv)]
n_v = N_valid[th,a]
for k in range(n_v, max_obs_entity):
next_his_pool[th,a, k] = prev_obs_feed[th,a, k-n_v]
return np.asarray(next_his_pool)
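# For reference, a pure-NumPy sketch of the same rolling-history update (an illustrative
# equivalent of the prange version above, not part of the compiled module): per thread/agent,
# the valid rows of the new observation are placed first, and the remaining slots are
# back-filled from the previous history buffer.
#
# def roll_history_numpy(obs_feed_new, prev_obs_feed, valid_mask, N_valid, next_his_pool):
#     vmax, wmax, max_obs_entity = N_valid.shape[0], N_valid.shape[1], obs_feed_new.shape[2]
#     for th in range(vmax):
#         for a in range(wmax):
#             pointer = 0
#             for k in range(max_obs_entity):
#                 if valid_mask[th, a, k]:
#                     next_his_pool[th, a, pointer] = obs_feed_new[th, a, k]
#                     pointer += 1
#             n_v = N_valid[th, a]
#             next_his_pool[th, a, n_v:] = prev_obs_feed[th, a, :max_obs_entity - n_v]
#     return next_his_pool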
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/div_tree.py
================================================
import torch
import torch.nn as nn
import numpy as np
from ALGORITHM.common.mlp import LinearFinal
from UTIL.tensor_ops import add_onehot_id_at_last_dim, add_onehot_id_at_last_dim_fixlen, repeat_at, _2tensor, gather_righthand, scatter_righthand
class DivTree(nn.Module): # merge by MLP version
def __init__(self, input_dim, h_dim, n_action):
super().__init__()
# to design a division tree, I need to get the total number of agents
from .foundation import AlgorithmConfig
self.n_agent = AlgorithmConfig.n_agent
self.div_tree = get_division_tree(self.n_agent)
self.n_level = len(self.div_tree)
self.max_level = len(self.div_tree) - 1
self.current_level = 0
self.init_level = AlgorithmConfig.div_tree_init_level
if self.init_level < 0:
self.init_level = self.max_level
self.current_level_floating = 0.0
get_net = lambda: nn.Sequential(
nn.Linear(h_dim+self.n_agent, h_dim),
nn.ReLU(inplace=True),
LinearFinal(h_dim, n_action)
)
        # Note: this does NOT define a separate net per agent from the start.
        # Instead, all agents start from self.nets[0] and split level by level.
self.nets = torch.nn.ModuleList(modules=[
get_net() for i in range(self.n_agent)
])
def set_to_init_level(self, auto_transfer=True):
if self.init_level!=self.current_level:
for i in range(self.current_level, self.init_level):
self.change_div_tree_level(i+1, auto_transfer)
def change_div_tree_level(self, level, auto_transfer=True):
print('performing div tree level change (%d -> %d/%d) \n'%(self.current_level, level, self.max_level))
self.current_level = level
self.current_level_floating = level
assert len(self.div_tree) > self.current_level, ('Reach max level already!')
if not auto_transfer: return
transfer_list = []
for i in range(self.n_agent):
previous_net_index = self.div_tree[self.current_level-1, i]
post_net_index = self.div_tree[self.current_level, i]
if post_net_index!=previous_net_index:
transfer = (previous_net_index, post_net_index)
if transfer not in transfer_list:
transfer_list.append(transfer)
for transfer in transfer_list:
from_which_net = transfer[0]
to_which_net = transfer[1]
self.nets[to_which_net].load_state_dict(self.nets[from_which_net].state_dict())
print('transfering model parameters from %d-th net to %d-th net'%(from_which_net, to_which_net))
return
def forward(self, x_in, agent_ids): # x0: shape = (?,...,?, n_agent, core_dim)
if self.current_level == 0:
x0 = add_onehot_id_at_last_dim_fixlen(x_in, fixlen=self.n_agent, agent_ids=agent_ids)
x2 = self.nets[0](x0)
return x2, None
else:
x0 = add_onehot_id_at_last_dim_fixlen(x_in, fixlen=self.n_agent, agent_ids=agent_ids)
res = []
for i in range(self.n_agent):
use_which_net = self.div_tree[self.current_level, i]
res.append(self.nets[use_which_net](x0[..., i, :]))
x2 = torch.stack(res, -2)
# x22 = self.nets[0](x1)
return x2, None
# def forward_try_parallel(self, x0): # x0: shape = (?,...,?, n_agent, core_dim)
# x1 = self.shared_net(x0)
# stream = []
# res = []
# for i in range(self.n_agent):
# stream.append(torch.cuda.Stream())
# torch.cuda.synchronize()
# for i in range(self.n_agent):
# use_which_net = self.div_tree[self.current_level, i]
# with torch.cuda.stream(stream[i]):
# res.append(self.nets[use_which_net](x1[..., i, :]))
# print(res[i])
# # s1 = torch.cuda.Stream()
# # s2 = torch.cuda.Stream()
# # # Wait for the above tensors to initialise.
# # torch.cuda.synchronize()
# # with torch.cuda.stream(s1):
# # C = torch.mm(A, A)
# # with torch.cuda.stream(s2):
# # D = torch.mm(B, B)
# # Wait for C and D to be computed.
# torch.cuda.synchronize()
# # Do stuff with C and D.
# x2 = torch.stack(res, -2)
# return x2
def _2div(arr):
arr_res = arr.copy()
arr_pieces = []
pa = 0
st = 0
needdivcnt = 0
for i, a in enumerate(arr):
if a!=pa:
arr_pieces.append([st, i])
if (i-st)!=1: needdivcnt+=1
pa = a
st = i
arr_pieces.append([st, len(arr)])
if (len(arr)-st)!=1: needdivcnt+=1
offset = range(len(arr_pieces), len(arr_pieces)+needdivcnt)
p=0
for arr_p in arr_pieces:
length = arr_p[1] - arr_p[0]
if length == 1: continue
half_len = int(np.ceil(length / 2))
for j in range(arr_p[0]+half_len, arr_p[1]):
try:
arr_res[j] = offset[p]
except:
                print('[div_tree.py] offset index out of range while splitting')
p+=1
return arr_res
def get_division_tree(n_agents):
agent2divitreeindex = np.arange(n_agents)
np.random.shuffle(agent2divitreeindex)
max_div = np.ceil(np.log2(n_agents)).astype(int)
levels = np.zeros(shape=(max_div+1, n_agents), dtype=int)
for ith, level in enumerate(levels):
if ith == 0: continue
res = _2div(levels[ith-1,:])
levels[ith,:] = res
res_levels = levels.copy()
for i, div_tree_index in enumerate(agent2divitreeindex):
res_levels[:, i] = levels[:, div_tree_index]
return res_levels
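# Worked example (hypothetical, before the random shuffle of columns): for n_agents=4
#   level 0: [0, 0, 0, 0]   (all agents share net 0)
#   level 1: [0, 0, 1, 1]   (the second half splits off to net 1)
#   level 2: [0, 2, 1, 3]   (each pair splits again)
# so raising the level gradually moves agents from fully shared to fully individual nets.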
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/foundation.py
================================================
import os, time, torch, traceback, shutil
import numpy as np
from UTIL.colorful import *
from config import GlobalConfig
from UTIL.tensor_ops import repeat_at
from ALGORITHM.common.rl_alg_base import RLAlgorithmBase
class AlgorithmConfig:
'''
AlgorithmConfig: This config class will be 'injected' with new settings from json.
(E.g., override configs with ```python main.py --cfg example.jsonc```)
(please see UTIL.config_args to find out how this advanced trick works out.)
'''
# configuration, open to jsonc modification
gamma = 0.99
tau = 0.95
train_traj_needed = 512
TakeRewardAsUnity = False
use_normalization = True
add_prob_loss = False
n_entity_placeholder = 10
load_checkpoint = False
load_specific_checkpoint = ''
# PPO part
clip_param = 0.2
ppo_epoch = 16
n_pieces_batch_division = 1
value_loss_coef = 0.1
entropy_coef = 0.05
max_grad_norm = 0.5
lr = 1e-4
    # sometimes the episode length gets longer, resulting in more samples and causing
    # GPU OOM; prevent this by capping the number of samples at the initial level,
    # randomly sampling and dropping the excess
prevent_batchsize_oom = False
gamma_in_reward_forwarding = False
gamma_in_reward_forwarding_value = 0.99
net_hdim = 24
dual_conc = True
n_entity_placeholder = 'auto load, do not change'
n_agent = 'auto load, do not change'
entity_distinct = 'auto load, do not change'
ConfigOnTheFly = True
policy_resonance = False
use_avail_act = True
debug = False
def str_array_to_num(str_arr):
    out_arr = []
    buffer = {}
    for s in str_arr:  # 's' instead of 'str' to avoid shadowing the builtin
        if s not in buffer:
            buffer[s] = len(buffer)
        out_arr.append(buffer[s])
    return out_arr
def itemgetter(*items):
# same with operator.itemgetter
def g(obj): return tuple(obj[item] if item in obj else None for item in items)
return g
class ReinforceAlgorithmFoundation(RLAlgorithmBase):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .shell_env import ShellEnvWrapper, ActionConvertLegacy
from .net import Net
super().__init__(n_agent, n_thread, space, mcv, team)
AlgorithmConfig.n_agent = n_agent
# change obs format, e.g., converting dead agent obs into NaN
self.shell_env = ShellEnvWrapper(n_agent, n_thread, space, mcv, self, AlgorithmConfig, GlobalConfig.ScenarioConfig, self.team)
n_actions = len(self.shell_env.action_converter.dictionary_args)
if self.ScenarioConfig.EntityOriented:
AlgorithmConfig.n_entity_placeholder = GlobalConfig.ScenarioConfig.obs_n_entity
rawob_dim = self.ScenarioConfig.obs_vec_length
else: rawob_dim = space['obs_space']['obs_shape']
# self.StagePlanner, for policy resonance
from .stage_planner import StagePlanner
self.stage_planner = StagePlanner(mcv=mcv)
# initialize policy
self.policy = Net(rawob_dim=rawob_dim, n_action=n_actions, stage_planner=self.stage_planner)
self.policy = self.policy.to(self.device)
# initialize optimizer and trajectory (batch) manager
from .ppo import PPO
from .trajectory import BatchTrajManager
self.trainer = PPO(self.policy, ppo_config=AlgorithmConfig, mcv=mcv)
self.traj_manager = BatchTrajManager(
n_env=n_thread, traj_limit=int(GlobalConfig.ScenarioConfig.MaxEpisodeStep),
trainer_hook=self.trainer.train_on_traj)
self.stage_planner.trainer = self.trainer
# confirm that reward method is correct
self.check_reward_type(AlgorithmConfig)
# load checkpoints if needed
self.load_model(AlgorithmConfig)
# enable config_on_the_fly ability
if AlgorithmConfig.ConfigOnTheFly:
self._create_config_fly()
def action_making(self, StateRecall, test_mode):
# make sure hook is cleared
assert ('_hook_' not in StateRecall)
# read obs
obs, threads_active_flag, avail_act, eprsn = \
itemgetter('obs', 'threads_active_flag', 'avail_act', '_EpRsn_')(StateRecall)
# make sure obs is right
assert obs is not None, ('Make sure obs is ok')
assert len(obs) == sum(threads_active_flag), ('check batch size')
# make sure avail_act is correct
if AlgorithmConfig.use_avail_act: assert avail_act is not None
# policy resonance flag reshape
eprsn = repeat_at(eprsn, -1, self.n_agent)
thread_index = np.arange(self.n_thread)[threads_active_flag]
# make decision
with torch.no_grad():
action, value, action_log_prob = self.policy.act(obs=obs,
test_mode=test_mode,
avail_act=avail_act,
eprsn=eprsn,
)
# commit obs to buffer, vars named like _x_ are aligned, others are not!
traj_framefrag = {
"_SKIP_": ~threads_active_flag,
"value": value,
"avail_act": avail_act,
"actionLogProb": action_log_prob,
"obs": obs,
"action": action,
}
if avail_act is not None: traj_framefrag.update({'avail_act': avail_act})
# deal with rollout later when the reward is ready, leave a hook as a callback here
if not test_mode: StateRecall['_hook_'] = self.commit_traj_frag(traj_framefrag, req_hook = True)
return action.copy(), StateRecall
def interact_with_env(self, StateRecall):
'''
Interfacing with marl, standard method that you must implement
(redirect to shell_env to help with history rolling)
'''
return self.shell_env.interact_with_env(StateRecall)
def interact_with_env_genuine(self, StateRecall):
'''
When shell_env finish the preparation, interact_with_env_genuine is called
        (determine whether or not to run a training routine)
'''
# if not StateRecall['Test-Flag']: self.train() # when needed, train!
return self.action_making(StateRecall, StateRecall['Test-Flag'])
def train(self):
'''
        Get the training event from the hmp task runner; train and clear the trajectory pool when ready.
'''
if self.traj_manager.can_exec_training():
if self.stage_planner.can_exec_trainning():
self.traj_manager.train_and_clear_traj_pool()
else:
self.traj_manager.clear_traj_pool()
# read configuration
if AlgorithmConfig.ConfigOnTheFly: self._config_on_fly()
#
self.stage_planner.update_plan()
def save_model(self, update_cnt, info=None):
        '''
        Save the model. Saving is triggered when:
            1. update_cnt reaches 50, 100, ...
            2. info is given, indicating an hmp command
            3. a flag file is detected, indicating a save command from a human
        '''
if not os.path.exists('%s/history_cpt/' % GlobalConfig.logdir):
os.makedirs('%s/history_cpt/' % GlobalConfig.logdir)
# dir 1
pt_path = '%s/model.pt' % GlobalConfig.logdir
print绿('saving model to %s' % pt_path)
torch.save({
'policy': self.policy.state_dict(),
'optimizer': self.trainer.optimizer.state_dict(),
}, pt_path)
# dir 2
info = str(update_cnt) if info is None else ''.join([str(update_cnt), '_', info])
pt_path2 = '%s/history_cpt/model_%s.pt' % (GlobalConfig.logdir, info)
shutil.copyfile(pt_path, pt_path2)
print绿('save_model fin')
def load_model(self, AlgorithmConfig):
'''
load model now
'''
if AlgorithmConfig.load_checkpoint:
manual_dir = AlgorithmConfig.load_specific_checkpoint
ckpt_dir = '%s/model.pt' % GlobalConfig.logdir if manual_dir == '' else '%s/%s' % (GlobalConfig.logdir, manual_dir)
cuda_n = 'cpu' if 'cpu' in self.device else self.device
strict = True
cpt = torch.load(ckpt_dir, map_location=cuda_n)
self.policy.load_state_dict(cpt['policy'], strict=strict)
# https://github.com/pytorch/pytorch/issues/3852
self.trainer.optimizer.load_state_dict(cpt['optimizer'])
print黄('loaded checkpoint:', ckpt_dir)
def process_framedata(self, traj_framedata):
'''
hook is called when reward and next moment observation is ready,
now feed them into trajectory manager.
        Rollout processor: prepare the rollout for commit; keys wrapped in underscores must stay aligned to (self.n_thread, ...)
note that keys starting with _ must have shape (self.n_thread, ...), details see fn:mask_paused_env()
'''
# strip info, since it is not array
items_to_pop = ['info', 'Latest-Obs']
for k in items_to_pop:
if k in traj_framedata:
traj_framedata.pop(k)
        # when RewardAsUnity, the team shares a single reward, so broadcast it to every agent
if self.ScenarioConfig.RewardAsUnity:
traj_framedata['reward'] = repeat_at(traj_framedata['reward'], insert_dim=-1, n_times=self.n_agent)
# change the name of done to be recognised (by trajectory manager)
traj_framedata['_DONE_'] = traj_framedata.pop('done')
traj_framedata['_TOBS_'] = traj_framedata.pop(
'Terminal-Obs-Echo') if 'Terminal-Obs-Echo' in traj_framedata else None
# mask out pause thread
traj_framedata = self.mask_paused_env(traj_framedata)
# put the frag into memory
self.traj_manager.feed_traj_framedata(traj_framedata)
def mask_paused_env(self, frag):
running = ~frag['_SKIP_']
if running.all():
return frag
for key in frag:
if not key.startswith('_') and hasattr(frag[key], '__len__') and len(frag[key]) == self.n_thread:
frag[key] = frag[key][running]
return frag
def _create_config_fly(self):
logdir = GlobalConfig.logdir
self.input_file_dir = '%s/cmd_io.txt' % logdir
if not os.path.exists(self.input_file_dir):
with open(self.input_file_dir, 'w+', encoding='utf8') as f: f.writelines(["# Write cmd at next line: ", ""])
def _config_on_fly(self):
if not os.path.exists(self.input_file_dir): return
with open(self.input_file_dir, 'r', encoding='utf8') as f:
cmdlines = f.readlines()
cmdlines_writeback = []
any_change = False
for cmdline in cmdlines:
if cmdline.startswith('#') or cmdline=="\n" or cmdline==" \n":
cmdlines_writeback.append(cmdline)
else:
any_change = True
try:
print亮绿('[foundation.py] ------- executing: %s ------'%cmdline)
exec(cmdline)
cmdlines_writeback.append('# [execute successfully]\t'+cmdline)
except:
print红(traceback.format_exc())
cmdlines_writeback.append('# [execute failed]\t'+cmdline)
if any_change:
with open(self.input_file_dir, 'w+', encoding='utf8') as f:
f.writelines(cmdlines_writeback)
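# Usage sketch for the config-on-the-fly mechanism above (hypothetical command): append a
# line such as
#     AlgorithmConfig.prevent_batchsize_oom = True
# to <logdir>/cmd_io.txt while training runs; _config_on_fly() will exec() it at the next
# training step and write the line back marked as executed.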
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/net.py
================================================
import torch, math, copy
import numpy as np
import torch.nn as nn
from torch.distributions.categorical import Categorical
from UTIL.colorful import print亮绿
from UTIL.tensor_ops import Args2tensor_Return2numpy, Args2tensor, __hashn__, my_view
from UTIL.tensor_ops import pt_inf
from UTIL.exp_helper import changed
from .ccategorical import CCategorical
from .foundation import AlgorithmConfig
from ALGORITHM.common.attention import SimpleAttention
from ALGORITHM.common.norm import DynamicNormFix
from ALGORITHM.common.net_manifest import weights_init
"""
network initialization
"""
class Net(nn.Module):
def __init__(self, rawob_dim, n_action, **kwargs):
super().__init__()
self.update_cnt = nn.Parameter(
torch.zeros(1, requires_grad=False, dtype=torch.long), requires_grad=False)
self.use_normalization = AlgorithmConfig.use_normalization
self.use_policy_resonance = AlgorithmConfig.policy_resonance
self.n_action = n_action
if self.use_policy_resonance:
self.ccategorical = CCategorical(kwargs['stage_planner'])
self.is_resonance_active = lambda: kwargs['stage_planner'].is_resonance_active()
h_dim = AlgorithmConfig.net_hdim
# observation normalization
if self.use_normalization:
self._batch_norm = DynamicNormFix(rawob_dim, only_for_last_dim=True, exclude_one_hot=True, exclude_nan=True)
n_entity = AlgorithmConfig.n_entity_placeholder
# # # # # # # # # # actor-critic share # # # # # # # # # # # #
self.obs_encoder = nn.Sequential(nn.Linear(rawob_dim, h_dim), nn.ReLU(inplace=True), nn.Linear(h_dim, h_dim))
self.attention_layer = SimpleAttention(h_dim=h_dim)
# # # # # # # # # # actor # # # # # # # # # # # #
_size = n_entity * h_dim
self.policy_head = nn.Sequential(
nn.Linear(_size, h_dim), nn.ReLU(inplace=True),
nn.Linear(h_dim, h_dim//2), nn.ReLU(inplace=True),
nn.Linear(h_dim//2, self.n_action))
# # # # # # # # # # critic # # # # # # # # # # # #
_size = n_entity * h_dim
self.ct_encoder = nn.Sequential(nn.Linear(_size, h_dim), nn.ReLU(inplace=True), nn.Linear(h_dim, h_dim))
self.ct_attention_layer = SimpleAttention(h_dim=h_dim)
self.get_value = nn.Sequential(nn.Linear(h_dim, h_dim), nn.ReLU(inplace=True),nn.Linear(h_dim, 1))
self.is_recurrent = False
self.apply(weights_init)
return
@Args2tensor_Return2numpy
def act(self, *args, **kargs):
return self._act(*args, **kargs)
@Args2tensor
def evaluate_actions(self, *args, **kargs):
return self._act(*args, **kargs, eval_mode=True)
def _act(self, obs=None, test_mode=None, eval_mode=False, eval_actions=None, avail_act=None, agent_ids=None, eprsn=None):
eval_act = eval_actions if eval_mode else None
others = {}
if self.use_normalization:
if torch.isnan(obs).all(): pass
else: obs = self._batch_norm(obs, freeze=(eval_mode or test_mode))
mask_dead = torch.isnan(obs).any(-1)
obs = torch.nan_to_num_(obs, 0) # replace dead agents' obs, from NaN to 0
# # # # # # # # # # actor-critic share # # # # # # # # # # # #
baec = self.obs_encoder(obs)
baec = self.attention_layer(k=baec,q=baec,v=baec, mask=mask_dead)
# # # # # # # # # # actor # # # # # # # # # # # #
at_bac = my_view(baec,[0,0,-1])
logits = self.policy_head(at_bac)
# choose action selector
logit2act = self._logit2act_rsn if self.use_policy_resonance and self.is_resonance_active() else self._logit2act
# apply action selector
act, actLogProbs, distEntropy, probs = logit2act( logits,
eval_mode=eval_mode,
test_mode=test_mode,
eval_actions=eval_act,
avail_act=avail_act,
eprsn=eprsn)
# # # # # # # # # # critic # # # # # # # # # # # #
ct_bac = my_view(baec,[0,0,-1])
ct_bac = self.ct_encoder(ct_bac)
ct_bac = self.ct_attention_layer(k=ct_bac,q=ct_bac,v=ct_bac)
value = self.get_value(ct_bac)
if not eval_mode: return act, value, actLogProbs
else: return value, actLogProbs, distEntropy, probs, others
def _logit2act_rsn(self, logits_agent_cluster, eval_mode, test_mode, eval_actions=None, avail_act=None, eprsn=None):
if avail_act is not None: logits_agent_cluster = torch.where(avail_act>0, logits_agent_cluster, -pt_inf())
act_dist = self.ccategorical.feed_logits(logits_agent_cluster)
if not test_mode: act = self.ccategorical.sample(act_dist, eprsn) if not eval_mode else eval_actions
else: act = torch.argmax(act_dist.probs, axis=2)
# the policy gradient loss will feedback from here
actLogProbs = self._get_act_log_probs(act_dist, act)
# sum up the log prob of all agents
distEntropy = act_dist.entropy().mean(-1) if eval_mode else None
return act, actLogProbs, distEntropy, act_dist.probs
def _logit2act(self, logits_agent_cluster, eval_mode, test_mode, eval_actions=None, avail_act=None, **kwargs):
if avail_act is not None: logits_agent_cluster = torch.where(avail_act>0, logits_agent_cluster, -pt_inf())
act_dist = Categorical(logits = logits_agent_cluster)
if not test_mode: act = act_dist.sample() if not eval_mode else eval_actions
else: act = torch.argmax(act_dist.probs, axis=2)
actLogProbs = self._get_act_log_probs(act_dist, act) # the policy gradient loss will feedback from here
# sum up the log prob of all agents
distEntropy = act_dist.entropy().mean(-1) if eval_mode else None
return act, actLogProbs, distEntropy, act_dist.probs
@staticmethod
def _get_act_log_probs(distribution, action):
return distribution.log_prob(action.squeeze(-1)).unsqueeze(-1)
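# Self-contained sketch (not repo code) of the action-masking pattern used by
# Net._logit2act above: unavailable actions get -inf logits, so the Categorical
# distribution assigns them zero probability and sampling can never pick them.
import torch
from torch.distributions.categorical import Categorical

demo_logits = torch.randn(2, 3, 5)            # (thread, agent, action)
demo_avail = torch.randint(0, 2, (2, 3, 5))   # 1 = available, 0 = forbidden
demo_avail[..., 0] = 1                        # guarantee one legal action per agent
masked = demo_logits.masked_fill(demo_avail == 0, float('-inf'))
dist = Categorical(logits=masked)
act = dist.sample()                           # stochastic branch (training)
greedy = dist.probs.argmax(-1)                # deterministic branch (test_mode)
assert (torch.gather(demo_avail, -1, act.unsqueeze(-1)) == 1).all()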
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/ppo.py
================================================
import torch, math, traceback
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from random import randint, sample
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
from UTIL.colorful import *
from UTIL.tensor_ops import _2tensor, __hash__, __hashn__
from config import GlobalConfig as cfg
from UTIL.gpu_share import GpuShareUnit
from .ppo_sampler import TrajPoolSampler
from VISUALIZE.mcom import mcom
class PPO():
def __init__(self, policy_and_critic, ppo_config, mcv=None):
self.policy_and_critic = policy_and_critic
self.clip_param = ppo_config.clip_param
self.ppo_epoch = ppo_config.ppo_epoch
        self.use_avail_act = ppo_config.use_avail_act
self.n_pieces_batch_division = ppo_config.n_pieces_batch_division
self.value_loss_coef = ppo_config.value_loss_coef
self.entropy_coef = ppo_config.entropy_coef
self.max_grad_norm = ppo_config.max_grad_norm
self.add_prob_loss = ppo_config.add_prob_loss
self.prevent_batchsize_oom = ppo_config.prevent_batchsize_oom
# self.freeze_body = ppo_config.freeze_body
self.lr = ppo_config.lr
self.all_parameter = list(policy_and_critic.named_parameters())
# if not self.freeze_body:
self.parameter = [p for p_name, p in self.all_parameter]
self.optimizer = optim.Adam(self.parameter, lr=self.lr)
self.g_update_delayer = 0
self.g_initial_value_loss = 0
        # round-robin training mode
self.mcv = mcv
self.ppo_update_cnt = 0
self.batch_size_reminder = True
self.trivial_dict = {}
assert self.n_pieces_batch_division == 1
self.gpu_share_unit = GpuShareUnit(cfg.device, gpu_party=cfg.gpu_party)
def train_on_traj(self, traj_pool, task):
while True:
try:
with self.gpu_share_unit:
self.train_on_traj_(traj_pool, task)
                break # reaching here means GPU memory was sufficient
except RuntimeError as err:
print(traceback.format_exc())
if self.prevent_batchsize_oom:
                    # in some cases, rolling back MaxSampleNum a single time is not enough
if TrajPoolSampler.MaxSampleNum[-1] < 0: TrajPoolSampler.MaxSampleNum.pop(-1)
assert TrajPoolSampler.MaxSampleNum[-1] > 0
TrajPoolSampler.MaxSampleNum[-1] = -1
print亮红('Insufficient gpu memory, using previous sample size !')
else:
assert False
torch.cuda.empty_cache()
def train_on_traj_(self, traj_pool, task):
ppo_valid_percent_list = []
sampler = TrajPoolSampler(n_div=1, traj_pool=traj_pool, flag=task, prevent_batchsize_oom=self.prevent_batchsize_oom, mcv=self.mcv)
# before_training_hash = [__hashn__(t.parameters()) for t in (self.policy_and_critic._nets_flat_placeholder_)]
for e in range(self.ppo_epoch):
sample_iter = sampler.reset_and_get_iter()
self.optimizer.zero_grad()
# ! get traj fragment
sample = next(sample_iter)
# ! build graph, then update network
loss_final, others = self.establish_pytorch_graph(task, sample, e)
loss_final = loss_final*0.5
if e==0: print('[PPO.py] Memory Allocated %.2f GB'%(torch.cuda.memory_allocated()/1073741824))
loss_final.backward()
# log
ppo_valid_percent_list.append(others.pop('PPO valid percent').item())
self.log_trivial(dictionary=others); others = None
nn.utils.clip_grad_norm_(self.parameter, self.max_grad_norm)
self.optimizer.step()
if ppo_valid_percent_list[-1] < 0.70:
                print亮黄('policy changed too much, terminating epoch early'); break
pass # finish all epoch update
print亮黄(np.array(ppo_valid_percent_list))
self.log_trivial_finalize()
self.ppo_update_cnt += 1
return self.ppo_update_cnt
def freeze_body(self):
assert False, "function forbidden"
self.freeze_body = True
self.parameter_pv = [p_name for p_name, p in self.all_parameter if not any(p_name.startswith(kw) for kw in ('obs_encoder', 'attention_layer'))]
self.parameter = [p for p_name, p in self.all_parameter if not any(p_name.startswith(kw) for kw in ('obs_encoder', 'attention_layer'))]
self.optimizer = optim.Adam(self.parameter, lr=self.lr)
print('change train object')
def log_trivial(self, dictionary):
for key in dictionary:
if key not in self.trivial_dict: self.trivial_dict[key] = []
item = dictionary[key].item() if hasattr(dictionary[key], 'item') else dictionary[key]
self.trivial_dict[key].append(item)
def log_trivial_finalize(self, print=True):
for key in self.trivial_dict:
self.trivial_dict[key] = np.array(self.trivial_dict[key])
print_buf = ['[ppo.py] ']
for key in self.trivial_dict:
self.trivial_dict[key] = self.trivial_dict[key].mean()
print_buf.append(' %s:%.3f, '%(key, self.trivial_dict[key]))
if self.mcv is not None: self.mcv.rec(self.trivial_dict[key], key)
if print: print紫(''.join(print_buf))
if self.mcv is not None:
self.mcv.rec_show()
self.trivial_dict = {}
def establish_pytorch_graph(self, flag, sample, n):
obs = _2tensor(sample['obs'])
advantage = _2tensor(sample['advantage'])
action = _2tensor(sample['action'])
oldPi_actionLogProb = _2tensor(sample['actionLogProb'])
real_value = _2tensor(sample['return'])
avail_act = _2tensor(sample['avail_act']) if 'avail_act' in sample else None
# batchsize = advantage.shape[0]#; print亮紫(batchsize)
batch_agent_size = advantage.shape[0]*advantage.shape[1]
assert flag == 'train'
newPi_value, newPi_actionLogProb, entropy, probs, others = \
self.policy_and_critic.evaluate_actions(
obs=obs,
eval_actions=action,
test_mode=False,
avail_act=avail_act)
entropy_loss = entropy.mean()
n_actions = probs.shape[-1]
if self.add_prob_loss: assert n_actions <= 15 #
penalty_prob_line = (1/n_actions)*0.12
probs_loss = (penalty_prob_line - torch.clamp(probs, min=0, max=penalty_prob_line)).mean()
if not self.add_prob_loss:
probs_loss = torch.zeros_like(probs_loss)
# dual clip ppo core
E = newPi_actionLogProb - oldPi_actionLogProb
E_clip = torch.zeros_like(E)
E_clip = torch.where(advantage > 0, torch.clamp(E, max=np.log(1.0+self.clip_param)), E_clip)
E_clip = torch.where(advantage < 0, torch.clamp(E, min=np.log(1.0-self.clip_param), max=np.log(5) ), E_clip)
ratio = torch.exp(E_clip)
policy_loss = -(ratio*advantage).mean()
        # add all losses
value_loss = 0.5 * F.mse_loss(real_value, newPi_value)
AT_net_loss = policy_loss - entropy_loss*self.entropy_coef # + probs_loss*20
CT_net_loss = value_loss * 1.0
# AE_new_loss = ae_loss * 1.0
loss_final = AT_net_loss + CT_net_loss # + AE_new_loss
ppo_valid_percent = ((E_clip == E).int().sum()/batch_agent_size)
nz_mask = real_value!=0
value_loss_abs = (real_value[nz_mask] - newPi_value[nz_mask]).abs().mean()
others = {
'Value loss Abs': value_loss_abs,
'PPO valid percent': ppo_valid_percent,
'CT_net_loss': CT_net_loss,
'AT_net_loss': AT_net_loss,
}
return loss_final, others
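# Tiny numeric illustration (not repo code) of the dual-clip step above. The
# log-ratio E = log(pi_new/pi_old) is clamped instead of the ratio itself:
#   advantage > 0: ratio capped at (1 + clip_param), as in vanilla PPO;
#   advantage < 0: ratio floored at (1 - clip_param) and capped at 5 (dual clip).
import numpy as np

clip_param = 0.2
E = np.log(np.array([1.5, 0.5]))    # probability ratios 1.5 and 0.5
adv = np.array([1.0, -1.0])
E_clip = np.where(adv > 0, np.clip(E, None, np.log(1.0 + clip_param)), np.zeros_like(E))
E_clip = np.where(adv < 0, np.clip(E, np.log(1.0 - clip_param), np.log(5)), E_clip)
print(np.exp(E_clip))               # -> [1.2, 0.8]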
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/ppo_sampler.py
================================================
import torch, math, traceback
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from random import randint, sample
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler
from UTIL.colorful import *
from UTIL.tensor_ops import _2tensor, __hash__, repeat_at
from config import GlobalConfig as cfg
from UTIL.gpu_share import GpuShareUnit
class TrajPoolSampler():
def __init__(self, n_div, traj_pool, flag, prevent_batchsize_oom=False, mcv=None):
self.n_pieces_batch_division = n_div
self.prevent_batchsize_oom = prevent_batchsize_oom
self.mcv = mcv
if self.prevent_batchsize_oom:
assert self.n_pieces_batch_division==1, ('?')
self.num_batch = None
self.container = {}
self.warned = False
assert flag=='train'
req_dict = ['avail_act', 'obs', 'action', 'actionLogProb', 'return', 'reward', 'value']
req_dict_rename = ['avail_act', 'obs', 'action', 'actionLogProb', 'return', 'reward', 'state_value']
return_rename = "return"
value_rename = "state_value"
advantage_rename = "advantage"
        # expand 'obs' into its 'obs>xxxx' sub-keys when needed
for key_index, key in enumerate(req_dict):
key_name = req_dict[key_index]
key_rename = req_dict_rename[key_index]
if not hasattr(traj_pool[0], key_name):
real_key_list = [real_key for real_key in traj_pool[0].__dict__ if (key_name+'>' in real_key)]
assert len(real_key_list) > 0, ('check variable provided!', key,key_index)
for real_key in real_key_list:
mainkey, subkey = real_key.split('>')
req_dict.append(real_key)
req_dict_rename.append(key_rename+'>'+subkey)
        self.big_batch_size = -1 # all vectors must share the same length; checked below
# load traj into a 'container'
for key_index, key in enumerate(req_dict):
key_name = req_dict[key_index]
key_rename = req_dict_rename[key_index]
if not hasattr(traj_pool[0], key_name): continue
set_item = np.concatenate([getattr(traj, key_name) for traj in traj_pool], axis=0)
if not (self.big_batch_size==set_item.shape[0] or (self.big_batch_size<0)):
print('error')
assert self.big_batch_size==set_item.shape[0] or (self.big_batch_size<0), (key,key_index)
self.big_batch_size = set_item.shape[0]
            self.container[key_rename] = set_item # reference assignment, no copy
# normalize advantage inside the batch
self.container[advantage_rename] = self.container[return_rename] - self.container[value_rename]
self.container[advantage_rename] = ( self.container[advantage_rename] - self.container[advantage_rename].mean() ) / (self.container[advantage_rename].std() + 1e-5)
# size of minibatch for each agent
self.mini_batch_size = math.ceil(self.big_batch_size / self.n_pieces_batch_division)
def __len__(self):
return self.n_pieces_batch_division
def determine_max_n_sample(self):
assert self.prevent_batchsize_oom
if not hasattr(TrajPoolSampler,'MaxSampleNum'):
# initialization
TrajPoolSampler.MaxSampleNum = [int(self.big_batch_size*(i+1)/50) for i in range(50)]
max_n_sample = self.big_batch_size
elif TrajPoolSampler.MaxSampleNum[-1] > 0:
            # meaning OOM has never happened, at least not yet
# only update when the batch size increases
if self.big_batch_size > TrajPoolSampler.MaxSampleNum[-1]: TrajPoolSampler.MaxSampleNum.append(self.big_batch_size)
max_n_sample = self.big_batch_size
else:
            # meaning OOM already happened; use TrajPoolSampler.MaxSampleNum[-2] as the limit
assert TrajPoolSampler.MaxSampleNum[-2] > 0
max_n_sample = TrajPoolSampler.MaxSampleNum[-2]
return max_n_sample
def reset_and_get_iter(self):
if not self.prevent_batchsize_oom:
self.sampler = BatchSampler(SubsetRandomSampler(range(self.big_batch_size)), self.mini_batch_size, drop_last=False)
else:
max_n_sample = self.determine_max_n_sample()
n_sample = min(self.big_batch_size, max_n_sample)
if not hasattr(self,'reminded'):
self.reminded = True
drop_percent = (self.big_batch_size-n_sample)/self.big_batch_size*100
if self.mcv is not None:
self.mcv.rec(drop_percent, 'drop percent')
if drop_percent > 20:
print_ = print亮红
                    print_('dropping %.1f percent samples..'%(drop_percent))
assert False, "GPU OOM!"
else:
print_ = print
                    print_('dropping %.1f percent samples..'%(drop_percent))
self.sampler = BatchSampler(SubsetRandomSampler(range(n_sample)), n_sample, drop_last=False)
for indices in self.sampler:
selected = {}
for key in self.container:
selected[key] = self.container[key][indices]
for key in [key for key in selected if '>' in key]:
                # recombine parent/child keys back into a nested dict
mainkey, subkey = key.split('>')
if not mainkey in selected: selected[mainkey] = {}
selected[mainkey][subkey] = selected[key]
del selected[key]
yield selected
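# Sketch (not repo code) of the 'mainkey>subkey' convention used above: nested
# dicts are flattened with '>' when trajectories are stored, then folded back
# into two-level dicts when a minibatch is yielded.
selected = {'obs': [1, 2], 'reward>team0': [0.1], 'reward>team1': [0.2]}
for key in [k for k in selected if '>' in k]:
    mainkey, subkey = key.split('>')
    selected.setdefault(mainkey, {})[subkey] = selected.pop(key)
print(selected)  # {'obs': [1, 2], 'reward': {'team0': [0.1], 'team1': [0.2]}}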
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/shell_env.py
================================================
import numpy as np
from config import GlobalConfig
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__, repeat_at, gather_righthand
from MISSION.uhmap.actset_lookup import encode_action_as_digits
from MISSION.uhmap.actionset_v3 import strActionToDigits, ActDigitLen
from .foundation import AlgorithmConfig
from .cython_func import roll_hisory
class ShellEnvConfig:
add_avail_act = False
class ActionConvertPredatorPrey():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.dictionary_args = [
'ActionSet4::MoveToDirection;X=1.0 Y=0.0 Z=0.0',
'ActionSet4::MoveToDirection;X=1.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=0.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=0.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=-1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=0.0 Y=-1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=1.0 Y=-1.0 Z=0.0',
]
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
pass
class ActionConvertLegacy():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.SELF_TEAM_ASSUME = SELF_TEAM_ASSUME
self.OPP_TEAM_ASSUME = OPP_TEAM_ASSUME
self.OPP_NUM_ASSUME = OPP_NUM_ASSUME
# (main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None)
self.dictionary_args = [
            ('N/A', 'N/A', None, None, None, None, None, None),                    # 0
            ('Idle', 'DynamicGuard', None, None, None, None, None, None),          # 1
            ('Idle', 'StaticAlert', None, None, None, None, None, None),           # 2
            ('Idle', 'AsFarAsPossible', None, None, None, None, None, None),       # 3
            ('Idle', 'StayWhenTargetInRange', None, None, None, None, None, None), # 4
            ('SpecificMoving', 'Dir+X', None, None, None, None, None, None),       # 5
            ('SpecificMoving', 'Dir+Y', None, None, None, None, None, None),       # 6
            ('SpecificMoving', 'Dir-X', None, None, None, None, None, None),       # 7
            ('SpecificMoving', 'Dir-Y', None, None, None, None, None, None),       # 8
]
for i in range(self.OPP_NUM_ASSUME):
self.dictionary_args.append( ('SpecificAttacking', 'N/A', None, None, None, None, OPP_TEAM_ASSUME, i) )
def convert_act_arr(self, type, a):
if type == 'RLA_UAV_Support':
args = self.dictionary_args[a]
# override wrong actions
if args[0] == 'SpecificAttacking':
return encode_action_as_digits('N/A', 'N/A', None, None, None, None, None, None)
# override incorrect actions
if args[0] == 'Idle':
return encode_action_as_digits('Idle', 'StaticAlert', None, None, None, None, None, None)
return encode_action_as_digits(*args)
else:
return encode_action_as_digits(*self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
for i in range(n_act):
args = self.dictionary_args[i]
# for all kind of agents
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if type == 'RLA_UAV_Support':
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if args[0] == 'SpecificAttacking': ret[i] = DISABLE
if args[0] == 'Idle': ret[i] = DISABLE
if args[1] == 'StaticAlert': ret[i] = ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
assert team == self.SELF_TEAM_ASSUME
        assert self.SELF_TEAM_ASSUME + self.OPP_TEAM_ASSUME == 1
assert opp_agent_num == self.OPP_NUM_ASSUME
def count_list_type(x):
type_cnt = {}
for xx in x:
if xx not in type_cnt: type_cnt[xx] = 0
type_cnt[xx] += 1
return len(type_cnt)
class ShellEnvWrapper(object):
def __init__(self, n_agent, n_thread, space, mcv, rl_functional, alg_config, ScenarioConfig, team):
self.n_agent = n_agent
self.n_thread = n_thread
self.team = team
self.space = space
self.mcv = mcv
self.rl_functional = rl_functional
if GlobalConfig.ScenarioConfig.EntityOriented:
self.core_dim = GlobalConfig.ScenarioConfig.obs_vec_length
else:
self.core_dim = space['obs_space']['obs_shape']
self.n_entity_placeholder = alg_config.n_entity_placeholder
        # whether to use avail_act to block forbidden actions
self.AvailActProvided = False
if hasattr(ScenarioConfig, 'AvailActProvided'):
self.AvailActProvided = ScenarioConfig.AvailActProvided
if GlobalConfig.ScenarioConfig.SubTaskSelection in ['UhmapLargeScale', 'UhmapHuge', 'UhmapBreakingBad']:
ActionToDiscreteConverter = ActionConvertLegacy
else:
ActionToDiscreteConverter = ActionConvertPredatorPrey
self.action_converter = ActionToDiscreteConverter(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
# check parameters
self.patience = 2000
def interact_with_env(self, StateRecall):
if not hasattr(self, 'agent_type'):
self.agent_uid = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
self.agent_type = [agent_meta['type']
for agent_meta in StateRecall['Latest-Team-Info'][0]['dataArr']
if agent_meta['uId'] in self.agent_uid]
if ShellEnvConfig.add_avail_act:
self.avail_act = np.stack(tuple(self.action_converter.get_tp_avail_act(tp) for tp in self.agent_type))
self.avail_act = repeat_at(self.avail_act, insert_dim=0, n_times=self.n_thread)
        act = np.zeros(shape=(self.n_thread, self.n_agent), dtype=int) - 1 # initialize all actions to -1
# read internal coop graph info
obs = StateRecall['Latest-Obs']
obs = my_view(obs,[0, 0, -1, self.core_dim])
obs[(obs==0).all(-1)] = np.nan
n_entity_raw = obs.shape[-2]
AlgorithmConfig.entity_distinct = [list(range(1)), list(range(1,n_entity_raw)), list(range(n_entity_raw,2*n_entity_raw))]
P = StateRecall['ENV-PAUSE']
R = ~P
RST = StateRecall['Env-Suffered-Reset']
# when needed, train!
if not StateRecall['Test-Flag']: self.rl_functional.train()
if RST.all():
            # all env threads just experienced a full reset; this is the first step of every thread
# randomly pick threads
eprsn_yita = self.rl_functional.stage_planner.yita if AlgorithmConfig.policy_resonance else 0
EpRsn = np.random.rand(self.n_thread) < eprsn_yita
StateRecall['_EpRsn_'] = EpRsn
# prepare observation for the real RL algorithm
obs_feed = obs[R]
I_StateRecall = {
'obs':obs_feed,
'avail_act':self.avail_act[R],
'Test-Flag':StateRecall['Test-Flag'],
'_EpRsn_':StateRecall['_EpRsn_'][R],
'threads_active_flag':R,
'Latest-Team-Info':StateRecall['Latest-Team-Info'][R],
}
# load available act to limit action space if possible
if self.AvailActProvided:
avail_act = np.array([info['avail-act'] for info in np.array(StateRecall['Latest-Team-Info'][R], dtype=object)])
I_StateRecall.update({'avail_act':avail_act})
# the real RL algorithm ! !
act_active, internal_recall = self.rl_functional.interact_with_env_genuine(I_StateRecall)
# get decision results
act[R] = act_active
# confirm actions are valid (satisfy 'avail-act')
if ShellEnvConfig.add_avail_act and self.patience>0:
self.patience -= 1
assert (gather_righthand(self.avail_act, repeat_at(act, -1, 1), check=False)[R]==1).all()
# translate action into ue4 tuple action
act_converted = np.array([[ self.action_converter.convert_act_arr(self.agent_type[agentid], act) for agentid, act in enumerate(th) ] for th in act])
# swap thread(batch) axis and agent axis
actions_list = np.swapaxes(act_converted, 0, 1)
# register callback hook
if not StateRecall['Test-Flag']:
StateRecall['_hook_'] = internal_recall['_hook_']
assert StateRecall['_hook_'] is not None
return actions_list, StateRecall
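# Sketch (not repo code) of the dead-entity marking done in interact_with_env
# above: observation rows that are exactly all-zero are treated as padding or
# dead agents and overwritten with NaN, so the network can mask them out later.
import numpy as np

demo_obs = np.zeros((1, 2, 3, 4))            # (thread, agent, entity, core_dim)
demo_obs[0, 0, 0] = 1.0                      # one live entity
demo_obs[(demo_obs == 0).all(-1)] = np.nan   # all-zero rows become NaN
print(np.isnan(demo_obs).any(-1))            # True marks dead/padded entities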
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/stage_planner.py
================================================
import math
from .foundation import AlgorithmConfig
from UTIL.colorful import *
class PolicyRsnConfig:
resonance_start_at_update = 10
yita_min_prob = 0.15 # should be >= (1/n_action)
yita_max = 0.75
yita_inc_per_update = 0.0075 # (increase to 0.75 in 500 updates)
freeze_critic = False
yita_shift_method = '-sin'
yita_shift_cycle = 1000
class StagePlanner:
def __init__(self, mcv) -> None:
if AlgorithmConfig.policy_resonance:
self.resonance_active = False
self.yita = 0
self.yita_min_prob = PolicyRsnConfig.yita_min_prob
self.freeze_body = False
self.update_cnt = 0
self.mcv = mcv
self.trainer = None
def is_resonance_active(self,):
return self.resonance_active
def is_body_freeze(self,):
return self.freeze_body
def get_yita(self):
return self.yita
def get_yita_min_prob(self):
return PolicyRsnConfig.yita_min_prob
def can_exec_trainning(self):
return True
def update_plan(self):
self.update_cnt += 1
if AlgorithmConfig.policy_resonance:
if self.resonance_active:
self.when_pr_active()
elif not self.resonance_active:
self.when_pr_inactive()
return
def activate_pr(self):
self.resonance_active = True
self.freeze_body = True
if PolicyRsnConfig.freeze_critic:
self.trainer.freeze_body()
def when_pr_inactive(self):
assert not self.resonance_active
if PolicyRsnConfig.resonance_start_at_update >= 0:
            # means policy resonance needs to be activated later
if self.update_cnt > PolicyRsnConfig.resonance_start_at_update:
# time is up, activate pr
self.activate_pr()
# log
pr = 1 if self.resonance_active else 0
self.mcv.rec(pr, 'resonance')
self.mcv.rec(self.yita, 'self.yita')
def when_pr_active(self):
assert self.resonance_active
self._update_yita()
# log
pr = 1 if self.resonance_active else 0
self.mcv.rec(pr, 'resonance')
self.mcv.rec(self.yita, 'self.yita')
def _update_yita(self):
'''
increase self.yita by @yita_inc_per_update per function call
'''
if PolicyRsnConfig.yita_shift_method == '-cos':
self.yita = PolicyRsnConfig.yita_max
t = -math.cos(2*math.pi/PolicyRsnConfig.yita_shift_cycle * self.update_cnt) * PolicyRsnConfig.yita_max
if t<=0:
self.yita = 0
else:
self.yita = t
print亮绿('yita update:', self.yita)
elif PolicyRsnConfig.yita_shift_method == '-sin':
self.yita = PolicyRsnConfig.yita_max
t = -math.sin(2*math.pi/PolicyRsnConfig.yita_shift_cycle * self.update_cnt) * PolicyRsnConfig.yita_max
if t<=0:
self.yita = 0
else:
self.yita = t
print亮绿('yita update:', self.yita)
elif PolicyRsnConfig.yita_shift_method == 'slow-inc':
self.yita += PolicyRsnConfig.yita_inc_per_update
if self.yita > PolicyRsnConfig.yita_max:
self.yita = PolicyRsnConfig.yita_max
print亮绿('yita update:', self.yita)
else:
assert False
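# Quick sketch (illustration only) of the '-sin' yita schedule implemented
# above: yita(t) = max(0, -sin(2*pi*t/cycle)) * yita_max, a half-wave-rectified
# sine that stays at zero for the first half of each cycle.
import math

yita_max, cycle = 0.75, 1000
for t in (0, 250, 500, 750, 1000):
    yita = max(0.0, -math.sin(2 * math.pi / cycle * t) * yita_max)
    print(t, round(yita, 3))  # peaks at t=750 with yita=0.75, zero on [0, 500]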
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/ppo_ma/trajectory.py
================================================
# cython: language_level=3
from config import GlobalConfig
import numpy as np
from .foundation import AlgorithmConfig
from ALGORITHM.common.traj import TRAJ_BASE
import copy
from UTIL.colorful import *
from UTIL.tensor_ops import __hash__, my_view, np_one_hot, np_repeat_at, np_softmax, scatter_with_nan
class trajectory(TRAJ_BASE):
def __init__(self, traj_limit, env_id):
super().__init__(traj_limit, env_id)
self.reference_track_name = 'value'
def early_finalize(self):
assert not self.readonly_lock # unfinished traj
self.need_reward_bootstrap = True
def set_terminal_obs(self, tobs):
self.tobs = copy.deepcopy(tobs)
def cut_tail(self):
        # trim off the unused preallocated space
super().cut_tail()
TJ = lambda key: getattr(self, key)
        # additionally, use the NaNs along this trajectory to drop all invalid time steps
reference_track = getattr(self, self.reference_track_name)
if self.need_reward_bootstrap:
            assert False, 'should not reach here if everything goes as expected'
            # find the last position that is not NaN
T = np.where(~np.isnan(reference_track.squeeze()))[0][-1]
self.boot_strap_value = {
'bootstrap_value':TJ('value').squeeze()[T].copy(),
}
assert not hasattr(self,'tobs')
self.set_terminal_obs(TJ('g_obs')[T].copy())
reference_track[T] = np.nan
        # mark the trajectory deprecated if nothing valid remains
p_invalid = np.isnan(my_view(reference_track, [0, -1])).any(axis=-1)
p_valid = ~p_invalid
if p_invalid.all(): #invalid traj
self.deprecated_flag = True
return
# adjust reward position
reward = TJ('reward')
for i in reversed(range(self.time_pointer)):
if p_invalid[i] and i != 0: # invalid, push reward forward
reward[i-1] += reward[i]; reward[i] = np.nan
setattr(self, 'reward', reward)
# clip NaN
for key in self.key_dict: setattr(self, key, TJ(key)[p_valid])
# all done
return
def reward_push_forward(self, dead_mask):
# self.new_reward = self.reward.copy()
if AlgorithmConfig.gamma_in_reward_forwarding:
gamma = AlgorithmConfig.gamma_in_reward_forwarding_value
for i in reversed(range(self.time_pointer)):
if i==0: continue
self.reward[i-1] += np.where(dead_mask[i], self.reward[i]*gamma, 0) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
self.reward[i] = np.where(dead_mask[i], 0, self.reward[i]) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
else:
for i in reversed(range(self.time_pointer)):
if i==0: continue
self.reward[i-1] += np.where(dead_mask[i], self.reward[i], 0) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
self.reward[i] = np.where(dead_mask[i], 0, self.reward[i]) # if dead_mask[i]==True, this frame is invalid, move reward forward, set self.reward[i] to 0
return
# new finalize
def finalize(self):
self.readonly_lock = True
assert not self.deprecated_flag
TJ = lambda key: getattr(self, key)
assert not np.isnan(TJ('reward')).any()
# deadmask
tmp = np.isnan(my_view(self.obs, [0,0,-1]))
dead_mask = tmp.all(-1)
# if (True): # check if the mask is correct
# dead_mask_self = np.isnan(my_view(self.obs, [0,0,-1])[:,:,0])
# assert (dead_mask==dead_mask_self).all()
# dead_mask2 = tmp.any(-1)
# assert (dead_mask==dead_mask2).all()
        self.reward_push_forward(dead_mask) # push terminal rewards forward
threat = np.zeros(shape=dead_mask.shape) - 1
assert dead_mask.shape[0] == self.time_pointer
for i in reversed(range(self.time_pointer)):
            # threat[:(i+1)] excludes threat[(i+1)]
if i+1 < self.time_pointer:
                threat[:(i+1)] += (~(dead_mask[i+1]&dead_mask[i])).astype(int)
elif i+1 == self.time_pointer:
                threat[:] += (~dead_mask[i]).astype(int)
SAFE_LIMIT = 11
threat = np.clip(threat, -1, SAFE_LIMIT)
setattr(self, 'threat', np.expand_dims(threat, -1))
# ! Use GAE to calculate return
self.gae_finalize_return(reward_key='reward', value_key='value', new_return_name='return')
return
def gae_finalize_return(self, reward_key, value_key, new_return_name):
# ------- gae parameters -------
gamma = AlgorithmConfig.gamma
tau = AlgorithmConfig.tau
# ------- -------------- -------
rewards = getattr(self, reward_key)
value = getattr(self, value_key)
length = rewards.shape[0]
assert rewards.shape[0]==value.shape[0]
# if dimension not aligned
if rewards.ndim == value.ndim-1: rewards = np.expand_dims(rewards, -1)
        # initialize the new return track
setattr(self, new_return_name, np.zeros_like(value))
self.key_dict.append(new_return_name)
returns = getattr(self, new_return_name)
boot_strap = 0 if not self.need_reward_bootstrap else self.boot_strap_value['bootstrap_'+value_key]
for step in reversed(range(length)):
            if step==(length-1): # the last frame
value_preds_delta = rewards[step] + gamma * boot_strap - value[step]
gae = value_preds_delta
else:
value_preds_delta = rewards[step] + gamma * value[step + 1] - value[step]
gae = value_preds_delta + gamma * tau * gae
returns[step] = gae + value[step]
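# Standalone sketch (not repo code) mirroring gae_finalize_return above:
#   delta_t  = r_t + gamma * V_{t+1} - V_t        (V_T is the bootstrap, 0 here)
#   gae_t    = delta_t + gamma * tau * gae_{t+1}
#   return_t = gae_t + V_t
import numpy as np

demo_gamma, demo_tau = 0.99, 0.95
demo_rewards = np.array([0.0, 0.0, 1.0])
demo_value = np.array([0.3, 0.5, 0.8])
demo_returns = np.zeros_like(demo_value)
gae, boot_strap = 0.0, 0.0
for step in reversed(range(len(demo_rewards))):
    next_v = boot_strap if step == len(demo_rewards) - 1 else demo_value[step + 1]
    delta = demo_rewards[step] + demo_gamma * next_v - demo_value[step]
    gae = delta + demo_gamma * demo_tau * gae
    demo_returns[step] = gae + demo_value[step]
print(demo_returns.round(3))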
class TrajPoolManager(object):
def __init__(self):
self.cnt = 0
def absorb_finalize_pool(self, pool):
for traj_handle in pool:
traj_handle.cut_tail()
pool = list(filter(lambda traj: not traj.deprecated_flag, pool))
for traj_handle in pool: traj_handle.finalize()
self.cnt += 1
task = ['train']
return task, pool
'''
Trajectory pool management
'''
class TrajManagerBase(object):
def __init__(self, n_env, traj_limit):
self.n_env = n_env
self.traj_limit = traj_limit
self.update_cnt = 0
self.traj_pool = []
self.registered_keys = []
self.live_trajs = [trajectory(self.traj_limit, env_id=i) for i in range(self.n_env)]
self.live_traj_frame = [0 for _ in range(self.n_env)]
self._traj_lock_buf = None
self.patience = 1000
pass
    def __check_integrity(self, traj_frag):
if self.patience < 0:
return # stop wasting time checking this
self.patience -= 1
for key in traj_frag:
if key not in self.registered_keys and (not key.startswith('_')):
self.registered_keys.append(key)
for key in self.registered_keys:
assert key in traj_frag, ('this key sometimes disappears from the traj_frag:', key)
def batch_update(self, traj_frag):
        self.__check_integrity(traj_frag)
done = traj_frag['_DONE_']; traj_frag.pop('_DONE_') # done flag
skip = traj_frag['_SKIP_']; traj_frag.pop('_SKIP_') # skip/frozen flag
tobs = traj_frag['_TOBS_']; traj_frag.pop('_TOBS_') # terminal obs
# single bool to list bool
if isinstance(done, bool): done = [done for i in range(self.n_env)]
if isinstance(skip, bool): skip = [skip for i in range(self.n_env)]
n_active = sum(~skip)
# feed
cnt = 0
for env_i in range(self.n_env):
if skip[env_i]: continue
# otherwise
frag_index = cnt; cnt += 1
env_index = env_i
traj_handle = self.live_trajs[env_index]
for key in traj_frag:
self.traj_remember(traj_handle, key=key, content=traj_frag[key],frag_index=frag_index, n_active=n_active)
self.live_traj_frame[env_index] += 1
traj_handle.time_shift()
if done[env_i]:
assert tobs[env_i] is not None # get the final obs
traj_handle.set_terminal_obs(tobs[env_i])
self.traj_pool.append(traj_handle)
self.live_trajs[env_index] = trajectory(self.traj_limit, env_id=env_index)
self.live_traj_frame[env_index] = 0
def traj_remember(self, traj, key, content, frag_index, n_active):
if content is None: traj.remember(key, None)
elif isinstance(content, dict):
for sub_key in content:
self.traj_remember(traj, "".join((key , ">" , sub_key)), content=content[sub_key], frag_index=frag_index, n_active=n_active)
else:
assert n_active == len(content), ('length error')
traj.remember(key, content[frag_index]) # *
class BatchTrajManager(TrajManagerBase):
def __init__(self, n_env, traj_limit, trainer_hook):
super().__init__(n_env, traj_limit)
self.trainer_hook = trainer_hook
self.traj_limit = traj_limit
self.train_traj_needed = AlgorithmConfig.train_traj_needed
self.pool_manager = TrajPoolManager()
def update(self, traj_frag, index):
assert traj_frag is not None
for j, env_i in enumerate(index):
traj_handle = self.live_trajs[env_i]
for key in traj_frag:
if traj_frag[key] is None:
assert False, key
                if isinstance(traj_frag[key], dict): # nested dict needs special handling
for sub_key in traj_frag[key]:
content = traj_frag[key][sub_key][j]
traj_handle.remember(key + ">" + sub_key, content)
else:
content = traj_frag[key][j]
traj_handle.remember(key, content)
self.live_traj_frame[env_i] += 1
traj_handle.time_shift()
return
    # entry point
def feed_traj_framedata(self, traj_frag, require_hook=False):
# an unlock hook must be executed before new trajectory feed in
assert self._traj_lock_buf is None
if require_hook:
# the traj_frag is not intact, lock up traj_frag, wait for more
assert '_SKIP_' in traj_frag
assert '_DONE_' not in traj_frag
assert 'reward' not in traj_frag
self._traj_lock_buf = traj_frag
return self.unlock_fn
else:
assert '_DONE_' in traj_frag
assert '_SKIP_' in traj_frag
self.batch_update(traj_frag=traj_frag)
return
def clear_traj_pool(self):
print('do update %d'%self.update_cnt)
_, self.traj_pool = self.pool_manager.absorb_finalize_pool(pool=self.traj_pool)
self.traj_pool = []
# self.update_cnt += 1
# assert ppo_update_cnt == self.update_cnt
return self.update_cnt
def train_and_clear_traj_pool(self):
print('do update %d'%self.update_cnt)
current_task_l, self.traj_pool = self.pool_manager.absorb_finalize_pool(pool=self.traj_pool)
for current_task in current_task_l:
ppo_update_cnt = self.trainer_hook(self.traj_pool, current_task)
self.traj_pool = []
self.update_cnt += 1
# assert ppo_update_cnt == self.update_cnt
return self.update_cnt
def can_exec_training(self):
if len(self.traj_pool) >= self.train_traj_needed: return True
else: return False
def unlock_fn(self, traj_frag):
assert self._traj_lock_buf is not None
traj_frag.update(self._traj_lock_buf)
self._traj_lock_buf = None
assert '_DONE_' in traj_frag
assert '_SKIP_' in traj_frag
self.batch_update(traj_frag=traj_frag)
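# Toy reproduction (not repo code) of the lock/unlock protocol implemented by
# feed_traj_framedata and unlock_fn above: the action half of a frame is
# buffered, and the returned hook merges in the reward/done half before the
# frame is committed, so no frame is ever stored incomplete.
class TwoPhaseBuffer:
    def __init__(self):
        self._lock_buf = None
        self.frames = []
    def feed(self, frag, require_hook=False):
        assert self._lock_buf is None  # the previous hook must have fired
        if require_hook:
            self._lock_buf = frag
            return self.unlock
        self.frames.append(frag)
    def unlock(self, frag):
        frag.update(self._lock_buf)
        self._lock_buf = None
        self.frames.append(frag)

buf = TwoPhaseBuffer()
hook = buf.feed({'obs': 1, 'action': 2, '_SKIP_': False}, require_hook=True)
hook({'reward': 0.5, '_DONE_': True, '_TOBS_': None})
print(buf.frames)  # one complete frame with both halves merged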
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/random/actionset.py
================================================
# ====================================================================
# random moving
# ====================================================================
import numpy as np
from MISSION.uhmap.actionset_v3 import strActionToDigits, ActDigitLen
from MISSION.uhmap.actset_lookup import encode_action_as_digits
class ActionConvertV1Dummy():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.SELF_TEAM_ASSUME = SELF_TEAM_ASSUME
self.OPP_TEAM_ASSUME = OPP_TEAM_ASSUME
self.OPP_NUM_ASSUME = OPP_NUM_ASSUME
# (main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None)
self.dictionary_args = [
'ActionSet1::N/A;N/A' ,
]
self.ActDigitLen = ActDigitLen
self.n_act = len(self.dictionary_args)
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
for i in range(n_act):
args = self.dictionary_args[i]
# for all kind of agents
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if type == 'RLA_UAV_Support':
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if args[0] == 'SpecificAttacking': ret[i] = DISABLE
if args[0] == 'Idle': ret[i] = DISABLE
if args[1] == 'StaticAlert': ret[i] = ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
assert team == self.SELF_TEAM_ASSUME
        assert self.SELF_TEAM_ASSUME + self.OPP_TEAM_ASSUME == 1
assert opp_agent_num == self.OPP_NUM_ASSUME
class ActionConvertV1Carrier():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.SELF_TEAM_ASSUME = SELF_TEAM_ASSUME
self.OPP_TEAM_ASSUME = OPP_TEAM_ASSUME
self.OPP_NUM_ASSUME = OPP_NUM_ASSUME
# (main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None)
self.dictionary_args = [
'ActionSet1::N/A;N/A' ,
'ActionSet1::Special;Detach' ,
'ActionSet1::Idle;DynamicGuard' ,
'ActionSet1::Idle;StaticAlert' ,
'ActionSet1::Idle;AggressivePersue' ,
'ActionSet1::Idle;AsFarAsPossible' ,
'ActionSet1::Idle;StayWhenTargetInRange' ,
'ActionSet1::Idle;StayWhenTargetInHalfRange' ,
'ActionSet1::SpecificMoving;Dir+X' ,
'ActionSet1::SpecificMoving;Dir+Y' ,
'ActionSet1::SpecificMoving;Dir-X' ,
'ActionSet1::SpecificMoving;Dir-Y' ,
'ActionSet1::PatrolMoving;Dir+X' ,
'ActionSet1::PatrolMoving;Dir+Y' ,
'ActionSet1::PatrolMoving;Dir-X' ,
'ActionSet1::PatrolMoving;Dir-Y' ,
]
for i in range(self.OPP_NUM_ASSUME):
self.dictionary_args.append( f'ActionSet1::SpecificAttacking;T{OPP_TEAM_ASSUME}-{i}')
self.ActDigitLen = ActDigitLen
self.n_act = len(self.dictionary_args)
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
for i in range(n_act):
args = self.dictionary_args[i]
# for all kind of agents
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if type == 'RLA_UAV_Support':
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if args[0] == 'SpecificAttacking': ret[i] = DISABLE
if args[0] == 'Idle': ret[i] = DISABLE
if args[1] == 'StaticAlert': ret[i] = ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
assert team == self.SELF_TEAM_ASSUME
        assert self.SELF_TEAM_ASSUME + self.OPP_TEAM_ASSUME == 1
assert opp_agent_num == self.OPP_NUM_ASSUME
class ActionConvertV1Momentum():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.SELF_TEAM_ASSUME = SELF_TEAM_ASSUME
self.OPP_TEAM_ASSUME = OPP_TEAM_ASSUME
self.OPP_NUM_ASSUME = OPP_NUM_ASSUME
# (main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None)
self.dictionary_args = [
'ActionSet1::MoveToDirection2D@Z;X=1.0 Y=0.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=1.0 Y=1.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=0.0 Y=1.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=-1.0 Y=1.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=-1.0 Y=0.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=-1.0 Y=-1.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=0.0 Y=-1.0 Z=700.0',
'ActionSet1::MoveToDirection2D@Z;X=1.0 Y=-1.0 Z=700.0',
]
self.ActDigitLen = ActDigitLen
self.n_act = len(self.dictionary_args)
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
assert team == self.SELF_TEAM_ASSUME
        assert self.SELF_TEAM_ASSUME + self.OPP_TEAM_ASSUME == 1
assert opp_agent_num == self.OPP_NUM_ASSUME
class ActionConvertMovingV4():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.dictionary_args = [
'ActionSet4::MoveToDirection;X=1.0 Y=0.0 Z=0.0',
'ActionSet4::MoveToDirection;X=1.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=0.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=0.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=-1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=0.0 Y=-1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=1.0 Y=-1.0 Z=0.0',
]
self.n_act = len(self.dictionary_args)
self.ActDigitLen = ActDigitLen
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0; ENABLE = 1
ret = np.zeros(self.n_act) + ENABLE # enable all
return ret
class CarrierAction():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.dictionary_args = [
'ActionSet4::MoveToDirection;X=1.0 Y=0.0 Z=0.0',
'ActionSet4::MoveToDirection;X=1.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=0.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=0.0 Z=0.0',
'ActionSet4::MoveToDirection;X=-1.0 Y=-1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=0.0 Y=-1.0 Z=0.0',
'ActionSet4::MoveToDirection;X=1.0 Y=-1.0 Z=0.0',
]
self.n_act = len(self.dictionary_args)
self.ActDigitLen = ActDigitLen
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0; ENABLE = 1
ret = np.zeros(self.n_act) + ENABLE # enable all
return ret
class ActionConvertV2():
def __init__(self, SELF_TEAM_ASSUME, OPP_TEAM_ASSUME, OPP_NUM_ASSUME) -> None:
self.SELF_TEAM_ASSUME = SELF_TEAM_ASSUME
self.OPP_TEAM_ASSUME = OPP_TEAM_ASSUME
self.OPP_NUM_ASSUME = OPP_NUM_ASSUME
# (main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None)
self.dictionary_args = [
'ActionSet2::N/A;N/A' ,
'ActionSet2::Idle;DynamicGuard' ,
'ActionSet2::Idle;StaticAlert' ,
'ActionSet2::Idle;AggressivePersue' ,
'ActionSet2::Idle;AsFarAsPossible' ,
'ActionSet2::Idle;StayWhenTargetInRange' ,
'ActionSet2::Idle;StayWhenTargetInHalfRange' ,
'ActionSet2::SpecificMoving;Dir+X' ,
'ActionSet2::SpecificMoving;Dir+Y' ,
'ActionSet2::SpecificMoving;Dir-X' ,
'ActionSet2::SpecificMoving;Dir-Y' ,
'ActionSet2::PatrolMoving;Dir+X' ,
'ActionSet2::PatrolMoving;Dir+Y' ,
'ActionSet2::PatrolMoving;Dir-X' ,
'ActionSet2::PatrolMoving;Dir-Y' ,
]
for i in range(self.OPP_NUM_ASSUME):
self.dictionary_args.append( f'ActionSet2::SpecificAttacking;T{OPP_TEAM_ASSUME}-{i}')
self.ActDigitLen = ActDigitLen
self.n_act = len(self.dictionary_args)
def convert_act_arr(self, type, a):
return strActionToDigits(self.dictionary_args[a])
def get_tp_avail_act(self, type):
DISABLE = 0
ENABLE = 1
n_act = len(self.dictionary_args)
ret = np.zeros(n_act) + ENABLE
for i in range(n_act):
args = self.dictionary_args[i]
# for all kind of agents
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if type == 'RLA_UAV_Support':
if args[0] == 'PatrolMoving': ret[i] = DISABLE
if args[0] == 'SpecificAttacking': ret[i] = DISABLE
if args[0] == 'Idle': ret[i] = DISABLE
if args[1] == 'StaticAlert': ret[i] = ENABLE
return ret
def confirm_parameters_are_correct(self, team, agent_num, opp_agent_num):
assert team == self.SELF_TEAM_ASSUME
        assert self.SELF_TEAM_ASSUME + self.OPP_TEAM_ASSUME == 1
assert opp_agent_num == self.OPP_NUM_ASSUME
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/random/foundation.py
================================================
import numpy as np
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__
from config import GlobalConfig
class AlgorithmConfig:
preserve = ''
# 改变自身颜色的动作 ChangeColor(color_index)
# 运动动作 MoveToDirection(x, y, z)
# 提高一段时间的加速度的动作 AccHighLevel4
# 发射武器动作 FireToWaterDrop(water_drop_uid)
class RandomController(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.n_action = GlobalConfig.ScenarioConfig.n_actions
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.random.randint(low=0,high=self.n_action, size=(self.n_thread, self.n_agent, 1))
StateRecall['_hook_'] = None
return actions, StateRecall
class DummyRandomControllerWithActionSetV1(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .actionset import ActionConvertV1Dummy
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.actions_set = ActionConvertV1Dummy(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
self.n_action = self.actions_set.n_act
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.random.randint(low=0,high=self.n_action, size=(self.n_thread, self.n_agent, 1))
act_converted = np.array(
list(map(lambda x: self.actions_set.convert_act_arr(None, x),
actions.flatten()))).reshape(self.n_thread, self.n_agent, self.actions_set.ActDigitLen)
StateRecall['_hook_'] = None
return act_converted, StateRecall
class RandomControllerWithActionSetV2(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .actionset import ActionConvertV2
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.actions_set = ActionConvertV2(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
self.n_action = self.actions_set.n_act
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.random.randint(low=0,high=self.n_action, size=(self.n_thread, self.n_agent, 1))
act_converted = np.array(
list(map(lambda x: self.actions_set.convert_act_arr(None, x),
actions.flatten()))).reshape(self.n_thread, self.n_agent, self.actions_set.ActDigitLen)
StateRecall['_hook_'] = None
return act_converted, StateRecall
class RandomControllerWithActionSetV4(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .actionset import ActionConvertMovingV4
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.actions_set = ActionConvertMovingV4(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
self.n_action = self.actions_set.n_act
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.random.randint(low=0,high=self.n_action, size=(self.n_thread, self.n_agent, 1))
act_converted = np.array(
list(map(lambda x: self.actions_set.convert_act_arr(None, x),
actions.flatten()))).reshape(self.n_thread, self.n_agent, self.actions_set.ActDigitLen)
StateRecall['_hook_'] = None
return act_converted, StateRecall
class RandomControllerWithActionSetV1(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .actionset import ActionConvertV1Carrier
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.actions_set = ActionConvertV1Carrier(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
self.n_action = self.actions_set.n_act
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.random.randint(low=0,high=self.n_action, size=(self.n_thread, self.n_agent, 1))
act_converted = np.array(
list(map(lambda x: self.actions_set.convert_act_arr(None, x),
actions.flatten()))).reshape(self.n_thread, self.n_agent, self.actions_set.ActDigitLen)
StateRecall['_hook_'] = None
return act_converted, StateRecall
class RandomControllerWithMomentumAgent(object):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from .actionset import ActionConvertV1Momentum
self.n_agent = n_agent
self.n_thread = n_thread
self.space = space
self.mcv = mcv
self.actions_set = ActionConvertV1Momentum(
SELF_TEAM_ASSUME=team,
OPP_TEAM_ASSUME=(1-team),
OPP_NUM_ASSUME=GlobalConfig.ScenarioConfig.N_AGENT_EACH_TEAM[1-team]
)
self.n_action = self.actions_set.n_act
def interact_with_env(self, StateRecall):
obs = StateRecall['Latest-Obs']
P = StateRecall['ENV-PAUSE']
active_thread_obs = obs[~P]
actions = np.random.randint(low=0,high=self.n_action, size=(self.n_thread, self.n_agent, 1))
act_converted = np.array(
list(map(lambda x: self.actions_set.convert_act_arr(None, x),
actions.flatten()))).reshape(self.n_thread, self.n_agent, self.actions_set.ActDigitLen)
StateRecall['_hook_'] = None
return act_converted, StateRecall
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/a_attackpost.py
================================================
import numpy as np
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__
from config import GlobalConfig
from MISSION.uhmap.actionset_v3 import strActionToDigits, ActDigitLen
class AlgorithmConfig:
preserve = ''
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
self.team = team
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class AttackPostPreprogramBaseline(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
self_agent_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
                # if this thread is paused, do nothing
continue
x_arr = np.array([d['agentLocationArr'][0] for d in np.array(State_Recall['Latest-Team-Info'][thread]['dataArr'])[self_agent_uid_range]])
x_arr_valid = np.array([x for x in x_arr if np.isfinite(x)])
x_avg = x_arr_valid.mean()
for index, x in enumerate(x_arr):
if not np.isfinite(x): pass
if x > x_avg-1000:
actions[thread, index] = strActionToDigits(f'ActionSet2::SpecificAttacking;T1-0')
else:
actions[thread, index] = strActionToDigits(f'ActionSet2::Idle;DynamicGuard')
# actions[thread, :] = strActionToDigits(f'ActionSet2::SpecificAttacking;T1-0')
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
return actions, {}
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/a_escape.py
================================================
import numpy as np
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__
from config import GlobalConfig
from MISSION.uhmap.actionset_v3 import strActionToDigits, ActDigitLen
class AlgorithmConfig:
preserve = ''
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
self.team = team
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class EscapeGreenPreprogramBaseline(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
self_agent_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
                # if this thread is paused, do nothing
continue
x_arr = np.array([d['agentLocationArr'][0] for d in np.array(State_Recall['Latest-Team-Info'][thread]['dataArr'])[self_agent_uid_range]])
x_arr_valid = np.array([x for x in x_arr if np.isfinite(x)])
x_avg = x_arr_valid.mean()
for index, x in enumerate(x_arr):
if not np.isfinite(x): pass
actions[thread, index] = strActionToDigits(f'ActionSet2::SpecificAttacking;T1-0')
# actions[thread, :] = strActionToDigits(f'ActionSet2::SpecificAttacking;T1-0')
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
return actions, {}
class EscapeRedPreprogramBaseline(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
self_agent_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
                # if this thread is paused, do nothing
continue
x_arr = np.array([d['agentLocationArr'][0] for d in np.array(State_Recall['Latest-Team-Info'][thread]['dataArr'])[self_agent_uid_range]])
x_arr_valid = np.array([x for x in x_arr if np.isfinite(x)])
x_avg = x_arr_valid.mean()
for index, x in enumerate(x_arr):
if not np.isfinite(x): pass
if index <2:
actions[thread, index] = strActionToDigits(f'ActionSet4::MoveToDirection;X=-1.0 Y=0.0 Z=0.0')
else:
actions[thread, index] = strActionToDigits(f'ActionSet4::MoveToDirection;X=+1.0 Y=0.0 Z=0.0')
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
return actions, {}
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/a_test_reproduce.py
================================================
import numpy as np
from UTIL.colorful import *
from UTIL.tensor_ops import my_view, __hash__
from config import GlobalConfig
from MISSION.uhmap.actionset_v3 import strActionToDigits, ActDigitLen
class AlgorithmConfig:
preserve = ''
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
self.team = team
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
pre_def_color = [
'(R=1,G=0,B=0,A=1)',
'(R=0,G=1,B=0,A=1)',
'(R=0,G=0,B=1,A=1)',
]
sel_l = [
[-8, -8, -4, -3, -5, -5, -4, -2, 0, 0, 1, 2, 3, 0, 4, 4, 3, 5, 6, 8, -7, -5, -6, -6, -3, -3, -2, -1, -1, 0, 1, 1, 0, 4, 5, 6, 5, 6, 6, 4, -7, -6, -5, -6, -3, -4, -3, -2, -1, 0, 0, -1, 0, 1, 4, 3, 5, 6, 5, 6, -7, -8, -5, -4, -4, -2, -1, 0, 0, -1, 0, 1, 2, 0, 3, 5, 2, 4, 4, 8, -7, -6, -5, -6, -3, -4, 0, -2, -1, 0, -1, -1, 1, 2, 1, 2, 6, 4, 3, 5, -7, -6, -5, -4, -4, -3, -3, -2, -3, 1, 1, 1, 3, 2, 2, 6, 5, 3, 5, 7, -5, -5, -2, -3, -4, -2, -4, -1, 0, -1, 0, 0, 2, 1, 5, 1, 2, 3, 6, 6, -7, -6, -7, -5, -4, -1, -2, -5, -2, -1, -1, 1, 1, 4, 3, 4, 4, 4, 5, 7, -8, -5, -4, -2, -3, -1, 0, -1, 1, -1, 2, -1, 3, 1, 0, 2, 5, 4, 4, 5, -6, -5, -3, -4, -3, -4, -2, 0, -1, -3, 0, 2, 2, -1, 2, 5, 5, 3, 5, 4],
[-5, -6, -3, -4, -3, -4, -2, -2, -1, 1, -1, 2, 1, 2, 5, 4, 3, 2, 5, 8, -8, -8, -6, -2, -3, -2, -3, -1, 1, -2, 2, 1, 1, 3, 3, 3, 3, 4, 7, 6, -6, -5, -5, -7, -2, -2, -2, -4, -2, -1, 0, 1, 2, 2, 5, 3, 7, 5, 4, 7, -8, -8, -3, -4, -4, -4, -3, -3, -2, 1, -2, 1, 2, 1, 2, 4, 4, 5, 6, 7, -7, -6, -4, -3, -4, -3, -1, 1, -1, 0, 0, 0, 4, 2, 2, 3, 4, 5, 5, 5, -5, -5, -5, -3, -2, -2, -3, -2, -1, 0, 0, 2, 3, 3, 3, 2, 5, 5, 4, 6, -8, -6, -6, -3, -3, -2, 0, -2, -1, 2, 0, 2, 2, 2, 3, 2, 1, 4, 4, 7, -8, -6, -6, -3, -2, -3, -2, -1, 1, -1, 2, 3, 1, 2, 2, 3, 2, 3, 3, 8, -6, -6, -5, -2, -2, -2, -2, -1, 0, -1, -1, 2, 3, 2, 0, 3, 3, 5, 6, 8, -7, -5, -3, -5, -5, -4, -1, -2, 0, 0, 1, 1, 0, 1, 1, 3, 4, 3, 3, 5],
[-8, -8, -5, -6, -1, -2, -2, 0, -2, -2, 0, 2, 2, 2, 5, 2, 3, 6, 7, 6, -8, -8, -4, -5, -4, -5, -2, -1, -1, -1, 1, 0, 3, 1, 3, 5, 5, 7, 5, 7, -7, -6, -5, -5, -7, -2, -1, 0, -1, -2, 1, 1, 0, 1, 3, 3, 6, 4, 5, 7, -8, -7, -6, -4, -3, -3, -2, -1, -1, -1, 0, 1, 0, 0, 3, 3, 4, 5, 5, 8, -6, -5, -6, -3, -4, -3, -3, -2, -1, 1, 0, 0, 1, 2, 2, 4, 5, 5, 4, 5, -8, -4, -7, -6, -3, -2, -3, -3, 1, 0, 0, 1, 1, 2, 2, 4, 4, 5, 6, 6, -5, -5, -3, -5, -4, -4, -1, -1, -1, -1, 0, 1, 4, 4, 6, 3, 4, 4, 5, 7, -6, -7, -5, -4, -3, -4, -1, -2, 0, -1, 1, 1, 1, 3, 2, 3, 4, 3, 3, 5, -7, -8, -5, -5, -3, -3, -3, -3, -2, 0, 0, 2, 1, 2, 3, 2, 3, 4, 7, 6, -8, -5, -4, -4, -4, -4, -1, -4, 0, -1, 1, 0, 0, 1, 4, 1, 3, 4, 6, 6],
[-7, -6, -4, -6, -4, -4, -4, -2, -2, 1, -1, 1, 3, 3, 3, 4, 3, 6, 6, 8, -7, -5, -6, -7, -4, -3, -4, -2, 0, -1, 0, 2, 2, 0, 3, 4, 5, 5, 6, 7, -7, -6, -7, -3, -4, -3, -1, -5, -1, 0, -1, 1, 1, 2, 2, 3, 5, 5, 8, 6, -6, -6, -6, -4, -3, -2, -4, -2, -2, 2, 1, 1, 2, 0, 3, 4, 5, 5, 5, 7, -7, -5, -4, -3, -7, -2, -2, -2, -1, 0, 1, 1, 1, 4, 4, 4, 5, 4, 4, 6, -5, -5, -5, -4, -2, -3, -4, -1, 0, -1, -2, 1, 0, 2, 3, 3, 5, 6, 7, 6, -7, -5, -5, -2, -3, -3, -3, 1, 0, -2, 0, -1, 2, 2, 3, 4, 4, 4, 6, 7, -8, -6, -6, -4, -4, -2, -2, -2, -2, -1, 0, 0, 1, 2, 2, 3, 4, 2, 5, 5, -6, -4, -5, -4, -3, -3, -1, -1, -2, -2, 0, 1, 2, 2, 4, 5, 6, 5, 6, 5, -7, -5, -4, -2, -3, -4, -2, -2, -2, -1, 2, 1, 1, 2, 3, 3, 4, 4, 4, 6],
]
class TestReproduce(DummyAlgorithmBase):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
super().__init__(n_agent, n_thread, space, mcv, team)
self.episode = -1
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
RST = State_Recall['Env-Suffered-Reset']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
if all(RST):
self.episode += 1
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
self_agent_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
continue
sel_l_ = [] # 1
x_arr_ = np.array([d['agentLocationArr'][0] for d in np.array(State_Recall['Latest-Team-Info'][thread]['dataArr'])[self_agent_uid_range]]) # 1
for a in range(self.n_agent):
sel = sel_l[self.episode][a] # 2
sel_ = (x_arr_[a] + 35) // 70 # 1
sel_l_.append(int(sel_)) # 1
actions[thread, a] = strActionToDigits(f'ActionSet1::ChangeColor;{pre_def_color[int(sel)%3]}')
print(sel_l[self.episode][:10], sel_l_[:10])
print(sel_l_)
# set actions of inactive threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
return actions, {}
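# A small sketch of the reproducibility check above, assuming agent X positions
# sit on a 70-unit grid: (x + 35) // 70 recovers the integer cell index that the
# recorded `sel_l` table stores for each agent.
if __name__ == '__main__':
    import numpy as np
    x_positions = np.array([-560.0, -245.0, 0.0, 34.9, 35.0, 560.0])
    cells = ((x_positions + 35) // 70).astype(int)
    print(cells)  # [-8 -3  0  0  1  8] -- cell boundaries fall halfway between grid points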
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/assignment.py
================================================
import copy
import random
import numpy as np
import datetime
import time
from ALGORITHM.script_ai.module_evaluation import *
from ALGORITHM.script_ai.global_params import *
class TaskAssign(object):
"""docstring for TaskAssign"""
def __init__(self, attackers, drone, defenders):
super(TaskAssign, self).__init__()
self.attackers = attackers
self.drone = drone
self.defenders = defenders
self.evaluator = Evaluation_module()
# params
self.ratio_thres = 1
# UGV attack(get defender target id)
def assign_attack(self, attack_IDlist):
# simple strategy -- one opponent
max_stlist = []
max_idlist = []
def_idlist = []
for ID, attr in self.defenders.items():
if 'dead' not in attr['state']:
def_idlist.append(ID)
for att in attack_IDlist:
inner_stance_list = [self.evaluator.UAV2UAV_id('offensive', self.attackers[att], defender) for defender in self.defenders.values() if 'dead' not in defender['state']]
# print(inner_stance_list)
inner_max_stance = max(inner_stance_list)
max_stlist.append(inner_max_stance)
max_idlist.append(inner_stance_list.index(inner_max_stance))
max_stance = max(max_stlist)
target_id = def_idlist[max_idlist[max_stlist.index(max_stance)]]
return target_id
# # UAV hold(hold 1)
# def assign_drone_initial(self, ):
# # check for key points
# all_defender_pos = []
# min_dist = []
# for attr in self.defenders.values():
# all_defender_pos.append([attr['X'], attr['Y']])
# if len(all_defender_pos)>0:
# for key_point in key_points:
# min_dist.append(min([np.linalg.norm(np.array(key_point)-np.array(enemy_pos)) for enemy_pos in all_defender_pos]))
# if max(min_dist) > DRIVE_AWAY_DIST:
# return ['running', min_dist.index(max(min_dist))]
# else:
# return ['running', 0]
# else:
# return ['running', 0]
# UAV hold(running back and forth)
def assign_drone_ini(self, ):
# check for key points
all_defender_pos = []
min_dist = []
for attr in self.defenders.values():
all_defender_pos.append([attr['X'], attr['Y']])
if len(all_defender_pos) > 0:
for key_point in key_points:
min_dist.append(
min([np.linalg.norm(np.array(key_point) - np.array(enemy_pos)) for enemy_pos in all_defender_pos]))
return ['running', min_dist.index(max(min_dist))]
else:
return ['running', 0]
# all_defender_pos = []
# drone_dicvalue=[]
# drone_pos=[]
# min_dist = []
# for attr in self.defenders.values():
# all_defender_pos.append([attr['X'], attr['Y']])
# for attr in self.drone.values():
# drone_dicvalue.append(attr)
# drone_pos.append([drone_dicvalue[2],drone_dicvalue[3]])
# drone_pos.append([drone_dicvalue[2], drone_dicvalue[3]])
# if len(all_defender_pos)>0:
# for key_point in key_points:
# drone2point=[np.linalg.norm(np.array(key_point) - np.array(pos)) for pos in drone_pos]
# defender2point=[np.linalg.norm(np.array(key_point) - np.array(enemy_pos)) for enemy_pos in all_defender_pos]
# min_dist.append(min([a/b for a,b in zip(drone2point,defender2point)]))
# return ['running', min_dist.index(min(min_dist))]
# else:
# return ['running', 0]
def judge_expeled(self, ):
all_defender_pos = []
# drone_pos = [self.drone['X'], self.drone['Y']]
key_point_idx = self.drone['state'][1]
key_point_pos = key_points[key_point_idx]
for attr in self.defenders.values():
all_defender_pos.append([attr['X'], attr['Y']])
min_dist = min([np.linalg.norm(np.array(key_point_pos) - np.array(def_pos)) for def_pos in all_defender_pos])
if min_dist < DRIVE_AWAY_DIST:
return True
else:
return False
# # UGV defend (negative defend)
# def assign_defend(self, def_ID):
# defend = self.defenders[def_ID]
# defend_pos = [defend['X'], defend['Y']]
# all_attacker_pos = []
# all_attacker_ids = []
# for ID, attr in self.attackers.items():
# all_attacker_pos.append([attr['X'], attr['Y']])
# all_attacker_ids.append(ID)
# if len(all_attacker_pos) > 0:
# dist_list = [np.linalg.norm(np.array(defend_pos)-np.array(attack_pos)) for attack_pos in all_attacker_pos]
# min_dist = min(dist_list)
# if min_dist < DEFEND_DIST:
# return ['attack', all_attacker_ids[dist_list.index(min_dist)]]
# else:
# return None
# else:
# return None
# UGV defend(active defend)
# def defend(self, def_ID):
# attack_ids = [ID for ID, attr in self.attackers.items() if 'dead' not in attr['state']]
# defend_ids = [ID for ID, attr in self.defenders.items() if 'dead' not in attr['state']]
#
# alive_attack = len(attack_ids)
# alive_defend = len(defend_ids)
#
# if alive_attack > 0 and alive_defend > 0 and def_ID in defend_ids:
# if alive_attack == alive_defend:
# idx = defend_ids.index(def_ID)
# return ['attack', attack_ids[idx]]
# elif alive_attack < alive_defend:
# return ['attack', attack_ids[0]]
# else:
# return ['attack', attack_ids[0]]
# else:
# return ['idle']
# def2att_ids = [defender['state'][1] for defender in self.defenders.values() if defender['state'][0] == 'attack']
# avail_ids = [ID for ID in attack_ids if ID not in def2att_ids]
#
# if len(avail_ids)>0:
# return ['attack', avail_ids[0]]
# else:
# return ['attack', def2att_ids[0]]
# # UGV expel(nearest assign)
# def assign_expel(self, def_ID):
# drone_pos = [self.drone['X'], self.drone['Y']]
# drone_min_dist = []
# for key_point in key_points:
# drone_min_dist.append(np.linalg.norm(np.array(key_point)-np.array(drone_pos)))
# if min(drone_min_dist) < DRIVE_AWAY_DIST:
# min_dist = 1000
# min_id = list(self.defenders.keys())[0]
# key_point_idx = drone_min_dist.index(min(drone_min_dist))
# for ID, attr in self.defenders.items():
# expel_dist = np.linalg.norm(np.array(key_points[key_point_idx])-np.array([attr['X'], attr['Y']]))
# if expel_dist < min_dist:
# min_dist = expel_dist
# min_id = ID
# if def_ID == min_id:
# return ['expel', key_point_idx]
# else:
# return None
# else:
# return None
# UGV expel(both)
def expel(self, ):
drone_pos = [self.drone['X'], self.drone['Y']]
drone_dist = []
for key_point in key_points:
drone_dist.append(np.linalg.norm(np.array(key_point)-np.array(drone_pos)))
# if min(drone_dist) < DRIVE_AWAY_DIST:
return ['expel', drone_dist.index(min(drone_dist))]
# else:
# return None
# defender assign
def assign_defend(self, def_ID):
alive_attackers = [attacker for attacker in self.attackers.values() if 'dead' not in attacker['state']]
alive_attackers_ids = [ID for ID, attr in self.attackers.items() if 'dead' not in attr['state']]
dist = [np.linalg.norm(np.array([att['X'], att['Y']]) - np.array([self.defenders[def_ID]['X'], self.defenders[def_ID]['Y']])) for att in alive_attackers]
if len(dist) > 0 and min(dist) < DEFEND_DIST:
idx = dist.index(min(dist))
return ['attack', alive_attackers_ids[idx]]
else:
return self.expel()
# alive_attacker_list = [att for att in self.attackers.values() if 'dead' not in att['state']]
#
# if len(alive_attacker_list) > 0:
# UGV_stance = [self.evaluator.UAV2UAV_id('offensive', attacker, self.defenders[def_ID]) for attacker in alive_attacker_list]
# # print(inner_stance_list)
# max_UGV_stance = max(UGV_stance)
#
# drone_stance = [self.evaluator.Drone2Point_id(self.drone, keyPoint) for keyPoint in key_points]
# max_drone_stance = max(drone_stance)
#
# print('UGV stance: ', max_UGV_stance)
# print('drone stance: ', max_drone_stance)
# print('ratio: ', max_UGV_stance/max_drone_stance)
#
# ratio = max_UGV_stance/max_drone_stance
#
# if ratio > self.ratio_thres:
# assigned_state = self.defend(def_ID)
# else:
# assigned_state = self.expel()
#
# else:
# assigned_state = self.expel()
#
# return assigned_state
# judge retreat for attackers
def is_retreat(self, att_ID, def_ID):
if 'dead' in self.attackers[att_ID]['state'] or 'dead' in self.defenders[def_ID]['state']:
return False
else:
# # dist version
# attacker_pos = [self.attackers[att_ID]['X'], self.attackers[att_ID]['Y']]
# dist_list = [np.linalg.norm(np.array(attacker_pos) - np.array([defender['X'], defender['Y']])) for defender \
# in self.defenders.values()]
# if min(dist_list) < 800:
# return True
# else:
# return False
# stance_version
stance = self.evaluator.UAV2UAV_id('offensive', self.attackers[att_ID], self.defenders[def_ID])
if stance > RETREAT_STANCE:
print('retreat stance: ', stance)
return True
else:
return False
def is_attack(self, att_ID):
if 'dead' in self.attackers[att_ID]['state']:
return False
else:
# # dist version
# attacker_pos = [self.attackers[att_ID]['X'], self.attackers[att_ID]['Y']]
# dist_list = [np.linalg.norm(np.array(attacker_pos) - np.array([defender['X'], defender['Y']])) for defender \
# in self.defenders.values()]
# if min(dist_list) > 1000:
# return True
# else:
# return False
# stance version
stance_list = [self.evaluator.UAV2UAV_id('offensive', self.attackers[att_ID], self.defenders[def_ID]) for def_ID in \
self.defenders.keys() if 'dead' not in self.defenders[def_ID]['state']]
if max(stance_list) < RETREAT_STANCE:
print('attack stance: ', max(stance_list))
return True
else:
print('retreat stance: ', max(stance_list))
return False
# test
# if __name__ == '__main__':
#
# # assigner = TaskAssign()
# # attack_goals, defend_goals, avoid_goals, uav_point = align.assign_all(ally_agents_data, enemy_agents_data, key_points)
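# A stripped-down sketch of assign_defend above: chase the nearest living
# attacker if one is inside DEFEND_DIST, otherwise fall back to expelling at a
# key point. Toy coordinates stand in for the real agent dicts.
if __name__ == '__main__':
    import numpy as np
    toy_attackers = {'211': (1500.0, -2000.0), '221': (-2500.0, -2500.0)}
    defender_pos = np.array([2000.0, -1800.0])
    dist = {aid: np.linalg.norm(defender_pos - np.array(p)) for aid, p in toy_attackers.items()}
    nearest_id = min(dist, key=dist.get)
    # DEFEND_DIST comes from global_params (star-imported at the top of this file)
    state = ['attack', nearest_id] if dist[nearest_id] < DEFEND_DIST else ['expel', 0]
    print(state)  # ['attack', '211'] -- 211 is ~539 units from the defender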
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/blue_strategy.py
================================================
import numpy as np
import math
def offense_combat(self_data, ally_agents_data, enemy_agents_data, key_points, blue_alive, red_alive,
agent_type):
# max speed of the defending red cars
red_car_max_vel = 600
# max speed of the attacking blue cars
blue_car_max_vel = 600
# max speed of the attacking blue drone
blue_drone_max_vel = 600
# time the attacking drone must occupy a key point to win
time_to_win = 2.0
# effective range of the expel (drive-away) payload
expel_range = 1200
# UGV firing range
fire_dist = 2000
all_enemy_agent_pos = []
for agent_id, dict_value in enemy_agents_data.items():
all_enemy_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
"""
all_enemy_agent_yaw = []
for agent_id, dict_value in enemy_agents_data.items():
if agent_id != '231':
all_enemy_agent_yaw.append([dict_value['Yaw']])
"""
blue_drone_current_pos = np.array([[ally_agents_data['231']['X'], ally_agents_data['231']['Y'], ally_agents_data['231']['Z']]])
ally_agents_data.pop('231')
if '211' in self_data.keys():
friend_agents_data = dict(self_data, **ally_agents_data)
if '211' in ally_agents_data.keys():
friend_agents_data = dict(ally_agents_data, **self_data)
all_friend_agent_pos = []
for agent_id, dict_value in friend_agents_data.items():
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
"""
all_friend_agent_yaw = []
for agent_id, dict_value in friend_agents_data.items():
all_friend_agent_yaw.append([dict_value['Yaw']])
"""
target_location = np.array(key_points)
red_car_current_pos = np.array(all_enemy_agent_pos)
blue_car_current_pos = np.array(all_friend_agent_pos)
if agent_type == 0:
drone_keypoint_relative_dist = np.array([[np.linalg.norm(blue_drone_current_pos[0][:2] - target_location[b][:2])
for b in range(2)]])
if red_alive[0] is True and red_alive[1] is True:
dist_red_car_blue_drone = np.array(
[[np.linalg.norm(red_car_current_pos[a][:2] - blue_drone_current_pos[0][:2])]
for a in range(2)])
if np.min(dist_red_car_blue_drone) < expel_range:# and np.argmin(drone_keypoint_relative_dist) < 600:
flag = 'defense'
else:
flag = 'offense'
elif red_alive[0] is True and red_alive[1] is False:
dist_red_car_blue_drone = np.linalg.norm(red_car_current_pos[0][:2] - blue_drone_current_pos[0][:2])
if dist_red_car_blue_drone < expel_range:# and np.argmin(drone_keypoint_relative_dist) < 600:
flag = 'defense'
else:
flag = 'offense'
elif red_alive[0] is False and red_alive[1] is True:
dist_red_car_blue_drone = np.linalg.norm(red_car_current_pos[1][:2] - blue_drone_current_pos[0][:2])
if dist_red_car_blue_drone < expel_range:# and np.argmin(drone_keypoint_relative_dist) < 600:
flag = 'defense'
else:
flag = 'offense'
elif red_alive[0] is False and red_alive[1] is False:
flag = 'offense'
if flag == 'offense':
if blue_alive[0] is True or blue_alive[1] is True:
# UGVs (cars)
if blue_alive[0] is True and blue_alive[1] is True:
center_x = (blue_car_current_pos[0][0] + blue_car_current_pos[1][0]) / 2
center_y = (blue_car_current_pos[0][1] + blue_car_current_pos[1][1]) / 2
center_z = (blue_car_current_pos[0][2] + blue_car_current_pos[1][2]) / 2
elif blue_alive[0] is True and blue_alive[1] is False:
center_x = blue_car_current_pos[0][0]
center_y = blue_car_current_pos[0][1]
center_z = blue_car_current_pos[0][2]
elif blue_alive[0] is False and blue_alive[1] is True:
center_x = blue_car_current_pos[1][0]
center_y = blue_car_current_pos[1][1]
center_z = blue_car_current_pos[1][2]
blue_center = np.array([[center_x, center_y, center_z]])
if red_alive[0] is True and red_alive[1] is True:
dist = np.array([[np.linalg.norm(blue_center[0][:2] - red_car_current_pos[b][:2])
for b in range(2)]])
target_pos = red_car_current_pos[np.argmin(dist)]
target_id = np.argmin(dist)
elif red_alive[0] is True and red_alive[1] is False:
target_pos = red_car_current_pos[0]
target_id = 0
elif red_alive[0] is False and red_alive[1] is True:
target_pos = red_car_current_pos[1]
target_id = 1
elif red_alive[0] is False and red_alive[1] is False:
target_pos = target_location[0]
target_id = 1
else:
target_pos = blue_car_current_pos[0]
target_id = 0
else:
target_pos = blue_drone_current_pos[0]
target_pos[0] = target_pos[0] + (-2000 * np.sign(blue_drone_current_pos[0][0]))
target_pos[1] = target_pos[1] + (-3000 * np.sign(blue_drone_current_pos[0][1]))
target_pos[2] = blue_car_current_pos[0][2]
target_id = 0
return target_pos, target_id, flag
elif agent_type == 1:
# UAV (drone)
drone_keypoint_relative_dist = np.array([[np.linalg.norm(blue_drone_current_pos[0][:2] - target_location[b][:2])
for b in range(2)]])
safe_target_location = []
for index in range(2):
if red_alive[0] is True and red_alive[1] is True:
dist = np.array([[np.linalg.norm(target_location[index][:2] - red_car_current_pos[b][:2])
for b in range(2)]])
if np.min(dist) > expel_range:
safe_target_location.append(True)
else:
safe_target_location.append(False)
elif red_alive[0] is True and red_alive[1] is False:
dist = np.linalg.norm(target_location[index][:2] - red_car_current_pos[0][:2])
if dist > expel_range:
safe_target_location.append(True)
else:
safe_target_location.append(False)
elif red_alive[0] is False and red_alive[1] is True:
dist = np.linalg.norm(target_location[index][:2] - red_car_current_pos[1][:2])
if dist > expel_range:
safe_target_location.append(True)
else:
safe_target_location.append(False)
elif red_alive[0] is False and red_alive[1] is False:
safe_target_location = [True, True]
if safe_target_location[0] is True and safe_target_location[1] is True:
red_car_dist = np.array(
[[np.linalg.norm(red_car_current_pos[a][:2] - target_location[b][:2]) - expel_range for b in range(2)]
for a in range(2)])
if red_alive[0] is False:
red_car_dist[0] = 100000
elif red_alive[1] is False:
red_car_dist[1] = 100000
target_defense_index = np.argmax(np.max(red_car_dist, axis=0, keepdims=True))
blue_drone_dist_to_go = np.array(
[[np.linalg.norm(blue_drone_current_pos[a][:2] - target_location[b][:2]) for b in range(2)]
for a in range(1)])
# return target_location[np.argmin(blue_drone_dist_to_go)]
return target_location[target_defense_index]
elif safe_target_location[0] is True and safe_target_location[1] is False:
return target_location[0]
elif safe_target_location[0] is False and safe_target_location[1] is True:
return target_location[1]
elif safe_target_location[0] is False and safe_target_location[1] is False:
# neither point is safe: head for the midpoint of the two key points
return np.array([(target_location[0][0] + target_location[1][0]) / 2,
(target_location[0][1] + target_location[1][1]) / 2])
if __name__ == '__main__':
agent_type = 0
# 211 and 221 are the attacking blue cars; 231 is the attacking blue drone
# 311 and 321 are the defending red cars
# info of the blue car currently making a decision
self_data = {'211': {'X': 1500, 'Y': -2000, 'Z': 0, 'Yaw': 30, 'Blood': 100}}
ally_agents_data = {'221': {'X': -2500, 'Y': -2500, 'Z': 0, 'Yaw': 40, 'Blood': 100},
'231': {'X': 700, 'Y': 3300, 'Z': 1500, 'Yaw': 0}}
# defending red car info
enemy_agents_data = {'311': {'X': 2700.0, 'Y': 3300, 'Z': 0, 'Yaw': 20, 'Blood': 100},
'321': {'X': -1000, 'Y': -700, 'Z': 0, 'Yaw': 10, 'Blood': 100}}
# key (capture) point info
key_points = [[700, 3300, 0], [-2500, -700, 0]]
# alive status
blue_alive = [True, True]
red_alive = [True, True]
if agent_type == 0:
target_position, target_id, flag = offense_combat(self_data, ally_agents_data, enemy_agents_data,
key_points, blue_alive, red_alive, agent_type)
print(target_position)
print('\r\n')
print(target_id)
print('\r\n')
print(flag)
elif agent_type == 1:
target_position = offense_combat(self_data, ally_agents_data, enemy_agents_data,
key_points, blue_alive, red_alive, agent_type)
print(target_position)
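# A compact sketch of the "safe key point" test above: a capture point counts as
# safe for the drone when every living red car is farther away than expel_range
# (positions reuse the demo values from the block above).
if __name__ == '__main__':
    import numpy as np
    demo_expel_range = 1200
    kp = np.array([[700.0, 3300.0], [-2500.0, -700.0]])
    red_pos = np.array([[2700.0, 3300.0], [-1000.0, -700.0]])
    alive_mask = np.array([True, True])
    d = np.linalg.norm(kp[:, None, :] - red_pos[alive_mask][None, :, :], axis=-1)  # (points, cars)
    safe = (d > demo_expel_range).all(axis=1)
    print(safe)  # [ True  True] -- every red car is outside expel range of both points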
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/decision.py
================================================
import copy
import random
import numpy as np
import datetime
import time
from ALGORITHM.script_ai.assignment import *
from ALGORITHM.script_ai.global_params import *
class decision():
"""docstring for decision"""
def __init__(self, attackers, drone, defenders):
super(decision, self).__init__()
self.attackers = attackers
self.drone = drone
self.defenders = defenders
self.assigner = TaskAssign(self.attackers, self.drone, self.defenders)
# output actions
def act(self, type=None):
actions_list = {}
self.alive_attack = len([att for att in list(self.attackers.values()) if 'dead' not in att['state']])
self.alive_defend = len([ded for ded in list(self.defenders.values()) if 'dead' not in ded['state']])
if not type:
type = 'attackers'
if type == 'attackers':
self.attack_StateTrans()
for ID, attr in self.attackers.items():
if 'dead' in attr['state']:
des_pos = [attr['X'], attr['Y'], attr['Z']]
actions_list[ID] = des_pos
continue
if 'attack' in attr['state'] and attr['state'][1] != '0':
def_ID = attr['state'][1]
opp = self.defenders[def_ID]
des_pos = [opp['X']-400, opp['Y']-400, opp['Z']]
actions_list[ID] = des_pos
elif 'retreat' in attr['state']:
if self.drone['state'][0] != 'idle':
select_key_points = key_points[self.drone['state'][1]]
agent_pos = [attr['X'], attr['Y'], attr['Z']]
if select_key_points[1] - agent_pos[1] > 50 and select_key_points[0] - agent_pos[0] > 50:
k = (select_key_points[1] - agent_pos[1]) / (select_key_points[0] - agent_pos[0])
des_pos = [agent_pos[0] - 1400, agent_pos[1] - 1400 * k, agent_pos[2]]
else:
des_pos = [agent_pos[0] - 1400, agent_pos[1], agent_pos[2]]
else:
des_pos = ATTA_RETREAT_POS
actions_list[ID] = des_pos
# des_pos = ATTA_RETREAT_POS
# actions_list[ID] = des_pos
elif 'idle' in attr['state']:
des_pos = [attr['X'], attr['Y'], attr['Z']]
actions_list[ID] = des_pos
# attack target assign
assign_attackers = [ID for ID, attr in self.attackers.items() if (len(attr['state'])>1 and attr['state'][1] == '0')]
# attacker assignment
if len(assign_attackers)>0:
target_ID = self.assigner.assign_attack(assign_attackers)
target = self.defenders[target_ID]
for attacker_ID in assign_attackers:
actions_list[attacker_ID] = [target['X']-300, target['Y']-300, target['Z']]
self.attackers[attacker_ID]['state'][1] = target_ID
# drone action
self.drone_StateTrans()
if 'idle' in self.drone['state']:
des_pos = [self.drone['X'], self.drone['Y'], self.drone['Z']]
elif 'running' in self.drone['state'] or 'hold' in self.drone['state']:
des_pos = key_points[self.drone['state'][1]]
actions_list['drone'] = des_pos
# same as attackers
elif type == 'defenders':
self.defend_StateTrans()
for ID, attr in self.defenders.items():
if 'dead' in attr['state']:
des_pos = [attr['X'], attr['Y'], attr['Z']]
actions_list[ID] = des_pos
continue
if 'attack' in attr['state']:
def_ID = attr['state'][1]
opp = self.attackers[def_ID]
des_pos = [opp['X'], opp['Y'], opp['Z']]
elif 'expel' in attr['state']:
des_pos = key_points[attr['state'][1]]
elif 'retreat' in attr['state']:
des_pos = DEF_RETREAT_POS
elif 'idle' in attr['state']:
des_pos = key_points[0]
# else:
# des_pos = [attr['X'], attr['Y'], attr['Z']]
actions_list[ID] = des_pos
else:
raise ValueError('invalid type!')
return actions_list
# state machine
def attack_StateTrans(self, ):
# attackers
for ID in list(self.attackers.keys()):
attr = self.attackers[ID]
# check alive/dead(blood thres: 2)
if attr['blood'] <= 2 and 'dead' not in attr['state']:
attr['blood'] = 0
attr['state'] = ['dead']
continue
# idle2attack
if 'idle' in attr['state']:
if self.alive_defend > 0:
attr['state'] = ['attack']
def_ID = '0'
attr['state'].append(def_ID)
continue
# attack2idle/retreat(blood thres: 10)
if 'attack' in attr['state']:
def_ID = attr['state'][1]
is_attack = self.assigner.is_attack(ID)
# 2idle
if 'dead' in self.defenders[def_ID]['state']:
attr['state'] = ['attack']
def_ID = '0'
attr['state'].append(def_ID)
elif not is_attack:
attr['state'] = ['retreat']
continue
# retreat2attack
if 'retreat' in attr['state']:
if self.alive_defend > 0 and attr['blood'] > 10:
is_attack = self.assigner.is_attack(ID)
if is_attack:
attr['state'] = ['attack']
def_ID = '0'
attr['state'].append(def_ID)
# dist_list = [np.linalg.norm(np.array([attr['X'], attr['Y'], attr['Z']]) - np.array([ded['X'], ded['Y'], ded['Z']])) for ded in self.defenders.values()]
# min_dist = min(dist_list)
# if min_dist > 1500:
# attr['state'] = ['attack']
# def_ID = '0'
# attr['state'].append(def_ID)
# if 'idle' in self.drone['state']:
# self.drone['state'] = self.assigner.assign_2point()
def drone_StateTrans(self, ):
drone_pos = [self.drone['X'], self.drone['Y']]
# initial assign
if 'idle' in self.drone['state']:
self.drone['state'] = self.assigner.assign_drone_ini()
# # run2hold
# elif 'running' in self.drone['state']:
# cur_point_idx = self.drone['state'][1]
# cur_point_pos = key_points[cur_point_idx]
# if np.linalg.norm(np.array(drone_pos) - np.array(cur_point_pos)) < 10:
# self.drone['state'] = ['hold', cur_point_idx]
# else:
# pass
elif 'running' in self.drone['state']:
cur_point_idx = self.drone['state'][1]
cur_point_pos = key_points[cur_point_idx]
if self.assigner.judge_expeled():
self.drone['state'] = ['running', int(1 - cur_point_idx)]
self.drone['state'] = self.assigner.assign_drone_ini()  # overrides the toggle above: re-pick the key point farthest from the defenders
else:
pass
# defender
def defend_StateTrans(self, ):
for ID in list(self.defenders.keys()):
attr = self.defenders[ID]
# check alive/dead
if attr['blood'] <= 2 and 'dead' not in attr['state']:
attr['blood'] = 0
attr['state'] = ['dead']
continue
# # expel nearest
# if self.assigner.assign_expel(ID) is not None:
# attr['state'] = self.assigner.assign_expel(ID)
# continue
# # expel both
# if self.assigner.assign_expel() is not None:
# attr['state'] = self.assigner.assign_expel()
# continue
#
# # idle2attack
# if self.alive_attack>0 and self.assigner.assign_defend() is not None:
# attr['state'] = self.assigner.assign_defend()
# continue
attr['state'] = self.assigner.assign_defend(ID)
# attack2idle/retreat(blood thres: 10)
if 'attack' in attr['state']:
att_ID = attr['state'][1]
# 2retreat
if attr['blood'] <= 10 and att_ID in self.attackers.keys() and self.attackers[att_ID]['blood'] > 5:
attr['state'] = ['retreat']
# 2idle
elif 'dead' in self.attackers[att_ID]['state']:
attr['state'] = ['idle']
# else:
# attr['state'] = self.assigner.assign_defend(ID)
def test():
# test initial data
# ally_agents_data={"221": {"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}, "231": {"ammo": 100, "velocity": 0.5, "X":2, "Y":2, "Z":1.5, "Yaw":0, 'blood':100, 'state': ['idle']}}
# enemy_agents_data={"311": {"ammo": 100, "velocity": 0.8, "X":5, "Y":5, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}, "321": {"ammo": 100, "velocity": 0.8, "X":6, "Y":6, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}}
# test dead detection √
# ally_agents_data={"221": {"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':0, 'state': ['idle']}, "231": {"ammo": 100, "velocity": 0.5, "X":2, "Y":2, "Z":1.5, "Yaw":0, 'blood':1, 'state': ['retreat']}}
# drone_data={"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':0, 'state': ['idle']}
# enemy_agents_data={"311": {"ammo": 100, "velocity": 0.8, "X":5, "Y":5, "Z":0, "Yaw":0, 'blood':1, 'state': ['attack']}, "321": {"ammo": 100, "velocity": 0.8, "X":6, "Y":6, "Z":0, "Yaw":0, 'blood':0, 'state': ['dead']}}
# test idle2attack √
# ally_agents_data={"221": {"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}, "231": {"ammo": 100, "velocity": 0.5, "X":2, "Y":2, "Z":1.5, "Yaw":0, 'blood':100, 'state': ['idle']}}
# drone_data={"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':0, 'state': ['idle']}
# enemy_agents_data={"311": {"ammo": 100, "velocity": 0.8, "X":5, "Y":5, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}, "321": {"ammo": 100, "velocity": 0.8, "X":6, "Y":6, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}}
# test attack2idle √
# ally_agents_data={"221": {"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':100, 'state': ['attack', '311']}, "231": {"ammo": 100, "velocity": 0.5, "X":2, "Y":2, "Z":1.5, "Yaw":0, 'blood':100, 'state': ['attack', '311']}}
# drone_data={"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':0, 'state': ['idle']}
# enemy_agents_data = {}
# test attack2retreat √
ally_agents_data={"221": {"ammo": 100, "velocity": 0.5, "X":1, "Y":1, "Z":0, "Yaw":0, 'blood':11, 'state': ['attack', '311']}, "231": {"ammo": 100, "velocity": 0.5, "X":2, "Y":2, "Z":1.5, "Yaw":0, 'blood':10, 'state': ['attack', '311']}}
drone_data={"ammo": 100, "velocity": 0.5, "X":-3, "Y":0, "Z":0, "Yaw":0, 'blood':0, 'state': ['idle']}
enemy_agents_data={"311": {"ammo": 100, "velocity": 0.8, "X":2, "Y":1, "Z":0, "Yaw":0, 'blood':6, 'state': ['idle']}, "321": {"ammo": 100, "velocity": 0.8, "X":6, "Y":6, "Z":0, "Yaw":0, 'blood':100, 'state': ['idle']}}
# enemy_agents_data={"311":{"ammo": 100, "velocity": 0.5, "X":2, "Y":2, "Z":0, "Yaw":0, 'blood':11, 'state': ['expel']}}
DecisionMake = decision(ally_agents_data, drone_data, enemy_agents_data)
attackers = DecisionMake.attackers
defenders = DecisionMake.defenders
drone = DecisionMake.drone
# decision module test
attack_actions = DecisionMake.act(type='attackers')
defend_actions = DecisionMake.act(type='defenders')
print('attack property: ', attackers)
print('defend property: ', defenders)
att_states = []
def_states = []
for k, v in attackers.items():
att_states.append({k:v['state']})
for k, v in defenders.items():
def_states.append({k:v['state']})
drone_state = drone['state']
print('attack states: ', att_states)
print('defend states: ', def_states)
print('drone states: ', drone_state)
print('attack actions: ', attack_actions)
print('defend actions: ', defend_actions)
if __name__ == '__main__':
test()
# DecisionMake = decision(ally_agents_data, enemy_agents_data)
# while(1):
# attack_actions = DecisionMake.act(type='attackers')
# defend_actions = DecisionMake.act(type='defenders')
# time.sleep(0.05)
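# A toy trace of attack_StateTrans above, with the stance evaluator replaced by a
# boolean `pressured` flag (the real code compares a stance score from
# Evaluation_module against RETREAT_STANCE via is_attack / is_retreat):
if __name__ == '__main__':
    def step_attacker(attr, alive_defenders, pressured):
        if attr['blood'] <= 2:
            attr['state'] = ['dead']  # dead check first, blood threshold 2
        elif attr['state'][0] == 'idle' and alive_defenders > 0:
            attr['state'] = ['attack', '0']  # '0' marks "target not yet assigned"
        elif attr['state'][0] == 'attack' and pressured:
            attr['state'] = ['retreat']
        elif attr['state'][0] == 'retreat' and attr['blood'] > 10 and not pressured:
            attr['state'] = ['attack', '0']
        return attr['state']

    toy_agent = {'blood': 100, 'state': ['idle']}
    print(step_attacker(toy_agent, alive_defenders=2, pressured=False))  # ['attack', '0']
    print(step_attacker(toy_agent, alive_defenders=2, pressured=True))   # ['retreat']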
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/dummy.py
================================================
import numpy as np
from UTIL.tensor_ops import copy_clone
class DummyAlgConfig():
reserve = ""
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from config import GlobalConfig
self.n_agent = n_agent
self.n_thread = n_thread
self.ScenarioConfig = GlobalConfig.ScenarioConfig
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent))
# set actions of inactive threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithm(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 4))
env0_step = State_Recall['Current-Obs-Step']
if env0_step%2==0:
actions[..., 0] = 1 # AT
for i in range(5): actions[:, i, 1] = i # TT
actions[..., 2] = 0 # HT
actions[..., 3] = 0 # SP
else:
actions[..., 0] = 5 # AT
for i in range(5): actions[:, i, 1] = i # TT
actions[..., 2] = 0 # HT
actions[..., 3] = 0 # SP
# set actions of inactive threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
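# The shape convention above in isolation: actions are built thread-major, paused
# threads are NaN-masked, then the buffer is swapped to agent-major before being
# handed back to the framework.
if __name__ == '__main__':
    import numpy as np
    demo_pause = np.array([False, True, False])
    demo = np.zeros((3, 5, 4))             # (n_thread, n_agent, act_dim)
    demo[demo_pause] = np.nan
    demo = np.swapaxes(demo, 0, 1)         # -> (n_agent, n_thread, act_dim)
    print(demo.shape, np.isnan(demo[:, 1]).all())  # (5, 3, 4) True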
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/dummy_uhmap.py
================================================
import numpy as np
from UTIL.tensor_ops import copy_clone
from MISSION.uhmap.actset_lookup import encode_action_as_digits
from ALGORITHM.script_ai.decision import decision
from ALGORITHM.script_ai.assignment import *
from ALGORITHM.script_ai.global_params import *
attact_states={'211':['idle'],'221':['idle']}
drone_state={'231':['idle']}
defend_states={'311':['idle'],'321':['idle']}
print("===========================")
print(attact_states)
print(drone_state)
print(defend_states)
print("===========================")
class DummyAlgConfig():
reserve = ""
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from config import GlobalConfig
self.n_agent = n_agent
self.n_thread = n_thread
self.ScenarioConfig = GlobalConfig.ScenarioConfig
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent))
# set actions of inactive threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
# attacking side (blue) decision-making
class DummyAlgorithmT1(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
try:
res = self.interact_with_env_(State_Recall)
except:
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
actions[:] = encode_action_as_digits("N/A", "N/A", x=None, y=None, z=None, UID=None, T=None, T_index=None)
actions = np.swapaxes(actions, 0, 1)
res = (actions, None)
return res
def interact_with_env_(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
env0_step = State_Recall['Current-Obs-Step']
obs = State_Recall['Latest-Team-Info']
thread = 0
global attact_states
global drone_state
global defend_states
if State_Recall['Env-Suffered-Reset']==[True]:
attact_states = {'211': ['idle'], '221': ['idle']}
drone_state = {'231': ['idle']}
defend_states = {'311': ['idle'], '321': ['idle']}
# defending red car info
red_agents_data = {'311': {'ammo':100, 'velocity':0, 'X': obs[0]['dataArr'][3]['agentLocation']['x'],
'Y': obs[0]['dataArr'][3]['agentLocation']['y'],
'Z': obs[0]['dataArr'][3]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][3]['agentHp'], 'state':defend_states['311']},
'321': {'ammo':100, 'velocity':0, 'X': obs[0]['dataArr'][4]['agentLocation']['x'],
'Y': obs[0]['dataArr'][4]['agentLocation']['y'],
'Z': obs[0]['dataArr'][4]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][4]['agentHp'], 'state':defend_states['321']}}
# attacking blue car and drone info; 231 is the drone
blue_agents_data = {'211': {'ammo':100, 'velocity':0, 'X': obs[0]['dataArr'][0]['agentLocation']['x'],
'Y': obs[0]['dataArr'][0]['agentLocation']['y'],
'Z': obs[0]['dataArr'][0]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][0]['agentHp'], 'state':attact_states['211']},
'221': {'ammo':100, 'velocity':0, 'X': obs[0]['dataArr'][1]['agentLocation']['x'],
'Y': obs[0]['dataArr'][1]['agentLocation']['y'],
'Z': obs[0]['dataArr'][1]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][1]['agentHp'], 'state':attact_states['221']}}
drone_data={'ammo':100, 'velocity':0, 'X': obs[0]['dataArr'][2]['agentLocation']['x'],
'Y': obs[0]['dataArr'][2]['agentLocation']['y'],
'Z': obs[0]['dataArr'][2]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][2]['agentHp'], 'state':drone_state['231']}
blue_alive = [obs[0]['dataArr'][0]['agentAlive'], obs[0]['dataArr'][1]['agentAlive']]
red_alive = [obs[0]['dataArr'][3]['agentAlive'], obs[0]['dataArr'][4]['agentAlive']]
# key point info; edit in global_params.py
# key_points = [[700, -3300, 500], [-3000, 700, 500]]
DecisionMake = decision(blue_agents_data,drone_data,red_agents_data)
attackers = DecisionMake.attackers
defenders = DecisionMake.defenders
drone = DecisionMake.drone
#decision module test
attack_actions = DecisionMake.act(type='attackers')
defend_actions = DecisionMake.act(type='defenders')
att_states = []
def_states = []
for k, v in attackers.items():
att_states.append({k: v['state']})
for k, v in defenders.items():
def_states.append({k: v['state']})
# print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
# print(drone['state'])
# print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
attact_states['211']=att_states[0]['211']
attact_states['221'] = att_states[1]['221']
defend_states['311'] = def_states[0]['311']
defend_states['321'] = def_states[1]['321']
drone_state['231'] = drone['state']
print('+++++++++++++++++++++++++ info +++++++++++++++++++++++++++++++++')
print('211 state: ', attact_states['211'])
print('221 state: ', attact_states['221'])
print('311 state: ', defend_states['311'])
print('321 state: ', defend_states['321'])
print('drone state: ', drone['state'])
print('attack actions: ', attack_actions)
print('defend actions: ', defend_actions)
print('++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++')
# decision for car 211
if attact_states['211'][0] == 'attack':
if attact_states['211'][1] == '311':
# actions[thread, 0] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None,UID=3,T=None, T_index=None)
actions[thread, 0] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['211'][0], y=attack_actions['221'][1], z=500,UID=None,T=None, T_index=None)
else:
# actions[thread, 0] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=4,T=None, T_index=None)
actions[thread, 0] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['211'][0], y=attack_actions['211'][1],z=500,UID=None,T=None, T_index=None)
else:
actions[thread, 0] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['211'][0], y=attack_actions['211'][1],z=500,UID=None,T=None, T_index=None)
# decision for car 221
if attact_states['221'][0] == 'attack':
if attact_states['221'][1] == '311':
# actions[thread, 1] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=3,T=None, T_index=None)
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['221'][0], y=attack_actions['221'][1],z=500,UID=None,T=None, T_index=None)
else:
# actions[thread, 1] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=4,T=None, T_index=None)
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['221'][0], y=attack_actions['221'][1],z=500,UID=None,T=None, T_index=None)
else:
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['221'][0], y=attack_actions['221'][1],z=500,UID=None,T=None, T_index=None)
# decision for drone 231
actions[thread, 2] = encode_action_as_digits("SpecificMoving", "N/A", x=attack_actions['drone'][0],
y=attack_actions['drone'][1],
z=500,
UID=None,
T=None, T_index=None)
# if env0_step < 2:
# actions[thread, :] = self.act2digit_dictionary['ActionSet2::Idle;DynamicGuard']
# actions[thread, 0] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=4, T=None, T_index=None)
# actions[thread, 0] = encode_action_as_digits("PatrolMoving", "N/A", x=0, y=0, z=379, UID=None, T=None, T_index=None)
# actions[thread, 1] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=3, T=None, T_index=None)
# actions[thread, 2] = encode_action_as_digits("SpecificMoving", "N/A", x=-3000, y=700, z=500, UID=None,T=None, T_index=None)
# actions[thread, 2] = encode_action_as_digits("Idle", "DynamicGuard", x=700, y=-3300, z=500, UID=None, T=None, T_index=None)
"""
if env0_step%4 == 0:
actions[thread, 2] = encode_action_as_digits("SpecificMoving", "Dir+X+Y", x=700, y=-3300, z=500, UID=None, T=None, T_index=None)
if env0_step%4 == 1:
actions[thread, 2] = encode_action_as_digits("SpecificMoving", "Dir+X-Y", x=700, y=-3300, z=500, UID=None, T=None, T_index=None)
if env0_step%4 == 2:
actions[thread, 2] = encode_action_as_digits("SpecificMoving", "Dir-X-Y", x=700, y=-3300, z=500, UID=None, T=None, T_index=None)
if env0_step%4 == 3:
actions[thread, 2] = encode_action_as_digits("SpecificMoving", "Dir-X+Y", x=700, y=-3300, z=500, UID=None, T=None, T_index=None)
"""
# set actions of inactive threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
# defending side (red) decision-making
class DummyAlgorithmT2(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
try:
res = self.interact_with_env_(State_Recall)
except:
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
actions[:] = encode_action_as_digits("N/A", "N/A", x=None, y=None, z=None, UID=None, T=None, T_index=None)
actions = np.swapaxes(actions, 0, 1)
res = (actions, None)
return res
def interact_with_env_(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
env0_step = State_Recall['Current-Obs-Step']
obs = State_Recall['Latest-Team-Info']
thread = 0
# defending red car info
red_agents_data = {'311': {'ammo': 100, 'velocity': 0, 'X': obs[0]['dataArr'][3]['agentLocation']['x'],
'Y': obs[0]['dataArr'][3]['agentLocation']['y'],
'Z': obs[0]['dataArr'][3]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][3]['agentHp'], 'state': defend_states['311']},
'321': {'ammo': 100, 'velocity': 0, 'X': obs[0]['dataArr'][4]['agentLocation']['x'],
'Y': obs[0]['dataArr'][4]['agentLocation']['y'],
'Z': obs[0]['dataArr'][4]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][4]['agentHp'], 'state': defend_states['321']}}
# attacking blue car and drone info; 231 is the drone
blue_agents_data = {'211': {'ammo': 100, 'velocity': 0, 'X': obs[0]['dataArr'][0]['agentLocation']['x'],
'Y': obs[0]['dataArr'][0]['agentLocation']['y'],
'Z': obs[0]['dataArr'][0]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][0]['agentHp'], 'state': attact_states['211']},
'221': {'ammo': 100, 'velocity': 0, 'X': obs[0]['dataArr'][1]['agentLocation']['x'],
'Y': obs[0]['dataArr'][1]['agentLocation']['y'],
'Z': obs[0]['dataArr'][1]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][1]['agentHp'], 'state': attact_states['221']}}
drone_data = {'ammo': 100, 'velocity': 0, 'X': obs[0]['dataArr'][2]['agentLocation']['x'],
'Y': obs[0]['dataArr'][2]['agentLocation']['y'],
'Z': obs[0]['dataArr'][2]['agentLocation']['z'],
'Yaw': 0,
'blood': obs[0]['dataArr'][2]['agentHp'], 'state': drone_state['231']}
blue_alive = [obs[0]['dataArr'][0]['agentAlive'], obs[0]['dataArr'][1]['agentAlive']]
red_alive = [obs[0]['dataArr'][3]['agentAlive'], obs[0]['dataArr'][4]['agentAlive']]
# key point info; edit in global_params.py
# key_points = [[700, -3300, 500], [-3000, 700, 500]]
DecisionMake = decision(blue_agents_data, drone_data, red_agents_data)
attackers = DecisionMake.attackers
defenders = DecisionMake.defenders
drone = DecisionMake.drone
# decision module test
attack_actions = DecisionMake.act(type='attackers')
defend_actions = DecisionMake.act(type='defenders')
att_states = []
def_states = []
for k, v in attackers.items():
att_states.append({k: v['state']})
for k, v in defenders.items():
def_states.append({k: v['state']})
attact_states['211'] = att_states[0]['211']
attact_states['221'] = att_states[1]['221']
defend_states['311'] = def_states[0]['311']
defend_states['321'] = def_states[1]['321']
drone_state['231'] = drone['state']
# print("==============智能体速度位置信息======================")
# print(red_agents_data)
# print(blue_agents_data)
# print(drone_data)
# print("====================================")
print("==============智能体state信息======================")
print(attact_states)
print(defend_states)
# print(drone_state)
print("====================================")
# decision for car 311
if defend_states['311'][0] == 'attack':
if defend_states['311'][1] == '211':
# actions[thread, 0] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=0,T=None, T_index=None)
actions[thread, 0] = encode_action_as_digits("SpecificMoving", "N/A", x=defend_actions['311'][0], y=defend_actions['311'][1],z=500,UID=None,T=None, T_index=None)
else:
# actions[thread, 0] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=1,T=None, T_index=None)
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=defend_actions['311'][0], y=defend_actions['311'][1],z=500,UID=None,T=None, T_index=None)
else:
actions[thread, 0] = encode_action_as_digits("SpecificMoving", "N/A", x=defend_actions['311'][0], y=defend_actions['311'][1],z=500,UID=None,T=None, T_index=None)
# decision for car 321
if defend_states['321'][0] == 'attack':
if defend_states['321'][1] == '211':
# actions[thread, 1] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=0,T=None, T_index=None)
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=defend_actions['321'][0], y=defend_actions['321'][1],z=500,UID=None,T=None, T_index=None)
else:
# actions[thread, 1] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=1,T=None, T_index=None)
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=defend_actions['321'][0], y=defend_actions['321'][1],z=500,UID=None,T=None, T_index=None)
else:
actions[thread, 1] = encode_action_as_digits("SpecificMoving", "N/A", x=defend_actions['321'][0], y=defend_actions['321'][1],z=500,UID=None,T=None, T_index=None)
# actions[thread, 0] = encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=1, T=None, T_index=None)
# actions[thread, 0] = encode_action_as_digits("Idle", "AggressivePersue", x=10000, y=-10000, z=379, UID=None, T=None, T_index=None)
# actions[thread, 0] = encode_action_as_digits("SpecificMoving", "N/A", x=10000, y=-10000, z=379, UID=None, T=None, T_index=None)
# actions[thread, 1] = encode_action_as_digits("PatrolMoving", "N/A", x=444*5, y=444*5, z=379, UID=None, T=None, T_index=None)
# set actions of inactive threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
'''
if env0_step < 5:
actions[thread, :] = self.act2digit_dictionary['ActionSet2::Idle;DynamicGuard']
elif env0_step < 15:
actions[thread, :] = self.act2digit_dictionary['ActionSet2::SpecificAttacking;UID-3']
elif env0_step < 25:
actions[thread, :] = self.act2digit_dictionary['ActionSet2::SpecificAttacking;UID-4']
elif env0_step < 35:
actions[thread, :] = self.act2digit_dictionary['ActionSet2::SpecificAttacking;UID-5']
elif env0_step < 45:
actions[thread, :] = self.act2digit_dictionary['ActionSet2::SpecificAttacking;UID-6']
elif env0_step < 55:
actions[thread, :] = self.act2digit_dictionary['ActionSet2::SpecificAttacking;UID-7']
'''
'''
if env0_step < 5:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::Idle;DynamicGuard']
else:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::SpecificAttacking;UID-1']
'''
'''
if env0_step < 5:
if env0_step%4 == 0:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir+X+Y']
if env0_step%4 == 1:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir+X-Y']
if env0_step%4 == 2:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir-X-Y']
if env0_step%4 == 3:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir-X+Y']
elif env0_step < 10:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::Idle;DynamicGuard']
elif env0_step < 15:
if env0_step%4 == 0:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir+X']
if env0_step%4 == 1:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir+Y']
if env0_step%4 == 2:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir-X']
if env0_step%4 == 3:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir-Y']
elif env0_step < 20:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::Idle;StaticAlert']
elif env0_step < 30:
if env0_step%4 == 0:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir+X+Y']
if env0_step%4 == 1:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir+X-Y']
if env0_step%4 == 2:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir-X-Y']
if env0_step%4 == 3:
actions[thread, 2] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir-X+Y']
else:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::Idle;StaticAlert']
'''
"""
thread = 0
if env0_step%4 == 0:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir+X+Y']
if env0_step%4 == 1:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir+X-Y']
if env0_step%4 == 2:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir-X-Y']
if env0_step%4 == 3:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::PatrolMoving;Dir-X+Y']
"""
"""
thread = 0
if env0_step%4 == 0:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir+X+Y']
if env0_step%4 == 1:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir+X-Y']
if env0_step%4 == 2:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir-X-Y']
if env0_step%4 == 3:
actions[thread, 0] = self.act2digit_dictionary['ActionSet2::SpecificMoving;Dir-X+Y']
"""
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/global_params.py
================================================
# # expel dist
# DRIVE_AWAY_DIST = 1.2
#
# # defend dist
# DEFEND_DIST = 2
#
# # retreat to the safe entrance
# ATTA_RETREAT_POS = [-0.5, -3]
#
# # retreat to the dangerous entrance
# DEF_RETREAT_POS = [3, 2]
#
# # key points
# key_points = [[0.7, -3.3], [-3, 0.7]]
# stance thres
RETREAT_STANCE = 0.3
# expel dist
DRIVE_AWAY_DIST = 1000
# defend dist
DEFEND_DIST = 1000
# retreat to the safe entrance
ATTA_RETREAT_POS = [-500, -3000]
# retreat to the dangerous entrance
DEF_RETREAT_POS = [3000, 2000]
# key points
key_points = [[700, -3300], [-3000, 700]]
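# How these constants are consumed (cf. judge_expeled in assignment.py): the drone
# counts as driven off a key point once any defender closes within DRIVE_AWAY_DIST
# of that point. A minimal check with hypothetical defender positions:
if __name__ == '__main__':
    import numpy as np
    point = np.array(key_points[0])
    toy_defenders = [[900, -3200], [2000, 2000]]
    min_dist = min(np.linalg.norm(point - np.array(d)) for d in toy_defenders)
    print(min_dist < DRIVE_AWAY_DIST)  # True: the nearer defender is ~224 units away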
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/manual.py
================================================
import numpy as np
from UTIL.tensor_ops import my_view, copy_clone
try:
from numba import jit
except:
from UTIL.tensor_ops import dummy_decorator as jit
def to_cpu_numpy(x):
return x.cpu().numpy() if hasattr(x,'cpu') else x
class CoopAlgConfig():
reserve = None
class DummyAlgorithmFoundationHI3D():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from config import GlobalConfig
super().__init__()
self.n_agent = n_agent
ScenarioConfig = GlobalConfig.ScenarioConfig
self.num_entity = ScenarioConfig.num_entity
self.landmark_uid = ScenarioConfig.uid_dictionary['landmark_uid']
self.agent_uid = ScenarioConfig.uid_dictionary['agent_uid']
self.entity_uid = ScenarioConfig.uid_dictionary['entity_uid']
self.pos_decs = ScenarioConfig.obs_vec_dictionary['pos']
self.vel_decs = ScenarioConfig.obs_vec_dictionary['vel']
self.num_landmarks = len(self.landmark_uid)
self.invader_uid = ScenarioConfig.uid_dictionary['invader_uid']
self.n_entity = ScenarioConfig.num_entity
self.n_basic_dim = ScenarioConfig.obs_vec_length
self.n_thread = n_thread
self.attack_target = [None] * self.n_thread
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def get_previous(self, team_intel):
info = copy_clone(team_intel['Latest-Obs'])
Env_Suffered_Reset = copy_clone(team_intel['Env-Suffered-Reset'])
return info, Env_Suffered_Reset
def interact_with_env(self, State_Recall):
main_obs, Env_Suffered_Reset = self.get_previous(State_Recall)
action = np.ones(shape=(main_obs.shape[0], main_obs.shape[1], 1)) * -1
n_thread = main_obs.shape[0]
about_all_objects = main_obs[:,0,:]
objects_emb = my_view(x=about_all_objects, shape=[0,-1,self.n_basic_dim]) # select one agent
invader_emb = objects_emb[:, self.invader_uid, :]
landmark_emb = objects_emb[:, self.landmark_uid,:]
invader_pos = invader_emb[:, :, self.pos_decs]
invader_vel = invader_emb[:, :, self.vel_decs]
landmark_pos = landmark_emb[:, :, self.pos_decs]
# assign each invader its nearest landmark as target, if and only if the episode has just reset (step == 0)
self.set_nearest_target(Env_Suffered_Reset, invader_pos, landmark_pos)
n_thread = self.n_thread
n_agent = self.n_agent
attack_target = np.array(self.attack_target)
action = self.get_action(action, attack_target, invader_pos, invader_vel, landmark_pos, n_agent, n_thread)
assert not (action == -1).any()
actions_list = []
for i in range(self.n_agent):
actions_list.append(action[:, i])
return np.array(actions_list), None
# @jit(nopython=True)
# @staticmethod
@jit(forceobj=True)
def get_action(self, action, attack_target, invader_pos, invader_vel, landmark_pos, n_agent, n_thread):
posit_vec = np.zeros_like(invader_vel)
for thread in range(n_thread):
for agent in range(n_agent):
posit_vec[thread,agent] = landmark_pos[thread, attack_target[thread][agent]] - invader_pos[thread, agent]
return self.dir_to_action3d(vec=posit_vec,vel=invader_vel)
@staticmethod
def dir_to_action3d(vec, vel):
def np_mat3d_normalize_each_line(mat):
return mat / np.expand_dims(np.linalg.norm(mat, axis=2) + 1e-16, axis=-1)
desired_speed = 0.8
vec = np_mat3d_normalize_each_line(vec)*desired_speed
return vec
def set_nearest_target(self, Env_Suffered_Reset, invader_pos, landmark_pos):
for thread, env_suffered_reset_ in enumerate(Env_Suffered_Reset):
if env_suffered_reset_:
invader_attack_target = [None] * self.n_agent
for i in range(self.n_agent):
posit_vec = np.array([landmark_pos[thread, j] - invader_pos[thread, i] for j in range(self.num_landmarks)])
dis_arr = np.linalg.norm(posit_vec, axis=-1)
assigned_target = np.argmin(dis_arr)
# assigned_target = np.random.randint(low=0, high=self.num_landmarks)
invader_attack_target[i] = assigned_target
self.attack_target[thread] = np.array(invader_attack_target)
class DummyAlgorithmFoundationHI3D_old():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from config import GlobalConfig
super().__init__()
self.n_agent = n_agent
ScenarioConfig = GlobalConfig.ScenarioConfig
self.num_entity = ScenarioConfig.num_entity
self.landmark_uid = ScenarioConfig.uid_dictionary['landmark_uid']
self.agent_uid = ScenarioConfig.uid_dictionary['agent_uid']
self.entity_uid = ScenarioConfig.uid_dictionary['entity_uid']
self.pos_decs = ScenarioConfig.obs_vec_dictionary['pos']
self.vel_decs = ScenarioConfig.obs_vec_dictionary['vel']
self.num_landmarks = len(self.landmark_uid)
self.invader_uid = ScenarioConfig.uid_dictionary['invader_uid']
self.n_entity = ScenarioConfig.num_entity
self.n_basic_dim = ScenarioConfig.obs_vec_length
self.n_thread = n_thread
self.attack_target = [None] * self.n_thread
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def get_previous(self, team_intel):
info = copy_clone(team_intel['Latest-Obs'])
Env_Suffered_Reset = copy_clone(team_intel['Env-Suffered-Reset'])
return info, Env_Suffered_Reset
def interact_with_env(self, State_Recall):
main_obs, Env_Suffered_Reset = self.get_previous(State_Recall)
action = np.ones(shape=(main_obs.shape[0], main_obs.shape[1], 1)) * -1
n_thread = main_obs.shape[0]
about_all_objects = main_obs[:,0,:]
objects_emb = my_view(x=about_all_objects, shape=[0,-1,self.n_basic_dim]) # select one agent
invader_emb = objects_emb[:, self.invader_uid, :]
landmark_emb = objects_emb[:, self.landmark_uid,:]
invader_pos = invader_emb[:, :, self.pos_decs]
invader_vel = invader_emb[:, :, self.vel_decs]
landmark_pos = landmark_emb[:, :, self.pos_decs]
# Assign each invader a random target when and only when step == 0 (the episode has just started)
self.set_random_target(Env_Suffered_Reset)
n_thread = self.n_thread
n_agent = self.n_agent
attack_target = np.array(self.attack_target)
action = self.get_action(action, attack_target, invader_pos, invader_vel, landmark_pos, n_agent, n_thread)
assert not (action == -1).any()
actions_list = []
for i in range(self.n_agent):
actions_list.append(action[:, i])
return np.array(actions_list), None
# @jit(nopython=True)
# @staticmethod
@jit(forceobj=True)
def get_action(self, action, attack_target, invader_pos, invader_vel, landmark_pos, n_agent, n_thread):
posit_vec = np.zeros_like(invader_vel)
for thread in range(n_thread):
for agent in range(n_agent):
posit_vec[thread,agent] = landmark_pos[thread, attack_target[thread][agent]] - invader_pos[thread, agent]
return self.dir_to_action3d(vec=posit_vec,vel=invader_vel)
@staticmethod
@jit(forceobj=True)
def dir_to_action3d(vec, vel):
def np_mat3d_normalize_each_line(mat):
return mat / np.expand_dims(np.linalg.norm(mat, axis=2) + 1e-16, axis=-1)
vec = np_mat3d_normalize_each_line(vec)
e_u = np.array([0 ,1 , 0 ])
e_d = np.array([0 ,-1 , 0 ])
e_r = np.array([1 ,0 , 0 ])
e_l = np.array([-1 ,0 , 0 ])
e_a = np.array([0 ,0 , 1 ])
e_b = np.array([0 ,0 ,-1 ])
vel_u = np_mat3d_normalize_each_line(vel + e_u * 0.1)
vel_d = np_mat3d_normalize_each_line(vel + e_d * 0.1)
vel_r = np_mat3d_normalize_each_line(vel + e_r * 0.1)
vel_l = np_mat3d_normalize_each_line(vel + e_l * 0.1)
vel_a = np_mat3d_normalize_each_line(vel + e_a * 0.1)
vel_b = np_mat3d_normalize_each_line(vel + e_b * 0.1)
proj_u = (vel_u * vec).sum(-1)
proj_d = (vel_d * vec).sum(-1)
proj_r = (vel_r * vec).sum(-1)
proj_l = (vel_l * vec).sum(-1)
proj_a = (vel_a * vec).sum(-1)
proj_b = (vel_b * vec).sum(-1)
_u = ((vec * e_u).sum(-1)>0).astype(np.int64) # np.int was removed from NumPy; use an explicit dtype
_d = ((vec * e_d).sum(-1)>0).astype(np.int64)
_r = ((vec * e_r).sum(-1)>0).astype(np.int64)
_l = ((vec * e_l).sum(-1)>0).astype(np.int64)
_a = ((vec * e_a).sum(-1)>0).astype(np.int64)
_b = ((vec * e_b).sum(-1)>0).astype(np.int64)
proj_u = proj_u + _u*2
proj_d = proj_d + _d*2
proj_r = proj_r + _r*2
proj_l = proj_l + _l*2
proj_a = proj_a + _a*2
proj_b = proj_b + _b*2
dot_stack = np.stack([proj_u, proj_d, proj_r, proj_l, proj_a, proj_b])
direct = np.argmax(dot_stack, 0)
action = np.where(direct == 0, 2, 0)
action += np.where(direct == 1, 4, 0)
action += np.where(direct == 2, 1, 0)
action += np.where(direct == 3, 3, 0)
action += np.where(direct == 4, 5, 0)
action += np.where(direct == 5, 6, 0)
return np.expand_dims(action, axis=-1)
def set_random_target(self, Env_Suffered_Reset):
for thread, env_suffered_reset_ in enumerate(Env_Suffered_Reset):
if env_suffered_reset_:
invader_attack_target = [None] * self.n_agent
for i in range(self.n_agent):
assigned_target = np.random.randint(low=0, high=self.num_landmarks)
invader_attack_target[i] = assigned_target
self.attack_target[thread] = np.array(invader_attack_target)
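# A small sanity sketch of dir_to_action3d above (toy input, assuming numba's
# object-mode jit passes the arrays through unchanged): a pure +x target
# direction with zero velocity should come out as discrete action code 1.
def _dir_to_action3d_demo():
    vec = np.array([[[1.0, 0.0, 0.0]]])  # (thread=1, agent=1, 3)
    vel = np.zeros((1, 1, 3))
    return DummyAlgorithmFoundationHI3D_old.dir_to_action3d(vec, vel)  # -> array([[[1]]])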
class IHDummyAlgorithmFoundation():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from config import GlobalConfig
super().__init__()
self.n_agent = n_agent
ScenarioConfig = GlobalConfig.ScenarioConfig
self.num_entity = ScenarioConfig.num_entity
self.landmark_uid = ScenarioConfig.uid_dictionary['landmark_uid']
self.agent_uid = ScenarioConfig.uid_dictionary['agent_uid']
self.invader_uid = ScenarioConfig.uid_dictionary['invader_uid']
self.n_entity = ScenarioConfig.num_entity
self.n_basic_dim = ScenarioConfig.obs_vec_length
self.n_object = ScenarioConfig.num_object
self.n_thread = n_thread
self.num_landmarks = ScenarioConfig.num_landmarks
self.attack_target = [None] * self.n_thread
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def get_previous(self, team_intel):
info = copy_clone(team_intel['Latest-Obs'])
done = copy_clone(team_intel['Env-Suffered-Reset'])
return info, done
def interact_with_env(self, State_Recall):
info, done = self.get_previous(State_Recall)
current_step = info[:,0,-1]
entity_pure_emb = my_view(info[:,0,:-1],shape=[0,-1,5])
action = np.ones(shape=(info.shape[0], info.shape[1], 1)) * -1
entity_pos = entity_pure_emb[:, :, (0,1)]
entity_vel = entity_pure_emb[:, :, (2,3)]
invader_vel = entity_vel[:, self.invader_uid]
invader_pos = entity_pos[:, self.invader_uid]
landmark_pos = entity_pos[:, self.landmark_uid]
# Assign each invader a random target when and only when step == 0 (the episode has just started)
self.set_random_target(current_step)
n_thread = self.n_thread
n_agent = self.n_agent
attack_target = np.array(self.attack_target)
self.get_action(action, attack_target, invader_pos, invader_vel, landmark_pos, n_agent, n_thread)
assert not (action == -1).any()
actions_list = []
for i in range(self.n_agent):
actions_list.append(action[:, i])
return np.array(actions_list), None
@staticmethod
@jit(nopython=True)
def get_action(action, attack_target, invader_pos, invader_vel, landmark_pos, n_agent, n_thread):
def Norm(x):
return np.linalg.norm(x)
for thread in range(n_thread):
for agent in range(n_agent):
speed_vec = invader_vel[thread, agent]
posit_vec = landmark_pos[thread, attack_target[thread][agent]] - invader_pos[thread, agent]
posit_norm = Norm(posit_vec)
if posit_norm != 0:
posit_vec = posit_vec / posit_norm
speed_norm = Norm(speed_vec)
if speed_norm != 0:
speed_vec = speed_vec / speed_norm
up = np.sum(posit_vec * np.array([0, 1]))
dn = np.sum(posit_vec * np.array([0, -1]))
ri = np.sum(posit_vec * np.array([1, 0]))
le = np.sum(posit_vec * np.array([-1, 0]))
up_v = np.sum(speed_vec * np.array([0, 1]))
dn_v = np.sum(speed_vec * np.array([0, -1]))
ri_v = np.sum(speed_vec * np.array([1, 0]))
le_v = np.sum(speed_vec * np.array([-1, 0]))
dot_product = np.array([up, dn, ri, le])
dot_product_v = np.array([up_v, dn_v, ri_v, le_v])
# situation 1
bool_ = (dot_product > dot_product_v) & (dot_product > 0)
direct = bool_.astype(np.int64)
if np.sum(direct) != 1: # vectors coincide or speed is zero; stop comparing against the velocity direction
direct = np.argmax(dot_product)
else:
# assert sum(direct) == 1 # sanity check
direct = np.argmax(direct)
# stay_no_acc?[0], left[1], right[2], DOWN[3], Up[4]
if direct == 0: # Up
action[thread, agent, 0] = 2
elif direct == 1: # DOWN
action[thread, agent, 0] = 4
elif direct == 2: # right
action[thread, agent, 0] = 1
elif direct == 3: # left
action[thread, agent, 0] = 3
def set_random_target(self, step_env_cnt_cnt):
for thread, step_env_cnt in enumerate(step_env_cnt_cnt):
if step_env_cnt == 0:
invader_attack_target = [None] * self.n_agent
for i in range(self.n_agent):
assigned_target = np.random.randint(low=0, high=self.num_landmarks)
invader_attack_target[i] = assigned_target
self.attack_target[thread] = np.array(invader_attack_target)
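# Plain-numpy sketch of the axis pick inside get_action above (toy values):
# prefer an axis where the desired direction beats the current velocity
# projection; otherwise fall back to the largest dot product.
def _direction_pick_demo():
    dot_product = np.array([0.9, -0.9, 0.1, -0.1])    # toward-target projections (up, down, right, left)
    dot_product_v = np.array([0.2, -0.2, 0.3, -0.3])  # current-velocity projections
    bool_ = (dot_product > dot_product_v) & (dot_product > 0)
    if np.sum(bool_) != 1:
        return int(np.argmax(dot_product))  # directions coincide or speed is zero
    return int(np.argmax(bool_))  # -> 0 ("up") for these numbers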
class DummyAlgorithmFoundation():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
from config import GlobalConfig # local import, as in the sibling classes above; GlobalConfig is referenced below
super().__init__()
self.n_agent = n_agent
self.n_thread = n_thread
self.mcv = mcv
self.act_space = space['act_space']
self.obs_space = space['obs_space']
self.n_cargo = GlobalConfig.ScenarioConfig.n_cargo
self.worker_uid = GlobalConfig.ScenarioConfig.uid_dictionary['agent_uid']
self.cargo_uid = GlobalConfig.ScenarioConfig.uid_dictionary['entity_uid']
self.dec_pos = GlobalConfig.ScenarioConfig.obs_vec_dictionary['pos']
self.dec_vel = GlobalConfig.ScenarioConfig.obs_vec_dictionary['vel']
self.dec_other = GlobalConfig.ScenarioConfig.obs_vec_dictionary['mass']
self.vec_len = GlobalConfig.ScenarioConfig.obs_vec_length
def interact_with_env(self, team_intel):
info, done = self.get_previous(team_intel)
current_step = info[:,0,-1]
object_info = my_view(info[:,0,:-1],[0,-1,self.vec_len])
worker_emb = object_info[:, self.worker_uid]
cargo_emb = object_info[:, self.cargo_uid]
worker_pos = worker_emb[:,:,self.dec_pos]
worker_vel = worker_emb[:,:,self.dec_vel]
worker_drag = worker_emb[:,:,self.dec_other]
cargo_dropoff_pos = cargo_emb[:,:,self.dec_pos]
cargo_dropoff_weight = cargo_emb[:,:,self.dec_other]
cargo_pos = cargo_dropoff_pos[:, :self.n_cargo]
dropoff_pos = cargo_dropoff_pos[:, self.n_cargo:]
cargo_weight = (cargo_dropoff_weight[:, :self.n_cargo]+1)*(self.n_agent/self.n_cargo)
worker_target_sel = np.zeros(shape=(self.n_thread,self.n_agent, 1))
for t in range(self.n_thread):
p = 0
for c, cw in enumerate(cargo_weight[t]):
if cw > self.n_agent: continue
for j in range(int(p), int(p+cw)):
worker_target_sel[t,j] = c if worker_drag[t,j] < 0 else (c+self.n_cargo)
p = p+cw
target_pos = np.take_along_axis(cargo_dropoff_pos,worker_target_sel.astype(np.int64),1) # np.long was removed from NumPy; use an explicit dtype
actions_list = []
act = np.random.randint(low=0,high=5,size=(self.n_thread, self.n_agent, 1))
act = self.dir_to_action(vec=target_pos-worker_pos, vel=worker_vel)
for i in range(self.n_agent):
actions_list.append(act[:, i])
return actions_list, None
def get_previous(self, team_intel):
info = copy_clone(team_intel['Latest-Obs'])
done = copy_clone(team_intel['Env-Suffered-Reset'])
return info, done
@staticmethod
def dir_to_action(vec, vel):
def np_mat3d_normalize_each_line(mat):
return mat / np.expand_dims(np.linalg.norm(mat, axis=2) + 1e-16, axis=-1)
vec = np_mat3d_normalize_each_line(vec)
e_u = np.array([0,1])
e_d = np.array([0,-1])
e_r = np.array([1,0])
e_l = np.array([-1,0])
vel_u = np_mat3d_normalize_each_line(vel + e_u * 0.1)
vel_d = np_mat3d_normalize_each_line(vel + e_d * 0.1)
vel_r = np_mat3d_normalize_each_line(vel + e_r * 0.1)
vel_l = np_mat3d_normalize_each_line(vel + e_l * 0.1)
proj_u = (vel_u * vec).sum(-1)
proj_d = (vel_d * vec).sum(-1)
proj_r = (vel_r * vec).sum(-1)
proj_l = (vel_l * vec).sum(-1)
_u = ((vec * e_u).sum(-1)>0).astype(np.int64) # np.int was removed from NumPy; use an explicit dtype
_d = ((vec * e_d).sum(-1)>0).astype(np.int64)
_r = ((vec * e_r).sum(-1)>0).astype(np.int64)
_l = ((vec * e_l).sum(-1)>0).astype(np.int64)
proj_u = proj_u + _u*2
proj_d = proj_d + _d*2
proj_r = proj_r + _r*2
proj_l = proj_l + _l*2
dot_stack = np.stack([proj_u, proj_d, proj_r, proj_l])
direct = np.argmax(dot_stack, 0)
action = np.where(direct == 0, 2, 0)
action += np.where(direct == 1, 4, 0)
action += np.where(direct == 2, 1, 0)
action += np.where(direct == 3, 3, 0)
return np.expand_dims(action, axis=-1)
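# Minimal usage sketch of dir_to_action above (hypothetical shapes): steer
# one agent in one thread toward +y from a standstill; code 2 means "up".
def _dir_to_action_demo():
    vec = np.array([[[0.0, 1.0]]])  # (thread=1, agent=1, 2), desired direction
    vel = np.zeros((1, 1, 2))
    return DummyAlgorithmFoundation.dir_to_action(vec=vec, vel=vel)  # -> array([[[2]]])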
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/module_evaluation.py
================================================
import copy
import random
import numpy as np
import datetime
import time
import math
# situation-evaluation module
# interface input: the global state, including all attacker UGV states, all defender UGV states, and the map state
class Evaluation_module():
def __init__(self, critical_points=[[-3000, 700, 0], [700, -3300, 0]]):
self.R0 = 750 # distance-score scaling factor
self.V0 = 0.1 # velocity-score scaling factor
self.phi0 = 1 # pitch-angle score coefficient
self.psi0 = 0 # yaw-angle score coefficient
self.ammo0 = 5 # ammo-score scaling factor (for the increasing function)
self.heal0 = 5 # health-score scaling factor (for the increasing function)
self.AMMO0 = 5 # ammo-score scaling factor (for the decreasing function)
self.HEAL0 = 5 # health-score scaling factor (for the decreasing function)
# known environment information
self.critical_points = critical_points # capture-point positions
# Increasing function used for the relative-velocity score; output range (0,1)
def SigmoidTen(self, x, c):
y = np.exp(-x/c)
return 1/(1+10*y)
# Increasing function used for the dwell-time score; output range [0,0.9)
def SigmoidNine(self, x, c):
y = np.exp(-x/c)
return 1/(1+9*y) - 0.1
# Score of a UGV relative to a coordinate point, used for best-point planning; inputs are the point's coordinates and the UGV's state
# Best-point planning sums the scores of the several enemy UGVs near the strike target to locate the best point
def UAV2Point(self, p_position, position, velocity, phi, psi, ammo, health):
# relative-distance threat
p_position = np.array(p_position)
position = np.array(position)
velocity = np.array(velocity)
r = p_position - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist/self.R0)
# relative-velocity threat
V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
Sv = self.SigmoidTen(V, self.V0)
# pitch-angle threat
Sphi = np.exp(-np.abs(phi - self.phi0))
# yaw-angle threat
Spsi = np.exp(-np.abs(psi - self.psi0))
# ammo threat (increasing function); zero ammo means zero threat
# Sammo = self.SigmoidNine(ammo, self.ammo0)
Sammo = 1
# robustness (health) threat (increasing function); zero health means zero threat
Sheal = self.SigmoidNine(health, self.heal0)
# total score (the coefficients need not sum to 1; tune them directly here)
# ammo and health threats enter multiplicatively; the algorithm searches [within strike range] for the point with the smallest total score
S_sum = (0.6 * Sr + 0.2 * Sv + 0.2 * Sphi + 0.0 * Spsi) * Sammo * Sheal
return S_sum
# Score (threat) of one UGV relative to another UGV (agent), used to choose strike targets (prefer high threat but low advantage)
# a_position etc. are the parameters of the acting UGV, i.e. the subject of the evaluation
def UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity, phi, psi, ammo, health):
a_position = np.array(a_position)
position = np.array(position)
velocity = np.array(velocity)
# capability ratio, e.g. 1.5 when our UGV faces an enemy UGV, 0.67 when an enemy UGV faces ours
# own ammo and health scores (increasing functions)
# Mammo = self.SigmoidNine(a_ammo, self.ammo0)
Mammo = 1
Mhealth = self.SigmoidNine(a_health, self.heal0)
# opponent ammo and health advantages (decreasing functions)
# Sammo = np.exp(-ammo / self.AMMO0)
Sammo = 1
Shealth = np.exp(-health / self.HEAL0)
# attacker advantage (strike the target with the largest advantage; on ties, strike the nearest)
if identity == "offensive":
# relative-distance threat
r = a_position - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
S_offensive = 10 * Mammo*Mhealth * Sammo*Shealth * Sr # scaled by 10 so S does not become vanishingly small
return S_offensive
# defender advantage (prefer striking the UGV closest to a capture point)
if identity == "defensive":
Sr_temp = 0
for critical_point in self.critical_points:
r = critical_point - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
if Sr >= Sr_temp:
Sr_temp = Sr
S_defensive = 10 * Mammo * Mhealth * Sammo * Shealth * Sr_temp
return S_defensive
# Score of a drone relative to a capture point; the defender uses it to decide whether to expel and who expels whom
def Drone2Point(self, p_position,p_ts, position, velocity):
# relative-distance threat
p_position = np.array(p_position)
position = np.array(position)
velocity = np.array(velocity)
r = p_position - position
dist = np.sqrt(np.sum(np.square(r)))
Spr = np.exp(-dist / 1) # scaling factor tied to the drone's parameters
# relative-velocity threat
V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
Spv = self.SigmoidTen(V, 0.2) # scaling factor tied to the drone's parameters
# dwell-time threat
Spt = self.SigmoidNine(p_ts, 0.5) # a drone loitering for 3 s or more is unacceptable
# combined score
Sp = Spt + 0.6 * Spr + 0.2 * Spv # suggested expel threshold: Sp >= 0.5
print(Sp)
return Sp
def UAV2Point_id(self, attacker_dict, key_point):
# relative-distance threat
# attacker UGV info
ally_agent_pos = [attacker_dict['X'], attacker_dict['Y'], attacker_dict['Z']]
ally_agent_blood = attacker_dict['blood']
ally_agent_velocityx = attacker_dict['vx']
ally_agent_velocityy = attacker_dict['vy']
ally_agent_ammo = attacker_dict['ammo']
ally_agent_velocity = [ally_agent_velocityx, ally_agent_velocityy, 0]
p_position = np.array(key_point)
position = np.array(ally_agent_pos)
velocity = np.array(ally_agent_velocity)
r = p_position - position
phi = math.degrees(math.atan2((ally_agent_pos[0] - key_point[0]), (ally_agent_pos[1] - key_point[1])))
ammo = ally_agent_ammo
health = ally_agent_blood
# relative-velocity threat
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist/self.R0)
# V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
#Sv = self.SigmoidTen(V, self.V0)
# yaw-angle threat
# Sphi = np.exp(-np.abs(phi - self.phi0))
# pitch-angle threat
# Spsi = np.exp(-np.abs(psi - self.psi0))
# ammo threat (increasing function); zero ammo means zero threat
# Sammo = self.SigmoidNine(ammo, self.ammo0)
Sammo = 1
# robustness (health) threat (increasing function); zero health means zero threat
Sheal = self.SigmoidNine(health, self.heal0)
# total score (the coefficients need not sum to 1; tune them directly here)
# ammo and health threats enter multiplicatively; the algorithm searches [within strike range] for the point with the smallest total score
# S_sum = (0.6 * Sr + 0.2 * Sv + 0.2 * Sphi + 0.0 * Spsi) * Sammo * Sheal
# S_sum = (0.6 * Sr + 0.2 * Sv + 0.2 * Sphi) * Sammo * Sheal
S_sum = Sr * Sammo * Sheal
return S_sum
def UAV2UAV_id(self, identity, attacker_dict, defender_dict):
# attacker UGV info
ally_agent_pos = [attacker_dict['X'], attacker_dict['Y'], attacker_dict['Z']]
ally_agent_blood = attacker_dict['blood']
ally_agent_ammo = attacker_dict['ammo']
enemy_agent_pos = [defender_dict['X'], defender_dict['Y'], defender_dict['Z']]
enemy_agent_blood = defender_dict['blood']
enemy_agent_ammo = defender_dict['ammo']
a_position = np.array(ally_agent_pos)
position = np.array(enemy_agent_pos)
a_ammo = ally_agent_ammo
a_health = ally_agent_blood
ammo = enemy_agent_ammo
health = enemy_agent_blood
# attacker advantage (strike the target with the largest advantage; on ties, strike the nearest)
if identity == "offensive":
# relative-distance threat
# Mammo = self.SigmoidNine(a_ammo, self.ammo0)
Mammo = 1
del_health = 100 - health
Mhealth = health / 100
# Mhealth = self.SigmoidNine(a_health, self.heal0)
# opponent ammo and health advantages (decreasing functions)
# Sammo = np.exp(ammo / self.AMMO0)
Sammo = 1
# Shealth = np.exp(health / self.HEAL0)
del_a_health = 100 - a_health
Shealth = a_health / 100
r = a_position - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
S_offensive = 1 * Mammo * Mhealth * Sammo * Shealth * Sr # scale factor so S does not become vanishingly small
return S_offensive
# defender advantage (prefer striking the UGV closest to a capture point)
if identity == "defensive":
Sr_temp = 0
# Mammo = self.SigmoidNine(ammo, self.ammo0)
Mammo = 1
# Mhealth = self.SigmoidNine(health, self.heal0)
del_health = 100 - health
Mhealth = health / 100
# opponent ammo and health advantages (decreasing functions)
# Sammo = np.exp(-a_ammo / self.AMMO0)
Sammo = 1
del_a_health = 100 - a_health
Shealth = a_health / 100
for critical_point in self.critical_points:
r = critical_point - a_position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
if Sr >= Sr_temp:
Sr_temp = Sr
S_defensive = Mammo * Mhealth * Sammo * Shealth * Sr_temp
return S_defensive
def Drone2Point_id(self, drone_data, key_point):
drone_pos = [drone_data['X'], drone_data['Y'], drone_data['Z']]
drone_blood = drone_data['blood']
drone_velocityx = drone_data['vx']
drone_velocityy = drone_data['vy']
drone_velocity = [drone_velocityx, drone_velocityy, 0]
# relative-distance threat
p_position = np.array(key_point)
position = np.array(drone_pos)
velocity = np.array(drone_velocity)
r = p_position - position
dist = np.sqrt(np.sum(np.square(r)))
Spr = np.exp(-dist / 1000) # scaling factor tied to the drone's parameters
# relative-velocity threat
# V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
# Spv = self.SigmoidTen(V, 0.2) # scaling factor tied to the drone's parameters
# dwell-time threat
# Spt = self.SigmoidNine(p_ts, 0.5) # a drone loitering for 3 s or more is unacceptable
# combined score
# Sp = Spt + 0.6 * Spr + 0.2 * Spv # suggested expel threshold: Sp >= 0.5
Sp = Spr
return Sp
# Score matrix of defender UGVs relative to attacker UGVs
# horizontal axis: number of attacker UGVs; vertical axis: number of defender UGVs
def defend_to_attack(self, self_data, ally_agents_data, enemy_agents_data, key_points):
# UGVs
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
for agent_id, dict_value in all_friend_agents_data.items():
if 'blood' not in dict_value:
temp = agent_id
all_friend_agents_data.pop(temp) # drop the drone entry; only ground UGV platforms are considered
# attacker UGV info
all_friend_agent_pos = []
all_friend_agent_blood = []
all_friend_agent_velocityx = []
all_friend_agent_velocityy = []
all_friend_agent_ammo = []
all_friend_agent_ID = []
all_friend_amount = 0
for agent_id, dict_value in all_friend_agents_data.items():
all_friend_agent_ID.append(agent_id) # ID interface, format based on Qiu's code; correctness unverified
# all_friend_agent_ammo.append(dict_value['ammo']) # ammo interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityx.append(dict_value['velocityx']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_blood.append(dict_value['blood']) # health interface, format based on Qiu's code; correctness unverified
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
all_friend_amount += 1
# defender UGV info
all_enemy_agent_pos = []
all_enemy_agent_blood = []
all_enemy_agent_velocityx = []
all_enemy_agent_velocityy = []
all_enemy_agent_ammo = []
all_enemy_agent_ID = []
all_enemy_amount = 0
for agent_id, dict_value in enemy_agents_data.items():
all_enemy_agent_ID.append(agent_id)
# all_enemy_agent_ammo.append(dict_value['ammo'])
all_enemy_agent_velocityx.append(dict_value['velocityx'])
all_enemy_agent_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
all_enemy_agent_blood.append(dict_value['blood'])
all_enemy_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
all_enemy_amount += 1
evaluation = np.zeros((all_friend_amount, all_enemy_amount))
for i in range(all_friend_amount):
for j in range(all_enemy_amount):
yaw = math.degrees(math.atan2((all_enemy_agent_pos[j][0] - all_friend_agent_pos[i][0]), (all_enemy_agent_pos[j][1] - all_friend_agent_pos[i][1])))
# UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity,phi,psi,ammo,health)
all_enemy_agent_velocity = [all_enemy_agent_velocityx[j], all_enemy_agent_velocityy[j], 0]
evaluation[i][j] = self.UAV2UAV("offensive", all_friend_agent_pos[i], 0, all_friend_agent_blood[i],
all_enemy_agent_pos[j], all_enemy_agent_velocity, yaw, 0, 0, all_enemy_agent_blood[j])
return evaluation
# Score matrix of attacker UGVs relative to defender UGVs
# horizontal axis: number of defender UGVs; vertical axis: number of attacker UGVs
def attack_to_defend(self, self_data, ally_agents_data, enemy_agents_data, key_points):
# UGVs
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
for agent_id, dict_value in all_friend_agents_data.items():
if 'blood' not in dict_value:
temp = agent_id
all_friend_agents_data.pop(temp) # drop the drone entry; only ground UGV platforms are considered
# attacker UGV info
all_friend_agent_pos = []
all_friend_agent_blood = []
all_friend_agent_velocityx = []
all_friend_agent_velocityy = []
all_friend_agent_ammo = []
all_friend_agent_ID = []
all_friend_amount = 0
for agent_id, dict_value in all_friend_agents_data.items():
all_friend_agent_ID.append(agent_id) # ID interface, format based on Qiu's code; correctness unverified
# all_friend_agent_ammo.append(dict_value['ammo']) # ammo interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityx.append(dict_value['velocityx']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_blood.append(dict_value['blood']) # health interface, format based on Qiu's code; correctness unverified
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
all_friend_amount += 1
# defender UGV info
all_enemy_agent_pos = []
all_enemy_agent_blood = []
all_enemy_agent_velocityx = []
all_enemy_agent_velocityy = []
all_enemy_agent_ammo = []
all_enemy_agent_ID = []
all_enemy_amount = 0
for agent_id, dict_value in enemy_agents_data.items():
all_enemy_agent_ID.append(agent_id)
#all_enemy_agent_ammo.append(dict_value['ammo'])
all_enemy_agent_velocityx.append(dict_value['velocityx'])
all_enemy_agent_velocityy.append(dict_value['velocityy'])
all_enemy_agent_blood.append(dict_value['blood'])
all_enemy_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
all_enemy_amount += 1
evaluation = np.zeros((all_enemy_amount, all_friend_amount))
for i in range(all_enemy_amount):
for j in range(all_friend_amount):
yaw = math.degrees(math.atan2((all_friend_agent_pos[j][0] - all_enemy_agent_pos[i][0]), (all_friend_agent_pos[j][1] - all_enemy_agent_pos[i][1]))) # yaw from enemy i toward friend j
# UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity,phi,psi,ammo,health)
all_friend_agent_velocity = [all_friend_agent_velocityx[j], all_friend_agent_velocityy[j], 0]
evaluation[i][j] = self.UAV2UAV("defensive", all_enemy_agent_pos[i], 0, all_enemy_agent_blood[i],
all_friend_agent_pos[j], all_friend_agent_velocity, yaw, 0, 0, all_friend_agent_blood[j])
return evaluation
# Score matrix of drones relative to capture points
# horizontal axis: capture points; vertical axis: drones
def uav_to_defend(self, self_data, ally_agents_data, enemy_agents_data, key_points):
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
for agent_id, dict_value in all_friend_agents_data.items():
if 'blood' not in dict_value:
temp1 = agent_id
temp2 = dict_value
drone_data = {}
drone_data[temp1] = temp2
# drone info
drone_pos = []
drone_velocityx = []
drone_velocityy = []
drone_ID = []
drone_amount = 0
for agent_id, dict_value in drone_data.items():
drone_ID.append(agent_id) # ID interface, format based on Qiu's code; correctness unverified
drone_velocityx.append(dict_value['velocityx']) # velocity interface, format based on Qiu's code; correctness unverified
drone_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
drone_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
drone_amount += 1
# capture-point positions
key_point_amount = 0
key_point_pos = []
for key_point in key_points:
key_point_pos.append(key_point)
key_point_amount += 1
evaluation = np.zeros((key_point_amount, drone_amount))
for i in range(key_point_amount):
for j in range(drone_amount):
# Drone2Point(self, p_position,p_ts, position, velocity)
# print(self.Drone2Point(key_point_pos[i], 0, drone_pos[j], drone_velocity[j]))
drone_velocity = [drone_velocityx[j], drone_velocityy[j], 0]
evaluation[i][j] = self.Drone2Point(key_point_pos[i], 0, drone_pos[j], drone_velocity)
return evaluation
'''
# Score matrix of UGVs relative to nearby candidate points
#
def attack_to_point(self_data, ally_agents_data, enemy_agents_data, key_points):
# UGVs
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
all_friend_agents_data.pop("231") # drop the drone entry; only ground UGV platforms are considered
# attacker UGV info
all_friend_agent_pos = []
all_friend_agent_blood = []
all_friend_agent_velocity = []
all_friend_agent_ammo = []
all_friend_agent_ID = []
all_friend_amount = 0
for agent_id, dict_value in all_friend_agents_data.items():
all_friend_agent_ID.append(dict_value['ID']) # ID interface, format based on Qiu's code; correctness unverified
all_friend_agent_ammo.append(dict_value['ammo']) # ammo interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocity.append(dict_value['velocity']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_blood.append(dict_value['blood']) # health interface, format based on Qiu's code; correctness unverified
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
all_friend_amount += 1
evaluation = np.zeros((all_friend_amount, all_enemy_amount))
for i in range(all_enemy_amount):
for j in range(all_friend_amount):
yaw = math.degrees(math.atan2((all_enemy_agent_pos[j][0] - all_friend_agent_pos[i][0]), (all_enemy_agent_pos[j][1] - all_friend_agent_pos[i][1])))
#UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity,phi,psi,ammo,health)
evaluation[i][j] = self.UAV2UAV("defensive", all_enemy_agent_pos[i], all_enemy_agent_ammo[i], all_enemy_agent_blood[i],
all_friend_agent_pos[j], all_friend_agent_velocity[j], yaw, 0, all_friend_agent_ammo[j], all_friend_agent_blood[j])
return evaluation
'''
# main situation-evaluation function
def evaluate(self, self_data, ally_agents_data, enemy_agents_data, key_points):
d2a = self.defend_to_attack(self_data, ally_agents_data, enemy_agents_data, key_points)
a2d = self.attack_to_defend(self_data, ally_agents_data, enemy_agents_data, key_points)
u2d = self.uav_to_defend(self_data, ally_agents_data, enemy_agents_data, key_points)
return d2a, a2d, u2d
def test():
evaluator = Evaluation_module()
# test
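# Minimal smoke test below: hypothetical agent states whose IDs and field
# names follow the dict layout consumed by evaluate(); all numbers are made up,
# and the drone entry deliberately omits 'blood' so the UGV filters drop it.
self_data = {'311': {'X': 0.0, 'Y': 0.0, 'Z': 0.0, 'velocityx': 1.0, 'velocityy': 0.0, 'blood': 100}}
ally_agents_data = {
'321': {'X': 500.0, 'Y': 0.0, 'Z': 0.0, 'velocityx': 0.0, 'velocityy': 1.0, 'blood': 80},
'231': {'X': -2000.0, 'Y': 500.0, 'Z': 300.0, 'velocityx': 0.0, 'velocityy': 0.0}, # drone: no 'blood' key
}
enemy_agents_data = {
'211': {'X': -2500.0, 'Y': 600.0, 'Z': 0.0, 'velocityx': 0.0, 'velocityy': 0.0, 'blood': 90},
'221': {'X': 600.0, 'Y': -3000.0, 'Z': 0.0, 'velocityx': 0.0, 'velocityy': 0.0, 'blood': 70},
}
key_points = [[-3000, 700, 0], [700, -3300, 0]]
d2a, a2d, u2d = evaluator.evaluate(self_data, ally_agents_data, enemy_agents_data, key_points)
print(d2a.shape, a2d.shape, u2d.shape) # expected: (2, 2), (2, 2), (2, 1)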
if __name__ == '__main__':
test()
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/red_strategy.py
================================================
import numpy as np
import math
from scipy.optimize import linear_sum_assignment
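# Illustrative sketch (not called anywhere): the min-cost pairing below uses
# scipy's linear_sum_assignment to match each red car to its nearest blue car.
def _assignment_demo():
    # cost[i][j] = distance from red car i to blue car j (toy numbers)
    cost = np.array([[10.0, 40.0],
                     [30.0, 20.0]])
    _, col_index = linear_sum_assignment(cost)
    # minimal total cost is 10 + 20, so col_index == array([0, 1]):
    # red car 0 chases blue car 0, red car 1 chases blue car 1
    return col_index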
def defense_combat(self_data, ally_agents_data, enemy_agents_data, key_points, blue_alive, red_alive):
# max speed of the defending red cars
red_car_max_vel = 600
# max speed of the attacking blue cars
blue_car_max_vel = 600
# max speed of the attacking blue drone
blue_drone_max_vel = 600
# time the attacking drone must hold a capture point to win
time_to_win = 2.0
# effective range of the expel payload
expel_range = 1200
# UGV strike range
fire_dist = 2000
all_enemy_agent_pos = []
for agent_id, dict_value in enemy_agents_data.items():
all_enemy_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
"""
all_enemy_agent_yaw = []
for agent_id, dict_value in enemy_agents_data.items():
if agent_id != '231':
all_enemy_agent_yaw.append([dict_value['Yaw']])
"""
if '311' in self_data.keys():
friend_agents_data = dict(self_data, **ally_agents_data)
if '311' in ally_agents_data.keys():
friend_agents_data = dict(ally_agents_data, **self_data)
all_friend_agent_pos = []
for agent_id, dict_value in friend_agents_data.items():
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
"""
all_friend_agent_yaw = []
for agent_id, dict_value in friend_agents_data.items():
all_friend_agent_yaw.append([dict_value['Yaw']])
"""
target_location = np.array(key_points)
red_car_current_pos = np.array(all_friend_agent_pos)
blue_car_current_pos = np.array(all_enemy_agent_pos[0:2])
blue_drone_current_pos = np.array([all_enemy_agent_pos[-1]])
#red_car_current_yaw = np.array(all_friend_agent_yaw)
#blue_car_current_yaw = np.array(all_enemy_agent_yaw)
"""
blue_alive = []
for agent_id, dict_value in enemy_agents_data.items():
if agent_id is not '231':
if dict_value['Blood'] == 0:
blue_alive.append(False)
else:
blue_alive.append(True)
red_alive = []
for agent_id, dict_value in friend_agents_data.items():
if dict_value['Blood'] == 0:
red_alive.append(False)
else:
red_alive.append(True)
"""
blue_drone_dist_to_go = np.array([[np.linalg.norm(blue_drone_current_pos[a][:2] - target_location[b][:2]) for b in range(2)]
for a in range(1)])
blue_drone_time_to_go = blue_drone_dist_to_go / blue_drone_max_vel
if blue_drone_time_to_go[0][0] < 0.2:
blue_drone_time_to_go[0][0] = -1000
if blue_drone_time_to_go[0][1] < 0.2:
blue_drone_time_to_go[0][1] = -1000
red_car_dist_to_go = np.array(
[[np.linalg.norm(red_car_current_pos[a][:2] - target_location[b][:2]) - expel_range for b in range(2)]
for a in range(2)])
red_car_time_to_go = red_car_dist_to_go / red_car_max_vel
if red_alive[0] is False:
red_car_time_to_go[0] = 100000
elif red_alive[1] is False:
red_car_time_to_go[1] = 100000
red_car_next_pos = red_car_current_pos.copy() # copy so the in-place edits below do not also mutate red_car_current_pos
target_pos = red_car_current_pos.copy() # default so target_pos is defined on every path through the branches below
if 0: #np.sum(red_alive) == 0:
target_id = np.array([1, 1])
#red_car_next_yaw = red_car_current_yaw
#red_car_next_fire_yaw = np.array([[0], [0]])
#red_car_fire_flag = [False, False]
else:
if np.sum(red_alive) == 2 and np.sum(blue_alive) == 2:
red_blue_car_relative_dist = np.array(
[[np.linalg.norm(red_car_current_pos[a][:2] - blue_car_current_pos[b][:2]) for b in
range(2)] for a in range(2)])
_, col_index = linear_sum_assignment(red_blue_car_relative_dist)
target_pos = blue_car_current_pos[col_index]
target_id = col_index
elif np.sum(red_alive) == 1 and np.sum(blue_alive) == 2:
if red_alive[0] is True:
red_blue_car_relative_dist = np.array(
[[np.linalg.norm(red_car_current_pos[0][:2] - blue_car_current_pos[b][:2]) for b in range(2)]])
target_pos = np.array(
[blue_car_current_pos[np.argmin(red_blue_car_relative_dist)], red_car_current_pos[1]])
elif red_alive[1] is True:
red_blue_car_relative_dist = np.array(
[[np.linalg.norm(red_car_current_pos[1][:2] - blue_car_current_pos[b][:2]) for b in range(2)]])
target_pos = np.array(
[red_car_current_pos[0], blue_car_current_pos[np.argmin(red_blue_car_relative_dist)]])
target_id = np.array([np.argmin(red_blue_car_relative_dist), np.argmin(red_blue_car_relative_dist)])
elif np.sum(red_alive) == 2 and np.sum(blue_alive) == 1:
if blue_alive[0] is True:
target_pos = np.array([blue_car_current_pos[0], blue_car_current_pos[0]])
target_id = np.array([0, 0])
elif blue_alive[1] is True:
target_pos = np.array([blue_car_current_pos[1], blue_car_current_pos[1]])
target_id = np.array([1, 1])
elif np.sum(red_alive) == 1 and np.sum(blue_alive) == 1:
if red_alive[0] is True:
if blue_alive[0] is True:
target_pos = np.array([blue_car_current_pos[0], red_car_current_pos[1]])
target_id = np.array([0, 0])
elif blue_alive[1] is True:
target_pos = np.array([blue_car_current_pos[1], red_car_current_pos[1]])
target_id = np.array([1, 1])
elif red_alive[1] is True:
if blue_alive[0] is True:
target_pos = np.array([red_car_current_pos[0], blue_car_current_pos[0]])
target_id = np.array([0, 0])
elif blue_alive[1] is True:
target_pos = np.array([red_car_current_pos[0], blue_car_current_pos[1]])
target_id = np.array([1, 1])
else:
red_car_next_pos = red_car_current_pos
target_id = np.array([1, 1])
blue_success_time = blue_drone_time_to_go + time_to_win*0.0
if (blue_success_time - np.min(red_car_time_to_go, axis=0, keepdims=True) >= 0).all() and np.sum(
blue_alive) > 0:
# 'offense'
red_car_next_pos = target_pos
flag = 'offense'
else:
# 'defense'
target_defense_index = np.argmin(
blue_success_time - np.min(red_car_time_to_go, axis=0, keepdims=True))
if red_alive[0] is True and red_alive[1] is True:
#red_car_next_pos = np.array(
# [target_location[target_defense_index], target_location[target_defense_index]])
red_car_next_pos[0][:2] = blue_drone_current_pos[0][0:2]
red_car_next_pos[1][:2] = blue_drone_current_pos[0][0:2]
elif red_alive[0] is True and red_alive[1] is False:
#red_car_next_pos = np.array(
# [target_location[target_defense_index], red_car_current_pos[1]])
red_car_next_pos[0][:2] = blue_drone_current_pos[0][0:2]
elif red_alive[0] is False and red_alive[1] is True:
#red_car_next_pos = np.array(
# [red_car_current_pos[0], target_location[target_defense_index]])
red_car_next_pos[1][:2] = blue_drone_current_pos[0][0:2]
flag = 'defense'
"""
agent_yaw = [0, 0]
fire_yaw = [0, 0]
fire_flag = [False, False]
for index in range(2):
relative_dist = np.linalg.norm(red_car_current_pos[index] - target_pos[index])
relative_yaw = math.degrees(math.atan2((target_pos[index][1] - red_car_current_pos[index][1]),
(target_pos[index][0] - red_car_current_pos[index][0])))
if red_alive[index] is True:
if relative_dist < fire_dist:
red_car_next_pos[index] = red_car_current_pos[index]
if ((relative_yaw < red_car_current_yaw[index] - 90) or (
relative_yaw > red_car_current_yaw[index] + 90)): # rotate the strike payload on top of the car
agent_yaw[index] = relative_yaw # rotate the car heading directly
fire_yaw[index] = 0
else:
agent_yaw[index] = red_car_current_yaw[index]
fire_yaw[index] = relative_yaw - red_car_current_yaw[index]
fire_flag[index] = True
else:
fire_flag[index] = False
agent_yaw[index] = red_car_current_yaw[index]
fire_yaw[index] = 0
else:
fire_flag[index] = False
agent_yaw[index] = red_car_current_yaw[index]
fire_yaw[index] = 0
red_car_next_yaw = np.array(agent_yaw)
red_car_next_fire_yaw = np.array(fire_yaw)
"""
if '311' in self_data.keys():
return red_car_next_pos[0], target_id[0], flag
#return [red_car_next_pos[0], red_car_next_yaw [0], red_car_next_fire_yaw[0], fire_flag[0]]
else:
return red_car_next_pos[1], target_id[1], flag
#return [red_car_next_pos[1], red_car_next_yaw [1], red_car_next_fire_yaw[1], fire_flag[1]]
if __name__ == '__main__':
# 211 and 221 are the attacking blue cars; 231 is the attacking blue drone
# 311 and 321 are the defending red cars
# state of the defending red car currently making a decision
self_data = {'321': {'X': -3.0, 'Y': 2.5, 'Z': 0, 'Yaw': 10, 'Blood': 100}}
# state of the other defending red cars
ally_agents_data = {'311': {'X': 0.0, 'Y': 1.5, 'Z': 0, 'Yaw': 20, 'Blood': 100}}
# states of the attacking blue cars and drone
enemy_agents_data = {'211': {'X': 1.5, 'Y': -2.0, 'Z': 0, 'Yaw': 30, 'Blood': 100},
'221': {'X': -2.5, 'Y': -2.5, 'Z': 0, 'Yaw': 40, 'Blood': 100},
'231': {'X': 0.7, 'Y': 3.3, 'Z': 1.5, 'Yaw': 0}}
# capture-point info
key_points = [[0.7, 3.3, 0], [-3.0, -0.7, 0]]
# alive flags
blue_alive = [True, True]
red_alive = [True, True]
target_position, target_id, flag = defense_combat(self_data, ally_agents_data, enemy_agents_data, key_points, blue_alive, red_alive)
print(target_position)
print('\r\n')
print(target_id)
print('\r\n')
print(flag)
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/stance.py
================================================
import copy
import random
import numpy as np
import datetime
import time
import math
# situation-evaluation module
# interface input: the global state, including all attacker UGV states, all defender UGV states, and the map state
class Evaluation_module():
def __init__(self, critical_points=[[-3, -0.7, 0], [0.7, 3.3, 0]]):
self.R0 = 1.5 * 100 # distance-score scaling factor
self.V0 = 0.1 # velocity-score scaling factor
self.phi0 = np.pi / 4 # pitch-angle score coefficient
self.psi0 = 0 # yaw-angle score coefficient
self.ammo0 = 0.15 # ammo-score scaling factor (for the increasing function)
self.heal0 = 0.15 # health-score scaling factor (for the increasing function)
self.AMMO0 = 0.5 # ammo-score scaling factor (for the decreasing function)
self.HEAL0 = 0.5 # health-score scaling factor (for the decreasing function)
# known environment information
self.critical_points = critical_points # capture-point positions
# Increasing function used for the relative-velocity score; output range (0,1)
def SigmoidTen(self, x, c):
y = np.exp(-x/c)
return 1/(1+10*y)
# Increasing function used for the dwell-time score; output range [0,0.9)
def SigmoidNine(self, x, c):
y = np.exp(-x/c)
return 1/(1+9*y) - 0.1
# Score of a UGV relative to a coordinate point, used for best-point planning; inputs are the point's coordinates and the UGV's state
# Best-point planning sums the scores of the several enemy UGVs near the strike target to locate the best point
def UAV2Point(self, p_position, position, velocity, phi, psi, ammo, health):
# relative-distance threat
p_position = np.array(p_position)
position = np.array(position)
velocity = np.array(velocity)
r = p_position - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist/self.R0)
# relative-velocity threat
V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
Sv = self.SigmoidTen(V, self.V0)
# pitch-angle threat
Sphi = np.exp(-np.abs(phi - self.phi0))
# yaw-angle threat
Spsi = np.exp(-np.abs(psi - self.psi0))
# ammo threat (increasing function); zero ammo means zero threat
# Sammo = self.SigmoidNine(ammo, self.ammo0)
Sammo = 1
# robustness (health) threat (increasing function); zero health means zero threat
Sheal = self.SigmoidNine(health, self.heal0)
# total score (the coefficients need not sum to 1; tune them directly here)
# ammo and health threats enter multiplicatively; the algorithm searches [within strike range] for the point with the smallest total score
S_sum = (0.6 * Sr + 0.2 * Sv + 0.2 * Sphi + 0.0 * Spsi) * Sammo * Sheal
return S_sum
# Score (threat) of one UGV relative to another UGV (agent), used to choose strike targets (prefer high threat but low advantage)
# a_position etc. are the parameters of the acting UGV, i.e. the subject of the evaluation
def UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity, phi, psi, ammo, health):
a_position = np.array(a_position)
position = np.array(position)
velocity = np.array(velocity)
# capability ratio, e.g. 1.5 when our UGV faces an enemy UGV, 0.67 when an enemy UGV faces ours
# own ammo and health scores (increasing functions)
# Mammo = self.SigmoidNine(a_ammo, self.ammo0)
Mammo = 1
Mhealth = self.SigmoidNine(a_health, self.heal0)
# opponent ammo and health advantages (decreasing functions)
# Sammo = np.exp(-ammo / self.AMMO0)
Sammo = 1
Shealth = np.exp(-health / self.HEAL0)
# attacker advantage (strike the target with the largest advantage; on ties, strike the nearest)
if identity == "offensive":
# relative-distance threat
r = a_position - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
S_offensive = 10 * Mammo*Mhealth * Sammo*Shealth * Sr # scaled by 10 so S does not become vanishingly small
return S_offensive
# defender advantage (prefer striking the UGV closest to a capture point)
if identity == "defensive":
Sr_temp = 0
for critical_point in self.critical_points:
r = critical_point - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
if Sr >= Sr_temp:
Sr_temp = Sr
S_defensive = 10 * Mammo * Mhealth * Sammo * Shealth * Sr_temp
return S_defensive
# Score of a drone relative to a capture point; the defender uses it to decide whether to expel and who expels whom
def Drone2Point(self, p_position,p_ts, position, velocity):
# relative-distance threat
p_position = np.array(p_position)
position = np.array(position)
velocity = np.array(velocity)
r = p_position - position
dist = np.sqrt(np.sum(np.square(r)))
Spr = np.exp(-dist / 1) # scaling factor tied to the drone's parameters
# relative-velocity threat
V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
Spv = self.SigmoidTen(V, 0.2) # scaling factor tied to the drone's parameters
# dwell-time threat
Spt = self.SigmoidNine(p_ts, 0.5) # a drone loitering for 3 s or more is unacceptable
# combined score
Sp = Spt + 0.6 * Spr + 0.2 * Spv # suggested expel threshold: Sp >= 0.5
print(Sp)
return Sp
def UAV2Point_id(self, attacker_dict, key_point):
# relative-distance threat
# attacker UGV info
ally_agent_pos = [attacker_dict['X'], attacker_dict['Y'], attacker_dict['Z']]
ally_agent_blood = attacker_dict['blood']
ally_agent_velocityx = attacker_dict['vx']
ally_agent_velocityy = attacker_dict['vy']
ally_agent_ammo = attacker_dict['ammo']
ally_agent_velocity = [ally_agent_velocityx, ally_agent_velocityy, 0]
p_position = np.array(key_point)
position = np.array(ally_agent_pos)
velocity = np.array(ally_agent_velocity)
r = p_position - position
phi = math.degrees(math.atan2((ally_agent_pos[0] - key_point[0]), (ally_agent_pos[1] - key_point[1])))
ammo = ally_agent_ammo
health = ally_agent_blood
# relative-velocity threat
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist/self.R0)
V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
Sv = self.SigmoidTen(V, self.V0)
# yaw-angle threat
Sphi = np.exp(-np.abs(phi - self.phi0))
# pitch-angle threat
# Spsi = np.exp(-np.abs(psi - self.psi0))
# ammo threat (increasing function); zero ammo means zero threat
Sammo = self.SigmoidNine(ammo, self.ammo0)
# robustness (health) threat (increasing function); zero health means zero threat
Sheal = self.SigmoidNine(health, self.heal0)
# total score (the coefficients need not sum to 1; tune them directly here)
# ammo and health threats enter multiplicatively; the algorithm searches [within strike range] for the point with the smallest total score
# S_sum = (0.6 * Sr + 0.2 * Sv + 0.2 * Sphi + 0.0 * Spsi) * Sammo * Sheal
S_sum = (0.6 * Sr + 0.2 * Sv + 0.2 * Sphi) * Sammo * Sheal
return S_sum
def UAV2UAV_id(self, identity, attacker_dict, defender_dict):
# attacker UGV info
ally_agent_pos = [attacker_dict['X'], attacker_dict['Y'], attacker_dict['Z']]
ally_agent_blood = attacker_dict['blood']
# ally_agent_ammo = attacker_dict['ammo']
enemy_agent_pos = [defender_dict['X'], defender_dict['Y'], defender_dict['Z']]
enemy_agent_blood = defender_dict['blood']
# enemy_agent_ammo = defender_dict['ammo']
a_position = np.array(ally_agent_pos)
position = np.array(enemy_agent_pos)
# a_ammo = ally_agent_ammo
a_health = ally_agent_blood
# ammo = enemy_agent_ammo
health = enemy_agent_blood
# attacker advantage (strike the target with the largest advantage; on ties, strike the nearest)
if identity == "offensive":
# relative-distance threat
# Mammo = self.SigmoidNine(a_ammo, self.ammo0)
# Mhealth = np.exp(a_health / self.heal0)
Mhealth = np.exp(a_health/100)
# opponent ammo and health advantages (decreasing functions)
# Sammo = np.exp(-ammo / self.AMMO0)
# Shealth = np.exp(-health / self.HEAL0)
Shealth = np.exp(-health/100)
r = a_position - position
dist = np.sqrt(np.sum(np.square(r)))
# Sr = np.exp(-dist / self.R0)
Sr = np.exp(-dist / 1000)
# S_offensive = 10 * Mammo*Mhealth * Sammo*Shealth * Sr # scaled so S does not become vanishingly small
S_offensive = 0.1 * Mhealth * Shealth * Sr # scale factor so S does not become vanishingly small
return S_offensive
# defender advantage (prefer striking the UGV closest to a capture point)
if identity == "defensive":
Sr_temp = 0
a_ammo = attacker_dict.get('ammo', 0) # the ammo reads above are commented out; default to 0 so these names are defined
ammo = defender_dict.get('ammo', 0)
Mammo = self.SigmoidNine(ammo, self.ammo0)
Mhealth = self.SigmoidNine(health, self.heal0)
# opponent ammo and health advantages (decreasing functions)
Sammo = np.exp(-a_ammo / self.AMMO0)
Shealth = np.exp(-a_health / self.HEAL0)
for critical_point in self.critical_points:
r = critical_point - position
dist = np.sqrt(np.sum(np.square(r)))
Sr = np.exp(-dist / self.R0)
if Sr >= Sr_temp:
Sr_temp = Sr
S_defensive = 10 * Mammo * Mhealth * Sammo * Shealth * Sr_temp
return S_defensive
def Drone2Point_id(self, drone_data, key_point):
drone_pos = [drone_data['X'], drone_data['Y']]
# drone_blood = drone_data['blood']
# drone_velocityx = drone_data['vx']
# drone_velocityy = drone_data['vy']
# drone_velocity = [drone_velocityx, drone_velocityy, 0]
# relative-distance threat
p_position = np.array(key_point)
position = np.array(drone_pos)
# velocity = np.array(drone_velocity)
r = p_position - position
dist = np.sqrt(np.sum(np.square(r)))
Spr = np.exp(-dist / 100) # scaling factor tied to the drone's parameters
# relative-velocity threat
# V = np.dot(r, velocity) / dist # project the velocity onto the line joining the two positions
# Spv = self.SigmoidTen(V, 0.2) # scaling factor tied to the drone's parameters
# dwell-time threat
# Spt = self.SigmoidNine(p_ts, 0.5) # a drone loitering for 3 s or more is unacceptable
# combined score
# Sp = Spt + 0.6 * Spr + 0.2 * Spv # suggested expel threshold: Sp >= 0.5
# Sp = 0.6 * Spr + 0.4 * Spv
Sp = Spr
return Sp
# Score matrix of defender UGVs relative to attacker UGVs
# horizontal axis: number of attacker UGVs; vertical axis: number of defender UGVs
def defend_to_attack(self, self_data, ally_agents_data, enemy_agents_data, key_points):
# UGVs
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
for agent_id, dict_value in all_friend_agents_data.items():
if 'blood' not in dict_value:
temp = agent_id
all_friend_agents_data.pop(temp) # drop the drone entry; only ground UGV platforms are considered
# attacker UGV info
all_friend_agent_pos = []
all_friend_agent_blood = []
all_friend_agent_velocityx = []
all_friend_agent_velocityy = []
all_friend_agent_ammo = []
all_friend_agent_ID = []
all_friend_amount = 0
for agent_id, dict_value in all_friend_agents_data.items():
all_friend_agent_ID.append(agent_id) # ID interface, format based on Qiu's code; correctness unverified
# all_friend_agent_ammo.append(dict_value['ammo']) # ammo interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityx.append(dict_value['velocityx']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_blood.append(dict_value['blood']) # health interface, format based on Qiu's code; correctness unverified
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
all_friend_amount += 1
# defender UGV info
all_enemy_agent_pos = []
all_enemy_agent_blood = []
all_enemy_agent_velocityx = []
all_enemy_agent_velocityy = []
all_enemy_agent_ammo = []
all_enemy_agent_ID = []
all_enemy_amount = 0
for agent_id, dict_value in enemy_agents_data.items():
all_enemy_agent_ID.append(agent_id)
# all_enemy_agent_ammo.append(dict_value['ammo'])
all_enemy_agent_velocityx.append(dict_value['velocityx'])
all_enemy_agent_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
all_enemy_agent_blood.append(dict_value['blood'])
all_enemy_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
all_enemy_amount += 1
evaluation = np.zeros((all_friend_amount, all_enemy_amount))
for i in range(all_friend_amount):
for j in range(all_enemy_amount):
yaw = math.degrees(math.atan2((all_enemy_agent_pos[j][0] - all_friend_agent_pos[i][0]), (all_enemy_agent_pos[j][1] - all_friend_agent_pos[i][1])))
# UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity,phi,psi,ammo,health)
all_enemy_agent_velocity = [all_enemy_agent_velocityx[j], all_enemy_agent_velocityy[j], 0]
evaluation[i][j] = self.UAV2UAV("offensive", all_friend_agent_pos[i], 0, all_friend_agent_blood[i],
all_enemy_agent_pos[j], all_enemy_agent_velocity, yaw, 0, 0, all_enemy_agent_blood[j])
return evaluation
# Score matrix of attacker UGVs relative to defender UGVs
# horizontal axis: number of defender UGVs; vertical axis: number of attacker UGVs
def attack_to_defend(self, self_data, ally_agents_data, enemy_agents_data, key_points):
# UGVs
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
for agent_id, dict_value in all_friend_agents_data.items():
if 'blood' not in dict_value:
temp = agent_id
all_friend_agents_data.pop(temp) # drop the drone entry; only ground UGV platforms are considered
# attacker UGV info
all_friend_agent_pos = []
all_friend_agent_blood = []
all_friend_agent_velocityx = []
all_friend_agent_velocityy = []
all_friend_agent_ammo = []
all_friend_agent_ID = []
all_friend_amount = 0
for agent_id, dict_value in all_friend_agents_data.items():
all_friend_agent_ID.append(agent_id) # ID interface, format based on Qiu's code; correctness unverified
# all_friend_agent_ammo.append(dict_value['ammo']) # ammo interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityx.append(dict_value['velocityx']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_blood.append(dict_value['blood']) # health interface, format based on Qiu's code; correctness unverified
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
all_friend_amount += 1
# defender UGV info
all_enemy_agent_pos = []
all_enemy_agent_blood = []
all_enemy_agent_velocityx = []
all_enemy_agent_velocityy = []
all_enemy_agent_ammo = []
all_enemy_agent_ID = []
all_enemy_amount = 0
for agent_id, dict_value in enemy_agents_data.items():
all_enemy_agent_ID.append(agent_id)
#all_enemy_agent_ammo.append(dict_value['ammo'])
all_enemy_agent_velocityx.append(dict_value['velocityx'])
all_enemy_agent_velocityy.append(dict_value['velocityy'])
all_enemy_agent_blood.append(dict_value['blood'])
all_enemy_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']])
all_enemy_amount += 1
evaluation = np.zeros((all_enemy_amount, all_friend_amount))
for i in range(all_enemy_amount):
for j in range(all_friend_amount):
yaw = math.degrees(math.atan2((all_friend_agent_pos[j][0] - all_enemy_agent_pos[i][0]), (all_friend_agent_pos[j][1] - all_enemy_agent_pos[i][1]))) # yaw from enemy i toward friend j
# UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity,phi,psi,ammo,health)
all_friend_agent_velocity = [all_friend_agent_velocityx[j], all_friend_agent_velocityy[j], 0]
evaluation[i][j] = self.UAV2UAV("defensive", all_enemy_agent_pos[i], 0, all_enemy_agent_blood[i],
all_friend_agent_pos[j], all_friend_agent_velocity, yaw, 0, 0, all_friend_agent_blood[j])
return evaluation
# Score matrix of drones relative to capture points
# horizontal axis: capture points; vertical axis: drones
def uav_to_defend(self, self_data, ally_agents_data, enemy_agents_data, key_points):
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
for agent_id, dict_value in all_friend_agents_data.items():
if 'blood' not in dict_value:
temp1 = agent_id
temp2 = dict_value
drone_data = {}
drone_data[temp1] = temp2
# drone info
drone_pos = []
drone_velocityx = []
drone_velocityy = []
drone_ID = []
drone_amount = 0
for agent_id, dict_value in drone_data.items():
drone_ID.append(agent_id) # ID interface, format based on Qiu's code; correctness unverified
drone_velocityx.append(dict_value['velocityx']) # velocity interface, format based on Qiu's code; correctness unverified
drone_velocityy.append(dict_value['velocityy']) # velocity interface, format based on Qiu's code; correctness unverified
drone_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
drone_amount += 1
# capture-point positions
key_point_amount = 0
key_point_pos = []
for key_point in key_points:
key_point_pos.append(key_point)
key_point_amount += 1
evaluation = np.zeros((key_point_amount, drone_amount))
for i in range(key_point_amount):
for j in range(drone_amount):
# Drone2Point(self, p_position,p_ts, position, velocity)
# print(self.Drone2Point(key_point_pos[i], 0, drone_pos[j], drone_velocity[j]))
drone_velocity = [drone_velocityx[j], drone_velocityy[j], 0]
evaluation[i][j] = self.Drone2Point(key_point_pos[i], 0, drone_pos[j], drone_velocity)
return evaluation
'''
# Score matrix of UGVs relative to nearby candidate points
#
def attack_to_point(self_data, ally_agents_data, enemy_agents_data, key_points):
# UGVs
all_friend_agents_data = dict(self_data, **ally_agents_data) # all attacker-side agents' data
all_friend_agents_data.pop("231") # drop the drone entry; only ground UGV platforms are considered
# attacker UGV info
all_friend_agent_pos = []
all_friend_agent_blood = []
all_friend_agent_velocity = []
all_friend_agent_ammo = []
all_friend_agent_ID = []
all_friend_amount = 0
for agent_id, dict_value in all_friend_agents_data.items():
all_friend_agent_ID.append(dict_value['ID']) # ID interface, format based on Qiu's code; correctness unverified
all_friend_agent_ammo.append(dict_value['ammo']) # ammo interface, format based on Qiu's code; correctness unverified
all_friend_agent_velocity.append(dict_value['velocity']) # velocity interface, format based on Qiu's code; correctness unverified
all_friend_agent_blood.append(dict_value['blood']) # health interface, format based on Qiu's code; correctness unverified
all_friend_agent_pos.append([dict_value['X'], dict_value['Y'], dict_value['Z']]) # position interface, based on Qiu's code
all_friend_amount += 1
evaluation = np.zeros((all_friend_amount, all_enemy_amount))
for i in range(all_enemy_amount):
for j in range(all_friend_amount):
yaw = math.degrees(math.atan2((all_enemy_agent_pos[j][0] - all_friend_agent_pos[i][0]), (all_enemy_agent_pos[j][1] - all_friend_agent_pos[i][1])))
#UAV2UAV(self, identity, a_position,a_ammo,a_health, position,velocity,phi,psi,ammo,health)
evaluation[i][j] = self.UAV2UAV("defensive", all_enemy_agent_pos[i], all_enemy_agent_ammo[i], all_enemy_agent_blood[i],
all_friend_agent_pos[j], all_friend_agent_velocity[j], yaw, 0, all_friend_agent_ammo[j], all_friend_agent_blood[j])
return evaluation
'''
# main situation-evaluation function
def evaluate(self, self_data, ally_agents_data, enemy_agents_data, key_points):
d2a = self.defend_to_attack(self_data, ally_agents_data, enemy_agents_data, key_points)
a2d = self.attack_to_defend(self_data, ally_agents_data, enemy_agents_data, key_points)
u2d = self.uav_to_defend(self_data, ally_agents_data, enemy_agents_data, key_points)
return d2a, a2d, u2d
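# Quick numeric check of the two squashing functions above (exact values):
# SigmoidTen(0, c) = 1/(1 + 10*e^0) = 1/11, and SigmoidNine(0, c) = 1/10 - 0.1 = 0.
def _sigmoid_demo():
    m = Evaluation_module()
    assert abs(m.SigmoidTen(0.0, 1.0) - 1.0 / 11.0) < 1e-12
    assert abs(m.SigmoidNine(0.0, 1.0) - 0.0) < 1e-12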
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/uhmap_bb.py
================================================
import copy
from math import sqrt
import numpy as np
from MISSION.uhmap.actset_lookup import encode_action_as_digits
from config import GlobalConfig
class DummyAlgConfig():
reserve = ""
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
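# Shape-convention sketch (toy sizes): actions are assembled per
# (thread, agent, 8) and transposed to (agent, thread, 8) before returning,
# matching the swap performed at the end of interact_with_env above.
def _action_shape_demo(n_thread=2, n_agent=3):
    acts = np.zeros((n_thread, n_agent, 8))
    return np.swapaxes(acts, 0, 1).shape  # -> (3, 2, 8)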
class DummyAlgorithmT2(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# this thread is paused; do nothing
continue
AirCarrier = State_Recall['Latest-Team-Info'][thread]['dataArr'][AirCarrierUID]
if AirCarrier['agentAlive']:
assert 'RLA_UAV' in AirCarrier['type']
landmarks = State_Recall['Latest-Team-Info'][thread]['dataGlobal']['keyObjArr']
squredis = lambda a,b: sqrt(
(a['agentLocation']['x']-b['location']['x'])**2 +
(a['agentLocation']['y']-b['location']['y'])**2 +
(a['agentLocation']['z']-b['location']['z'])**2 )
AirCarrirSquareDisToEachLandmark = [squredis(AirCarrier, landmark) for landmark in landmarks]
nearLandmark = np.argmin(AirCarrirSquareDisToEachLandmark)
pos_lm = np.array([
landmarks[nearLandmark]['location']['x'],
landmarks[nearLandmark]['location']['y'],
landmarks[nearLandmark]['location']['z'],
])
pos_ac_proj = np.array([
AirCarrier['agentLocation']['x'],
AirCarrier['agentLocation']['y'],
landmarks[nearLandmark]['location']['z'],
])
unit_2ac_prj = (pos_ac_proj - pos_lm) / np.linalg.norm(pos_ac_proj - pos_lm)
p = unit_2ac_prj*400 + pos_lm
actions[thread, :] = encode_action_as_digits('PatrolMoving', 'N/A', x=p[0], y=p[1], z=p[2], UID=None, T=None, T_index=None)
else:
actions[thread, :] = encode_action_as_digits('N/A', 'N/A', x=None, y=None, z=None, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
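# Geometry sketch of the patrol point computed above (toy numbers): project
# the air carrier onto the landmark's height plane, then step 400 units from
# the landmark toward that projection.
def _patrol_point_demo():
    pos_lm = np.array([0.0, 0.0, 100.0])
    pos_ac_proj = np.array([300.0, 400.0, 100.0])  # carrier x/y at landmark z
    unit = (pos_ac_proj - pos_lm) / np.linalg.norm(pos_ac_proj - pos_lm)
    return unit * 400 + pos_lm  # -> array([240., 320., 100.])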
class DummyAlgorithmT1(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
landmarks = State_Recall['Latest-Team-Info'][thread]['dataGlobal']['keyObjArr']
px = landmarks[0]['location']['x']
py = landmarks[0]['location']['y']
for a in range(self.n_agent):
if not State_Recall['Latest-Team-Info'][thread]['dataArr'][a]['agentAlive']: continue
pz = State_Recall['Latest-Team-Info'][thread]['dataArr'][a]['agentLocation']['z']
actions[thread, a] = encode_action_as_digits('SpecificMoving', 'N/A', x=px, y=py, z=pz, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmIdle(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# this thread is paused; do nothing
continue
# AirCarrier = State_Recall['Latest-Team-Info'][thread]['dataArr'][AirCarrierUID]
# if AirCarrier['agentAlive']:
# assert 'RLA_UAV' in AirCarrier['type']
# landmarks = State_Recall['Latest-Team-Info'][thread]['dataGlobal']['keyObjArr']
# squredis = lambda a,b: sqrt(
# (a['agentLocation']['x']-b['location']['x'])**2 +
# (a['agentLocation']['y']-b['location']['y'])**2 +
# (a['agentLocation']['z']-b['location']['z'])**2 )
# AirCarrirSquareDisToEachLandmark = [squredis(AirCarrier, landmark) for landmark in landmarks]
# nearLandmark = np.argmin(AirCarrirSquareDisToEachLandmark)
# px = landmarks[nearLandmark]['location']['x']
# py = landmarks[nearLandmark]['location']['y']
# pz = landmarks[nearLandmark]['location']['z']
# actions[thread, :] = encode_action_as_digits('PatrolMoving', 'N/A', x=px, y=py, z=pz, UID=None, T=None, T_index=None)
# else:
# actions[thread, :] = encode_action_as_digits('N/A', 'N/A', x=None, y=None, z=None, UID=None, T=None, T_index=None)
if State_Recall['Env-Suffered-Reset'][thread]:
actions[thread, :] = encode_action_as_digits('N/A', 'N/A', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
actions[thread, :] = encode_action_as_digits('Idle', 'StaticAlert', x=None, y=None, z=None, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/uhmap_island.py
================================================
from math import pi
import numpy as np
import math
from MISSION.uhmap.actionset_v3 import strActionToDigits, ActDigitLen
from config import GlobalConfig
class DummyAlgConfig():
reserve = ""
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.team = team
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
self.team_agent_uid = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[team]
self.demo_type = GlobalConfig.ScenarioConfig.DemoType
if self.demo_type in ('AirShow', 'AirAttack'):
self.phase = 1
if self.demo_type == 'AirAttack':
self.TargetPosition = []
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmIdle(DummyAlgorithmBase):
'''
Rough NE corner of the Fujian area: (-17500, -19500)
Rough SW corner of the Fujian area: (-22500, -5000)
Note: 0° corresponds to the +x direction, 90° to the +y direction
plane_rotaion = [yaw, pitch, roll]
'''
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, ActDigitLen))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
# the code below is for demo purposes only
act_dic = ActionDictionary
if self.demo_type == "AirShow":
'''
Flight show scenario design:
Initial state: x/y set by agent index, initial altitude uniformly 1000 m, initial pitch 0, initial roll 0, initial yaw set by index
1. Climb to a common altitude of 5000 m, with all Euler angles set to 0
2. Turn yaw to -90°, then back to 0°, climbing to 10000 m meanwhile
3. Turn yaw to 180°
4. Turn yaw to -90°, then to 180°, descending to 5000 m meanwhile
5. Turn yaw back to 0°
6. Repeat actions 2~5
'''
for id in range(self.n_agent):
cruise_height = 15000
cruise_speed = 600
plane_location = State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['agentLocationArr']
plane_rotaion = State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['agentRotationArr']
if self.phase == 1:
if np.abs(0 - plane_rotaion[1]) < 1:
actions[thread, id] = act_dic.select_act('PlaneAgent', 2)
print("Change Height!")
else:
actions[thread, id] = act_dic.select_act('PlaneAgent', 6)
print("Change Direction!")
if np.abs(5000 - plane_location[2]) < 0.1 and np.abs(0 - plane_rotaion[0]) < 1:
self.phase += 1
print("Stage1 Done")
elif self.phase == 2:
print("Stage2!")
if np.abs(0 - plane_rotaion[1]) < 1:
actions[thread, id] = act_dic.select_act('PlaneAgent', 3)
else:
actions[thread, id] = act_dic.select_act('PlaneAgent', 12)
if np.abs(-90 - plane_rotaion[0]) < 1:
self.phase += 0.5
elif self.phase == 2.5:
actions[thread, id] = act_dic.select_act('PlaneAgent', 6)
if np.abs(10000 - plane_location[2]) < 0.1 and np.abs(0 - plane_rotaion[0]) < 1:
self.phase += 0.5
print("Stage2 Done")
elif self.phase == 3:
actions[thread, id] = act_dic.select_act('PlaneAgent', 10)
if np.abs(180 - plane_rotaion[0]) < 1:
self.phase += 1
print("Stage3 Done")
elif self.phase == 4:
print("Stage4!")
if np.abs(0 - plane_rotaion[1]) < 1:
actions[thread, id] = act_dic.select_act('PlaneAgent', 2)
else:
actions[thread, id] = act_dic.select_act('PlaneAgent', 12)
if np.abs(-90 - plane_rotaion[0]) < 1:
self.phase += 0.5
elif self.phase == 4.5:
print(self.phase)
actions[thread, id] = act_dic.select_act('PlaneAgent', 10)
if np.abs(5000 - plane_location[2]) < 10 and np.abs(180 - plane_rotaion[0]) < 1:
self.phase += 0.5
print("Stage4 Done")
elif self.phase == 5:
actions[thread, id] = act_dic.select_act('PlaneAgent', 6)
if np.abs(0 - plane_rotaion[0]) < 1:
self.phase = 2
print("Stage5 Done")
elif self.demo_type == "AirAttack":
'''
Full-pipeline demo:
1. Planes keep tracking the target bearing while climbing to cruise altitude (tentatively 15000 m)
2. Once the climb starts, accelerate until cruise speed is reached (tentatively 600 m/s)
3. At cruise altitude, cruise slowly toward the target
4. When near the target, descend to attack altitude (tentatively 5000 m) (expected engagement radius tentatively 50,000 m)
5. At attack altitude, always use the target bearing as the desired track heading
6. On reaching the target, engage it; if the target is destroyed, loiter in place
7. Regroup after the strike
'''
# fetch target coordinates and set parameters
if self.phase <= 1:
for id in range(100,105):
self.TargetPosition.append(State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['agentLocationArr'])
cruise_height = 15000
cruise_speed = 600
attack_height = 5000
ready_radius = 50000
# run the aircraft script
for id in range(self.n_agent - 5):
if State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['agentAlive']:
plane_location = State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['agentLocationArr']
plane_rotaion = State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['agentRotationArr']
# assign a target to this plane
index = math.floor(id / (self.n_agent - 5) * 5)
target_location = np.array((self.TargetPosition[index]))
delta_location = target_location - plane_location
target_pitch = self.DeltaLocation2Angle(delta_location[0], delta_location[1])
# 1.
if self.phase == 1:
if np.abs(cruise_height - plane_location[2]) < 5:
self.phase += 1
print("Stage1 Done")
elif np.abs(0 - plane_rotaion[1]) < 1:
action_num = self.TargetHeight2Action(cruise_height)
actions[thread, id] = act_dic.select_act('PlaneAgent', action_num)
print("Change Height to 15000m!")
else:
# track the target position
delta_location = target_location - plane_location
target_pitch = self.DeltaLocation2Angle(delta_location[0], delta_location[1])
action_num = self.TargetAngle2Action(target_pitch)
actions[thread, id] = act_dic.select_act('PlaneAgent', action_num)
# 2.
if self.phase == 2:
speed = float(State_Recall['Latest-Team-Info'][thread]['dataArr'][id]['rSVD1'])
if np.abs(cruise_speed - speed) < 1:
self.phase += 1
print("Stage2 Done")
else:
actions[thread, id] = act_dic.select_act('PlaneAgent', 14)
# 3.
if self.phase == 3:
delta_location = target_location - plane_location
if np.sqrt(np.sum(np.square((delta_location[0], delta_location[1])))) <= ready_radius:
self.phase += 1
print("Stage3 Done")
else:
# track the target position
delta_location = target_location - plane_location
target_pitch = self.DeltaLocation2Angle(delta_location[0], delta_location[1])
action_num = self.TargetAngle2Action(target_pitch)
actions[thread, id] = act_dic.select_act('PlaneAgent', action_num)
# 4.
if self.phase == 4:
if np.abs(attack_height - plane_location[2]) < 5:
self.phase += 1
print("Stage4 Done")
elif np.abs(0 - plane_rotaion[1]) < 1:
action_num = self.TargetHeight2Action(attack_height)
actions[thread, id] = act_dic.select_act('PlaneAgent', action_num)
print("Change Height to 5000m!")
else:
# track the target position
delta_location = target_location - plane_location
target_pitch = self.DeltaLocation2Angle(delta_location[0], delta_location[1])
action_num = self.TargetAngle2Action(target_pitch)
actions[thread, id] = act_dic.select_act('PlaneAgent', action_num)
# 5.
if self.phase == 5:
# track the target position
delta_location = target_location - plane_location
target_pitch = self.DeltaLocation2Angle(delta_location[0], delta_location[1])
action_num = self.TargetAngle2Action(target_pitch)
actions[thread, id] = act_dic.select_act('PlaneAgent', action_num)
# 5. when close enough, launch a missile at the target
# dist_2D = np.sqrt(np.sum(np.square((delta_location[0], delta_location[1]))))
# if dist_2D < 10000:
# actions[thread, id] = strActionToDigits('ActionSet3::LaunchMissile;NONE')
# print(target_pitch)
# actions[thread, :] = strActionToDigits('ActionSet3::ChangeDirection;{}'.format(target_pitch))
# print(State_Recall['Latest-Team-Info'][thread]['dataArr'][0]['agentRotationArr'])
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
def DeltaLocation2Angle(self, delta_x, delta_y):
'''
Convert the input position-delta vector into a bearing angle
(in degrees)
'''
# assert len(delta_location) == 2 or 3
# delta_x = delta_location[0]
# delta_y = delta_location[1]
if delta_x == 0:
theta = 90 if delta_y > 0 else (-90 if delta_y < 0 else 0)
else:
abs_theta = np.arctan(np.abs(delta_y) / np.abs(delta_x)) * 180 / pi
if delta_x > 0 and delta_y >= 0:
theta = abs_theta
elif delta_x < 0 and delta_y >= 0:
theta = 180 - abs_theta
elif delta_x > 0 and delta_y < 0:
theta = - abs_theta
elif delta_x < 0 and delta_y < 0:
theta = abs_theta - 180
return theta
def TargetAngle2Action(self, target_yaw):
'''
Convert the desired yaw angle into the corresponding discrete action
(in degrees)
'''
action_yaw_set = np.array([0, 45, 90, 135, 180, -135, -90, -45])
delta_action_yaw_set = np.abs(action_yaw_set - target_yaw)
output_num = None
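# 22.5 deg is half of the 45-deg action-grid spacing; the 337.5 deg branch below handles wrap-around near +/-180 deg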
for i, element in enumerate(delta_action_yaw_set):
if element <= 22.5 or element >= (180+135+22.5):
output_num = i
break
# if output_num is None:
# print('bad discrete yaw action set or broken selection logic!')
# print(target_yaw)
# print(delta_action_yaw_set)
assert output_num is not None, 'bad discrete yaw action set or broken selection logic!'
return output_num + 6  # offset into the ChangeDirection block (indices 6~13) of ActionDictionary
def TargetHeight2Action(self, target_height):
'''
Convert the desired altitude into the corresponding discrete action
'''
action_height_set = np.array([1000, 5000, 10000, 15000, 20000])
delta_action_height_set = np.abs(action_height_set - target_height)
height_threshold = np.abs(action_height_set[-1] - action_height_set[-2]) / 2
output_num = None
for i, element in enumerate(delta_action_height_set):
if element <= height_threshold:
output_num = i
break
assert output_num is not None, 'bad discrete altitude action set or broken selection logic!'
return output_num + 1  # offset into the ChangeHeight block (indices 1~5) of ActionDictionary
class ActionDictionary():
'''
Height Space(5): 20000m, 15000m, 10000m, 5000m, 1000m
Direction Space(8): 45°, 90°, 135°, 180°, -135°, -90°, -45°, 0°
Speed Space(2): Positive, Negative
'''
# Direction Space(16): 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°, 180°, -157.5°, -135°, -112.5°, -90°, -67.5°, -45°, -22.5°, 0°
# Speed Space(10): 150, 200, 250, 300, 350, 400, 450, 500, 550, 600
dictionary_args = [
'N/A;N/A', # 0
'ChangeHeight;1000', # 1
'ChangeHeight;5000', # 2
'ChangeHeight;10000', # 3
'ChangeHeight;15000', # 4
'ChangeHeight;20000', # 5
'ChangeDirection;0', # 6
'ChangeDirection;45', # 7
'ChangeDirection;90', # 8
'ChangeDirection;135', # 9
'ChangeDirection;180', # 10
'ChangeDirection;-135', # 11
'ChangeDirection;-90', # 12
'ChangeDirection;-45', # 13
'ChangeSpeed;Positive', # 14
'ChangeSpeed;Negative', # 15
]
@staticmethod
def select_act(type, a):
if type =='PlaneAgent':
args = ActionDictionary.dictionary_args[a]
return strActionToDigits(f'ActionSet3::{args}')
@staticmethod
def get_avail_act():
pass
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/uhmap_ls.py
================================================
import copy
import numpy as np
from UTIL.tensor_ops import distance_mat_between
from scipy.optimize import linear_sum_assignment
from MISSION.uhmap.actset_lookup import encode_action_as_digits
from config import GlobalConfig
class DummyAlgConfig():
reserve = ""
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.team = team
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
self.team_agent_uid = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[team]
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmSeqFire(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
# this thread is active
if State_Recall['Env-Suffered-Reset'][thread]:
# this thread has just been reset
opp_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[1-self.team]
opp_uid_range = list(copy.deepcopy(opp_uid_range))
np.random.shuffle(opp_uid_range)
self.attack_order[thread] = opp_uid_range
# current episode step count
step_cnt = State_Recall['Current-Obs-Step'][thread]
# latest info
info = State_Recall['Latest-Team-Info']
raw_info = State_Recall['Latest-Team-Info'][thread]['dataArr']
# check whether an agent is alive
def uid_alive(uid):
return raw_info[uid]['agentAlive']
for uid in self.attack_order[thread]:
if uid_alive(uid):
# if this enemy is alive, the whole team focuses fire on it
actions[thread, :] = encode_action_as_digits('SpecificAttacking', 'N/A', x=None, y=None, z=None, UID=uid, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmIdle(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
if State_Recall['Env-Suffered-Reset'][thread]:
actions[thread, :] = encode_action_as_digits('Idle', 'AggressivePersue', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
actions[thread, :] = encode_action_as_digits('N/A', 'N/A', x=None, y=None, z=None, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmMarch(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
if not hasattr(self, 'march_direction'):
self.march_direction = '+Y'
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
if State_Recall['Env-Suffered-Reset'][thread]:
a_agent_uid = self.team_agent_uid[0]
self.march_direction = '+Y' if State_Recall['Latest-Team-Info'][thread]['dataArr'][a_agent_uid]['agentLocation']['y'] <0 else '-Y'
actions[thread, :] = encode_action_as_digits('Idle', 'AggressivePersue', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
if self.march_direction == '+Y':
actions[thread, :] = encode_action_as_digits('PatrolMoving', 'Dir+Y', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
actions[thread, :] = encode_action_as_digits('PatrolMoving', 'Dir-Y', x=None, y=None, z=None, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
def assign_opponent(opp_pos_arr, opp_id_arr, leader_pos_arr, leader_id_arr):
result = {}
dis_mat = distance_mat_between(leader_pos_arr, opp_pos_arr)
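# note: inf distances (from dead agents) are mapped to a large finite value below so that linear_sum_assignment remains solvable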
dis_mat[dis_mat == np.inf] = 1e10
indices, assignments = linear_sum_assignment(dis_mat)
for i, j, a in zip(range(len(indices)), indices, assignments):
assert i == j
result[leader_id_arr[i]] = opp_id_arr[a]
return result
class DummyAlgorithmLinedAttack(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
actions[thread] = self.decide_each_thread(
thread = thread,
step_cnt = State_Recall['Current-Obs-Step'][thread],
raw_info = State_Recall['Latest-Team-Info'][thread]['dataArr'],
Env_Suffered_Reset = State_Recall['Env-Suffered-Reset'][thread]
)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
# check whether an agent is alive
def uid_alive(raw_info, uid):
return raw_info[uid]['agentAlive']
def decide_each_thread(self, **kwargs):
act_each_agent = np.zeros(shape=( self.n_agent, 8 ))
self_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
Env_Suffered_Reset = kwargs['Env_Suffered_Reset']
thread = kwargs['thread']
# current episode step count
step_cnt = kwargs['step_cnt']
raw_info = kwargs['raw_info']
# # if this thread is active
# if Env_Suffered_Reset:
# # if this thread has just been reset
# opp_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[1-self.team]
# opp_uid_range = list(copy.deepcopy(opp_uid_range))
# np.random.shuffle(opp_uid_range)
# self.attack_order[thread] = opp_uid_range
opp_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[1-self.team]
pos_arr_2d = np.array([_info['agentLocationArr'][:2] for _info in raw_info])
opp_pos_arr = pos_arr_2d[opp_uid_range]
self_air_uid_range = [info['uId'] for info in raw_info if info['agentAlive'] and info['agentTeam'] == self.team and info['type']=='RLA_UAV_Support']
N_leader = len(self_air_uid_range)
self_ground_uid_range = [info['uId'] for info in raw_info if info['agentAlive'] and info['agentTeam'] == self.team and info['type']!='RLA_UAV_Support']
if N_leader > 0:
self_air_pos_arr = pos_arr_2d[self_air_uid_range]
assignments = assign_opponent(
opp_pos_arr=opp_pos_arr,
opp_id_arr=opp_uid_range,
leader_pos_arr = self_air_pos_arr,
leader_id_arr=self_air_uid_range
)
for group in range(N_leader):
attack_uid = assignments[self_air_uid_range[group]]
group_member_uids = [uid for uid in self_ground_uid_range if uid%N_leader==group]
for group_member_uid in group_member_uids:
agent_team_index = raw_info[group_member_uid]['indexInTeam']
act_each_agent[agent_team_index] = encode_action_as_digits('SpecificAttacking', 'N/A', x=None, y=None, z=None, UID=attack_uid, T=None, T_index=None)
leader_uid = self_air_uid_range[group]
agent_team_index = raw_info[leader_uid]['indexInTeam']
z_leader = raw_info[leader_uid]['agentLocationArr'][2]
if len(group_member_uids) > 0:
team_center_pos = pos_arr_2d[group_member_uids].mean(axis=0)  # center of the leader's ground group
act_each_agent[agent_team_index] = encode_action_as_digits('PatrolMoving', 'N/A', x=team_center_pos[0], y=team_center_pos[1], z=z_leader, UID=None, T=None, T_index=None)
else:
act_each_agent[agent_team_index] = encode_action_as_digits('SpecificAttacking', 'N/A', x=None, y=None, z=None, UID=attack_uid, T=None, T_index=None)
return act_each_agent
else:
center_pos_kd = pos_arr_2d[self_ground_uid_range].mean(0, keepdims=True)
dis = distance_mat_between(center_pos_kd, opp_pos_arr)
target_index = np.argmin(dis.squeeze())
attack_uid = opp_uid_range[target_index]
group_member_uids = self_ground_uid_range
for group_member_uid in group_member_uids:
agent_team_index = raw_info[group_member_uid]['indexInTeam']
act_each_agent[agent_team_index] = encode_action_as_digits('SpecificAttacking', 'N/A',
x=None, y=None, z=None, UID=attack_uid, T=None, T_index=None)
return act_each_agent
def vector_shift_towards(pos, toward_pos, offset):
delta = toward_pos - pos
delta = delta / (np.linalg.norm(delta) + 1e-10)
return pos + delta * offset
================================================
FILE: PythonExample/hmp_minimal_modules/ALGORITHM/script_ai/uhmap_ls_mp.py
================================================
import copy, atexit
import numpy as np
from UTIL.tensor_ops import distance_mat_between
from scipy.optimize import linear_sum_assignment
from MISSION.uhmap.actset_lookup import encode_action_as_digits
from config import GlobalConfig
class DummyAlgConfig():
reserve = ""
class DummyAlgorithmBase():
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
self.n_agent = n_agent
self.n_thread = n_thread
self.team = team
self.ScenarioConfig = GlobalConfig.ScenarioConfig
self.attack_order = {}
self.team_agent_uid = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[team]
def forward(self, inp, state, mask=None):
raise NotImplementedError
def to(self, device):
return self
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8))
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmSeqFire(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
# this thread is active
if State_Recall['Env-Suffered-Reset'][thread]:
# this thread has just been reset
opp_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[1-self.team]
opp_uid_range = list(copy.deepcopy(opp_uid_range))
np.random.shuffle(opp_uid_range)
self.attack_order[thread] = opp_uid_range
# current episode step count
step_cnt = State_Recall['Current-Obs-Step'][thread]
# latest info
info = State_Recall['Latest-Team-Info']
raw_info = State_Recall['Latest-Team-Info'][thread]['dataArr']
# check whether an agent is alive
def uid_alive(uid):
return raw_info[uid]['agentAlive']
for uid in self.attack_order[thread]:
if uid_alive(uid):
# if this enemy is alive, the whole team focuses fire on it
actions[thread, :] = encode_action_as_digits('SpecificAttacking', 'N/A', x=None, y=None, z=None, UID=uid, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmIdle(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
if State_Recall['Env-Suffered-Reset'][thread]:
actions[thread, :] = encode_action_as_digits('Idle', 'AggressivePersue', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
actions[thread, :] = encode_action_as_digits('N/A', 'N/A', x=None, y=None, z=None, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
class DummyAlgorithmMarch(DummyAlgorithmBase):
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
AirCarrierUID = 2
# assert len(State_Recall['Latest-Obs']) == n_active_thread, ('make sure we have the right batch of obs')
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
if not hasattr(self, 'march_direction'):
self.march_direction = '+Y'
for thread in range(self.n_thread):
if ENV_PAUSE[thread]:
# if this thread is paused, do nothing
continue
if State_Recall['Env-Suffered-Reset'][thread]:
a_agent_uid = self.team_agent_uid[0]
self.march_direction = '+Y' if State_Recall['Latest-Team-Info'][thread]['dataArr'][a_agent_uid]['agentLocation']['y'] <0 else '-Y'
actions[thread, :] = encode_action_as_digits('Idle', 'AggressivePersue', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
if self.march_direction == '+Y':
actions[thread, :] = encode_action_as_digits('PatrolMoving', 'Dir+Y', x=None, y=None, z=None, UID=None, T=None, T_index=None)
else:
actions[thread, :] = encode_action_as_digits('PatrolMoving', 'Dir-Y', x=None, y=None, z=None, UID=None, T=None, T_index=None)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
def assign_opponent(opp_pos_arr, opp_id_arr, leader_pos_arr, leader_id_arr):
result = {}
dis_mat = distance_mat_between(leader_pos_arr, opp_pos_arr)
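# note: inf distances (from dead agents) are mapped to a large finite value below so that linear_sum_assignment remains solvable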
dis_mat[dis_mat == np.inf] = 1e10
indices, assignments = linear_sum_assignment(dis_mat)
for i, j, a in zip(range(len(indices)), indices, assignments):
assert i == j
result[leader_id_arr[i]] = opp_id_arr[a]
return result
class ThreadDecisionMaker():
def apply_context(self, kwargs):
for k in kwargs:
setattr(self, k, kwargs[k])
def decide_each_thread(self, kwargs):
act_each_agent = np.zeros(shape=( self.n_agent, 8 ))
if kwargs['env_pause']: return act_each_agent
self_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[self.team]
Env_Suffered_Reset = kwargs['Env_Suffered_Reset']
thread = kwargs['thread']
# current episode step count
step_cnt = kwargs['step_cnt']
raw_info = kwargs['raw_info']
# # if this thread is active
# if Env_Suffered_Reset:
# # if this thread has just been reset
# opp_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[1-self.team]
# opp_uid_range = list(copy.deepcopy(opp_uid_range))
# np.random.shuffle(opp_uid_range)
# self.attack_order[thread] = opp_uid_range
opp_uid_range = GlobalConfig.ScenarioConfig.AGENT_ID_EACH_TEAM[1-self.team]
pos_arr_2d = np.array([_info['agentLocationArr'][:2] for _info in raw_info])
opp_pos_arr = pos_arr_2d[opp_uid_range]
self_air_uid_range = [info['uId'] for info in raw_info if info['agentAlive'] and info['agentTeam'] == self.team and info['type']=='RLA_UAV_Support']
N_leader = len(self_air_uid_range)
self_ground_uid_range = [info['uId'] for info in raw_info if info['agentAlive'] and info['agentTeam'] == self.team and info['type']!='RLA_UAV_Support']
if N_leader > 0:
self_air_pos_arr = pos_arr_2d[self_air_uid_range]
assignments = assign_opponent(
opp_pos_arr=opp_pos_arr,
opp_id_arr=opp_uid_range,
leader_pos_arr = self_air_pos_arr,
leader_id_arr=self_air_uid_range
)
for group in range(N_leader):
attack_uid = assignments[self_air_uid_range[group]]
group_member_uids = [uid for uid in self_ground_uid_range if uid%N_leader==group]
for group_member_uid in group_member_uids:
agent_team_index = raw_info[group_member_uid]['indexInTeam']
act_each_agent[agent_team_index] = encode_action_as_digits('SpecificAttacking', 'N/A', x=None, y=None, z=None, UID=attack_uid, T=None, T_index=None)
leader_uid = self_air_uid_range[group]
agent_team_index = raw_info[leader_uid]['indexInTeam']
z_leader = raw_info[leader_uid]['agentLocation']['z']
if len(group_member_uids) > 0:
team_center_pos = pos_arr_2d[group_member_uids].mean(axis=0)  # center of the leader's ground group
act_each_agent[agent_team_index] = encode_action_as_digits('PatrolMoving', 'N/A', x=team_center_pos[0], y=team_center_pos[1], z=z_leader, UID=None, T=None, T_index=None)
else:
act_each_agent[agent_team_index] = encode_action_as_digits('SpecificAttacking', 'N/A', x=None, y=None, z=None, UID=attack_uid, T=None, T_index=None)
return act_each_agent
else:
center_pos_kd = pos_arr_2d[self_ground_uid_range].mean(0, keepdims=True)
dis = distance_mat_between(center_pos_kd, opp_pos_arr)
target_index = np.argmin(dis.squeeze())
attack_uid = opp_uid_range[target_index]
group_member_uids = self_ground_uid_range
for group_member_uid in group_member_uids:
agent_team_index = raw_info[group_member_uid]['indexInTeam']
act_each_agent[agent_team_index] = encode_action_as_digits('SpecificAttacking', 'N/A',
x=None, y=None, z=None, UID=attack_uid, T=None, T_index=None)
return act_each_agent
class DummyAlgorithmLinedAttack(DummyAlgorithmBase):
def __init__(self, n_agent, n_thread, space, mcv=None, team=None):
super().__init__(n_agent, n_thread, space, mcv, team)
sync_state = [self.__dict__.copy()]*self.n_thread
# multi-thread decision making
from UTIL.shm_pool import SmartPool
self.process_pool = SmartPool(fold=1, proc_num=self.n_thread, base_seed=0)
self.process_pool.add_target(name='DT%d'%self.team, lam=ThreadDecisionMaker)
atexit.register(self.process_pool.party_over) # failsafe, handles shm leak
self.process_pool.exec_target(
name='DT%d'%self.team,
dowhat='apply_context',
args_list=sync_state
)
def interact_with_env(self, State_Recall):
assert State_Recall['Latest-Obs'] is not None, ('make sure obs is ok')
ENV_PAUSE = State_Recall['ENV-PAUSE']
ENV_ACTIVE = ~ENV_PAUSE
assert self.n_thread == len(ENV_ACTIVE), ('the number of thread is wrong?')
n_active_thread = sum(ENV_ACTIVE)
actions = np.zeros(shape=(self.n_thread, self.n_agent, 8 ))
kwargs_L = [{
"env_pause": ENV_PAUSE[thread],
"thread" : thread,
"step_cnt" : State_Recall['Current-Obs-Step'][thread],
"raw_info" : State_Recall['Latest-Team-Info'][thread]['dataArr'],
"Env_Suffered_Reset" : State_Recall['Env-Suffered-Reset'][thread]
} for thread in range(self.n_thread)]
actions = self.process_pool.exec_target(
name='DT%d'%self.team,
dowhat='decide_each_thread',
args_list=kwargs_L
)
actions = np.stack(actions)
# set actions of in-active threads to NaN (will be done again in multi_team.py, this line is not necessary)
actions[ENV_PAUSE] = np.nan
# swap (self.n_thread, self.n_agent) -> (self.n_agent, self.n_thread)
actions = np.swapaxes(actions, 0, 1)
return actions, {}
# check whether an agent is alive
def uid_alive(raw_info, uid):
return raw_info[uid]['agentAlive']
def vector_shift_towards(pos, toward_pos, offset):
delta = toward_pos - pos
delta = delta / (np.linalg.norm(delta) + 1e-10)
return pos + delta * offset
================================================
FILE: PythonExample/hmp_minimal_modules/LICENSE
================================================
MIT License
Copyright (c) 2020 Ankur Deka
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/common/base_env.py
================================================
import numpy as np
class BaseEnv(object):
def __init__(self, rank) -> None:
self.observation_space = None
self.action_space = None
self.rank = rank
def step(self, act):
# obs: a Tensor with shape (n_agent, ...)
# reward: a Tensor with shape (n_agent, 1) or (n_team, 1)
# done: a Bool
# info: a dict
raise NotImplementedError
# Warning: if you have only one team and RewardAsUnity,
# you must make sure that reward has shape=[n_team=1, 1]
# e.g.
# >> RewardForTheOnlyTeam = +1
# >> RewardForAllTeams = np.array([RewardForTheOnlyTeam, ])
# >> return (ob, RewardForAllTeams, done, info)
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
return (ob, RewardForAllAgents, done, info) # choose this if not RewardAsUnity
def reset(self):
# obs: a Tensor with shape (n_agent, ...)
# info: a dict
raise NotImplementedError
return ob, info
class RawObsArray(object):
raw_obs_size = {} # shared
def __init__(self, key='default'):
self.key = key
if self.key not in self.raw_obs_size:
self.guards_group = []
self.nosize = True
else:
self.guards_group = np.zeros(shape=(self.raw_obs_size[self.key]), dtype=np.float32)
self.nosize = False
self.p = 0
def append(self, buf):
if self.nosize:
self.guards_group.append(buf)
else:
L = len(buf)
self.guards_group[self.p:self.p+L] = buf[:]
self.p += L
def get(self):
if self.nosize:
self.guards_group = np.concatenate(self.guards_group)
self.raw_obs_size[self.key] = len(self.guards_group)
return self.guards_group
def get_group_size(self):
return len(self.guards_group)
def get_raw_obs_size(self):
assert self.key in self.raw_obs_size
return self.raw_obs_size[self.key]
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/env_router.py
================================================
import_path_ref = {
"collective_assult": ("MISSION.collective_assult.collective_assult_parallel_run", 'ScenarioConfig'),
"dca_multiteam": ("MISSION.dca_multiteam.collective_assult_parallel_run", 'ScenarioConfig'),
"collective_assult_debug": ("MISSION.collective_assult_debug.collective_assult_parallel_run", 'ScenarioConfig'),
"air_fight": ("MISSION.air_fight.environment.air_fight_compat", 'ScenarioConfig'),
"native_gym": ("MISSION.native_gym.native_gym_config", 'ScenarioConfig'),
"starcraft2": ("MISSION.starcraft.sc2_env_wrapper", 'ScenarioConfig'),
"sc2": ("MISSION.starcraft.sc2_env_wrapper", 'ScenarioConfig'),
"unity_game": ("MISSION.unity_game.unity_game_wrapper", 'ScenarioConfig'),
"sr_tasks->cargo": ("MISSION.sr_tasks.multiagent.scenarios.cargo", 'ScenarioConfig'),
"sr_tasks->hunter_invader": ("MISSION.sr_tasks.multiagent.scenarios.hunter_invader", 'ScenarioConfig'),
"sr_tasks->hunter_invader3d": ("MISSION.sr_tasks.multiagent.scenarios.hunter_invader3d", 'ScenarioConfig'),
"sr_tasks->hunter_invader3d_v2": ("MISSION.sr_tasks.multiagent.scenarios.hunter_invader3d_v2",'ScenarioConfig'),
"bvr": ("MISSION.bvr_sim.init_env", 'ScenarioConfig'),
"mathgame": ("MISSION.math_game.env", 'ScenarioConfig'),
"uhmap": ("MISSION.uhmap.uhmap_env_wrapper", 'ScenarioConfig'),
}
env_init_function_ref = {
"collective_assult": ("MISSION.collective_assult.collective_assult_parallel_run", 'make_collective_assult_env'),
"dca_multiteam": ("MISSION.dca_multiteam.collective_assult_parallel_run", 'make_collective_assult_env'),
"collective_assult_debug": ("MISSION.collective_assult_debug.collective_assult_parallel_run", 'make_collective_assult_env'),
"air_fight": ("MISSION.air_fight.environment.air_fight_compat", 'make_air_fight_env'),
"native_gym": ("MISSION.native_gym.native_gym_config", 'env_init_function'),
"starcraft2": ("MISSION.starcraft.sc2_env_wrapper", 'make_sc2_env'),
"sc2": ("MISSION.starcraft.sc2_env_wrapper", 'make_sc2_env'),
"unity_game": ("MISSION.unity_game.unity_game_wrapper", 'make_env'),
"sr_tasks": ("MISSION.sr_tasks.multiagent.scenario", 'sr_tasks_env'),
"bvr": ("MISSION.bvr_sim.init_env", 'make_bvr_env'),
"mathgame": ("MISSION.math_game.env", 'make_math_env'),
"uhmap": ("MISSION.uhmap.uhmap_env_wrapper", 'make_uhmap_env'),
}
##################################################################################################################################
##################################################################################################################################
from config import GlobalConfig
import importlib, os
from UTIL.colorful import print亮蓝
def load_ScenarioConfig():
if GlobalConfig.env_name not in import_path_ref:
assert False, ('need to find path of ScenarioConfig')
import_path, ScenarioConfig = import_path_ref[GlobalConfig.env_name]
GlobalConfig.ScenarioConfig = getattr(importlib.import_module(import_path), ScenarioConfig)
def make_env_function(env_name, rank):
load_ScenarioConfig()
ref_env_name = env_name
if 'native_gym' in env_name:
assert '->' in env_name
ref_env_name, env_name = env_name.split('->')
elif 'sr_tasks' in env_name:
assert '->' in env_name
ref_env_name, env_name = env_name.split('->')
import_path, func_name = env_init_function_ref[ref_env_name]
env_init_function = getattr(importlib.import_module(import_path), func_name)
return lambda: env_init_function(env_name, rank)
def make_parallel_envs(process_pool, marker=''):
from UTIL.shm_env import SuperpoolEnv
from config import GlobalConfig
from MISSION.env_router import load_ScenarioConfig
load_ScenarioConfig()
env_args_dict_list = [({
'env_name':GlobalConfig.env_name,
'proc_index':i if 'test' not in marker else -(i+1),
'marker':marker
},) for i in range(GlobalConfig.num_threads)]
if GlobalConfig.env_name == 'air_fight':
# This particular env has a dll file
# that must be loaded in main process
from MISSION.air_fight.environment.pytransform import pyarmor_runtime
pyarmor_runtime()
if GlobalConfig.env_name == 'bvr':
# 1. If you are not using hmp's docker image, set YOUR_ROOT_PASSWORD properly; it appears in more than one place, so search the whole project for "YOUR_ROOT_PASSWORD" and replace every occurrence
# 2. Mount the docker sock into the container; the method is described in SetupDocker.md
print亮蓝('[env_router]: here goes the docker in docker check.')
YOUR_ROOT_PASSWORD = 'clara' # the sudo password
os.system("echo %s|sudo -S date"%YOUR_ROOT_PASSWORD) # get sudo power
res = os.popen("sudo docker ps").read()
if "CONTAINER ID" not in res:
print亮蓝('[env_router]: Error checking docker in docker, can not control host docker interface!')
raise "Error checking docker in docker, can not control host docker interface!"
pass
if GlobalConfig.env_name == 'collective_assult_debug':
# This particular env has a cython module that must be compiled and loaded in the main process
from MISSION.collective_assult_debug.cython_func import laser_hit_improve3
if GlobalConfig.env_name == 'dca_multiteam':
# This particular env has a cython module that must be compiled and loaded in the main process
from MISSION.dca_multiteam.cython_func import laser_hit_improve3
if GlobalConfig.env_name == 'uhmap':
# This particular env has a cython module that must be compiled and loaded in the main process
from MISSION.uhmap.SubTasks.cython_func import tear_number_apart
envs = SuperpoolEnv(process_pool, env_args_dict_list)
return envs
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/readme.md
================================================
# Task Configuration Core Fields:
## Parameter Internal Relationship
* You may notice that some configuration fields end with ```_cv```; they are parameters chained to other parameters. For example, when changing the ```map```, the ```episode_length``` limit and the number of agents ```N_AGENT_EACH_TEAM``` are implicated and must change with it.
To keep this simple, we add ```episode_length_cv``` and ```N_AGENT_EACH_TEAM_cv``` to record the link with a lambda function.
* When a parameter (e.g. ```map```) that other parameters are bound to is changed,
the Transparent Parameter Control (TPC) module scans for variables that have a twin ending with ```_cv```, and automatically updates their values (refer to ./UTIL/config_args.py); a minimal sketch follows below.
Generally, you can safely ignore them and only pay attention to the fields below.
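For illustration only, here is a minimal sketch of how such a chained pair might be declared; the ```map```/```episode_length``` names are hypothetical stand-ins, and the real mechanics live in ./UTIL/config_args.py:
```python
from UTIL.config_args import ChainVar

# hypothetical chained pair: episode_length follows the chosen map
map = 'small_map'
episode_length = 100
# when `map` is overridden during config injection, TPC re-evaluates this lambda
episode_length_cv = ChainVar(
    lambda map: 100 if map == 'small_map' else 400,
    chained_with=['map'],
)
```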
## Fields
| Field | Value | Explanation | zh Explanation |
| ---- | ---- | ---- | ---- |
| N_TEAM | ```int``` | the number of agent teams in the task; information cannot be shared between different teams | 队伍数量,每个队伍被一个ALGORITHM模块控制,队伍之间不可共享信息。大多数任务中,队伍之间是敌对关系 |
| N_AGENT_EACH_TEAM | ```list (of int)``` | the number of agents in each team | 每个队伍的智能体数量 |
| AGENT_ID_EACH_TEAM | ```list of list (of int)``` | the IDs of the agents in each team; a two-layer list that must agree with N_AGENT_EACH_TEAM! | 每个队伍的智能体的ID,双层列表,必须与N_AGENT_EACH_TEAM对应! |
| TEAM_NAMES | ```list (of string)``` | which ALGORITHM controls each team; fill in the path of the chosen algorithm and its main class name, e.g. ```"ALGORITHM.conc.foundation->ReinforceAlgorithmFoundation"``` | 选择每支队伍的控制算法,填写控制算法主模块的路径和类名|
| RewardAsUnity | ```bool``` | shared team reward (True), or an individual reward signal for each agent (False) | 每个队伍的智能体共享集体奖励(True),或者每个队伍的智能体都独享个体奖励(False) |
| ObsAsUnity | ```bool``` | agents do not have individual observations, only a shared collective observation | 没有个体观测值,整个群体的观测值获取方式如同单智能体问题一样 |
| StateProvided | ```bool``` | whether the global state is provided during training; if True, the Algorithm can access both ```obs``` and ```state``` | 是否在训练过程中提供全局state |
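As a quick illustration with hypothetical values, the team fields must stay mutually consistent (the IDs partition the full agent range):
```python
# two teams with 3 and 2 agents; AGENT_ID_EACH_TEAM must agree with N_AGENT_EACH_TEAM
N_TEAM = 2
N_AGENT_EACH_TEAM = [3, 2]
AGENT_ID_EACH_TEAM = [[0, 1, 2], [3, 4]]
TEAM_NAMES = [
    "ALGORITHM.conc.foundation->ReinforceAlgorithmFoundation",  # controls team 0
    "ALGORITHM.script_ai.uhmap_ls->DummyAlgorithmIdle",         # controls team 1
]
```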
# How to Introduce a New Mission Environment
### Step 1: Declare Mission Info (how many agents and actions, maximum episode steps, etc.)
- make a folder under ```./MISSION```, e.g. ```./MISSION/uhmap```
- make a py file, e.g. ```./MISSION/uhmap/uhmap_env_wrapper.py```
- in ```uhmap_env_wrapper.py```, copy and paste the following template:
```python
from UTIL.config_args import ChainVar
# please register this ScenarioConfig into MISSION/env_router.py
class ScenarioConfig(object):
'''
ScenarioConfig: This config class will be 'injected' with new settings from JSONC.
(E.g., override configs with ```python main.py --cfg example.jsonc```)
(As the name indicates, a ChainVar changes WITH the vars it is 'chained_with' during config injection)
(please see UTIL.config_args to find out how this advanced trick works out.)
'''
n_team1agent = 5
# Needed by the hmp core #
N_TEAM = 1
N_AGENT_EACH_TEAM = [n_team1agent,]
N_AGENT_EACH_TEAM_cv = ChainVar(lambda n_team1agent: [n_team1agent,], chained_with=['n_team1agent'])
AGENT_ID_EACH_TEAM = [range(0,n_team1agent),]
AGENT_ID_EACH_TEAM_cv = ChainVar(lambda n_team1agent: [range(0,n_team1agent),], chained_with=['n_team1agent'])
TEAM_NAMES = ['ALGORITHM.None->None',]
'''
## If the length of action array == the number of teams, set ActAsUnity to True
## If the length of action array == the number of agents, set ActAsUnity to False
'''
ActAsUnity = False
'''
## If the length of reward array == the number of agents, set RewardAsUnity to False
## If the length of reward array == 1, set RewardAsUnity to True
'''
RewardAsUnity = True
'''
## If the length of obs array == the number of agents, set ObsAsUnity to False
## If the length of obs array == the number of teams, set ObsAsUnity to True
'''
ObsAsUnity = False
# Needed by env itself #
MaxEpisodeStep = 100
render = False
# Needed by some ALGORITHM #
StateProvided = False
AvailActProvided = False
EntityOriented = False
n_actions = 2
obs_vec_length = 10
```
### Step 2: Writing Environment
- For convenience, please copy and paste ```class BaseEnv(object)``` into your script:
```python
class BaseEnv(object):
def __init__(self, rank) -> None:
self.observation_space = None
self.action_space = None
self.rank = rank
def step(self, act):
# obs: a Tensor with shape (n_agent, ...)
# reward: a Tensor with shape (n_agent, 1) or (n_team, 1)
# done: a Bool
# info: a dict
raise NotImplementedError
# Warning: if you have only one team and RewardAsUnity,
# you must make sure that reward has shape=[n_team=1, 1]
# e.g.
# >> RewardForTheOnlyTeam = +1
# >> RewardForAllTeams = np.array([RewardForTheOnlyTeam, ])
# >> return (ob, RewardForAllTeams, done, info)
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
return (ob, RewardForAllAgents, done, info) # choose this if not RewardAsUnity
def reset(self):
# obs: a Tensor with shape (n_agent, ...)
# info: a dict
raise NotImplementedError
return ob, info
```
- Then create a class that inherits from it (```class UhmapEnv(BaseEnv)```):
```python
class UhmapEnv(BaseEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.id = rank
self.render = ScenarioConfig.render and (self.id==0)
self.n_agents = ScenarioConfig.n_team1agent
# self.observation_space = ?
# self.action_space = ?
if ScenarioConfig.StateProvided:
# self.observation_space['state_shape'] = ?
pass
if self.render:
# render init
pass
```
- Next, it is time to write your own ```step()``` and ```reset()``` functions.
There is little we can help with there, as it is your custom environment after all; still, a minimal sketch is given below.
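Purely for illustration, a hedged sketch of what a trivial ```step()```/```reset()``` pair could look like under the template above (the class name is hypothetical; observations are random, the single team gets a constant reward, and ```RewardAsUnity``` is assumed True):
```python
import numpy as np

class UhmapEnvSketch(UhmapEnv):
    def reset(self):
        self.t = 0
        ob = np.zeros((self.n_agents, ScenarioConfig.obs_vec_length), dtype=np.float32)
        info = {}
        return ob, info

    def step(self, act):
        self.t += 1
        ob = np.random.rand(self.n_agents, ScenarioConfig.obs_vec_length).astype(np.float32)
        RewardForAllTeams = np.array([0.0, ])  # one entry because RewardAsUnity with a single team
        done = (self.t >= ScenarioConfig.MaxEpisodeStep)
        info = {}
        return ob, RewardForAllTeams, done, info
```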
### Step 3: Write a Function to Initialize the Environment
An empty function that returns an instance of the environment; it will be used in Step 4.
But don't worry, two lines of code will do:
```python
# please register this into MISSION/env_router.py
def make_uhmap_env(env_id, rank):
return UhmapEnv(rank)
```
### Step 4: Make Everything Kiss Together
This step will make HMP aware of the existence of this new MISSION.
- Open ```MISSION/env_router.py```
- Add the path of environment's configuration in ```import_path_ref```
``` python
import_path_ref = {
"uhmap": ("MISSION.uhmap.uhmap_env_wrapper", 'ScenarioConfig'),
}
```
- Add the path of environment's init function in ```env_init_function_ref```, e.g.:
``` python
env_init_function_ref = {
"uhmap": ("MISSION.uhmap.uhmap_env_wrapper", "make_uhmap_env"),
}
```
### Step 5: Write a Config Override to Start the Experiment
Create an ```exp.jsonc``` (or ```json```) file,
copy and paste the following content, and pay attention to the lines marked with ```***```; they are the most important ones:
```jsonc
{
// config HMP core
"config.py->GlobalConfig": {
"note": "uhmp-dev",
"env_name": "uhmap", // *** the selection of MISSION
"env_path": "MISSION.uhmap", // *** confirm the path of env (a fail safe)
"draw_mode": "Img",
"num_threads": "1",
"report_reward_interval": "1",
"test_interval": "128",
"test_epoch": "4",
"device": "cuda",
"max_n_episode": 500000,
"fold": "4",
"backup_files": [
]
},
// config MISSION
"MISSION.uhmap.uhmap_env_wrapper.py->ScenarioConfig": { // *** must kiss with "env_name" and "env_path"
// remember this? declared in the ScenarioConfig class in ./MISSION/uhmap/uhmap_env_wrapper.py.
"n_team1agent": 4,
"n_actions": 10,
"StateProvided": false,
"TEAM_NAMES": [
"ALGORITHM.conc_4hist.foundation->ReinforceAlgorithmFoundation" // *** select ALGORITHMs
]
},
// config ALGORITHMs
"ALGORITHM.conc_4hist.foundation.py->AlgorithmConfig": { // must kiss with "TEAM_NAMES"
"train_traj_needed": "16",
"prevent_batchsize_oom": "True",
"n_focus_on": 3,
"lr": 0.0005,
"ppo_epoch": 24,
"gamma_in_reward_forwarding": "True",
"gamma_in_reward_forwarding_value": 0.95,
"gamma": 0.99
}
}
```
Finally, run the experiment with ```python main.py --cfg ./path-to-exp-json/exp.jsonc```.
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/SubtaskCommonFn.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actionset_v3 import digitsToStrAction
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .cython_func import tear_num_arr
from ..actset_lookup import digit2act_dictionary, AgentPropertyDefaults
from ..actset_lookup import decode_action_as_string
class UhmapCommonFn(UhmapEnv):
def reset(self):
"""
Reset function, it delivers reset command to unreal engine to spawn all agents
环境复位,每个episode的开始会执行一次此函数中会初始化所有智能体
"""
super().reset()
self.t = 0
pos_ro = np.random.rand()*2*np.pi
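# random angular offset, presumably used by the agent init functions below to rotate spawn layouts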
# spawn agents
AgentSettingArray = []
# count the number of agent in each team
n_team_agent = {}
for i, agent_info in enumerate(self.SubTaskConfig.agent_list):
team = agent_info['team']
if team not in n_team_agent: n_team_agent[team] = 0
self.SubTaskConfig.agent_list[i]['uid'] = i
self.SubTaskConfig.agent_list[i]['tid'] = n_team_agent[team]
n_team_agent[team] += 1
self.n_team_agent = n_team_agent
# push agent init info one by one
for i, agent_info in enumerate(self.SubTaskConfig.agent_list):
team = agent_info['team']
agent_info['n_team_agent'] = n_team_agent[team]
init_fn = getattr(self, agent_info['init_fn_name'])
AgentSettingArray.append(init_fn(agent_info, pos_ro))
self.agents = [Agent(team=a['team'], team_id=a['tid'], uid=a['uid']) for a in self.SubTaskConfig.agent_list]
# refer to struct.cpp, FParsedDataInput
resp = self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'reset',
'NumAgents' : len(self.SubTaskConfig.agent_list),
'AgentSettingArray': AgentSettingArray, # refer to struct.cpp, FAgentProperty
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
resp = json.loads(resp)
# make sure the map (level in UE) is correct
# assert resp['dataGlobal']['levelName'] == 'UhmapLargeScale'
assert len(resp['dataArr']) == len(AgentSettingArray), "Illegal agent initial position: some agents failed to spawn."
return self.parse_response_ob_info(resp)
def step(self, act):
"""
step 函数,act中包含了所有agent的决策
"""
assert len(act) == self.n_agents
# translate actions to the format recognized by unreal engine
if self.SubTaskConfig.ActionFormat == 'Single-Digit':
act_send = [digit2act_dictionary[a] for a in act]
elif self.SubTaskConfig.ActionFormat == 'Multi-Digit':
act_send = [decode_action_as_string(a) for a in act]
elif self.SubTaskConfig.ActionFormat == 'ASCII':
act_send = [digitsToStrAction(a) for a in act]
else:
act_send = [digitsToStrAction(a) for a in act]
# simulation engine IO
resp = json.loads(self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'step',
'TimeStep': self.t,
'Actions': None,
'StringActions': act_send,
})))
# get obs for RL, info for script AI
ob, info = self.parse_response_ob_info(resp)
# generate reward, get the episode ending information
RewardForAllTeams, WinningResult = self.gen_reward_and_win(resp)
if WinningResult is not None:
info.update(WinningResult)
assert resp['dataGlobal']['episodeDone']
done = True
else:
done = False
if resp['dataGlobal']['timeCnt'] >= ScenarioConfig.MaxEpisodeStep:
assert done
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
def parse_event(self, event):
"""
解析环境返回的一些关键事件,
如智能体阵亡,某队伍胜利等等。
关键事件需要在ue中进行定义.
该设计极大地简化了python端奖励的设计流程,
减小了python端的运算量。
"""
if not hasattr(self, 'pattern'): self.pattern = re.compile(r'<([^<>]*)>([^<>]*)')
return {k:v for k,v in re.findall(self.pattern, event)}
def extract_key_gameobj(self, resp):
"""
获取非智能体的仿真物件,例如重要landmark等
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
奖励的设计在此定义,
(UE端编程死板,虽然预留了相关字段,
但请不要在UE端提供奖励的定义。)
建议:在UE端定义触发奖励的事件,如智能体阵亡、战术目标完成等,见parse_event
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
# if event_parsed['Event'] == 'Destroyed':
# team = self.find_agent_by_uid(event_parsed['UID']).team
# reward[team] -= 0.05 # this team
# reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
WaterdropWin = False
WaterdropRank = False
WaterdropReward = 0
ShipWin = -1
ShipRank = -1
ShipReward = 0
EndReason = event_parsed['EndReason']
# According to MISSION\uhmap\SubTasks\UhmapWaterdropConf.py, team 0 is Ship team, team 1 is Waterdrop team
if EndReason == "ShipNumLessThanTheshold" or EndReason == "Team_0_AllDead":
WaterdropWin = True; WaterdropRank = 0; WaterdropReward = 1
ShipWin = False; ShipRank = 1; ShipReward = -1
elif EndReason == "TimeMaxCntReached" or EndReason == "Team_1_AllDead":
WaterdropWin = False; WaterdropRank = 1; WaterdropReward = -1
ShipWin = True; ShipRank = 0; ShipReward = 1
else:
print('unexpected end reason:', EndReason)
WinningResult = {"team_ranking": [ShipRank, WaterdropRank], "end_reason": EndReason}
reward = [ShipReward, WaterdropReward]
# print(reward)
return reward, WinningResult
def step_skip(self):
"""
跳过一次决策,无用的函数
"""
return self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'skip_frame',
}))
def find_agent_by_uid(self, uid):
"""
用uid查找智能体(带缓存加速机制)
"""
if not hasattr(self, 'uid_to_agent_dict'):
self.uid_to_agent_dict = {}
self.uid_to_agent_dict.update({agent.uid:agent for agent in self.agents})
if isinstance(uid, str):
self.uid_to_agent_dict.update({str(agent.uid):agent for agent in self.agents})
return self.uid_to_agent_dict[uid]
def parse_response_ob_info(self, resp):
"""
粗解析智能体的观测,例如把死智能体的位置替换为inf(无穷远),
将智能体的agentLocation从字典形式转变为更简洁的(x,y,z)tuple形式
"""
assert resp['valid']
resp['dataGlobal']['distanceMat'] = np.array(resp['dataGlobal']['distanceMat']['flat_arr']).reshape(self.n_agents,self.n_agents)
if len(resp['dataGlobal']['events'])>0:
tmp = [kv.split('>') for kv in resp['dataGlobal']['events'][0].split('<') if kv]
info_parse = {t[0]:t[1] for t in tmp}
info_dict = resp
for info in info_dict['dataArr']:
alive = info['agentAlive']
if alive:
agentLocation = info.pop('agentLocation')
agentRotation = info.pop('agentRotation')
agentVelocity = info.pop('agentVelocity')
agentScale = info.pop('agentScale')
info['agentLocationArr'] = (agentLocation['x'], agentLocation['y'], agentLocation['z'])
info['agentVelocityArr'] = (agentVelocity['x'], agentVelocity['y'], agentVelocity['z'])
info['agentRotationArr'] = (agentRotation['yaw'], agentRotation['pitch'], agentRotation['roll'])
info['agentScaleArr'] = (agentScale['x'], agentScale['y'], agentScale['z'])
info.pop('previousAction')
info.pop('availActions')
# info.pop('rSVD1')
info.pop('interaction')
else:
inf = float('inf')
info['agentLocationArr'] = (inf, inf, inf)
info['agentVelocityArr'] = (inf, inf, inf)
info['agentRotationArr'] = (inf, inf, inf)
info = resp['dataArr']
for i, agent_info in enumerate(info):
self.agents[i].update_agent_attrs(agent_info)
self.key_obj = self.extract_key_gameobj(resp)
# return ob, info
return self.make_obs(resp), info_dict
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
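# encode each integer in n_int as an n_bits binary vector, least-significant bit first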
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
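# Worked example: the encoding is least-significant-bit first, e.g.
#   >>> self.get_binary_array(np.array([5]), n_bits=4)
#   array([[1., 0., 1., 0.]], dtype=float32)
# because 5 = 1*1 + 0*2 + 1*4 + 0*8.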
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
# pad with NaN rows up to the fixed number of opponent slots
a2h_feature_sort = np.concatenate((
a2h_feature_sort,
np.full((MAX_NUM_OPP_OBS - len(a2h_feature_sort), CORE_DIM), np.nan)
), axis=0)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
# pad with NaN rows up to the fixed number of ally slots
self_ally_feature_sort = np.concatenate((
self_ally_feature_sort,
np.full((MAX_NUM_ALL_OBS - len(self_ally_feature_sort), CORE_DIM), np.nan)
), axis=0)
OBS_ALL_AGENTS[i, :] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
return OBS_ALL_AGENTS
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapAdversial.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actionset_v3 import digitsToStrAction
from ..actset_lookup import digit2act_dictionary, decode_action_as_string, AgentPropertyDefaults
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapAdversialConf import SubTaskConfig
from .cython_func import tear_num_arr
class UhmapAdversial(UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def extract_key_gameobj(self, resp):
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= 0.05 # this team
reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
WinTeam = int(event_parsed['WinTeam'])
if WinTeam<0: # end due to timeout
agents_left_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: agents_left_each_team[a.team] += 1
WinTeam = np.argmax(agents_left_each_team)
# <<1>> The alive agent number is EQUAL
if agents_left_each_team[WinTeam] == agents_left_each_team[1-WinTeam]:
hp_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: hp_each_team[a.team] += a.hp
WinTeam = np.argmax(hp_each_team)
# <<2>> The alive agent HP sum is EQUAL
if hp_each_team[WinTeam] == hp_each_team[1-WinTeam]:
WinTeam = -1
if WinTeam >= 0:
WinningResult = {
"team_ranking": [0,1] if WinTeam==0 else [1,0],
"end_reason": EndReason
}
reward[WinTeam] += 1
reward[1-WinTeam] -= 1
else:
WinningResult = {
"team_ranking": [-1, -1],
"end_reason": EndReason
}
reward = [-1 for _ in range(self.n_teams)]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
# pad with NaN rows up to the fixed number of opponent slots
a2h_feature_sort = np.concatenate((
a2h_feature_sort,
np.full((MAX_NUM_OPP_OBS - len(a2h_feature_sort), CORE_DIM), np.nan)
), axis=0)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
# pad with NaN rows up to the fixed number of ally slots
self_ally_feature_sort = np.concatenate((
self_ally_feature_sort,
np.full((MAX_NUM_ALL_OBS - len(self_ally_feature_sort), CORE_DIM), np.nan)
), axis=0)
OBS_ALL_AGENTS[i, :] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
# finally, append the key game objects (landmarks etc.) to every agent's observation
MAX_OBJ_NUM_ACCEPT = SubTaskConfig.obs_n_entity - (MAX_NUM_OPP_OBS + MAX_NUM_ALL_OBS)
if len(self.key_obj) > 0:
OBJ_UID_OFFSET = 32768
obs_arr = RawObsArray(key = 'GameObj')
for i, obj in enumerate(self.key_obj):
assert obj['uId'] - OBJ_UID_OFFSET == i
obs_arr.append(
-self.uid_binary[i] # reverse uid binary, self.uid_binary[i]
)
obs_arr.append([
obj['uId'] - OBJ_UID_OFFSET, #agent.index,
-1, #agent.team,
True, #agent.alive,
obj['uId'] - OBJ_UID_OFFSET, #agent.uid_remote,
])
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
obs_arr.append(
[
obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
]
# tear_num_arr([
# obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
# ], 6, ScenarioConfig.ObsBreakBase, 0)
)
obs_arr.append([
obj['velocity']['x'], obj['velocity']['y'], obj['velocity']['z'] # agent.vel3d
]+
[
-1, # hp
obj['rotation']['yaw'], # yaw
0, # max_speed
])
OBS_GameObj = my_view(obs_arr.get(), [len(self.key_obj), -1])
OBS_GameObj = OBS_GameObj[:MAX_OBJ_NUM_ACCEPT, :]
OBS_GameObj = repeat_at(OBS_GameObj, insert_dim=0, n_times=self.n_agents)
OBS_ALL_AGENTS = np.concatenate((OBS_ALL_AGENTS, OBS_GameObj), axis=1)
return OBS_ALL_AGENTS
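# Layout of each CORE_DIM = 23 feature vector assembled in make_obs above:
#   [0:10]  binary-encoded uid (see get_binary_array)
#   [10:14] index, team, alive, uid_remote
#   [14:17] pos3d, [17:20] vel3d
#   [20:23] hp, yaw, max_speed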
def init_ground(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = (400* (tid%N_COL) + 2000) * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 500 # 500 is slightly above the ground
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 600,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 0.75, 'y': 0.75, 'z': 0.75, },
# probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 45,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 1,
# open fire range
"PerceptionRange": 2500,
"GuardRange": 1700,
"FireRange": 1400,
# debugging
'RSVD1': '',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;AsFarAsPossible',
# agent hp
'AgentHp':np.random.randint(low=90,high=110),
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
})
return agent_property
def init_ground_tank(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = (400* (tid%N_COL) + 2000) * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 500 # 500 is slightly above the ground
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 400,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 0.75, 'y': 0.75, 'z': 0.75, },
# probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 75,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 1,
# open fire range
"PerceptionRange": 2000,
"GuardRange": 1400,
"FireRange": 750 ,
# debugging
'RSVD1': '',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;AsFarAsPossible',
# agent hp
'AgentHp':np.random.randint(low=180,high=220),
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
})
return agent_property
def init_air(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = 2000 * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 1000
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 900,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 0.75, 'y': 0.75, 'z': 0.75, },
# probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 10,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 3,
# open fire range
"PerceptionRange": 2500,
"GuardRange": 1800,
"FireRange": 1700,
# debugging
'RSVD1': '-ring1=2500 -ring2=1800 -ring3=1700',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;StaticAlert',
# agent hp
'AgentHp':np.random.randint(low=40,high=60),
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
})
return agent_property
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapAdversialConf.py
================================================
class SubTaskConfig():
agent_list = [
{ "team": 0, "type": "RLA_UAV_Support", "init_fn_name": "init_air" },
{ "team": 0, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 0, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 0, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 0, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 0, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 0, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 0, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 0, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 0, "type": "RLA_UAV_Support", "init_fn_name": "init_air" },
{ "team": 1, "type": "RLA_UAV_Support", "init_fn_name": "init_air" },
{ "team": 1, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 1, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 1, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 1, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 1, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 1, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 1, "type": "RLA_CAR", "init_fn_name": "init_ground"},
{ "team": 1, "type": "RLA_CAR_Laser", "init_fn_name": "init_ground_tank"},
{ "team": 1, "type": "RLA_UAV_Support", "init_fn_name": "init_air" }
]
obs_vec_length = 23
obs_n_entity = 11
ActionFormat = 'Multi-Digit'
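# Note: obs_n_entity (11) matches the observation assembled in make_obs:
# MAX_NUM_OPP_OBS (5) + MAX_NUM_ALL_OBS (5) agent slots, plus what appears to be
# one slot reserved for a key game object.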
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapAttackPost.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actionset_v3 import digitsToStrAction
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapAttackPostConf import SubTaskConfig
from .cython_func import tear_num_arr
def init_position_helper(x_max, x_min, y_max, y_min, total, this):
n_col = np.ceil(np.sqrt(np.abs(x_max-x_min) * total / np.abs(y_max-y_min)))
n_row = np.ceil(total / n_col)
which_row = this // n_col
which_col = this % n_col
x = x_min + (which_col/n_col)*(x_max-x_min)
y = y_min + (which_row/n_row)*(y_max-y_min)
return x, y
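# Worked example: the helper arranges `total` agents on a grid whose cell aspect
# roughly matches the spawn region. With x in [0, 300], y in [0, 100], total=12:
#   n_col = ceil(sqrt(300*12/100)) = 6, n_row = ceil(12/6) = 2,
# so agent this=7 sits in row 1, column 1, i.e. (x, y) = (50.0, 50.0).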
class UhmapAttackPost(UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def reset(self):
"""
Reset function, it delivers reset command to unreal engine to spawn all agents
环境复位,每个episode的开始会执行一次此函数中会初始化所有智能体
"""
super().reset()
self.t = 0
pos_ro = np.random.rand()*2*np.pi
# spawn agents
AgentSettingArray = []
# count the number of agent in each team
n_team_agent = {}
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
if team not in n_team_agent: n_team_agent[team] = 0
SubTaskConfig.agent_list[i]['uid'] = i
SubTaskConfig.agent_list[i]['tid'] = n_team_agent[team]
n_team_agent[team] += 1
# push agent init info one by one
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
agent_info['n_team_agent'] = n_team_agent[team]
init_fn = getattr(self, agent_info['init_fn_name'])
AgentSettingArray.append(init_fn(agent_info, pos_ro))
self.agents = [Agent(team=a['team'], team_id=a['tid'], uid=a['uid']) for a in SubTaskConfig.agent_list]
# refer to struct.cpp, FParsedDataInput
resp = self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'reset',
'NumAgents' : len(SubTaskConfig.agent_list),
'AgentSettingArray': AgentSettingArray, # refer to struct.cpp, FAgentProperty
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
resp = json.loads(resp)
# make sure the map (level in UE) is correct
# assert resp['dataGlobal']['levelName'] == 'UhmapLargeScale'
assert len(resp['dataArr']) == len(AgentSettingArray)
return self.parse_response_ob_info(resp)
def step(self, act):
"""
step 函数,act中包含了所有agent的决策
"""
assert len(act) == self.n_agents
# translate actions to the format recognized by unreal engine
if ScenarioConfig.ActionFormat == 'Single-Digit':
act_send = [digit2act_dictionary[a] for a in act]
elif ScenarioConfig.ActionFormat == 'Multi-Digit':
act_send = [decode_action_as_string(a) for a in act]
elif ScenarioConfig.ActionFormat == 'ASCII':
act_send = [digitsToStrAction(a) for a in act]
else:
raise "ActionFormat is wrong!"
# simulation engine IO
resp = json.loads(self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'step',
'TimeStep': self.t,
'Actions': None,
'StringActions': act_send,
})))
# get obs for RL, info for script AI
ob, info = self.parse_response_ob_info(resp)
# generate reward, get the episode ending information
RewardForAllTeams, WinningResult = self.gen_reward_and_win(resp)
if WinningResult is not None:
info.update(WinningResult)
assert resp['dataGlobal']['episodeDone']
done = True
else:
done = False
if resp['dataGlobal']['timeCnt'] >= ScenarioConfig.MaxEpisodeStep:
assert done
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
def parse_event(self, event):
"""
解析环境返回的一些关键事件,
如智能体阵亡,某队伍胜利等等。
关键事件需要在ue中进行定义.
该设计极大地简化了python端奖励的设计流程,
减小了python端的运算量。
"""
if not hasattr(self, 'pattern'): self.pattern = re.compile(r'<([^<>]*)>([^<>]*)')
return {k:v for k,v in re.findall(self.pattern, event)}
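# A minimal sketch with a hypothetical event string; note that every parsed value
# is a string, so callers must convert, e.g. int(event_parsed['WinTeam']):
#   >>> self.parse_event('<Event>EndEpisode<EndReason>Team_0_AllDead<WinTeam>1')
#   {'Event': 'EndEpisode', 'EndReason': 'Team_0_AllDead', 'WinTeam': '1'}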
def extract_key_gameobj(self, resp):
"""
获取非智能体的仿真物件,例如重要landmark等
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
奖励的设计在此定义,
(UE端编程死板,虽然预留了相关字段,
但请不要在UE端提供奖励的定义。)
建议:在UE端定义触发奖励的事件,如智能体阵亡、战术目标完成等,见parse_event
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
# if event_parsed['Event'] == 'Destroyed':
# team = self.find_agent_by_uid(event_parsed['UID']).team
# reward[team] -= 0.05 # this team
# reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
PredatorWin = False
PredatorRank = False
PredatorReward = 0
PreyWin = -1
PreyRank = -1
PreyReward = 0
EndReason = event_parsed['EndReason']
# According to MISSION\uhmap\SubTasks\UhmapAttackPostConf.py, team 0 is prey team, team 1 is predator team
if EndReason == "AllPreyCaught" or EndReason == "Team_0_AllDead":
PredatorWin = True; PredatorRank = 0; PredatorReward = 1
PreyWin = False; PreyRank = 1; PreyReward = -1
elif EndReason == "TimeMaxCntReached" or EndReason == "Team_1_AllDead":
PredatorWin = False; PredatorRank = 1; PredatorReward = -1
PreyWin = True; PreyRank = 0; PreyReward = 1
else:
print('unexpected end reason:', EndReason)
WinningResult = {"team_ranking": [PreyRank, PredatorRank], "end_reason": EndReason}
reward = [PreyReward, PredatorReward]
# print(reward)
return reward, WinningResult
def step_skip(self):
"""
跳过一次决策,无用的函数
"""
return self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'skip_frame',
}))
def find_agent_by_uid(self, uid):
"""
用uid查找智能体(带缓存加速机制)
"""
if not hasattr(self, 'uid_to_agent_dict'):
self.uid_to_agent_dict = {}
self.uid_to_agent_dict.update({agent.uid:agent for agent in self.agents})
if isinstance(uid, str):
self.uid_to_agent_dict.update({str(agent.uid):agent for agent in self.agents})
return self.uid_to_agent_dict[uid]
def parse_response_ob_info(self, resp):
"""
粗解析智能体的观测,例如把死智能体的位置替换为inf(无穷远),
将智能体的agentLocation从字典形式转变为更简洁的(x,y,z)tuple形式
"""
assert resp['valid']
resp['dataGlobal']['distanceMat'] = np.array(resp['dataGlobal']['distanceMat']['flat_arr']).reshape(self.n_agents,self.n_agents)
if len(resp['dataGlobal']['events'])>0:
tmp = [kv.split('>') for kv in resp['dataGlobal']['events'][0].split('<') if kv]
info_parse = {t[0]:t[1] for t in tmp}
info_dict = resp
for info in info_dict['dataArr']:
alive = info['agentAlive']
if alive:
agentLocation = info.pop('agentLocation')
agentRotation = info.pop('agentRotation')
agentVelocity = info.pop('agentVelocity')
agentScale = info.pop('agentScale')
info['agentLocationArr'] = (agentLocation['x'], agentLocation['y'], agentLocation['z'])
info['agentVelocityArr'] = (agentVelocity['x'], agentVelocity['y'], agentVelocity['z'])
info['agentRotationArr'] = (agentRotation['yaw'], agentRotation['pitch'], agentRotation['roll'])
info['agentScaleArr'] = (agentScale['x'], agentScale['y'], agentScale['z'])
info.pop('previousAction')
info.pop('availActions')
# info.pop('rSVD1')
info.pop('interaction')
else:
inf = float('inf')
info['agentLocationArr'] = (inf, inf, inf)
info['agentVelocityArr'] = (inf, inf, inf)
info['agentRotationArr'] = (inf, inf, inf)
info = resp['dataArr']
for i, agent_info in enumerate(info):
self.agents[i].update_agent_attrs(agent_info)
self.key_obj = self.extract_key_gameobj(resp)
# return ob, info
return self.make_obs(resp), info_dict
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
# pad with NaN rows up to the fixed number of opponent slots
a2h_feature_sort = np.concatenate((
a2h_feature_sort,
np.full((MAX_NUM_OPP_OBS - len(a2h_feature_sort), CORE_DIM), np.nan)
), axis=0)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
# pad with NaN rows up to the fixed number of ally slots
self_ally_feature_sort = np.concatenate((
self_ally_feature_sort,
np.full((MAX_NUM_ALL_OBS - len(self_ally_feature_sort), CORE_DIM), np.nan)
), axis=0)
OBS_ALL_AGENTS[i, :] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
return OBS_ALL_AGENTS
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapBreakingBad.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at, distance_mat_between
from ...common.base_env import RawObsArray
from ..actset_lookup import digit2act_dictionary, decode_action_as_string, AgentPropertyDefaults
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapBreakingBadConf import SubTaskConfig
class UhmapBreakingBad(UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def reset(self):
super().reset()
self.t = 0
AgentPropertyDefaults.update({
'MaxMoveSpeed': 600,
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, }, # also influences object mass, please change it with caution!
"DodgeProb": 0.0, # probability of dodging damage, test ok
"ExplodeDmg": 20, # ms explode dmg, test ok
})
# 500 is slightly above the ground,
# but agent will be spawn to ground automatically
####################### spawn all ###########################
AgentSettingArray = []
agent_uid_cnt = 0
# "N_AGENT_EACH_TEAM": [10, 10], // update N_AGENT_EACH_TEAM
for i in range(ScenarioConfig.N_AGENT_EACH_TEAM[0]-1): # For attacking, drones on the ground
x = 3254.0
y = 3891.0 + i *100
z = 500
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'ClassName': 'RLA_CAR', # FString ClassName = "";
'AgentTeam': 0, # int AgentTeam = 0;
'IndexInTeam': i, # int IndexInTeam = 0;
'UID': agent_uid_cnt, # int UID = 0;
'MaxMoveSpeed': 600,
"ExplodeDmg": 10,
"DodgeProb": 0.1,
'AgentHp': 100,
"WeaponCD": 1,
'Color':'(R=0,G=1,B=0,A=1)',
'InitLocation': { 'x': x, 'y': y, 'z': z, },
})
AgentSettingArray.append(agent_property); agent_uid_cnt += 1
x = 4000.0
y = 4000.0
z = 1000
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'ClassName': 'RLA_UAV_VIP', # FString ClassName = "";
'AgentTeam': 0, # int AgentTeam = 0;
'IndexInTeam': agent_uid_cnt, # under most situations IndexInTeam=agent_uid_cnt for team 0
'UID': agent_uid_cnt, # int UID = 0;
'MaxMoveSpeed': 1000,
"DodgeProb": 0.5,
"ExplodeDmg": 10,
'AgentHp': 1,
"WeaponCD": 10000000000,
'Color':'(R=0,G=1,B=0,A=1)',
'InitLocation': { 'x': x, 'y': y, 'z': z, },
})
AgentSettingArray.append(agent_property); agent_uid_cnt += 1
# "N_AGENT_EACH_TEAM": [10, 10], // update N_AGENT_EACH_TEAM
for i in range(ScenarioConfig.N_AGENT_EACH_TEAM[1]):
x = 0 + 500*(i+1) * (-1)**(i+1)
y = 0
z = 500
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'ClassName': 'RLA_CAR_RED',
'AgentTeam': 1,
'IndexInTeam': i,
'UID': agent_uid_cnt,
'MaxMoveSpeed': 700,
"DodgeProb": 0.1,
'AgentHp':100,
"ExplodeDmg": 10,
"WeaponCD": 0.5,
'Color':'(R=1,G=0,B=0,A=1)',
'InitLocation': { 'x': x, 'y': y, 'z': z, },
})
AgentSettingArray.append(agent_property); agent_uid_cnt += 1
# refer to struct.cpp, FParsedDataInput
resp = self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'reset',
'AgentSettingArray': AgentSettingArray, # refer to struct.cpp, FAgentProperty
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
resp = json.loads(resp)
# make sure the map (level in UE) is correct
assert resp['dataGlobal']['levelName'] == 'UhmapBreakingBad'
assert len(resp['dataArr']) == len(AgentSettingArray)
return self.parse_response_ob_info(resp)
def step(self, act):
assert len(act) == self.n_agents
# translate actions to the format recognized by unreal engine
if ScenarioConfig.ActionFormat == 'Single-Digit':
act_send = [digit2act_dictionary[a] for a in act]
elif ScenarioConfig.ActionFormat == 'Multi-Digit':
act_send = [decode_action_as_string(a) for a in act]
# simulation engine IO
resp = json.loads(self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'step',
'TimeStep': self.t,
'Actions': None,
'StringActions': act_send,
})))
# get obs for RL, info for script AI
ob, info = self.parse_response_ob_info(resp)
# generate reward, get the episode ending information
RewardForAllTeams, WinningResult = self.gen_reward_and_win(resp)
if WinningResult is not None:
info.update(WinningResult)
assert resp['dataGlobal']['episodeDone']
done = True
else:
done = False
if resp['dataGlobal']['timeCnt'] >= ScenarioConfig.MaxEpisodeStep:
assert done
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
def parse_event(self, event):
if not hasattr(self, 'pattern'): self.pattern = re.compile(r'<([^<>]*)>([^<>]*)')
return {k:v for k,v in re.findall(self.pattern, event)}
def extract_key_gameobj(self, resp):
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
WIN_OR_LOSE_REWARD = 5
DRAW_REWARD = 2.5
KILL_REWARD = 0.1
BE_KILLED_REWARD = 0
reward = np.array([0.0]*self.n_teams,dtype=float)
events = resp['dataGlobal']['events']
WinningResult = None
# reward according to distance to either of the landmarks
landmarks_pos3darr = np.array([[
lm['location']['x'], lm['location']['y'], lm['location']['z']
] for lm in resp['dataGlobal']['keyObjArr']])
agent_pos3darr = np.array([agent.pos3d for agent in self.agents])
res = distance_mat_between(agent_pos3darr, landmarks_pos3darr)
penalty = -np.min(res, axis = -1) / 100000
reward += np.array([sum(penalty[ ScenarioConfig.AGENT_ID_EACH_TEAM[i] ]) for i in range(self.n_teams)])
# reward according to event (including win or lose event)
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= BE_KILLED_REWARD # this team
reward[1-team] += KILL_REWARD # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
WinTeam = int(event_parsed['WinTeam'])
if WinTeam<0: # end due to timeout
WinTeam = 1
if WinTeam >= 0:
WinningResult = {
"team_ranking": [0,1] if WinTeam==0 else [1,0],
"end_reason": EndReason
}
reward[WinTeam] += WIN_OR_LOSE_REWARD
reward[1-WinTeam] -= WIN_OR_LOSE_REWARD
else:
WinningResult = {
"team_ranking": [-1, -1],
"end_reason": EndReason
}
reward = [-DRAW_REWARD for _ in range(self.n_teams)]
# print(reward)
return reward, WinningResult
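# Reward composition of the method above, per step (team 0 attacks, team 1 defends):
#   shaping : -(distance of each agent to its nearest landmark)/1e5, summed per team
#   combat  : +KILL_REWARD (0.1) to the destroyer's team; -BE_KILLED_REWARD (0) to the victim's team
#   outcome : +/-WIN_OR_LOSE_REWARD (5) on EndEpisode; a timeout is scored as a win
#             for team 1, so the -DRAW_REWARD (2.5) branch is effectively unreachable.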
def step_skip(self):
return self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'skip_frame',
}))
def find_agent_by_uid(self, uid):
if not hasattr(self, 'uid_to_agent_dict'):
self.uid_to_agent_dict = {}
self.uid_to_agent_dict.update({agent.uid:agent for agent in self.agents})
if isinstance(uid, str):
self.uid_to_agent_dict.update({str(agent.uid):agent for agent in self.agents})
return self.uid_to_agent_dict[uid]
def parse_response_ob_info(self, resp):
assert resp['valid']
if len(resp['dataGlobal']['events'])>0:
tmp = [kv.split('>') for kv in resp['dataGlobal']['events'][0].split('<') if kv]
info_parse = {t[0]:t[1] for t in tmp}
# print('pass')
info_dict = resp
info = resp['dataArr']
for i, agent_info in enumerate(info):
self.agents[i].update_agent_attrs(agent_info)
self.key_obj = self.extract_key_gameobj(resp)
# return ob, info
return self.make_obs(resp), info_dict
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 4
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = np.array(resp['dataGlobal']['distanceMat']['flat_arr']).reshape(self.n_agents,self.n_agents)
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i]
)
obs_arr.append([
agent.index,
agent.team,
agent.alive,
agent.uid_remote,
])
obs_arr.append(
agent.pos3d
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
# pad with NaN rows up to the fixed number of opponent slots
a2h_feature_sort = np.concatenate((
a2h_feature_sort,
np.full((MAX_NUM_OPP_OBS - len(a2h_feature_sort), CORE_DIM), np.nan)
), axis=0)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
# pad with NaN rows up to the fixed number of ally slots
self_ally_feature_sort = np.concatenate((
self_ally_feature_sort,
np.full((MAX_NUM_ALL_OBS - len(self_ally_feature_sort), CORE_DIM), np.nan)
), axis=0)
OBS_ALL_AGENTS[i, :] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
return OBS_ALL_AGENTS
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapCarrier.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actionset_v3 import digitsToStrAction
from ..actset_lookup import AgentPropertyDefaults
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapCarrierConf import SubTaskConfig
from .cython_func import tear_num_arr
class UhmapCarrier(UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def extract_key_gameobj(self, resp):
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= 0.05 # this team
reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
WinTeam = int(event_parsed['WinTeam'])
if WinTeam<0: # end due to timeout
agents_left_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: agents_left_each_team[a.team] += 1
WinTeam = np.argmax(agents_left_each_team)
# <<1>> The alive agent number is EQUAL
if agents_left_each_team[WinTeam] == agents_left_each_team[1-WinTeam]:
hp_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: hp_each_team[a.team] += a.hp
WinTeam = np.argmax(hp_each_team)
# <<2>> The alive agent HP sum is EQUAL
if hp_each_team[WinTeam] == hp_each_team[1-WinTeam]:
WinTeam = -1
if WinTeam >= 0:
WinningResult = {
"team_ranking": [0,1] if WinTeam==0 else [1,0],
"end_reason": EndReason
}
reward[WinTeam] += 1
reward[1-WinTeam] -= 1
else:
WinningResult = {
"team_ranking": [-1, -1],
"end_reason": EndReason
}
reward = [-1 for _ in range(self.n_teams)]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
# pad with NaN rows up to the fixed number of opponent slots
a2h_feature_sort = np.concatenate((
a2h_feature_sort,
np.full((MAX_NUM_OPP_OBS - len(a2h_feature_sort), CORE_DIM), np.nan)
), axis=0)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
# pad with NaN rows up to the fixed number of ally slots
self_ally_feature_sort = np.concatenate((
self_ally_feature_sort,
np.full((MAX_NUM_ALL_OBS - len(self_ally_feature_sort), CORE_DIM), np.nan)
), axis=0)
OBS_ALL_AGENTS[i, :] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
# finally, append the key game objects (landmarks etc.) to every agent's observation
MAX_OBJ_NUM_ACCEPT = SubTaskConfig.obs_n_entity - (MAX_NUM_OPP_OBS + MAX_NUM_ALL_OBS)
if len(self.key_obj) > 0:
OBJ_UID_OFFSET = 32768
obs_arr = RawObsArray(key = 'GameObj')
for i, obj in enumerate(self.key_obj):
assert obj['uId'] - OBJ_UID_OFFSET == i
obs_arr.append(
-self.uid_binary[i] # reverse uid binary, self.uid_binary[i]
)
obs_arr.append([
obj['uId'] - OBJ_UID_OFFSET, #agent.index,
-1, #agent.team,
True, #agent.alive,
obj['uId'] - OBJ_UID_OFFSET, #agent.uid_remote,
])
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
obs_arr.append(
[
obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
]
# tear_num_arr([
# obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
# ], 6, ScenarioConfig.ObsBreakBase, 0)
)
obs_arr.append([
obj['velocity']['x'], obj['velocity']['y'], obj['velocity']['z'] # agent.vel3d
]+
[
-1, # hp
obj['rotation']['yaw'], # yaw
0, # max_speed
])
OBS_GameObj = my_view(obs_arr.get(), [len(self.key_obj), -1])
OBS_GameObj = OBS_GameObj[:MAX_OBJ_NUM_ACCEPT, :]
OBS_GameObj = repeat_at(OBS_GameObj, insert_dim=0, n_times=self.n_agents)
OBS_ALL_AGENTS = np.concatenate((OBS_ALL_AGENTS, OBS_GameObj), axis=1)
return OBS_ALL_AGENTS
def init_drone(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = (400* (tid%N_COL) + 2000) * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 500 # 500 is slightly above the ground
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 400,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 0.7, 'y': 0.7, 'z': 0.7, },
# probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 75,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 1,
# open fire range
"PerceptionRange": 2000,
"GuardRange": 1400,
"FireRange": 1300 ,
# debugging
'RSVD1': f'-CarrierName=T0-0 -NumDrone={self.n_team_agent[team]-1}' if team==0 else f'-CarrierName=T1-0 -NumDrone={self.n_team_agent[team]-1}',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;StaticAlert',
# agent hp
'AgentHp': 110,
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
})
return agent_property
def init_carrier(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = 2000 * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 1000
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 900,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 1.0, 'y': 1.0, 'z': 1.0, },
# probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 100,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 3,
# open fire range
"PerceptionRange": 5000,
"GuardRange": 4800,
"FireRange": 4800,
# debugging
'RSVD1': '',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;StaticAlert',
# agent hp
'AgentHp': 500,
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
})
return agent_property
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapCarrierConf.py
================================================
class SubTaskConfig():
agent_list = [
{ "team": 0, "type": "Carrier", "init_fn_name": "init_carrier" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 0, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Carrier", "init_fn_name": "init_carrier" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
{ "team": 1, "type": "SmallDrone", "init_fn_name": "init_drone" },
]
obs_vec_length = 23
obs_n_entity = 11
ActionFormat = 'ASCII'
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapEscape.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actionset_v3 import digitsToStrAction
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapEscapeConf import SubTaskConfig
from .cython_func import tear_num_arr
def init_position_helper(x_max, x_min, y_max, y_min, total, this):
n_col = np.ceil(np.sqrt(np.abs(x_max-x_min) * total / np.abs(y_max-y_min)))
n_row = np.ceil(total / n_col)
which_row = this // n_col
which_col = this % n_col
x = x_min + (which_col/n_col)*(x_max-x_min)
y = y_min + (which_row/n_row)*(y_max-y_min)
return x, y
class UhmapEscape(UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def reset(self):
"""
Reset function, it delivers reset command to unreal engine to spawn all agents
环境复位,每个episode的开始会执行一次此函数中会初始化所有智能体
"""
super().reset()
self.t = 0
pos_ro = np.random.rand()*2*np.pi
# spawn agents
AgentSettingArray = []
# count the number of agent in each team
n_team_agent = {}
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
if team not in n_team_agent: n_team_agent[team] = 0
SubTaskConfig.agent_list[i]['uid'] = i
SubTaskConfig.agent_list[i]['tid'] = n_team_agent[team]
n_team_agent[team] += 1
# push agent init info one by one
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
agent_info['n_team_agent'] = n_team_agent[team]
init_fn = getattr(self, agent_info['init_fn_name'])
AgentSettingArray.append(init_fn(agent_info, pos_ro))
self.agents = [Agent(team=a['team'], team_id=a['tid'], uid=a['uid']) for a in SubTaskConfig.agent_list]
# refer to struct.cpp, FParsedDataInput
resp = self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'reset',
'NumAgents' : len(SubTaskConfig.agent_list),
'AgentSettingArray': AgentSettingArray, # refer to struct.cpp, FAgentProperty
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
resp = json.loads(resp)
# make sure the map (level in UE) is correct
# assert resp['dataGlobal']['levelName'] == 'UhmapLargeScale'
assert len(resp['dataArr']) == len(AgentSettingArray)
return self.parse_response_ob_info(resp)
def step(self, act):
"""
step 函数,act中包含了所有agent的决策
"""
assert len(act) == self.n_agents
# translate actions to the format recognized by unreal engine
if ScenarioConfig.ActionFormat == 'Single-Digit':
act_send = [digit2act_dictionary[a] for a in act]
elif ScenarioConfig.ActionFormat == 'Multi-Digit':
act_send = [decode_action_as_string(a) for a in act]
elif ScenarioConfig.ActionFormat == 'ASCII':
act_send = [digitsToStrAction(a) for a in act]
else:
raise "ActionFormat is wrong!"
# simulation engine IO
resp = json.loads(self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'step',
'TimeStep': self.t,
'Actions': None,
'StringActions': act_send,
})))
# get obs for RL, info for script AI
ob, info = self.parse_response_ob_info(resp)
# generate reward, get the episode ending information
RewardForAllTeams, WinningResult = self.gen_reward_and_win(resp)
if WinningResult is not None:
info.update(WinningResult)
assert resp['dataGlobal']['episodeDone']
done = True
else:
done = False
if resp['dataGlobal']['timeCnt'] >= ScenarioConfig.MaxEpisodeStep:
assert done
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
def parse_event(self, event):
"""
解析环境返回的一些关键事件,
如智能体阵亡,某队伍胜利等等。
关键事件需要在ue中进行定义.
该设计极大地简化了python端奖励的设计流程,
减小了python端的运算量。
"""
if not hasattr(self, 'pattern'): self.pattern = re.compile(r'<([^<>]*)>([^<>]*)')
return {k:v for k,v in re.findall(self.pattern, event)}
def extract_key_gameobj(self, resp):
"""
        Fetch non-agent simulation objects, e.g. important landmarks.
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
        Reward design is defined here.
        (Programming on the UE side is rigid; although the relevant fields are
        reserved there, please do not define rewards on the UE side.)
        Recommendation: define reward-triggering events on the UE side, e.g. agent
        death or tactical-goal completion; see parse_event.
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= 0.10 # this team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
                DefenderWin = False
                DefenderRank = -1
                DefenderReward = 0
                AttackerWin = False
                AttackerRank = -1
                AttackerReward = 0
EndReason = event_parsed['EndReason']
# print(EndReason)
# According to MISSION\uhmap\SubTasks\UhmapAttackPostConf.py, team 0 is Attacker team, team 1 is Defender team
if EndReason == "Team_0_AllDead":
DefenderWin = True; DefenderRank = 0; DefenderReward = 1
AttackerWin = False; AttackerRank = 1; AttackerReward = -1
elif EndReason == "TimeMaxCntReached":
DefenderWin = True; DefenderRank = 0; DefenderReward = 1
AttackerWin = False; AttackerRank = 1; AttackerReward = -1
elif EndReason == "Team_1_AllDead":
DefenderWin = False; DefenderRank = 1; DefenderReward = -1
AttackerWin = True; AttackerRank = 0; AttackerReward = 1
else:
                    print('unexpected end reason:', EndReason)
WinningResult = {"team_ranking": [AttackerRank, DefenderRank], "end_reason": EndReason}
reward = [AttackerReward, DefenderReward]
# print(reward)
return reward, WinningResult
def step_skip(self):
"""
        Skip one decision step (utility function, currently unused).
"""
return self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'skip_frame',
}))
def find_agent_by_uid(self, uid):
"""
        Look up an agent by uid (cached for speed).
"""
if not hasattr(self, 'uid_to_agent_dict'):
self.uid_to_agent_dict = {}
self.uid_to_agent_dict.update({agent.uid:agent for agent in self.agents})
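        # UE-side events deliver UIDs as strings (see parse_event), so string keys
        # are added to the cache on demand below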
if isinstance(uid, str):
self.uid_to_agent_dict.update({str(agent.uid):agent for agent in self.agents})
return self.uid_to_agent_dict[uid]
def parse_response_ob_info(self, resp):
"""
        Coarsely parse agent observations: e.g. replace dead agents' positions with
        inf (infinitely far), and convert agentLocation from dict form into a more
        compact (x, y, z) tuple.
"""
assert resp['valid']
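        # 'flat_arr' carries the n_agents x n_agents distance matrix computed on the
        # UE side, flattened row-major; the reshape below restores it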
resp['dataGlobal']['distanceMat'] = np.array(resp['dataGlobal']['distanceMat']['flat_arr']).reshape(self.n_agents,self.n_agents)
if len(resp['dataGlobal']['events'])>0:
tmp = [kv.split('>') for kv in resp['dataGlobal']['events'][0].split('<') if kv]
info_parse = {t[0]:t[1] for t in tmp}
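            # note: info_parse is computed here but not used further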
info_dict = resp
for info in info_dict['dataArr']:
alive = info['agentAlive']
if alive:
agentLocation = info.pop('agentLocation')
agentRotation = info.pop('agentRotation')
agentVelocity = info.pop('agentVelocity')
agentScale = info.pop('agentScale')
info['agentLocationArr'] = (agentLocation['x'], agentLocation['y'], agentLocation['z'])
info['agentVelocityArr'] = (agentVelocity['x'], agentVelocity['y'], agentVelocity['z'])
info['agentRotationArr'] = (agentRotation['yaw'], agentRotation['pitch'], agentRotation['roll'])
info['agentScaleArr'] = (agentScale['x'], agentScale['y'], agentScale['z'])
info.pop('previousAction')
info.pop('availActions')
# info.pop('rSVD1')
info.pop('interaction')
else:
inf = float('inf')
info['agentLocationArr'] = (inf, inf, inf)
info['agentVelocityArr'] = (inf, inf, inf)
info['agentRotationArr'] = (inf, inf, inf)
info = resp['dataArr']
for i, agent_info in enumerate(info):
self.agents[i].update_agent_attrs(agent_info)
self.key_obj = self.extract_key_gameobj(resp)
# return ob, info
return self.make_obs(resp), info_dict
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
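        # moves a Binomial(len(src), prob)-sized tail of `src` into `dst`; with
        # prob=0 (as used in make_obs) nothing moves, and rand=True merely shuffles `src`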
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
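        # LSB-first binary encoding of each integer, e.g.
        # get_binary_array(np.array([3]), n_bits=4) -> [[1., 1., 0., 0.]]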
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
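        # 23 = 10 uid bits + 4 (index/team/alive/uid_remote) + 3 pos + 3 vel
        #      + 3 (hp/yaw/max_speed); see the obs_arr.append calls below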
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
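        # 10 binary digits per uid: enough for up to 2**10 = 1024 agents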
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
            # scope: all agents
            dis2all = dis_mat[i, :]
            is_ally = (team_belonging == agent.team)
            # scope: opponents (hostile)
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
            # scope: opponents, visibility split
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
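            # indices are non-negative, so `h_vis_index<0` is all-False and
            # `h_invis_index>=0` is all-True: the mask flags exactly the invisible
            # entries, whose features are zeroed just below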
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
            if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
                # pad with NaN rows up to the fixed opponent slot count (reconstructed after an extraction gap)
                a2h_feature_sort = np.concatenate((
                    a2h_feature_sort,
                    np.ones(shape=(MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # scope: allies (friendly)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
            # scope: allies, visibility split
f_vis_index = f_iden_sort[f_vis_mask]
            self_vis_index = f_vis_index[:1] # separate self and ally
            f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
            if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
                # pad with NaN rows up to the fixed ally slot count (reconstructed after an extraction gap)
                self_ally_feature_sort = np.concatenate((
                    self_ally_feature_sort,
                    np.ones(shape=(MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # slot order assumed: opponents first, then self+allies
            OBS_ALL_AGENTS[i,:] = np.concatenate((a2h_feature_sort, self_ally_feature_sort), axis=0)
        return OBS_ALL_AGENTS
# (the remainder of this file and the start of the next were lost in extraction;
#  the class below is inferred to be UhmapFormation from its Conf-file assert and
#  the UhmapFormationConf.py that follows; this header and imports are reconstructed)
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapFormation.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actset_lookup import digit2act_dictionary, AgentPropertyDefaults
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapFormationConf import SubTaskConfig
from .SubtaskCommonFn import UhmapCommonFn
class UhmapFormation(UhmapCommonFn, UhmapEnv):
    def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def extract_key_gameobj(self, resp):
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= 0.05 # this team
reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
WinTeam = int(event_parsed['WinTeam'])
if WinTeam<0: # end due to timeout
agents_left_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: agents_left_each_team[a.team] += 1
WinTeam = np.argmax(agents_left_each_team)
# <<1>> The alive agent number is EQUAL
if agents_left_each_team[WinTeam] == agents_left_each_team[1-WinTeam]:
hp_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: hp_each_team[a.team] += a.hp
WinTeam = np.argmax(hp_each_team)
# <<2>> The alive agent HP sum is EQUAL
if hp_each_team[WinTeam] == hp_each_team[1-WinTeam]:
WinTeam = -1
if WinTeam >= 0:
WinningResult = {
"team_ranking": [0,1] if WinTeam==0 else [1,0],
"end_reason": EndReason
}
reward[WinTeam] += 1
reward[1-WinTeam] -= 1
else:
WinningResult = {
"team_ranking": [-1, -1],
"end_reason": EndReason
}
reward = [-1 for _ in range(self.n_teams)]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
            # scope: all agents
            dis2all = dis_mat[i, :]
            is_ally = (team_belonging == agent.team)
            # scope: opponents (hostile)
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
            # scope: opponents, visibility split
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
            if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
                # pad with NaN rows up to the fixed opponent slot count (reconstructed)
                a2h_feature_sort = np.concatenate((
                    a2h_feature_sort,
                    np.ones(shape=(MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # scope: allies (friendly)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
            # scope: allies, visibility split
f_vis_index = f_iden_sort[f_vis_mask]
            self_vis_index = f_vis_index[:1] # separate self and ally
            f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
            if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
                # pad with NaN rows up to the fixed ally slot count (reconstructed after an extraction gap)
                self_ally_feature_sort = np.concatenate((
                    self_ally_feature_sort,
                    np.ones(shape=(MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            OBS_ALL_AGENTS[i,:] = np.concatenate((a2h_feature_sort, self_ally_feature_sort), axis=0)
        # finally, append key game objects (landmarks) to every agent's observation
        MAX_OBJ_NUM_ACCEPT = 1  # reconstructed value; obs_n_entity=11 in the Conf suggests 10 agent slots + 1 object slot
        if len(self.key_obj) > 0:
OBJ_UID_OFFSET = 32768
obs_arr = RawObsArray(key = 'GameObj')
for i, obj in enumerate(self.key_obj):
assert obj['uId'] - OBJ_UID_OFFSET == i
obs_arr.append(
-self.uid_binary[i] # reverse uid binary, self.uid_binary[i]
)
obs_arr.append([
obj['uId'] - OBJ_UID_OFFSET, #agent.index,
-1, #agent.team,
True, #agent.alive,
obj['uId'] - OBJ_UID_OFFSET, #agent.uid_remote,
])
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
obs_arr.append(
[
obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
]
# tear_num_arr([
# obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
# ], 6, ScenarioConfig.ObsBreakBase, 0)
)
obs_arr.append([
obj['velocity']['x'], obj['velocity']['y'], obj['velocity']['z'] # agent.vel3d
]+
[
-1, # hp
obj['rotation']['yaw'], # yaw
0, # max_speed
])
OBS_GameObj = my_view(obs_arr.get(), [len(self.key_obj), -1])
OBS_GameObj = OBS_GameObj[:MAX_OBJ_NUM_ACCEPT, :]
OBS_GameObj = repeat_at(OBS_GameObj, insert_dim=0, n_times=self.n_agents)
OBS_ALL_AGENTS = np.concatenate((OBS_ALL_AGENTS, OBS_GameObj), axis=1)
return OBS_ALL_AGENTS
def init_drone(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = (400* (tid%N_COL) + 2000) * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
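        # rotate the spawn point by pos_ro, a random angle sampled once per episode,
        # so the whole initial formation gets a random orientation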
z = 500 # 500 is slightly above the ground
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 400,
            # also influences object mass; change with caution!
'AgentScale' : { 'x': 0.7, 'y': 0.7, 'z': 0.7, },
            # probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 75,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 1,
# open fire range
"PerceptionRange": 2000,
"GuardRange": 1400,
"FireRange": 1300 ,
# debugging
'RSVD1': f'-CarrierName=T0-0 -NumDrone={self.n_team_agent[team]-1}' if team==0 else f'-CarrierName=T1-0 -NumDrone={self.n_team_agent[team]-1}',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;StaticAlert',
# agent hp
'AgentHp': 110,
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
            # initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
        })
return agent_property
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapFormationConf.py
================================================
class SubTaskConfig():
agent_list = [
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 0, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
{ "team": 1, "type": "Lv3_MomentumAgentWithHp", "init_fn_name": "init_drone" },
]
obs_vec_length = 23
obs_n_entity = 11
ActionFormat = 'ASCII'
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapHuge.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actset_lookup import digit2act_dictionary, AgentPropertyDefaults
from ..actset_lookup import decode_action_as_string
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapHugeConf import SubTaskConfig
from .cython_func import tear_num_arr
from .SubtaskCommonFn import UhmapCommonFn
class UhmapHuge(UhmapCommonFn, UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def extract_key_gameobj(self, resp):
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= 0.05 # this team
reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
WinTeam = int(event_parsed['WinTeam'])
if WinTeam<0: # end due to timeout
agents_left_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: agents_left_each_team[a.team] += 1
WinTeam = np.argmax(agents_left_each_team)
# <<1>> The alive agent number is EQUAL
if agents_left_each_team[WinTeam] == agents_left_each_team[1-WinTeam]:
hp_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: hp_each_team[a.team] += a.hp
WinTeam = np.argmax(hp_each_team)
# <<2>> The alive agent HP sum is EQUAL
if hp_each_team[WinTeam] == hp_each_team[1-WinTeam]:
WinTeam = -1
if WinTeam >= 0:
WinningResult = {
"team_ranking": [0,1] if WinTeam==0 else [1,0],
"end_reason": EndReason
}
reward[WinTeam] += 1
reward[1-WinTeam] -= 1
else:
WinningResult = {
"team_ranking": [-1, -1],
"end_reason": EndReason
}
reward = [-1 for _ in range(self.n_teams)]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
            # scope: all agents
            dis2all = dis_mat[i, :]
            is_ally = (team_belonging == agent.team)
            # scope: opponents (hostile)
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
            # scope: opponents, visibility split
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
            if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
                # pad with NaN rows up to the fixed opponent slot count (reconstructed)
                a2h_feature_sort = np.concatenate((
                    a2h_feature_sort,
                    np.ones(shape=(MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # scope: allies (friendly)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
            # scope: allies, visibility split
f_vis_index = f_iden_sort[f_vis_mask]
            self_vis_index = f_vis_index[:1] # separate self and ally
            f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
            if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
                self_ally_feature_sort = np.concatenate((
                    self_ally_feature_sort,
                    np.ones(shape=(MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            OBS_ALL_AGENTS[i,:] = np.concatenate((a2h_feature_sort, self_ally_feature_sort), axis=0)
        return OBS_ALL_AGENTS
# (the remainder of this file and the start of the next were lost in extraction;
#  the class below is inferred to be UhmapIntercept from the UhmapInterceptConf.py
#  reference in its reward code; this header and imports are reconstructed)
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapIntercept.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actset_lookup import digit2act_dictionary, decode_action_as_string, AgentPropertyDefaults
from ..actionset_v3 import digitsToStrAction
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapInterceptConf import SubTaskConfig
class UhmapIntercept(UhmapEnv):
    def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def reset(self):
"""
        Reset function: delivers the reset command to Unreal Engine to spawn all agents.
        Runs once at the start of every episode; all agents are initialized here.
"""
super().reset()
self.t = 0
pos_ro = np.random.rand()*2*np.pi
# spawn agents
AgentSettingArray = []
        # count the number of agents in each team
n_team_agent = {}
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
if team not in n_team_agent: n_team_agent[team] = 0
SubTaskConfig.agent_list[i]['uid'] = i
SubTaskConfig.agent_list[i]['tid'] = n_team_agent[team]
n_team_agent[team] += 1
# push agent init info one by one
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
agent_info['n_team_agent'] = n_team_agent[team]
init_fn = getattr(self, agent_info['init_fn_name'])
AgentSettingArray.append(init_fn(agent_info, pos_ro))
self.agents = [Agent(team=a['team'], team_id=a['tid'], uid=a['uid']) for a in SubTaskConfig.agent_list]
# refer to struct.cpp, FParsedDataInput
resp = self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'reset',
'NumAgents' : len(SubTaskConfig.agent_list),
'AgentSettingArray': AgentSettingArray, # refer to struct.cpp, FAgentProperty
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
resp = json.loads(resp)
# make sure the map (level in UE) is correct
# assert resp['dataGlobal']['levelName'] == 'UhmapLargeScale'
assert len(resp['dataArr']) == len(AgentSettingArray)
return self.parse_response_ob_info(resp)
def step(self, act):
"""
        Step function; act contains the decisions of all agents.
"""
assert len(act) == self.n_agents
# translate actions to the format recognized by unreal engine
if ScenarioConfig.ActionFormat == 'Single-Digit':
act_send = [digit2act_dictionary[a] for a in act]
elif ScenarioConfig.ActionFormat == 'Multi-Digit':
act_send = [decode_action_as_string(a) for a in act]
elif ScenarioConfig.ActionFormat == 'ASCII':
act_send = [digitsToStrAction(a) for a in act]
else:
raise "ActionFormat is wrong!"
# simulation engine IO
resp = json.loads(self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'step',
'TimeStep': self.t,
'Actions': None,
'StringActions': act_send,
})))
# get obs for RL, info for script AI
ob, info = self.parse_response_ob_info(resp)
        # generate reward, get the episode-ending information
RewardForAllTeams, WinningResult = self.gen_reward_and_win(resp)
if WinningResult is not None:
info.update(WinningResult)
assert resp['dataGlobal']['episodeDone']
done = True
else:
done = False
if resp['dataGlobal']['timeCnt'] >= ScenarioConfig.MaxEpisodeStep:
assert done
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
def parse_event(self, event):
"""
        Parse key events returned by the environment,
        e.g. an agent being destroyed or a team winning.
        Key events must be defined on the UE side.
        This design greatly simplifies reward design on the Python side
        and reduces the Python-side computation load.
"""
if not hasattr(self, 'pattern'): self.pattern = re.compile(r'<([^<>]*)>([^<>]*)')
return {k:v for k,v in re.findall(self.pattern, event)}
def extract_key_gameobj(self, resp):
"""
        Fetch non-agent simulation objects, e.g. important landmarks.
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
        Reward design is defined here.
        (Programming on the UE side is rigid; although the relevant fields are
        reserved there, please do not define rewards on the UE side.)
        Recommendation: define reward-triggering events on the UE side, e.g. agent
        death or tactical-goal completion; see parse_event.
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
# if event_parsed['Event'] == 'Destroyed':
# team = self.find_agent_by_uid(event_parsed['UID']).team
# reward[team] -= 0.05 # this team
# reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
                PredatorWin = False
                PredatorRank = -1
                PredatorReward = 0
                PreyWin = False
                PreyRank = -1
                PreyReward = 0
EndReason = event_parsed['EndReason']
# According to MISSION\uhmap\SubTasks\UhmapInterceptConf.py, team 0 is prey team, team 1 is predator team
if EndReason == "AllPreyCaught" or EndReason == "Team_0_AllDead":
PredatorWin = True; PredatorRank = 0; PredatorReward = 1
PreyWin = False; PreyRank = 1; PreyReward = -1
elif EndReason == "TimeMaxCntReached" or EndReason == "Team_1_AllDead":
PredatorWin = False; PredatorRank = 1; PredatorReward = -1
PreyWin = True; PreyRank = 0; PreyReward = 1
else:
                    print('unexpected end reason:', EndReason)
WinningResult = {"team_ranking": [PreyRank, PredatorRank], "end_reason": EndReason}
reward = [PreyReward, PredatorReward]
# print(reward)
return reward, WinningResult
def step_skip(self):
"""
        Skip one decision step (utility function, currently unused).
"""
return self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'skip_frame',
}))
def find_agent_by_uid(self, uid):
"""
        Look up an agent by uid (cached for speed).
"""
if not hasattr(self, 'uid_to_agent_dict'):
self.uid_to_agent_dict = {}
self.uid_to_agent_dict.update({agent.uid:agent for agent in self.agents})
if isinstance(uid, str):
self.uid_to_agent_dict.update({str(agent.uid):agent for agent in self.agents})
return self.uid_to_agent_dict[uid]
def parse_response_ob_info(self, resp):
"""
        Coarsely parse agent observations: e.g. replace dead agents' positions with
        inf (infinitely far), and convert agentLocation from dict form into a more
        compact (x, y, z) tuple.
"""
assert resp['valid']
resp['dataGlobal']['distanceMat'] = np.array(resp['dataGlobal']['distanceMat']['flat_arr']).reshape(self.n_agents,self.n_agents)
if len(resp['dataGlobal']['events'])>0:
tmp = [kv.split('>') for kv in resp['dataGlobal']['events'][0].split('<') if kv]
info_parse = {t[0]:t[1] for t in tmp}
info_dict = resp
for info in info_dict['dataArr']:
alive = info['agentAlive']
if alive:
agentLocation = info.pop('agentLocation')
agentRotation = info.pop('agentRotation')
agentVelocity = info.pop('agentVelocity')
agentScale = info.pop('agentScale')
info['agentLocationArr'] = (agentLocation['x'], agentLocation['y'], agentLocation['z'])
info['agentVelocityArr'] = (agentVelocity['x'], agentVelocity['y'], agentVelocity['z'])
info['agentRotationArr'] = (agentRotation['yaw'], agentRotation['pitch'], agentRotation['roll'])
info['agentScaleArr'] = (agentScale['x'], agentScale['y'], agentScale['z'])
info.pop('previousAction')
info.pop('availActions')
# info.pop('rSVD1')
info.pop('interaction')
else:
inf = float('inf')
info['agentLocationArr'] = (inf, inf, inf)
info['agentVelocityArr'] = (inf, inf, inf)
info['agentRotationArr'] = (inf, inf, inf)
info = resp['dataArr']
for i, agent_info in enumerate(info):
self.agents[i].update_agent_attrs(agent_info)
self.key_obj = self.extract_key_gameobj(resp)
# return ob, info
return self.make_obs(resp), info_dict
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
            # scope: all agents
            dis2all = dis_mat[i, :]
            is_ally = (team_belonging == agent.team)
            # scope: opponents (hostile)
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
            # scope: opponents, visibility split
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
            if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
                # pad with NaN rows up to the fixed opponent slot count (reconstructed)
                a2h_feature_sort = np.concatenate((
                    a2h_feature_sort,
                    np.ones(shape=(MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # scope: allies (friendly)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
            # scope: allies, visibility split
f_vis_index = f_iden_sort[f_vis_mask]
            self_vis_index = f_vis_index[:1] # separate self and ally
            f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
            if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
                self_ally_feature_sort = np.concatenate((
                    self_ally_feature_sort,
                    np.ones(shape=(MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            OBS_ALL_AGENTS[i,:] = np.concatenate((a2h_feature_sort, self_ally_feature_sort), axis=0)
        return OBS_ALL_AGENTS
# (the remainder of this file and the start of the next were lost in extraction;
#  the original name of the file and class below is unrecoverable, so placeholder
#  names are used and the header and imports are reconstructed)
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/(name lost in extraction)
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .SubtaskCommonFn import UhmapCommonFn
from .UhmapSubTaskTemplateConf import SubTaskConfig  # placeholder module name; original lost
class UhmapSubTaskTemplate(UhmapCommonFn, UhmapEnv):  # placeholder class name; original lost
    def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def extract_key_gameobj(self, resp):
"""
        Fetch non-agent simulation objects, e.g. important landmarks.
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
        Reward design is defined here.
        (Programming on the UE side is rigid; although the relevant fields are
        reserved there, please do not define rewards on the UE side.)
        Recommendation: define reward-triggering events on the UE side, e.g. agent
        death or tactical-goal completion; see parse_event.
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
# WinTeam = int(event_parsed['WinTeam'])
WinningResult = {
                    # Ranking of each team: e.g. [1, 0, 2] means team 1 places 2nd, team 2 places 1st, team 3 places 3rd.
                    # If no team wins, use e.g. [-1, -1, -1].
                    # If two teams tie, use e.g. [0, 2, 0, 2]: teams 1 and 3 tie for 1st, teams 2 and 4 tie for 3rd.
"team_ranking": [-1, ],
"end_reason": EndReason
}
assert len(WinningResult["team_ranking"]) == ScenarioConfig.N_TEAM
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
            # scope: all agents
            dis2all = dis_mat[i, :]
            is_ally = (team_belonging == agent.team)
            # scope: opponents (hostile)
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
            # scope: opponents, visibility split
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
            if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
                # pad with NaN rows up to the fixed opponent slot count (reconstructed)
                a2h_feature_sort = np.concatenate((
                    a2h_feature_sort,
                    np.ones(shape=(MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # scope: allies (friendly)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
            # scope: allies, visibility split
f_vis_index = f_iden_sort[f_vis_mask]
            self_vis_index = f_vis_index[:1] # separate self and ally
            f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
            if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
                self_ally_feature_sort = np.concatenate((
                    self_ally_feature_sort,
                    np.ones(shape=(MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            OBS_ALL_AGENTS[i,:] = np.concatenate((a2h_feature_sort, self_ally_feature_sort), axis=0)
        return OBS_ALL_AGENTS
# (the remainder of this file and the start of the next were lost in extraction;
#  the class below is inferred to be UhmapLargeScale from the UhmapLargeScaleConf.py
#  that follows; this header and imports are reconstructed)
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapLargeScale.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actset_lookup import digit2act_dictionary, decode_action_as_string, AgentPropertyDefaults
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapLargeScaleConf import SubTaskConfig
from .SubtaskCommonFn import UhmapCommonFn
class UhmapLargeScale(UhmapCommonFn, UhmapEnv):
    def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def extract_key_gameobj(self, resp):
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'Destroyed':
team = self.find_agent_by_uid(event_parsed['UID']).team
reward[team] -= 0.05 # this team
reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
EndReason = event_parsed['EndReason']
WinTeam = int(event_parsed['WinTeam'])
if WinTeam<0: # end due to timeout
agents_left_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: agents_left_each_team[a.team] += 1
WinTeam = np.argmax(agents_left_each_team)
# <<1>> The alive agent number is EQUAL
if agents_left_each_team[WinTeam] == agents_left_each_team[1-WinTeam]:
hp_each_team = [0 for _ in range(self.n_teams)]
for a in self.agents:
if a.alive: hp_each_team[a.team] += a.hp
WinTeam = np.argmax(hp_each_team)
# <<2>> The alive agent HP sum is EQUAL
if hp_each_team[WinTeam] == hp_each_team[1-WinTeam]:
WinTeam = -1
if WinTeam >= 0:
WinningResult = {
"team_ranking": [0,1] if WinTeam==0 else [1,0],
"end_reason": EndReason
}
reward[WinTeam] += 1
reward[1-WinTeam] -= 1
else:
WinningResult = {
"team_ranking": [-1, -1],
"end_reason": EndReason
}
reward = [-1 for _ in range(self.n_teams)]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 1500
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(
self.n_agents,
MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS,
CORE_DIM
))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
            # scope: all agents
            dis2all = dis_mat[i, :]
            is_ally = (team_belonging == agent.team)
            # scope: opponents (hostile)
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
            # scope: opponents, visibility split
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
            if len(a2h_feature_sort) < MAX_NUM_OPP_OBS:
                # pad with NaN rows up to the fixed opponent slot count (reconstructed)
                a2h_feature_sort = np.concatenate((
                    a2h_feature_sort,
                    np.ones(shape=(MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            # scope: allies (friendly)
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
            # scope: allies, visibility split
f_vis_index = f_iden_sort[f_vis_mask]
            self_vis_index = f_vis_index[:1] # separate self and ally
            f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
            if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS:
                self_ally_feature_sort = np.concatenate((
                    self_ally_feature_sort,
                    np.ones(shape=(MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM))+np.nan
                ), axis=0)
            OBS_ALL_AGENTS[i,:] = np.concatenate((a2h_feature_sort, self_ally_feature_sort), axis=0)
        # finally, append key game objects (landmarks) to every agent's observation
        MAX_OBJ_NUM_ACCEPT = 1  # reconstructed value; obs_n_entity=11 in the Conf suggests 10 agent slots + 1 object slot
        if len(self.key_obj) > 0:
OBJ_UID_OFFSET = 32768
obs_arr = RawObsArray(key = 'GameObj')
for i, obj in enumerate(self.key_obj):
assert obj['uId'] - OBJ_UID_OFFSET == i
obs_arr.append(
-self.uid_binary[i] # reverse uid binary, self.uid_binary[i]
)
obs_arr.append([
obj['uId'] - OBJ_UID_OFFSET, #agent.index,
-1, #agent.team,
True, #agent.alive,
obj['uId'] - OBJ_UID_OFFSET, #agent.uid_remote,
])
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
obs_arr.append(
[
obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
]
# tear_num_arr([
# obj['location']['x'], obj['location']['y'], obj['location']['z'] # agent.pos3d
# ], 6, ScenarioConfig.ObsBreakBase, 0)
)
obs_arr.append([
obj['velocity']['x'], obj['velocity']['y'], obj['velocity']['z'] # agent.vel3d
]+
[
-1, # hp
obj['rotation']['yaw'], # yaw
0, # max_speed
])
OBS_GameObj = my_view(obs_arr.get(), [len(self.key_obj), -1])
OBS_GameObj = OBS_GameObj[:MAX_OBJ_NUM_ACCEPT, :]
OBS_GameObj = repeat_at(OBS_GameObj, insert_dim=0, n_times=self.n_agents)
OBS_ALL_AGENTS = np.concatenate((OBS_ALL_AGENTS, OBS_GameObj), axis=1)
return OBS_ALL_AGENTS
def init_ground(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = (400* (tid%N_COL) + 2000) * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 500 # 500 is slightly above the ground
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 720 if agent_class == 'RLA_CAR_Laser' else 600,
            # also influences object mass; change with caution!
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, },
            # probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 20,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 1,
# open fire range
"PerceptionRange": 2000 if agent_class == 'RLA_CAR_Laser' else 2500,
"GuardRange": 1400 if agent_class == 'RLA_CAR_Laser' else 1700,
"FireRange": 750 if agent_class == 'RLA_CAR_Laser' else 1400,
# debugging
'RSVD1': '-Ring1=2000 -Ring2=1400 -Ring3=750' if agent_class == 'RLA_CAR_Laser' else '-Ring1=2500 -Ring2=1700 -Ring3=1400',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;AsFarAsPossible',
# agent hp
'AgentHp':np.random.randint(low=95,high=105) if agent_class == 'RLA_CAR_Laser' else np.random.randint(low=145,high=155),
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
            # initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
        })
return agent_property
def init_air(self, agent_info, pos_ro):
N_COL = 2
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 10
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = 2000 * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 1000
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 900,
            # also influences object mass; change with caution!
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, },
            # probability of dodging damage
"DodgeProb": 0.0,
# ms explode dmg
"ExplodeDmg": 10,
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# Weapon CD
'WeaponCD': 3,
# open fire range
"PerceptionRange": 2500,
"GuardRange": 1800,
"FireRange": 1700,
# debugging
'RSVD1': '-ring1=2500 -ring2=1800 -ring3=1700',
# regular
'RSVD2': '-InitAct=ActionSet2::Idle;StaticAlert',
# agent hp
'AgentHp':50,
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
            # initial facing direction, etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
        })
return agent_property
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapLargeScaleConf.py
================================================
class SubTaskConfig():
agent_list = [
{ 'team':0, 'tid':0, 'uid':0, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':1, 'uid':1, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':2, 'uid':2, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':3, 'uid':3, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':4, 'uid':4, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':5, 'uid':5, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':6, 'uid':6, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':7, 'uid':7, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':8, 'uid':8, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':0, 'tid':9, 'uid':9, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':0, 'uid':10, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':1, 'uid':11, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':2, 'uid':12, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':3, 'uid':13, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':4, 'uid':14, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':5, 'uid':15, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':6, 'uid':16, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':7, 'uid':17, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':8, 'uid':18, 'n_team_agent':10, 'type':'RLA_CAR_Laser', 'init_fn_name':'init_ground', },
{ 'team':1, 'tid':9, 'uid':19, 'n_team_agent':10, 'type':'RLA_CAR', 'init_fn_name':'init_ground', },
]
obs_vec_length = 23
obs_n_entity = 11
ActionFormat = 'Multi-Digit'
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/SubTasks/UhmapPreyPredator.py
================================================
import json, copy, re, os, inspect
import numpy as np
from UTIL.tensor_ops import my_view, repeat_at
from ...common.base_env import RawObsArray
from ..actionset_v3 import digitsToStrAction
from ..agent import Agent
from ..uhmap_env_wrapper import UhmapEnv, ScenarioConfig
from .UhmapPreyPredatorConf import SubTaskConfig
from .SubtaskCommonFn import UhmapCommonFn
from .cython_func import tear_num_arr
class UhmapPreyPredator(UhmapCommonFn, UhmapEnv):
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def init_ground(self, agent_info, pos_ro):
N_COL = 4
agent_class = agent_info['type']
team = agent_info['team']
n_team_agent = 50
tid = agent_info['tid']
uid = agent_info['uid']
x = 0 + 800*(tid - n_team_agent//2) //N_COL
y = (400* (tid%N_COL) + 2000) * (-1)**(team+1)
x,y = np.matmul(np.array([x,y]), np.array([[np.cos(pos_ro), -np.sin(pos_ro)], [np.sin(pos_ro), np.cos(pos_ro)] ]))
z = 500 # 500 is slightly above the ground
yaw = 90 if team==0 else -90
assert np.abs(x) < 15000.0 and np.abs(y) < 15000.0
agent_property = copy.deepcopy(SubTaskConfig.AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 720 if agent_class == 'RLA_CAR_Laser' else 600,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, },
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# debugging
'RSVD1': '-Ring1=2000 -Ring2=1400 -Ring3=750',
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': yaw, },
})
return agent_property
def extract_key_gameobj(self, resp):
"""
Get non-agent simulation objects, such as important landmarks.
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
Reward design is defined here.
(UE-side programming is rigid; although related fields are reserved there,
please do not define rewards on the UE side.)
Suggestion: define reward-triggering events on the UE side, e.g. agent death
or tactical goal completion; see parse_event.
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
# if event_parsed['Event'] == 'Destroyed':
# team = self.find_agent_by_uid(event_parsed['UID']).team
# reward[team] -= 0.05 # this team
# reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
PredatorWin = False
PredatorRank = -1
PredatorReward = 0
PreyWin = False
PreyRank = -1
PreyReward = 0
EndReason = event_parsed['EndReason']
# According to MISSION\uhmap\SubTasks\UhmapPreyPredatorConf.py, team 0 is prey team, team 1 is predator team
if EndReason == "AllPreyCaught" or EndReason == "Team_0_AllDead":
PredatorWin = True; PredatorRank = 0; PredatorReward = 1
PreyWin = False; PreyRank = 1; PreyReward = -1
elif EndReason == "TimeMaxCntReached" or EndReason == "Team_1_AllDead":
PredatorWin = False; PredatorRank = 1; PredatorReward = -1
PreyWin = True; PreyRank = 0; PreyReward = 1
else:
print('unexpected end reason:', EndReason)
WinningResult = {"team_ranking": [PreyRank, PredatorRank], "end_reason": EndReason}
reward = [PreyReward, PredatorReward]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
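# Example (for illustration): get_binary_array encodes integers bitwise,
# least-significant bit first, e.g.
# get_binary_array(np.array([5]), n_bits=4) -> array([[1., 0., 1., 0.]]) since 5 = 0b0101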
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS: a2h_feature_sort = np.concatenate((a2h_feature_sort, np.full((MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM), np.nan)), axis=0) # pad to MAX_NUM_OPP_OBS with nan rows
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS: self_ally_feature_sort = np.concatenate((self_ally_feature_sort, np.full((MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM), np.nan)), axis=0) # pad to MAX_NUM_ALL_OBS with nan rows
OBS_ALL_AGENTS[i] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
return OBS_ALL_AGENTS
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def reset(self):
"""
Reset function, it delivers reset command to unreal engine to spawn all agents
Environment reset: executed once at the start of each episode; all agents are initialized here.
"""
super().reset()
self.t = 0
pos_ro = np.random.rand()*2*np.pi
# spawn agents
AgentSettingArray = []
# count the number of agent in each team
n_team_agent = {}
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
if team not in n_team_agent: n_team_agent[team] = 0
SubTaskConfig.agent_list[i]['uid'] = i
SubTaskConfig.agent_list[i]['tid'] = n_team_agent[team]
n_team_agent[team] += 1
# push agent init info one by one
for i, agent_info in enumerate(SubTaskConfig.agent_list):
team = agent_info['team']
agent_info['n_team_agent'] = n_team_agent[team]
init_fn = getattr(self, agent_info['init_fn_name'])
AgentSettingArray.append(init_fn(agent_info, pos_ro))
self.agents = [Agent(team=a['team'], team_id=a['tid'], uid=a['uid']) for a in SubTaskConfig.agent_list]
# refer to struct.cpp, FParsedDataInput
resp = self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'reset',
'NumAgents' : len(SubTaskConfig.agent_list),
'AgentSettingArray': AgentSettingArray, # refer to struct.cpp, FAgentProperty
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
resp = json.loads(resp)
# make sure the map (level in UE) is correct
# assert resp['dataGlobal']['levelName'] == 'UhmapLargeScale'
assert len(resp['dataArr']) == len(AgentSettingArray)
return self.parse_response_ob_info(resp)
def step(self, act):
"""
The step function; act contains the decisions of all agents.
"""
assert len(act) == self.n_agents
# translate actions to the format recognized by unreal engine
if ScenarioConfig.ActionFormat == 'Single-Digit':
act_send = [digit2act_dictionary[a] for a in act]
elif ScenarioConfig.ActionFormat == 'Multi-Digit':
act_send = [decode_action_as_string(a) for a in act]
elif ScenarioConfig.ActionFormat == 'ASCII':
act_send = [digitsToStrAction(a) for a in act]
else:
raise ValueError("ActionFormat is wrong!")
# simulation engine IO
resp = json.loads(self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'step',
'TimeStep': self.t,
'Actions': None,
'StringActions': act_send,
})))
# get obs for RL, info for script AI
ob, info = self.parse_response_ob_info(resp)
# generate reward, get the episode ending information
RewardForAllTeams, WinningResult = self.gen_reward_and_win(resp)
if WinningResult is not None:
info.update(WinningResult)
assert resp['dataGlobal']['episodeDone']
done = True
else:
done = False
if resp['dataGlobal']['timeCnt'] >= ScenarioConfig.MaxEpisodeStep:
assert done
return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
def parse_event(self, event):
"""
Parse key events returned by the environment,
e.g. an agent being destroyed, a team winning, etc.
Key events must be defined on the UE side.
This design greatly simplifies the reward design workflow on the Python side,
and reduces the Python-side computation load.
"""
if not hasattr(self, 'pattern'): self.pattern = re.compile(r'<([^<>]*)>([^<>]*)')
return {k:v for k,v in re.findall(self.pattern, event)}
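# Illustration (hypothetical event string; the field names follow the EndReason
# handling used elsewhere in this file):
# parse_event('<Event>EndEpisode<EndReason>TimeMaxCntReached')
# -> {'Event': 'EndEpisode', 'EndReason': 'TimeMaxCntReached'}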
def extract_key_gameobj(self, resp):
"""
Get non-agent simulation objects, such as important landmarks.
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
Reward design is defined here.
(UE-side programming is rigid; although related fields are reserved there,
please do not define rewards on the UE side.)
Suggestion: define reward-triggering events on the UE side, e.g. agent death
or tactical goal completion; see parse_event.
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
if event_parsed['Event'] == 'EndEpisode':
PreyRank = -1
PreyReward = 0
EndReason = event_parsed['EndReason']
WinningResult = {"team_ranking": [0], "end_reason": EndReason}
reward = [0]
return reward, WinningResult
def step_skip(self):
"""
Skip one decision step (an unused helper function).
"""
return self.client.send_and_wait_reply(json.dumps({
'valid': True,
'DataCmd': 'skip_frame',
}))
def find_agent_by_uid(self, uid):
"""
Look up an agent by uid (with a cache for acceleration).
"""
if not hasattr(self, 'uid_to_agent_dict'):
self.uid_to_agent_dict = {}
self.uid_to_agent_dict.update({agent.uid:agent for agent in self.agents})
if isinstance(uid, str):
self.uid_to_agent_dict.update({str(agent.uid):agent for agent in self.agents})
return self.uid_to_agent_dict[uid]
def parse_response_ob_info(self, resp):
"""
Roughly parse agent observations, e.g. replace dead agents' positions with inf (infinitely far),
and convert each agent's agentLocation from dict form into a more compact (x, y, z) tuple.
"""
assert resp['valid']
resp['dataGlobal']['distanceMat'] = np.array(resp['dataGlobal']['distanceMat']['flat_arr']).reshape(self.n_agents,self.n_agents)
if len(resp['dataGlobal']['events'])>0:
tmp = [kv.split('>') for kv in resp['dataGlobal']['events'][0].split('<') if kv]
info_parse = {t[0]:t[1] for t in tmp}
info_dict = resp
for info in info_dict['dataArr']:
alive = info['agentAlive']
if alive:
agentLocation = info.pop('agentLocation')
agentRotation = info.pop('agentRotation')
agentVelocity = info.pop('agentVelocity')
agentScale = info.pop('agentScale')
info['agentLocationArr'] = (agentLocation['x'], agentLocation['y'], agentLocation['z'])
info['agentVelocityArr'] = (agentVelocity['x'], agentVelocity['y'], agentVelocity['z'])
info['agentRotationArr'] = (agentRotation['yaw'], agentRotation['pitch'], agentRotation['roll'])
info['agentScaleArr'] = (agentScale['x'], agentScale['y'], agentScale['z'])
info.pop('previousAction')
info.pop('availActions')
# info.pop('rSVD1')
info.pop('interaction')
else:
inf = float('inf')
info['agentLocationArr'] = (inf, inf, inf)
info['agentVelocityArr'] = (inf, inf, inf)
info['agentRotationArr'] = (inf, inf, inf)
info = resp['dataArr']
for i, agent_info in enumerate(info):
self.agents[i].update_agent_attrs(agent_info)
self.key_obj = self.extract_key_gameobj(resp)
# return ob, info
return self.make_obs(resp), info_dict
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS: a2h_feature_sort = np.concatenate((a2h_feature_sort, np.full((MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM), np.nan)), axis=0) # pad to MAX_NUM_OPP_OBS with nan rows
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS: self_ally_feature_sort = np.concatenate((self_ally_feature_sort, np.full((MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM), np.nan)), axis=0) # pad to MAX_NUM_ALL_OBS with nan rows
OBS_ALL_AGENTS[i] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
return OBS_ALL_AGENTS
def __init__(self, rank) -> None:
super().__init__(rank)
self.observation_space = self.make_obs(get_shape=True)
self.SubTaskConfig = SubTaskConfig
inspect.getfile(SubTaskConfig)
assert os.path.basename(inspect.getfile(SubTaskConfig)) == type(self).__name__+'Conf.py', \
('make sure you have imported the correct SubTaskConfig class')
def init_ship(self, agent_info, pos_ro):
agent_class = agent_info['type']
team = agent_info['team']
tid = agent_info['tid'] # tid is the agent's index within its team
uid = agent_info['uid'] # uid is the agent's unique ID in the simulation
x = -2000
y = (tid * 1000) # tid is the agent's index within its team
z = 500
agent_property = copy.deepcopy(SubTaskConfig.AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 500,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, },
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# custom args
'RSVD1': '',
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': 0, },
})
return agent_property
def init_waterdrop(self, agent_info, pos_ro):
agent_class = agent_info['type']
team = agent_info['team']
tid = agent_info['tid']
uid = agent_info['uid']
x = +2000
y = (tid * 200)
z = 500
agent_property = copy.deepcopy(SubTaskConfig.AgentPropertyDefaults)
agent_property.update({
'DebugAgent': False,
# max drive/fly speed
'MaxMoveSpeed': 1000,
# also influences object mass, please change it with caution!
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, },
# team belonging
'AgentTeam': team,
# choose ue class to init
'ClassName': agent_class,
# custom args
'RSVD1': '-MyCustomArg1=abc -MyCustomArg2=12345',
# the rank of agent inside the team
'IndexInTeam': tid,
# the unique identity of this agent in simulation system
'UID': uid,
# show color
'Color':'(R=0,G=1,B=0,A=1)' if team==0 else '(R=0,G=0,B=1,A=1)',
# initial location
'InitLocation': { 'x': x, 'y': y, 'z': z, },
# initial facing direction etc.
'InitRotator': { 'pitch': 0, 'roll': 0, 'yaw': 0, },
})
return agent_property
def extract_key_gameobj(self, resp):
"""
Get non-agent simulation objects, such as important landmarks.
"""
keyObjArr = resp['dataGlobal']['keyObjArr']
return keyObjArr
def gen_reward_and_win(self, resp):
"""
Reward design is defined here.
(UE-side programming is rigid; although related fields are reserved there,
please do not define rewards on the UE side.)
Suggestion: define reward-triggering events on the UE side, e.g. agent death
or tactical goal completion; see parse_event.
"""
reward = [0]*self.n_teams
events = resp['dataGlobal']['events']
WinningResult = None
for event in events:
event_parsed = self.parse_event(event)
# if event_parsed['Event'] == 'Destroyed':
# team = self.find_agent_by_uid(event_parsed['UID']).team
# reward[team] -= 0.05 # this team
# reward[1-team] += 0.10 # opp team
if event_parsed['Event'] == 'EndEpisode':
# print([a.alive * a.hp for a in self.agents])
WaterdropWin = False
WaterdropRank = -1
WaterdropReward = 0
ShipWin = False
ShipRank = -1
ShipReward = 0
EndReason = event_parsed['EndReason']
# According to MISSION\uhmap\SubTasks\UhmapWaterdropConf.py, team 0 is Ship team, team 1 is Waterdrop team
if EndReason == "ShipNumLessThanTheshold" or EndReason == "Team_0_AllDead":
WaterdropWin = True; WaterdropRank = 0; WaterdropReward = 1
ShipWin = False; ShipRank = 1; ShipReward = -1
elif EndReason == "TimeMaxCntReached" or EndReason == "Team_1_AllDead":
WaterdropWin = False; WaterdropRank = 1; WaterdropReward = -1
ShipWin = True; ShipRank = 0; ShipReward = 1
else:
print('unexpected end reason:', EndReason)
WinningResult = {"team_ranking": [ShipRank, WaterdropRank], "end_reason": EndReason}
reward = [ShipReward, WaterdropReward]
# print(reward)
return reward, WinningResult
@staticmethod
def item_random_mv(src,dst,prob,rand=False):
assert len(src.shape)==1; assert len(dst.shape)==1
if rand: np.random.shuffle(src)
len_src = len(src)
n_mv = (np.random.rand(len_src) < prob).sum()
item_mv = src[range(len_src-n_mv,len_src)]
src = src[range(0,0+len_src-n_mv)]
dst = np.concatenate((item_mv, dst))
return src, dst
@staticmethod
def get_binary_array(n_int, n_bits=8, dtype=np.float32):
arr = np.zeros((*n_int.shape, n_bits), dtype=dtype)
for i in range(n_bits):
arr[:, i] = (n_int%2==1).astype(int)
n_int = n_int / 2
n_int = n_int.astype(np.int8)
return arr
def make_obs(self, resp=None, get_shape=False):
# CORE_DIM = 38
CORE_DIM = 23
assert ScenarioConfig.obs_vec_length == CORE_DIM
if get_shape:
return CORE_DIM
# temporary parameters
OBS_RANGE_PYTHON_SIDE = 15000
MAX_NUM_OPP_OBS = 5
MAX_NUM_ALL_OBS = 5
# get and calculate distance array
pos3d_arr = np.zeros(shape=(self.n_agents, 3), dtype=np.float32)
for i, agent in enumerate(self.agents): pos3d_arr[i] = agent.pos3d
# use the distance matrix calculated by unreal engine to accelerate
# dis_mat = distance_matrix(pos3d_arr) # dis_mat is a matrix, shape = (n_agent, n_agent)
dis_mat = resp['dataGlobal']['distanceMat']
alive_all = np.array([agent.alive for agent in self.agents])
try:
dis_mat[~alive_all,:] = +np.inf
dis_mat[:,~alive_all] = +np.inf
except:
pass
# get team list
team_belonging = np.array([agent.team for agent in self.agents])
# gather the obs arr of all known agents
obs_arr = RawObsArray(key='Agent')
if not hasattr(self, "uid_binary"):
self.uid_binary = self.get_binary_array(np.arange(self.n_agents), 10)
for i, agent in enumerate(self.agents):
assert agent.location is not None
assert agent.uid == i
obs_arr.append(
self.uid_binary[i] # 0~9
)
obs_arr.append([
agent.index, # 10
agent.team, # 11
agent.alive, # 12
agent.uid_remote, # 13
])
obs_arr.append( #[14,15,16,17,18,19]
agent.pos3d
# tear_num_arr(agent.pos3d, n_digits=6, base=10, mv_left=0)
# tear_num_arr(agent.pos3d, 6, ScenarioConfig.ObsBreakBase, 0) # 3 -- > 3*6 = 18 , 18-3=15, 23+15 = 38
)
obs_arr.append(
agent.vel3d
)
obs_arr.append([
agent.hp,
agent.yaw,
agent.max_speed,
])
obs_ = obs_arr.get()
new_obs = my_view(obs_, [self.n_agents, -1])
assert CORE_DIM == new_obs.shape[-1]
OBS_ALL_AGENTS = np.zeros(shape=(self.n_agents, MAX_NUM_OPP_OBS+MAX_NUM_ALL_OBS, CORE_DIM))
# now arranging the individual obs
for i, agent in enumerate(self.agents):
if not agent.alive:
OBS_ALL_AGENTS[i, :] = np.nan
continue
# if alive
# scope
dis2all = dis_mat[i, :]
is_ally = (team_belonging == agent.team)
# scope
a2h_dis = dis2all[~is_ally]
h_alive = alive_all[~is_ally]
h_feature = new_obs[~is_ally]
h_iden_sort = np.argsort(a2h_dis)[:MAX_NUM_OPP_OBS]
a2h_dis_sorted = a2h_dis[h_iden_sort]
h_alive_sorted = h_alive[h_iden_sort]
h_vis_mask = (a2h_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & h_alive_sorted
# scope
h_vis_index = h_iden_sort[h_vis_mask]
h_invis_index = h_iden_sort[~h_vis_mask]
h_vis_index, h_invis_index = self.item_random_mv(src=h_vis_index, dst=h_invis_index,prob=0, rand=True)
h_ind = np.concatenate((h_vis_index, h_invis_index))
h_msk = np.concatenate((h_vis_index<0, h_invis_index>=0)) # "<0" project to False; ">=0" project to True
a2h_feature_sort = h_feature[h_ind]
a2h_feature_sort[h_msk] = 0
if len(a2h_feature_sort) < MAX_NUM_OPP_OBS: a2h_feature_sort = np.concatenate((a2h_feature_sort, np.full((MAX_NUM_OPP_OBS-len(a2h_feature_sort), CORE_DIM), np.nan)), axis=0) # pad to MAX_NUM_OPP_OBS with nan rows
a2f_dis = dis2all[is_ally]
f_alive = alive_all[is_ally]
f_feature = new_obs[is_ally]
f_iden_sort = np.argsort(a2f_dis)[:MAX_NUM_ALL_OBS]
a2f_dis_sorted = a2f_dis[f_iden_sort]
f_alive_sorted = f_alive[f_iden_sort]
f_vis_mask = (a2f_dis_sorted <= OBS_RANGE_PYTHON_SIDE) & f_alive_sorted
# scope
f_vis_index = f_iden_sort[f_vis_mask]
self_vis_index = f_vis_index[:1] # separate self and ally
f_vis_index = f_vis_index[1:] # separate self and ally
f_invis_index = f_iden_sort[~f_vis_mask]
f_vis_index, f_invis_index = self.item_random_mv(src=f_vis_index, dst=f_invis_index,prob=0, rand=True)
f_ind = np.concatenate((self_vis_index, f_vis_index, f_invis_index))
f_msk = np.concatenate((self_vis_index<0, f_vis_index<0, f_invis_index>=0)) # "<0" project to False; ">=0" project to True
self_ally_feature_sort = f_feature[f_ind]
self_ally_feature_sort[f_msk] = 0
if len(self_ally_feature_sort) < MAX_NUM_ALL_OBS: self_ally_feature_sort = np.concatenate((self_ally_feature_sort, np.full((MAX_NUM_ALL_OBS-len(self_ally_feature_sort), CORE_DIM), np.nan)), axis=0) # pad to MAX_NUM_ALL_OBS with nan rows
OBS_ALL_AGENTS[i] = np.concatenate((self_ally_feature_sort, a2h_feature_sort), axis=0)
return OBS_ALL_AGENTS
'''
test <1>
parts = tear_number_apart(255, n_digit=10, base=2, mv_left=1)
print(parts)
comb_num_back(parts, n_digit=10, base=2, mv_left=1)
test <2>
parts = tear_number_apart(255.778, n_digit=10, base=10, mv_left=-1)
print(parts)
comb_num_back(parts, n_digit=10, base=10, mv_left=-1)
test <3>
for i in range(1000):
q = (np.random.rand() - 0.5)*1e3
parts = tear_number_apart(q, n_digit=10, base=10, mv_left=0)
print(q, parts)
res = np.abs(comb_num_back(parts, n_digit=10, base=10, mv_left=0)-q) < 1e-6
if not res:
print('??? np.abs(comb_num_back(parts, n_digit=10, base=10, mv_left=0)-q)', np.abs(comb_num_back(parts, n_digit=10, base=10, mv_left=0)-q))
assert False
'''
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/actionset.py
================================================
import numpy as np
ActDigitLen = 100
def strActionToDigits(act_string):
t = [ord(c) for c in act_string]
d_len = len(t)
assert d_len <= ActDigitLen, ("Action string is too long! Don't be wordy, or increase ActDigitLen above.")
pad = [-1 for _ in range(ActDigitLen-d_len)]
return (t+pad)
def digitsToStrAction(digits):
if all([a==0 for a in digits]): return 'ActionSet3::N/A;N/A'
arr = [chr(d) for d in digits.astype(int) if d >= 0]
return ''.join(arr)
"""
'ActionSet3::ChangeHeight;100'
"""
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/actionset_v3.py
================================================
import numpy as np
ActDigitLen = 100
def strActionToDigits(act_string):
t = [ord(c) for c in act_string]
d_len = len(t)
assert d_len <= ActDigitLen, ("Action string is too long! Don't be wordy, or increase ActDigitLen above.")
pad = [-1 for _ in range(ActDigitLen-d_len)]
return (t+pad)
def digitsToStrAction(digits):
if all([a==0 for a in digits]): return 'ActionSet3::N/A;N/A'
arr = [chr(d) for d in digits.astype(int) if d >= 0]
return ''.join(arr)
"""
'ActionSet3::ChangeHeight;100'
"""
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/actset_lookup.py
================================================
import numpy as np
# # # # # # # # # # # # # # # # # # # # # # #
# # # # # Part 1, interface for RL # # # # #
# # # # # # # # # # # # # # # # # # # # # # #
dictionary_items = [
'ActionSet2::N/A;N/A', # 0
'ActionSet2::Idle;DynamicGuard' , # 1
'ActionSet2::Idle;StaticAlert' , # 2
'ActionSet2::Idle;AggressivePersue' , # 3
'ActionSet2::SpecificMoving;Dir+X' , # 4
'ActionSet2::SpecificMoving;Dir+Y' , # 5
'ActionSet2::SpecificMoving;Dir-X' , # 6
'ActionSet2::SpecificMoving;Dir-Y' , # 7
'ActionSet2::SpecificAttacking;T1-0', # 8
'ActionSet2::SpecificAttacking;T1-1', # 9
'ActionSet2::SpecificAttacking;T1-2', # 10
'ActionSet2::SpecificAttacking;T1-3', # 11
'ActionSet2::SpecificAttacking;T1-4', # 12
'ActionSet2::SpecificAttacking;T0-0', # 13
'ActionSet2::SpecificAttacking;T0-1', # 14
'ActionSet2::SpecificAttacking;T0-2', # 15
'ActionSet2::SpecificAttacking;T0-3', # 16
'ActionSet2::SpecificAttacking;T0-4', # 17
'ActionSet2::PatrolMoving;Dir+X' ,
'ActionSet2::PatrolMoving;Dir+Y' ,
'ActionSet2::PatrolMoving;Dir-X' ,
'ActionSet2::PatrolMoving;Dir-Y' ,
'ActionSet2::Idle;AsFarAsPossible',
'ActionSet2::Idle;StayWhenTargetInRange',
'ActionSet2::Idle;StayWhenTargetInHalfRange' ,
]
dictionary_n_actions = len(dictionary_items)
digit2act_dictionary = {
i: dictionary_items[i] for i, item in enumerate(dictionary_items)
}
act2digit_dictionary = {
dictionary_items[i]:i for i, item in enumerate(dictionary_items)
}
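# For example: digit2act_dictionary[4] -> 'ActionSet2::SpecificMoving;Dir+X'
# and act2digit_dictionary['ActionSet2::SpecificMoving;Dir+X'] -> 4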
# # # # # # # # # # # # # # # # # # # # # # #
# # # # # Part 2, translate actions # # # # #
# # # # # # # # # # # # # # # # # # # # # # #
agent_json2local_attrs = [
# json key -----> agent key
('agentAlive', 'alive'),
('agentTeam', 'team'),
('indexInTeam', 'index'),
('uId', 'uid_remote'),
('maxMoveSpeed', 'max_speed'),
('agentLocationArr', 'location'),
('agentRotationArr', 'rotation'),
('agentScaleArr', 'scale3'),
('agentVelocityArr', 'velocity'),
('agentHp', 'hp'),
('weaponCD', 'weapon_cd'),
('type', 'type'),
]
# 'ActionSet2::Idle;AsFarAsPossible',
# 'ActionSet2::Idle;StayWhenTargetInRange',
# 'ActionSet2::Idle;StayWhenTargetInHalfRange' ,
def encode_action_as_digits(main_cmd, sub_cmd, x=None, y=None, z=None, UID=None, T=None, T_index=None):
main_cmd_encoder = {
"Idle" : 0,
"SpecificMoving" : 1,
"PatrolMoving" : 2,
"SpecificAttacking" : 3,
"N/A" : 4,
}
sub_cmd_encoder = {
"DynamicGuard" : 0 ,
"StaticAlert" : 1 ,
"AggressivePersue" : 2 ,
"SpecificAttacking" : 3 ,
"AsFarAsPossible" : 4 ,
"StayWhenTargetInRange" : 5 ,
"StayWhenTargetInHalfRange" : 6 ,
"N/A" : 7 ,
'Dir+X' : 8 ,
'Dir+X+Y' : 9 ,
'Dir+Y' : 10,
'Dir-X+Y' : 11,
'Dir-X' : 12,
'Dir-X-Y' : 13,
'Dir-Y' : 14,
'Dir+X-Y' : 15,
}
return np.array([
main_cmd_encoder[main_cmd],
sub_cmd_encoder[sub_cmd],
x if x is not None else np.inf,
y if y is not None else np.inf,
z if z is not None else np.inf,
UID if UID is not None else np.inf,
T if T is not None else np.inf,
T_index if T_index is not None else np.inf
])
def decode_action_as_string(digits):
main_cmd_decoder = {
0 :"Idle" ,
1 :"SpecificMoving" ,
2 :"PatrolMoving" ,
3 :"SpecificAttacking" ,
4 :"N/A" ,
}
sub_cmd_decoder = {
0 : "DynamicGuard" ,
1 : "StaticAlert" ,
2 : "AggressivePersue" ,
3 : "SpecificAttacking" ,
4 : "AsFarAsPossible" ,
5 : "StayWhenTargetInRange" ,
6 : "StayWhenTargetInHalfRange" ,
7 : "N/A" ,
8 : 'Dir+X' ,
9 : 'Dir+X+Y' ,
10 : 'Dir+Y' ,
11 : 'Dir-X+Y' ,
12 : 'Dir-X' ,
13 : 'Dir-X-Y' ,
14 : 'Dir-Y' ,
15 : 'Dir+X-Y' ,
}
main_cmd = main_cmd_decoder[digits[0]]
sub_cmd = sub_cmd_decoder[digits[1]]
x = digits[2] if np.isfinite(digits[2]) else None
y = digits[3] if np.isfinite(digits[3]) else None
z = digits[4] if np.isfinite(digits[4]) else None
UID = digits[5] if np.isfinite(digits[5]) else None
T = digits[6] if np.isfinite(digits[6]) else None
T_index = digits[7] if np.isfinite(digits[7]) else None
if main_cmd == "Idle":
res = 'ActionSet2::Idle;%s'%sub_cmd
assert res in dictionary_items, 'invalid command, cannot be parsed'
elif main_cmd == "SpecificMoving":
if sub_cmd == 'N/A':
res = 'ActionSet2::SpecificMoving;X=%f Y=%f Z=%f'%(x,y,z)
else:
res = 'ActionSet2::SpecificMoving;%s'%sub_cmd
elif main_cmd == "PatrolMoving":
if sub_cmd == 'N/A':
res = 'ActionSet2::PatrolMoving;X=%f Y=%f Z=%f'%(x,y,z)
else:
res = 'ActionSet2::PatrolMoving;%s'%sub_cmd
elif main_cmd == "SpecificAttacking":
# 'ActionSet2::SpecificAttacking;T1-3',
# 'ActionSet2::SpecificAttacking;UID-4',
assert sub_cmd == 'N/A', 'invalid command, cannot be parsed'
if UID is not None:
res = 'ActionSet2::SpecificAttacking;UID-%d'%UID
else:
res = 'ActionSet2::SpecificAttacking;T%d-%d'%(T,T_index)
elif main_cmd == "N/A":
res = 'ActionSet2::N/A;N/A'
else:
print('invalid command, cannot be parsed')
assert False
return res
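# A sketch of the encode/decode pair used together:
# decode_action_as_string(encode_action_as_digits("SpecificAttacking", "N/A", UID=4))
# -> 'ActionSet2::SpecificAttacking;UID-4'
# decode_action_as_string(encode_action_as_digits("PatrolMoving", "Dir+X"))
# -> 'ActionSet2::PatrolMoving;Dir+X'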
# # # # # # # # # # # # # # # # # # # # # # #
# # # # # Part 3, agent init defaults # # # #
# # # # # # # # # # # # # # # # # # # # # # #
AgentPropertyDefaults = {
'ClassName': 'RLA_CAR', # FString ClassName = "";
'DebugAgent': False,
'AgentTeam': 0, # int AgentTeam = 0;
'IndexInTeam': 0, # int IndexInTeam = 0;
'UID': 0, # int UID = 0;
'MaxMoveSpeed': 600, # move speed, test ok
'InitLocation': { 'x': 0, 'y': 0, 'z': 0, },
'InitRotation': { 'x': 0, 'y': 0, 'z': 0, },
'AgentScale' : { 'x': 1, 'y': 1, 'z': 1, }, # agent size, test ok
'InitVelocity': { 'x': 0, 'y': 0, 'z': 0, },
'AgentHp':100,
"WeaponCD": 1, # weapon fire rate
"IsTeamReward": True,
"Type": "",
"DodgeProb": 0.8, # probability of escaping dmg 闪避概率, test ok
"ExplodeDmg": 25, # ms explode dmg. test ok
"FireRange": 1000.0, # <= 1500
"GuardRange": 1400.0, # <= 1500
"PerceptionRange": 1500.0, # <= 1500
'Color':'(R=0,G=1,B=0,A=1)', # color
"FireRange": 1000,
'RSVD1':'',
'RSVD2':'',
}
# # # # # # # # # # # # # # # # # # # # # # #
# # # # # Part 3, framerate selection # # # #
# # # # # # # # # # # # # # # # # # # # # # #
# Check whether a number can be represented exactly by a 16-bit float
def binary_friendly(x):
y_f16 = np.array(x, dtype=np.float16)
y_f64 = np.array(x, dtype=np.float64)
t = y_f64 - y_f16
assert t.dtype == np.float64
return (t==0)
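# For example, binary_friendly(1/25.6) is True (1/25.6 == 5/128, exact in float16),
# while binary_friendly(1/30) is False; hence FrameRate is kept at multiples of 25.6.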
# '''
# T0-55Destroyed
# T1-616Destroyed
# T0-44Destroyed
# T0-22Destroyed
# T1-717Destroyed
# T1-111Destroyed
# T0-88Destroyed
# T0-77Destroyed
# T1-313Destroyed
# T0-99Destroyed
# T0-66Destroyed
# T0-00Destroyed
# T1-212Destroyed
# T0-11Destroyed
# T0-33Destroyed
# EndEpisodeLose1
# '''
################## ########################## ########################
################## ########################## ########################
################## ########################## ########################
################## single digit encode, not used ########################
# h_map_center = (-7290.0, 6010.0)
# h_grid_size = 400
# v_ground = 340
# v_grid_size = 1000
# x_arr = np.array([h_map_center[0]+v_grid_size*i for i in range(-20, 20)]) # 0~39, 40, 1
# y_arr = np.array([h_map_center[1]+v_grid_size*i for i in range(-20, 20)]) # 0~39, 40, 40
# z_arr = np.array([v_ground+v_grid_size*i for i in range(4)]) # 0~3, 4, 1600
# # offset # 0~1, 2, 6400
# # output $y \in [1000, 12800]$
# def _2digit(main_cmd, x, y, z):
# z_logit = np.argmin(np.abs(z - z_arr))
# x_logit = np.argmin(np.abs(x - x_arr))
# y_logit = np.argmin(np.abs(y - y_arr))
# if main_cmd=='SpecificMoving': cmd_logit = 0
# elif main_cmd=='PatrolMoving': cmd_logit = 1
# ls_mod = [1,40,1600,6400]
# offset = 1000
# x = np.array([x_logit, y_logit, z_logit, cmd_logit])
# print(x)
# y = np.dot(x, ls_mod)+offset
# return y
# def _2coordinate(x):
# offset = 1000
# ls_mod = [1,40,1600,6400]
# x = x - offset
# res = []
# for mod in reversed(ls_mod):
# tmp = x // mod
# x = x - tmp*mod
# res.append(tmp)
# res = list(reversed(res))
# x_logit, y_logit, z_logit, cmd_logit = res
# if cmd_logit == 0 : main_cmd ='SpecificMoving'
# elif cmd_logit == 1 : main_cmd ='PatrolMoving'
# x = x_arr[x_logit]
# y = y_arr[y_logit]
# z = z_arr[z_logit]
# print(main_cmd, x, y, z)
# return main_cmd, x, y, z
################## ########################## ########################
################## ########################## ########################
################## ########################## ########################
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/agent.py
================================================
import numpy as np
from .actset_lookup import agent_json2local_attrs
class Agent(object):
def __init__(self, team, team_id, uid) -> None:
self.team = team
self.team_id = team_id
self.uid = uid
self.attrs = agent_json2local_attrs
for attr_json, attr_agent in self.attrs: setattr(self, attr_agent, None)
self.pos3d = np.array([np.nan, np.nan, np.nan])
self.pos2d = np.array([np.nan, np.nan])
def update_agent_attrs(self, dictionary):
if (not dictionary['agentAlive']):
self.alive = False
else:
assert dictionary['valid']
for attr_json, attr_agent in self.attrs:
setattr(self, attr_agent, dictionary[attr_json])
assert self.uid == self.uid_remote
self.pos3d = np.array(self.location)
self.pos2d = self.pos3d[:2]
self.vel3d = np.array(self.velocity)
self.vel2d = self.vel3d[:2]
self.scale3d = np.array(self.scale3)
self.scale = self.scale3[0]
self.yaw = self.rotation[0]
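# Sketch: after update_agent_attrs() on a live agent, the derived fields include
# agent.pos3d == np.array([x, y, z]), agent.vel3d (likewise), and agent.yaw == rotation[0].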
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/auto_download.py
================================================
import os, commentjson, shutil, subprocess, tqdm, distutils
from onedrivedownloader import download
try: os.makedirs('./TEMP')
except: pass
def download_from_shared_server(key = 'cat'):
# download the uhmap file manifest
print('downloading uhmap file manifest')
manifest_url = "https://ageasga-my.sharepoint.com/:u:/g/personal/fuqingxu_yiteam_tech/EVmCQMSUWV5MgREWaxiz_GoBalBRV3DWBU3ToSJ5OTQaLQ?e=I8yjl9"
try:
file = download(manifest_url, filename="./TEMP/", force_download=True)
except:
print('failed to connect to onedrive; you may need a proxy to download the resources')
with open("./TEMP/uhmap_manifest.jsonc", "r") as f:
manifest = commentjson.load(f)
if key not in manifest:
print('The version you are looking for does not exist!')
uhmap_url = manifest[key]
print('downloading main files')
try:
file = download(uhmap_url, filename="./TEMP/DOWNLOAD", unzip=True, unzip_path='./TEMP/UNZIP')
except:
print(f'download timeout; you may need a proxy to download the resources. To download manually: {uhmap_url}')
return file
def download_client_binary_on_platform(desired_path, desired_version, is_render_client, platform):
key = f"Uhmap_{platform}_Build_Version{desired_version}"
print('downloading', key)
download_from_shared_server(key = key)
print('download and extract complete, moving files')
from distutils import dir_util
target_dir = os.path.abspath(os.path.join(os.path.dirname(desired_path), '..'))
dir_util.copy_tree('./TEMP/UNZIP', target_dir)
assert os.path.exists(desired_path), "unexpected path error! Are you using Linux style path on Windows?"
return
def download_client_binary(desired_path, desired_version, is_render_client):
import platform
plat = "Windows"
if platform.system()=="Linux": plat = "Linux"
download_client_binary_on_platform(desired_path, desired_version, is_render_client, platform=plat)
return
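# Hypothetical usage sketch (the path and version below are illustrative only):
# download_client_binary(desired_path='./../../WindowsServer/UHMPServer.exe',
#                        desired_version='2.3', is_render_client=False)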
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/struct.cpp
================================================
#pragma once
#include "CoreMinimal.h"
#include "Containers/UnrealString.h"
#include "XtensorAPIBPLibrary.h"
#include "DataStruct.generated.h"
USTRUCT(BlueprintType)
struct FAgentProperty
{
GENERATED_BODY()
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString ClassName = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int AgentTeam = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int IndexInTeam = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int UID = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool DebugAgent = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float MaxMoveSpeed = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector InitLocation;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector InitRotation;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FRotator InitRotator;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector AgentScale;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector InitVelocity;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float AgentHp;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float WeaponCD = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool IsTeamReward = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString Type = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString WeaponType = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString Color = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float DodgeProb = 0.0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float ExplodeDmg = 20.0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float FireRange = 1000.0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float GuardRange = 1400.0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float PerceptionRange = 1400.0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString RSVD1 = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString RSVD2 = "";
};
USTRUCT(BlueprintType)
struct FParsedDataInput
{
// please change lines in
// bool AHMPLevelScriptActor::ParsedTcpInData()
// together with this struct
GENERATED_BODY()
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool valid = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString DataCmd;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int NumAgents = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<FAgentProperty> AgentSettingArray;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int TimeStep = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int TimeStepMax = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<int> Actions;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<FString> StringActions;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString RSVD1 = "";
};
USTRUCT(BlueprintType)
struct FAgentDataOutput
{
GENERATED_BODY()
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool Valid = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool AgentAlive = true;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int AgentTeam = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int IndexInTeam = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int UID = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float MaxMoveSpeed = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector AgentLocation;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FRotator AgentRotation;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector AgentScale;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector AgentVelocity;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float AgentHp;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float WeaponCD = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int PreviousAction;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<int> AvailActions;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float Reward;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool IsTeamReward = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<FString> Interaction;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString Type = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString RSVD1 = "";
};
USTRUCT(BlueprintType)
struct FKeyObjDataOutput
{
GENERATED_BODY()
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool Valid = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int UID;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString ClassName;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector Location;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FRotator Rotation;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector Scale;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FVector Velocity;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float Hp;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString RSVD1 = "";
};
USTRUCT(BlueprintType)
struct FGlobalDataOutput
{
GENERATED_BODY()
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool Valid = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float TeamReward = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool UseTeamReward = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<FString> Events;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<bool> VisibleMatFlatten;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<float> DisMatFlatten;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float MaxEpisodeStep = 999;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int TimeCnt = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
float Time = 0;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool EpisodeDone = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString EpisodeEndReason = "unknown";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
int TeamWin = -1;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<FKeyObjDataOutput> KeyObjArr;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString LevelName = "";
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FXTensor DistanceMat;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FString RSVD1 = "";
};
USTRUCT(BlueprintType)
struct FAgentDataOutputArr
{
GENERATED_BODY()
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
bool Valid = false;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
TArray<FAgentDataOutput> DataArr;
UPROPERTY(EditDefaultsOnly, BlueprintReadWrite)
FGlobalDataOutput DataGlobal;
};
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/uhmap.md
================================================
# Unreal HMAP (UHMAP): Unreal Simulation Module of the Hybrid Multi-agent Platform
## Modifications to the Unreal Engine source code in UHMAP
- (1) Expose the lz4 interface for easier use
```
F:\UnrealSourceCode\UnrealEngine-4.27.2-release\Engine\Source\Runtime\Core\Public\Compression\lz4.h
add one line:
#define LZ4_DLL_EXPORT 1
```
- (2) Increase the computation budget of AIPerception Sight
```
F:\UnrealSourceCode\UnrealEngine-4.27.2-release\Engine\Source\Runtime\AIModule\Private\Perception\AISense_Sight.cpp
```
Increase these two parameters; larger values help detect agents entering the perception range sooner (the stock engine sacrifices real-time responsiveness for performance by capping trace counts and per-tick query time):
```
static const int32 DefaultMaxTracesPerTick = 16;
static const int32 DefaultMinQueriesPerTimeSliceCheck = 40;
```
## Switching MISSION to UHMAP in the JSON Config
Please use the following template:
```jsonc
{
// config HMP core
"config.py->GlobalConfig": {
"note": "uhmp-dev",
"env_name": "uhmap", // ***
"env_path": "MISSION.uhmap", // ***
"draw_mode": "Img",
"num_threads": "1",
// "heartbeat_on": "False",
"report_reward_interval": "1",
"test_interval": "128",
"test_epoch": "4",
"device": "cuda",
"max_n_episode": 500000,
"fold": "1",
"backup_files": [
]
},
// config MISSION
"MISSION.uhmap.uhmap_env_wrapper.py->ScenarioConfig": { // ***
"N_AGENT_EACH_TEAM": [3, 2], // update N_AGENT_EACH_TEAM
"MaxEpisodeStep": 30,
"n_actions": 10,
"StateProvided": false,
"render": false,
"SubTaskSelection": "UhmapBreakingBad",
"UhmapPort": 21051,
// "UhmapServerExe": "",
"UhmapRenderExe": "./../../WindowsNoEditor/UHMP.exe",
"UhmapServerExe": "./../../WindowsServer/UHMPServer.exe",
"TimeDilation": 1.25, // 时间膨胀系数
"TEAM_NAMES": [
"ALGORITHM.script_ai.dummy_uhmap->DummyAlgorithmT1", // *** select ALGORITHMs
"ALGORITHM.script_ai.dummy_uhmap->DummyAlgorithmT2" // *** select ALGORITHMs
]
},
// config ALGORITHMs
"ALGORITHM.script_ai.dummy_uhmap.py->DummyAlgConfig": {
"reserve": ""
}
}
```
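Assuming the template above is saved as, e.g., `uhmap.jsonc` (the filename is illustrative), it can be passed to the HMP entry point like this:
```
python main.py --cfg uhmap.jsonc
```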
## Important Configuration Parameters
path: the JSON config file
| Field | Value | Explanation |
| ---- | ---- | ---- |
| device | ```str``` | select GPU or CPU |
| N_AGENT_EACH_TEAM | ```list of int``` | number of agents in each team |
| MaxEpisodeStep | ```int``` | time step limit per episode |
| n_actions | ```int``` | reserved for reinforcement learning |
| render | ```bool``` | whether to use the render server |
| UhmapPort | ```int``` | port selection (temporary; will become automatic later) |
| TimeDilation | ```float``` | time dilation: decrease it for slow motion, increase it to make the CPU burn |
| TEAM_NAMES | ```str``` | selects the policies of team 1 and team 2 respectively |
## Unreal Agent Initializing Options
path:```MISSION\uhmap\SubTasks\UhmapBreakingBad.py```
function:```reset```
| Field | Value | Explanation |
| ---- | ---- | ---- |
| ClassName | ```str``` | select the agent class on the Unreal Engine side |
| AgentTeam | ```int``` | team belonging of an agent |
| IndexInTeam | ```int``` | index of an agent within its team |
| UID | ```int``` | unique index of an agent in the simulation |
| MaxMoveSpeed | ```float``` | not hooked up yet, no effect |
| AgentHp | ```int``` | initial health points |
| WeaponCD | ```float``` | weapon cooldown time, in seconds |
| RSVD1 | ```str``` | agent display color |
| InitLocation | ```dict``` | initial agent location |
## Unit Conversion
The length unit of the system is 1 mm,
e.g. 800 = 800 mm = 0.8 m
## Algorithm for Demonstration
path:```ALGORITHM\script_ai\dummy_uhmap.py```
function:```interact_with_env(override)```
### Argument:
| Field | Value | Explanation |
| ---- | ---- | ---- |
| ```State_Recall['Latest-Obs']``` | | observation array for reinforcement learning |
| ```State_Recall['ENV-PAUSE']``` | | shows which thread is paused (refer to [TimeLine](./../../VISUALIZE/md_imgs/timeline.jpg)) |
| ```State_Recall['Current-Obs-Step']``` | | shows the time step index within an episode |
| ```State_Recall['Latest-Team-Info']``` | | interface with script-based AIs, including structured agent location, uid, etc. |
| ```State_Recall['Test-Flag']``` | | shows whether HMP central has recommended a test run for RL |
| ```State_Recall['Env-Suffered-Reset']``` | | shows whether a thread has been reset and started a new episode |
### Convert Command Format:
#### Attack an agent by UID
```python
encode_action_as_digits("SpecificAttacking", "N/A", x=None, y=None, z=None, UID=4, T=None, T_index=None)
```
#### PatrolMoving with coordinate
```python
encode_action_as_digits("PatrolMoving", "N/A", x=444*5, y=444*5, z=379, UID=None, T=None, T_index=None)
```
#### PatrolMoving with direction
```python
encode_action_as_digits("PatrolMoving", "Dir+X+Y", x=None, y=None, z=None, UID=None, T=None, T_index=None)
encode_action_as_digits("PatrolMoving", "Dir+X-Y", x=None, y=None, z=None, UID=None, T=None, T_index=None)
encode_action_as_digits("PatrolMoving", "Dir+X", x=None, y=None, z=None, UID=None, T=None, T_index=None)
```
#### SpecificMoving with coordinate
```python
encode_action_as_digits("SpecificMoving", "N/A", x=444*5, y=444*5, z=379, UID=None, T=None, T_index=None)
```
#### SpecificMoving with direction
```python
encode_action_as_digits("SpecificMoving", "Dir+X+Y", x=None, y=None, z=None, UID=None, T=None, T_index=None)
encode_action_as_digits("SpecificMoving", "Dir+X-Y", x=None, y=None, z=None, UID=None, T=None, T_index=None)
encode_action_as_digits("SpecificMoving", "Dir+X", x=None, y=None, z=None, UID=None, T=None, T_index=None)
```
#### Idle and change guard state
```python
encode_action_as_digits("Idle", "DynamicGuard", x=None, y=None, z=None, UID=None, T=None, T_index=None)
encode_action_as_digits("Idle", "StaticAlert", x=None, y=None, z=None, UID=None, T=None, T_index=None)
encode_action_as_digits("Idle", "AggressivePersue", x=None, y=None, z=None, UID=None, T=None, T_index=None)
```
================================================
FILE: PythonExample/hmp_minimal_modules/MISSION/uhmap/uhmap_env_wrapper.py
================================================
import json, os, subprocess, time, stat, platform, importlib
import numpy as np
from UTIL.colorful import print蓝, print靛, print亮红
from UTIL.network import TcpClientP2PWithCompress, find_free_port_no_repeat, get_host_ip
from UTIL.config_args import ChainVar
from config import GlobalConfig
from ..common.base_env import BaseEnv
from .actset_lookup import binary_friendly, dictionary_n_actions
from .agent import Agent
# please register this into MISSION/env_router.py
def make_uhmap_env(env_id, rank):
if ScenarioConfig.SubTaskSelection == 'UhmapEnv':
return UhmapEnv(rank)
else:
ST = ScenarioConfig.SubTaskSelection
assert os.path.exists(f'./MISSION/uhmap/SubTasks/{ST}.py'), "Unknown subtask!"
ST_CLASS = getattr(importlib.import_module(f'.SubTasks.{ST}', package='MISSION.uhmap'), ST)
return ST_CLASS(rank)
def get_subtask_conf(subtask):
ST = subtask
assert os.path.exists(f'./MISSION/uhmap/SubTasks/{ST}Conf.py'), "Configuration not found!"
ST_CONF_CLASS = getattr(importlib.import_module(f'.SubTasks.{ST}Conf', package='MISSION.uhmap'), 'SubTaskConfig')
return ST_CONF_CLASS
def usual_id_arrangment(N_AGENT_EACH_TEAM):
"""
e.g.,
input [5, 3]
output [range(0,5), range(5,8)]
"""
AGENT_ID_EACH_TEAM = []
p = 0
for team_agent_num in N_AGENT_EACH_TEAM:
AGENT_ID_EACH_TEAM.append(range(p, p + team_agent_num))
p += team_agent_num
return AGENT_ID_EACH_TEAM
# please register this ScenarioConfig into MISSION/env_router.py
class ScenarioConfig(object):
'''
ScenarioConfig: This config class will be 'injected' with new settings from JSONC.
(E.g., override configs with ```python main.py --cfg example.jsonc```)
(As the name indicates, ChainVars will change WITH the vars they are 'chained_with' during config injection)
(please see UTIL.config_args to find out how this advanced trick works out.)
'''
# Needed by the hmp core #
N_AGENT_EACH_TEAM = [10, ]
AGENT_ID_EACH_TEAM = usual_id_arrangment(N_AGENT_EACH_TEAM)
N_TEAM = len(N_AGENT_EACH_TEAM)
# chained parameters, will change along with 'N_AGENT_EACH_TEAM'
AGENT_ID_EACH_TEAM_cv = ChainVar(lambda N_AGENT_EACH_TEAM: usual_id_arrangment(N_AGENT_EACH_TEAM), chained_with=['N_AGENT_EACH_TEAM'])
N_TEAM_cv = ChainVar(lambda N_AGENT_EACH_TEAM: len(N_AGENT_EACH_TEAM), chained_with=['N_AGENT_EACH_TEAM'])
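# A sketch of the chained-variable behavior (hypothetical override): injecting
# N_AGENT_EACH_TEAM = [5, 3] from JSONC lets the *_cv ChainVars above re-derive
# AGENT_ID_EACH_TEAM = [range(0, 5), range(5, 8)] and N_TEAM = 2 automatically.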
# algorithm selection
TEAM_NAMES = ['ALGORITHM.None->None',]
'''
## If the length of action array == the number of teams, set ActAsUnity to True
## If the length of action array == the number of agents, set ActAsUnity to False
'''
ActAsUnity = False
'''
## If the length of reward array == the number of agents, set RewardAsUnity to False
## If the length of reward array == 1, set RewardAsUnity to True
'''
RewardAsUnity = True
'''
## If the length of obs array == the number of agents, set ObsAsUnity to False
## If the length of obs array == the number of teams, set ObsAsUnity to True
'''
ObsAsUnity = False
# Needed by env itself #
MaxEpisodeStep = 100
render = False
TcpAddr = '127.0.0.1'
UhmapPort = 21051
UnrealLevel = 'UhmapBreakingBad'
SubTaskSelection = 'UhmapBreakingBad'
SubTaskConfig = get_subtask_conf(UnrealLevel)
SubTaskConfig_cv = ChainVar(lambda UnrealLevel:get_subtask_conf(UnrealLevel), chained_with=['SubTaskSelection'])
UElink2editor = False
AutoPortOverride = True
# AutoPortOverride is usually the reverse of UElink2editor
AutoPortOverride_cv = ChainVar(lambda UElink2editor:(not UElink2editor), chained_with=['UElink2editor'])
# this is not going to be precise,
# the precise step time will be floor(StepGameTime/TimeDilation*FrameRate)*TimeDilation/FrameRate
StepGameTime = 0.5
UhmapServerExe = 'F:/UHMP/Build/WindowsServer/UHMPServer.exe'
UhmapRenderExe = ''
TimeDilation = 1.0 # engine calculation speed control
FrameRate = 25.6 # must satisfy: (TimeDilation=1*n, FrameRate=25.6*n)
FrameRate_cv = ChainVar(lambda TimeDilation: (TimeDilation/1 * 25.6), chained_with=['TimeDilation'])
UhmapStartCmd = []
# Needed by some ALGORITHM #
StateProvided = False
AvailActProvided = False
EntityOriented = True
ActionFormat = 'ASCII' # 'ASCII'/'Multi-Digit'/'Single-Digit'
n_actions = dictionary_n_actions
obs_vec_length = get_subtask_conf(UnrealLevel).obs_vec_length
obs_vec_length_cv = ChainVar(lambda UnrealLevel:get_subtask_conf(UnrealLevel).obs_vec_length, chained_with=['SubTaskSelection'])
obs_n_entity = get_subtask_conf(UnrealLevel).obs_n_entity
obs_n_entity_cv = ChainVar(lambda UnrealLevel:get_subtask_conf(UnrealLevel).obs_n_entity, chained_with=['SubTaskSelection'])
# # ObsBreakBase = 1e4
UhmapVersion = '2.3'
CanTurnOff = False
# Hete agents
HeteAgents = False
# demo category
DemoType = "Default"
class UhmapEnvParseHelper:
def parse_response_ob_info(self, response):
raise NotImplementedError
def make_obs(self):
raise NotImplementedError
class UhmapEnv(BaseEnv, UhmapEnvParseHelper):
def __init__(self, rank) -> None:
super().__init__(rank)
self.id = rank
self.render = ScenarioConfig.render and (self.id==0)
self.n_agents = sum(ScenarioConfig.N_AGENT_EACH_TEAM)
assert self.n_agents == len(ScenarioConfig.SubTaskConfig.agent_list), 'agent number definition error'
self.n_teams = ScenarioConfig.N_TEAM
self.sim_thread = None
self.client = None
# self.observation_space = ?
# self.action_space = ?
if ScenarioConfig.StateProvided:
# self.observation_space['state_shape'] = ?
pass
# Restarting the env is very fast; it can be a failsafe if memory leaks on the UE side
self.max_simulation_life = 2048
self.simulation_life = self.max_simulation_life
# with a lock, we can initialize UE side one by one (not necessary though)
# wait until thread 0 finish its initialization (to avoid a traffic jam in server memory)
traffic_light = './TEMP/uhmap_thread_0_init_ok_%s'%GlobalConfig.machine_info['ExpUUID'][:8]
if rank != 0:
while not os.path.exists(traffic_light): time.sleep(1)
self.activate_simulation(self.id, find_port=True)
# thread 0 finish its initialization,
if rank == 0:
with open(traffic_light, mode='w+') as f: f.write(traffic_light)
def __del__(self):
self.terminate_simulation()
def activate_simulation(self, rank, find_port=True):
print('thread %d initializing'%rank)
self.sim_thread = 'activating'
if find_port:
self.render = ScenarioConfig.render # and (rank==0)
self.hmp_ue_port = ScenarioConfig.UhmapPort
if ScenarioConfig.AutoPortOverride:
self.hmp_ue_port, release_port_fn = find_free_port_no_repeat() # port for hmp data exchanging
if not ScenarioConfig.UElink2editor:
self.ue_vis_port, release_port_fn = find_free_port_no_repeat() # port for remote visualizing
# self.ue_vis_port = 32222
print蓝('Port %d will be used by hmp, port %d will be used by UE internally'%(self.hmp_ue_port, self.ue_vis_port))
if (not self.render) and (not ScenarioConfig.UElink2editor):
print蓝('To visualize on Windows, run "./UHMP.exe -OpenLevel=%s:%d -WINDOWED -TimeDilation=%.8f -FrameRate=%.8f -IOInterval=%.8f -DebugMod=False -LockGameDuringCom=True"'%(
get_host_ip(), self.ue_vis_port, ScenarioConfig.TimeDilation, ScenarioConfig.FrameRate, ScenarioConfig.StepGameTime))
self.ip_port = (ScenarioConfig.TcpAddr, self.hmp_ue_port)
# os.system()
if not ScenarioConfig.UElink2editor:
assert ScenarioConfig.AutoPortOverride
# * A Butterfly Effect problem *:
# UE4 uses float (instead of double) for time delta calculation,
# introducing a small rounding error in dt = 1/FrameRate,
# which is then amplified by the Butterfly Effect,
# therefore we have to make sure that FrameRate = 16,32,64,...
print('checking ScenarioConfig args problems ...')
assert ScenarioConfig.TimeDilation <= 128, "* TimeDilation <= 128 *"
assert binary_friendly(1/ScenarioConfig.FrameRate), "* A Butterfly Effect problem *"
assert binary_friendly(ScenarioConfig.TimeDilation/256), "* A Butterfly Effect problem *"
# real_step_time =
# np.floor(ScenarioConfig.StepGameTime/ScenarioConfig.TimeDilation*ScenarioConfig.FrameRate)
# * ScenarioConfig.TimeDilation / ScenarioConfig.FrameRate
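# e.g. with FrameRate=25.6: 1/25.6 = 0.0390625 = 5/128, a value exactly
# representable in binary floating point, so dt accumulates no rounding error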
if not self.render:
simulation_exe = ScenarioConfig.UhmapServerExe
assert 'Server' in simulation_exe
else:
simulation_exe = ScenarioConfig.UhmapRenderExe
assert 'NoEditor' in simulation_exe
if platform.system()=="Linux":
if self.render: assert False, "You really want to render on Linux? If so, remove this line."
if simulation_exe.endswith('.exe'):
simulation_exe = simulation_exe.replace('/Windows', '/Linux')
simulation_exe = simulation_exe.replace('.exe','.sh')
# expand '~' path
simulation_exe = os.path.expanduser(simulation_exe)
else: # Windows
if simulation_exe.endswith('.sh'):
simulation_exe = simulation_exe.replace('/Linux', '/Windows')
simulation_exe = simulation_exe.replace('.sh', '.exe')
if simulation_exe.startswith('/home'):
simulation_exe = './TEMP' + simulation_exe
if not os.path.exists(simulation_exe):
if self.rank == 0:
from .auto_download import download_client_binary
download_client_binary(desired_path=simulation_exe, desired_version=ScenarioConfig.UhmapVersion, is_render_client=self.render)
else:
while True:
time.sleep(60)
if os.path.exists(simulation_exe): break
# give execution permission
if platform.system()=="Linux":
st = os.stat(simulation_exe)
os.chmod(simulation_exe, st.st_mode | stat.S_IEXEC)
if (not self.render) and simulation_exe != '':
# start child process
self.sim_thread = subprocess.Popen([
simulation_exe,
# '-log',
'-TcpPort=%d'%self.hmp_ue_port, # port for hmp data exchanging
'-Port=%d'%self.ue_vis_port, # port for remote visualizing
'-OpenLevel=%s'%ScenarioConfig.UnrealLevel,
'-TimeDilation=%.8f'%ScenarioConfig.TimeDilation,
'-FrameRate=%.8f'%ScenarioConfig.FrameRate,
'-IOInterval=%.8f'%ScenarioConfig.StepGameTime,
'-Seed=%d'%int(np.random.rand()*1e5), # if the main-thread random seed is already set, the number drawn here is deterministic
'-DebugMod=False',
# '-LLMCSV',
'-ABSLOG=%s'%os.path.abspath('./TEMP/uhmap/%s/%d.log'%(GlobalConfig.machine_info['ExpUUID'][:8], rank)),
'-Version=%s'%ScenarioConfig.UhmapVersion,
'-LockGameDuringCom=True',
], stdout=subprocess.DEVNULL)
print('UHMAP (Headless) started ...')
elif self.render and simulation_exe != '':
self.sim_thread = subprocess.Popen([
simulation_exe,
# '-log',
'-TcpPort=%d'%self.hmp_ue_port, # port for hmp data exchanging
'-Port=%d'%self.ue_vis_port, # port for remote visualizing
'-OpenLevel=%s'%ScenarioConfig.UnrealLevel,
'-TimeDilation=%.8f'%ScenarioConfig.TimeDilation,
'-FrameRate=%.8f'%ScenarioConfig.FrameRate,
'-IOInterval=%.8f'%ScenarioConfig.StepGameTime,
'-Seed=%d'%int(np.random.rand()*1e5), # if the main-thread random seed is already set, the number drawn here is deterministic
'-DebugMod=False',
# '-LLMCSV',
'-ABSLOG=%s'%os.path.abspath('./TEMP/uhmap/%s/%d.log'%(GlobalConfig.machine_info['ExpUUID'][:8], rank)),
'-Version=%s'%ScenarioConfig.UhmapVersion,
'-LockGameDuringCom=True',
"-ResX=1280",
"-ResY=720",
"-WINDOWED"
], stdout=subprocess.DEVNULL)
print('UHMAP (Render) started ...')
else:
print('Cannot start Headless Server Or GUI Server!')
assert False, 'Cannot start Headless Server Or GUI Server!'
else:
print('Trying to link to unreal editor ...')
assert not ScenarioConfig.AutoPortOverride
time.sleep(1+np.abs(self.id)/100)
self.client = TcpClientP2PWithCompress(self.ip_port)
MAX_RETRY = 150
for i in range(MAX_RETRY):
try:
self.client.manual_connect()
print('handshake complete %d'%rank)
break
except:
if i >= MAX_RETRY-1:
assert False, ('uhmap connection timeout, please reduce parallel threads (num_threads)!')
elif i > 75:
print('Thread %d: Waiting too long, please reduce parallel threads (num_threads), Retry %d ... | run once with a smaller num_threads to load the dynamic libraries into memory, then restore num_threads'%(rank, i))
elif i > 25:
print('Thread %d: Trying to connect to unreal engine. Related libraries are not yet loaded into memory, this may take a few minutes. Retry %d ...'%(rank, i))
time.sleep(1)
# now that the ports are bound, no need to hold them anymore
if find_port:
if ScenarioConfig.AutoPortOverride:
release_port_fn(self.hmp_ue_port)
if not ScenarioConfig.UElink2editor:
release_port_fn(self.ue_vis_port)
self.t = 0
print('thread %d initialize complete'%rank)
def terminate_simulation(self):
if hasattr(self,'sim_thread') and (self.sim_thread is not None) and (self.client is not None):
# self.sim_thread.terminate()
# send terminate command to unreal side
self.client.send_dgram_to_target(json.dumps({
'valid': True,
'DataCmd': 'end_unreal_engine',
'TimeStepMax': ScenarioConfig.MaxEpisodeStep,
'TimeStep' : 0,
'Actions': None,
}))
self.client.close()
self.sim_thread = None
self.client = None
# override reset function
def reset(self):
self.simulation_life -= 1
if self.simulation_life < 0:
print('restarting simulation')
self.terminate_simulation()
self.simulation_life = self.max_simulation_life
self.activate_simulation(self.id, find_port=False)
def sleep(self):
self.simulation_life = -1
self.terminate_simulation()
# override step function
def step(self, act):
raise NotImplementedError
# return (ob, RewardForAllTeams, done, info) # choose this if RewardAsUnity
================================================
FILE: PythonExample/hmp_minimal_modules/README.md
================================================
# HMP:Hybrid Multi-agent Playground
See https://github.com/binary-husky/hmp2g
# Run demo in Editor mode
```
(Open map UhmapLargeScale in Unreal Editor)
cd PythonExample/hmp_minimal_modules
python main.py -c ZHECKPOINT/uhmap_hete10vs10/render_result_editor.jsonc
```
# Run tutorial of designing custom actions in Editor mode
```
(Open map UhmapWaterdrop or UhmapLargeScale in Unreal Editor)
cd PythonExample/hmp_minimal_modules
python main.py -c ZDOCS/examples/uhmap/random_waterdrop.jsonc
```
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/__init__.py
================================================
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/auto_gpu.py
================================================
"""
Created on Tue Aug 22 19:41:55 2017
@author: Quantum Liu
"""
'''
Example:
gm=GPUManager()
with torch.cuda.device(gm.auto_choice()):
blabla
Or:
gm=GPUManager()
torch.cuda.set_device(gm.auto_choice())
'''
import os, time
from UTIL.colorful import print黄
class sel_gpu():
'''
qargs:
query arguments
A manager which lists all available GPU devices, sorts them,
and chooses the most idle one. The manager records whether each
GPU has already been assigned; unassigned GPUs are preferred.
'''
def __init__(self,qargs=[]):
'''
'''
self.qargs=qargs
def _sort_by_memory(self,gpus,by_size=False):
for gpu in gpus:
# prefer A100 GPUs
if 'A100' in gpu['gpu_name']:
gpu['memory.free'] *= 1.25
gpu['memory.total'] *= 1.25
if by_size:
print黄('Sorted by free memory size')
res = sorted(gpus,key=lambda d:d['memory.free'],reverse=True)
return res
else:
print黄('Sorted by free memory rate')
return sorted(gpus,key=lambda d:float(d['memory.free'])/ d['memory.total'],reverse=True)
def _sort_by_power(self,gpus):
return sorted(gpus,key=self.by_power)
def _sort_by_custom(self,gpus,key,reverse=False,qargs=[]):
if isinstance(key,str) and (key in qargs):
return sorted(gpus,key=lambda d:d[key],reverse=reverse)
if isinstance(key,type(lambda a:a)):
return sorted(gpus,key=key,reverse=reverse)
raise ValueError("The argument 'key' must be a function or a key in query args, please read the documentation of nvidia-smi")
def auto_choice(self,mode=0):
'''
mode:
0:(default)sorted by free memory size
return:
the int index of the chosen GPU
Automatically choose the most idle GPU among those
not yet specified, and return its index.
'''
from UTIL.colorful import print黄
self.gpus=self.query_gpu(self.qargs)
for gpu in self.gpus:
gpu['specified']=False
self.gpu_num=len(self.gpus)
# if not self.check_gpus():
# raise ImportError('GPU available check failed')
for old_infos,new_infos in zip(self.gpus,self.query_gpu(self.qargs)):
old_infos.update(new_infos)
unspecified_gpus=[gpu for gpu in self.gpus if not gpu['specified']] or self.gpus
if mode==0:
chosen_gpu=self._sort_by_memory(unspecified_gpus,True)[0]
print黄('Choosing the GPU device with the largest free memory...\n')
elif mode==1:
chosen_gpu=self._sort_by_memory(unspecified_gpus)[0]
print黄('Choosing the GPU device with the highest free memory rate...\n')
elif mode==2:
chosen_gpu=self._sort_by_power(unspecified_gpus)[0]
print黄('Choosing the GPU device by power...\n')
else:
chosen_gpu=self._sort_by_memory(unspecified_gpus)[0]
print黄('Given an unavailable mode, choosing by memory\n')
chosen_gpu['specified']=True
index=chosen_gpu['index']
print黄('Using GPU {i}:\n{info}'.format(i=index,info='\n'.join([str(k)+':'+str(v) for k,v in chosen_gpu.items()])))
return int(index)
@staticmethod
def check_gpus():
'''
GPU available check
http://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-cuda/
'''
import torch
if not torch.cuda.is_available():
print黄('This script can only be used to manage NVIDIA GPUs, but no GPU was found on this device')
return False
with os.popen('nvidia-smi -h') as f:
if not 'NVIDIA System Management' in f.read():
print黄("'nvidia-smi' tool not found.")
f.close()
return False
f.close()
return True
@staticmethod
def parse(line,qargs):
'''
line:
a line of text
qargs:
query arguments
return:
a dict of gpu infos
Parse one line of the CSV-format text returned by nvidia-smi
'''
numeric_args = ['memory.free', 'memory.total', 'power.draw', 'power.limit'] # numeric arguments
power_manage_enable=lambda v:(('Not Support' not in v) and ('[N/A]' not in v)) # whether the GPU supports power management (laptops may not)
to_numeric=lambda v:float(v.upper().strip().replace('MIB','').replace('W','')) # strip the unit from a value string
process = lambda k,v:((int(to_numeric(v)) if power_manage_enable(v) else 1) if k in numeric_args else v.strip())
return {k:process(k,v) for k,v in zip(qargs,line.strip().split(','))}
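# e.g. a line like "0, NVIDIA A100-SXM4-40GB, 30000 MiB, 40960 MiB, 52.00 W, 400.00 W"
# (illustrative values) parses to {'index':'0', 'gpu_name':'NVIDIA A100-SXM4-40GB',
# 'memory.free':30000, 'memory.total':40960, 'power.draw':52, 'power.limit':400}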
def query_gpu(self, qargs=[]):
'''
qargs:
query arguments
return:
a list of dict
Query GPU info
'''
qargs =['index','gpu_name', 'memory.free', 'memory.total', 'power.draw', 'power.limit']+ qargs
cmd = 'nvidia-smi --query-gpu={} --format=csv,noheader'.format(','.join(qargs))
results = os.popen(cmd).readlines()
return [self.parse(line,qargs) for line in results]
@staticmethod
def by_power(d):
'''
helper function for sorting gpus by power
'''
power_infos=(d['power.draw'],d['power.limit'])
if any(v==1 for v in power_infos):
print黄('Power management unavailable for GPU {}'.format(d['index']))
return 1
return float(d['power.draw'])/d['power.limit']
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/batch_exp.py
================================================
import subprocess
import threading
import copy, os
import time
import json
from UTIL.network import get_host_ip
from UTIL.colorful import *
def get_info(script_path):
info = {
'HostIP': get_host_ip(),
'RunPath': os.getcwd(),
'ScriptPath': os.path.abspath(script_path),
'StartDateTime': time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
}
try:
info['DockerContainerHash'] = subprocess.getoutput(r'cat /proc/self/cgroup | grep -o -e "docker/.*"| head -n 1 |sed "s/docker\\/\\(.*\\)/\\1/" |cut -c1-12')
except:
info['DockerContainerHash'] = 'None'
return info
def run_batch_exp(sum_note, n_run, n_run_mode, base_conf, conf_override, script_path):
arg_base = ['python', 'main.py']
time_mark_only = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
time_mark = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime()) + '-' + sum_note
log_dir = '%s/'%time_mark
exp_log_dir = log_dir+'exp_log'
if not os.path.exists('PROFILE/%s'%exp_log_dir):
os.makedirs('PROFILE/%s'%exp_log_dir)
exp_json_dir = log_dir+'exp_json'
if not os.path.exists('PROFILE/%s'%exp_json_dir):
os.makedirs('PROFILE/%s'%exp_json_dir)
conf_list = []
new_json_paths = []
for i in range(n_run):
conf = copy.deepcopy(base_conf)
new_json_path = 'PROFILE/%s/run-%d.json'%(exp_json_dir, i+1)
for key in conf_override:
assert n_run == len(conf_override[key]), ('check that n_run matches the length of each conf_override list')
tree_path, item = key.split('-->')
conf[tree_path][item] = conf_override[key][i]
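# e.g. the key "config.py->GlobalConfig-->seed" writes conf["config.py->GlobalConfig"]["seed"]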
with open(new_json_path,'w') as f:
json.dump(conf, f, indent=4)
# print(conf)
conf_list.append(conf)
new_json_paths.append(new_json_path)
print红('\n')
print红('\n')
print红('\n')
printX = [
print亮红, print亮绿, print亮黄, print亮蓝, print亮紫, print亮靛,
print红, print绿, print黄, print蓝, print紫, print靛,
] * 5
conf_base_ = conf_list[0]
for k_ in conf_base_:
conf_base = conf_base_[k_]
for key in conf_base:
different = False
for i in range(len(conf_list)):
if conf_base[key]!=conf_list[i][k_][key]:
different = True
break
#
if different:
for i in range(len(conf_list)):
printX[i](key, conf_list[i][k_][key])
else:
print(key, conf_base[key])
final_arg_list = []
for ith_run in range(n_run):
final_arg = copy.deepcopy(arg_base)
final_arg.append('--cfg')
final_arg.append(new_json_paths[ith_run])
final_arg_list.append(final_arg)
print('')
def local_worker(ith_run):
log_path = open('PROFILE/%s/run-%d.log'%(exp_log_dir, ith_run+1), 'w+')
printX[ith_run%len(printX)](final_arg_list[ith_run])
subprocess.run(final_arg_list[ith_run], stdout=log_path, stderr=log_path)
def remote_worker(ith_run):
# step 1: transfer all files
from UTIL.exp_helper import get_ssh_sftp
addr = n_run_mode[ith_run]['addr']
if 'exe_here' in addr:
_, addr = addr.split('=>')
usr = n_run_mode[ith_run]['usr']
pwd = n_run_mode[ith_run]['pwd']
ssh, sftp = get_ssh_sftp(addr, usr, pwd)
src_path = os.getcwd()
else:
# assert False
usr = n_run_mode[ith_run]['usr']
pwd = n_run_mode[ith_run]['pwd']
ssh, sftp = get_ssh_sftp(addr, usr, pwd)
sftp.mkdir('/home/%s/MultiServerMission'%(usr), ignore_existing=True)
sftp.mkdir('/home/%s/MultiServerMission/%s'%(usr, time_mark), ignore_existing=True)
src_path = '/home/%s/MultiServerMission/%s/src'%(usr, time_mark)
try:
sftp.mkdir(src_path, ignore_existing=False)
sftp.put_dir('./', src_path, ignore_list=['__pycache__','TEMP','ZHECKPOINT'])
sftp.close()
print紫('upload complete')
except:
sftp.close()
print紫('upload not needed')
print('byobu attach -t %s'%time_mark_only)
addr_ip, addr_port = addr.split(':')
print亮蓝("Attach cmd: ssh %s@%s -p %s -t \"byobu attach -t %s\""%(usr, addr_ip, addr_port, time_mark_only))
stdin, stdout, stderr = ssh.exec_command(command='byobu new-session -d -s %s'%time_mark_only, timeout=1)
print亮紫('byobu new-session -d -s %s'%time_mark_only)
time.sleep(1)
byobu_win_name = '%s--run-%d'%(time_mark_only, ith_run)
stdin, stdout, stderr = ssh.exec_command(command='byobu new-window -t %s'%time_mark_only, timeout=1)
print亮紫('byobu new-window -t %s'%time_mark_only)
time.sleep(1)
cmd = 'cd ' + src_path
stdin, stdout, stderr = ssh.exec_command(command='byobu send-keys -t %s "%s" C-m'%(time_mark_only, cmd), timeout=1)
print亮紫('byobu send-keys "%s" C-m'%cmd)
time.sleep(1)
cmd = ' '.join(['echo', str(get_info(script_path)) ,'>>', './private_remote_execution.log'])
stdin, stdout, stderr = ssh.exec_command(command='byobu send-keys -t %s "%s" C-m'%(time_mark_only, cmd), timeout=1)
print亮紫('byobu send-keys "%s" C-m'%cmd)
time.sleep(1)
cmd = ' '.join(final_arg_list[ith_run])
stdin, stdout, stderr = ssh.exec_command(command='byobu send-keys -t %s "%s" C-m'%(time_mark_only, cmd), timeout=1)
print亮紫('byobu send-keys "%s" C-m'%cmd)
time.sleep(1)
print亮蓝("command send is done!")
time.sleep(2)
# kill the session
# stdin, stdout, stderr = ssh.exec_command(command='byobu kill-session -t %s'%byobu_win_name, timeout=1)
pass
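# the byobu calls generated above look like this (illustrative values):
#   byobu new-session -d -s 2022-03-22-17-22-00
#   byobu send-keys -t 2022-03-22-17-22-00 "cd /home/usr/MultiServerMission/<time_mark>/src" C-m
#   byobu send-keys -t 2022-03-22-17-22-00 "python main.py --cfg PROFILE/<exp_json_dir>/run-1.json" C-m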
def worker(ith_run):
if n_run_mode[ith_run] is None:
local_worker(ith_run)
else:
remote_worker(ith_run)
def clean_process(pid):
import psutil
parent = psutil.Process(pid)
for child in parent.children(recursive=True):
try:
print亮红('sending Terminate signal to', child)
child.terminate()
time.sleep(5)
print亮红('sending Kill signal to', child)
child.kill()
except: pass
parent.kill()
def clean_up():
print亮红('clean up!')
parent_pid = os.getpid() # my example
clean_process(parent_pid)
input('Confirm execution?')
input('Confirm execution!')
t = 0
while (t >= 0):
print('Counting down ', t)
time.sleep(1)
t -= 1
DELAY = 60
for ith_run in range(n_run):
worker(ith_run)
for i in range(DELAY):
time.sleep(1)
print('all submitted')
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/colorful.py
================================================
import platform
from sys import stdout
if platform.system()=="Linux":
pass
else:
from colorama import init
init()
# Do you like the elegance of Chinese characters?
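# each helper wraps print() in an ANSI escape sequence: "\033[0;3Xm" selects a
# foreground color, "\033[1;3Xm" is its bright variant, and "\033[0m" resets it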
def print红(*kw,**kargs):
print("\033[0;31m",*kw,"\033[0m",**kargs)
def print绿(*kw,**kargs):
print("\033[0;32m",*kw,"\033[0m",**kargs)
def print黄(*kw,**kargs):
print("\033[0;33m",*kw,"\033[0m",**kargs)
def print蓝(*kw,**kargs):
print("\033[0;34m",*kw,"\033[0m",**kargs)
def print紫(*kw,**kargs):
print("\033[0;35m",*kw,"\033[0m",**kargs)
def print靛(*kw,**kargs):
print("\033[0;36m",*kw,"\033[0m",**kargs)
def print亮红(*kw,**kargs):
print("\033[1;31m",*kw,"\033[0m",**kargs)
def print亮绿(*kw,**kargs):
print("\033[1;32m",*kw,"\033[0m",**kargs)
def print亮黄(*kw,**kargs):
print("\033[1;33m",*kw,"\033[0m",**kargs)
def print亮蓝(*kw,**kargs):
print("\033[1;34m",*kw,"\033[0m",**kargs)
def print亮紫(*kw,**kargs):
print("\033[1;35m",*kw,"\033[0m",**kargs)
def print亮靛(*kw,**kargs):
print("\033[1;36m",*kw,"\033[0m",**kargs)
print_red = print红
print_green = print绿
print_yellow = print黄
print_blue = print蓝
print_purple = print紫
print_indigo = print靛
print_bold_red = print亮红
print_bold_green = print亮绿
print_bold_yellow = print亮黄
print_bold_blue = print亮蓝
print_bold_purple = print亮紫
print_bold_indigo = print亮靛
if not stdout.isatty():
# output is redirected: disable ANSI colors to keep the log file clean
print红 = print
print绿 = print
print黄 = print
print蓝 = print
print紫 = print
print靛 = print
print亮红 = print
print亮绿 = print
print亮黄 = print
print亮蓝 = print
print亮紫 = print
print亮靛 = print
print_red = print
print_green = print
print_yellow = print
print_blue = print
print_purple = print
print_indigo = print
print_bold_red = print
print_bold_green = print
print_bold_yellow = print
print_bold_blue = print
print_bold_purple = print
print_bold_indigo = print
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/config_args.py
================================================
import argparse, os, time, func_timeout
from shutil import copyfile, copytree, ignore_patterns, rmtree
from .colorful import *
from .data_struct import remove_prefix, remove_suffix
'''
This is a chained-var helper; it deals with hyper-parameters that are bound
together, e.g. the number of threads and the test episode interval.
ChainVars are handled in UTIL/config_args.py
'''
class ChainVar(object):
def __init__(self, chain_func, chained_with):
self.chain_func = chain_func
self.chained_with = chained_with
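# A minimal usage sketch (hypothetical names): bind test_interval to num_threads,
# so that overriding num_threads in the json automatically rescales test_interval:
# num_threads = 64
# test_interval = 2048
# test_interval_cv = ChainVar(lambda num_threads: num_threads*32, chained_with=['num_threads'])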
# ChainVar relationship must end with '_cv' or '_CV'
def is_chained_key(key):
if key.endswith('_cv'):
return True, remove_suffix(key, '_cv')
elif key.endswith('_CV'):
return True, remove_suffix(key, '_CV')
else:
return False, key
'''
Load all parameters in place
'''
def prepare_args(vb=True):
if vb: prepare_tmp_folder()
parser = argparse.ArgumentParser(description='HMP')
parser.add_argument('-c', '--cfg', help='Path of the configuration file')
parser.add_argument('-s', '--skip', action='store_true', help='skip logdir check')
args, unknown = parser.parse_known_args()
load_via_json = (hasattr(args, 'cfg') and args.cfg is not None)
assert load_via_json
skip_logdir_check = (hasattr(args, 'skip') and (args.skip is not None) and args.skip) or (not vb)
if len(unknown) > 0 and vb: print亮红('Warning! In json setting mode, %s is ignored'%str(unknown))
# load configuration from file
import commentjson as json
if vb: print亮绿('reading configuration at', args.cfg)
# inject configuration into place
with open(args.cfg, encoding='utf8') as f: json_data = json.load(f)
# check and process tmp alg folder
if vb: prepare_alg_tmp_folder(json_data)
# inject configuration into place
load_config_via_json(json_data, vb)
# read the new global configuration
from config import GlobalConfig as cfg
# check log path conflict, change note name if required
note_name_overide = None
if not skip_logdir_check:
note_name_overide = check_experiment_log_path(cfg.logdir)
if note_name_overide is not None:
override_config_file('config.py->GlobalConfig', {'note':note_name_overide}, vb)
# create log path
if not os.path.exists(cfg.logdir): os.makedirs(cfg.logdir)
# back up essential files
if (not cfg.recall_previous_session) and vb:
copyfile(args.cfg, '%s/experiment.jsonc'%cfg.logdir)
if not os.path.exists('%s/raw_exp.jsonc'%cfg.logdir):
copyfile(args.cfg, '%s/raw_exp.jsonc'%cfg.logdir)
backup_files(cfg.backup_files, cfg.logdir, args.cfg)
cfg.machine_info = register_machine_info(cfg.logdir)
# light up the ready flag
cfg.cfg_ready = True
# finish
return cfg
def load_config_via_json(json_data, vb):
for cfg_group in json_data:
if cfg_group == 'config.py->GlobalConfig': random_seed_warning(json_data[cfg_group])
dependency = override_config_file(cfg_group, json_data[cfg_group], vb)
if dependency is not None:
for dep in dependency:
assert any([dep in k for k in json_data.keys()]), 'Arg check failure: a required config group is missing!'
check_config_relevence(json_data)
return None
def override_config_file(cfg_group, new_cfg, vb):
import importlib
assert '->' in cfg_group
str_pro = '------------- %s -------------'%cfg_group
if vb: print绿(str_pro)
file_, class_ = cfg_group.split('->')
if '.py' in file_:
# replace it with removesuffix('.py') if you have python>=3.9
if file_.endswith('.py'): file_ = file_[:-3]
default_configs = getattr(importlib.import_module(file_), class_)
for key in new_cfg:
if new_cfg[key] is None: continue
my_setattr(conf_class=default_configs, key=key, new_value=new_cfg[key], vb=vb)
altered_cv = secure_chained_vars(default_configs, new_cfg, vb)
if vb:
print绿(''.join(['-']*len(str_pro)),)
arg_summary(default_configs, new_cfg, altered_cv)
print绿(''.join(['-']*len(str_pro)),'\n\n\n')
if 'TEAM_NAMES' in new_cfg:
return [item.split('->')[0] for item in new_cfg['TEAM_NAMES'] if not item.startswith('TEMP')]
return None
def secure_chained_vars(default_cfg, new_cfg, vb):
default_cfg_dict = default_cfg.__dict__
altered_cv = []
for key in default_cfg_dict:
is_chain, o_key = is_chained_key(key)
if not is_chain: continue
if o_key in new_cfg: continue
assert hasattr(default_cfg, o_key), ('twin var does not have original')
# get twin
chain_var = getattr(default_cfg, key)
need_refresh = False
for chain_by_var in chain_var.chained_with:
if chain_by_var in new_cfg: need_refresh = True
if not need_refresh: continue
replace_item = chain_var.chain_func(*[getattr(default_cfg, v) for v in chain_var.chained_with])
original_item = getattr(default_cfg, o_key)
if vb: print靛('[config] warning, %s is chained by %s, automatic modifying:'%(o_key,
str(chain_var.chained_with)), original_item, '-->', replace_item)
setattr(default_cfg, o_key, replace_item)
altered_cv.append(o_key)
return altered_cv
"""
make sure that env selection Matches env configuration
"""
def check_config_relevence(json_data):
env_name = json_data['config.py->GlobalConfig']['env_name']
env_path = json_data['config.py->GlobalConfig']['env_path']
for key in json_data.keys():
if 'MISSION' in key: assert env_path in key, ('configuring wrong env!')
"""
Warn user if the random seed is not given
"""
def random_seed_warning(json_data):
if 'seed' not in json_data:
from config import GlobalConfig as cfg
print亮红('Random seed not given, using %d'%cfg.seed)
time.sleep(5)
def prepare_tmp_folder():
def init_dir(dir):
if not os.path.exists(dir): os.makedirs(dir)
local_temp_folder = './TEMP'
global_temp_folder = os.path.expanduser('~/HmapTemp')
init_dir(local_temp_folder)
init_dir(global_temp_folder+'/GpuLock')
init_dir(global_temp_folder+'/PortFinder')
def prepare_alg_tmp_folder(json_data):
try:
# scan mission conf
mission_key = [k for k in json_data.keys() if k.startswith('MISSION')][0]
# obtain algorithm assignment
TEAM_NAMES = json_data[mission_key]['TEAM_NAMES']
for tname in TEAM_NAMES:
if not tname.startswith('TEMP'): continue
# obtain the path of algorithm to be mirrored
path = tname.split('->')[0].replace('.','/')
# trace path parent to algorithm folder.
trace_success = False
max_depth = 5
for _ in range(max_depth):
parent = os.path.relpath(path+'/..')
if os.path.basename(parent) == 'ALGORITHM':
src_path = os.path.relpath(path, start=os.path.relpath(parent+'/..'))
trace_success = True
break
path = parent
# transmit temp algorithm
if trace_success:
import glob
from stat import S_IREAD, S_IRGRP, S_IROTH, S_IWRITE
def readonly_handler(func, path, execinfo):
try:
os.chmod(path, S_IWRITE)
func(path)
except:
pass
return
rmtree(path, onerror=readonly_handler)
# src_path = remove_prefix(path, 'TEMP/')
print亮绿(f'[config] Copying mirror algorithm from {src_path} to {path}')
copytree(src_path, path)
# make these temp files read only
for f in glob.glob(path+'/**/*.py', recursive=True): os.chmod(f, S_IREAD|S_IRGRP|S_IROTH)
except:
print亮红('[config] An error occurred while executing prepare_alg_tmp_folder')
time.sleep(5)
return
def register_machine_info(logdir):
import socket, json, subprocess, uuid
from .network import get_host_ip
info = {
'HostIP': get_host_ip(),
'ExpUUID': uuid.uuid1().hex,
'RunPath': os.getcwd(),
'StartDateTime': time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
}
try:
info['DockerContainerHash'] = subprocess.getoutput(r'cat /proc/self/cgroup | grep -o -e "docker/.*"| head -n 1 |sed "s/docker\\/\\(.*\\)/\\1/" |cut -c1-12')
except:
info['DockerContainerHash'] = 'None'
with open('%s/info.json'%logdir, 'w+') as f:
json.dump(info, f, indent=4)
return info
def backup_files(files, logdir, jsonfile):
from config import GlobalConfig as cfg
if cfg.remote_server_ops != "":
remote_server_ops = cfg.remote_server_ops.replace("LOCALFILE", jsonfile).replace(
"REMOTEFILE",
time.strftime("%Y_%m_%d_%H_%M_%S__", time.localtime())+ cfg.note + '__' + os.path.basename(jsonfile))
os.popen(remote_server_ops)
for file in files:
if os.path.isfile(file):
print绿('[config] Backup File:',file)
bkdir = '%s/backup_files/'%logdir
if not os.path.exists(bkdir): os.makedirs(bkdir)
copyfile(file, '%s/%s'%(bkdir, os.path.basename(file)))
else:
print亮绿('[config] Backup Folder:',file)
assert os.path.isdir(file), ('cannot find', file)
copytree(file, '%s/backup_files/%s'%(logdir, os.path.basename(file)),
dirs_exist_ok=True, ignore=ignore_patterns("__pycache__"))
return
def check_experiment_log_path(logdir):
res = None
if os.path.exists(logdir):
if os.path.exists(logdir+'test_stage'): return None
print亮红('Current log path:', logdir)
print亮红('Warning! you will overwrite old logs if continue!')
print亮红("Pause for 60 seconds before continue (or press Enter to confirm!)")
try:
res = askChoice()
if res == '': res = None
except func_timeout.exceptions.FunctionTimedOut as e:
res = None
return res
@func_timeout.func_set_timeout(60)
def askChoice():
return input('>>')
def arg_summary(config_class, modify_dict = {}, altered_cv = []):
for key in config_class.__dict__:
if '__' in key: continue
is_chain, _ = is_chained_key(key)
if is_chain: continue
if (not key in modify_dict) or (modify_dict[key] is None):
if key not in altered_cv:
print绿(key.center(25), '-->', str(getattr(config_class,key)))
else:
print靛(key.center(25), '-->', str(getattr(config_class,key)))
else:
print红(key.center(25), '-->', str(getattr(config_class,key)))
def my_setattr(conf_class, key, new_value, vb):
assert hasattr(conf_class, key), (conf_class, 'has no such config item: **%s**'%key)
setting_name = key
replace_item = new_value
original_item = getattr(conf_class, setting_name)
if vb: print绿('[config] override %s:'%setting_name, original_item, '-->', replace_item)
if isinstance(original_item, float):
replace_item = float(replace_item)
elif isinstance(original_item, bool):
if replace_item == 'True':
replace_item = True
elif replace_item == 'False':
replace_item = False
elif isinstance(replace_item, bool):
replace_item = replace_item
else:
assert False, ('enter True or False, but have:', replace_item)
elif isinstance(original_item, int):
assert int(replace_item) == float(replace_item), ("warning, this var **%s** has an int default, but given a float override!"%key)
replace_item = int(replace_item)
elif isinstance(original_item, str):
replace_item = replace_item
elif isinstance(original_item, list):
assert isinstance(replace_item, list)
elif isinstance(original_item, dict):
assert isinstance(replace_item, dict)
else:
assert False, ('unsupported config value type')
setattr(conf_class, setting_name, replace_item)
return
def find_all_conf():
import glob
py_script_list = glob.glob('./**/*.py', recursive=True)
conf_class_gather = []
for python_file in py_script_list:
with open(python_file,encoding='UTF-8') as f:
lines = f.readlines()
for line in lines:
if 'ADD_TO_CONF_SYSTEM' not in line: continue
if 'class ' not in line: continue
conf_class_gather.append({'line':line, 'file':python_file})
def getBetween(s, str1, str2):
return s[s.find(str1)+len(str1):s.find(str2)]
for target in conf_class_gather:
class_name = getBetween(target['line'], 'class ', '(')
target['class_name'] = class_name
target['file'] = target['file'].replace('/', '.').replace('..', '')
import importlib
target['class'] = getattr(importlib.import_module(target['file'].replace('.py', '')), class_name)
return conf_class_gather
def make_json(conf_list):
import json
out = {}
for conf in conf_list:
local_conf = {}
config_class = conf['class']
for key in config_class.__dict__:
if '__' in key: continue
is_chain, _ = is_chained_key(key)
if is_chain: continue
item_to_be_serialize = getattr(config_class, key)
try:
json.dumps(item_to_be_serialize)
except:
item_to_be_serialize = '[cannot be json]' + str(item_to_be_serialize)
local_conf[key] = item_to_be_serialize
out[conf['file']] = local_conf
# json_str = json.dumps(out)
with open('all_conf.json', 'w') as f:
json.dump(out, f, indent=4)
print亮紫('the conf summary is successfully saved to all_conf.json')
if __name__ == '__main__':
conf_list = find_all_conf()
res_json = make_json(conf_list)
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/data_struct.py
================================================
class UniqueList():
def __init__(self, list_input=None):
self._list = []
if list_input is not None:
self.extend_unique(list_input)
def append_unique(self, item):
if item in self._list:
return False
else:
self._list.append(item)
return True
def extend_unique(self, list_input):
for item in list_input:
self.append_unique(item)
def has(self, item):
return (item in self._list)
def len(self):
return len(self._list)
def __len__(self):
return len(self._list)
def get(self):
return self._list
def __iter__(self):
return self._list.__iter__()
# # https://stackoverflow.com/questions/16891340/remove-a-prefix-from-a-string
# def remove_prefix(text, prefix):
# return text[text.startswith(prefix) and len(prefix):]
# https://stackoverflow.com/questions/3663450/remove-substring-only-at-the-end-of-string
def remove_suffix(s, sub):
return s[:-len(sub)] if s.endswith(sub) else s
def remove_prefix(s, sub):
return s[len(sub):] if s.startswith(sub) else s
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/exp_helper.py
================================================
import paramiko, os, time
from UTIL.colorful import print亮紫, print亮靛
from UTIL.tensor_ops import __hash__
def singleton(cls):
_instance = {}
def inner(*args, **kwargs):
if cls not in _instance:
_instance[cls] = cls(*args, **kwargs)
return _instance[cls]
return inner
class ChainVar(object):
def __init__(self, chain_func, chained_with):
self.chain_func = chain_func
self.chained_with = chained_with
class DataCentralServer(object): # ADD_TO_CONF_SYSTEM //DO NOT remove this comment//
addr = 'None'
usr = 'None'
pwd = 'None'
@singleton
class changed():
def __init__(self):
self._storage = {}
def check(self, value, key):
if key in self._storage:
new_hash = __hash__(value)
if self._storage[key] == new_hash:
return False
else:
self._storage[key] = new_hash
return True
else:
self._storage[key] = __hash__(value)
return True
from stat import S_ISDIR
# great thanks to skoll for sharing this at stackoverflow:
# https://stackoverflow.com/questions/4409502/directory-transfers-with-paramiko
class MySFTPClient(paramiko.SFTPClient):
def put_dir(self, source, target, ignore_list=[]):
''' Uploads the contents of the source directory to the target path. The
target directory needs to exist. All subdirectories in source are
created under target.
'''
for item in os.listdir(source):
if item in ignore_list: continue
if os.path.isfile(os.path.join(source, item)):
# print亮靛('uploading: %s --> %s'%(os.path.join(source, item),'%s/%s' % (target, item)))
self.put(os.path.join(source, item), '%s/%s' % (target, item))
else:
self.mkdir('%s/%s' % (target, item), ignore_existing=True)
self.put_dir(os.path.join(source, item), '%s/%s' % (target, item), ignore_list)
def isfile(self, path):
try:
return not S_ISDIR(self.stat(path).st_mode)
except IOError:
#Path does not exist, so by definition not a directory
return True
def get_dir(self, source, target, ignore_list=[]):
''' Downloads the contents of the source directory to the target path. The
target directory needs to exist. All subdirectories in source are
created under target.
'''
for item in self.listdir(source):
if item in ignore_list: continue
if self.isfile(os.path.join(source, item).replace('\\','/')):
# print亮靛('uploading: %s --> %s'%(os.path.join(source, item),'%s/%s' % (target, item)))
self.get(os.path.join(source, item).replace('\\','/'), '%s/%s' % (target, item))
else:
if os.path.exists('%s/%s' % (target, item)):
print('local dir already exists:', '%s/%s' % (target, item))
continue
os.mkdir('%s/%s' % (target, item))
self.get_dir(os.path.join(source, item).replace('\\','/'), '%s/%s' % (target, item), ignore_list)
def mkdir(self, path, mode=511, ignore_existing=False):
''' Augments mkdir by adding an option to not fail if the folder exists '''
try:
super(MySFTPClient, self).mkdir(path, mode)
except IOError as e:
if e.__class__ == FileNotFoundError:
raise
if ignore_existing:
pass
else:
raise
def get_ssh_sftp(addr, usr, pwd):
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.load_host_keys(os.path.expanduser(os.path.join("~", ".ssh", "known_hosts")))
port = 22
if ':' in addr: addr, port = addr.split(':')
ssh.connect(addr, username=usr, password=pwd, port=port)
sftp = MySFTPClient.from_transport(ssh.get_transport())
return ssh, sftp
def upload_exp(cfg): # shell it to catch error
try: upload_exp_(cfg)
except: pass
def upload_exp_(cfg):
path = cfg.logdir
name = cfg.note
try:
addr = DataCentralServer.addr # ssh ubuntu address
usr = DataCentralServer.usr # ubuntu user
pwd = DataCentralServer.pwd # ubuntu password
assert addr != 'None' and (addr is not None)
assert usr != 'None' and (usr is not None)
assert pwd != 'None' and (pwd is not None)
except:
print('No experiment data central server is configured')
return
remote_path = '/home/%s/CenterHmp/'%usr
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.load_host_keys(os.path.expanduser(os.path.join("~", ".ssh", "known_hosts")))
ssh.connect(addr, username=usr, password=pwd)
put_str = '[%s] [%s] %s'%(cfg.note, time.strftime("%Y-%m-%d-%H:%M:%S", time.localtime()), str(cfg.machine_info).replace('\'',''))
ssh.exec_command(command='echo -e "%s" >> %s/active.log'%(put_str, remote_path), timeout=1)
sftp = MySFTPClient.from_transport(ssh.get_transport())
print亮紫('uploading results: %s --> %s'%(path, '%s/%s'%(remote_path, name)))
sftp.mkdir(remote_path, ignore_existing=True)
sftp.mkdir('%s/%s'%(remote_path, name), ignore_existing=True)
sftp.put_dir(path, '%s/%s'%(remote_path, name))
sftp.close()
print亮紫('upload complete')
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/fetch_multiserver.py
================================================
from UTIL.exp_helper import get_ssh_sftp
from UTIL.colorful import *
import time,os
'''
Fetch experiment results from worker servers
'''
n_run_mode = [
# { # @1
# "addr": "172.18.116.149:2233",
# "usr": "hmp",
# "pwd": "hmp"
# },
# { # @2
# "addr": "172.18.116.150:2233",
# "usr": "fuqingxu",
# "pwd": "clara"
# },
{ # @3
"addr": "172.18.116.149:2233",
"usr": "fuqingxu",
"pwd": "clara"
}
]
download_dir = './fetch/'
after_date = '2022-03-22-17-22-00'
consider_days = None
info_list = {}
to_download = {}
for ith_run in range(len(n_run_mode)):
addr = n_run_mode[ith_run]['addr']
usr = n_run_mode[ith_run]['usr']
pwd = n_run_mode[ith_run]['pwd']
ssh, sftp = get_ssh_sftp(addr, usr, pwd)
experiments_path = sftp.listdir(path='./MultiServerMission/')
# iterate experiments from newest to oldest
experiments_path = reversed(sorted(experiments_path))
for index, exp_time in enumerate(experiments_path):
time_then = time.mktime(time.strptime(exp_time,"%Y-%m-%d-%H:%M:%S"))
time_now = time.mktime(time.localtime())
diff_time_days = (time_now - time_then)/3600/24
if consider_days is None:
consider_days = (time_now - time.mktime(time.strptime(after_date,"%Y-%m-%d-%H-%M-%S")))/3600/24
if diff_time_days > consider_days: continue
path_ckpt = './MultiServerMission/%s/src/ZHECKPOINT/'%exp_time
try:
list_of_sub_exp = sftp.listdir(path=path_ckpt)
except:
print('No ZHECKPOINT directory found!')
continue
key = str(ith_run)+'-'+str(index)
print亮绿(key,':',exp_time)
for sep in list_of_sub_exp:
print亮紫('\t- ',sep)
info_list[key] = {'ith_run':ith_run, 'index':index, 'path':path_ckpt}
target_path = (download_dir+'/%s/'%exp_time.replace(':','-'))
try:
os.mkdir(target_path)
sftp.get_dir(source=path_ckpt,target=target_path) # download!
except BaseException as e:
print('This directory already exists, skipping:', target_path)
print('download complete')
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/file_lock.py
================================================
# pip install filelock
from filelock import FileLock as FileLockBase
class FileLock(FileLockBase):
def __init__(self, lock_file, timeout: float = -1) -> None:
assert lock_file.endswith('.lock')
super().__init__(lock_file, timeout)
def is_file_empty(file_path):
with open(file_path, 'r') as f:
file_content = f.read()
if file_content == '' or file_content == '\n':
return True
else:
return False
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/gpu_eater.py
================================================
def validate_path():
import os, sys
dir_name = os.path.dirname(__file__)
root_dir_assume = os.path.abspath(os.path.dirname(__file__) + '/..')
os.chdir(root_dir_assume)
sys.path.append(root_dir_assume)
if __name__ == '__main__':
validate_path()
from multiprocessing import Process
from UTIL.network import UnixTcpServerMultiClient
import os, time, re, torch
import threading
def check_devices_mem():
devices_info = os.popen(
'"/usr/bin/nvidia-smi"' +
' --query-gpu=memory.total,memory.used' +
' --format=csv,nounits,noheader'
).read().strip().split("\n")
devices_mem_info = [x.split(',') for x in devices_info]
devices = os.environ.get("CUDA_VISIBLE_DEVICES")
if devices is None:
return devices_mem_info
else:
device_list = []
for i in [int(x) for x in devices.split(',')]:
device_list.append(devices_mem_info[i])
return device_list
def occupy_device_mem(cuda_device, mem_info, free=1024):
total, used = int(mem_info[0]), int(mem_info[1])
block_mem = total - used - free
if block_mem > 0:
print('Occupy device_{}\'s mem ...'.format(cuda_device))
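# shape math: a float32 tensor of shape (256, 1024, block_mem) occupies
# 256*1024*4 bytes = 1 MiB per unit of block_mem, i.e. block_mem MiB in total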
x = torch.zeros(
(256, 1024, block_mem),
dtype=torch.float32,
device='cuda:{}'.format(cuda_device)
)
del x
print('Occupy device_{}\'s mem finished'.format(cuda_device))
else:
print('Device_{} is out of memory'.format(cuda_device))
def occupy_gpus_mem(free=4096):
for i, mem_info in enumerate(check_devices_mem()):
occupy_device_mem(i, mem_info, free)
print('Occupying all devices\' mem finished')
class GPU_Eater(Process):
def __init__(self, unix_path, party):
super(GPU_Eater, self).__init__()
self.unix_path = unix_path
self.server = None
self.party = party
match_res = re.match(pattern=r'cuda(.)_party(.)', string=party)
cudax, self.party_index = match_res[1], match_res[2]
assert self.party_index == '0'
self.device = f'cuda:{cudax}'
cudax_int = int(cudax)
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = cudax
self.previous_req = time.time()
def __del__(self):
if self.server is not None:
self.server.close()
self.terminate()
def run_timer(self):
while True:
time.sleep(60)
delta_time = time.time() - self.previous_req
print(f'inactive for {delta_time} seconds')
if delta_time > 3600:
self.__del__()
break
def release_gpu(self):
torch.cuda.empty_cache()
pass
def hold_gpu(self):
occupy_gpus_mem(free=2048)
pass
def on_receive_data(self, data):
print('data incoming')
if data == 'link':
self.previous_req = time.time()
reply = 'success'
elif data == 'need_gpu':
self.release_gpu()
self.previous_req = time.time()
reply = 'ok'
elif data == 'giveup_gpu':
self.hold_gpu()
self.previous_req = time.time()
reply = 'ok'
elif data == 'offline':
self.previous_req = time.time()
reply = 'ok'
else:
assert False
print(data)
return reply
def run(self):
print('started')
try: os.unlink(self.unix_path)
except: pass
t = threading.Thread(target=self.run_timer)
t.daemon = True
t.start()
self.server = UnixTcpServerMultiClient(self.unix_path, obj='str')
self.server.on_receive_data = lambda data: self.on_receive_data(data)
self.server.be_online()
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(description='gpu_party')
parser.add_argument('--party', type=str)
args = parser.parse_args()
party = args.party
unix_path = os.path.expanduser(f'~/HmapTemp/GpuLock/GpuEater_{party}')
o = GPU_Eater(unix_path, party)
o.run()
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/gpu_share.py
================================================
import platform, os, torch, uuid, time, psutil, json, random
from UTIL.network import UnixTcpClientP2P, UnixTcpServerP2P
from atexit import register
from .file_lock import FileLock
def pid_exist(pid_str):
pid = int(pid_str)
return psutil.pid_exists(pid)
def read_json(fp):
# create if not exist
if not os.path.exists(fp):
with open(fp, "w") as f:
pass
# try to read, otherwise reset
try:
with open(fp, "r+") as f:
json_data = json.load(f)
except:
json_data = {}
return json_data
def write_json(fp, buf):
with open(fp, "w") as f:
json.dump(buf, fp=f)
return
def create_eater(unix_path, party):
from .gpu_eater import GPU_Eater
proc = GPU_Eater(unix_path, party)
proc.daemon = True
proc.start()
class GpuHolder():
def __init__(self, device) -> None:
# try to communicate with gpu holder
unix_path = os.path.expanduser(f'~/HmapTemp/GpuLock/GpuEater_{device}')
try:
self.client = UnixTcpClientP2P(unix_path, obj='str')
success = self.client.send_and_wait_reply('link')
print('already have a GpuHolder online')
except:
print('creating GpuHolder')
create_eater(unix_path, device)
time.sleep(3)
print('creating Finished')
self.client = UnixTcpClientP2P(unix_path, obj='str')
success = self.client.send_and_wait_reply('link')
assert success == 'success'
def __del__(self):
if self.client is not None:
self.client.send_and_wait_reply('offline')
self.client.__del__()
def need_gpu(self):
ok = self.client.send_and_wait_reply('need_gpu')
assert ok == 'ok'
def giveup_gpu(self):
ok = self.client.send_and_wait_reply('giveup_gpu')
assert ok == 'ok'
class GpuShareUnit():
fresh = True
def __init__(self, which_gpu, lock_path=None, manual_gpu_ctl=True, gpu_party='', gpu_ensure_safe=False):
self.device = which_gpu
self.manual_gpu_ctl = True
self.lock_path=lock_path
self.gpu_party = gpu_party
self.gpu_lock = None
self.ensure_gpu_safe = gpu_ensure_safe
self.pid_str = str(os.getpid())
self.n_gpu_process_online = 1
if self.ensure_gpu_safe:
assert 'party0' in self.gpu_party; assert 'cuda' in self.gpu_party
self.gpu_eater = GpuHolder(device=self.gpu_party)
if gpu_party == 'off':
self.manual_gpu_ctl = False
# the default file lock path
if self.lock_path is None:
self.lock_path = os.path.expanduser('~/HmapTemp/GpuLock')
# create a folder if the path is invalid
if not os.path.exists(self.lock_path):
os.makedirs(self.lock_path)
# gpu party register file
self.register_file = self.lock_path+'/lock_gpu_%s_%s.json'%(self.device, self.gpu_party)
register(self.__del__)
def __del__(self):
if hasattr(self,'_deleted_'):
# avoid exit twice
return
else:
self._deleted_ = True # avoid exit twice
try:
with FileLock(self.register_file+'.lock'):
self.unregister_pid()
except:
pass
try: self.gpu_lock.__exit__(None,None,None)
except:pass
def __enter__(self):
self.get_gpu_lock()
return self
def __exit__(self, exc_type, exc_value, traceback):
self.release_gpu_lock()
def get_gpu_lock(self):
if self.manual_gpu_ctl:
print('Waiting for GPU %s %s...'%(self.device, self.gpu_party), end='', flush=True)
with FileLock(self.register_file+'.lock'):
self.n_gpu_process_online = self.register_pid()
fp = self.lock_path+'/gpu_lock_%s_%s'%(self.device, self.gpu_party)
self.gpu_lock = FileLock(fp+'.lock')
self.gpu_lock.__enter__()
if self.ensure_gpu_safe: self.gpu_eater.need_gpu()
print('Got GPU, currently shared with %d process(es)!'%self.n_gpu_process_online)
return
def release_gpu_lock(self):
if self.manual_gpu_ctl:
# if self.n_gpu_process_online > 1:
torch.cuda.empty_cache()
if self.ensure_gpu_safe: self.gpu_eater.giveup_gpu()
self.gpu_lock.__exit__(None,None,None)
# else:
# print('GPU not shared')
return
def register_pid(self):
all_pids = read_json(self.register_file)
need_write = False
# check all pid alive occasionally
if GpuShareUnit.fresh or random.random() < 0.05:
for pid in list(all_pids.keys()):
if not pid_exist(pid):
all_pids.pop(pid); print('removing dead item', pid)
need_write = True
GpuShareUnit.fresh = False
# add entry if not exist
if self.pid_str not in all_pids:
all_pids[self.pid_str] = {}
need_write = True
# write back if needed
if need_write: write_json(self.register_file, all_pids)
return len(all_pids)
def unregister_pid(self):
all_pids = read_json(self.register_file)
# check all pid alive
for pid in list(all_pids.keys()):
if not pid_exist(pid):
all_pids.pop(pid); print('removing dead item', pid)
try:
all_pids.pop(self.pid_str)
except:
pass
# write back if needed
write_json(self.register_file, all_pids)
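# Usage sketch (hypothetical call site): GpuShareUnit is a context manager that
# serializes GPU access among processes registered to the same party, e.g.
# with GpuShareUnit('cuda:0', gpu_party='cuda0_party0'):
#     pass # run the GPU-heavy part of a training iteration here
# torch.cuda.empty_cache() is called on release so the next process can allocate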
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/hidden_print.py
================================================
import sys, os
class HiddenPrints:
def __enter__(self):
self._original_stdout = sys.stdout
sys.stdout = open(os.devnull, 'w')
def __exit__(self, exc_type, exc_val, exc_tb):
sys.stdout.close()
sys.stdout = self._original_stdout
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/hmp_daemon.py
================================================
import time, requests, threading, os, atexit, psutil
from UTIL.colorful import *
def kill_process(p):
try:
# print('sending terminate signal to process:', os.getpid(), '-->', p.pid)
p.terminate()
_, alive = psutil.wait_procs([p,], timeout=0.01) # wait 10 ms first
if len(alive):
_, alive = psutil.wait_procs(alive, timeout=0.10) # then wait another 100 ms
if len(alive):
# print('\t (R1) the process ignored terminate, sending kill -9 to:', os.getpid(), '-->', p.pid)
for p in alive: p.kill()
else:
# print('\t (R2) process terminated successfully')
pass
else:
# print('\t (R2) process terminated successfully')
pass
except Exception as e:
print(e)
except Exception as e:
print(e)
def kill_process_and_its_children(p):
p = psutil.Process(p.pid) # p might be Python's process, convert to psutil's process
if len(p.children())>0:
# print('has child processes')
for child in p.children():
if hasattr(child,'children') and len(child.children())>0:
kill_process_and_its_children(child)
else:
kill_process(child)
else:
pass
# print('no child processes')
kill_process(p)
def kill_process_children(p):
p = psutil.Process(p.pid) # p might be Python's process, convert to psutil's process
if len(p.children())>0:
# print('has child processes')
for child in p.children():
if hasattr(child,'children') and len(child.children())>0:
kill_process_and_its_children(child)
else:
kill_process(child)
else:
pass
# print('no child processes')
def clean_child_process(pid):
parent = psutil.Process(pid)
kill_process_children(parent)
def hmp_clean_up():
from UTIL.exp_helper import upload_exp
from config import GlobalConfig as cfg
print亮黄('[main.py] upload results to storage server via SSH')
if cfg.allow_res_upload: upload_exp(cfg)
print亮黄('[main.py] kill all child processes, then self-terminate.')
clean_child_process(os.getpid())
def start_periodic_daemon(cfg):
print('[hmp_daemon.py] Periodic daemon disabled for debugging.')
return # note: everything below in this function is currently unreachable
periodic_thread = threading.Thread(target=periodic_daemon,args=(cfg,))
periodic_thread.setDaemon(True)
periodic_thread.start()
for i in range(100):
time.sleep(1)
print(i)
atexit.register(hmp_clean_up)
def periodic_daemon(cfg):
while True:
try:
print('start periodic_daemon_(cfg)')
periodic_daemon_(cfg)
print('end periodic_daemon_(cfg)')
except AssertionError:
hmp_clean_up()
except BaseException:
print('hmp server failed')
break
time.sleep(15*60)
def periodic_daemon_(cfg):
report = {
'type': 'hmp-client',
'note': cfg.note,
'time': time.strftime("%Y-%m-%d-%H:%M:%S", time.localtime()),
'client_status': 'Running',
'StartingTime': cfg.machine_info['StartDateTime'],
'HostIP': cfg.machine_info['HostIP'],
'ExpUUID': cfg.machine_info['ExpUUID'],
'RunPath':cfg.machine_info['RunPath'],
'DockerContainerHash':cfg.machine_info['DockerContainerHash']
}
res = requests.post('http://linux.ipv4.fuqingxu.top:11511/',data = report)
if res.text=='Stop_Now':
report['client_status'] = 'Terminate'
requests.post('http://linux.ipv4.fuqingxu.top:11511/',data = report)
raise AssertionError('HMP-Center Has Given Terminate Signal!')
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/legacy/gpu_share_unfin.py
================================================
import flock, os, torch, uuid, time, glob
from atexit import register
class GpuShareUnit():
def __init__(self, which_gpu, lock_path=None, manual_gpu_ctl=True, gpu_party=''):
self.device = which_gpu
self.manual_gpu_ctl = True
self.lock_path=lock_path
self.gpu_party = gpu_party
self.experiment_uuid = uuid.uuid1().hex + '\n'
self.n_gpu_process_online = 1
self.flag_req_all_party = False
self.parties_req = None # if one party's GPU memory is insufficient, queue several parties at once to obtain memory
if str(gpu_party).upper() == 'OFF' or (isinstance(gpu_party, int) and gpu_party < 0):
self.manual_gpu_ctl = False
if self.lock_path is None:
self.lock_path = os.path.expanduser('~/GpuLock')
if not os.path.exists(self.lock_path): os.makedirs(self.lock_path)
register(self.unregister_uuids_)
def __exit__(self, exc_type, exc_value, traceback):
if not self.flag_req_all_party:
self.release_gpu_lock()
def __enter__(self):
self._get_gpu_locks()
return self
def __del__(self):
self.unregister_uuids_()
def _get_gpu_locks(self):
if not self.flag_req_all_party:
self.parties_req = None
self.__get_gpu_lock(self.device, self.gpu_party)
else:
self.parties_req = self.__find_all_active_party(self.device)
if not (self.gpu_party in self.parties_req): self.parties_req.append(self.gpu_party)
for each_party in self.parties_req: self.__get_gpu_lock(self.device, each_party)
def __find_all_active_party(self, device):
list_of_active_parties = []
for indx in range(64):
res = self.___get_party_n_share(device, gpu_party=str(indx))
if res is None:
break
if res == 0:
break
if res >0:
list_of_active_parties.append(str(indx))
return list_of_active_parties
def __get_gpu_lock(self, device, gpu_party):
if self.manual_gpu_ctl:
print('Waiting for GPU %s %s...'%(device, gpu_party), end='', flush=True)
gpu_lock, gpu_lock_file = (None, None)
self.n_gpu_process_online = self.register_uuid_(device, gpu_party)
self.gpu_lock_file = open(self.lock_path+'/lock_gpu_%s_%s.glock'%(device, gpu_party), 'w+')
self.gpu_lock = flock.Flock(self.gpu_lock_file, flock.LOCK_EX)
self.gpu_lock.__enter__()
print('Got GPU, currently shared with %d process(es)!'%self.n_gpu_process_online)
return
def release_gpu_lock(self):
self.flag_req_all_party = False
if self.manual_gpu_ctl:
if self.n_gpu_process_online >1:
torch.cuda.empty_cache()
self.gpu_lock.__exit__(None,None,None)
self.gpu_lock_file.close()
else:
print('GPU not shared')
return
def ___get_party_n_share(self, device, gpu_party):
try:
flag = 'r'
with open(self.lock_path+'/lock_gpu_%s_%s.register'%(device, gpu_party), mode=flag) as gpu_register_file:
_lock = flock.Flock(gpu_register_file, flock.LOCK_EX); _lock.__enter__()
lines = gpu_register_file.readlines()
_lock.__exit__(None,None,None)
return len(lines)
except:
return None
def register_uuid_(self, device, gpu_party):
try:
flag = 'w+' if not os.path.exists(self.lock_path+'/lock_gpu_%s_%s.register'%(device, gpu_party)) else 'r+'
with open(self.lock_path+'/lock_gpu_%s_%s.register'%(device, gpu_party), mode=flag) as gpu_register_file:
_lock = flock.Flock(gpu_register_file, flock.LOCK_EX); _lock.__enter__()
lines = gpu_register_file.readlines()
if not any([line==self.experiment_uuid for line in lines]):
lines.append(self.experiment_uuid)
gpu_register_file.seek(0); gpu_register_file.truncate(0)
gpu_register_file.writelines(lines)
gpu_register_file.flush()
_lock.__exit__(None,None,None)
return len(lines)
except:
print('GPU queue exception!')
return 999
def unregister_uuids_(self):
parties = self.parties_req if self.parties_req is not None else [self.gpu_party]
for each_party in parties:
self.unregister_uuid__(self.device, each_party)
try: self.gpu_lock.__exit__(None,None,None)
except:pass
try: self.gpu_lock_file.close()
except:pass
def unregister_uuid__(self, device, gpu_party):
flag = 'w+' if not os.path.exists(self.lock_path+'/lock_gpu_%s_%s.register'%(device, gpu_party)) else 'r+'
with open(self.lock_path+'/lock_gpu_%s_%s.register'%(device, gpu_party), mode=flag) as gpu_register_file:
_lock = flock.Flock(gpu_register_file, flock.LOCK_EX); _lock.__enter__()
lines = gpu_register_file.readlines()
gpu_register_file.seek(0); gpu_register_file.truncate(0)
gpu_register_file.writelines([line for line in lines if line!=self.experiment_uuid])
gpu_register_file.flush()
_lock.__exit__(None,None,None)
print('unregister')
def req_all_party(self):
self.flag_req_all_party = True
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/mem_watcher_ue.py
================================================
def validate_path():
import os, sys
# '/home/hmp/xx/hmp2g-heterogeneous-phase2/UTIL'
dir_name = os.path.dirname(__file__)
# '/home/hmp/xx/hmp2g-heterogeneous-phase2'
root_dir_assume = os.path.abspath(os.path.dirname(__file__) + '/..')
# change working dir
os.chdir(root_dir_assume)
# import root
sys.path.append(root_dir_assume)
validate_path()
import time, requests, threading, os, atexit, psutil
from UTIL.colorful import *
def thread_dfs(p, depth=0, info=None):
try:
if isinstance(p, int):
p = psutil.Process(p)
elif isinstance(p, psutil.Process):
pass
else:
p = psutil.Process(p.pid)
pp = p
print_info(depth, pp, info)
if len(p.children())>0:
# print('有子进程')
for child in p.children():
if hasattr(child,'children') and len(child.children())>0:
thread_dfs(child, depth = depth+1, info=info)
else:
pp = child
print_info(depth+1, pp, info)
else:
pass
except:
return
def print_info(depth, pp, info=None):
pid = pp.pid
name = pp.name()
name_trim = 'HmapShmPoolWorker' if name.startswith('HmapShmPoolWorker') else name
mem = (psutil.Process(pid).memory_info().rss / 1024 / 1024 / 1024)
info['tot_mem'] += mem
info['tot_procs'] += 1
if name_trim not in info:
info[name_trim] = {
'mem':0,
'procs':0,
}
info[name_trim]['mem'] += mem
info[name_trim]['procs'] += 1
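# After a full thread_dfs pass the accumulator looks roughly like this
# (illustrative values; callers must pre-seed 'tot_mem' and 'tot_procs'):
# info = {'tot_mem': 12.3, 'tot_procs': 5,
#         'UE4Editor.exe': {'mem': 8.1, 'procs': 1},
#         'HmapShmPoolWorker': {'mem': 4.2, 'procs': 4}}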
def find_procs_by_name(name):
"Return a list of processes matching 'name'."
ls = []
for p in psutil.process_iter(["name", "exe", "cmdline"]):
if name == p.info['name'] or \
p.info['exe'] and os.path.basename(p.info['exe']) == name or \
p.info['cmdline'] and p.info['cmdline'][0] == name:
ls.append(p)
return ls[0]
if __name__ == "__main__":
from VISUALIZE.mcom import mcom
mcv = mcom(
path='TEMP',
digit=-1,
rapid_flush=True, draw_mode='Img'
)
def main(root_name = 'UE4Editor.exe'):
proc = find_procs_by_name(root_name)
mem = (proc.memory_info().rss / 1024 / 1024 / 1024)
mcv.rec(mem, 'mem')
mcv.rec_show()
while True:
main()
time.sleep(10) # sample every 10 seconds
# time.sleep(300) # or sample every 5 minutes
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/memleak_finder.py
================================================
from pympler import tracker
tr = tracker.SummaryTracker()
def memdb_print_diff():
tr.print_diff()
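# Usage sketch (not part of the original module): call memdb_print_diff() at
# two points in a loop; pympler's SummaryTracker prints the object-count and
# size delta since the previous call, which narrows down leaking object types.
#
# from UTIL.memleak_finder import memdb_print_diff
# for episode in range(100):
#     run_episode()        # hypothetical workload
#     memdb_print_diff()   # objects allocated since the last call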
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/mprofile.py
================================================
import subprocess
import threading
import copy, os
import time
import json
from UTIL.colorful import *
# test sync to github
# ubuntu command to kill process: kill -9 $(ps -ef | grep fuqingxu |grep python | grep -v grep | awk '{print $ 2}')
arg_base = ['python', 'main.py']
log_dir = '%s/'%time.strftime("%Y-%m-%d-%H:%M:%S", time.localtime())
run_group = "bench"
# base_conf = 'train.json'
n_run = 4
n_run_mode = ['local', 'remote']
conf_override = {
"config.py->GlobalConfig-->note":
[
"train_origin_T(5itf) t5",
"train_origin_T(5itf) t6",
"train_origin_T(5itf) t7",
"train_origin_T(5itf) t8",
],
"MISSION.collective_assult.collective_assult_parallel_run.py->ScenarioConfig-->random_jam_prob":
[
0.05,
0.05,
0.05,
0.05,
],
"config.py->GlobalConfig-->seed":
[
22222221,
22222222,
22222223,
22222224,
],
"config.py->GlobalConfig-->device":
[
"cuda:0",
"cuda:1",
"cuda:2",
"cuda:3",
],
"config.py->GlobalConfig-->gpu_party":
[
"off",
"off",
"off",
"off",
],
}
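# How these override keys are consumed (see the json-generation loop below):
# each key has the form "<json-tree-path>--><field>", and run i takes the i-th
# value from the list, e.g.:
#   tree_path, item = "config.py->GlobalConfig-->seed".split('-->')
#   conf[tree_path][item] = conf_override["config.py->GlobalConfig-->seed"][i]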
base_conf = {
"config.py->GlobalConfig": {
# please checkout config.py for information
"note": "example experiment", # in case you forget the purpose of this trainning session, write a note
"env_name": "collective_assult", # which environment, see ./MISSION/env_router.py
"env_path": "MISSION.collective_assult", # path of environment
"draw_mode": "Img", # activate data plotting (Tensorboard is not used because I do not like it)
"num_threads": "64", # run N parallel envs, a 'env' is refered to as a 'thread'
"report_reward_interval": "64", # reporting interval
"test_interval": "2048", # test every $test_interval episode
"fold": "1", # this 'folding' is designed for IPC efficiency, you can thank python GIL for such a strange design...
"seed": 22222222, # seed controls pytorch and numpy
"backup_files": [ # backup files, pack them up
"example.jsonc",
"ALGORITHM/conc",
"MISSION/collective_assult/envs/collective_assult_env.py"
],
"device": "cuda:0", # choose from 'cpu' (no GPU), 'cuda' (auto select GPU), 'cuda:3' (manual select GPU)
# GPU memory is precious! assign multiple training process to a 'party', then they will share GPU memory
"gpu_party": "Cuda0-Party0", # default is 'off',
"upload_after_test": True
},
"UTIL.exp_upload.py->DataCentralServer": {
"addr": "172.18.112.16",
"usr": "fuqingxu",
"pwd": "clara"
},
"MISSION.collective_assult.collective_assult_parallel_run.py->ScenarioConfig": {
# please checkout ./MISSION/collective_assult/collective_assult_parallel_run.py for information
"size": "5",
"random_jam_prob": 0.05,
"introduce_terrain": "True",
"terrain_parameters": [
0.05,
0.2
],
"num_steps": "180",
"render": "False",
"render_with_unity": "False",
"MCOM_DEBUG": "False",
"render_ip_with_unity": "cn-cd-dx-1.natfrp.cloud:55861",
"half_death_reward": "True",
"TEAM_NAMES": [
"ALGORITHM.conc.foundation->ReinforceAlgorithmFoundation"
]
},
"ALGORITHM.conc.foundation.py->AlgorithmConfig": {
"n_focus_on": 2,
"actor_attn_mod": "False",
"extral_train_loop": "False",
"lr": 0.0005,
"ppo_epoch": 24,
"train_traj_needed": "64",
"load_checkpoint": False
}
}
assert '_' not in run_group, ('underscores render poorly')
exp_log_dir = log_dir+'exp_log'
if not os.path.exists('PROFILE/%s'%exp_log_dir):
os.makedirs('PROFILE/%s'%exp_log_dir)
exp_json_dir = log_dir+'exp_json'
if not os.path.exists('PROFILE/%s'%exp_json_dir):
os.makedirs('PROFILE/%s'%exp_json_dir)
new_json_paths = []
for i in range(n_run):
conf = copy.deepcopy(base_conf)
new_json_path = 'PROFILE/%s/run-%d.json'%(exp_json_dir, i+1)
for key in conf_override:
assert n_run == len(conf_override[key]), ('check that n_run matches each override list length')
tree_path, item = key.split('-->')
conf[tree_path][item] = conf_override[key][i]
with open(new_json_path,'w') as f:
json.dump(conf, f, indent=4)
print(conf)
new_json_paths.append(new_json_path)
final_arg_list = []
printX = [print红,print绿,print黄,print蓝,print紫,print靛,print亮红,print亮绿,print亮黄,print亮蓝,print亮紫,print亮靛]
for ith_run in range(n_run):
final_arg = copy.deepcopy(arg_base)
final_arg.append('--cfg')
final_arg.append(new_json_paths[ith_run])
final_arg_list.append(final_arg)
print('')
def worker(ith_run):
log_path = open('PROFILE/%s/run-%d.log'%(exp_log_dir, ith_run+1), 'w+')
printX[ith_run%len(printX)](final_arg_list[ith_run])
res = subprocess.run(final_arg_list[ith_run], stdout=log_path, stderr=log_path)
print('worker end')
def clean_process(pid):
import psutil
parent = psutil.Process(pid)
for child in parent.children(recursive=True):
try:
print亮红('sending Terminate signal to', child)
child.terminate()
time.sleep(5)
print亮红('sending Kill signal to', child)
child.kill()
except: pass
parent.kill()
def clean_up():
print亮红('clean up!')
parent_pid = os.getpid() # this process itself
clean_process(parent_pid)
if __name__ == '__main__':
input('Confirm execution?')
input('Confirm execution again!')
t = 0
while (t >= 0):
print('launch countdown:', t)
time.sleep(1)
t -= 1
threads = [ threading.Thread( target=worker,args=(ith_run,) ) for ith_run in range(n_run) ]
for thread in threads:
thread.daemon = True # setDaemon() is deprecated
thread.start()
print('staggered start, launching', thread)
DELAY = 3
for i in range(DELAY):
print('\r staggered start, launch countdown %d '%(DELAY-i), end='', flush=True)
time.sleep(1)
from atexit import register
register(clean_up)
while True:
is_alive = [thread.is_alive() for thread in threads]
if any(is_alive):
time_now = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
print(time_now, 'I am still running!', is_alive)
print靛('current script:%s, current log:%s'%(os.path.abspath(__file__), 'PROFILE/%s/run-%d.log'%(exp_log_dir, ith_run+1)))
time.sleep(60)
else:
break
print('[profile] All task done!')
for thread in threads:
thread.join()
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/mserver_launcher.sh
================================================
byobu new-session -d -s $USER
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/network.py
================================================
import socket, threading, pickle, uuid, os, atexit, time, json, psutil
from UTIL.file_lock import FileLock
port_finder = os.path.expanduser('~/HmapTemp') + '/PortFinder/find_free_port_no_repeat.json'
def check_pid(pid):
return psutil.pid_exists(pid)
# return True
# """ Check For the existence of a unix pid. """
# try:
# os.kill(pid, 0)
# except OSError:
# return False
# else:
# return True
def find_free_port():
from contextlib import closing
with closing(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
s.bind(('', 0))
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
return s.getsockname()[1]
def find_free_port_no_repeat():
fp = port_finder
def read():
if not os.path.exists(fp):
with open(fp, "w") as f: pass
try:
with open(fp, "r+") as f: ports_to_be_taken = json.load(f)
except:
ports_to_be_taken = {}
return ports_to_be_taken
def write(ports_to_be_taken):
# clean outdated
for port in list(ports_to_be_taken.keys()):
if not check_pid(ports_to_be_taken[port]['pid']):
ports_to_be_taken.pop(port)
print('removing dead item', port)
with open(fp, "w") as f:
json.dump(ports_to_be_taken, fp=f)
with FileLock(fp+'.lock'):
ports_to_be_taken = read()
while True:
new_port = find_free_port()
if str(new_port) not in ports_to_be_taken:
break
else:
print('port taken, trying another')
print('find port:', new_port)
ports_to_be_taken[str(new_port)] = {
'time': time.time(),
'pid': os.getpid(),
}
write(ports_to_be_taken)
def release_fn(port):
with FileLock(fp+'.lock'):
ports_to_be_taken = read()
if str(port) in ports_to_be_taken:
ports_to_be_taken.pop(str(port))
else:
pass
write(ports_to_be_taken)
return release_fn
import atexit
atexit.register(release_fn, port=new_port)
return new_port, release_fn
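# Usage sketch (hypothetical caller): release_fn is also registered with
# atexit above, so calling it manually is optional.
#   port, release = find_free_port_no_repeat()
#   server = UdpServer(('127.0.0.1', port), obj='pickle')
#   ...
#   release(port)   # free the registry slot early if the server shuts down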
def get_host_ip():
ip = None
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
s.connect(('8.8.8.8', 80)) # no packet is actually sent; if this fails, check the Internet connection
ip = s.getsockname()[0]
finally:
s.close()
return ip
BUFSIZE = 10485760
# ip_port = ('127.0.0.1', 9999)
DEBUG_NETWORK = False
class UdpServer:
def __init__(self, ip_port, obj='bytes') -> None:
self.ip_port = ip_port
self.server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
self.server.bind(self.ip_port)
self.most_recent_client = None
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
return
def wait_next_dgram(self):
data, self.most_recent_client = self.server.recvfrom(BUFSIZE)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('recv from :', self.most_recent_client, ' data :', data)
return data
def reply_last_client(self, data):
assert self.most_recent_client is not None
if DEBUG_NETWORK: print('reply_last_client :', self.most_recent_client, ' data :', data)
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.server.sendto(data, self.most_recent_client)
return
def __del__(self):
self.server.close()
return
class UdpTargetedClient:
def __init__(self, target_ip_port, obj='bytes') -> None:
self.target_ip_port = target_ip_port
self.client = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
return
def send_dgram_to_target(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.client.sendto(data, self.target_ip_port)
if DEBUG_NETWORK: print('send_targeted_dgram :', self.target_ip_port, ' data :', data)
return
def send_and_wait_reply(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.client.sendto(data, self.target_ip_port)
data, _ = self.client.recvfrom(BUFSIZE)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('get_reply :', self.target_ip_port, ' data :', data)
return data
# /////// test ipv4 udp
# import numpy as np
# server = UdpServer(ip_port, obj='pickle')
# client = UdpTargetedClient(ip_port, obj='pickle')
# def server_fn():
# data = server.wait_next_dgram()
# server.reply_last_client(np.array([4,5,6]))
# def client_fn():
# rep = client.send_and_wait_reply(np.array([1,2,3]))
# thread_hi = threading.Thread(target=server_fn)
# thread_hello = threading.Thread(target=client_fn)
# # start both threads
# thread_hi.start()
# thread_hello.start()
class UnixUdpServer:
def __init__(self, unix_path, obj='bytes') -> None:
try: os.makedirs(os.path.dirname(unix_path))
except: pass
self.unix_path = unix_path
self.server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
self.server.bind(self.unix_path)
self.most_recent_client = None
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
return
def wait_next_dgram(self):
data, self.most_recent_client = self.server.recvfrom(BUFSIZE)
if DEBUG_NETWORK: print('self.most_recent_client',self.most_recent_client)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('recv from :', self.most_recent_client, ' data :', data)
return data
def reply_last_client(self, data):
assert self.most_recent_client is not None
if DEBUG_NETWORK: print('reply_last_client :', self.most_recent_client, ' data :', data)
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.server.sendto(data, self.most_recent_client)
return
def __del__(self):
self.server.close()
os.unlink(self.unix_path)
return
class UnixUdpTargetedClient:
def __init__(self, target_unix_path, self_unix_path=None, obj='bytes') -> None:
self.target_unix_path = target_unix_path
if self_unix_path is not None:
self.self_unix_path = self_unix_path
else:
self.self_unix_path = target_unix_path+'_client_'+uuid.uuid1().hex[:5]
self.client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
self.client.bind(self.self_unix_path)
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
return
def send_dgram_to_target(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.client.sendto(data, self.target_unix_path)
if DEBUG_NETWORK: print('send_targeted_dgram :', self.target_unix_path, ' data :', data)
return
def send_and_wait_reply(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.client.sendto(data, self.target_unix_path)
data, _ = self.client.recvfrom(BUFSIZE)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('get_reply :', self.target_unix_path, ' data :', data)
return data
def __del__(self):
self.client.close()
os.unlink(self.self_unix_path)
return
# /////// test unix udp
# remote_uuid = uuid.uuid1().hex # use uuid to identify threads
# unix_path = 'TEMP/Sockets/unix/%s'%remote_uuid
# server = UnixUdpServer(unix_path, obj='pickle')
# client = UnixUdpTargetedClient(unix_path, obj='pickle')
# def server_fn():
# data = server.wait_next_dgram()
# server.reply_last_client(np.array([4,5,6]))
# def client_fn():
# rep = client.send_and_wait_reply(np.array([1,2,3]))
# thread_hi = threading.Thread(target=server_fn)
# thread_hello = threading.Thread(target=client_fn)
# # start both threads
# thread_hi.start()
# thread_hello.start()
class StreamingPackageSep:
def __init__(self):
self.buff = [b'']
self.myEOF = b'\xaa\x55\xaaHMP\xaa\x55' # those bytes follow 010101 or 101010 pattern
# self.myEOF = b'#A5@5A#' # the EOF string for frame seperation
def lower_send(self, data, connection):
if DEBUG_NETWORK: assert self.myEOF not in data, 'This is (almost) not possible!'
data = data + self.myEOF
if DEBUG_NETWORK: print('data length:', len(data))
connection.send(data)
def lowest_recv(self, connection):
while True:
recvData = connection.recv(BUFSIZE)
# ends_with_mark = recvData.endswith(self.myEOF)
split_res = recvData.split(self.myEOF)
assert len(split_res) != 0
if len(split_res) == 1:
# no end-of-frame marker; append the chunk to the last buffer entry
self.buff[-1] = self.buff[-1] + split_res[0]
if self.myEOF in self.buff[-1]: self.handle_flag_breakdown()
else:
n_split = len(split_res)
for i, r in enumerate(split_res):
self.buff[-1] = self.buff[-1] + r # append to the buffer
if i == 0 and (self.myEOF in self.buff[-1]):
# after the first append, myEOF emerged across the repaired data boundary!
self.handle_flag_breakdown()
if i != n_split-1:
# starts a new entry
self.buff.append(b'')
else:
# i == n_split-1, which is the last item
if r == b'': continue
if len(self.buff)>=2:
# complete frames assembled; take them out
buff_list = self.buff[:-1]
self.buff = self.buff[-1:]
return buff_list
# Fox-Protocol
def lower_recv(self, connection, expect_single=True):
buff_list = self.lowest_recv(connection)
if expect_single:
assert len(buff_list) == 1, ('got multiple frames at once, but expect_single=True, raising error.', buff_list)
return buff_list[0], connection
else:
return buff_list, connection
def handle_flag_breakdown(self):
split_ = self.buff[-1].split(self.myEOF)
assert len(split_)==2
self.buff[-1] = split_[0]
# starts a new entry
self.buff.append(b'')
self.buff[-1] = split_[1]
return
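# Framing example (illustrative): two logical messages sent back-to-back
# arrive over TCP as one byte stream; splitting on myEOF recovers them:
#   stream = b'hello' + self.myEOF + b'world' + self.myEOF
#   stream.split(self.myEOF)  ->  [b'hello', b'world', b'']
# the trailing b'' shows the stream ended exactly on a frame boundary.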
# send() is used for TCP SOCK_STREAM connected sockets, and sendto() is used for UDP SOCK_DGRAM unconnected datagram sockets
class UnixTcpServerP2P(StreamingPackageSep):
def __init__(self, unix_path, obj='bytes') -> None:
super().__init__()
try: os.makedirs(os.path.dirname(unix_path))
except: pass
self.unix_path = unix_path
self.server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.server.bind(self.unix_path)
self.server.listen()
self.most_recent_client = None
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
atexit.register(self.__del__)
def accept_conn(self):
conn, _ = self.server.accept()
return conn
def wait_next_dgram(self):
if self.most_recent_client is None: self.most_recent_client, _ = self.server.accept()
data, self.most_recent_client = self.lower_recv(self.most_recent_client)
if DEBUG_NETWORK: print('self.most_recent_client',self.most_recent_client)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('recv from :', self.most_recent_client, ' data :', data)
return data
def reply_last_client(self, data):
assert self.most_recent_client is not None
if DEBUG_NETWORK: print('reply_last_client :', self.most_recent_client, ' data :', data)
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.lower_send(data, self.most_recent_client)
return
def __del__(self):
self.server.close()
try: os.unlink(self.unix_path)
except: pass
return
class UnixTcpServerMultiClient(StreamingPackageSep):
def __init__(self, unix_path, obj='bytes') -> None:
super().__init__()
try: os.makedirs(os.path.dirname(unix_path))
except: pass
self.unix_path = unix_path
self.server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.server.bind(self.unix_path)
self.server.listen()
self.most_recent_client = None
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
self.on_receive_data = lambda data: data
atexit.register(self.__del__)
def serve_clients(self, most_recent_client):
while True:
data, most_recent_client = self.lower_recv(most_recent_client)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
reply = self.on_receive_data(data)
if self.use_pickle: reply = pickle.dumps(reply)
if self.convert_str: reply = bytes(reply, encoding='utf8')
self.lower_send(reply, most_recent_client)
if data == 'offline': break
def be_online(self):
while True:
most_recent_client, _ = self.server.accept()
t = threading.Thread(target=self.serve_clients, args=(most_recent_client, ))
t.daemon = True
t.start()
def __del__(self):
self.server.close()
try: os.unlink(self.unix_path)
except: pass
return
class UnixTcpClientP2P(StreamingPackageSep):
def __init__(self, target_unix_path, self_unix_path=None, obj='bytes') -> None:
super().__init__()
self.target_unix_path = target_unix_path
if self_unix_path is not None:
self.self_unix_path = self_unix_path
else:
self.self_unix_path = target_unix_path+'_client_'+uuid.uuid1().hex[:5]
self.client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
self.client.bind(self.self_unix_path)
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
self.connected = False
atexit.register(self.__del__)
def send_dgram_to_target(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
if not self.connected: self.client.connect(self.target_unix_path); self.connected = True
self.lower_send(data, self.client)
if DEBUG_NETWORK: print('send_targeted_dgram :', self.client, ' data :', data)
return
def send_and_wait_reply(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
if not self.connected: self.client.connect(self.target_unix_path); self.connected = True
self.lower_send(data, self.client)
data, _ = self.lower_recv(self.client)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('get_reply :', self.client, ' data :', data)
return data
def __del__(self):
self.client.close()
os.unlink(self.self_unix_path)
return
'''
remote_uuid = uuid.uuid1().hex # use uuid to identify threads
unix_path = 'TEMP/Sockets/unix/%s'%remote_uuid
server = UnixTcpServerP2P(unix_path, obj='pickle')
client = UnixTcpClientP2P(unix_path, obj='pickle')
def server_fn():
# data = server.wait_next_dgram()
# server.reply_last_client(np.array([4,5,6]))
while 1:
data = server.wait_next_dgram()
server.reply_last_client(data)
def client_fn():
# rep = client.send_and_wait_reply(np.array([1,2,3]))
while True:
buf = np.random.rand(100,1000)
rep = client.send_and_wait_reply(buf)
assert (buf==rep).all()
print('success')
thread_hi = threading.Thread(target=server_fn)
thread_hello = threading.Thread(target=client_fn)
# start both threads
thread_hi.start()
thread_hello.start()
'''
# send() is used for TCP SOCK_STREAM connected sockets, and sendto() is used for UDP SOCK_DGRAM unconnected datagram sockets
class TcpServerP2P(StreamingPackageSep):
def __init__(self, ip_port, obj='bytes') -> None:
super().__init__()
self.ip_port = ip_port
self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.server.bind(self.ip_port)
self.server.listen()
self.most_recent_client = None
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
atexit.register(self.__del__)
def accept_conn(self):
conn, _ = self.server.accept()
return conn
def manual_wait_connection(self):
if self.most_recent_client is None:
self.most_recent_client, _ = self.server.accept()
return
def wait_next_dgram(self):
if self.most_recent_client is None: self.most_recent_client, _ = self.server.accept()
data, self.most_recent_client = self.lower_recv(self.most_recent_client)
if DEBUG_NETWORK: print('self.most_recent_client',self.most_recent_client)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('recv from :', self.most_recent_client, ' data :', data)
return data
def wait_multi_dgrams(self):
if self.most_recent_client is None: self.most_recent_client, _ = self.server.accept()
data_list, self.most_recent_client = self.lower_recv(self.most_recent_client, expect_single=False)
if DEBUG_NETWORK: print('self.most_recent_client',self.most_recent_client)
if self.convert_str: data_list = [data.decode('utf8') for data in data_list]
if self.use_pickle: data_list = [pickle.loads(data) for data in data_list]
if DEBUG_NETWORK: print('recv from :', self.most_recent_client, ' data_list :', data_list)
return data_list
def reply_last_client(self, data):
assert self.most_recent_client is not None
if DEBUG_NETWORK: print('reply_last_client :', self.most_recent_client, ' data :', data)
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
self.lower_send(data, self.most_recent_client)
return
def __del__(self):
self.close()
return
def close(self):
self.server.close()
class TcpClientP2P(StreamingPackageSep):
def __init__(self, target_ip_port, self_ip_port=None, obj='bytes') -> None:
super().__init__()
self.target_ip_port = target_ip_port
self.client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.use_pickle = (obj=='pickle')
self.convert_str = (obj=='str')
self.connected = False
atexit.register(self.__del__)
def send_dgram_to_target(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
if not self.connected: self.client.connect(self.target_ip_port); self.connected = True
self.lower_send(data, self.client)
if DEBUG_NETWORK: print('send_targeted_dgram :', self.client, ' data :', data)
return
def manual_connect(self):
if not self.connected: self.client.connect(self.target_ip_port); self.connected = True
def send_and_wait_reply(self, data):
if self.use_pickle: data = pickle.dumps(data)
if self.convert_str: data = bytes(data, encoding='utf8')
if not self.connected: self.client.connect(self.target_ip_port); self.connected = True
self.lower_send(data, self.client)
data, _ = self.lower_recv(self.client)
if self.convert_str: data = data.decode('utf8')
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('get_reply :', self.client, ' data :', data)
return data
def __del__(self):
self.close()
return
def close(self):
self.client.close()
'''
ipport = ('127.0.0.1', 25453)
server = TcpServerP2P(ipport, obj='pickle')
client = TcpClientP2P(ipport, obj='pickle')
def server_fn():
data = server.wait_next_dgram()
server.reply_last_client(np.array([4,5,6]))
def client_fn():
rep = client.send_and_wait_reply(np.array([1,2,3]))
thread_hi = threading.Thread(target=server_fn)
thread_hello = threading.Thread(target=client_fn)
# start both threads
thread_hi.start()
thread_hello.start()
'''
class TcpClientP2PWithCompress(StreamingPackageSep):
def __init__(self, target_ip_port, self_ip_port=None, obj='bytes') -> None:
import lz4.block as lz4block
self.lz4block = lz4block
self.try_decom_usize = 255
super().__init__()
self.target_ip_port = target_ip_port
self.client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
self.use_pickle = (obj=='pickle')
assert not (obj=='str')
self.connected = False
atexit.register(self.__del__)
def decompress(self, data):
while True:
try:
decompressed = self.lz4block.decompress(data, uncompressed_size=self.try_decom_usize)
return decompressed
except:
self.try_decom_usize *= 2
if self.try_decom_usize > 10485760: # 10 MB
assert False, "compression failure"
return None
def compress(self, data):
compressed = self.lz4block.compress(data, store_size=False)
return compressed
def send_dgram_to_target(self, data):
if self.use_pickle: data = pickle.dumps(data)
# (obj='str' is rejected in __init__, so data is already bytes here)
if not self.connected: self.client.connect(self.target_ip_port); self.connected = True
data = self.compress(data)
self.lower_send(data, self.client)
if DEBUG_NETWORK: print('send_targeted_dgram :', self.client, ' data :', data)
return
def manual_connect(self):
if not self.connected: self.client.connect(self.target_ip_port); self.connected = True
def send_and_wait_reply(self, data):
if self.use_pickle: data = pickle.dumps(data)
if not self.connected: self.client.connect(self.target_ip_port); self.connected = True
data = self.compress(data)
self.lower_send(data, self.client)
data, _ = self.lower_recv(self.client)
data = self.decompress(data)
if self.use_pickle: data = pickle.loads(data)
if DEBUG_NETWORK: print('get_reply :', self.client, ' data :', data)
return data
def __del__(self):
self.close()
return
def close(self):
self.client.close()
class QueueOnTcpClient():
def __init__(self, ip):
TCP_IP, TCP_PORT = ip.split(':')
TCP_PORT = int(TCP_PORT)
ip_port = (TCP_IP, TCP_PORT)
self.tcpClientP2P = TcpClientP2P(ip_port, obj='str')
self.tcpClientP2P.manual_connect()
def send_str(self, b_msg):
self.tcpClientP2P.send_dgram_to_target(b_msg)
def close(self):
self.tcpClientP2P.close()
def __del__(self):
self.close()
class QueueOnTcpServer():
def __init__(self, ip_port):
self.tcpServerP2P = TcpServerP2P(ip_port, obj='str')
self.handler = None
self.queue = None
self.buff = ['']
def wait_connection(self):
self.tcpServerP2P.manual_wait_connection()
t = threading.Thread(target=self.listening_thread)
t.daemon = True
t.start()
def listening_thread(self):
while True:
buff_list = self.tcpServerP2P.wait_multi_dgrams()
if self.handler is not None:
self.handler(buff_list)
if self.queue is not None:
self.queue.put(buff_list)
def set_handler(self, handler):
self.handler = handler
def get_queue(self):
import queue
self.queue = queue.Queue()
return self.queue
def recv(self):
return
def close(self):
self.tcpServerP2P.close()
def __del__(self):
self.close()
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/pip_find_missing.py
================================================
#coding=utf-8
import glob,os,sys,re,subprocess,platform
def print红(*kw):
print("\033[0;31m",*kw,"\033[0m")
def print绿(*kw):
print("\033[0;32m",*kw,"\033[0m")
def print黄(*kw):
print("\033[0;33m",*kw,"\033[0m")
def print蓝(*kw):
print("\033[0;34m",*kw,"\033[0m")
def print紫(*kw):
print("\033[0;35m",*kw,"\033[0m")
def print靛(*kw):
print("\033[0;36m",*kw,"\033[0m")
def printX(*kw):
print("\033[0;38m",*kw,"\033[0m")
# run the install command through pip
def install(package):
try:
subprocess.check_call([sys.executable, "-m", "pip", "install",
"-i","https://pypi.tuna.tsinghua.edu.cn/simple",
"--progress-bar","emoji",
"--prefer-binary",
package])
except:
print红("执行命令 ", "pip", "install","-i","https://pypi.tuna.tsinghua.edu.cn/simple", package, "时,抛出错误")
pass
sys_name = platform.system()
if sys_name == "Windows":
try:
from colorama import init,Fore,Back,Style
init(autoreset=False)
def print红(*kw):
print(Fore.RED,*kw)
def print绿(*kw):
print(Fore.GREEN,*kw)
def print黄(*kw):
print(Fore.YELLOW,*kw)
def print蓝(*kw):
print(Fore.BLUE,*kw)
def print紫(*kw):
print(Fore.MAGENTA,*kw)
def print靛(*kw):
print(Fore.CYAN,*kw)
except:
install('colorama')
print('colorama installed! please run this script again!')
sys.exit(0)
"""
# step 1, 查询所有子路径.py脚本文件,列表
"""
py_script_list = glob.glob('./**/*.py', recursive=True)
required = []
local_name_list = {"None":False}
引发连锁错误的包_列表 = {"None":False}
"""
# step 2, 提取 import 以及 from *** import
"""
def 是否为工程内的文件交叉调用(包,python_file):
包_org = 包
if '.' not in 包:
res = os.path.exists("./"+包+".py")
if res:
return True,包_org
else:
return False, 包_org
if 包.startswith('.'):
包 = os.path.dirname(python_file).replace("/", ".").replace("..", ".") + 包
包_org = 包
包 = 包.replace(".", "/")
res = os.path.exists("./"+包+".py")
if res:
tmp = 包_org.split(".")
if tmp[0]!='': local_name_list[tmp[0]] = True
return True, 包_org
else:
return False, 包_org
for python_file in py_script_list:
with open(python_file,encoding='UTF-8') as f:
lines = f.readlines()
for line in lines:
if "import" in line or "from" in line:
t = line.split()
# line starts with 'from' or with 'import'
if t[0] == "import" or t[0] == "from":
i = 1
包 = ""
for ti in t[1:]:
if (ti!="import") and (ti!="as"):
包 = 包 + ti
else:
break
if "," in 包:
包_l = 包.split(",")
else:
包_l = [包]
for 包 in 包_l:
包_debug = 包
if 包_debug == '.':
continue
res,包 = 是否为工程内的文件交叉调用(包,python_file)
if not res:
required.append(包)
required = set(required)
required = sorted(required)
"""
# step 3, 尝试import,筛查缺失的包
"""
print黄("**************************************************************")
print黄("尝试import")
# 使用清华镜像
need_fix_cmd_orig = "pip install -i https://pypi.tuna.tsinghua.edu.cn/simple "
need_fix_cmd = "pip install -i https://pypi.tuna.tsinghua.edu.cn/simple "
need_fix_list = []
failed_cmd = []
chain_failed = []
for 包 in required:
cmd = "import "+包
try:
# if this line raises (rarely), some file contains a
# triple-quote-wrapped comment that begins with 'import';
# find that odd comment and delete it
exec(cmd)
except ImportError as error:
print红("error trying to do:",cmd,error.msg)
error_str = error.msg.split('\'')
package_import_error = (len(error_str) >= 2)
if not package_import_error:
continue
包_error = error_str[1]
if '.' in 包:
包_l = 包.split('.')
包_tmp = 包_l[0]
# the problem is not this package itself: it imports another package,
# and that inner import failed. It is merely a chain failure, nothing to fix here
if 包_error != 包:
print红("chained import error: ", error.msg)
chain_failed.append(error.msg)
cmd = cmd + "\t\tthis entry is caused only by a chained import error: " + error.msg
if 包_tmp not in 引发连锁错误的包_列表:
引发连锁错误的包_列表[包_tmp]=True
else:
# not a chain failure, so the package is genuinely missing
引发连锁错误的包_列表[包]=False
failed_cmd.append(cmd)
if '.' in 包:
包_l = 包.split('.')
包 = 包_l[0]
if len(包)>19: # some comment mixed in somehow
continue
need_fix_list.append(包)
except BaseException as error:
print红(error)
else:
print绿("this package is ok:",cmd)
need_fix_list = set(need_fix_list)
need_fix_list = sorted(need_fix_list)
if len(failed_cmd) > 0:
print红("以下的包import操作失败")
for cmd in failed_cmd:
print红(cmd)
"""
# step 4, 处理缺失的包,并找到对应的pip安装指令
"""
term_replace_dict = {
"cv2":"opencv-python",
"torch":"torch",
"mpi":"mpi4py",
"MPI":"mpi4py",
"mujoco_py":"None", # pip cannot install this????
"pybullet_envs":"None",
"stable_baselines3":"None",
"pyximport":"cython",
"PIL":"None",
"collective_assult":"None",
"gym_fortattack":"None",
"multiagent":"None",
"z_config":"None",
"gym_vecenv":"None"
}
for inx, 包 in enumerate(need_fix_list):
if 包 in term_replace_dict:
包 = term_replace_dict[包]
need_fix_list[inx] = 包
if (包 in local_name_list) or (包 in 引发连锁错误的包_列表 and 引发连锁错误的包_列表[包]==True):
need_fix_list[inx] = "None"
need_fix_list = set(need_fix_list)
need_fix_list = sorted(need_fix_list)
if len(need_fix_list) == 0:
print绿("所有依赖已就绪")
exit(0)
"""
# step 5, 如果有requirement.txt,从中提取出有用的版本信息
"""
print黄("**************************************************************")
print蓝("requirement.txt中的相关信息")
execute_fix = []
if os.path.exists("./requirements.txt"):
with open("./requirements.txt",encoding='UTF-8') as f:
lines = f.readlines()
for line in lines:
if line.startswith("-"):
print蓝("requirement.txt要求以下版本: --> "+"pip install "+line[:-1])
print蓝("首先git clone,然后找到setup.py的路径,然后执行 pip install --no-deps -e .")
continue
line_split = line.split("==")
if (len(line_split)==2) and (line_split[0] in need_fix_list):
print蓝("requirement.txt要求以下版本: --> "+"pip install "+line[:-1])
"""
# step 6, 如果需要安装pytorch,gym等特殊包,对应给出安装建议
"""
def config_anaconda():
with open(__file__,'r') as f:
conda_cmd = f.readlines()
condarc_lines = conda_cmd[-18:-2]
f = open('./.condarc','w+')
f.writelines(condarc_lines)
f.close()
print黄("**************************************************************")
try:
conda_env_name = sys.executable.split('/')[-3]
except:
conda_env_name = sys.executable.split('\\')[-2]
for 包 in need_fix_list:
if 包 == "torch":
print蓝("pytorch需要手动安装,pytorch 的安装方法(选择其一),然后重新运行该脚本:")
print蓝("conda install -n %s pytorch torchvision torchaudio cudatoolkit=10.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/"%conda_env_name)
print蓝("conda install -n %s pytorch torchvision torchaudio cudatoolkit=11.0 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/"%conda_env_name)
print蓝("")
sys.exit(0)
# if 包 == "tensorflow":
# print靛("Tensorflow需要手动安装,首先,更换conda源的指令")
# config_anaconda()
# print靛("cp",os.getcwd()+"/.condarc","~/.condarc")
# print靛("然后,安装TF一代的指令")
# print靛("conda install -n %s tensorflow-gpu=1.*"%conda_env_name)
# sys.exit(0)
if (包 is not "None"):
need_fix_cmd = need_fix_cmd + 包 + " "
execute_fix.append(包)
print黄("**************************************************************")
print绿(need_fix_cmd)
print黄("**************************************************************")
"""
# step 7, 对于除了特殊包之外的其他软件包,调用pip直接安装
"""
print绿("注意!当前的conda环境是:",conda_env_name," 所有操作都将只在该conda环境内生效")
input("执行自动安装?")
if input("确定执行自动安装?(y/n)")=='y':
for 包 in execute_fix:
install(包)
"""
# step 8, 完成任务,取消以下代码的注释,测试pytorch是否工作
"""
# import torch
# flag = torch.cuda.is_available()
# print(flag)
# ngpu= 1
# # Decide which device we want to run on
# device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
# print(device)
# print(torch.cuda.get_device_name(0))
# print(torch.rand(3,3).cuda())
'''
Do not modify or delete the content below!! config_anaconda() reads it from the end of this file!!
channels:
- defaults
show_channel_urls: true
channel_alias: https://mirrors.tuna.tsinghua.edu.cn/anaconda
default_channels:
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/pro
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/msys2
custom_channels:
conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
simpleitk: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
'''
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/shm_env.py
================================================
import numpy as np
import time
from MISSION.env_router import make_env_function
from UTIL.colorful import print亮红
N = lambda x: np.array(x)
# Here we use a pool of multiprocess workers to step a bundle of environments in sync
# SuperPool.add_target: in each process, initiate a class object named xxxx,
# example:
# self.SuperPool.add_target(name='env', lam=EnvWithRay, args_list=env_args_dict_list)
# SuperPool.exec_target: in each process, make the object (id by name) to call its method
# example:
# self.SuperPool.exec_target(name='env', dowhat='step', args_list=actions)
# self.SuperPool.exec_target(name='env', dowhat='reset')
# ! this class executes in the child process
# Ray is much slower compared to our shm/pipe solution,
# so we no longer use it, despite the class name
class EnvWithRay(object):
def __init__(self, env_args_dict):
env_name = env_args_dict['env_name']
proc_index = env_args_dict['proc_index']
env_init_fn = make_env_function(env_name=env_name, rank=proc_index)
self.env = env_init_fn()
# finally the env is initialized
self.observation_space = self.env.observation_space
self.action_space = self.env.action_space
self.echo = None
def __del__(self):
# print亮红('[shm_env.py] exec EnvWithRay exit')
if hasattr(self,'env'):
del self.env
def step(self, act):
if np.isnan(act).any():
# env is paused, skip by returning previous obs
assert self.echo is not None
return self.echo
# ! step here
ob, reward, done, info = self.env.step(act)
if isinstance(ob, list):
print('warning, ob is a list, which is inefficient')
ob = np.array(ob, dtype=object)
if np.any(done):
# if the environment is terminated,
# first, put terminal obs into 'info'
if info is None:
info = {'obs-echo':ob}
else:
assert isinstance(info, dict), ('oh? info is not a dictionary? did not expect that...')
info.update({'obs-echo': ob.copy()})
# second, automatically reset env
ob = self.env.reset()
if isinstance(ob, tuple):
# some env like starcraft return (ob, info) tuple at reset
# have info, then update info
ob, info_reset = ob
info = self.dict_update(info, info_reset)
# preserve an echo here,
# it will be used to handle an unexpected env pause
self.echo = [ob, reward, done, info]
# give everything back to main process
return (ob, reward, done, info)
def dict_update(self, info, info_reset):
for key in info_reset:
if key in info: info[key+'-echo'] = info.pop(key)
info.update(info_reset)
return info
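# e.g. dict_update({'win': 0}, {'win': 1}) -> {'win-echo': 0, 'win': 1};
# the pre-reset value is kept under a '-echo' suffix instead of being lost.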
def reset(self):
return self.env.reset()
def sleep(self):
return self.env.sleep()
def render(self):
return self.env.render()
def close(self):
return None
def get_act_space(self):
return self.action_space
def get_obs_space(self):
return self.observation_space
def get_act_space_str(self):
return str(self.action_space)
def get_obs_space_str(self):
return str(self.observation_space)
# ! this class executes in the main process
class SuperpoolEnv(object):
def __init__(self, process_pool, env_args_dict_list, spaces=None):
self.SuperPool = process_pool
self.num_envs = len(env_args_dict_list)
self.env_name_marker = env_args_dict_list[0][0]['marker']
self.env = 'env' + self.env_name_marker
self.SuperPool.add_target(name=self.env, lam=EnvWithRay, args_list=env_args_dict_list)
try:
self.observation_space = self.SuperPool.exec_target(name=self.env, dowhat='get_obs_space')[0]
self.action_space = self.SuperPool.exec_target(name=self.env, dowhat='get_act_space')[0]
except:
print亮红('Gym Space cannot be transferred between processes, using string instead')
self.observation_space = self.SuperPool.exec_target(name=self.env, dowhat='get_obs_space_str')[0]
self.action_space = self.SuperPool.exec_target(name=self.env, dowhat='get_act_space_str')[0]
# self.observation_space = self.SuperPool.exec_target(name=self.env, dowhat='get_obs_space_str')[0]
# self.action_space = self.SuperPool.exec_target(name=self.env, dowhat='get_act_space_str')[0]
return
def get_space(self):
return {'obs_space': self.observation_space, 'act_space': self.action_space}
def step(self, actions):
# ENV_PAUSE = [np.isnan(thread_act).any() for thread_act in actions]
results = self.SuperPool.exec_target(name=self.env, dowhat='step', args_list=actions)
obs, rews, dones, infos = zip(*results)
# if any(ENV_PAUSE):
# assert not all(ENV_PAUSE)
# return self.stack(ENV_PAUSE, obs, rews, dones, infos)
# else:
try:
return np.stack(obs), np.stack(rews), np.stack(dones), np.stack(infos)
except:
assert False, ('unaligned! ', obs, rews, dones)
def reset(self):
results = self.SuperPool.exec_target(name=self.env, dowhat='reset')
# [ env.reset.remote() for env in self.ray_env_vector])
if isinstance(results[0], tuple):
obs, infos = zip(*results)
return np.stack(obs), np.stack(infos)
else:
return np.stack(results)
def sleep(self):
self.SuperPool.exec_target(name=self.env, dowhat='sleep')
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/shm_pool.pyx
================================================
"""
Author: Fu Qingxu,CASIA
Description: Efficient parallel execting tool,
it resembles Ray but:
1.optimized for single machine using shared memory
2.optimized for numpy ndarray
3.use semaphore for IPC
4.faster!
Note:
SHARE_BUF_SIZE: shared memory size, 10MB per process
"""
import time, pickle, platform, setproctitle, numpy, copy, traceback
from multiprocessing import Process, RawValue, Semaphore
from multiprocessing import shared_memory
from .hmp_daemon import kill_process_and_its_children
from ctypes import c_bool, c_uint32
from sys import stdout
SHARE_BUF_SIZE = 10485760 # 10 MB for parameter buffer
REGULAR_BUF_SIZE = 500000 # The non-numpy content max buffer size
TRAFFIC_LIGHT_ERROR = 2
TRAFFIC_LIGHT_CHILD_BUSY = 1
TRAFFIC_LIGHT_CHILD_FREE = 0
# define Python user-defined exceptions
class ChildExitException(Exception):
pass
def print_red(*kw,**kargs):
print("\033[1;31m",*kw,"\033[0m",**kargs)
def print_green(*kw,**kargs):
print("\033[1;32m",*kw,"\033[0m",**kargs)
if not stdout.isatty():
print_green = print_red = print
# optimize share mem IO for numpy ndarray
class ndarray_indicator():
def __init__(self, shape, dtype, shm_start, shm_end):
self.shape = shape
self.dtype = dtype
self.shm_start = shm_start
self.shm_end = shm_end
self.count = (self.shm_end-self.shm_start)//self.dtype.itemsize
# optimize share mem IO for numpy ndarray
def convert_ndarray(numpy_ndarray, shm_pointer, shm):
nbyte = numpy_ndarray.nbytes
shape = numpy_ndarray.shape
dtype = numpy_ndarray.dtype
assert shm_pointer+nbyte < SHARE_BUF_SIZE, ('share memory overflow, need at least %d, yet only have %d'%(shm_pointer+nbyte, SHARE_BUF_SIZE))
shm_array_object = numpy.ndarray(shape, dtype=dtype, buffer=shm[shm_pointer:shm_pointer+nbyte])
shm_array_object[:] = numpy_ndarray[:]
NID = ndarray_indicator(shape, dtype, shm_pointer, shm_pointer+nbyte)
shm_pointer = shm_pointer+nbyte
return NID, shm_pointer
# optimize share mem IO for numpy ndarray
def deepin(obj, shm, shm_pointer):
if isinstance(obj, list): iterator_ = enumerate(obj)
elif isinstance(obj, dict): iterator_ = obj.items()
elif isinstance(obj, numpy.ndarray) and obj.dtype=='object': iterator_ = enumerate(obj)
else:
assert not isinstance(obj, tuple)
return shm_pointer
for k, v in iterator_:
if isinstance(v, (list,dict)) and len(v)>0:
shm_pointer = deepin(v, shm, shm_pointer)
elif isinstance(v, tuple):
item2 = list(v)
shm_pointer = deepin(item2, shm, shm_pointer)
obj[k] = tuple(item2)
elif isinstance(v, numpy.ndarray) and len(v)>0:
if v.dtype == 'object':
shm_pointer = deepin(v, shm, shm_pointer)
elif v.nbytes < 64:
pass
else:
NID, shm_pointer = convert_ndarray(v, shm_pointer, shm)
obj[k] = NID
else:
continue
return shm_pointer
# optimize share mem IO for numpy ndarray
def opti_numpy_object(obj, shm, shm_pointer=REGULAR_BUF_SIZE):
shm_pointer_terminal = deepin(obj, shm, shm_pointer)
return obj, shm_pointer_terminal
# optimize share mem IO for numpy ndarray
def reverse_deepin(obj, shm):
if isinstance(obj, list): iterator_ = enumerate(obj)
elif isinstance(obj, dict): iterator_ = obj.items()
elif isinstance(obj, numpy.ndarray) and obj.dtype == 'object': iterator_ = enumerate(obj)
else: return
for k, v in iterator_:
if isinstance(v, (list,dict)) and len(v)>0:
reverse_deepin(v, shm)
if isinstance(v, numpy.ndarray) and v.dtype == 'object' and len(v)>0:
reverse_deepin(v, shm)
elif isinstance(v, tuple):
item2 = list(v)
reverse_deepin(item2, shm)
obj[k] = tuple(item2)
elif isinstance(v, ndarray_indicator):
obj[k] = numpy.frombuffer(shm, dtype=v.dtype, offset=v.shm_start, count=v.count).reshape(v.shape)
return
# optimize share mem IO for numpy ndarray
def reverse_opti_numpy_object(obj, shm):
reverse_deepin(obj, shm)
return obj
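# Round-trip summary of the two helpers above: opti_numpy_object walks a
# nested container, copies every large ndarray (small ones under 64 bytes are
# left inline) into the shared-memory buffer and replaces it with a small
# ndarray_indicator, so pickle serializes only the indicators;
# reverse_opti_numpy_object swaps each indicator back for a zero-copy numpy
# view over the same shared buffer.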
class SuperProc(Process):
"""
Child process worker (efficient distributed worker)
"""
# initialize traffic IO
def __init__(self, index, smib, smiobli, smtl, buf_size_limit, base_seed, sem_push, sem_pull):
super(SuperProc, self).__init__()
self.shared_memory = smib
self.shared_memory_io_buffer = smib.buf
self.shared_memory_io_buffer_len_indicator = smiobli
self.shared_memory_traffic_light = smtl
self.buf_size_limit = buf_size_limit
self.local_seed = index + base_seed
self.index = index
self.sem_push = sem_push
self.sem_pull = sem_pull
self.target_tracker = []
# on parent exit
def __del__(self):
if hasattr(self,'_deleted_'): return # avoid exit twice
else: self._deleted_ = True # avoid exit twice
self.shared_memory.close()
for target_name in self.target_tracker:
setattr(self, target_name, None) # GC by clearing the pointer.
# force terminate all child process
try: kill_process_and_its_children(self)
except Exception as e: print_red('[shm_pool]: error occur when kill_process_and_its_children:\n', e)
# add any class level objects
def automatic_generation(self, name, gen_fn, *arg):
setattr(self, name, gen_fn(*arg))
# add any class level objects
def add_targets(self, new_tarprepare_args):
for new_target_arg in new_tarprepare_args:
name, gen_fn, arg = new_target_arg
if name not in self.target_tracker: self.target_tracker.append(name)
if arg is None:
self.automatic_generation(name, gen_fn)
elif isinstance(arg, tuple):
self.automatic_generation(name, gen_fn, *arg)
else:
self.automatic_generation(name, gen_fn, arg)
# execute any class method, return the results
def execute_target(self, recv_args):
res_list = [None] * len(recv_args)
for i, recv_arg in enumerate(recv_args):
name, dowhat, arg = recv_arg
if dowhat == 'None':
continue
if arg is None:
res = getattr(getattr(self, name), dowhat)()
elif isinstance(arg, tuple):
res = getattr(getattr(self, name), dowhat)(*arg)
else:
res = getattr(getattr(self, name), dowhat)(arg)
res_list[i] = res
return res_list
# inf loop, controlled / blocked by semaphore
def run(self):
# reset numpy seed
import numpy; numpy.random.seed(self.local_seed)
# set top process title
setproctitle.setproctitle('HmapShmPoolWorker_%d'%self.index)
try:
while True:
recv_args = self._recv_squence() # block and wait incoming req
if not isinstance(recv_args, list): # not list object, switch to helper channel
if recv_args == 0:
self._set_done()
self.add_targets(self._recv_squence())
self._set_done()
elif recv_args == -1:
self._set_done() # termination signal
break
else:
assert False, "unknown command"
continue
else:
# if list, execute target
result = self.execute_target(recv_args)
# return the results (self._set_done() is called inside)
self._send_squence(result)
except KeyboardInterrupt:
# 'child KeyboardInterrupt: close unlink'
self._demand_exit()
self.__del__()
except:
print_red(traceback.format_exc(), flush=True)
self._demand_exit()
self.__del__()
def _demand_exit(self):
self.shared_memory_traffic_light.value = TRAFFIC_LIGHT_ERROR # CORE! the job is done, waiting for next one
self.sem_pull.release()
# block and wait incoming req
def _recv_squence(self):
self.sem_push.acquire()
assert self.shared_memory_traffic_light.value == TRAFFIC_LIGHT_CHILD_BUSY
bufLen = self.shared_memory_io_buffer_len_indicator.value
recv_args = pickle.loads(self.shared_memory_io_buffer[:bufLen])
recv_args = reverse_opti_numpy_object(recv_args, shm=self.shared_memory_io_buffer)
return recv_args
# return results
def _send_squence(self, send_obj):
assert self.shared_memory_traffic_light.value == TRAFFIC_LIGHT_CHILD_BUSY
# second prepare parameter
send_obj, _ = opti_numpy_object(send_obj, shm=self.shared_memory_io_buffer)
picked_obj = pickle.dumps(send_obj, protocol=pickle.HIGHEST_PROTOCOL)
lenOfObj = len(picked_obj)
assert lenOfObj <= REGULAR_BUF_SIZE, ('The non-numpy content size > 0.5MB, please check!', lenOfObj)
self.shared_memory_io_buffer_len_indicator.value = lenOfObj
self.shared_memory_io_buffer[:lenOfObj] = picked_obj
# then light up the work flag, turn off the processed flag
self.shared_memory_traffic_light.value = TRAFFIC_LIGHT_CHILD_FREE # CORE! the job is done, waiting for next one
self.sem_pull.release()
# set traffic IO flag
def _set_done(self):
self.shared_memory_traffic_light.value = TRAFFIC_LIGHT_CHILD_FREE # CORE! the job is done, waiting for next one
self.sem_pull.release()
class SmartPool(object):
"""
Main parallel runner / coodinator
"""
# setup and spawn workers
def __init__(self, proc_num, fold, base_seed=None):
self.proc_num = proc_num
self.task_fold = fold
self.base_seed = int(numpy.random.rand()*1e5) if base_seed is None else base_seed
self.buf_size_limit = SHARE_BUF_SIZE # 10 MB for parameter buffer
print_green('Linux multi-env using shared memory')
setproctitle.setproctitle('HmapRootProcess')
self.shared_memory_io_buffer_handle = [shared_memory.SharedMemory(create=True, size=SHARE_BUF_SIZE) for _ in range(proc_num)]
self.shared_memory_io_buffer_len_indicator = [RawValue(c_uint32, 0) for _ in range(proc_num)]
self.shared_memory_traffic_light = [RawValue(c_uint32, False) for _ in range(proc_num)] # time to work flag
self.last_time_response_handled = [True for _ in range(proc_num)] # time to work flag
self.semaphore_push = [Semaphore(value=0) for _ in range(proc_num)] # time to work flag
self.semaphore_pull = Semaphore(value=0) # time to work flag
self.proc_pool = [SuperProc(cnt, smib, smiobli, smtl, SHARE_BUF_SIZE, self.base_seed,
sem_push, self.semaphore_pull)
for cnt, smib, smiobli, smtl, sem_push in
zip(range(proc_num),
self.shared_memory_io_buffer_handle, self.shared_memory_io_buffer_len_indicator,
self.shared_memory_traffic_light, self.semaphore_push
)]
self.shared_memory_io_buffer = [shm.buf for shm in self.shared_memory_io_buffer_handle]
self.t_profile = 0
for proc in self.proc_pool:
# proc.daemon = True
proc.start()
# add class level targets in each worker
def add_target(self, name, lam, args_list=None):
lam_list = None
if isinstance(lam, list): lam_list = lam
# send command for workers to wait appending new target
for j in range(self.proc_num):
self._send_squence(send_obj=0, target_proc=j)
self.notify_all_children()
for j in range(self.proc_num): self._wait_done(j)
for j in range(self.proc_num):
tuple_list_to_be_send = []
for i in range(self.task_fold):
name_fold = name + str(i)
args = None if args_list is None else args_list[i + j*self.task_fold]
if lam_list is not None: lam = lam_list[i + j*self.task_fold]
tuple_list_to_be_send.append((name_fold, lam, args))
self._send_squence(send_obj=tuple_list_to_be_send, target_proc=j)
self.notify_all_children()
for j in range(self.proc_num): self._wait_done(j)
# run class method in each worker
def exec_target(self, name, dowhat, args_list = None, index_list = None, ensure_safe = False):
if index_list is not None:
for j in range(self.proc_num):
tuple_list_to_be_send = []
for i in range(self.task_fold):
n_thread = i + j*self.task_fold
name_fold = name + str(i)
if n_thread in index_list:
args = None if args_list is None else args_list[index_list.index(n_thread)]
tuple_list_to_be_send.append((name_fold, dowhat, args))
else:
tuple_list_to_be_send.append((name_fold, 'None', 'None'))
self._send_squence(send_obj=tuple_list_to_be_send, target_proc=j, ensure_safe=ensure_safe)
self.semaphore_push[j].release()
else: # if index_list is None:
for j in range(self.proc_num):
tuple_list_to_be_send = []
for i in range(self.task_fold):
name_fold = name + str(i)
args = None if args_list is None else args_list[i + j*self.task_fold]
tuple_list_to_be_send.append((name_fold, dowhat, args))
self._send_squence(send_obj=tuple_list_to_be_send, target_proc=j, ensure_safe=ensure_safe)
self.semaphore_push[j].release()
res_sort = self._recv_squence_all()
return res_sort
# low-level send
def _send_squence(self, send_obj, target_proc, ensure_safe=False):
assert self.last_time_response_handled[target_proc] == True
send_obj, shm_pointer = opti_numpy_object(send_obj, shm=self.shared_memory_io_buffer[target_proc])
picked_obj = pickle.dumps(send_obj, protocol=pickle.HIGHEST_PROTOCOL)
lenOfObj = len(picked_obj)
assert lenOfObj <= REGULAR_BUF_SIZE, ('The non-numpy content size > 0.5MB, please check!', lenOfObj)
self.shared_memory_io_buffer_len_indicator[target_proc].value = lenOfObj
self.shared_memory_io_buffer[target_proc][:lenOfObj] = picked_obj
self.last_time_response_handled[target_proc] = False # then light up the work flag, turn off the processed flag
if ensure_safe and shm_pointer != REGULAR_BUF_SIZE:
send_obj = reverse_opti_numpy_object(send_obj, shm=self.shared_memory_io_buffer[target_proc])
self.shared_memory_traffic_light[target_proc].value = TRAFFIC_LIGHT_CHILD_BUSY
# low-level recv
def _recv_squence_all(self):
res_sort = [None] * (self.proc_num*self.task_fold)
not_ready = [True] * self.proc_num
n_acq = 0
ready_n = 0
while True:
self.semaphore_pull.acquire() # wait child process and OS coordination, it will take a moment
n_acq += 1
for target_proc, not_r in enumerate(not_ready):
if not not_r: continue # finish already
if self.shared_memory_traffic_light[target_proc].value == TRAFFIC_LIGHT_CHILD_BUSY: continue # not ready
if self.shared_memory_traffic_light[target_proc].value == TRAFFIC_LIGHT_ERROR: raise ChildExitException
bufLen = self.shared_memory_io_buffer_len_indicator[target_proc].value
recv_obj = pickle.loads(self.shared_memory_io_buffer[target_proc][:bufLen])
recv_obj = reverse_opti_numpy_object(recv_obj, shm=self.shared_memory_io_buffer[target_proc])
self.last_time_response_handled[target_proc] = True
res_sort[target_proc*self.task_fold: (target_proc+1)*self.task_fold] = recv_obj
not_ready[target_proc] = False
ready_n += 1
if ready_n == self.proc_num:
break
for _ in range(self.proc_num-n_acq):
self.semaphore_pull.acquire() # clear semaphore_pull
return res_sort
# low-level wait
def _wait_done(self, target_proc): # used only in add_target
self.semaphore_pull.acquire()
if self.shared_memory_traffic_light[target_proc].value == TRAFFIC_LIGHT_ERROR: raise ChildExitException
self.last_time_response_handled[target_proc] = True
# let all workers know about the incoming request
def notify_all_children(self):
for j in range(self.proc_num):
self.semaphore_push[j].release() # notify all child process
# exit and clean up carefully
def party_over(self):
self.__del__()
# exit and clean up carefully
def __del__(self):
if hasattr(self, 'terminated'):
return
# traceback.print_exc()
print_green('[shm_pool]: executing superpool del')
try:
for i in range(self.proc_num): self._send_squence(send_obj=-1, target_proc=i)
self.notify_all_children()
# print('[shm_pool]: self.notify_all_children()')
except: pass
# print('[shm_pool]: shm.close(); shm.unlink()')
for shm in self.shared_memory_io_buffer_handle:
try: shm.close(); shm.unlink()
except: pass
N_SEC_WAIT = 2
for i in range(N_SEC_WAIT):
print_red('[shm_pool]: terminate in %d'%(N_SEC_WAIT-i));time.sleep(1)
# kill shm_pool's process tree
# print_red('[shm_pool]: kill_process_and_its_children(proc)')
for proc in self.proc_pool:
try: kill_process_and_its_children(proc)
except Exception as e: pass # print_red('[shm_pool]: error occur when kill_process_and_its_children:\n', e)
print_green('[shm_pool]: __del__ finish')
self.terminated = True
# To compat Windows, redirect to pipe solution
if not platform.system()=="Linux":
from UTIL.win_pool import SmartPool
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/sync_exp.py
================================================
import torch, time
import pickle, os
from UTIL.colorful import print亮红
from .tensor_ops import __hash__
from UTIL.exp_helper import singleton
@singleton
class SynWorker:
def __init__(self, mod) -> None:
self.sychronize_FILE_hashdict = 'TEMP/sychronize_hashdict'
self.sychronize_FILE_cnt = 'TEMP/sychronize_cnt'
self.mod = mod
self.sychronize_internal_hashdict = {}
self.sychronize_internal_cnt = {}
self.follow_cnt = {}
print亮红('warning, SynWorker init, mod is', mod)
time.sleep(5)
if mod == 'follow':
with open(self.sychronize_FILE_hashdict, 'rb') as f:
self.sychronize_internal_hashdict = pickle.load(f)
with open(self.sychronize_FILE_cnt, 'rb') as f:
self.sychronize_internal_cnt = pickle.load(f)
else:
try:
os.remove(self.sychronize_FILE_hashdict)
os.remove(self.sychronize_FILE_cnt)
except: pass
def dump_sychronize_data(self):
if self.mod == 'follow':
return
with open(self.sychronize_FILE_hashdict, 'wb+') as f:
pickle.dump(self.sychronize_internal_hashdict, f)
with open(self.sychronize_FILE_cnt, 'wb+') as f:
pickle.dump(self.sychronize_internal_cnt, f)
def sychronize_experiment(self, key, data, reset_when_close=False):
if self.mod == 'lead':
hash_code = __hash__(data)
if key not in self.sychronize_internal_hashdict:
self.sychronize_internal_cnt[key] = 0
self.sychronize_internal_hashdict[key] = [
{
'hash_code':hash_code,
'data': data,
}
,
]
else:
self.sychronize_internal_hashdict[key].append({
'hash_code':hash_code,
'data': data,
})
self.sychronize_internal_cnt[key] += 1
        if self.mod == 'follow':
            hash_code = __hash__(data)
            if key not in self.follow_cnt:
                self.follow_cnt[key] = 0
            expected = self.sychronize_internal_hashdict[key][self.follow_cnt[key]]
            if hash_code != expected['hash_code']:
                # check isinstance first, otherwise torch.isclose would crash on non-tensor data
                if (not isinstance(data, torch.Tensor)) or (not torch.isclose(expected['data'], data).all()):
                    print('%s: error expected hash: %s, get hash %s, data %s'%(key,
                        expected['hash_code'],
                        hash_code,
                        str(data)
                        ))
                else:
                    print('%s: error, expected hash differs, but data is very very close (<1e-5)'%key)
                    if reset_when_close:
                        # reset the follower to the leader's recorded data
                        self.follow_cnt[key] += 1
                        return expected['data']
            self.follow_cnt[key] += 1
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/tensor_ops.py
================================================
import copy, json
import numpy as np
from functools import lru_cache
try:
import torch
import torch.nn.functional as F
except:
    print('warning, pytorch is not installed! all pytorch-related functions are unavailable!')
class torch():
Tensor = Exception
from functools import wraps
class ConfigCache(object):
def __init__(self) -> None:
super().__init__()
self.init = False
def read_cfg(self):
from config import GlobalConfig
if GlobalConfig.cfg_ready:
self.device_ = GlobalConfig.device
self.use_float64_ = GlobalConfig.use_float64
self.init = True
@property
def device(self):
if not self.init: self.read_cfg()
assert self.init, ('cuda_cfg not ready!')
return self.device_
@property
def use_float64(self):
if not self.init: self.read_cfg()
assert self.init, ('cuda_cfg not ready!')
return self.use_float64_
cuda_cfg = ConfigCache()
def pt_inf():
# if not cuda_cfg.init: cuda_cfg.read_cfg()
pt_dtype = torch.float64 if cuda_cfg.use_float64 else torch.float32
return torch.tensor(np.inf, dtype=pt_dtype, device=cuda_cfg.device)
def pt_nan():
# if not cuda_cfg.init: cuda_cfg.read_cfg()
pt_dtype = torch.float64 if cuda_cfg.use_float64 else torch.float32
return torch.tensor(np.nan, dtype=pt_dtype, device=cuda_cfg.device)
def vis_mat(mat):
    mat = mat.astype(float)  # np.float is removed in modern numpy
mat = mat - mat.min()
mat = mat / mat.max()
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
imgplot = plt.imshow(mat)
plt.xlabel("cols, 2rd dim")
plt.ylabel("lines, 1st dim")
plt.show()
"""
improve torch.repeat / torch.expand function
eg.1 x.shape = (4, 5, 6, 7); insert_dim = -1; n_times=666
y = repeat_at(x, insert_dim, n_times)
y.shape = (4, 5, 6, 7, 666)
eg.2 x.shape = (4, 5, 6, 7); insert_dim = +1; n_times=666
y = repeat_at(x, insert_dim, n_times)
y.shape = (4, 666, 5, 6, 7)
"""
def repeat_at(tensor, insert_dim, n_times, copy_mem=False):
if not isinstance(tensor, torch.Tensor):
return np_repeat_at(tensor, insert_dim, n_times)
tensor = tensor.unsqueeze(insert_dim)
shape = list(tensor.shape)
assert shape[insert_dim] == 1
shape[insert_dim] = n_times
    if copy_mem: return tensor.expand(*shape).contiguous()  # bug fix: materialize and return a real copy
return tensor.expand(*shape)
def np_repeat_at(array, insert_dim, n_times):
array = np.expand_dims(array, insert_dim)
return array.repeat(axis=insert_dim, repeats=n_times)
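# A quick shape sketch of the docstring examples above (illustrative only, not
# executed at import time):
#   x = torch.zeros(4, 5, 6, 7)
#   repeat_at(x, -1, 666).shape  # -> (4, 5, 6, 7, 666)
#   repeat_at(x, +1, 666).shape  # -> (4, 666, 5, 6, 7)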
def copy_clone(x):
if x is None:
return None
return (
x.clone()
if hasattr(x, "clone")
else x.copy()
if hasattr(x, "copy")
else copy.deepcopy(x)
)
"""
improve np.reshape and torch.view function
If a dim is assigned with 0, it will keep its original dimension
eg.1 x.shape = (4, 5, 6, 7); new_shape = [0, 0, -1]
y = my_view(x, new_shape)
y.shape = (4, 5, 6*7)
eg.2 x.shape = (4, 5, 6, 7); new_shape = [-1, 0, 0]
y = my_view(x, new_shape)
y.shape = (4*5, 6, 7)
eg.3 x.shape = (4, 5, 6); new_shape = [0, 0, -1, 3]
y = my_view(x, new_shape)
y.shape = [4, 5, 2, 3]
eg.4 x.shape = (3, 4, 5, 6); new_shape = [0, 2, -1, 0, 0]
y = my_view(x, new_shape)
y.shape = [3, 2, 2, 5, 6]
eg.5 x.shape = (32, 10, 24); new_shape = [32, 10, 24, 1]
y = my_view(x, new_shape)
y.shape = [32, 10, 24, 1]
Error eg.1
x.shape = (3, 4, 5, 6); new_shape = [0, 2, 0, -1, 0]
Error: 2(!=4) and -1 must stick together!
Fix 1: new_shape = [0, 2, 2, 0, 0]
Fix 2: new_shape = [0, 2, -1, 0, 0]
Fix 3: new_shape = [0, -1, 2, 0, 0]
After Fix: y.shape = [3, 2, 2, 5, 6]
Error eg.2
x.shape = (3, 4, 5, 6); new_shape = [12, 0, -1]
Error: 12(!=3) and -1 must stick together!
Fix 1: new_shape = [12, 0, 0]
Fix 2: new_shape = [12, -1, 6]
Fix 3: new_shape = [12, -1, 0]
Fix 4: new_shape = [-1, 0, 0]
After Fix: y.shape = [12, 5, 6]
"""
def my_view(x, shape):
# fill both way until meet -1
for i, dim in enumerate(shape):
if dim == 0: shape[i] = x.shape[i]
elif dim == -1: break
elif i >= len(x.shape): break # prevent x.shape[i] out of range
elif dim != x.shape[i]: break
for i in range(len(shape)):
if i >= len(x.shape): break # prevent x.shape[ni] out of range
ni = -(i + 1)
dim = shape[ni]
if dim == 0: shape[ni] = x.shape[ni]
elif dim == -1: break
# print(shape)
if isinstance(x, np.ndarray):
return x.reshape(*shape)
return x.view(*shape)
def add_onehot_id_at_last_dim(x):
if isinstance(x, np.ndarray):
return np_add_onehot_id_at_last_dim(x)
_hot_dim = x.shape[-2]
_identity = torch.tile(torch.eye(_hot_dim, device=x.device), (*x.shape[:-2], 1, 1))
return torch.cat((x, _identity), -1)
def np_add_onehot_id_at_last_dim(x):
_hot_dim = x.shape[-2]
_identity = np.tile(np.eye(_hot_dim), (*x.shape[:-2], 1, 1))
return np.concatenate((x, _identity), -1)
# x. shape = (..., core_dim)
# agent_ids.shape = (..., null)
# output. shape = (..., core_dim+fixlen)
def add_onehot_id_at_last_dim_fixlen(x, fixlen, agent_ids):
if agent_ids is None:
return add_onehot_id_at_last_dim(x)
# if isinstance(x, np.ndarray):
# return np_add_onehot_id_at_last_dim_fixlen(x, fixlen)
# manually control output vector length
# or
# adjust output vector length according to -2 dim
_identity = torch.eye(fixlen, device=x.device)[agent_ids]
return torch.cat((x, _identity), -1)
# def np_add_onehot_id_at_last_dim_fixlen(x, fixlen, agent_ids):
# _identity = np.tile(np.eye(fixlen), (*x.shape[:-2], 1, 1))
# return np.concatenate((x, _identity[..., :x.shape[-2], :]), -1)
"""
numpy corresponding to torch.nn.functional.one_hot
x is array, e.g. x = [4,2,3,1]
n is int, e.g. n=5
>> np_one_hot( np.array([4,2,3,1]), n=5)
np.array([
[0,0,0,0,1],
[0,0,1,0,0],
[0,0,0,1,0],
[0,1,0,0,0],
])
"""
def np_one_hot(x, n):
return np.eye(n)[x]
def add_obs_container_subject(container_emb, subject_emb, div):
# for subject, add one-hot embedding of its group
n_container = container_emb.shape[1]
subject_belonging_info = np_one_hot(div, n_container)
subject_out_emb = np.concatenate((subject_emb, subject_belonging_info), -1)
    # for container, add a multi-hot embedding of its subjects
    container_multihot = np.concatenate(
        [np.expand_dims((div == nth_container).astype(np.int64), 1)
for nth_container in range(n_container)],
1,
)
container_out_emb = np.concatenate((container_emb, container_multihot), -1)
return container_out_emb, subject_out_emb
def MayGoWrong(f):
@wraps(f)
def decorated(*args, **kwargs):
try:
return f(*args, **kwargs)
except:
print('going wrong!')
return f(*args, **kwargs)
return decorated
def dummy_decorator(f=None):
if callable(f):
@wraps(f)
def decorated(*args, **kwargs):
return f(*args, **kwargs)
return decorated
else:
def actual_decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
return func(*args, **kwargs)
return wrapper
return actual_decorator
"""
Function decorate,
Turning numpy array to torch.Tensor, then put it on the right GPU / CPU
"""
def Args2tensor(f):
# if not cuda_cfg.init: cuda_cfg.read_cfg()
def _2tensor(x):
if isinstance(x, torch.Tensor):
return x.to(cuda_cfg.device)
elif isinstance(x, np.ndarray):
if (not cuda_cfg.use_float64) and x.dtype == np.float64:
x = x.astype(np.float32)
if cuda_cfg.use_float64 and x.dtype == np.float32:
x = x.astype(np.float64)
return torch.from_numpy(x).to(cuda_cfg.device)
elif isinstance(x, dict):
y = {}
for key in x:
y[key] = _2tensor(x[key])
return y
else:
return x
@wraps(f)
def decorated(*args, **kwargs):
for key in kwargs:
kwargs[key] = _2tensor(kwargs[key])
return f(*(_2tensor(arg) for arg in args), **kwargs)
return decorated
def Return2numpy(f):
def _2cpu2numpy(x):
return (
None
if x is None
else x
if not isinstance(x, torch.Tensor)
else x.detach().cpu().numpy()
if x.requires_grad
else x.cpu().numpy()
)
@wraps(f)
def decorated(*args, **kwargs):
ret_tuple = f(*args, **kwargs)
if isinstance(ret_tuple, tuple):
return (_2cpu2numpy(ret) for ret in ret_tuple)
else:
return _2cpu2numpy(ret_tuple)
return decorated
"""
Function decorate,
Turning numpy array to torch.Tensor, then put it on the right GPU / CPU,
When returning, convert all torch.Tensor to numpy array
"""
def Args2tensor_Return2numpy(f):
def _2tensor(x):
if isinstance(x, torch.Tensor):
return x.to(cuda_cfg.device)
elif isinstance(x, np.ndarray) and x.dtype != 'object':
if (not cuda_cfg.use_float64) and x.dtype == np.float64:
x = x.astype(np.float32)
if cuda_cfg.use_float64 and x.dtype == np.float32:
x = x.astype(np.float64)
return torch.from_numpy(x).to(cuda_cfg.device)
elif isinstance(x, dict):
y = {}
for key in x:
y[key] = _2tensor(x[key])
return y
else:
return x
def _2cpu2numpy(x):
return (
None
if x is None
else x
if not isinstance(x, torch.Tensor)
else x.detach().cpu().numpy()
if x.requires_grad
else x.cpu().numpy()
)
@wraps(f)
def decorated(*args, **kwargs):
for key in kwargs:
kwargs[key] = _2tensor(kwargs[key])
ret_tuple = f(*(_2tensor(arg) for arg in args), **kwargs)
if not isinstance(ret_tuple, tuple):
return _2cpu2numpy(ret_tuple)
return (_2cpu2numpy(ret) for ret in ret_tuple)
return decorated
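# Usage sketch for the decorator above (hypothetical names, illustrative only):
# a decorated method accepts numpy arrays from the caller, sees them as tensors
# on cuda_cfg.device inside the body, and hands numpy arrays back to the caller.
#   class Policy:
#       @Args2tensor_Return2numpy
#       def act(self, obs):   # obs arrives here as a torch.Tensor
#           return obs * 2    # the returned tensor is converted back to np.ndarray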
"""
Turning torch.Tensor to numpy array, put it on CPU,
"""
def _2cpu2numpy(x):
return (
None
if x is None
else x
if not isinstance(x, torch.Tensor)
else x.detach().cpu().numpy()
if x.requires_grad
else x.cpu().numpy()
)
"""
Convert torch.Tensor to numpy array.
Turning numpy array to torch.Tensor, then put it on the right GPU / CPU.
"""
def _2tensor(x):
# if not cuda_cfg.init: cuda_cfg.read_cfg()
if isinstance(x, torch.Tensor):
return x.to(cuda_cfg.device)
elif isinstance(x, np.ndarray):
if (not cuda_cfg.use_float64) and x.dtype == np.float64:
x = x.astype(np.float32)
if cuda_cfg.use_float64 and x.dtype == np.float32:
x = x.astype(np.float64)
return torch.from_numpy(x).to(cuda_cfg.device)
elif isinstance(x, dict):
y = {}
for key in x:
y[key] = _2tensor(x[key])
return y
elif isinstance(x, torch.nn.Module):
x.to(cuda_cfg.device)
return x
else:
return x
"""
Stack arrays of different lengths, padding the empty places with NaN
"""
def pad_vec_array(arr_list, max_len):
# init to NaNs
res = np.zeros(shape=(len(arr_list), max_len), dtype=np.double) + np.nan
for i in range(len(arr_list)):
if arr_list[i] is None:
continue
res[i, : len(arr_list[i])] = arr_list[i]
return res
def one_hot_with_nan_np(tensr, num_classes):
tensr = tensr.copy()
tensr[np.isnan(tensr)] = num_classes
    Res_1MoreCol = np_one_hot(tensr.astype(np.int64), num_classes + 1)
return Res_1MoreCol[..., :-1]
def one_hot_with_nan(tensr, num_classes):
if isinstance(tensr, np.ndarray):
return one_hot_with_nan_np(tensr, num_classes)
tensr = tensr.clone()
tensr[torch.isnan(tensr)] = num_classes
Res_1MoreCol = F.one_hot(tensr.long(), num_classes + 1)
return Res_1MoreCol[..., :-1]
def scatter_with_nan(tensr, num_classes, out_type="binary"):
res = one_hot_with_nan(tensr, num_classes)
res = res.sum(-2)
if out_type == "bool":
res = res != 0
return res
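# Worked example (illustrative): one_hot_with_nan(np.array([1.0, np.nan]), 3)
# -> [[0, 1, 0], [0, 0, 0]]; NaN entries map to an all-zero row instead of raising.
# scatter_with_nan then sums such rows over dim -2, producing a multi-hot count
# vector; with out_type="bool" it becomes a boolean membership mask instead.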
"""
Not used anymore
"""
def process_space(space):
    # the StarCraft environment needs no special handling
    if not ("Box" in space["obs_space"] or "Discrete" in space["act_space"]):
        return space
    # other environments need a format conversion
import re
obs_dim = int(
re.findall(
re.compile(r"Box[(]-inf, inf, [(](.*?)[,)]", re.S), space["obs_space"]
)[0]
)
print(space["obs_space"])
space_ = {}
space_["obs_space"] = {}
space_["act_space"] = {}
space_["obs_space"]["state_shape"] = 8
space_["obs_space"]["obs_shape"] = obs_dim
space_["act_space"]["n_actions"] = 8
space_["obs_space"] = str(space_["obs_space"])
space_["act_space"] = str(space_["act_space"])
return space_
"""
Not used anymore
"""
class Policy_shift_observer(object):
def __init__(self, act_range, act_num):
self.act_range = act_range # 15
self.act_num = act_num # 3
self.act_cnt_array = np.zeros(shape=(act_num, act_range))
self.rate = None
self.rate_history = None
def new_sample(self, act):
act_rec = act.shape[0]
for act_index in range(self.act_num):
for act_nth in range(self.act_range):
self.act_cnt_array[act_index, act_nth] = torch.sum(
(act[:, act_index] == act_nth).long()
)
self.rate = self.act_cnt_array / act_rec
if self.rate_history is None:
self.rate_history = self.rate
else:
self.rate_history = self.rate_history * 0.9 + self.rate * 0.1
print("rate", self.rate)
    # conclusion: the action distribution is not reinforced because the reward signal is too weak.
"""
Get the hash code string of an array,
compatible with both numpy arrays and torch tensors
"""
def __hash__(x):
import hashlib
md5 = hashlib.md5() # ignore
# if isinstance(x, str):
# md5.update(x)
# return md5.hexdigest()
if hasattr(x, "cpu"):
md5.update(x.detach().cpu().numpy().data.tobytes())
return md5.hexdigest()
elif hasattr(x, "numpy"):
md5.update(x.numpy().data.tobytes())
return md5.hexdigest()
elif hasattr(x, "data"):
md5.update(x.data.tobytes())
return md5.hexdigest()
else:
try:
md5.update(x.encode("utf-8"))
return md5.hexdigest()
except:
return str(x)
def __hashm__(*args):
import hashlib
md5 = hashlib.md5() # ignore
for arg in args:
x = arg
if hasattr(x, "cpu"):
md5.update(x.detach().cpu().numpy().data.tobytes())
elif hasattr(x, "numpy"):
md5.update(x.numpy().data.tobytes())
elif hasattr(x, "data"):
md5.update(x.data.tobytes())
else:
try:
md5.update(x.encode("utf-8"))
except:
md5.update(str(x).encode("utf-8"))
return md5.hexdigest()
"""
Get the hash code string of the pytorch network parameters
eg.
__hashn__(mlp_module.parameters())
"""
def __hashn__(generator):
import hashlib
md5 = hashlib.md5() # ignore
for arg in generator:
x = arg.data
if hasattr(x, "cpu"):
md5.update(x.detach().cpu().numpy().data.tobytes())
elif hasattr(x, "numpy"):
md5.update(x.numpy().data.tobytes())
elif hasattr(x, "data"):
md5.update(x.data.tobytes())
else:
try:
md5.update(x.encode("utf-8"))
except:
md5.update(str(x).encode("utf-8"))
return md5.hexdigest()
"""
numpy version of softmax
"""
def np_softmax(x, axis=None):
# compute in log space for numerical stability
return np.exp(x - logsumexp(x, axis=axis, keepdims=True))
"""
numpy version of logsumexp
"""
def logsumexp(a, axis=None, keepdims=False, return_sign=False):
a_max = np.amax(a, axis=axis, keepdims=True)
if a_max.ndim > 0:
a_max[~np.isfinite(a_max)] = 0
elif not np.isfinite(a_max):
a_max = 0
tmp = np.exp(a - a_max)
# suppress warnings about log of zero
with np.errstate(divide="ignore"):
s = np.sum(tmp, axis=axis, keepdims=keepdims)
if return_sign:
sgn = np.sign(s)
s *= sgn # /= makes more sense but we need zero -> zero
out = np.log(s)
if not keepdims:
a_max = np.squeeze(a_max, axis=axis)
out += a_max
if return_sign:
return out, sgn
else:
return out
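# Worked example (illustrative): np_softmax(np.array([0.0, 0.0])) -> [0.5, 0.5].
# Because the computation runs in log space, large inputs remain stable:
# np_softmax(np.array([1000.0, 1000.0])) also returns [0.5, 0.5], where a naive
# exp(x) / sum(exp(x)) would overflow.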
"""
函数说明:在有限的、不均衡的多标签数据集中,按照预设的比例,取出尽可能多的样本
"""
def sample_balance(x, y, n_class, weight=None):
if weight is None:
weight = torch.ones(n_class, device=x.device)
else:
weight = torch.Tensor(weight).to(x.device)
n_instance = torch.zeros(n_class, device=x.device)
indices = [None] * n_class
for i in range(n_class):
indices[i] = torch.where(y == i)[0]
n_instance[i] = len(indices[i])
ratio = n_instance / weight
    bottle_neck = torch.argmin(ratio)
r = ratio[bottle_neck]
n_sample = (r * weight).long()
# print(n_instance, n_sample)
new_indices = [indices[i][torch.randperm(n_sample[i])] for i in range(n_class)]
# print(new_indices)
new_indices_ = torch.cat(new_indices)
assert len(new_indices_) == sum(n_sample)
return x[new_indices_], y[new_indices_]
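# Worked example (illustrative): with y holding 10 samples of class 0 and
# 4 samples of class 1, and weight=[1, 1], class 1 is the bottleneck, so
# n_sample becomes [4, 4] and 8 samples are returned in total.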
"""
gather tensor with index,
regarding all right hand dimensions as dimensions need to be gathered
eg.1
src = torch.Tensor([[[ 0, 1, 2], [ 3, 4, 5]],
[[ 6, 7, 8], [ 9, 10, 11]],
[[12, 13, 14], [15, 16, 17]]])
index = torch.Tensor([[0], [1], [0]])
src.shape = (3, 2, 3)
index.shape = (3, 1)
>> res = gather_righthand(src,index)
res.shape = (3, 1, 3)
res= tensor([[[ 0., 1., 2.]],
[[ 9., 10., 11.]],
[[12., 13., 14.]]])
eg.2
src.shape = (64, 16, 8, 88, 888)
index.shape = (64, 5)
>> res = gather_righthand(src,index)
res.shape = (64, 5, 8, 88, 888)
eg.3
src.shape = (64, 16, 88, 888)
index.shape = (64, 777)
>> res = gather_righthand(src,index)
res.shape = (64, 777, 88, 888)
"""
def gather_righthand(src, index, check=True):
if not isinstance(src, torch.Tensor):
return np_gather_righthand(src, index, check)
index = index.long()
i_dim = index.dim()
s_dim = src.dim()
t_dim = i_dim - 1
if check:
assert s_dim >= i_dim
assert index.max() <= src.shape[t_dim] - 1
if index.max() != src.shape[t_dim] - 1:
print(
"[gather_righthand] warning, index max value does not match src target dim"
)
assert (
src.shape[t_dim] != index.shape[t_dim]
), "Do you really want to select %d item out of %d?? If so, please set check=False." % (
index.shape[t_dim],
src.shape[t_dim],
)
for d in range(0, t_dim):
assert src.shape[d] == index.shape[d]
index_new_shape = list(src.shape)
index_new_shape[t_dim] = index.shape[t_dim]
for _ in range(i_dim, s_dim):
index = index.unsqueeze(-1)
    index_expand = index.expand(index_new_shape)  # only these two lines matter
    return torch.gather(
        src, dim=t_dim, index=index_expand
    )  # only these two lines matter
"""
numpy version of 'gather_righthand'
"""
def np_gather_righthand(src, index, check=True):
    index = index.astype(np.int64)
dim = lambda x: len(x.shape)
i_dim = dim(index)
s_dim = dim(src)
t_dim = i_dim - 1
if check:
assert s_dim >= i_dim
assert index.max() <= src.shape[t_dim] - 1, ("\tindex.max()=", index.max(), "\tsrc.shape[t_dim]-1=", src.shape[t_dim] - 1)
if index.max() != src.shape[t_dim] - 1:
print(
"[gather_righthand] warning, index max value does not match src target dim"
)
assert (
src.shape[t_dim] != index.shape[t_dim]
), "you really want to select %d item out of %d?" % (
index.shape[t_dim],
src.shape[t_dim],
)
for d in range(0, t_dim):
assert src.shape[d] == index.shape[d]
tile_shape = np.array(src.shape) # warning: careful when moving to pytorch
tile_shape[: (t_dim + 1)] = 1
for _ in range(i_dim, s_dim):
index = np.expand_dims(index, -1)
    index_expand = np.tile(
        index, tile_shape
    )  # numpy counterpart of index.expand(index_new_shape); only these two lines matter
    return np.take_along_axis(arr=src, indices=index_expand, axis=t_dim)
    # torch counterpart: torch.gather(src, dim=t_dim, index=index_expand); only these two lines matter
"""
reverse operation of 'gather_righthand'
"""
def scatter_righthand(scatter_into, src, index, check=True):
index = index.long()
i_dim = index.dim()
s_dim = src.dim()
t_dim = i_dim - 1
index_new_shape = list(src.shape)
index_new_shape[t_dim] = index.shape[t_dim]
for _ in range(i_dim, s_dim):
index = index.unsqueeze(-1)
    index_expand = index.expand(index_new_shape)  # only these two lines matter
return scatter_into.scatter(t_dim, index_expand, src)
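# Shape sketch (illustrative, mirroring eg.1 of gather_righthand above): with
# scatter_into.shape = (3, 2, 3), src.shape = (3, 1, 3), index.shape = (3, 1),
# each row of src is written back into scatter_into at the slot named by index
# along the gathered dim, i.e. the inverse of the gather above.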
"""
calculate distance matrix between two position vector array A and B, support 3d and 2d
test >>
A = np.array([ [0,0],
[1,1],])
B = np.array([ [0,-1],
[1, 0],
[0, 1],])
distance_mat_between(A, B) == [[1, 1, 1], [sqrt(5), 1, 1]] => shape = (2, 3)
"""
def distance_mat_between(A, B):
n_subject_a = A.shape[-2] # A (64, 3)
n_subject_b = B.shape[-2] # B (28, 3)
A = np.repeat(np.expand_dims(A, -2), n_subject_b, axis=-2) # =>(64, 28, 3)
B = np.repeat(np.expand_dims(B, -2), n_subject_a, axis=-2) # =>(28, 64, 3)
B = np.swapaxes(B, -2, -3) # =>(64, 28, 3)
    dis = A - B # =>(64, 28, 3)
dis = np.linalg.norm(dis, axis=-1)
return dis
"""
calculate distance matrix for a position vector array A, support 3d and 2d
"""
def distance_matrix(A):
    n_subject = A.shape[-2] # e.g. 100 for A of shape (64, 100, 2)
A = np.repeat(np.expand_dims(A, -2), n_subject, axis=-2) # =>(64, 100, 100, 2)
At = np.swapaxes(A, -2, -3) # =>(64, 100, 100, 2)
dis = At - A # =>(64, 100, 100, 2)
dis = np.linalg.norm(dis, axis=-1)
return dis
"""
calculate delta matrix for a position vector array A
"""
def delta_matrix(A):
    n_subject = A.shape[-2] # e.g. 100 for A of shape (64, 100, 2)
A = np.repeat(np.expand_dims(A, -2), n_subject, axis=-2) # =>(64, 100, 100, 2)
At = np.swapaxes(A, -2, -3) # =>(64, 100, 100, 2)
delta = At - A # =>(64, 100, 100, 2)
return delta
def np_normalize_last_dim(mat):
return mat / np.expand_dims(np.linalg.norm(mat, axis=-1) + 1e-16, axis=-1)
def dir2rad_old(delta_pos):
result = np.empty(delta_pos.shape[:-1], dtype=complex)
result.real = delta_pos[..., 0]
result.imag = delta_pos[..., 1]
rad_angle = np.angle(result)
# assert (dir2rad_new(delta_pos)==rad_angle).all()
return rad_angle
"""
arctan2, but support any batch
"""
def dir2rad(delta_pos):
return np.arctan2(delta_pos[..., 1], delta_pos[..., 0])
def dir3d_rad(delta_pos):
assert delta_pos.shape[-1]==3
xy = delta_pos[..., :2]
r1 = dir2rad(xy)
xy_norm = np.linalg.norm(xy, axis=-1)
r2 = dir2rad(np.stack((xy_norm, delta_pos[..., 2]),-1))
return np.stack((r1,r2), axis=-1)
def reg_deg(deg):
return (deg + 180) % 360 - 180
# make angles comparable
def reg_deg_at(deg, ref):
    return reg_deg(deg-ref) + ref
def reg_rad(rad):
# it's OK to show "RuntimeWarning: invalid value encountered in remainder"
return (rad + np.pi) % (2 * np.pi) - np.pi
# make angles comparable
def reg_rad_at(rad, ref):
return reg_rad(rad-ref) + ref
# the average of two angles (in rad)
def avg_rad(rad1, rad2):
return reg_rad_at(rad1, rad2)/2 + rad2/2
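# Worked example (illustrative): avg_rad(np.pi - 0.1, -np.pi + 0.1) -> -np.pi,
# the mid-angle across the +/-pi wraparound, whereas the naive arithmetic mean
# (rad1 + rad2) / 2 would wrongly give 0.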
def zeros_like_except_dim(array, except_dim, n):
shape_ = list(array.shape)
shape_[except_dim] = n
return torch.zeros(size=shape_, device=array.device, dtype=array.dtype)
def pad_at_dim(array, dim, n):
extra_n = n-array.shape[dim]
padding = zeros_like_except_dim(array, except_dim=dim, n=extra_n)
return torch.cat((array, padding), axis=dim)
def stack_vec_with_padding(arr_list):
    _len = [len(arr) for arr in arr_list]  # bug fix: arrays and lists have no .len() method
    max_len = max(_len)
    n_subject = len(arr_list)
dtype = arr_list[0].dtype
arr_np = np.zeros(shape=(n_subject, max_len), dtype=dtype)
for i, arr in enumerate(arr_list):
arr_np[i,:_len[i]] = arr
return arr_np
def objdump(obj):
import pickle
with open('objdump.tmp', 'wb+') as f:
pickle.dump(obj, f)
return
def objload():
import pickle, os
if not os.path.exists('objdump.tmp'):
return
with open('objdump.tmp', 'rb') as f:
return pickle.load(f)
def stack_padding(l, padding=np.nan):
max_len = max([t.shape[0] for t in l])
shape_desired = (len(l), max_len, *(l[0].shape[1:]))
target = np.zeros(shape=shape_desired, dtype=float) + padding
for i in range(len(l)): target[i, :len(l[i])] = l[i]
return target
def n_item(tensor):
n = 1
for d in tensor.shape:
n = n*d
return n
def cat_last_dim(tensor, cat):
assert tensor.shape[-1] >= cat.shape[-1]
for i, s in enumerate(tensor.shape[:-1]):
if s!=cat.shape[i]:
cat = repeat_at(cat, i, s)
cat = tensor[..., :cat.shape[-1]] * 0 + cat
return torch.cat((tensor, cat), -1)
"""
input: [25, 25]
output: [ range(0,25), range(25,50) ]
"""
# @lru_cache(10)
def arrange_id(N_AGENT_EACH_TEAM):
AGENT_ID_EACH_TEAM_cv = []
begin = 0
for _, n in enumerate(N_AGENT_EACH_TEAM):
b = begin
s = begin + n
AGENT_ID_EACH_TEAM_cv.append(range(b, s))
begin = s
return AGENT_ID_EACH_TEAM_cv
"""
convert digit to binary
>> get_binary(3, 8)
np.array([ 1,1,0,0, 0,0,0,0 ])
"""
@lru_cache(500)
def get_binary(n:int, n_bits:int, dtype=np.float32):
arr = np.zeros(n_bits, dtype=dtype)
pointer = 0
while True:
arr[pointer] = int(n%2==1)
n = n >> 1
pointer += 1
if n == 0: break
return arr
"""
>> get_binary_n_rows( 3, 8)
array([[0., 0., 0., 0., 0., 0., 0., 0.],
[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0.]], dtype=float32)
"""
@lru_cache(10)
def get_binary_n_rows(n_row, n_bit=8, dtype=np.float32):
n_int = np.arange(n_row)
arr = np.zeros((n_row, n_bit), dtype=dtype)
for i in range(n_bit):
arr[:, i] = (n_int%2==1).astype(int)
        n_int = n_int // 2  # integer division; avoids the float round-trip and int8 overflow
return arr
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/tensor_ops_c.pyx
================================================
import numpy as np
cimport numpy as np
cimport cython
from cython.parallel import prange
from libc.math cimport cos, atan2, abs
np.import_array()
ctypedef np.float64_t DTYPE_F64_t
ctypedef np.float32_t DTYPE_t
ctypedef fused DTYPE_int64_t:
np.int64_t
np.int32_t # to compat Windows
ctypedef np.uint8_t DTYPE_bool_t
PI = np.pi
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def reg_rad_arr(DTYPE_F64_t[:] rad):
cdef Py_ssize_t dim = rad.shape[0]
cdef Py_ssize_t x, y
result = np.zeros(dim, dtype=np.double)
cdef DTYPE_F64_t[:] result_view = result
cdef DTYPE_F64_t PI = np.pi
for x in prange(dim, nogil=True):
result_view[x] = (rad[x] + PI) % (2*PI) - PI
return result
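# Usage sketch (illustrative; assumes this module has been compiled first,
# e.g. with cythonize, and imported like a normal extension module):
#   import numpy as np
#   reg_rad_arr(np.array([3*np.pi, -3*np.pi]))  # -> array([-pi, -pi]), wrapped into [-pi, pi)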
# @cython.boundscheck(False)
# @cython.wraparound(False)
# @cython.nonecheck(False)
# def roll_hisory( DTYPE_t[:,:,:,:] obs_feed_new,
# DTYPE_t[:,:,:,:] prev_obs_feed,
# DTYPE_bool_t[:,:,:] valid_mask,
# DTYPE_int64_t[:,:] N_valid,
# DTYPE_t[:,:,:,:] next_his_pool):
# cdef Py_ssize_t vmax = N_valid.shape[0]
# cdef Py_ssize_t wmax = N_valid.shape[1]
# cdef Py_ssize_t max_obs_entity = obs_feed_new.shape[2]
# cdef int n_v, th, a, t, k, pointer
# for th in prange(vmax, nogil=True):
# for a in range(wmax):
# pointer = 0
# for k in range(max_obs_entity):
# if valid_mask[th,a,k]:
# next_his_pool[th, a, pointer] = obs_feed_new[th,a,k]
# pointer = pointer + 1
# n_v = N_valid[th,a]
# for k in range(n_v, max_obs_entity):
# next_his_pool[th,a,k] = prev_obs_feed[th,a,k-n_v]
# return np.asarray(next_his_pool)
# https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html?highlight=wraparound#compiler-directives
'''
binding (True): Python-level introspection of the compiled functions, exposing internals such as ['__class__', '__delattr__', ...., 'co_code', 'co_filename', 'co_argcount', 'co_varnames', ...]
boundscheck (True): bounds checking on array accesses
wraparound (True): whether negative indexing, e.g. a[-1], is supported
initializedcheck (True / False): ?
nonecheck (False)
always_allow_keywords (True / False)
profile (False): Write hooks for Python profilers into the compiled C code. Default is False.
infer_types (True / False): Infer types of untyped variables in function bodies. Default is None, indicating that only safe (semantically-unchanging) inferences are allowed. In particular, inferring integral types for variables used in arithmetic expressions is considered unsafe (due to possible overflow) and must be explicitly requested.
'''
================================================
FILE: PythonExample/hmp_minimal_modules/UTIL/win_pool.py
================================================
"""
Author: Fu Qingxu, CASIA
Description: An efficient parallel execution tool.
Less efficient than shm_pool (Linux only),
but this one supports Windows as well as Linux.
"""
import numpy as np
import time, psutil, platform, copy, multiprocessing
from multiprocessing import Pipe
from config import GlobalConfig
from .hmp_daemon import kill_process_and_its_children
from sys import stdout
def print_red(*kw,**kargs):
print("\033[1;31m",*kw,"\033[0m",**kargs)
def print_green(*kw,**kargs):
print("\033[1;32m",*kw,"\033[0m",**kargs)
if not stdout.isatty():
print_green = print_red = print
def child_process_load_config(machine_info):
    # This function is only needed on Windows:
    # load the json / cmdline config into the child process,
from UTIL.config_args import prepare_args
prepare_args(vb=False)
    # 'machine_info' in GlobalConfig must agree with the main process
GlobalConfig.machine_info = machine_info
pass
class SuperProc(multiprocessing.Process):
def __init__(self, pipe, pipeHelp, index, base_seed, machine_info):
super(SuperProc, self).__init__()
self.p = pipe
self.pH = pipeHelp
self.local_seed = index + base_seed
self.index = index
self.machine_info = machine_info
def automatic_generation(self, name, gen_fn, *arg):
setattr(self, name, gen_fn(*arg))
def automatic_execution(self, name, dowhat, *arg):
return getattr(getattr(self, name), dowhat)(*arg)
def add_targets(self, new_target_args):
for new_target_arg in new_target_args:
name, gen_fn, arg = new_target_arg
if arg is None:
self.automatic_generation(name, gen_fn)
elif isinstance(arg, tuple):
self.automatic_generation(name, gen_fn, *arg)
else:
self.automatic_generation(name, gen_fn, arg)
def execute_target(self, recv_args):
res_list = [None] * len(recv_args)
for i, recv_arg in enumerate(recv_args):
name, dowhat, arg = recv_arg
if arg is None:
res = self.automatic_execution(name, dowhat)
elif isinstance(arg, tuple):
res = self.automatic_execution(name, dowhat, *arg)
else:
res = self.automatic_execution(name, dowhat, arg)
res_list[i] = res
return res_list
def run(self):
import numpy
numpy.random.seed(self.local_seed)
        # Linux uses fork, but Windows does not; reload the config for Windows
if not platform.system()=="Linux":
child_process_load_config(self.machine_info)
print('[win_pool]: process worker %d started'%self.index)
try:
while True:
recv_args = self.p.recv()
if not isinstance(recv_args, list): # not list object, switch to helper channel
if recv_args == 0:
self.add_targets(self.pH.recv())
elif recv_args == -1:
print('Parallel worker exit')
break # terminate
else:
assert False
continue
result = self.execute_target(recv_args)
self.p.send(result)
except KeyboardInterrupt:
self.__del__()
self.__del__()
def __del__(self):
self.p.close()
self.pH.close()
kill_process_and_its_children(psutil.Process())
class SmartPool(object):
def __init__(self, proc_num, fold, base_seed=None):
self.proc_num = proc_num
self.task_fold = fold
self.thisSide, self.thatSide = zip(*[Pipe() for _ in range(proc_num)])
self.thisSideHelp, self.thatSideHelp = zip(*[Pipe() for _ in range(proc_num)])
self.base_seed = int(np.random.rand()*1e5) if base_seed is None else base_seed
print('[win_pool]: SmartPool base rand seed', self.base_seed)
self.proc_pool = [SuperProc(pipe=p, pipeHelp=pH, index=cnt, base_seed=self.base_seed, machine_info=GlobalConfig.machine_info)
for p, pH, cnt in zip(self.thatSide, self.thatSideHelp, range(proc_num))]
for proc in self.proc_pool:
proc.daemon = False
proc.start()
time.sleep(0.001)
        # close the child-side pipe ends held by the parent process
for i in range(proc_num):
self.thatSide[i].close()
self.thatSideHelp[i].close()
    # add an object of some class, creating proc_num*fold instances in total and distributing
    # them across the proc_num worker processes (e.g. 64 instances over 16 processes with fold=4)
def add_target(self, name, lam, args_list=None):
lam_list = None
if isinstance(lam, list): lam_list = lam
for j in range(self.proc_num):
tuple_list_to_be_send = []
for i in range(self.task_fold):
name_fold = name + str(i)
args = None if args_list is None else args_list[i + j*self.task_fold]
if lam_list is not None: lam = lam_list[i + j*self.task_fold]
tuple_list_to_be_send.append((name_fold, lam, args))
self.thisSide[j].send(0) # switch to helper channel
self.thisSideHelp[j].send(tuple_list_to_be_send)
    # if index_list is given, execute only those targets; otherwise execute all of them
def exec_target(self, name, dowhat, args_list = None, index_list = None):
if index_list is None:
for j in range(self.proc_num):
tuple_list_to_be_send = []
for i in range(self.task_fold):
name_fold = name + str(i)
args = None if args_list is None else args_list[i + j*self.task_fold]
tuple_list_to_be_send.append((name_fold, dowhat, args))
self.thisSide[j].send(tuple_list_to_be_send)
res_sort = []
for j in range(self.proc_num):
res_sort.extend(self.thisSide[j].recv())
return res_sort
else:
tuple_List_List = [[None for _ in range(self.task_fold)] for _ in range(self.proc_num)]
do_task_flag = [False for _ in range(self.proc_num)]
do_task_fold = [[] for _ in range(self.proc_num)]
result_recv_List_List = [[None for _ in range(self.task_fold)] for _ in range(self.proc_num)]
# sort args
for i, index in enumerate(index_list):
which_proc = index // self.task_fold
which_fold = index % self.task_fold
name_fold = name + str(which_fold)
args = None if args_list is None else args_list[i]
tuple_List_List[which_proc][which_fold] = (name_fold, dowhat, args)
do_task_flag[which_proc] = True
# send args
for which_proc in range(self.proc_num):
tuple_send_buffer = []
for which_fold, item in enumerate(tuple_List_List[which_proc]):
if item is None: continue
tuple_send_buffer.append(item)
do_task_fold[which_proc].append(which_fold)
if do_task_flag[which_proc]:
assert len(tuple_send_buffer) > 0
self.thisSide[which_proc].send(tuple_send_buffer)
# receive returns
for which_proc in range(self.proc_num):
if not do_task_flag[which_proc]:
continue
recv_tmp = self.thisSide[which_proc].recv()
for index, recv_item in enumerate(recv_tmp):
which_fold = do_task_fold[which_proc][index]
result_recv_List_List[which_proc][which_fold] = recv_item
# sort returns
res_sort = [None] * len(index_list)
for i, index in enumerate(index_list):
which_proc = index // self.task_fold
which_fold = index % self.task_fold
res_sort[i] = result_recv_List_List[which_proc][which_fold]
return res_sort
def party_over(self):
self.__del__()
def __del__(self):
print('[win_pool]: executing superpool del')
if hasattr(self, 'terminated'):
            print_red('[win_pool]: already terminated, skipping ~')
return
print('[win_pool]: Sending exit command to workers ...')
try:
for i in range(self.proc_num):
                self.thisSide[i].send(-1) # -1 is the exit command
self.terminated = True
except: pass
print('[win_pool]: Closing pipe ...')
for i in range(self.proc_num):
try:
self.thisSide[i].close()
self.thisSideHelp[i].close()
except: pass
N_SEC_WAIT = 2
for i in range(N_SEC_WAIT):
print_red('[win_pool]: terminate in %d'%(N_SEC_WAIT-i));time.sleep(1)
        # kill all child processes created by the pool, together with their descendants
print_red('[win_pool]: kill_process_and_its_children(proc)')
for proc in self.proc_pool:
try: kill_process_and_its_children(proc)
except Exception as e: print_red('[win_pool]: error occur when kill_process_and_its_children:\n', e)
        print_green('[win_pool]: __del__ finish')
self.terminated = True
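# Minimal usage sketch (illustrative; 'Env', its 'step' method and 'actions'
# are hypothetical names, not part of this module):
#   pool = SmartPool(proc_num=4, fold=2)  # 4 workers x 2 folds = 8 env copies
#   pool.add_target('env', lam=Env, args_list=[(i,) for i in range(8)])
#   obs = pool.exec_target('env', dowhat='step', args_list=[(a,) for a in actions])
#   pool.party_over()  # shut the workers down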
================================================
FILE: PythonExample/hmp_minimal_modules/VISUALIZE/README.md
================================================
# Visual Hybrid Multi-Agent Playground (VHMAP User Guide)
## Target Scenarios and Features
Target scenarios:
- Scientific research, especially multi-agent reinforcement learning
- 3D demonstrations
- Entertainment
Features:
- A Python interface simplified to the extreme
- Rendering runs on the client side with automatic frame interpolation, for silky-smooth frame rates
- Few server-side dependencies
- Extremely low server-side resource usage
- Based on Three.js; supports dragging and mobile touchscreens
- Supports switching between perspective and orthographic views
- Supports replay
- Compresses the data stream with zlib, keeping network bandwidth requirements low
## Installation
```shell
pip install vhmap
```
## 20 Lines of Code: How Simple and Smooth VHMAP Is
Implementing the figure below takes only 20 lines of Python code (including initialization).
UI functions and controls:
- Right mouse button to pan, left button to rotate, scroll wheel to zoom
- Touchscreen support, if your laptop or phone has a touch screen
- The rendering refresh rate is displayed in the top-left corner
- play fps: how many keyframes are played per second (below the rendering refresh rate, frames are interpolated; anything beyond it has no effect)
- pause: pause playback
- next frame: pause and step to the next frame
- previous frame: pause and step to the previous frame
- loop to start: return to the first frame once all data has been played
- ppt step: play a single frame extremely slowly, convenient for screen recording; the UI stalls for a few seconds after pressing it
- use orthcam: switch between the perspective view (nearer objects appear larger) and the orthographic view (as in engineering drawing)
- P.S. the first time you switch to the orthographic view, you need to zoom in with the mouse wheel
```python
from VISUALIZE.mcom import mcom
import numpy as np
class TestVhmap():
def render(self, t):
if not hasattr(self, '可视化桥'):
self.可视化桥 = mcom(path='TEMP/v2d_logger/', draw_mode='Threejs')
self.可视化桥.初始化3D()
self.可视化桥.设置样式('gray')
self.可视化桥.其他几何体之旋转缩放和平移('box', 'BoxGeometry(1,1,1)', 0,0,0, 1,1,1, 0,0,0)
        x = np.cos(t); y=np.sin(t); z= np.cos(t)*np.sin(t) # x, y, z coordinates for this frame
self.可视化桥.发送几何体(
            'box|2233|Red|0.1',                         # fill in 'shape|geometry ID|color|size'
            x, y, z, ro_x=0, ro_y=0, ro_z=np.sin(t),    # 3D position + Euler rotation, six degrees of freedom
            track_n_frame=20)                           # show the trail left by the past 20 frames
self.可视化桥.结束关键帧()
if __name__ == '__main__':
x = TestVhmap()
for step in range(1000): x.render(t=step/np.pi)
    import time; time.sleep(1000) # after launching, open the printed URL in a browser
    # this is line 21, and we are done :joy:
```
## 50 Lines of Code: A 3D N-Body Simulation (low precision, fixed step size)
- For the full code, see: VISUALIZE/examples/nb.py
How to run:
```
pip install vhmap
python -m VISUALIZE.examples.nb
```
# 1. Introduction
### 1.1 Basic Introduction
Unreal-based Multi-Agent Playground (Unreal-MAP) is a new-generation general multi-agent platform based on the Unreal Engine.
This platform supports adversarial training between swarms and algorithms, and it is the first (and currently the only) extensible RL/MARL environment based on the Unreal Engine that supports multi-team training.
### 1.2 Architecture
Unreal-MAP employs a hierarchical five-layer architecture,
where each layer builds upon the previous one. From bottom
to top, the five layers are: *native layer*, *specification layer*, *base class layer*, ***advanced module layer***, and ***interface layer***.
**You only need to focus on the *advanced module layer* (Blueprint) and the *interface layer* (Python).**
From the perspective of creating a standard MARL scenario, using these two layers is sufficient to modify all elements in the task (e.g., POMDP) such as states, actions, observations, transitions, etc.
### 1.3 Features
Unreal-MAP can be used to develop various multi-agent simulation scenarios. Our case studies have already included scenarios with large-scale, heterogeneous, and multi-team characteristics.
**Compared to other RL general platforms** such as [Unity ML-Agents](https://unity-technologies.github.io/ml-agents/), Unreal-MAP has the following advantages in terms of scientific research and experiment:
**(1) Fully Open-Source and Easily Modifiable**: Unreal-MAP utilizes a layered design, and all components from the bottom-level engine to the top-level interfaces are open-sourced.
**(2) Optimized Specifically for MARL**: The underlying engine of Unreal-MAP has been optimized to enhance efficiency in large-scale agent simulations and data transmission.
**(3) Parallel Multi-Process Execution and Controllable Single-Process Time Flow**: Unreal-MAP supports the parallel execution of multiple simulation processes as well as the adjustment of the simulation time flow speed in a single process. You can accelerate simulations to speed up training or decelerate simulations for detailed slow-motion analysis.
**Compared with all current MARL simulation environments**, Unreal-MAP offers the following advantages for scientific research and experimentation:
- **Freely build realistic tasks** using the massive resources available in the [Unreal Engine Marketplace](https://www.fab.com/).
- Simultaneously supports **large-scale, heterogeneous, multi-team** simulations.
- **Highly efficient training** with TPS (Timesteps per second) up to 10k+ and FPS (Frames per second) up to 10M+.
- **Controllable simulation time**: you can accelerate simulation to speed up training (until CPU is fully utilized, acceleration doesn't consume extra memory or VRAM), or decelerate for slow-motion analysis.
- **Strong reproducibility**: eliminated various butterfly effect factors in Unreal Engine that could cause experimental irreproducibility.
- **Multi-platform support**: compile both Headless mode and rendering mode clients on Windows, Linux, and MacOS.
- **Rich rendering mechanisms**: supports a) rendering in the UE editor, b) rendering on a compiled pure rendering client, and c) cross-platform real-time rendering. You can train on a Linux server and render on a Windows host at the same time!
### 1.4 Some Future Works
Unreal-MAP introduces modern game engines into the MARL field with tremendous potential. This potential is mainly reflected in two dimensions: **Scalability** and **Realism**. In terms of scalability, users can not only ***freely*** construct environments using the extremely rich resources from the Unreal Engine community, but can also ***quickly*** build environments according to their ideas using Unreal Engine's future generative AI plugins (such as [ACE](https://developer.nvidia.com/ace)).
In terms of realism, users can leverage Unreal-MAP to build ***highly realistic*** MARL environments and even develop ***digital twins*** of real-world scenarios. We attempted a sim2real demo using Unreal-MAP. In this demo, we first deployed a multi-UAV-UGV gaming scenario in the experimental field, then recreated the scenario using Unreal-MAP (including model proportions, agent kinematics and dynamics, etc.). We conducted training in the sim environment and then validated it in the real-world scenario, achieving preliminary positive results. In the current solution, Unreal-MAP not only serves as a simulation environment creator, but also acts as a data transmission intermediary, connecting data from the real-world scenario with the algorithmic side.
# 2. How to Install
## 2.1 Professional version
- Step 1: Install the Unreal Engine from source. For details, see the official Unreal Engine documentation: ```https://docs.unrealengine.com/4.27/zh-CN/ProductionPipelines/DevelopmentSetup/BuildingUnrealEngine/```
- Step 2: Clone the git repo ```git clone https://github.com/binary-husky/unreal-hmp.git```
- Step 3: Download the large files that github cannot manage. Run ```python Please_Run_This_First_To_Fetch_Big_Files.py```
- Step 4: Right click the ```UHMP.uproject``` downloaded in step 3, select ```switch unreal engine version```, and then select ```source build at xxxxx``` to confirm. Then open the generated ```UHMP.sln``` and compile it.
- Finally, double-click ```UHMP.uproject``` to enter the Unreal Engine Editor.
Note that steps 1 and 4 are difficult. It is recommended to refer to the following video (0:00->1:46 of the video covers step 1, and 1:46->end covers step 4): ```https://ageasga-my.sharepoint.com/:v:/g/personal/fuqingxu_yiteam_tech/EawfqsV2jF5Nsv3KF7X1-woBH-VTvELL6FSRX4cIgUboLg?e=Vmp67E```
## 2.2 Compiled binary version only
```https://github.com/binary-husky/hmp2g/blob/master/ZDOCS/use_unreal_hmap.md```
# 3. Tutorial
The documentation is still being improved. For a video tutorial of a simple demo, see ```EnvDesignTutorial.pptx``` (you need to complete step 3 of the installation to download this pptx file).
Directory:
- Chapter I. Unreal Engine
- - Build a map (Level) ```https://www.bilibili.com/video/BV1U24y1D7i4/?spm_id_from=333.999.0.0&vd_source=e3bc3eddd1d2414cb64ae72b6a64df55```
- - Establish Agent Actor
- - Design agent blueprint program logic
- - Episode key event notification mechanism
- - Define Custom actions (Unreal Engine side)
- - The Python side controls the custom parameters of the agent
- Chapter II. Python Interface
- - Create a task file (SubTask)
- - Modify agent initialization code
- - Modify the agent reward code
- - Select the control algorithm of each team
- - Full closed loop debugging method
- Chapter III. Appendix
- - Headless acceleration and cross-compiling Linux package
- - Define Custom actions (Need to be familiar with the full closed-loop debugging method first)
- - - Draft a list of actions
- - - Python side action generation
- - - UE-side action parse and execution
- - - Action discretization
- - Installation guide for cross compilation tool chain
# 4. How to Build Binary Client
Run the following scripts.
```
python BuildlinuxRender.py
python BuildLinuxServer.py
python BuildWinRender.py
python BuildWinServer.py
```
- Among them, ```Render/Server``` means ```with graphical rendering / compute only```; the latter is generally used for RL training.
- Among them, ```Win/Linux``` indicates the target operating system. Note that you need to install the ```Unreal Engine Cross Compilation Tool``` to compile Linux programs on Windows.
- After adding new ActionSets in ```Content/Assets/DefAction/ParseAction.uasset```, you may encounter ```Ensure condition failed: !FindPin(FFunctionEntryHelper::GetWorldContextPinName())``` error during packaging. If so, find and remove an extra blueprint function parameter named ```__WorldContext``` that you created by accident in ```ParseAction.uasset```. For more details: ```https://forums.unrealengine.com/t/ensure-condition-failed-on-project-start/469587```
- If you encounter BuildCMakeLib.Automation.cs(45,54): error CS1002 after project migration, please **Rebuild** (not Build!) the AutomationTool in Visual Studio. For more details: ```https://forums.unrealengine.com/t/unreal-engine-version-4-27-2-i-get-an-error-when-trying-to-package-any-project/270627```
# Cite this project!
```
@article{unrealmap,
title={Unreal-MAP: Unreal-Engine-Based General Platform for Multi-Agent Reinforcement Learning},
author={Hu, Tianyi and Fu, Qingxu and Pu, Zhiqiang and Wang, Yuan and Qiu, Tenghai},
journal={arXiv preprint arXiv:2503.15947},
year={2025}
}
```
================================================
FILE: README_CN.md
================================================
# Unreal-MAP
[English](README.md) | [中文](README_CN.md)
This is the **Unreal Multi-Agent Playground** (Unreal-MAP), a general multi-agent platform based on the [Unreal Engine](https://www.unrealengine.com/).
Here you can use all the capabilities of the Unreal Engine (Blueprints, behavior trees, the physics engine, AI navigation, 3D models/animations, plugin resources, and so on) to build multi-agent environments that are elegant (yet computationally efficient) and grand (yet experimentally reproducible).
Unreal-MAP can be used not only to develop conventional multi-agent simulation environments, but is also specially optimized for multi-agent reinforcement learning (MARL) simulation. You can use it to develop all kinds of realistic and complex MARL scenarios, and you can pair Unreal-MAP with [HMAP](https://github.com/binary-husky/hmp2g), a powerful MARL-oriented experiment framework we developed, to easily build MARL scenarios and rapidly deploy state-of-the-art algorithms.
> This research is looking for potential collaborators. If you are interested in this project, feel free to contact our office at the Institute of Automation, Chinese Academy of Sciences: tenghai.qiu@ia.ac.cn, hutianyi2021@ia.ac.cn.
>
**Please ```star``` the GitHub project. As researchers, your encouragement is extremely important to us: ```https://github.com/binary-husky/unreal-hmp```**!