[
  {
    "path": ".gitignore",
    "content": ".DS_Store\n.idea"
  },
  {
    "path": "README.md",
    "content": "# TLeague Project Page\nThis is the project page for the following technical reports:\n\nLei Han∗, Jiechao Xiong∗, Peng Sun∗, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Hongsheng Yu, Zhengyou Zhang.\nTStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game.\n[arXiv preprint arXiv:2011.13729](https://arxiv.org/abs/2011.13729), 2020.\n(* Equal contribution, correspondence to the first three authors)\n\nPeng Sun∗, Jiechao Xiong∗, Lei Han∗, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang.\nTLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning.\n[arXiv preprint arXiv:2011.12895](https://arxiv.org/abs/2011.12895), 2020.\n(* Equal contribution, correspondence to the first three authors)\n\n<big>**Readers in a hurry to try the StarCraft II AI TStarBot-X can go straight to the [TStarBot-X project page here](tstarbotx/README.md)**</big>.\n\n## Quick Start\n* For general TLeague usage, see [the section below](#usage)\n* For the resources of TStarBot-X, see [the page here](tstarbotx/README.md)\n* For the resources of ViZDoom experiments, see [the page here](vizdoom/README.md)\n* For the resources of Pommerman experiments, see [the page here](pommerman/README.md)\n\n## Usage\n`Python>=3.6` is required. 
We've tested `Python 3.6.5`.\n\n### Minimal Working Example\nTo use the TLeague framework and run a minimal training session,\none needs to install the following basic packages:\n* [TLeague](https://github.com/tencent-ailab/TLeague): the main logic of Competitive SelfPlay MultiAgent Reinforcement Learning.\n* [TPolicies](https://github.com/tencent-ailab/TPolicies): a lib for building neural nets used in RL and IL.\n* [Arena](https://github.com/tencent-ailab/Arena): a lib of environments and env-agent interfaces.\n\nSee the docs therein for how to install `TLeague`, `TPolicies`, `Arena`, respectively.\nBriefly, \nit amounts to git-cloning/downloading the repos and doing an in-place pip installation. \nFor example,\n```bash\ngit clone https://github.com/tencent-ailab/TLeague.git ~/TLeague\ngit clone https://github.com/tencent-ailab/TPolicies.git ~/TPolicies\ngit clone https://github.com/tencent-ailab/Arena.git ~/Arena\ncd ~/TLeague && pip install -e . && cd ~\ncd ~/TPolicies && pip install -e . && cd ~\ncd ~/Arena && pip install -e . && cd ~\n# manually install tensorflow 1.15.0 as required by TPolicies\npip install tensorflow==1.15.0\n``` \n\nThen, try the example of training with the simple game `pong-2p` (an environment contained in `Arena`) as a sanity check. 
\nSee the [link here](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#pong-2p).\n\nTo run training for other environments, extra binaries and/or packages must be installed, as explained below.\n\n### StarCraft II Training\nWhen installing the `Arena` package, \none additionally needs to install [TImitate](https://github.com/tencent-ailab/TImitate),\nwhich is a lib for SC2 observation and action, zstat extraction, replay parsing, etc.\nSee also the [link here](https://github.com/tencent-ailab/Arena#dependencies).\n\n[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#starcraft-ii) \nfor both Reinforcement Learning (CSP-MARL) and Imitation Learning on a single machine.\n\nTODO: pointer to the Docker Auto Build repo and say it's yet-another guide to installation from scratch.\n\nTODO: texts for how to train with k8s\n\n### ViZDoom Training\nWhen installing the `Arena` package, \none additionally needs to install ViZDoom (>=1.1.8), \nsee the [link here](https://github.com/tencent-ailab/Arena#dependencies).\n\n[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#vizdoom)\nof how to train ViZDoom on a single machine.\n\nRefer also to the [link here](https://github.com/tencent-ailab/TLeagueAutoBuild/tree/dev-open) for how to (auto-)build the docker image,\nwhich is yet another guide to installation from scratch.\n\nFor running training over a k8s cluster, see the [link here](vizdoom/README.md#training-code).\n\n### Pommerman Training\nWhen installing the `Arena` package, \none additionally needs to install Pommerman, \nsee the [link here](https://github.com/tencent-ailab/Arena#dependencies).\n\n[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#pommerman)\nof how to train Pommerman on a single machine. 
\n\nRefer also to the [link here](https://github.com/tencent-ailab/TLeagueAutoBuild/tree/pommerman) for how to (auto-)build the docker image,\nwhich is yet another guide to installation from scratch.\n\nFor running training over a k8s cluster, see the [link here](pommerman/README.md#training-code).\n\n### Single Agent RL\nTLeague also works for pure RL,\nwhich can be viewed as a special case of MARL where the number of agents equals one.\n[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#single-agent-rl)\nof how to train gym Atari on a single machine.\n\nEnsure the correct dependencies are installed:\n```bash\npip install gym[atari]==0.12.1\n```\n\n# Disclaimer\nThis is not an officially supported Tencent product.\nThe code and data in this repository are for research purposes only. \nNo representation or warranty whatsoever, expressed or implied, is made as to its accuracy, reliability or completeness. \nWe assume no liability and are not responsible for any misuse or damage caused by the code and data. \nYour use of the code and data is subject to applicable laws and is at your own risk."
  },
  {
    "path": "pommerman/README.md",
    "content": "# Pommerman Experiments\nThis page contains the resources for the experiments of Pommerman as discussed in the TLeague technical report.\n\n\n## Training Code\nThe training yaml used in the technical report can be generated from [pommerman.yml.jinja2](pommerman.yml.jinja2).\n\n## Evaluation\n```\npython3 -m tleague.sandbox.run_local_battle_pommerman \\\n--policy_config=\"{\n    'use_xla': False,\n    'rollout_len': 1,\n    'test': True,\n    'rl': False,\n    'use_loss_type': 'none',\n    'use_value_head': False,\n    'use_self_fed_heads': True,\n    'use_lstm': True,\n    'nlstm': 64,\n    'hs_len': 128,\n    'lstm_duration': 1,\n    'lstm_dropout_rate': 0.0,\n    'lstm_cell_type': 'lstm',\n    'lstm_layer_norm': True,\n    'weight_decay': 0.00000002,\n    'n_v': 11,\n    'merge_pi': False,\n  }\" \\\n--model=0076:0077_20201029114642.model\n```\n## Downloads\n### Trained Model\nThe trained model (after 10 days of training) can be downloaded at [Google Drive](https://drive.google.com/file/d/125eUbQl0QTw9f4uyGTcTxMR6GUfRvPBE/view?usp=sharing) \nor [Tencent Weiyun](https://share.weiyun.com/hkLvLNT0).\n### Replay Files\nHere are the 100 replay files of our agent against Navocado as discussed in the TLeague technical report:\n[Google Drive](https://drive.google.com/file/d/1miuqo7EpzgNIGHUNtPqdIswe8rKuoRk0/view?usp=sharing) \nor [Tencent Weiyun](https://share.weiyun.com/csEpj1R3).\n\nReplays can be viewed by dragging and dropping them onto the page [Pommerman Playback](https://www.pommerman.com/battle).\n"
  },
  {
    "path": "pommerman/pommerman.yml.jinja2",
    "content": "{% set session = 'tr2504-pommerman' %}\n{% set time_tag = \"20210104145000\" %}\n{% set image = \"ccr.ccs.tencentyun.com/sc2ai/tleague-pommerman:\" + time_tag %}\n{% set learner_image = \"ccr.ccs.tencentyun.com/sc2ai/tleague-gpu-hvd-pommerman:\" + time_tag %}\n{% set docker_registry_credential = \"tke-dockreg-cred\" %}\n{% set require_resources = true %}\n{% set pvc_name = \"pvc-share-full\" %}\n{% set chkpoints_zoo_pvc_sub_dir = \"chkpoints_zoo/\" %}\n{% set chkpoints_pvc_sub_dir = chkpoints_zoo_pvc_sub_dir + session + \"_chkpoints\" %}\n{# common #}\n{% set env = \"pommerman_v2_fog\" %}\n{% set policy = \"tpolicies.net_zoo.pommerman.conv_lstm\" %}\n{% set policy_config = {\n  'use_xla': True,\n  'test': False,\n  'rl': True,\n  'use_loss_type': 'rl_ppo',\n  'use_value_head': True,\n  'use_self_fed_heads': False,\n  'use_lstm': True,\n  'nlstm': 64,\n  'hs_len': 128,\n  'lstm_duration': 1,\n  'lstm_dropout_rate': 0.0,\n  'lstm_cell_type': 'lstm',\n  'lstm_layer_norm': True,\n  'weight_decay': 0.00000002,\n  'n_v': 11,\n  'merge_pi': False,\n  'forget_bias': 1.0,\n} %}\n{% set self_policy_config = {\n  'batch_size': 8,\n  'rollout_len': 1,\n  'use_xla': False,\n  'test': True,\n  'use_loss_type': 'none',\n  'use_value_head': True,\n  'use_self_fed_heads': True,\n  'use_lstm': True,\n  'nlstm': 64,\n  'hs_len': 128,\n  'lstm_duration': 1,\n  'lstm_dropout_rate': 0.0,\n  'lstm_cell_type': 'lstm',\n  'lstm_layer_norm': True,\n  'weight_decay': 0.00000002,\n  'n_v': 11,\n  'merge_pi': False,\n  'forget_bias': 1.0,\n} %}\n{% set use_infserver = false %}\n{% set self_infserver_config = {\n  'outputs': ['a', 'neglogp'],\n  'update_model_seconds': 30,\n  'model_key': '',\n} %}\n{% set parallel_infserver_num = 1 %}\n{% set infserver_batch_worker_num = 4 %}\n{% set unroll_length = 64 %}\n{% set rollout_length = 16 %}\n{# model pool#}\n{% set n_model_pools = 1 %}\n{% set model_pool_port1 = 10003 %}\n{% set model_pool_port2 = 10004 %}\n{% set model_pool_verbose 
= 0 %}\n{# league mgr #}\n{% set league_mgr_port = 20005 %}\n{% set game_mgr_type = \"tleague.game_mgr.ae_game_mgrs.AEMatchMakingGameMgr\" %}\n{% set game_mgr_config = {\n  'lrn_id_list': ['lrngrp0'],\n  'lrn_role_list': ['MA'],\n  'main_agent_pfsp_prob': 0.5,\n  'main_agent_forgotten_prob': 0.15,\n  'main_agent_forgotten_me_winrate_thre': 0.5,\n  'main_agent_forgotten_ma_winrate_thre': 0.7,\n} %}\n{% set mutable_hyperparam_type = \"MutableHyperparam\" %}\n{% set hyperparam_config_name = {\n  'learning_rate': 0.00001,\n  'lam': 0.8,\n  'gamma': 1.0,\n  'burn_in_timesteps': 1000000,\n  'reward_weights': [1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],\n} %}\n{% set league_mgr_restore_checkpoint_dir = \"\" %}\n{% set league_mgr_chkpoints_dir = \"/root/results/\" %}\n{% set league_mgr_save_checkpoint_root = league_mgr_chkpoints_dir + session + \"_chkpoints\" %}\n{% set league_mgr_save_interval_secs = 900 %}\n{% set mute_actor_msg = true %}\n{% set pseudo_learner_num = -1 %}\n{% set init_model_paths = \"[]\"\n%}\n{% set league_mgr_verbose = 0 %}\n{# learners #}\n{% set n_lrn_groups = 1 %}\n{% set n_hosts_per_lrn_group = [1] %}\n{% set n_gpus_per_host = 2 %}\n{% set hvd_ssh_port = 9527 %}\n{% set lrn_port_base = 30003 %}\n{% set batch_size = 4096 %}\n{% set lrn_rm_size = 32000 %}\n{% set lrn_pub_interval = 200 %}\n{% set lrn_log_interval = 100 %}\n{% set lrn_burn_in_timesteps = 0 %}\n{% set n_v = 11 %}\n{% set lrn_rwd_shape = true %}\n{% set lrn_tb_port = 9003 %}\n{% set lrngrp_learner_config = [\n  {'vf_coef': [10, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],\n   'max_grad_norm': 1.0,\n   'distill_coef': 0.0,\n   'ent_coef': [0.01, 0.01],\n   'adam_beta1': 0.0,\n   'adam_beta2': 0.99,\n   'adam_eps': 0.00001,\n   'ep_loss_coef': {}\n  },\n]%}\n{% set lrngrp_total_timesteps = [\n    200000000,\n] %}\n{# actors per learner #}\n{% set n_actors_per_learner = 25 %}\n{% set actor_update_model_freq = 3200 %}\n{% set actor_rwd_shape = false %}\n{% set 
actor_log_interval_steps = 51 %}\n{% set actor_verbose = 11 %}\n{% set actor_replay_dir = \"/root/replays/\" %}\n{% set lrngrp_interface_config = [{}]%}\n{% set lrngrp_env_config = [\n  {'rotate': False, \n   'centralV': False,\n   'random_side': True,\n  },\n] %} \n\n{# --- league manager --- #}\n{% if true %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-league-mgr\n  labels:\n    session: {{ session }}\n    job: league-mgr\n    type: league-mgr\nspec:\n  selector:\n    session: {{ session }}\n    job: league-mgr\n  ports:\n  - port: {{ league_mgr_port }}\n    name: port1\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-league-mgr\n  labels:\n    session: {{ session }}\n    type: league-mgr\n    job: league-mgr\nspec:\n  nodeSelector:\n    type: cpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n    - name: data-dir\n      persistentVolumeClaim:\n        claimName: {{ pvc_name }}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-league-mgr-container\n      image: {{ image }}\n      ports:\n      - containerPort: {{ league_mgr_port }}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 0\n        requests:\n          nvidia.com/gpu: 0\n          cpu: 3\n          memory: 6Gi\n{% endif %}\n      volumeMounts:\n        - mountPath: {{ league_mgr_chkpoints_dir }}\n          name: data-dir\n          subPath: {{ chkpoints_zoo_pvc_sub_dir }}\n      command:\n      - \"python\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_league_mgr\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n      - \"--port={{ league_mgr_port }}\"\n      - \"--game_mgr_type={{ game_mgr_type }}\"\n      - \"--game_mgr_config={{ game_mgr_config 
}}\"\n      - \"--mutable_hyperparam_type={{ mutable_hyperparam_type }}\"\n      - \"--hyperparam_config_name={{ hyperparam_config_name }}\"\n      - \"--restore_checkpoint_dir={{ league_mgr_restore_checkpoint_dir }}\"\n      - \"--save_checkpoint_root={{ league_mgr_save_checkpoint_root }}\"\n      - \"--save_interval_secs={{ league_mgr_save_interval_secs }}\"\n      - \"--{% if mute_actor_msg %}mute_actor_msg{% else %}nomute_actor_msg{% endif %}\"\n      - \"--verbose={{ league_mgr_verbose }}\"\n      - \"--pseudo_learner_num={{ pseudo_learner_num }}\"\n      - \"--init_model_paths={{ init_model_paths }}\"\n{% endif %}\n{# --- model pools --- #}\n{% if true %}\n{% for i in range(n_model_pools) %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-mp{{ i }}\n  labels:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n    type: model-pool\nspec:\n  selector:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n  ports:\n  - port: {{ model_pool_port1 }}\n    name: port1\n  - port: {{ model_pool_port2 }}\n    name: port2\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-mp{{ i }}\n  labels:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n    type: model-pool\nspec:\n  nodeSelector:\n    type: cpu \n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  restartPolicy: Never  # if failure, let it die\n  containers:\n    - name: {{ session }}-model-pool-container\n      image: {{ image }}\n      ports:\n      - containerPort: {{ model_pool_port1 }}\n      - containerPort: {{ model_pool_port2 }}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 0\n        requests:\n          nvidia.com/gpu: 0\n          cpu: 7\n          memory: 14Gi\n{% endif %}\n      command:\n      - \"python\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_model_pool\"\n      - \"--ports={{ model_pool_port1 }}:{{ model_pool_port2 
}}\"\n      - \"--verbose={{ model_pool_verbose }}\"\n{% endfor %}\n{% endif %}\n{# --- learners and actors per learner --- #}\n{% if true %}\n{% for i in range(n_lrn_groups) %}\n{% for j in range(n_hosts_per_lrn_group[i] - 1, -1, -1) %}\n{# --- each host corresponds to a service owning a DNS name #}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: learner\nspec:\n  selector:\n    session: {{ session }}\n    type: learner\n    group: group-{{ i }}\n    host: host-{{ j }}\n  ports:\n  - port: {{ hvd_ssh_port }}\n    name: port-ssh\n{% for k in range(n_gpus_per_host) %}\n  - port: {{ lrn_port_base + 2*k}}\n    name: port{{ 2*k }}\n  - port: {{ lrn_port_base + 2*k + 1 }}\n    name: port{{ 2*k + 1 }}\n{% endfor %}\n{% if lrn_tb_port %}\n  - port: {{ lrn_tb_port }}\n    name: port-tb\n{% endif %}\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: learner\n    group: group-{{ i }}\n    host: host-{{ j }}\nspec:\n  nodeSelector:\n    type: gpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n  - name: training-log-dir\n    emptyDir: {}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-lg{{ i }}-h{{ j }}-container\n      image: {{ learner_image }}\n      ports:\n      - containerPort: {{ hvd_ssh_port }}\n{% for k in range(n_gpus_per_host) %}\n      - containerPort: {{ lrn_port_base + 2*k }}\n      - containerPort: {{ lrn_port_base + 2*k + 1}}\n{% endfor %}\n{% if lrn_tb_port %}\n      - containerPort: {{ lrn_tb_port }}\n{% endif %}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: {{ n_gpus_per_host }}\n        requests:\n          nvidia.com/gpu: {{ n_gpus_per_host }}\n          cpu: 7\n          memory: 14Gi\n{% endif %}\n      env:\n      - 
name: NONCCL_DEBUG\n        value: \"INFO\"\n{% if j == 0 %}\n{# --- run the mpirun/horovodrun command --- #}\n      volumeMounts:\n      - name: training-log-dir\n        mountPath: /root/work/training_log\n      command:\n      - \"tleague_horovodrun\"\n      args:\n      - \"--verbose\"\n      - \"--exclude-env-vars-pattern\"\n      - \"TR|IM|EVBACK|EVFRONT\"\n      - \"--start-timeout\"\n      - \"1800\"\n      - \"-p\"\n      - \"{{ hvd_ssh_port }}\"\n      - \"-np\"\n      - \"{{ n_hosts_per_lrn_group[i] * n_gpus_per_host }}\"\n      - \"-H\"\n{% set sep = joiner(',') %}\n      - \"{% for jj in range(n_hosts_per_lrn_group[i]) %}{{ sep() }}{{ session }}-lg{{ i }}-h{{ jj }}:{{ n_gpus_per_host }}{% endfor %}\"\n      - \"python\"\n      - \"-m\"\n      - \"tleague.bin.run_pg_learner\"\n      - \"--type=PPO\"\n      - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n{% for ind_host in range(n_hosts_per_lrn_group[i]) %}\n{% set sep = joiner(',') %}\n      - \"--learner_spec={% for gpu_id in range(n_gpus_per_host) %}{{ sep() }}{{ gpu_id }}:{{ lrn_port_base + 2*gpu_id }}:{{ lrn_port_base + 2*gpu_id + 1 }}{% endfor %}\"\n{% endfor %}\n      - \"--learner_id=lrngrp{{ i }}\"\n      - \"--unroll_length={{ unroll_length }}\"\n      - \"--rollout_length={{ rollout_length }}\"\n      - \"--batch_size={{ batch_size }}\"\n      - \"--rm_size={{ lrn_rm_size }}\"\n      - \"--pub_interval={{ lrn_pub_interval }}\"\n      - \"--log_interval={{ lrn_log_interval }}\"\n      - \"--total_timesteps={{ lrngrp_total_timesteps[i] }}\"\n      - \"--burn_in_timesteps={{ lrn_burn_in_timesteps }}\"\n      - \"--env={{ env }}\"\n      - \"--env_config={{ lrngrp_env_config[i] }}\"\n      - \"--interface_config={{ lrngrp_interface_config[i] }}\"\n      - \"--policy={{ policy }}\"\n      
- \"--policy_config={{ policy_config }}\"\n      - \"--{% if lrn_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}\"\n      - \"--batch_worker_num={{ 2 }}\"\n      - \"--learner_config={{ lrngrp_learner_config[i] }}\"\n      - \"--data_server_version=v2\"\n      - \"--decode\"\n      - \"--log_infos_interval=1000\"\n{% else %}\n{# --- start an ssh deamon and run an arbitray command that occupies the container --- #}\n      command:\n      - \"bash\"\n      - \"-c\"\n      args:\n      - \"/usr/sbin/sshd -p {{ hvd_ssh_port }}; sleep {{ 3600 * 24 * 7 * 52 * 3}}\"\n{% endif %}\n{% if j==0 and lrn_tb_port %}\n{# --- start tensorboard when applicable --- #}\n    - name: {{ session }}-tb-lrngrp{{ i }}rank0-container\n      image: {{ learner_image }}\n      ports:\n      - containerPort: {{ lrn_tb_port }}\n      volumeMounts:\n      - name: training-log-dir\n        mountPath: /root/training_log\n      env:\n      - name: CUDA_VISIBLE_DEVICES\n        value: \"\"\n      command:\n      - \"tensorboard\"\n      args:\n      - \"--logdir=/root/training_log/lrngrp{{ i }}rank0\"\n      - \"--port={{ lrn_tb_port }}\"\n{% endif %}\n{# --- endif j == 0 --- #}\n{% if true %}\n{% for k in range(n_gpus_per_host) %}\n{# --- the actors correspond to group i host j localrank k--- #}\n---\nkind: Deployment\napiVersion: extensions/v1beta1\nmetadata:\n  name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}\n  labels:\n    session: {{ session }}\n    type: actor\nspec:\n  replicas: {{ n_actors_per_learner }}\n  selector:\n    matchLabels:\n      session: {{ session }}\n      type: actor\n      group: group-{{ i }}\n      host: host-{{ j }}\n      localrank: localrank-{{ k }}\n  template:\n    metadata:\n      labels:\n        session: {{ session }}\n        type: actor\n        group: group-{{ i }}\n        host: host-{{ j }}\n        localrank: localrank-{{ k }}\n    spec:\n      nodeSelector:\n        type: cpu\n      volumes:\n      - name: data-dir\n        
persistentVolumeClaim:\n          claimName: {{ pvc_name }}\n{% if docker_registry_credential != \"\" %}\n      imagePullSecrets:\n      - name: {{ docker_registry_credential }}\n{% endif %}\n      containers:\n      - name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}-container\n        image: {{ image }}\n        imagePullPolicy: IfNotPresent\n{% if require_resources %}\n        resources:\n          limits:\n            nvidia.com/gpu: 0\n          requests:\n            nvidia.com/gpu: 0\n            cpu: 1700m\n            memory: 3.4Gi\n{% endif %}\n        command:\n        - \"python\"\n        args:\n        - \"-m\"\n        - \"tleague.bin.run_pg_actor\"\n        - \"--type=PPO\"\n        - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n{% set sep = joiner(',') %}\n        - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n        - \"--learner_addr={{ session }}-lg{{ i }}-h{{ j }}:{{ lrn_port_base + 2*k }}:{{ lrn_port_base + 2*k + 1 }}\"\n        - \"--unroll_length={{ unroll_length }}\"\n        - \"--update_model_freq={{ actor_update_model_freq }}\"\n        - \"--env={{ env }}\"\n        - \"--env_config={{ lrngrp_env_config[i] }}\"\n        - \"--policy={{ policy }}\"\n        - \"--policy_config={{ self_policy_config }}\"\n        - \"--verbose={{ actor_verbose }}\"\n        - \"--log_interval_steps={{ actor_log_interval_steps }}\"\n        - \"--n_v={{ n_v }}\"\n        - \"--{% if actor_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}\"\n        - \"--interface_config={{ lrngrp_interface_config[i] }}\"\n        - \"--replay_dir={{ actor_replay_dir }}\"\n{% if use_infserver %}\n{% set sep = joiner(',') %}\n        - \"--self_infserver_addr={% for m in range(parallel_infserver_num) %}{{ sep() }}{{ session }}-infserver-lg{{ i }}-h{{ j }}:{{ lrn_port_base - 1 - m }}{% endfor %}\"\n{% endif %}\n{% endfor %}\n{# 
--- endfor k --- #}\n{% endif %}\n{# --- infserver --- #}\n{% if use_infserver %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-infserver-lg{{i}}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: infserver\n    group: group-{{ i }}\n    host: host-{{ j }}\nspec:\n  selector:\n    session: {{ session }}\n    type: infserver\n    group: group-{{ i }}\n    host: host-{{ j }}\n  ports:\n{% for m in range(parallel_infserver_num) %}\n  - port: {{ lrn_port_base - 1 - m}}\n    name: port-inf-{{ m }}\n{% endfor %}\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-infserver-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: infserver\n    group: group-{{ i }}\n    host: host-{{ j }}\nspec:\n  nodeSelector:\n    type: gpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n    - name: data-dir\n      persistentVolumeClaim:\n        claimName: {{ pvc_name }}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-infserver-lg{{ i }}-h{{ j }}-container\n      image: {{ learner_image }}\n      ports:\n{% for m in range(parallel_infserver_num) %}\n      - containerPort: {{ lrn_port_base - 1 - m }}\n{% endfor %}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 1\n        requests:\n          nvidia.com/gpu: 1\n          cpu: 6\n          memory: 10Gi\n{% endif %}\n      env:\n      - name: NCCL_DEBUG\n        value: \"INFO\"\n      command:\n      - \"python3\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_inference_server\"\n      - \"--nohvd_run\"\n      - \"--env={{ env }}\"\n      - \"--env_config={{ lrngrp_env_config[i] }}\"\n      - \"--interface_config={{ lrngrp_interface_config[i] }}\"\n      - \"--is_rl\"\n      - \"--policy={{ policy }}\"\n      - \"--port={{ lrn_port_base - 1 }}\"\n      - \"--policy_config={{ self_policy_config }}\"\n      - 
\"--infserver_config={{ self_infserver_config }}\"\n      - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n      - \"--learner_id=lrngrp{{ i }}\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n      - \"--batch_worker_num={{ infserver_batch_worker_num }}\"\n{% endif %}\n{% endfor %}\n{# --- endfor j --- #}\n{% endfor %}\n{# --- endfor i --- #}\n{% endif %}\n{# --- endif true/false --- #}\n\n\n"
  },
  {
    "path": "tstarbotx/README.md",
    "content": "# TStarBot-X Project Page\nThis is the project page for the StarCraft II AI TStarBot-X, \ndiscussed in the following technical report:\n\nLei Han∗, Jiechao Xiong∗, Peng Sun∗, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Hongsheng Yu, Zhengyou Zhang.\nTStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game.\n[arXiv preprint arXiv:2011.13729](https://arxiv.org/abs/2011.13729), 2020.\n(* Equal contribution, correspondence to the first three authors)\n\n\n## Quick Start\n* If you are an SC2 player who wants to play against TStarBot-X,\nsee our guide to the [human-machine test](hm_test.md). \n* If you are a researcher/developer interested in the training, see todo.\n* For downloading the resources (Replays, NN models, zstat ...), see the [section below](#downloads).\n\n## Downloads\n### TStarBot-X Replay Files\nHere are the replay files of the human-machine tests as discussed in the TStarBot-X technical report:\n[Google Drive](https://drive.google.com/file/d/1U6vMdsjfQWJE9DMGNs-OlLqngOwdwbsf/view?usp=sharing) \nor [Tencent Weiyun](https://share.weiyun.com/zNuIVoxh)\n\n### Maps\nWhen running the human-machine test or opening the replays,\nyou need the map file for `KairosJunction`.\nSee [here](https://github.com/deepmind/pysc2#get-the-maps) and [here](https://github.com/Blizzard/s2client-proto#map-packs) for how to download and where to place the maps.\nFor your convenience, we provide the `KairosJunction` file here \n[Google Drive](https://drive.google.com/file/d/1O_L4E91b3sAUunrDxGV-a_uj7bTuwQ8H/view?usp=sharing) \nor [Tencent Weiyun](https://share.weiyun.com/de2LCco8)\n\n### Data\nFor NN models, zstat files, etc., please see the [link here](hm_test.md#downloads).\n"
  },
  {
    "path": "tstarbotx/hm_test.md",
    "content": "# Human-Machine Test\nWe provide guidelines for how a human plays against TStarBot-X.\nPlease follow the instructions on this page, which is self-contained (you don't need to care about the training code when running the human-machine test).\n**Note, currently it only plays zerg-vs-zerg on the map \"KairosJunction\". SC2 version 4.10.0 is required.**\n\n## Prerequisites and Terminology\nYou need TWO machines for the human-machine test.\n\n**Machine A**: the machine used by a human player.\nWindows or MacOS.\nHave the SC2 game installed.\nVersion 4.10.0 game core is required,\nwhich can be ensured by opening an arbitrary 4.10.0 replay file (for example, [here](./README.md#tstarbot-x-replay-files)) so that auto-downloading is triggered when necessary.\nWe've tested on MacOS where the SC2 game is downloaded from https://starcraft2.com/.\n\n**Machine B**: the machine where TStarBot-X is deployed.\nUbuntu is required.\nUsing a GPU is recommended (otherwise the NN forward-pass can be slow, causing high delay and degraded performance).\nTStarBot-X has been tested on a laptop with GTX 1650 and Ubuntu 18.04.\n\n## Installs\nHere are the step-by-step instructions.\n\n* Make sure that machine A can connect to machine B via passwordless ssh.\n  - One can google for how to do the passwordless ssh setup, e.g.,\n  [here](https://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/).\n  In summary, one generates the public and private keys on machine A,\n  then copies (e.g., using `scp` for remote copying) the public key to machine B and appends it to `~/.ssh/authorized_keys` there; the private key stays on machine A.\n  Make sure the `sshd` service has started on machine B.\n  - Verify the passwordless setup by connecting to B from A and checking that it succeeds without a password prompt.\n* On machine B, install the two packages `DistLadder3`, `pysc2 (Tencent Extension)`.\nSee the [section below](#downloads) for downloading.\nJust cd to the corresponding folder and type `pip install -e .` to complete the installation.\nThe Linux SC2 
binary version 4.10.0 is required,\nsee the link [here](https://github.com/deepmind/pysc2#linux) and [here](https://github.com/Blizzard/s2client-proto#downloads).\nThen install the main package `SC2AgentsZoo2` (see the [section below](#downloads) for downloading) for TStarBot-X as follows:\n  - To let TStarBot-X use GPU, you need to install `cudnn` and `cuda`\n  - Have `virtualenv` installed. You can do `pip install virtualenv`\n  - Have an ssh server installed and started. You can do `apt-get install openssh-server`\n  - cd to the folder `SC2AgentsZoo2/agent_TLeagueFormal14`,\n  run the command `bash install_virtualenv3.sh` to complete the installation.\n* On machine A, install `DistLadder3`, `pysc2 (Tencent Extension)` as aforementioned.\nAlso install the commercial SC2 on this machine.\n\n## Usage\nHere is how to start the human-machine test.\n\nOn machine A, run the command:\n```\npython3 -m distladder3.bin.play_vs_remote_agent \\\n--human \\\n--port 6789 \\\n--remote <machine-B-usr-name@machine-B-ip-address> \\\n--replay_dir /Users/your-name/Desktop \\\n--map KairosJunction \\\n--game_version 4.10.0 \\\n--replay_name xxx-A-machine.SC2Replay\n```\nwhich starts a game UI in which a human can play SC2 with mouse and keyboard.\nFor the `--remote` arg,\nif machine A and B happen to have the same user name,\nyou can also omit the user name and simply write `--remote <machine-B-ip-address>`.\nExamples: `--remote sc2tester@xx.xx.xx.xx` or `--remote xx.xx.xx.xx`.\n\nOn machine B, run the command:\n```\npython3 -m distladder3.bin.play_vs_remote_agent \\\n--port 6789 \\\n--replay_dir /Users/your-name/Desktop \\\n--map KairosJunction \\\n--game_version 4.10.0 \\\n--replay_name xxx-B-machine.SC2Replay \\\n--player_name_path_config \"TStarBot-X,/your/path/to/SC2AgentsZoo2/agent_TLeagueFormal14/,/your/path/to/the/config/file.ini\"\n```\nto start the AI,\nwhere the arg `--player_name_path_config` determines an agent by its name (`TStarBot-X`),\npath 
(`/your/path/to/SC2AgentsZoo2/agent_TLeagueFormal14/`),\nand config (`/your/path/to/the/config/file.ini`) as comma-separated values.\n\nThe `*.ini` config file specifies more detailed args for the agent,\ne.g.,\nwhether to use GPU,\nthe NN model path,\nthe zstat path (a folder),\nthe zstat category,\nthe probability of zeroing the zstat,\netc.,\nas shown in the snippet below:\n```ini\n[config]\nuse_gpu_id=-1\n;-1 for not using GPU, 0 means using GPU #0, 1 for GPU #1, etc.\n...\nchkpoints_root_dir=/Users/usr-name/tstarbotx/data\nmodel_filename=TStarBot-X-33days.model\n...\nzstat_zeroing_prob=0.0\nzstat_category=Normal174\n...\ntleague_interface_config.zstat_data_src=/Users/usr-name/tstarbotx/data/rp2124-mv8-victory-selected-Misc\n...\n```\n\nWe've prepared several agent config `.ini` files,\nas well as the corresponding NN models and zstat files (in a folder),\nsee the [section below](#downloads).\n\n## Downloads\n### Packages\n* `DistLadder3`\n[Google Drive](https://drive.google.com/file/d/1ufCtU2JIyoSiSMwN4lqT66oxitmZeArh/view?usp=sharing)\nor [Tencent Weiyun](https://share.weiyun.com/QFgOzG4n)\n* `pysc2 (Tencent Extension)`\n[Google Drive](https://drive.google.com/file/d/1rJnmK1aNIFaYuYkXXmkDvTe-JRKrzFhe/view?usp=sharing)\nor [Tencent Weiyun](https://share.weiyun.com/mCCEZtOX)\n* `SC2AgentsZoo2`\n[Google Drive](https://drive.google.com/file/d/1neXug1fn3miHnKu9Z8tBpMC-ZKfIzXAP/view?usp=sharing)\nor [Tencent Weiyun](https://share.weiyun.com/NKiLym42)\n\n### Data Files\n* zstat files (`rp2124-mv8-victory-selected-Misc`, a folder) used by the 8/25/33 days model\n[Google Drive](https://drive.google.com/file/d/1pV8wD_AXbbESQL2L4LticKTTwiaQpLCf/view?usp=sharing)\nor [Tencent Weiyun](https://share.weiyun.com/ZXeYGjZp)\n* Main Agent 8 days:\n  - `ini` config [Google Drive](https://drive.google.com/file/d/1Ed80rcYaafVRGlQJ7hCcsge1snCx8oLQ/view?usp=sharing)\n  or [Tencent Weiyun](https://share.weiyun.com/GJ1Bwfie)\n  - NN model [Google 
Drive](https://drive.google.com/file/d/1mJ9s3dpScgKbYj3vZusnC0fJ1IPQuMrc/view?usp=sharing)\n  or [Tencent Weiyun](https://share.weiyun.com/spHoQIg3)\n* Main Agent 25 days:\n  - `ini` config [Google Drive](https://drive.google.com/file/d/1JrfERGRQrVaVPOU8AFjn_B9jhy5eeMl1/view?usp=sharing)\n  or [Tencent Weiyun](https://share.weiyun.com/bNGLU4Zj)\n  - NN model [Google Drive](https://drive.google.com/file/d/1BcQERcIGZvulCd5M4gCej80lJmdIYcLh/view?usp=sharing)\n  or [Tencent Weiyun](https://share.weiyun.com/24QfxkMZ)\n* Main Agent 33 days:\n  - `ini` config [Google Drive](https://drive.google.com/file/d/1AohBDH4C4Y86usNbEDrVhq1g2ZUEJfvp/view?usp=sharing)\n  or [Tencent Weiyun](https://share.weiyun.com/ue3zQ7RG)\n  - NN model [Google Drive](https://drive.google.com/file/d/1M6m-vGGGYNI-KHETq8t8gBKi_luuPLyD/view?usp=sharing)\n  or [Tencent Weiyun](https://share.weiyun.com/yuh9ZDSe)\n"
  },
  {
    "path": "vizdoom/README.md",
    "content": "# ViZDoom Experiments\nThis page contains the resources for the experiments of ViZDoom as discussed in the TLeague technical report.\n\n## Trained Model\nA trained model for the experiments discussed in the technical report (CIG 2016 Track 1) has been given in the evaluation code,\nsee the [section below](#install-myplayer).\n\n## Evaluation\nThe evaluation code can be found here: \n[Google Drive](https://drive.google.com/file/d/1soi_nHglpSazRv2znZzbqO3S6GcFJoPL/view?usp=sharing) \nor [Tencent Weiyun](https://share.weiyun.com/YyN0IqXS),\nwhich is a modification over the original evaluation code https://github.com/mihahauke/VDAIC2017\n\nOur modification allows a synchronous mode for the host, \"F1\" and \"MyPlayer\",\nas discussed in the technical report and is summarized below.\n\nThe root config file `_vizdoom.cfg` overrides all the private configurations of an agent,\nso we've commented out the `ASYNC_PLAYER` setting in `_vizdoom.cfg`:\n```buildoutcfg\ndoom_scenario_path = cig2017.wad\n# window_visible = False\n# mode = ASYNC_PLAYER\ngame_args += -join localhost\n```\n\nFor \"F1\", we add extra arguments to `f1/my_glorious_agent.py`\n```\n    -c F1_COLOR, --color F1_COLOR\n                            0 - green, 1 - gray, 2 - brown, 3 - red, 4 - light\n                            gray, 5 - light brown, 6 - light red, 7 - light blue\n                            (default: 0)\n    -w, --watch           window visible (default: False)\n    -mode MODE, --mode MODE\n                            1 for PLAYER, 2 for ASYNC_PLAYER (default: 1)\n```\nand default \"F1\" to green color and synchronous player.\n\nFor the host, we add an argument\n```\n    -mode MODE, --mode MODE\n                            1 for PLAYER, 2 for ASYNC_PLAYER, 3 for SPECTATOR, 4\n                            for ASYNC_SPECTATOR (default: 1)  \n```\nto allow it be synchronous (mode `3`, the `SPECTATOR`).\n\n### Installation\nThe official code requires each submitted agent be packed as 
a docker image (see the description at https://github.com/mihahauke/VDAIC2017).\nIn our modified code, \nwe do a regular `pip install` for \"MyPlayer\", \nand a docker build for any other third-party agent (e.g., \"F1\"),\nas explained below.\n\n#### Install MyPlayer\nInstall the three packages `Arena`, `TLeague`, `TPolicies`, respectively:\n```\n# run each install from the directory that contains MyPlayer/\n(cd MyPlayer/Arena && pip3 install -e .)\n(cd MyPlayer/TLeague && pip3 install -e .)\n(cd MyPlayer/TPolicies && pip3 install -e .)\n```\nNote that the MyPlayer evaluation code here is self-contained and relies on the old `TLeague`, `Arena`, `TPolicies` code that is somewhat different from the `dev-open` branch. \nYou can avoid the possible conflicts by using, for example, `virtualenv`.\n\nAn NN model is shipped with the code in the path `MyPlayer/model/`.\n\n#### Install F1\nBuild the image:\n```bash\nsudo chmod u+x build.sh\n./build.sh f1 \n```\nThe corresponding NN model is already included.\n\n#### Install host\nBuild the image:\n```bash\nsudo chmod u+x build.sh\n./build.sh host\n```\n\n### Run\nTo run the evaluation, start the `host`, `MyPlayer` and `F1` in separate terminals. 
\n`tmux` is recommended.\nAlso, ensure that `run.sh` is executable:\n```bash\nsudo chmod u+x run.sh\n```\n\nFor 1 MyPlayer and 7 builtin bots, run the following commands in separate terminals:\n```bash\n./run.sh host -mode 3 -b 7 \nbash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation\n``` \nwhere `-mode 3` means synchronous spectator and `-b 7` means adding 7 builtin bots.\n\nFor 1 MyPlayer, 1 F1 and 6 builtin bots, run the following commands in separate terminals:\n```bash\n./run.sh host -mode 3 -b 6 -p 2\nbash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation\n./run.sh f1\n```\nwhere `-p 2` means that 2 AI agents will join.\n\nFor 4 MyPlayer and 4 F1, run the following commands in separate terminals:\n```bash\n./run.sh host -mode 3 -p 8\nbash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation\nbash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation\nbash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation\nbash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation\n./run.sh f1\n./run.sh f1\n./run.sh f1\n./run.sh f1\n```\n\n## Training Code\nAs described in the technical report, training has two stages:\none for navigation and the other for frag.\nWe provide the corresponding `.yml.jinja2` files here: [for navigation](vdtr-navi-open.yml.jinja2) and [for frag](vdtr-frag-open.yml.jinja2), respectively.\n\nRun the training over a k8s cluster:\n```bash\n# start\npython render_template.py vdtr-navi-open.yml.jinja2 | kubectl apply -f -\n# stop\npython render_template.py vdtr-navi-open.yml.jinja2 | kubectl delete -f -\n```\n```bash\n# start\npython render_template.py vdtr-frag-open.yml.jinja2 | kubectl apply -f -\n# stop\npython render_template.py vdtr-frag-open.yml.jinja2 | kubectl delete -f -\n```\n\nTODO: guidance to setting up PVC?\n\n## Downloads\nTODO: link to the video clips for the evaluation"
  },
  {
    "path": "vizdoom/vdtr-frag-open.yml.jinja2",
    "content": "{% set session = 'vdtr-frag-open' %}\n{% set image = \"your-docker-registry:port/sc2ai/tleague-vd118:20201209171727\" %}\n{% set learner_image = \"your-docker-registry:port/sc2ai/tleague-gpu-hvd-vd118:20201209171727\" %}\n{% set docker_registry_credential = \"regsecret\" %}\n{% set require_resources = true %}\n{% set pvc_name = \"cephfs-pvc-test\" %}\n{% set chkpoints_zoo_pvc_sub_dir = \"chkpoints_zoo/\" %}\n{% set chkpoints_pvc_sub_dir = chkpoints_zoo_pvc_sub_dir + session + \"_chkpoints\" %}\n{# common #}\n{% set env = \"vizdoom_cig2017_track1\" %}\n{% set env_config = {\n  'num_players' : 8,\n  'num_bots' : 0,\n  'train_mode' : 'frag'\n} %}\n{% set policy = \"tpolicies.net_zoo.conv_lstm.conv_lstm\" %}\n{% set policy_config = {\n  'use_xla': False,\n  'test': False,\n  'rl': True,\n  'use_loss_type': 'rl',\n  'use_value_head': True,\n  'n_v': 1,\n  'use_lstm': True,\n  'rollout_len': 1,\n  'nlstm': 128,\n  'hs_len': 256,\n  'lstm_dropout_rate': 0.2,\n  'lstm_cell_type':'lstm',\n  'lstm_layer_norm': True,\n  'weight_decay': 0.00002,\n  'sync_statistics': 'horovod'\n} %}\n{% set self_policy_config = {\n  'batch_size': 32,\n  'use_xla': False,\n  'test': True,\n  'use_loss_type': 'none',\n  'use_value_head': True,\n  'n_v': 1,\n  'use_lstm': True,\n  'rollout_len': 1,\n  'nlstm': 128,\n  'hs_len': 256,\n  'lstm_dropout_rate': 0.2,\n  'lstm_cell_type':'lstm',\n  'lstm_layer_norm': True,\n  'weight_decay': 0.00002,\n  'sync_statistics': 'horovod'\n} %}\n{% set unroll_length = 64 %}\n{% set rollout_length = 16 %}\n{# model pool#}\n{% set n_model_pools = 5 %}\n{% set model_pool_port1 = 10005 %}\n{% set model_pool_port2 = 10006 %}\n{% set model_pool_verbose = 0 %}\n{# league mgr #}\n{% set league_mgr_port = 10007 %}\n{% set game_mgr_type = \"tleague.game_mgr.game_mgrs.SelfPlayGameMgr\" %}\n{% set game_mgr_config = { 'max_n_players': 30} %}\n{% set mutable_hyperparam_type = \"ConstantHyperparam\" %}\n{% set hyperparam_config_name ={\n  'learning_rate': 
5e-5,\n  'lam': 0.95,\n  'gamma': 0.99,\n  'sigma': 10,\n  'reward_weights': [[0.05, 0.7, 0.5, 0.3, 0.4, 0, 0.03, 0.05, 0, 0.05]]\n} %}\n{% set league_mgr_chkpoints_dir = \"/root/results/\" %}\n{% set league_mgr_restore_checkpoint_dir = '' %}\n{# [[0.05, 0.7, 0.5, 0.3, 0.4, 0, 0.03, 0.05, 0, 0.05]] #}\n{% set league_mgr_save_checkpoint_root = league_mgr_chkpoints_dir + session + \"_chkpoints\" %}\n{% set league_mgr_save_interval_secs = 3600 %}\n{% set mute_actor_msg = False %}\n{% set pseudo_learner_num = -1 %}\n{# set init_model_paths = [['0000', league_mgr_chkpoints_dir + 'vdtr2308-navi-18-vd-sample_chkpoints' + '/0066:0067_20201109121749.model']] #}\n{% set init_model_paths = [['0000', league_mgr_chkpoints_dir + 'vdtr2410-navi-18-open_chkpoints' + '/0016:0017_20201211004705.model']] %}\n{% set league_mgr_verbose = 9 %}\n{# learners #}\n{% set n_lrn_groups = 1 %}\n{% set n_hosts_per_lrn_group = 2 %}\n{% set n_gpus_per_host = 8 %}\n{% set hvd_ssh_port = 9527 %}\n{% set lrn_port_base = 30003 %}\n{% set batch_size = 32 %}\n{% set lrn_rm_size = 20480 %}\n{% set lrn_pub_interval = 200 %}\n{% set lrn_log_interval = 50 %}\n{% set lrn_total_timesteps = 10000000 %}\n{% set lrn_burn_in_timesteps = 0 %}\n{% set n_v = 1 %}\n{% set lrn_rwd_shape = False %}\n{% set lrn_tb_port = 9003 %}\n{% set learner_config ={\n  'vf_coef': 0.5,\n  'ent_coef': 0.00003125,\n  'distill_coef': 0.0,\n  'max_grad_norm': 0.5\n} %}\n{# actors per learner #}\n{% set n_actors_per_learner = 8 %}\n{% set actor_distillation = False %}\n{% set actor_update_model_freq = 40 %}\n{% set actor_rwd_shape = True %}\n{% set actor_log_interval_steps = 51 %}\n{% set actor_verbose = 11 %}\n{% set actor_replay_dir = \"/root/replays/\" %}\n{% set interface_config  = \"\" %}\n\n\n{# --- league manager --- #}\n{% if true %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-league-mgr\n  labels:\n    session: {{ session }}\n    job: league-mgr\n    type: league-mgr\nspec:\n  selector:\n    session: 
{{ session }}\n    job: league-mgr\n  ports:\n  - port: {{ league_mgr_port }}\n    name: port1\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-league-mgr\n  labels:\n    session: {{ session }}\n    type: league-mgr\n    job: league-mgr\nspec:\n  nodeSelector:\n    type: cpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n    - name: data-dir\n      persistentVolumeClaim:\n        claimName: {{ pvc_name }}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-league-mgr-container\n      image: {{ image }}\n      ports:\n      - containerPort: {{ league_mgr_port }}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 0\n        requests:\n          nvidia.com/gpu: 0\n          cpu: 36\n          memory: 64Gi\n{% endif %}\n      volumeMounts:\n        - mountPath: {{ league_mgr_chkpoints_dir }}\n          name: data-dir\n          subPath: {{ chkpoints_zoo_pvc_sub_dir }}\n      command:\n      - \"python3\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_league_mgr\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n      - \"--port={{ league_mgr_port }}\"\n      - \"--game_mgr_type={{ game_mgr_type }}\"\n      - \"--game_mgr_config={{game_mgr_config}}\"\n      - \"--mutable_hyperparam_type={{ mutable_hyperparam_type }}\"\n      - \"--hyperparam_config_name={{ hyperparam_config_name }}\"\n      - \"--restore_checkpoint_dir={{ league_mgr_restore_checkpoint_dir }}\"\n      - \"--save_checkpoint_root={{ league_mgr_save_checkpoint_root }}\"\n      - \"--save_interval_secs={{ league_mgr_save_interval_secs }}\"\n      - \"--{% if mute_actor_msg %}mute_actor_msg{% else %}nomute_actor_msg{% endif %}\"\n      - \"--verbose={{ league_mgr_verbose }}\"\n      - 
\"--init_model_paths={{ init_model_paths }}\"\n      - \"--pseudo_learner_num={{ pseudo_learner_num }}\"\n{% endif %}\n{# --- model pools --- #}\n{% if true %}\n{% for i in range(n_model_pools) %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-mp{{ i }}\n  labels:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n    type: model-pool\nspec:\n  selector:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n  ports:\n  - port: {{ model_pool_port1 }}\n    name: port1\n  - port: {{ model_pool_port2 }}\n    name: port2\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-mp{{ i }}\n  labels:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n    type: model-pool\nspec:\n  nodeSelector:\n    type: cpu\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  restartPolicy: Never  # if failure, let it die\n  containers:\n    - name: {{ session }}-model-pool-container\n      image: {{ image }}\n      ports:\n      - containerPort: {{ model_pool_port1 }}\n      - containerPort: {{ model_pool_port2 }}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 0\n        requests:\n          nvidia.com/gpu: 0\n          cpu: 36\n          memory: 36Gi\n{% endif %}\n      command:\n      - \"python3\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_model_pool\"\n      - \"--ports={{ model_pool_port1 }}:{{ model_pool_port2 }}\"\n      - \"--verbose={{ model_pool_verbose }}\"\n{% endfor %}\n{% endif %}\n{# --- learners and actors per learner --- #}\n{% if true %}\n{% for i in range(n_lrn_groups) %}\n{% for j in range(n_hosts_per_lrn_group - 1, -1, -1) %}\n{# --- each host corresponds to a service owning a DNS name #}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: learner\nspec:\n  selector:\n    session: {{ session }}\n    type: 
learner\n    group: group-{{ i }}\n    host: host-{{ j }}\n  ports:\n  - port: {{ hvd_ssh_port }}\n    name: port-ssh\n{% for k in range(n_gpus_per_host) %}\n  - port: {{ lrn_port_base + 2*k}}\n    name: port{{ 2*k }}\n  - port: {{ lrn_port_base + 2*k + 1 }}\n    name: port{{ 2*k + 1 }}\n{% endfor %}\n{% if lrn_tb_port %}\n  - port: {{ lrn_tb_port }}\n    name: port-tb\n{% endif %}\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: learner\n    group: group-{{ i }}\n    host: host-{{ j }}\nspec:\n  nodeSelector:\n    type: gpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n  - name: training-log-dir\n    emptyDir: {}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-lg{{ i }}-h{{ j }}-container\n      image: {{ learner_image }}\n      ports:\n      - containerPort: {{ hvd_ssh_port }}\n{% for k in range(n_gpus_per_host) %}\n      - containerPort: {{ lrn_port_base + 2*k }}\n      - containerPort: {{ lrn_port_base + 2*k + 1}}\n{% endfor %}\n{% if lrn_tb_port %}\n      - containerPort: {{ lrn_tb_port }}\n{% endif %}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: {{ n_gpus_per_host }}\n        requests:\n          nvidia.com/gpu: {{ n_gpus_per_host }}\n          cpu: 48\n          memory: 150Gi\n{% endif %}\n      env:\n      - name: NONCCL_DEBUG\n        value: \"INFO\"\n{% if j == 0 %}\n{# --- run the mpirun/horovodrun command --- #}\n      volumeMounts:\n      - name: training-log-dir\n        mountPath: /root/work/training_log\n      command:\n      - \"horovodrun\"\n      args:\n      - \"--verbose\"\n      - \"--start-timeout\"\n      - \"1800\"\n      - \"-p\"\n      - \"{{ hvd_ssh_port }}\"\n      - \"-np\"\n      - \"{{ n_hosts_per_lrn_group * n_gpus_per_host }}\"\n      - \"-H\"\n{% set sep = joiner(',') %}\n      
- \"{% for jj in range(n_hosts_per_lrn_group) %}{{ sep() }}{{ session }}-lg{{ i }}-h{{ jj }}:{{ n_gpus_per_host }}{% endfor %}\"\n      - \"python\"\n      - \"-m\"\n      - \"tleague.bin.run_pg_learner\"\n      - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n{% for ind_host in range(n_hosts_per_lrn_group) %}\n{% set sep = joiner(',') %}\n      - \"--learner_spec={% for gpu_id in range(n_gpus_per_host) %}{{ sep() }}{{ gpu_id }}:{{ lrn_port_base + 2*gpu_id }}:{{ lrn_port_base + 2*gpu_id + 1 }}{% endfor %}\"\n{% endfor %}\n      - \"--learner_id=lrngrp{{ i }}\"\n      - \"--unroll_length={{ unroll_length }}\"\n      - \"--rollout_length={{ rollout_length }}\"\n      - \"--batch_size={{ batch_size }}\"\n      - \"--rm_size={{ lrn_rm_size }}\"\n      - \"--pub_interval={{ lrn_pub_interval }}\"\n      - \"--log_interval={{ lrn_log_interval }}\"\n      - \"--total_timesteps={{ lrn_total_timesteps }}\"\n      - \"--burn_in_timesteps={{ lrn_burn_in_timesteps }}\"\n      - \"--env={{ env }}\"\n      - \"--policy={{ policy }}\"\n      - \"--policy_config={{ policy_config }}\"\n      - \"--{% if lrn_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}\"\n      - \"--batch_worker_num={{ 4 }}\"\n      - \"--learner_config={{ learner_config }}\"\n      - \"--type=PPO\"\n{% else %}\n{# --- start an ssh deamon and run an arbitray command that occupies the container --- #}\n      command:\n      - \"bash\"\n      - \"-c\"\n      args:\n      - \"/usr/sbin/sshd -p {{ hvd_ssh_port }}; sleep {{ 3600 * 24 * 7 * 52 * 3}}\"\n{% endif %}\n{% if j==0 and lrn_tb_port %}\n{# --- start tensorboard when applicable --- #}\n    - name: {{ session }}-tb-lrngrp{{ i }}rank0-container\n      image: {{ learner_image }}\n      ports:\n      - containerPort: {{ lrn_tb_port }}\n      
volumeMounts:\n      - name: training-log-dir\n        mountPath: /root/training_log\n      env:\n      - name: CUDA_VISIBLE_DEVICES\n        value: \"\"\n      command:\n      - \"tensorboard\"\n      args:\n      - \"--logdir=/root/training_log/lrngrp{{ i }}rank0\"\n      - \"--port={{ lrn_tb_port }}\"\n{% endif %}\n{# --- endif j == 0 --- #}\n{% for k in range(n_gpus_per_host) %}\n{# --- the actors correspond to group i host j localrank k--- #}\n---\nkind: Deployment\napiVersion: extensions/v1beta1\nmetadata:\n  name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}\n  labels:\n    session: {{ session }}\n    type: actor\nspec:\n  replicas: {{ n_actors_per_learner }}\n  selector:\n    matchLabels:\n      session: {{ session }}\n      type: actor\n      group: group-{{ i }}\n      host: host-{{ j }}\n      localrank: localrank-{{ k }}\n  template:\n    metadata:\n      labels:\n        session: {{ session }}\n        type: actor\n        group: group-{{ i }}\n        host: host-{{ j }}\n        localrank: localrank-{{ k }}\n    spec:\n      nodeSelector:\n        type: cpu\n      volumes:\n      - name: data-dir\n        persistentVolumeClaim:\n          claimName: {{ pvc_name }}\n{% if docker_registry_credential != \"\" %}\n      imagePullSecrets:\n      - name: {{ docker_registry_credential }}\n{% endif %}\n      containers:\n      - name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}-container\n        image: {{ image }}\n        imagePullPolicy: Always\n        stdin: true\n{% if require_resources %}\n        resources:\n          limits:\n            nvidia.com/gpu: 0\n          requests:\n            nvidia.com/gpu: 0\n            cpu: 8\n            memory: 20Gi\n{% endif %}\n        command:\n        - \"python3\"\n        args:\n        - \"-m\"\n        - \"tleague.bin.run_pg_actor\"\n        - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n{% set sep = joiner(',') %}\n        - \"--model_pool_addrs={% for i in 
range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n        - \"--learner_addr={{ session }}-lg{{ i }}-h{{ j }}:{{ lrn_port_base + 2*k }}:{{ lrn_port_base + 2*k + 1 }}\"\n        - \"--unroll_length={{ unroll_length }}\"\n        - \"--update_model_freq={{ actor_update_model_freq }}\"\n        - \"--env={{ env }}\"\n        - \"--env_config={{env_config}}\"\n        - \"--interface_config={{interface_config}}\"\n        - \"--policy={{ policy }}\"\n        - \"--policy_config={{ self_policy_config }}\"\n        - \"--verbose={{ actor_verbose }}\"\n        - \"--log_interval_steps={{ actor_log_interval_steps }}\"\n        - \"--n_v={{ n_v }}\"\n        - \"--{% if actor_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}\"\n        - \"--{% if actor_distillation %}distillation{% else %}nodistillation{% endif %}\"\n        - \"--type=PPO\"\n{% endfor %}\n{# --- endfor k --- #}\n{% endfor %}\n{# --- endfor j --- #}\n{% endfor %}\n{# --- endfor i --- #}\n{% endif %}\n{# --- endif true/false --- #}\n"
  },
  {
    "path": "vizdoom/vdtr-navi-open.yml.jinja2",
    "content": "{% set session = 'vdtr-navi' %}\n{% set image = \"your-docker-registry:port/sc2ai/tleague-vd118:20201209171727\" %}\n{% set learner_image = \"your-docker-registry:port/sc2ai/tleague-gpu-hvd-vd118:20201209171727\" %}\n{% set docker_registry_credential = \"regsecret\" %}\n{% set require_resources = true %}\n{% set pvc_name = \"cephfs-pvc-test\" %}\n{% set chkpoints_zoo_pvc_sub_dir = \"chkpoints_zoo/\" %}\n{% set chkpoints_pvc_sub_dir = chkpoints_zoo_pvc_sub_dir + session + \"_chkpoints\" %}\n{# common #}\n{% set env = \"vizdoom_cig2017_track1\" %}\n{% set env_config = {\n  'num_players' : 8,\n  'num_bots' : 0,\n  'train_mode' : 'navi'\n} %}\n{% set policy = \"tpolicies.net_zoo.conv_lstm.conv_lstm\" %}\n{% set policy_config = {\n  'use_xla': False,\n  'test': False,\n  'rl': True,\n  'use_loss_type': 'rl',\n  'use_value_head': True,\n  'n_v': 1,\n  'use_lstm': True,\n  'rollout_len': 1,\n  'nlstm': 128,\n  'hs_len': 256,\n  'lstm_dropout_rate': 0.2,\n  'lstm_cell_type':'lstm',\n  'lstm_layer_norm': True,\n  'weight_decay': 0.00002,\n  'sync_statistics': 'horovod'\n} %}\n{% set self_policy_config = {\n  'use_xla': False,\n  'test': True,\n  'use_loss_type': 'none',\n  'use_value_head': True,\n  'n_v': 1,\n  'use_lstm': True,\n  'rollout_len': 1,\n  'nlstm': 128,\n  'hs_len': 256,\n  'lstm_dropout_rate': 0.2,\n  'lstm_cell_type':'lstm',\n  'lstm_layer_norm': True,\n  'weight_decay': 0.00002,\n  'sync_statistics': 'horovod'\n} %}\n{% set unroll_length = 64 %}\n{% set rollout_length = 16 %}\n{# model pool#}\n{% set n_model_pools = 15 %}\n{% set model_pool_port1 = 10005 %}\n{% set model_pool_port2 = 10006 %}\n{% set model_pool_verbose = 0 %}\n{# league mgr #}\n{% set league_mgr_port = 10007 %}\n{% set game_mgr_type = \"tleague.game_mgr.game_mgrs.SelfPlayGameMgr\" %}\n{% set game_mgr_config = { 'max_n_players': 30} %}\n{% set mutable_hyperparam_type = \"ConstantHyperparam\" %}\n{% set hyperparam_config_name ={\n  'learning_rate': 1e-4,\n  'lam': 0.95,\n  
'gamma': 0.99,\n  'sigma': 10,\n  'reward_weights': [[1, 0, 0, 0, 0, 0, 0, 0, 0, 1]]\n} %}\n{% set league_mgr_chkpoints_dir = \"/root/results/\" %}\n{% set league_mgr_restore_checkpoint_dir = '' %}\n{# [[0.05, 0.7, 0.5, 0.3, 0.4, 0, 0.03, 0.05, 0, 0.05]] #}\n{% set league_mgr_save_checkpoint_root = league_mgr_chkpoints_dir + session + \"_chkpoints\" %}\n{% set league_mgr_save_interval_secs = 3600 %}\n{% set mute_actor_msg = False %}\n{% set pseudo_learner_num = -1 %}\n{# set init_model_paths = [['0000', league_mgr_chkpoints_dir + 'vdtr2308-navi-18-vd-sample_chkpoints' + '/0066:0067_20201109121749.model']] #}\n{% set init_model_paths = [] %}\n{% set league_mgr_verbose = 9 %}\n{# learners #}\n{% set n_lrn_groups = 1 %}\n{% set n_hosts_per_lrn_group = 2 %}\n{% set n_gpus_per_host = 8 %}\n{% set hvd_ssh_port = 9527 %}\n{% set lrn_port_base = 30003 %}\n{% set batch_size = 32 %}\n{% set lrn_rm_size = 20480 %}\n{% set lrn_pub_interval = 200 %}\n{% set lrn_log_interval = 50 %}\n{% set lrn_total_timesteps = 10000000 %}\n{% set lrn_burn_in_timesteps = 0 %}\n{% set n_v = 1 %}\n{% set lrn_rwd_shape = False %}\n{% set lrn_tb_port = 9003 %}\n{% set learner_config ={\n  'vf_coef': 0.5,\n  'ent_coef': 0.00003125,\n  'distill_coef': 0.0,\n  'max_grad_norm': 0.5\n} %}\n{# actors per learner #}\n{% set n_actors_per_learner = 24 %}\n{% set actor_distillation = False %}\n{% set actor_update_model_freq = 40 %}\n{% set actor_rwd_shape = True %}\n{% set actor_log_interval_steps = 51 %}\n{% set actor_verbose = 11 %}\n{% set actor_replay_dir = \"/root/replays/\" %}\n{% set interface_config  = \"\" %}\n\n\n{# --- league manager --- #}\n{% if true %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-league-mgr\n  labels:\n    session: {{ session }}\n    job: league-mgr\n    type: league-mgr\nspec:\n  selector:\n    session: {{ session }}\n    job: league-mgr\n  ports:\n  - port: {{ league_mgr_port }}\n    name: port1\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ 
session }}-league-mgr\n  labels:\n    session: {{ session }}\n    type: league-mgr\n    job: league-mgr\nspec:\n  nodeSelector:\n    type: cpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n    - name: data-dir\n      persistentVolumeClaim:\n        claimName: {{ pvc_name }}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-league-mgr-container\n      image: {{ image }}\n      ports:\n      - containerPort: {{ league_mgr_port }}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 0\n        requests:\n          nvidia.com/gpu: 0\n          cpu: 36\n          memory: 64Gi\n{% endif %}\n      volumeMounts:\n        - mountPath: {{ league_mgr_chkpoints_dir }}\n          name: data-dir\n          subPath: {{ chkpoints_zoo_pvc_sub_dir }}\n      command:\n      - \"python3\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_league_mgr\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n      - \"--port={{ league_mgr_port }}\"\n      - \"--game_mgr_type={{ game_mgr_type }}\"\n      - \"--game_mgr_config={{game_mgr_config}}\"\n      - \"--mutable_hyperparam_type={{ mutable_hyperparam_type }}\"\n      - \"--hyperparam_config_name={{ hyperparam_config_name }}\"\n      - \"--restore_checkpoint_dir={{ league_mgr_restore_checkpoint_dir }}\"\n      - \"--save_checkpoint_root={{ league_mgr_save_checkpoint_root }}\"\n      - \"--save_interval_secs={{ league_mgr_save_interval_secs }}\"\n      - \"--{% if mute_actor_msg %}mute_actor_msg{% else %}nomute_actor_msg{% endif %}\"\n      - \"--verbose={{ league_mgr_verbose }}\"\n      - \"--init_model_paths={{ init_model_paths }}\"\n      - \"--pseudo_learner_num={{ pseudo_learner_num }}\"\n{% endif %}\n{# --- model pools --- #}\n{% if 
true %}\n{% for i in range(n_model_pools) %}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-mp{{ i }}\n  labels:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n    type: model-pool\nspec:\n  selector:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n  ports:\n  - port: {{ model_pool_port1 }}\n    name: port1\n  - port: {{ model_pool_port2 }}\n    name: port2\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-mp{{ i }}\n  labels:\n    session: {{ session }}\n    job: model-pool-{{ i }}\n    type: model-pool\nspec:\n  nodeSelector:\n    type: cpu\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  restartPolicy: Never  # if failure, let it die\n  containers:\n    - name: {{ session }}-model-pool-container\n      image: {{ image }}\n      ports:\n      - containerPort: {{ model_pool_port1 }}\n      - containerPort: {{ model_pool_port2 }}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: 0\n        requests:\n          nvidia.com/gpu: 0\n          cpu: 36\n          memory: 36Gi\n{% endif %}\n      command:\n      - \"python3\"\n      args:\n      - \"-m\"\n      - \"tleague.bin.run_model_pool\"\n      - \"--ports={{ model_pool_port1 }}:{{ model_pool_port2 }}\"\n      - \"--verbose={{ model_pool_verbose }}\"\n{% endfor %}\n{% endif %}\n{# --- learners and actors per learner --- #}\n{% if true %}\n{% for i in range(n_lrn_groups) %}\n{% for j in range(n_hosts_per_lrn_group - 1, -1, -1) %}\n{# --- each host corresponds to a service owning a DNS name #}\n---\nkind: Service\napiVersion: v1\nmetadata:\n  name: {{ session }}-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: learner\nspec:\n  selector:\n    session: {{ session }}\n    type: learner\n    group: group-{{ i }}\n    host: host-{{ j }}\n  ports:\n  - port: {{ hvd_ssh_port }}\n    name: port-ssh\n{% for k in range(n_gpus_per_host) %}\n  
- port: {{ lrn_port_base + 2*k}}\n    name: port{{ 2*k }}\n  - port: {{ lrn_port_base + 2*k + 1 }}\n    name: port{{ 2*k + 1 }}\n{% endfor %}\n{% if lrn_tb_port %}\n  - port: {{ lrn_tb_port }}\n    name: port-tb\n{% endif %}\n---\napiVersion: v1\nkind: Pod\nmetadata:\n  name: {{ session }}-lg{{ i }}-h{{ j }}\n  labels:\n    session: {{ session }}\n    type: learner\n    group: group-{{ i }}\n    host: host-{{ j }}\nspec:\n  nodeSelector:\n    type: gpu\n  restartPolicy: Never  # if failure, let it die\n  volumes:\n  - name: training-log-dir\n    emptyDir: {}\n{% if docker_registry_credential %}\n  imagePullSecrets:\n  - name: {{ docker_registry_credential }}\n{% endif %}\n  containers:\n    - name: {{ session }}-lg{{ i }}-h{{ j }}-container\n      image: {{ learner_image }}\n      ports:\n      - containerPort: {{ hvd_ssh_port }}\n{% for k in range(n_gpus_per_host) %}\n      - containerPort: {{ lrn_port_base + 2*k }}\n      - containerPort: {{ lrn_port_base + 2*k + 1}}\n{% endfor %}\n{% if lrn_tb_port %}\n      - containerPort: {{ lrn_tb_port }}\n{% endif %}\n{% if require_resources %}\n      resources:\n        limits:\n          nvidia.com/gpu: {{ n_gpus_per_host }}\n        requests:\n          nvidia.com/gpu: {{ n_gpus_per_host }}\n          cpu: 48\n          memory: 150Gi\n{% endif %}\n      env:\n      - name: NONCCL_DEBUG\n        value: \"INFO\"\n{% if j == 0 %}\n{# --- run the mpirun/horovodrun command --- #}\n      volumeMounts:\n      - name: training-log-dir\n        mountPath: /root/work/training_log\n      command:\n      - \"horovodrun\"\n      args:\n      - \"--verbose\"\n      - \"--start-timeout\"\n      - \"1800\"\n      - \"-p\"\n      - \"{{ hvd_ssh_port }}\"\n      - \"-np\"\n      - \"{{ n_hosts_per_lrn_group * n_gpus_per_host }}\"\n      - \"-H\"\n{% set sep = joiner(',') %}\n      - \"{% for jj in range(n_hosts_per_lrn_group) %}{{ sep() }}{{ session }}-lg{{ i }}-h{{ jj }}:{{ n_gpus_per_host }}{% endfor %}\"\n      - \"python\"\n      - 
\"-m\"\n      - \"tleague.bin.run_pg_learner\"\n      - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n{% set sep = joiner(',') %}\n      - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n{% for ind_host in range(n_hosts_per_lrn_group) %}\n{% set sep = joiner(',') %}\n      - \"--learner_spec={% for gpu_id in range(n_gpus_per_host) %}{{ sep() }}{{ gpu_id }}:{{ lrn_port_base + 2*gpu_id }}:{{ lrn_port_base + 2*gpu_id + 1 }}{% endfor %}\"\n{% endfor %}\n      - \"--learner_id=lrngrp{{ i }}\"\n      - \"--unroll_length={{ unroll_length }}\"\n      - \"--rollout_length={{ rollout_length }}\"\n      - \"--batch_size={{ batch_size }}\"\n      - \"--rm_size={{ lrn_rm_size }}\"\n      - \"--pub_interval={{ lrn_pub_interval }}\"\n      - \"--log_interval={{ lrn_log_interval }}\"\n      - \"--total_timesteps={{ lrn_total_timesteps }}\"\n      - \"--burn_in_timesteps={{ lrn_burn_in_timesteps }}\"\n      - \"--env={{ env }}\"\n      - \"--policy={{ policy }}\"\n      - \"--policy_config={{ policy_config }}\"\n      - \"--{% if lrn_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}\"\n      - \"--batch_worker_num={{ 4 }}\"\n      - \"--learner_config={{ learner_config }}\"\n      - \"--type=PPO\"\n{% else %}\n{# --- start an ssh deamon and run an arbitray command that occupies the container --- #}\n      command:\n      - \"bash\"\n      - \"-c\"\n      args:\n      - \"/usr/sbin/sshd -p {{ hvd_ssh_port }}; sleep {{ 3600 * 24 * 7 * 52 * 3}}\"\n{% endif %}\n{% if j==0 and lrn_tb_port %}\n{# --- start tensorboard when applicable --- #}\n    - name: {{ session }}-tb-lrngrp{{ i }}rank0-container\n      image: {{ learner_image }}\n      ports:\n      - containerPort: {{ lrn_tb_port }}\n      volumeMounts:\n      - name: training-log-dir\n        mountPath: /root/training_log\n      env:\n      - name: CUDA_VISIBLE_DEVICES\n        value: \"\"\n      
command:\n      - \"tensorboard\"\n      args:\n      - \"--logdir=/root/training_log/lrngrp{{ i }}rank0\"\n      - \"--port={{ lrn_tb_port }}\"\n{% endif %}\n{# --- endif j == 0 --- #}\n{% for k in range(n_gpus_per_host) %}\n{# --- the actors correspond to group i host j localrank k--- #}\n---\nkind: Deployment\napiVersion: extensions/v1beta1\nmetadata:\n  name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}\n  labels:\n    session: {{ session }}\n    type: actor\nspec:\n  replicas: {{ n_actors_per_learner }}\n  selector:\n    matchLabels:\n      session: {{ session }}\n      type: actor\n      group: group-{{ i }}\n      host: host-{{ j }}\n      localrank: localrank-{{ k }}\n  template:\n    metadata:\n      labels:\n        session: {{ session }}\n        type: actor\n        group: group-{{ i }}\n        host: host-{{ j }}\n        localrank: localrank-{{ k }}\n    spec:\n      nodeSelector:\n        type: cpu\n      volumes:\n      - name: data-dir\n        persistentVolumeClaim:\n          claimName: {{ pvc_name }}\n{% if docker_registry_credential != \"\" %}\n      imagePullSecrets:\n      - name: {{ docker_registry_credential }}\n{% endif %}\n      containers:\n      - name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}-container\n        image: {{ image }}\n        imagePullPolicy: Always\n        stdin: true\n{% if require_resources %}\n        resources:\n          limits:\n            nvidia.com/gpu: 0\n          requests:\n            nvidia.com/gpu: 0\n            cpu: 8\n            memory: 20Gi\n{% endif %}\n        command:\n        - \"python3\"\n        args:\n        - \"-m\"\n        - \"tleague.bin.run_pg_actor\"\n        - \"--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}\"\n{% set sep = joiner(',') %}\n        - \"--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}\"\n        - \"--learner_addr={{ session 
}}-lg{{ i }}-h{{ j }}:{{ lrn_port_base + 2*k }}:{{ lrn_port_base + 2*k + 1 }}\"\n        - \"--unroll_length={{ unroll_length }}\"\n        - \"--update_model_freq={{ actor_update_model_freq }}\"\n        - \"--env={{ env }}\"\n        - \"--env_config={{env_config}}\"\n        - \"--interface_config={{interface_config}}\"\n        - \"--policy={{ policy }}\"\n        - \"--policy_config={{ self_policy_config }}\"\n        - \"--verbose={{ actor_verbose }}\"\n        - \"--log_interval_steps={{ actor_log_interval_steps }}\"\n        - \"--n_v={{ n_v }}\"\n        - \"--{% if actor_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}\"\n        - \"--{% if actor_distillation %}distillation{% else %}nodistillation{% endif %}\"\n        - \"--type=PPO\"\n{% endfor %}\n{# --- endfor k --- #}\n{% endfor %}\n{# --- endfor j --- #}\n{% endfor %}\n{# --- endfor i --- #}\n{% endif %}\n{# --- endif true/false --- #}\n"
  }
]