Repository: tencent-ailab/tleague_projpage
Branch: master
Commit: d9519bf6b836
Files: 9
Total size: 64.9 KB

Directory structure:
gitextract_5vhilzt5/
├── .gitignore
├── README.md
├── pommerman/
│   ├── README.md
│   └── pommerman.yml.jinja2
├── tstarbotx/
│   ├── README.md
│   └── gm_test.md
└── vizdoom/
    ├── README.md
    ├── vdtr-frag-open.yml.jinja2
    └── vdtr-navi-open.yml.jinja2


================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.DS_Store
.idea


================================================
FILE: README.md
================================================
# TLeague Project Page

This is the project page for the following technical reports:

Lei Han∗, Jiechao Xiong∗, Peng Sun∗, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Hongsheng Yu, Zhengyou Zhang.
TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game.
[arXiv preprint arXiv:2011.13729](https://arxiv.org/abs/2011.13729), 2020.
(* Equal contribution, correspondence to the first three authors)

Peng Sun∗, Jiechao Xiong∗, Lei Han∗, Xinghai Sun, Shuxing Li, Jiawei Xu, Meng Fang, Zhengyou Zhang.
TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning.
[arXiv preprint arXiv:2011.12895](https://arxiv.org/abs/2011.12895), 2020.
(* Equal contribution, correspondence to the first three authors)

**Impatient readers interested in the StarCraft II AI TStarBot-X can go straight to the [TStarBot-X project page](tstarbotx/README.md)**.
## Quick Start

* For general TLeague usage, see [the section below](#usage)
* For the resources of TStarBot-X, see [the page here](tstarbotx/README.md)
* For the resources of the ViZDoom experiments, see [the page here](vizdoom/README.md)
* For the resources of the Pommerman experiments, see [the page here](pommerman/README.md)

## Usage

`Python>=3.6` is required. We've tested `Python 3.6.5`.

### Minimal Working Example

To use the TLeague framework and run a minimal training, one needs to install the following basic packages:

* [TLeague](https://github.com/tencent-ailab/TLeague): the main logic of Competitive Self-Play Multi-Agent Reinforcement Learning.
* [TPolicies](https://github.com/tencent-ailab/TPolicies): a lib for building the neural nets used in RL and IL.
* [Arena](https://github.com/tencent-ailab/Arena): a lib of environments and env-agent interfaces.

See the docs therein for how to install `TLeague`, `TPolicies` and `Arena`, respectively. Briefly, it amounts to git-cloning (or downloading) the repos and doing an in-place pip installation. For example,

```bash
git clone https://github.com/tencent-ailab/TLeague.git ~/TLeague
git clone https://github.com/tencent-ailab/TPolicies.git ~/TPolicies
git clone https://github.com/tencent-ailab/Arena.git ~/Arena
cd ~/TLeague && pip install -e . && cd ~
cd ~/TPolicies && pip install -e . && cd ~
cd ~/Arena && pip install -e . && cd ~
# manually install tensorflow 1.15.0 as required by TPolicies
pip install tensorflow==1.15.0
```

Then, try the example of training with the simple game `pong-2p` (an environment contained in `Arena`) as a sanity check. See the [link here](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#pong-2p).

To run training for other environments, extra binaries and/or packages must be installed, as explained in the following.
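Before running `pong-2p`, it can help to confirm that the basic packages are visible to the interpreter you will train with. The snippet below is a minimal sketch, not part of TLeague: it assumes the three repos install the top-level modules `tleague`, `tpolicies` and `arena` (the first two match the module paths used elsewhere on this page; `arena` is an assumption), and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Top-level module names assumed for the three repos plus TensorFlow;
# adjust them if your checkout exposes different package names.
REQUIRED_MODULES = ("tleague", "tpolicies", "arena", "tensorflow")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found on the import path."""
    # find_spec() locates a top-level module without importing it,
    # so the check is cheap and has no side effects.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing packages:", ", ".join(missing))
    else:
        print("all basic packages are importable")
```

An empty result only means the modules can be located; it does not check versions (e.g., that TensorFlow is 1.15.0).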
### StarCraft II Training

When installing the `Arena` package, one additionally needs to install [TImitate](https://github.com/tencent-ailab/TImitate), a lib for SC2 observations and actions, zstat extraction, replay parsing, etc. See also the [link here](https://github.com/tencent-ailab/Arena#dependencies).

[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#starcraft-ii) for both Reinforcement Learning (CSP-MARL) and Imitation Learning on a single machine.

TODO: pointer to the Docker Auto Build repo and say it's yet-another guide to installation from scratch.

TODO: texts for how to train with k8s

### ViZDoom Training

When installing the `Arena` package, one additionally needs to install ViZDoom (>=1.1.8); see the [link here](https://github.com/tencent-ailab/Arena#dependencies).

[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#vizdoom) of how to train ViZDoom on a single machine.

Refer also to the [link here](https://github.com/tencent-ailab/TLeagueAutoBuild/tree/dev-open) for how to (auto-)build the docker image, which is yet another guide to installation from scratch.

For running training over a k8s cluster, see the [link here](vizdoom/README.md#training-code).

### Pommerman Training

When installing the `Arena` package, one additionally needs to install Pommerman; see the [link here](https://github.com/tencent-ailab/Arena#dependencies).

[Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#pommerman) of how to train Pommerman on a single machine.

Refer also to the [link here](https://github.com/tencent-ailab/TLeagueAutoBuild/tree/pommerman) for how to (auto-)build the docker image, which is yet another guide to installation from scratch.

For running training over a k8s cluster, see the [link here](pommerman/README.md#training-code).
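On a k8s cluster, the components find each other through Service DNS names and port numbers that the template `pommerman/pommerman.yml.jinja2` stitches together with Jinja loops and `joiner(',')`. The sketch below re-creates those strings in plain Python so you can predict what the rendered YAML will contain. The helper names are hypothetical (they are not TLeague functions). They only mirror the template's `--model_pool_addrs`, `--learner_spec`, actor `--learner_addr` and horovod `-H` values.

```python
# Hypothetical helpers that mirror the string-building loops in
# pommerman/pommerman.yml.jinja2 (Jinja's joiner(',') becomes ",".join()).

def model_pool_addrs(session, n_model_pools, port1, port2):
    # --model_pool_addrs: one "<session>-mp<i>:<port1>:<port2>" entry per model pool
    return ",".join(f"{session}-mp{i}:{port1}:{port2}" for i in range(n_model_pools))


def learner_spec(n_gpus_per_host, lrn_port_base):
    # --learner_spec: each GPU (local rank) gets two consecutive ports
    return ",".join(
        f"{g}:{lrn_port_base + 2 * g}:{lrn_port_base + 2 * g + 1}"
        for g in range(n_gpus_per_host)
    )


def actor_learner_addr(session, group, host, local_rank, lrn_port_base):
    # --learner_addr of the actors attached to one learner process
    port = lrn_port_base + 2 * local_rank
    return f"{session}-lg{group}-h{host}:{port}:{port + 1}"


def horovod_hosts(session, group, n_hosts, n_gpus_per_host):
    # value passed to tleague_horovodrun -H: "<service>:<slots>" per host
    return ",".join(f"{session}-lg{group}-h{h}:{n_gpus_per_host}" for h in range(n_hosts))


if __name__ == "__main__":
    s = "tr2504-pommerman"  # session name used in the template
    print(model_pool_addrs(s, 1, 10003, 10004))   # tr2504-pommerman-mp0:10003:10004
    print(learner_spec(2, 30003))                 # 0:30003:30004,1:30005:30006
    print(actor_learner_addr(s, 0, 0, 1, 30003))  # tr2504-pommerman-lg0-h0:30005:30006
    print(horovod_hosts(s, 0, 1, 2))              # tr2504-pommerman-lg0-h0:2
```

With the template's defaults (1 learner group, 1 host, 2 GPUs per host, `n_actors_per_learner = 25`), this works out to 2 actor Deployments of 25 replicas each, i.e. 50 actor pods feeding one learner group.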
### Single Agent RL

TLeague also works for pure RL, which can be viewed as a special case of MARL where the number of agents equals one. [Here are examples](https://github.com/tencent-ailab/TLeague/blob/dev-open/docs/EXAMPLE_SM.md#single-agent-rl) of how to train on gym Atari on a single machine. Ensure the correct dependencies are installed:

```bash
pip install "gym[atari]==0.12.1"
```

# Disclaimer

This is not an officially supported Tencent product. The code and data in this repository are for research purposes only. No representation or warranty whatsoever, expressed or implied, is made as to its accuracy, reliability or completeness. We assume no liability and are not responsible for any misuse or damage caused by the code and data. Your use of the code and data is subject to applicable laws and is at your own risk.


================================================
FILE: pommerman/README.md
================================================
# Pommerman Experiments

This page contains the resources for the Pommerman experiments discussed in the TLeague technical report.

## Training Code

The training YAML used in the technical report can be generated from [pommerman.yml.jinja2](pommerman.yml.jinja2).

## Evaluation

```
python3 -m tleague.sandbox.run_local_battle_pommerman \
  --policy_config="{
    'use_xla': False,
    'rollout_len': 1,
    'test': True,
    'rl': False,
    'use_loss_type': 'none',
    'use_value_head': False,
    'use_self_fed_heads': True,
    'use_lstm': True,
    'nlstm': 64,
    'hs_len': 128,
    'lstm_duration': 1,
    'lstm_dropout_rate': 0.0,
    'lstm_cell_type': 'lstm',
    'lstm_layer_norm': True,
    'weight_decay': 0.00000002,
    'n_v': 11,
    'merge_pi': False,
  }" \
  --model=0076:0077_20201029114642.model
```

## Downloads

### Trained Model

The trained model (after 10 days of training) can be downloaded from [Google Drive](https://drive.google.com/file/d/125eUbQl0QTw9f4uyGTcTxMR6GUfRvPBE/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/hkLvLNT0).
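Once the model is downloaded, the evaluation command above can also be assembled from Python, which avoids hand-quoting the multi-line `--policy_config` value. This is a sketch under the assumption that the flag accepts a Python dict literal (which is what the command above passes). How TLeague parses it internally is not shown on this page. `eval_command` is a hypothetical helper, and the model path must point at wherever you saved the download.

```python
import ast
import shlex

# The policy_config from the evaluation command above, as a Python dict.
EVAL_POLICY_CONFIG = {
    'use_xla': False, 'rollout_len': 1, 'test': True, 'rl': False,
    'use_loss_type': 'none', 'use_value_head': False, 'use_self_fed_heads': True,
    'use_lstm': True, 'nlstm': 64, 'hs_len': 128, 'lstm_duration': 1,
    'lstm_dropout_rate': 0.0, 'lstm_cell_type': 'lstm', 'lstm_layer_norm': True,
    'weight_decay': 0.00000002, 'n_v': 11, 'merge_pi': False,
}


def eval_command(model_path, policy_config=EVAL_POLICY_CONFIG):
    """Build the argv list for the local Pommerman battle shown above."""
    text = repr(policy_config)  # a Python dict literal, like the quoted flag value
    # literal_eval raises on non-literal reprs; the comparison catches lossy ones.
    if ast.literal_eval(text) != policy_config:
        raise ValueError("policy_config must contain only Python literals")
    return [
        "python3", "-m", "tleague.sandbox.run_local_battle_pommerman",
        f"--policy_config={text}",
        f"--model={model_path}",
    ]


if __name__ == "__main__":
    cmd = eval_command("0076:0077_20201029114642.model")
    print(" ".join(shlex.quote(arg) for arg in cmd))  # paste-able shell form
    # subprocess.run(cmd, check=True) would launch it once TLeague is installed.
```

Passing an argv list (rather than one shell string) means no shell quoting is involved when the command is launched with `subprocess`.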
### Replay Files

Here are the 100 replay files of our agent against Navocado, as discussed in the TLeague technical report: [Google Drive](https://drive.google.com/file/d/1miuqo7EpzgNIGHUNtPqdIswe8rKuoRk0/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/csEpj1R3).

Replays can be viewed by dragging and dropping them onto the [Pommerman Playback](https://www.pommerman.com/battle) page.


================================================
FILE: pommerman/pommerman.yml.jinja2
================================================
{% set session = 'tr2504-pommerman' %}
{% set time_tag = "20210104145000" %}
{% set image = "ccr.ccs.tencentyun.com/sc2ai/tleague-pommerman:" + time_tag %}
{% set learner_image = "ccr.ccs.tencentyun.com/sc2ai/tleague-gpu-hvd-pommerman:" + time_tag %}
{% set docker_registry_credential = "tke-dockreg-cred" %}
{% set require_resources = true %}
{% set pvc_name = "pvc-share-full" %}
{% set chkpoints_zoo_pvc_sub_dir = "chkpoints_zoo/" %}
{% set chkpoints_pvc_sub_dir = chkpoints_zoo_pvc_sub_dir + session + "_chkpoints" %}

{# common #}
{% set env = "pommerman_v2_fog" %}
{% set policy = "tpolicies.net_zoo.pommerman.conv_lstm" %}
{% set policy_config = {
  'use_xla': True,
  'test': False,
  'rl': True,
  'use_loss_type': 'rl_ppo',
  'use_value_head': True,
  'use_self_fed_heads': False,
  'use_lstm': True,
  'nlstm': 64,
  'hs_len': 128,
  'lstm_duration': 1,
  'lstm_dropout_rate': 0.0,
  'lstm_cell_type': 'lstm',
  'lstm_layer_norm': True,
  'weight_decay': 0.00000002,
  'n_v': 11,
  'merge_pi': False,
  'forget_bias': 1.0,
} %}
{% set self_policy_config = {
  'batch_size': 8,
  'rollout_len': 1,
  'use_xla': False,
  'test': True,
  'use_loss_type': 'none',
  'use_value_head': True,
  'use_self_fed_heads': True,
  'use_lstm': True,
  'nlstm': 64,
  'hs_len': 128,
  'lstm_duration': 1,
  'lstm_dropout_rate': 0.0,
  'lstm_cell_type': 'lstm',
  'lstm_layer_norm': True,
  'weight_decay': 0.00000002,
  'n_v': 11,
  'merge_pi': False,
  'forget_bias': 1.0,
} %}
{% set use_infserver = false %}
{% set self_infserver_config = {
  'outputs': ['a', 'neglogp'],
  'update_model_seconds': 30,
  'model_key': '',
} %}
{% set parallel_infserver_num = 1 %}
{% set infserver_batch_worker_num = 4 %}
{% set unroll_length = 64 %}
{% set rollout_length = 16 %}

{# model pool #}
{% set n_model_pools = 1 %}
{% set model_pool_port1 = 10003 %}
{% set model_pool_port2 = 10004 %}
{% set model_pool_verbose = 0 %}

{# league mgr #}
{% set league_mgr_port = 20005 %}
{% set game_mgr_type = "tleague.game_mgr.ae_game_mgrs.AEMatchMakingGameMgr" %}
{% set game_mgr_config = {
  'lrn_id_list': ['lrngrp0'],
  'lrn_role_list': ['MA'],
  'main_agent_pfsp_prob': 0.5,
  'main_agent_forgotten_prob': 0.15,
  'main_agent_forgotten_me_winrate_thre': 0.5,
  'main_agent_forgotten_ma_winrate_thre': 0.7,
} %}
{% set mutable_hyperparam_type = "MutableHyperparam" %}
{% set hyperparam_config_name = {
  'learning_rate': 0.00001,
  'lam': 0.8,
  'gamma': 1.0,
  'burn_in_timesteps': 1000000,
  'reward_weights': [1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
} %}
{% set league_mgr_restore_checkpoint_dir = "" %}
{% set league_mgr_chkpoints_dir = "/root/results/" %}
{% set league_mgr_save_checkpoint_root = league_mgr_chkpoints_dir + session + "_chkpoints" %}
{% set league_mgr_save_interval_secs = 900 %}
{% set mute_actor_msg = true %}
{% set pseudo_learner_num = -1 %}
{% set init_model_paths = "[]" %}
{% set league_mgr_verbose = 0 %}

{# learners #}
{% set n_lrn_groups = 1 %}
{% set n_hosts_per_lrn_group = [1] %}
{% set n_gpus_per_host = 2 %}
{% set hvd_ssh_port = 9527 %}
{% set lrn_port_base = 30003 %}
{% set batch_size = 4096 %}
{% set lrn_rm_size = 32000 %}
{% set lrn_pub_interval = 200 %}
{% set lrn_log_interval = 100 %}
{% set lrn_burn_in_timesteps = 0 %}
{% set n_v = 11 %}
{% set lrn_rwd_shape = true %}
{% set lrn_tb_port = 9003 %}
{% set lrngrp_learner_config = [
  {'vf_coef': [10, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
   'max_grad_norm': 1.0,
   'distill_coef': 0.0,
   'ent_coef': [0.01, 0.01],
   'adam_beta1': 0.0,
   'adam_beta2': 0.99,
   'adam_eps': 0.00001,
   'ep_loss_coef': {}
  },
]%}
{% set lrngrp_total_timesteps = [
  200000000,
] %}

{# actors per learner #}
{% set n_actors_per_learner = 25 %}
{% set actor_update_model_freq = 3200 %}
{% set actor_rwd_shape = false %}
{% set actor_log_interval_steps = 51 %}
{% set actor_verbose = 11 %}
{% set actor_replay_dir = "/root/replays/" %}
{% set lrngrp_interface_config = [{}]%}
{% set lrngrp_env_config = [
  {'rotate': False,
   'centralV': False,
   'random_side': True,
  },
] %}

{# --- league manager --- #}
{% if true %}
---
kind: Service
apiVersion: v1
metadata:
  name: {{ session }}-league-mgr
  labels:
    session: {{ session }}
    job: league-mgr
    type: league-mgr
spec:
  selector:
    session: {{ session }}
    job: league-mgr
  ports:
    - port: {{ league_mgr_port }}
      name: port1
---
apiVersion: v1
kind: Pod
metadata:
  name: {{ session }}-league-mgr
  labels:
    session: {{ session }}
    type: league-mgr
    job: league-mgr
spec:
  nodeSelector:
    type: cpu
  restartPolicy: Never # if failure, let it die
  volumes:
    - name: data-dir
      persistentVolumeClaim:
        claimName: {{ pvc_name }}
  {% if docker_registry_credential %}
  imagePullSecrets:
    - name: {{ docker_registry_credential }}
  {% endif %}
  containers:
    - name: {{ session }}-league-mgr-container
      image: {{ image }}
      ports:
        - containerPort: {{ league_mgr_port }}
      {% if require_resources %}
      resources:
        limits:
          nvidia.com/gpu: 0
        requests:
          nvidia.com/gpu: 0
          cpu: 3
          memory: 6Gi
      {% endif %}
      volumeMounts:
        - mountPath: {{ league_mgr_chkpoints_dir }}
          name: data-dir
          subPath: {{ chkpoints_zoo_pvc_sub_dir }}
      command:
        - "python"
      args:
        - "-m"
        - "tleague.bin.run_league_mgr"
        {% set sep = joiner(',') %}
        - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}"
        - "--port={{ league_mgr_port }}"
        - "--game_mgr_type={{ game_mgr_type }}"
        - "--game_mgr_config={{ game_mgr_config }}"
        - "--mutable_hyperparam_type={{ mutable_hyperparam_type }}"
        - "--hyperparam_config_name={{ hyperparam_config_name }}"
        - "--restore_checkpoint_dir={{ league_mgr_restore_checkpoint_dir }}"
        - "--save_checkpoint_root={{ league_mgr_save_checkpoint_root }}"
        - "--save_interval_secs={{ league_mgr_save_interval_secs }}"
        - "--{% if mute_actor_msg %}mute_actor_msg{% else %}nomute_actor_msg{% endif %}"
        - "--verbose={{ league_mgr_verbose }}"
        - "--pseudo_learner_num={{ pseudo_learner_num }}"
        - "--init_model_paths={{ init_model_paths }}"
{% endif %}

{# --- model pools --- #}
{% if true %}
{% for i in range(n_model_pools) %}
---
kind: Service
apiVersion: v1
metadata:
  name: {{ session }}-mp{{ i }}
  labels:
    session: {{ session }}
    job: model-pool-{{ i }}
    type: model-pool
spec:
  selector:
    session: {{ session }}
    job: model-pool-{{ i }}
  ports:
    - port: {{ model_pool_port1 }}
      name: port1
    - port: {{ model_pool_port2 }}
      name: port2
---
apiVersion: v1
kind: Pod
metadata:
  name: {{ session }}-mp{{ i }}
  labels:
    session: {{ session }}
    job: model-pool-{{ i }}
    type: model-pool
spec:
  nodeSelector:
    type: cpu
  {% if docker_registry_credential %}
  imagePullSecrets:
    - name: {{ docker_registry_credential }}
  {% endif %}
  restartPolicy: Never # if failure, let it die
  containers:
    - name: {{ session }}-model-pool-container
      image: {{ image }}
      ports:
        - containerPort: {{ model_pool_port1 }}
        - containerPort: {{ model_pool_port2 }}
      {% if require_resources %}
      resources:
        limits:
          nvidia.com/gpu: 0
        requests:
          nvidia.com/gpu: 0
          cpu: 7
          memory: 14Gi
      {% endif %}
      command:
        - "python"
      args:
        - "-m"
        - "tleague.bin.run_model_pool"
        - "--ports={{ model_pool_port1 }}:{{ model_pool_port2 }}"
        - "--verbose={{ model_pool_verbose }}"
{% endfor %}
{% endif %}

{# --- learners and actors per learner --- #}
{% if true %}
{% for i in range(n_lrn_groups) %}
{% for j in range(n_hosts_per_lrn_group[i] - 1, -1, -1) %}
{# --- each host corresponds to a service owning a DNS name #}
---
kind: Service
apiVersion: v1
metadata:
  name: {{ session }}-lg{{ i }}-h{{ j }}
  labels:
    session: {{ session }}
    type: learner
spec:
  selector:
    session: {{ session }}
    type: learner
    group: group-{{ i }}
    host: host-{{ j }}
  ports:
    - port: {{ hvd_ssh_port }}
      name: port-ssh
    {% for k in range(n_gpus_per_host) %}
    - port: {{ lrn_port_base + 2*k}}
      name: port{{ 2*k }}
    - port: {{ lrn_port_base + 2*k + 1 }}
      name: port{{ 2*k + 1 }}
    {% endfor %}
    {% if lrn_tb_port %}
    - port: {{ lrn_tb_port }}
      name: port-tb
    {% endif %}
---
apiVersion: v1
kind: Pod
metadata:
  name: {{ session }}-lg{{ i }}-h{{ j }}
  labels:
    session: {{ session }}
    type: learner
    group: group-{{ i }}
    host: host-{{ j }}
spec:
  nodeSelector:
    type: gpu
  restartPolicy: Never # if failure, let it die
  volumes:
    - name: training-log-dir
      emptyDir: {}
  {% if docker_registry_credential %}
  imagePullSecrets:
    - name: {{ docker_registry_credential }}
  {% endif %}
  containers:
    - name: {{ session }}-lg{{ i }}-h{{ j }}-container
      image: {{ learner_image }}
      ports:
        - containerPort: {{ hvd_ssh_port }}
        {% for k in range(n_gpus_per_host) %}
        - containerPort: {{ lrn_port_base + 2*k }}
        - containerPort: {{ lrn_port_base + 2*k + 1}}
        {% endfor %}
        {% if lrn_tb_port %}
        - containerPort: {{ lrn_tb_port }}
        {% endif %}
      {% if require_resources %}
      resources:
        limits:
          nvidia.com/gpu: {{ n_gpus_per_host }}
        requests:
          nvidia.com/gpu: {{ n_gpus_per_host }}
          cpu: 7
          memory: 14Gi
      {% endif %}
      env:
        - name: NONCCL_DEBUG
          value: "INFO"
      {% if j == 0 %}
      {# --- run the mpirun/horovodrun command --- #}
      volumeMounts:
        - name: training-log-dir
          mountPath: /root/work/training_log
      command:
        - "tleague_horovodrun"
      args:
        - "--verbose"
        - "--exclude-env-vars-pattern"
        - "TR|IM|EVBACK|EVFRONT"
        - "--start-timeout"
        - "1800"
        - "-p"
        - "{{ hvd_ssh_port }}"
        - "-np"
        - "{{ n_hosts_per_lrn_group[i] * n_gpus_per_host }}"
        - "-H"
        {% set sep = joiner(',') %}
        - "{% for jj in range(n_hosts_per_lrn_group[i]) %}{{ sep() }}{{ session }}-lg{{ i }}-h{{ jj }}:{{ n_gpus_per_host }}{% endfor %}"
        - "python"
        - "-m"
        - "tleague.bin.run_pg_learner"
        - "--type=PPO"
        - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}"
        {% set sep = joiner(',') %}
        - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}"
        {% for ind_host in range(n_hosts_per_lrn_group[i]) %}
        {% set sep = joiner(',') %}
        - "--learner_spec={% for gpu_id in range(n_gpus_per_host) %}{{ sep() }}{{ gpu_id }}:{{ lrn_port_base + 2*gpu_id }}:{{ lrn_port_base + 2*gpu_id + 1 }}{% endfor %}"
        {% endfor %}
        - "--learner_id=lrngrp{{ i }}"
        - "--unroll_length={{ unroll_length }}"
        - "--rollout_length={{ rollout_length }}"
        - "--batch_size={{ batch_size }}"
        - "--rm_size={{ lrn_rm_size }}"
        - "--pub_interval={{ lrn_pub_interval }}"
        - "--log_interval={{ lrn_log_interval }}"
        - "--total_timesteps={{ lrngrp_total_timesteps[i] }}"
        - "--burn_in_timesteps={{ lrn_burn_in_timesteps }}"
        - "--env={{ env }}"
        - "--env_config={{ lrngrp_env_config[i] }}"
        - "--interface_config={{ lrngrp_interface_config[i] }}"
        - "--policy={{ policy }}"
        - "--policy_config={{ policy_config }}"
        - "--{% if lrn_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}"
        - "--batch_worker_num={{ 2 }}"
        - "--learner_config={{ lrngrp_learner_config[i] }}"
        - "--data_server_version=v2"
        - "--decode"
        - "--log_infos_interval=1000"
      {% else %}
      {# --- start an ssh daemon and run an arbitrary command that occupies the container --- #}
      command:
        - "bash"
        - "-c"
      args:
        - "/usr/sbin/sshd -p {{ hvd_ssh_port }}; sleep {{ 3600 * 24 * 7 * 52 * 3}}"
      {% endif %}
      {% if j==0 and lrn_tb_port %}
      {# --- start tensorboard when applicable --- #}
    - name: {{ session }}-tb-lrngrp{{ i }}rank0-container
      image: {{ learner_image }}
      ports:
        - containerPort: {{ lrn_tb_port }}
      volumeMounts:
        - name: training-log-dir
          mountPath: /root/training_log
      env:
        - name: CUDA_VISIBLE_DEVICES
          value: ""
      command:
        - "tensorboard"
      args:
        - "--logdir=/root/training_log/lrngrp{{ i }}rank0"
        - "--port={{ lrn_tb_port }}"
      {% endif %} {# --- endif j == 0 --- #}

{% if true %}
{% for k in range(n_gpus_per_host) %}
{# --- the actors correspond to group i host j localrank k --- #}
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}
  labels:
    session: {{ session }}
    type: actor
spec:
  replicas: {{ n_actors_per_learner }}
  selector:
    matchLabels:
      session: {{ session }}
      type: actor
      group: group-{{ i }}
      host: host-{{ j }}
      localrank: localrank-{{ k }}
  template:
    metadata:
      labels:
        session: {{ session }}
        type: actor
        group: group-{{ i }}
        host: host-{{ j }}
        localrank: localrank-{{ k }}
    spec:
      nodeSelector:
        type: cpu
      volumes:
        - name: data-dir
          persistentVolumeClaim:
            claimName: {{ pvc_name }}
      {% if docker_registry_credential != "" %}
      imagePullSecrets:
        - name: {{ docker_registry_credential }}
      {% endif %}
      containers:
        - name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}-container
          image: {{ image }}
          imagePullPolicy: IfNotPresent
          {% if require_resources %}
          resources:
            limits:
              nvidia.com/gpu: 0
            requests:
              nvidia.com/gpu: 0
              cpu: 1700m
              memory: 3.4Gi
          {% endif %}
          command:
            - "python"
          args:
            - "-m"
            - "tleague.bin.run_pg_actor"
            - "--type=PPO"
            - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}"
            {% set sep = joiner(',') %}
            - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}"
            - "--learner_addr={{ session }}-lg{{ i }}-h{{ j }}:{{ lrn_port_base + 2*k }}:{{ lrn_port_base + 2*k + 1 }}"
            - "--unroll_length={{ unroll_length }}"
            - "--update_model_freq={{ actor_update_model_freq }}"
            - "--env={{ env }}"
            - "--env_config={{ lrngrp_env_config[i] }}"
            - "--policy={{ policy }}"
            - "--policy_config={{ self_policy_config }}"
            - "--verbose={{ actor_verbose }}"
            - "--log_interval_steps={{ actor_log_interval_steps }}"
            - "--n_v={{ n_v }}"
            - "--{% if actor_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}"
            - "--interface_config={{ lrngrp_interface_config[i] }}"
            - "--replay_dir={{ actor_replay_dir }}"
            {% if use_infserver %}
            {% set sep = joiner(',') %}
            - "--self_infserver_addr={% for m in range(parallel_infserver_num) %}{{ sep() }}{{ session }}-infserver-lg{{ i }}-h{{ j }}:{{ lrn_port_base - 1 - m }}{% endfor %}"
            {% endif %}
{% endfor %} {# --- endfor k --- #}
{% endif %}

{# --- infserver --- #}
{% if use_infserver %}
---
kind: Service
apiVersion: v1
metadata:
  name: {{ session }}-infserver-lg{{i}}-h{{ j }}
  labels:
    session: {{ session }}
    type: infserver
    group: group-{{ i }}
    host: host-{{ j }}
spec:
  selector:
    session: {{ session }}
    type: infserver
    group: group-{{ i }}
    host: host-{{ j }}
  ports:
    {% for m in range(parallel_infserver_num) %}
    - port: {{ lrn_port_base - 1 - m}}
      name: port-inf-{{ m }}
    {% endfor %}
---
apiVersion: v1
kind: Pod
metadata:
  name: {{ session }}-infserver-lg{{ i }}-h{{ j }}
  labels:
    session: {{ session }}
    type: infserver
    group: group-{{ i }}
    host: host-{{ j }}
spec:
  nodeSelector:
    type: gpu
  restartPolicy: Never # if failure, let it die
  volumes:
    - name: data-dir
      persistentVolumeClaim:
        claimName: {{ pvc_name }}
  {% if docker_registry_credential %}
  imagePullSecrets:
    - name: {{ docker_registry_credential }}
  {% endif %}
  containers:
    - name: {{ session }}-infserver-lg{{ i }}-h{{ j }}-container
      image: {{ learner_image }}
      ports:
        {% for m in range(parallel_infserver_num) %}
        - containerPort: {{ lrn_port_base - 1 - m }}
        {% endfor %}
      {% if require_resources %}
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1
          cpu: 6
          memory: 10Gi
      {% endif %}
      env:
        - name: NCCL_DEBUG
          value: "INFO"
      command:
        - "python3"
      args:
        - "-m"
        - "tleague.bin.run_inference_server"
        - "--nohvd_run"
        - "--env={{ env }}"
        - "--env_config={{ lrngrp_env_config[i] }}"
        - "--interface_config={{ lrngrp_interface_config[i] }}"
        - "--is_rl"
        - "--policy={{ policy }}"
        - "--port={{ lrn_port_base - 1 }}"
        - "--policy_config={{ self_policy_config }}"
        - "--infserver_config={{ self_infserver_config }}"
        - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}"
        - "--learner_id=lrngrp{{ i }}"
        {% set sep = joiner(',') %}
        - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}"
        - "--batch_worker_num={{ infserver_batch_worker_num }}"
{% endif %}
{% endfor %} {# --- endfor j --- #}
{% endfor %} {# --- endfor i --- #}
{% endif %} {# --- endif true/false --- #}


================================================
FILE: tstarbotx/README.md
================================================
# TStarBot-X Project Page

This is the project page for the StarCraft II AI TStarBot-X, discussed in the following technical report:

Lei Han∗, Jiechao Xiong∗, Peng Sun∗, Xinghai Sun, Meng Fang, Qingwei Guo, Qiaobo Chen, Tengfei Shi, Hongsheng Yu, Zhengyou Zhang.
TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game.
[arXiv preprint arXiv:2011.13729](https://arxiv.org/abs/2011.13729), 2020.
(* Equal contribution, correspondence to the first three authors)

## Quick Start

* If you are an SC2 player who wants to experience playing against TStarBot-X, see our guide to the [human-machine test](gm_test.md).
* If you are a researcher/developer interested in the training, see todo.
* For downloading the resources (replays, NN models, zstat ...), see the [section below](#downloads).

## Downloads

### TStarBot-X Replay Files

Here are the replay files of the human-machine tests discussed in the TStarBot-X technical report: [Google Drive](https://drive.google.com/file/d/1U6vMdsjfQWJE9DMGNs-OlLqngOwdwbsf/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/zNuIVoxh).

### Maps

When running the human-machine test or opening the replays, you need the map file for `KairosJunction`. See [here](https://github.com/deepmind/pysc2#get-the-maps) and [here](https://github.com/Blizzard/s2client-proto#map-packs) for how to download the maps and where to place them.
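To check that the map ended up where the game binary can see it, a small lookup helps. This sketch assumes the conventions documented by pysc2: the StarCraft II root is taken from the `SC2PATH` environment variable if set, otherwise from a per-platform default, and maps live under `<root>/Maps`, possibly in map-pack subfolders. The exact file name inside the pack (e.g., a `LE` suffix) is not stated here, so the helper matches by prefix. `sc2_root` and `find_map` are hypothetical names.

```python
import os
import sys
from pathlib import Path

# Default install locations as documented by pysc2; SC2PATH overrides them.
DEFAULT_SC2_ROOTS = {
    "linux": "~/StarCraftII",
    "darwin": "/Applications/StarCraft II",
    "win32": "C:/Program Files (x86)/StarCraft II",
}


def sc2_root(platform=None, env=None):
    """Return the StarCraft II root directory, honoring the SC2PATH variable."""
    env = os.environ if env is None else env
    if env.get("SC2PATH"):
        return Path(env["SC2PATH"]).expanduser()
    platform = sys.platform if platform is None else platform
    return Path(DEFAULT_SC2_ROOTS[platform]).expanduser()


def find_map(root, name="KairosJunction"):
    """List *.SC2Map files under <root>/Maps whose file name starts with `name`."""
    maps_dir = Path(root) / "Maps"
    if not maps_dir.is_dir():
        return []
    # Map packs unpack into subfolders, so search recursively.
    return sorted(p for p in maps_dir.rglob("*.SC2Map") if p.name.startswith(name))


if __name__ == "__main__":
    root = sc2_root()
    print(root, find_map(root) or "KairosJunction not found")
```

Run it on both machines: machine A needs the map for the game UI, and machine B needs it for the Linux SC2 binary.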
For your convenience, we provide the `KairosJunction` file here: [Google Drive](https://drive.google.com/file/d/1O_L4E91b3sAUunrDxGV-a_uj7bTuwQ8H/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/de2LCco8).

### Data

For NN models, zstat files, etc., please see the [link here](gm_test.md#downloads).


================================================
FILE: tstarbotx/gm_test.md
================================================
# Human-Machine Test

We provide guidelines for how a human can play against TStarBot-X. Please follow the instructions on this page, which is self-contained (you don't need to care about the training code when running the human-machine test).

**Note: currently it only plays Zerg-vs-Zerg on the map "KairosJunction". SC2 version 4.10.0 is required.**

## Prerequisites and Terminology

You need TWO machines for the human-machine test.

**Machine A**: the machine used by the human player. Windows or macOS, with the SC2 game installed. The version 4.10.0 game core is required; you can get it by opening any 4.10.0 replay file (for example, [here](./README.md#tstarbot-x-replay-files)), which triggers the automatic download when necessary. We've tested on macOS, where the SC2 game is downloaded from https://starcraft2.com/.

**Machine B**: the machine where TStarBot-X is deployed. Ubuntu is required. Using a GPU is recommended (otherwise the NN forward pass can be slow, causing high latency and degraded performance). TStarBot-X has been tested on a laptop with a GTX 1650 and Ubuntu 18.04.

## Installs

Here are the step-by-step instructions.

* Make sure that machine A can connect to machine B via passwordless ssh.
  - One can google for how to do the passwordless ssh setup, e.g., [here](https://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/). In summary, one generates the public and private keys on machine A, then copies the public key (e.g., using `ssh-copy-id`, or `scp` for remote copying) into `~/.ssh/authorized_keys` on machine B. The private key stays on machine A. Make sure the `sshd` service has started on machine B.
  - Verify the passwordless setup by connecting to B from A and checking that it succeeds without a password prompt.
* On machine B, install the two packages `DistLadder3` and `pysc2 (Tencent Extension)`. See the [section below](#downloads) for downloading. Just cd to the corresponding folder and type `pip install -e .` to complete the installation. The Linux SC2 binary version 4.10.0 is required; see the links [here](https://github.com/deepmind/pysc2#linux) and [here](https://github.com/Blizzard/s2client-proto#downloads). Then install the main package `SC2AgentsZoo2` (see the [section below](#downloads) for downloading) for TStarBot-X as follows:
  - To let TStarBot-X use the GPU, you need to install `cuda` and `cudnn`.
  - Have `virtualenv` installed. You can do `pip install virtualenv`.
  - Have the ssh server started. You can do `apt-get install openssh-server`.
  - cd to the folder `SC2AgentsZoo2/agent_TLeagueFormal14` and run the command `bash install_virtualenv3.sh` to complete the installation.
* On machine A, install `DistLadder3` and `pysc2 (Tencent Extension)` as described above. Also install the commercial SC2 game on this machine.

## Usage

Here is how to start the human-machine test. On machine A, run the command:

```
python3 -m distladder3.bin.play_vs_remote_agent \
  --human \
  --port 6789 \
  --remote \
  --replay_dir /Users/your-name/Desktop \
  --map KairosJunction \
  --game_version 4.10.0 \
  --replay_name xxx-A-machine.SC2Replay
```

which starts a game UI in which a human can play SC2 with mouse and keyboard. For the `--remote` arg, if machines A and B happen to have the same user name, you can also omit the user name and simply write `--remote `. Examples: `--remote sc2tester@xx.xx.xx.xx` or `--remote xx.xx.xx.xx`.
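Because `--remote` relies on the passwordless ssh setup, it is worth checking that setup from machine A before launching the game. The sketch below is an illustration, not part of DistLadder3. `parse_remote`, `ssh_check_command` and `passwordless_ssh_ok` are hypothetical helpers. The `[user@]host` rule, with the user defaulting to the local name, is inferred from the examples above. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH client options.

```python
import getpass
import subprocess


def parse_remote(remote, local_user=None):
    """Split a '[user@]host' value into (user, host).

    When the user part is omitted, fall back to the local user name, matching
    the rule that machines A and B may share the same user name.
    """
    user, sep, host = remote.rpartition("@")
    if not sep:
        user = local_user if local_user is not None else getpass.getuser()
    if not user or not host:
        raise ValueError(f"expected [user@]host, got {remote!r}")
    return user, host


def ssh_check_command(remote, timeout_s=5):
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a zero exit code means key-based (passwordless) login works.
    return ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout_s}", remote, "true"]


def passwordless_ssh_ok(remote):
    """Run the check against machine B; requires the ssh client on machine A."""
    return subprocess.run(ssh_check_command(remote)).returncode == 0
```

For example, `passwordless_ssh_ok("sc2tester@xx.xx.xx.xx")` returning `True` (with the placeholder replaced by B's address) indicates the key setup from the Installs section is working.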
On machine B, run the command:

```
python3 -m distladder3.bin.play_vs_remote_agent \
  --port 6789 \
  --replay_dir /Users/your-name/Desktop \
  --map KairosJunction \
  --game_version 4.10.0 \
  --replay_name xxx-B-machine.SC2Replay \
  --player_name_path_config "TStarBot-X,/your/path/to/SC2AgentsZoo2/agent_TLeagueFormal14/,/your/path/to/the/config/file.ini"
```

to start the AI. The arg `--player_name_path_config` specifies an agent by its name (`TStarBot-X`), path (`/your/path/to/SC2AgentsZoo2/agent_TLeagueFormal14/`), and config (`/your/path/to/the/config/file.ini`) as comma-separated values. The `*.ini` config file specifies more detailed args for the agent, e.g., whether to use a GPU, the NN model path, the zstat path (a folder), the zstat category, the probability of zeroing the zstat, etc., as shown in the snippet below:

```ini
[config]
use_gpu_id=-1 ;-1 for not using GPU, 0 means using GPU #0, 1 for GPU #1, etc.
...
chkpoints_root_dir=/Users/usr-name/tstarbotx/data
model_filename=TStarBot-X-33days.model
...
zstat_zeroing_prob=0.0
zstat_category=Normal174
...
tleague_interface_config.zstat_data_src=/Users/usr-name/tstarbotx/data/rp2124-mv8-victory-selected-Misc
...
```

We've prepared several agent config `.ini` files, as well as the corresponding NN models and zstat files (in a folder); see the [section below](#downloads).
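Before pointing machine B at a downloaded `.ini`, you may want to confirm that its paths are what you expect. The following sketch reads the keys shown in the snippet above with Python's `configparser`. It is not how DistLadder3 or SC2AgentsZoo2 necessarily load the file. Three assumptions apply: `;` starts an inline comment, the model file sits directly under `chkpoints_root_dir`, and `load_agent_config` is a hypothetical name. The `...` lines of the snippet are omitted because they are elisions, not config syntax.

```python
import configparser
import os

SAMPLE_INI = """
[config]
use_gpu_id=-1 ;-1 for not using GPU, 0 means using GPU #0, 1 for GPU #1, etc.
chkpoints_root_dir=/Users/usr-name/tstarbotx/data
model_filename=TStarBot-X-33days.model
zstat_zeroing_prob=0.0
zstat_category=Normal174
tleague_interface_config.zstat_data_src=/Users/usr-name/tstarbotx/data/rp2124-mv8-victory-selected-Misc
"""


def load_agent_config(text):
    # inline_comment_prefixes strips the trailing ";..." remark on use_gpu_id
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    cfg = parser["config"]
    return {
        "use_gpu_id": cfg.getint("use_gpu_id"),  # -1 means CPU only
        # Assumption: the model file lives directly under chkpoints_root_dir.
        "model_path": os.path.join(cfg["chkpoints_root_dir"], cfg["model_filename"]),
        "zstat_zeroing_prob": cfg.getfloat("zstat_zeroing_prob"),
        "zstat_category": cfg["zstat_category"],
        "zstat_data_src": cfg["tleague_interface_config.zstat_data_src"],
    }


if __name__ == "__main__":
    for key, value in load_agent_config(SAMPLE_INI).items():
        print(f"{key}: {value}")
```

Replacing `SAMPLE_INI` with the contents of a downloaded `.ini` and checking that `model_path` and `zstat_data_src` exist on machine B catches most path mistakes before the game starts.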
## Downloads

### Packages

* `DistLadder3` [Google Drive](https://drive.google.com/file/d/1ufCtU2JIyoSiSMwN4lqT66oxitmZeArh/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/QFgOzG4n)
* `pysc2 (Tencent Extension)` [Google Drive](https://drive.google.com/file/d/1rJnmK1aNIFaYuYkXXmkDvTe-JRKrzFhe/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/mCCEZtOX)
* `SC2AgentsZoo2` [Google Drive](https://drive.google.com/file/d/1neXug1fn3miHnKu9Z8tBpMC-ZKfIzXAP/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/NKiLym42)

### Data Files

* zstat files (`rp2124-mv8-victory-selected-Misc`, a folder) used by the 8-, 25- and 33-day models [Google Drive](https://drive.google.com/file/d/1pV8wD_AXbbESQL2L4LticKTTwiaQpLCf/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/ZXeYGjZp)
* Main Agent 8 days:
  - `ini` config [Google Drive](https://drive.google.com/file/d/1Ed80rcYaafVRGlQJ7hCcsge1snCx8oLQ/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/GJ1Bwfie)
  - NN model [Google Drive](https://drive.google.com/file/d/1mJ9s3dpScgKbYj3vZusnC0fJ1IPQuMrc/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/spHoQIg3)
* Main Agent 25 days:
  - `ini` config [Google Drive](https://drive.google.com/file/d/1JrfERGRQrVaVPOU8AFjn_B9jhy5eeMl1/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/bNGLU4Zj)
  - NN model [Google Drive](https://drive.google.com/file/d/1BcQERcIGZvulCd5M4gCej80lJmdIYcLh/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/24QfxkMZ)
* Main Agent 33 days:
  - `ini` config [Google Drive](https://drive.google.com/file/d/1AohBDH4C4Y86usNbEDrVhq1g2ZUEJfvp/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/ue3zQ7RG)
  - NN model [Google Drive](https://drive.google.com/file/d/1M6m-vGGGYNI-KHETq8t8gBKi_luuPLyD/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/yuh9ZDSe)


================================================
FILE: vizdoom/README.md
================================================
# ViZDoom Experiments

This page contains the resources for the ViZDoom experiments discussed in the TLeague technical report.

## Trained Model

A trained model for the experiments discussed in the technical report (CIG 2016 Track 1) is shipped with the evaluation code; see the [section below](#install-myplayer).

## Evaluation

The evaluation code can be found here: [Google Drive](https://drive.google.com/file/d/1soi_nHglpSazRv2znZzbqO3S6GcFJoPL/view?usp=sharing) or [Tencent Weiyun](https://share.weiyun.com/YyN0IqXS).
It is a modification of the original evaluation code at https://github.com/mihahauke/VDAIC2017.
Our modification adds a synchronous mode for the host, "F1" and "MyPlayer", as discussed in the technical report and summarized below.

The root config file `_vizdoom.cfg` overrides the private configuration of every agent, so we've commented out the `ASYNC_PLAYER` setting in `_vizdoom.cfg`:

```buildoutcfg
doom_scenario_path = cig2017.wad
# window_visible = False
# mode = ASYNC_PLAYER
game_args += -join localhost
```

For "F1", we add extra arguments to `f1/my_glorious_agent.py`:

```
-c F1_COLOR, --color F1_COLOR
                      0 - green, 1 - gray, 2 - brown, 3 - red, 4 - light gray,
                      5 - light brown, 6 - light red, 7 - light blue
                      (default: 0)
-w, --watch           window visible (default: False)
-mode MODE, --mode MODE
                      1 for PLAYER, 2 for ASYNC_PLAYER (default: 1)
```

and make "F1" default to the green color and the synchronous player mode.

For the host, we add the argument

```
-mode MODE, --mode MODE
                      1 for PLAYER, 2 for ASYNC_PLAYER, 3 for SPECTATOR,
                      4 for ASYNC_SPECTATOR (default: 1)
```

to allow it to run synchronously (mode `3`, i.e., `SPECTATOR`).

### Installation

The official code requires each submitted agent to be packaged as a Docker image (see the descriptions at https://github.com/mihahauke/VDAIC2017).
In our modified code, we do a plain `pip install` for "MyPlayer" and a Docker build for any other third-party agent (e.g., "F1"), as explained below.
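The flags listed above could be declared with Python's `argparse` roughly as follows. This is a sketch, not the code in `f1/my_glorious_agent.py` or in the host script: the builder functions are hypothetical, and mode numbers are mapped to ViZDoom mode *names* as plain strings so that the example runs without ViZDoom installed. With ViZDoom available, one could look up the actual enum value, e.g. `getattr(vizdoom.Mode, name)`.

```python
import argparse

# Index i is the color id accepted by F1's --color flag.
F1_COLORS = ["green", "gray", "brown", "red",
             "light gray", "light brown", "light red", "light blue"]
# Mode numbers as listed in the help texts above.
MODES = {1: "PLAYER", 2: "ASYNC_PLAYER", 3: "SPECTATOR", 4: "ASYNC_SPECTATOR"}


def build_f1_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the F1 flags; defaults: green, synchronous."""
    parser = argparse.ArgumentParser(description="F1 agent (sketch)")
    parser.add_argument("-c", "--color", metavar="F1_COLOR", type=int, default=0,
                        choices=range(len(F1_COLORS)),
                        help="0 - green, ..., 7 - light blue (default: 0)")
    parser.add_argument("-w", "--watch", action="store_true",
                        help="window visible (default: False)")
    parser.add_argument("-mode", "--mode", type=int, default=1, choices=[1, 2],
                        help="1 for PLAYER, 2 for ASYNC_PLAYER (default: 1)")
    return parser


def build_host_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the host's -mode flag."""
    parser = argparse.ArgumentParser(description="host (sketch)")
    parser.add_argument("-mode", "--mode", type=int, default=1,
                        choices=sorted(MODES),
                        help="1 PLAYER, 2 ASYNC_PLAYER, 3 SPECTATOR, "
                             "4 ASYNC_SPECTATOR (default: 1)")
    return parser


def mode_name(mode: int) -> str:
    """Translate a mode number into the ViZDoom mode name."""
    return MODES[mode]


if __name__ == "__main__":
    args = build_host_parser().parse_args(["-mode", "3"])
    print(mode_name(args.mode))
```

Restricting `choices` lets `argparse` reject, for instance, a spectator mode passed to F1, so a misconfigured agent fails at startup rather than mid-match.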
#### Install MyPlayer

Install the three packages `Arena`, `TLeague` and `TPolicies`:

```
pip3 install -e MyPlayer/Arena
pip3 install -e MyPlayer/TLeague
pip3 install -e MyPlayer/TPolicies
```

Note that the MyPlayer evaluation code here is self-contained and relies on older versions of `TLeague`, `Arena` and `TPolicies` that differ somewhat from the `dev-open` branch.
You can avoid possible conflicts by using, for example, `virtualenv`.
An NN model is shipped with the code and placed in `MyPlayer/model/`.

#### Install F1

Build the image:

```bash
sudo chmod u+x build.sh
./build.sh f1
```

The corresponding NN model is included.

#### Install host

Build the image:

```bash
sudo chmod u+x build.sh
./build.sh host
```

### Run

To run the evaluation, start the `host`, `MyPlayer` and `F1` in separate terminals (`tmux` is recommended).
Also, make sure `run.sh` is executable:

```bash
sudo chmod u+x run.sh
```

For 1 MyPlayer and 7 built-in bots, run the following commands in separate terminals:

```bash
./run.sh host -mode 3 -b 7
bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation
```

where `-mode 3` means synchronous spectator and `-b 7` adds 7 built-in bots.

For 1 MyPlayer, 1 F1 and 6 built-in bots, run the following commands in separate terminals:

```bash
./run.sh host -mode 3 -b 6 -p 2
bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation
./run.sh f1
```

where `-p 2` means that 2 AI agents will join.
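The host flags in the examples above follow a simple accounting: `-p` counts the AI agents that join (MyPlayer plus F1), and `-b` fills the remaining player slots with built-in bots. The hypothetical helper below turns that rule into the list of commands to start, one per terminal. Two details are inferred from the examples rather than taken from `run.sh`: a match has 8 player slots, and the host expects a single AI agent when `-p` is omitted.

```python
HOST_CMD = "./run.sh host -mode 3"  # -mode 3: synchronous spectator
MYPLAYER_CMD = "bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation"
F1_CMD = "./run.sh f1"
N_SLOTS = 8  # assumption: 8 players per match, as in all examples


def launch_commands(n_myplayer: int, n_f1: int = 0, n_slots: int = N_SLOTS) -> list:
    """Return the host command followed by one command per AI agent (hypothetical)."""
    n_ai = n_myplayer + n_f1
    if not 1 <= n_ai <= n_slots:
        raise ValueError(f"need 1..{n_slots} AI agents, got {n_ai}")
    host = HOST_CMD
    n_bots = n_slots - n_ai  # built-in bots fill the remaining slots
    if n_bots:
        host += f" -b {n_bots}"
    if n_ai != 1:  # assumption: -p may be omitted for a single AI agent
        host += f" -p {n_ai}"
    return [host] + [MYPLAYER_CMD] * n_myplayer + [F1_CMD] * n_f1


if __name__ == "__main__":
    for cmd in launch_commands(1, 1):
        print(cmd)
```

Following the order of the examples, the host command comes first; each remaining string goes into its own terminal or `tmux` pane.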
For 4 MyPlayer and 4 F1, run the following commands in separate terminals:

```bash
./run.sh host -mode 3 -p 8
bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation
bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation
bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation
bash MyPlayer/TLeague/tleague/sandbox/example_evaluation_vd.sh evaluation
./run.sh f1
./run.sh f1
./run.sh f1
./run.sh f1
```

## Training Code

As described in the technical report, training proceeds in two stages: the first for navigation and the second for frags.
We provide the corresponding `.yml.jinja2` files here: [for navigation](vdtr-navi-open.yml.jinja2) and [for frag](vdtr-frag-open.yml.jinja2).
The frag template starts from a navigation checkpoint (see `init_model_paths` therein), so run the navigation stage first and point that path to your own checkpoint.
Run the training on a k8s cluster:

```bash
# start
python render_template.py vdtr-navi-open.yml.jinja2 | kubectl apply -f -
# stop
python render_template.py vdtr-navi-open.yml.jinja2 | kubectl delete -f -
```

```bash
# start
python render_template.py vdtr-frag-open.yml.jinja2 | kubectl apply -f -
# stop
python render_template.py vdtr-frag-open.yml.jinja2 | kubectl delete -f -
```

TODO: guidance on setting up the PVC (`pvc_name` in the templates).
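Before applying a rendered template, it can help to estimate what it will request from the cluster. The sketch below recomputes the totals from the template variables (`n_model_pools`, `n_lrn_groups`, `n_hosts_per_lrn_group`, `n_gpus_per_host`, `n_actors_per_learner`) and the per-pod `resources.requests` values hard-coded in both templates: league manager 36 CPUs / 64Gi, model pool 36 CPUs / 36Gi, learner host 48 CPUs / 150Gi plus `n_gpus_per_host` GPUs, actor 8 CPUs / 20Gi. The `footprint` helper is hypothetical, and it ignores the TensorBoard sidecar, which declares no resource requests.

```python
from collections import namedtuple

Spec = namedtuple("Spec", "n_model_pools n_lrn_groups n_hosts_per_lrn_group "
                          "n_gpus_per_host n_actors_per_learner")

# (cpu, memory in Gi) requested per pod, copied from the templates.
LEAGUE_MGR = (36, 64)
MODEL_POOL = (36, 36)
LEARNER_HOST = (48, 150)
ACTOR = (8, 20)


def footprint(spec: Spec) -> dict:
    """Total actors, GPUs, CPUs and memory requested by one session (hypothetical)."""
    n_hosts = spec.n_lrn_groups * spec.n_hosts_per_lrn_group
    # One actor Deployment per (group, host, local GPU rank),
    # each with n_actors_per_learner replicas.
    n_actors = n_hosts * spec.n_gpus_per_host * spec.n_actors_per_learner
    pods = [(LEAGUE_MGR, 1), (MODEL_POOL, spec.n_model_pools),
            (LEARNER_HOST, n_hosts), (ACTOR, n_actors)]
    return {
        "actors": n_actors,
        "gpus": n_hosts * spec.n_gpus_per_host,
        "cpu": sum(req[0] * count for req, count in pods),
        "memory_gi": sum(req[1] * count for req, count in pods),
    }


FRAG = Spec(5, 1, 2, 8, 8)    # values from vdtr-frag-open.yml.jinja2
NAVI = Spec(15, 1, 2, 8, 24)  # values from vdtr-navi-open.yml.jinja2

if __name__ == "__main__":
    print("frag:", footprint(FRAG))
    print("navi:", footprint(NAVI))
```

By this count, both stages request 16 GPUs, while the navigation stage asks for 3,744 CPU cores (mostly for its 384 actors) against 1,336 for the frag stage, so `n_actors_per_learner` is the first knob to lower on a smaller cluster.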
## Downloads TODO: link to the video clips for the evaluation ================================================ FILE: vizdoom/vdtr-frag-open.yml.jinja2 ================================================ {% set session = 'vdtr-frag-open' %} {% set image = "your-docker-registry:port/sc2ai/tleague-vd118:20201209171727" %} {% set learner_image = "your-docker-registry:port/sc2ai/tleague-gpu-hvd-vd118:20201209171727" %} {% set docker_registry_credential = "regsecret" %} {% set require_resources = true %} {% set pvc_name = "cephfs-pvc-test" %} {% set chkpoints_zoo_pvc_sub_dir = "chkpoints_zoo/" %} {% set chkpoints_pvc_sub_dir = chkpoints_zoo_pvc_sub_dir + session + "_chkpoints" %} {# common #} {% set env = "vizdoom_cig2017_track1" %} {% set env_config = { 'num_players' : 8, 'num_bots' : 0, 'train_mode' : 'frag' } %} {% set policy = "tpolicies.net_zoo.conv_lstm.conv_lstm" %} {% set policy_config = { 'use_xla': False, 'test': False, 'rl': True, 'use_loss_type': 'rl', 'use_value_head': True, 'n_v': 1, 'use_lstm': True, 'rollout_len': 1, 'nlstm': 128, 'hs_len': 256, 'lstm_dropout_rate': 0.2, 'lstm_cell_type':'lstm', 'lstm_layer_norm': True, 'weight_decay': 0.00002, 'sync_statistics': 'horovod' } %} {% set self_policy_config = { 'batch_size': 32, 'use_xla': False, 'test': True, 'use_loss_type': 'none', 'use_value_head': True, 'n_v': 1, 'use_lstm': True, 'rollout_len': 1, 'nlstm': 128, 'hs_len': 256, 'lstm_dropout_rate': 0.2, 'lstm_cell_type':'lstm', 'lstm_layer_norm': True, 'weight_decay': 0.00002, 'sync_statistics': 'horovod' } %} {% set unroll_length = 64 %} {% set rollout_length = 16 %} {# model pool#} {% set n_model_pools = 5 %} {% set model_pool_port1 = 10005 %} {% set model_pool_port2 = 10006 %} {% set model_pool_verbose = 0 %} {# league mgr #} {% set league_mgr_port = 10007 %} {% set game_mgr_type = "tleague.game_mgr.game_mgrs.SelfPlayGameMgr" %} {% set game_mgr_config = { 'max_n_players': 30} %} {% set mutable_hyperparam_type = "ConstantHyperparam" %} {% set 
hyperparam_config_name ={ 'learning_rate': 5e-5, 'lam': 0.95, 'gamma': 0.99, 'sigma': 10, 'reward_weights': [[0.05, 0.7, 0.5, 0.3, 0.4, 0, 0.03, 0.05, 0, 0.05]] } %} {% set league_mgr_chkpoints_dir = "/root/results/" %} {% set league_mgr_restore_checkpoint_dir = '' %} {# [[0.05, 0.7, 0.5, 0.3, 0.4, 0, 0.03, 0.05, 0, 0.05]] #} {% set league_mgr_save_checkpoint_root = league_mgr_chkpoints_dir + session + "_chkpoints" %} {% set league_mgr_save_interval_secs = 3600 %} {% set mute_actor_msg = False %} {% set pseudo_learner_num = -1 %} {# set init_model_paths = [['0000', league_mgr_chkpoints_dir + 'vdtr2308-navi-18-vd-sample_chkpoints' + '/0066:0067_20201109121749.model']] #} {% set init_model_paths = [['0000', league_mgr_chkpoints_dir + 'vdtr2410-navi-18-open_chkpoints' + '/0016:0017_20201211004705.model']] %} {% set league_mgr_verbose = 9 %} {# learners #} {% set n_lrn_groups = 1 %} {% set n_hosts_per_lrn_group = 2 %} {% set n_gpus_per_host = 8 %} {% set hvd_ssh_port = 9527 %} {% set lrn_port_base = 30003 %} {% set batch_size = 32 %} {% set lrn_rm_size = 20480 %} {% set lrn_pub_interval = 200 %} {% set lrn_log_interval = 50 %} {% set lrn_total_timesteps = 10000000 %} {% set lrn_burn_in_timesteps = 0 %} {% set n_v = 1 %} {% set lrn_rwd_shape = False %} {% set lrn_tb_port = 9003 %} {% set learner_config ={ 'vf_coef': 0.5, 'ent_coef': 0.00003125, 'distill_coef': 0.0, 'max_grad_norm': 0.5 } %} {# actors per learner #} {% set n_actors_per_learner = 8 %} {% set actor_distillation = False %} {% set actor_update_model_freq = 40 %} {% set actor_rwd_shape = True %} {% set actor_log_interval_steps = 51 %} {% set actor_verbose = 11 %} {% set actor_replay_dir = "/root/replays/" %} {% set interface_config = "" %} {# --- league manager --- #} {% if true %} --- kind: Service apiVersion: v1 metadata: name: {{ session }}-league-mgr labels: session: {{ session }} job: league-mgr type: league-mgr spec: selector: session: {{ session }} job: league-mgr ports: - port: {{ league_mgr_port }} 
name: port1 --- apiVersion: v1 kind: Pod metadata: name: {{ session }}-league-mgr labels: session: {{ session }} type: league-mgr job: league-mgr spec: nodeSelector: type: cpu restartPolicy: Never # if failure, let it die volumes: - name: data-dir persistentVolumeClaim: claimName: {{ pvc_name }} {% if docker_registry_credential %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} containers: - name: {{ session }}-league-mgr-container image: {{ image }} ports: - containerPort: {{ league_mgr_port }} {% if require_resources %} resources: limits: nvidia.com/gpu: 0 requests: nvidia.com/gpu: 0 cpu: 36 memory: 64Gi {% endif %} volumeMounts: - mountPath: {{ league_mgr_chkpoints_dir }} name: data-dir subPath: {{ chkpoints_zoo_pvc_sub_dir }} command: - "python3" args: - "-m" - "tleague.bin.run_league_mgr" {% set sep = joiner(',') %} - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}" - "--port={{ league_mgr_port }}" - "--game_mgr_type={{ game_mgr_type }}" - "--game_mgr_config={{game_mgr_config}}" - "--mutable_hyperparam_type={{ mutable_hyperparam_type }}" - "--hyperparam_config_name={{ hyperparam_config_name }}" - "--restore_checkpoint_dir={{ league_mgr_restore_checkpoint_dir }}" - "--save_checkpoint_root={{ league_mgr_save_checkpoint_root }}" - "--save_interval_secs={{ league_mgr_save_interval_secs }}" - "--{% if mute_actor_msg %}mute_actor_msg{% else %}nomute_actor_msg{% endif %}" - "--verbose={{ league_mgr_verbose }}" - "--init_model_paths={{ init_model_paths }}" - "--pseudo_learner_num={{ pseudo_learner_num }}" {% endif %} {# --- model pools --- #} {% if true %} {% for i in range(n_model_pools) %} --- kind: Service apiVersion: v1 metadata: name: {{ session }}-mp{{ i }} labels: session: {{ session }} job: model-pool-{{ i }} type: model-pool spec: selector: session: {{ session }} job: model-pool-{{ i }} ports: - port: {{ model_pool_port1 }} name: port1 - 
port: {{ model_pool_port2 }} name: port2 --- apiVersion: v1 kind: Pod metadata: name: {{ session }}-mp{{ i }} labels: session: {{ session }} job: model-pool-{{ i }} type: model-pool spec: nodeSelector: type: cpu {% if docker_registry_credential %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} restartPolicy: Never # if failure, let it die containers: - name: {{ session }}-model-pool-container image: {{ image }} ports: - containerPort: {{ model_pool_port1 }} - containerPort: {{ model_pool_port2 }} {% if require_resources %} resources: limits: nvidia.com/gpu: 0 requests: nvidia.com/gpu: 0 cpu: 36 memory: 36Gi {% endif %} command: - "python3" args: - "-m" - "tleague.bin.run_model_pool" - "--ports={{ model_pool_port1 }}:{{ model_pool_port2 }}" - "--verbose={{ model_pool_verbose }}" {% endfor %} {% endif %} {# --- learners and actors per learner --- #} {% if true %} {% for i in range(n_lrn_groups) %} {% for j in range(n_hosts_per_lrn_group - 1, -1, -1) %} {# --- each host corresponds to a service owning a DNS name #} --- kind: Service apiVersion: v1 metadata: name: {{ session }}-lg{{ i }}-h{{ j }} labels: session: {{ session }} type: learner spec: selector: session: {{ session }} type: learner group: group-{{ i }} host: host-{{ j }} ports: - port: {{ hvd_ssh_port }} name: port-ssh {% for k in range(n_gpus_per_host) %} - port: {{ lrn_port_base + 2*k}} name: port{{ 2*k }} - port: {{ lrn_port_base + 2*k + 1 }} name: port{{ 2*k + 1 }} {% endfor %} {% if lrn_tb_port %} - port: {{ lrn_tb_port }} name: port-tb {% endif %} --- apiVersion: v1 kind: Pod metadata: name: {{ session }}-lg{{ i }}-h{{ j }} labels: session: {{ session }} type: learner group: group-{{ i }} host: host-{{ j }} spec: nodeSelector: type: gpu restartPolicy: Never # if failure, let it die volumes: - name: training-log-dir emptyDir: {} {% if docker_registry_credential %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} containers: - name: {{ session }}-lg{{ i 
}}-h{{ j }}-container image: {{ learner_image }} ports: - containerPort: {{ hvd_ssh_port }} {% for k in range(n_gpus_per_host) %} - containerPort: {{ lrn_port_base + 2*k }} - containerPort: {{ lrn_port_base + 2*k + 1}} {% endfor %} {% if lrn_tb_port %} - containerPort: {{ lrn_tb_port }} {% endif %} {% if require_resources %} resources: limits: nvidia.com/gpu: {{ n_gpus_per_host }} requests: nvidia.com/gpu: {{ n_gpus_per_host }} cpu: 48 memory: 150Gi {% endif %} env: - name: NONCCL_DEBUG value: "INFO" {% if j == 0 %} {# --- run the mpirun/horovodrun command --- #} volumeMounts: - name: training-log-dir mountPath: /root/work/training_log command: - "horovodrun" args: - "--verbose" - "--start-timeout" - "1800" - "-p" - "{{ hvd_ssh_port }}" - "-np" - "{{ n_hosts_per_lrn_group * n_gpus_per_host }}" - "-H" {% set sep = joiner(',') %} - "{% for jj in range(n_hosts_per_lrn_group) %}{{ sep() }}{{ session }}-lg{{ i }}-h{{ jj }}:{{ n_gpus_per_host }}{% endfor %}" - "python" - "-m" - "tleague.bin.run_pg_learner" - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}" {% set sep = joiner(',') %} - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}" {% for ind_host in range(n_hosts_per_lrn_group) %} {% set sep = joiner(',') %} - "--learner_spec={% for gpu_id in range(n_gpus_per_host) %}{{ sep() }}{{ gpu_id }}:{{ lrn_port_base + 2*gpu_id }}:{{ lrn_port_base + 2*gpu_id + 1 }}{% endfor %}" {% endfor %} - "--learner_id=lrngrp{{ i }}" - "--unroll_length={{ unroll_length }}" - "--rollout_length={{ rollout_length }}" - "--batch_size={{ batch_size }}" - "--rm_size={{ lrn_rm_size }}" - "--pub_interval={{ lrn_pub_interval }}" - "--log_interval={{ lrn_log_interval }}" - "--total_timesteps={{ lrn_total_timesteps }}" - "--burn_in_timesteps={{ lrn_burn_in_timesteps }}" - "--env={{ env }}" - "--policy={{ policy }}" - "--policy_config={{ policy_config }}" - "--{% if lrn_rwd_shape 
%}rwd_shape{% else %}norwd_shape{% endif %}" - "--batch_worker_num={{ 4 }}" - "--learner_config={{ learner_config }}" - "--type=PPO" {% else %} {# --- start an ssh daemon and run an arbitrary command that occupies the container --- #} command: - "bash" - "-c" args: - "/usr/sbin/sshd -p {{ hvd_ssh_port }}; sleep {{ 3600 * 24 * 7 * 52 * 3}}" {% endif %} {% if j==0 and lrn_tb_port %} {# --- start tensorboard when applicable --- #} - name: {{ session }}-tb-lrngrp{{ i }}rank0-container image: {{ learner_image }} ports: - containerPort: {{ lrn_tb_port }} volumeMounts: - name: training-log-dir mountPath: /root/training_log env: - name: CUDA_VISIBLE_DEVICES value: "" command: - "tensorboard" args: - "--logdir=/root/training_log/lrngrp{{ i }}rank0" - "--port={{ lrn_tb_port }}" {% endif %} {# --- endif j == 0 --- #} {% for k in range(n_gpus_per_host) %} {# --- the actors correspond to group i host j localrank k--- #} --- kind: Deployment apiVersion: apps/v1 metadata: name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }} labels: session: {{ session }} type: actor spec: replicas: {{ n_actors_per_learner }} selector: matchLabels: session: {{ session }} type: actor group: group-{{ i }} host: host-{{ j }} localrank: localrank-{{ k }} template: metadata: labels: session: {{ session }} type: actor group: group-{{ i }} host: host-{{ j }} localrank: localrank-{{ k }} spec: nodeSelector: type: cpu volumes: - name: data-dir persistentVolumeClaim: claimName: {{ pvc_name }} {% if docker_registry_credential != "" %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} containers: - name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}-container image: {{ image }} imagePullPolicy: Always stdin: true {% if require_resources %} resources: limits: nvidia.com/gpu: 0 requests: nvidia.com/gpu: 0 cpu: 8 memory: 20Gi {% endif %} command: - "python3" args: - "-m" - "tleague.bin.run_pg_actor" - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port
}}" {% set sep = joiner(',') %} - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}" - "--learner_addr={{ session }}-lg{{ i }}-h{{ j }}:{{ lrn_port_base + 2*k }}:{{ lrn_port_base + 2*k + 1 }}" - "--unroll_length={{ unroll_length }}" - "--update_model_freq={{ actor_update_model_freq }}" - "--env={{ env }}" - "--env_config={{env_config}}" - "--interface_config={{interface_config}}" - "--policy={{ policy }}" - "--policy_config={{ self_policy_config }}" - "--verbose={{ actor_verbose }}" - "--log_interval_steps={{ actor_log_interval_steps }}" - "--n_v={{ n_v }}" - "--{% if actor_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}" - "--{% if actor_distillation %}distillation{% else %}nodistillation{% endif %}" - "--type=PPO" {% endfor %} {# --- endfor k --- #} {% endfor %} {# --- endfor j --- #} {% endfor %} {# --- endfor i --- #} {% endif %} {# --- endif true/false --- #} ================================================ FILE: vizdoom/vdtr-navi-open.yml.jinja2 ================================================ {% set session = 'vdtr-navi' %} {% set image = "your-docker-registry:port/sc2ai/tleague-vd118:20201209171727" %} {% set learner_image = "your-docker-registry:port/sc2ai/tleague-gpu-hvd-vd118:20201209171727" %} {% set docker_registry_credential = "regsecret" %} {% set require_resources = true %} {% set pvc_name = "cephfs-pvc-test" %} {% set chkpoints_zoo_pvc_sub_dir = "chkpoints_zoo/" %} {% set chkpoints_pvc_sub_dir = chkpoints_zoo_pvc_sub_dir + session + "_chkpoints" %} {# common #} {% set env = "vizdoom_cig2017_track1" %} {% set env_config = { 'num_players' : 8, 'num_bots' : 0, 'train_mode' : 'navi' } %} {% set policy = "tpolicies.net_zoo.conv_lstm.conv_lstm" %} {% set policy_config = { 'use_xla': False, 'test': False, 'rl': True, 'use_loss_type': 'rl', 'use_value_head': True, 'n_v': 1, 'use_lstm': True, 'rollout_len': 1, 'nlstm': 128, 'hs_len': 256, 
'lstm_dropout_rate': 0.2, 'lstm_cell_type':'lstm', 'lstm_layer_norm': True, 'weight_decay': 0.00002, 'sync_statistics': 'horovod' } %} {% set self_policy_config = { 'use_xla': False, 'test': True, 'use_loss_type': 'none', 'use_value_head': True, 'n_v': 1, 'use_lstm': True, 'rollout_len': 1, 'nlstm': 128, 'hs_len': 256, 'lstm_dropout_rate': 0.2, 'lstm_cell_type':'lstm', 'lstm_layer_norm': True, 'weight_decay': 0.00002, 'sync_statistics': 'horovod' } %} {% set unroll_length = 64 %} {% set rollout_length = 16 %} {# model pool#} {% set n_model_pools = 15 %} {% set model_pool_port1 = 10005 %} {% set model_pool_port2 = 10006 %} {% set model_pool_verbose = 0 %} {# league mgr #} {% set league_mgr_port = 10007 %} {% set game_mgr_type = "tleague.game_mgr.game_mgrs.SelfPlayGameMgr" %} {% set game_mgr_config = { 'max_n_players': 30} %} {% set mutable_hyperparam_type = "ConstantHyperparam" %} {% set hyperparam_config_name ={ 'learning_rate': 1e-4, 'lam': 0.95, 'gamma': 0.99, 'sigma': 10, 'reward_weights': [[1, 0, 0, 0, 0, 0, 0, 0, 0, 1]] } %} {% set league_mgr_chkpoints_dir = "/root/results/" %} {% set league_mgr_restore_checkpoint_dir = '' %} {# [[0.05, 0.7, 0.5, 0.3, 0.4, 0, 0.03, 0.05, 0, 0.05]] #} {% set league_mgr_save_checkpoint_root = league_mgr_chkpoints_dir + session + "_chkpoints" %} {% set league_mgr_save_interval_secs = 3600 %} {% set mute_actor_msg = False %} {% set pseudo_learner_num = -1 %} {# set init_model_paths = [['0000', league_mgr_chkpoints_dir + 'vdtr2308-navi-18-vd-sample_chkpoints' + '/0066:0067_20201109121749.model']] #} {% set init_model_paths = [] %} {% set league_mgr_verbose = 9 %} {# learners #} {% set n_lrn_groups = 1 %} {% set n_hosts_per_lrn_group = 2 %} {% set n_gpus_per_host = 8 %} {% set hvd_ssh_port = 9527 %} {% set lrn_port_base = 30003 %} {% set batch_size = 32 %} {% set lrn_rm_size = 20480 %} {% set lrn_pub_interval = 200 %} {% set lrn_log_interval = 50 %} {% set lrn_total_timesteps = 10000000 %} {% set lrn_burn_in_timesteps = 0 %} {% set 
n_v = 1 %} {% set lrn_rwd_shape = False %} {% set lrn_tb_port = 9003 %} {% set learner_config ={ 'vf_coef': 0.5, 'ent_coef': 0.00003125, 'distill_coef': 0.0, 'max_grad_norm': 0.5 } %} {# actors per learner #} {% set n_actors_per_learner = 24 %} {% set actor_distillation = False %} {% set actor_update_model_freq = 40 %} {% set actor_rwd_shape = True %} {% set actor_log_interval_steps = 51 %} {% set actor_verbose = 11 %} {% set actor_replay_dir = "/root/replays/" %} {% set interface_config = "" %} {# --- league manager --- #} {% if true %} --- kind: Service apiVersion: v1 metadata: name: {{ session }}-league-mgr labels: session: {{ session }} job: league-mgr type: league-mgr spec: selector: session: {{ session }} job: league-mgr ports: - port: {{ league_mgr_port }} name: port1 --- apiVersion: v1 kind: Pod metadata: name: {{ session }}-league-mgr labels: session: {{ session }} type: league-mgr job: league-mgr spec: nodeSelector: type: cpu restartPolicy: Never # if failure, let it die volumes: - name: data-dir persistentVolumeClaim: claimName: {{ pvc_name }} {% if docker_registry_credential %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} containers: - name: {{ session }}-league-mgr-container image: {{ image }} ports: - containerPort: {{ league_mgr_port }} {% if require_resources %} resources: limits: nvidia.com/gpu: 0 requests: nvidia.com/gpu: 0 cpu: 36 memory: 64Gi {% endif %} volumeMounts: - mountPath: {{ league_mgr_chkpoints_dir }} name: data-dir subPath: {{ chkpoints_zoo_pvc_sub_dir }} command: - "python3" args: - "-m" - "tleague.bin.run_league_mgr" {% set sep = joiner(',') %} - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}" - "--port={{ league_mgr_port }}" - "--game_mgr_type={{ game_mgr_type }}" - "--game_mgr_config={{game_mgr_config}}" - "--mutable_hyperparam_type={{ mutable_hyperparam_type }}" - "--hyperparam_config_name={{ 
hyperparam_config_name }}" - "--restore_checkpoint_dir={{ league_mgr_restore_checkpoint_dir }}" - "--save_checkpoint_root={{ league_mgr_save_checkpoint_root }}" - "--save_interval_secs={{ league_mgr_save_interval_secs }}" - "--{% if mute_actor_msg %}mute_actor_msg{% else %}nomute_actor_msg{% endif %}" - "--verbose={{ league_mgr_verbose }}" - "--init_model_paths={{ init_model_paths }}" - "--pseudo_learner_num={{ pseudo_learner_num }}" {% endif %} {# --- model pools --- #} {% if true %} {% for i in range(n_model_pools) %} --- kind: Service apiVersion: v1 metadata: name: {{ session }}-mp{{ i }} labels: session: {{ session }} job: model-pool-{{ i }} type: model-pool spec: selector: session: {{ session }} job: model-pool-{{ i }} ports: - port: {{ model_pool_port1 }} name: port1 - port: {{ model_pool_port2 }} name: port2 --- apiVersion: v1 kind: Pod metadata: name: {{ session }}-mp{{ i }} labels: session: {{ session }} job: model-pool-{{ i }} type: model-pool spec: nodeSelector: type: cpu {% if docker_registry_credential %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} restartPolicy: Never # if failure, let it die containers: - name: {{ session }}-model-pool-container image: {{ image }} ports: - containerPort: {{ model_pool_port1 }} - containerPort: {{ model_pool_port2 }} {% if require_resources %} resources: limits: nvidia.com/gpu: 0 requests: nvidia.com/gpu: 0 cpu: 36 memory: 36Gi {% endif %} command: - "python3" args: - "-m" - "tleague.bin.run_model_pool" - "--ports={{ model_pool_port1 }}:{{ model_pool_port2 }}" - "--verbose={{ model_pool_verbose }}" {% endfor %} {% endif %} {# --- learners and actors per learner --- #} {% if true %} {% for i in range(n_lrn_groups) %} {% for j in range(n_hosts_per_lrn_group - 1, -1, -1) %} {# --- each host corresponds to a service owning a DNS name #} --- kind: Service apiVersion: v1 metadata: name: {{ session }}-lg{{ i }}-h{{ j }} labels: session: {{ session }} type: learner spec: selector: session: {{ 
session }} type: learner group: group-{{ i }} host: host-{{ j }} ports: - port: {{ hvd_ssh_port }} name: port-ssh {% for k in range(n_gpus_per_host) %} - port: {{ lrn_port_base + 2*k}} name: port{{ 2*k }} - port: {{ lrn_port_base + 2*k + 1 }} name: port{{ 2*k + 1 }} {% endfor %} {% if lrn_tb_port %} - port: {{ lrn_tb_port }} name: port-tb {% endif %} --- apiVersion: v1 kind: Pod metadata: name: {{ session }}-lg{{ i }}-h{{ j }} labels: session: {{ session }} type: learner group: group-{{ i }} host: host-{{ j }} spec: nodeSelector: type: gpu restartPolicy: Never # if failure, let it die volumes: - name: training-log-dir emptyDir: {} {% if docker_registry_credential %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} containers: - name: {{ session }}-lg{{ i }}-h{{ j }}-container image: {{ learner_image }} ports: - containerPort: {{ hvd_ssh_port }} {% for k in range(n_gpus_per_host) %} - containerPort: {{ lrn_port_base + 2*k }} - containerPort: {{ lrn_port_base + 2*k + 1}} {% endfor %} {% if lrn_tb_port %} - containerPort: {{ lrn_tb_port }} {% endif %} {% if require_resources %} resources: limits: nvidia.com/gpu: {{ n_gpus_per_host }} requests: nvidia.com/gpu: {{ n_gpus_per_host }} cpu: 48 memory: 150Gi {% endif %} env: - name: NONCCL_DEBUG value: "INFO" {% if j == 0 %} {# --- run the mpirun/horovodrun command --- #} volumeMounts: - name: training-log-dir mountPath: /root/work/training_log command: - "horovodrun" args: - "--verbose" - "--start-timeout" - "1800" - "-p" - "{{ hvd_ssh_port }}" - "-np" - "{{ n_hosts_per_lrn_group * n_gpus_per_host }}" - "-H" {% set sep = joiner(',') %} - "{% for jj in range(n_hosts_per_lrn_group) %}{{ sep() }}{{ session }}-lg{{ i }}-h{{ jj }}:{{ n_gpus_per_host }}{% endfor %}" - "python" - "-m" - "tleague.bin.run_pg_learner" - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}" {% set sep = joiner(',') %} - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ 
model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}" {% for ind_host in range(n_hosts_per_lrn_group) %} {% set sep = joiner(',') %} - "--learner_spec={% for gpu_id in range(n_gpus_per_host) %}{{ sep() }}{{ gpu_id }}:{{ lrn_port_base + 2*gpu_id }}:{{ lrn_port_base + 2*gpu_id + 1 }}{% endfor %}" {% endfor %} - "--learner_id=lrngrp{{ i }}" - "--unroll_length={{ unroll_length }}" - "--rollout_length={{ rollout_length }}" - "--batch_size={{ batch_size }}" - "--rm_size={{ lrn_rm_size }}" - "--pub_interval={{ lrn_pub_interval }}" - "--log_interval={{ lrn_log_interval }}" - "--total_timesteps={{ lrn_total_timesteps }}" - "--burn_in_timesteps={{ lrn_burn_in_timesteps }}" - "--env={{ env }}" - "--policy={{ policy }}" - "--policy_config={{ policy_config }}" - "--{% if lrn_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}" - "--batch_worker_num={{ 4 }}" - "--learner_config={{ learner_config }}" - "--type=PPO" {% else %} {# --- start an ssh daemon and run an arbitrary command that occupies the container --- #} command: - "bash" - "-c" args: - "/usr/sbin/sshd -p {{ hvd_ssh_port }}; sleep {{ 3600 * 24 * 7 * 52 * 3}}" {% endif %} {% if j==0 and lrn_tb_port %} {# --- start tensorboard when applicable --- #} - name: {{ session }}-tb-lrngrp{{ i }}rank0-container image: {{ learner_image }} ports: - containerPort: {{ lrn_tb_port }} volumeMounts: - name: training-log-dir mountPath: /root/training_log env: - name: CUDA_VISIBLE_DEVICES value: "" command: - "tensorboard" args: - "--logdir=/root/training_log/lrngrp{{ i }}rank0" - "--port={{ lrn_tb_port }}" {% endif %} {# --- endif j == 0 --- #} {% for k in range(n_gpus_per_host) %} {# --- the actors correspond to group i host j localrank k--- #} --- kind: Deployment apiVersion: apps/v1 metadata: name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }} labels: session: {{ session }} type: actor spec: replicas: {{ n_actors_per_learner }} selector: matchLabels: session: {{ session }} type: actor group: group-{{ i }} host:
host-{{ j }} localrank: localrank-{{ k }} template: metadata: labels: session: {{ session }} type: actor group: group-{{ i }} host: host-{{ j }} localrank: localrank-{{ k }} spec: nodeSelector: type: cpu volumes: - name: data-dir persistentVolumeClaim: claimName: {{ pvc_name }} {% if docker_registry_credential != "" %} imagePullSecrets: - name: {{ docker_registry_credential }} {% endif %} containers: - name: {{ session }}-actor-lg{{ i }}-h{{ j }}-localrank{{ k }}-container image: {{ image }} imagePullPolicy: Always stdin: true {% if require_resources %} resources: limits: nvidia.com/gpu: 0 requests: nvidia.com/gpu: 0 cpu: 8 memory: 20Gi {% endif %} command: - "python3" args: - "-m" - "tleague.bin.run_pg_actor" - "--league_mgr_addr={{ session }}-league-mgr:{{ league_mgr_port }}" {% set sep = joiner(',') %} - "--model_pool_addrs={% for i in range(n_model_pools) %}{{ sep() }}{{ session }}-mp{{ i }}:{{ model_pool_port1 }}:{{ model_pool_port2 }}{% endfor %}" - "--learner_addr={{ session }}-lg{{ i }}-h{{ j }}:{{ lrn_port_base + 2*k }}:{{ lrn_port_base + 2*k + 1 }}" - "--unroll_length={{ unroll_length }}" - "--update_model_freq={{ actor_update_model_freq }}" - "--env={{ env }}" - "--env_config={{env_config}}" - "--interface_config={{interface_config}}" - "--policy={{ policy }}" - "--policy_config={{ self_policy_config }}" - "--verbose={{ actor_verbose }}" - "--log_interval_steps={{ actor_log_interval_steps }}" - "--n_v={{ n_v }}" - "--{% if actor_rwd_shape %}rwd_shape{% else %}norwd_shape{% endif %}" - "--{% if actor_distillation %}distillation{% else %}nodistillation{% endif %}" - "--type=PPO" {% endfor %} {# --- endfor k --- #} {% endfor %} {# --- endfor j --- #} {% endfor %} {# --- endfor i --- #} {% endif %} {# --- endif true/false --- #}