[
  {
    "path": "A Guide to DeepMind's StarCraft AI Environment.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# A Guide to DeepMind's StarCraft AI Environment\\n\",\n    \"\\n\",\n    \"## Demo -- We're going to setup and install the necessary tools to run a pretrained Deep Q Network model on the CollectMineralShards mini-game of DeepMind's StarCraft II Environment.\\n\",\n    \"\\n\",\n    \"![alt text](https://cdn.technologyreview.com/i/images/20161104patrick-strackblizzcon16070a0766.jpg?sw=2360&cx=0&cy=0&cw=950&ch=633 \\\"Logo Title Text 1\\\")\\n\",\n    \"\\n\",\n    \"## History\\n\",\n    \"\\n\",\n    \"Deepmind already beat Atari Games with the Deep Q Learner\\n\",\n    \"\\n\",\n    \"![alt text](https://rubenfiszel.github.io/posts/rl4j/conv.png \\\"Logo Title Text 1\\\")\\n\",\n    \"![alt text](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2017/01/12042140/11038f3.jpg \\\"Logo Title Text 1\\\")\\n\",\n    \"![alt text](https://www.intelnervana.com/wp-content/uploads/sites/53/2017/06/Screen-Shot-2015-12-21-at-11.23.43-AM-1.png \\\"Logo Title Text 1\\\")\\n\",\n    \"\\n\",\n    \"Then they beat the \\\"unbeatable\\\" game of \\\"Go\\\" with AlphaGo\\n\",\n    \"\\n\",\n    \"![alt text](https://image.slidesharecdn.com/masteringthegameofgowithdeepneuralnetworksandtreesearch-160321115146/95/mastering-the-game-of-go-with-deep-neural-networks-and-tree-search-10-638.jpg?cb=1458635243 \\\"Logo Title Text 1\\\")\\n\",\n    \"![alt text](https://image.slidesharecdn.com/howdeepmindmasteredthegameofgo-160903224536/95/how-deepmind-mastered-the-game-of-go-20-638.jpg?cb=1472943238 \\\"Logo Title Text 1\\\")\\n\",\n    \"\\n\",\n    \"And now they've set their sights on Starcraft. For an AI to play StarCraft well, it'll need\\n\",\n    \"\\n\",\n    \"- An effective use of memory\\n\",\n    \"- an ability to plan over a long time\\n\",\n    \"- The capacity to adapt plans based on new information. \\n\",\n    \"- To execute something as simple as “expand your base to some location”, one must coordinate mouse clicks, camera, and available resources.  This makes actions and planning hierarchical, which is a challenging aspect of Reinforcement Learning.\\n\",\n    \"\\n\",\n    \"Blizzard's StarCraft II API is an interface that provides full external control of StarCraft II.\\n\",\n    \"\\n\",\n    \"This API exposes functionality for developing software for:\\n\",\n    \"\\n\",\n    \"- Scripted bots.\\n\",\n    \"- Machine-learning based bots.\\n\",\n    \"- Replay analysis.\\n\",\n    \"- Tool assisted human play.\\n\",\n    \"\\n\",\n    \"DeepMind's PySC2 - StarCraft II Learning Environment exposes it as a Python RL Environment. \\n\",\n    \"\\n\",\n    \"- A Machine Learning API developed by Blizzard that gives researchers and developers hooks into the game. This includes the release of tools for Linux for the first time.\\n\",\n    \"- A dataset of anonymised game replays, which will increase from 65k to more than half a million in the coming weeks.  
\\n\",\n    \"- An open source version of DeepMind’s toolset, PySC2, to allow researchers to easily use Blizzard’s feature-layer API with their agents.\\n\",\n    \"- A series of simple RL mini-games to allow researchers to test the performance of agents on specific tasks.\\n\",\n    \"- A joint paper that outlines the environment, and reports initial baseline results on the mini-games, supervised learning from replays, and the full 1v1 ladder game against the built-in AI.\\n\",\n    \"\\n\",\n    \"Starcraft II is a real-time strategy game developed by Blizzard entertainment, otherwise known as the makers of World of Warcraft. It's the sequel to Starcraft, a game from 1998 that many regard as one of the greatest PC games ever released. Even now, over a decade on, it's still played regularly by people all over the world; in Korea, it's so popular that there are professional leagues dedicated solely to playing the game.\\n\",\n    \"\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Installation Steps\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"Steps\\n\",\n    \"\\n\",\n    \"1) Install pysc2\\n\",\n    \"\\n\",\n    \"2) Clone pysc2-examples repository\\n\",\n    \"\\n\",\n    \"3) Download mini-games StarCraft II Maps\\n\",\n    \"\\n\",\n    \"4) Install Tensorflow, baselines libraries\\n\",\n    \"\\n\",\n    \"5) Open the project with IntelliJ \\n\",\n    \"\\n\",\n    \"6) Run the training script\\n\",\n    \"\\n\",\n    \"7) Run the pre-trained model\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"## Step 1 - Install pysc2\\n\",\n    \"\\n\",\n    \"`pip3 install pysc2`\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"## Step 2 - Git Clone psyc2 examples\\n\",\n    \"\\n\",\n    \"`git clone https://github.com/llSourcell/A-Guide-to-DeepMinds-StarCraft-AI-Environment`\\n\",\n    \"\\n\",\n    \"## Step 3 - Download mini-games StarCraft II Maps\\n\",\n    \"\\n\",\n    \"https://github.com/deepmind/pysc2/releases/download/v1.0/mini_games.zip\\n\",\n    \"\\n\",\n    \"save these maps to StarCraft II/Maps \\n\",\n    \"\\n\",\n    \"## Step 4 - Install Tensorflow + OpenAI Baselines\\n\",\n    \"\\n\",\n    \"`pip3 install tensorflow`\\n\",\n    \"`pip3 install baselines`\\n\",\n    \"\\n\",\n    \"## Step 5 - Open the Project with Intellij\\n\",\n    \"\\n\",\n    \"### Start training\\n\",\n    \"\\n\",\n    \"`python3 train_mineral_shards.py`\\n\",\n    \"\\n\",\n    \"### Open project , Python 3 SDK \\n\",\n    \"\\n\",\n    \"## Step 6  Run training script\\n\",\n    \"\\n\",\n    \"Right click the train_mineral_shards.py and select [Run 'train_mineral_shards'] menu.\\n\",\n    \"\\n\",\n    \"This is the brief explanation of console logs.\\n\",\n    \"\\n\",\n    \"- steps : The number of commands that we sent to marines.\\n\",\n    \"- episodes : The number of games that we played.\\n\",\n    \"- mean 100 episode reward : mean rewards of last 100 episodes.\\n\",\n    \"- mean 100 episode min… : mean minerals of last 100 episodes.\\n\",\n    \"- % time spent exploring : The percentage of Exploring (Exploration & Exploit)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"## Step 7 Run pre-trained model\\n\",\n    \"\\n\",\n    \"- Right click the enjoy_mineral_shards.py and select [Run 'enjoy_mineral_shards'] menu.\\n\",\n    \"\\n\",\n    \"Then we can see the pre-trained agent of CollectMineralShards map.\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {\n    \"collapsed\": 
true\n   },\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.0\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
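  {
    "path": "run_random_agent_sketch.py",
    "content": "\"\"\"Illustrative sketch, not part of the original repository.\n\nThe notebook describes PySC2 as a Python RL environment built on Blizzard's\nStarCraft II API. This is a minimal, hedged example of opening the\nCollectMineralShards mini-game and stepping PySC2's built-in random agent,\nwhich is a quick way to verify that pysc2 and the mini-game maps are installed.\nThe file name is hypothetical and the snippet assumes pysc2 1.x, the version\nthis repository was written against.\n\"\"\"\n\nimport sys\n\nimport gflags as flags\n\nfrom pysc2.agents import random_agent\nfrom pysc2.env import run_loop\nfrom pysc2.env import sc2_env\n\nFLAGS = flags.FLAGS\n\n\ndef main():\n  # Parse command-line flags before creating the environment, mirroring the\n  # pattern used by the training scripts in this repository.\n  FLAGS(sys.argv)\n  with sc2_env.SC2Env(map_name=\"CollectMineralShards\",\n                      step_mul=8,\n                      visualize=True) as env:\n    agent = random_agent.RandomAgent()\n    # Run for a short while; the agent picks a random valid action each step.\n    run_loop.run_loop([agent], env, max_frames=1000)\n\n\nif __name__ == '__main__':\n  main()\n"
  },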
  {
    "path": "LICENSE",
    "content": "                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. 
For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. 
Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   APPENDIX: How to apply the Apache License to your work.\n\n      To apply the Apache License to your work, attach the following\n      boilerplate notice, with the fields enclosed by brackets \"{}\"\n      replaced with your own identifying information. (Don't include\n      the brackets!)  The text should be enclosed in the appropriate\n      comment syntax for the file format. We also recommend that a\n      file or class name and description of purpose be included on the\n      same \"printed page\" as the copyright notice for easier\n      identification within third-party archives.\n\n   Copyright {yyyy} {name of copyright owner}\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.\n"
  },
  {
    "path": "README.md",
    "content": "# A-Guide-to-DeepMind-s-StarCraft-AI-Environment-\nThis is the code for \"A Guide to DeepMind's StarCraft AI Environment\" by Siraj Raval on Youtube\n\n## Overview\n\nThis is the code for [this](https://youtu.be/URWXG5jRB-A) video on on Youtube by Siraj Raval. This code will help you train or run a pretrained AI model in the DeepMind Starcraft II environment. \n\n## Dependencies \n\n- pysc2 (Deepmind) [https://github.com/deepmind/pysc2]\n- baselines (OpenAI) [https://github.com/openai/baselines]\n- s2client-proto (Blizzard) [https://github.com/Blizzard/s2client-proto]\n- Tensorflow 1.3 (Google) [https://github.com/tensorflow/tensorflow]\n\n## Usage\n\n\n## 1. Get PySC2\n\n### PyPI\n\nThe easiest way to get PySC2 is to use pip:\n\n```shell\n$ pip install pysc2\n```\n\nAlso, you have to install `baselines` library.\n\n```shell\n$ pip install baselines\n```\n\n## 2. Install StarCraft II\n\n### Mac / Win\n\nYou have to purchase StarCraft II and install it. Or even the Starter Edition will work.\n\nhttp://us.battle.net/sc2/en/legacy-of-the-void/\n\n### Linux Packages\n\nFollow Blizzard's [documentation](https://github.com/Blizzard/s2client-proto#downloads) to\nget the linux version. By default, PySC2 expects the game to live in\n`~/StarCraftII/`.\n\n* [3.16.1](http://blzdistsc2-a.akamaihd.net/Linux/SC2.3.16.1.zip)\n\n## 3. Download Maps\n\nDownload the [ladder maps](https://github.com/Blizzard/s2client-proto#downloads)\nand the [mini games](https://github.com/deepmind/pysc2/releases/download/v1.0/mini_games.zip)\nand extract them to your `StarcraftII/Maps/` directory.\n\n## 4. Train it!\n\n```shell\n$ python train_mineral_shards.py\n```\n\n## 5. Enjoy it!\n\n```shell\n$ python enjoy_mineral_shards.py\n```\n\n## Credits\n\nThe credits for this code go to [chris-chris](https://github.com/chris-chris/pysc2-examples). I've merely created a wrapper to get people started. \n"
  },
  {
    "path": "deepq_mineral_shards.py",
    "content": "import numpy as np\nimport os\nimport dill\nimport tempfile\nimport tensorflow as tf\nimport zipfile\n\nimport baselines.common.tf_util as U\n\nfrom baselines import logger\nfrom baselines.common.schedules import LinearSchedule\nfrom baselines import deepq\nfrom baselines.deepq.replay_buffer import ReplayBuffer, PrioritizedReplayBuffer\n\nfrom pysc2.lib import actions as sc2_actions\nfrom pysc2.env import environment\nfrom pysc2.lib import features\nfrom pysc2.lib import actions\n\nimport gflags as flags\n\n_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index\n_PLAYER_FRIENDLY = 1\n_PLAYER_NEUTRAL = 3  # beacon/minerals\n_PLAYER_HOSTILE = 4\n_NO_OP = actions.FUNCTIONS.no_op.id\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_NOT_QUEUED = [0]\n_SELECT_ALL = [0]\n\nFLAGS = flags.FLAGS\n\nclass ActWrapper(object):\n  def __init__(self, act):\n    self._act = act\n    #self._act_params = act_params\n\n  @staticmethod\n  def load(path, act_params, num_cpu=16):\n    with open(path, \"rb\") as f:\n      model_data = dill.load(f)\n    act = deepq.build_act(**act_params)\n    sess = U.make_session(num_cpu=num_cpu)\n    sess.__enter__()\n    with tempfile.TemporaryDirectory() as td:\n      arc_path = os.path.join(td, \"packed.zip\")\n      with open(arc_path, \"wb\") as f:\n        f.write(model_data)\n\n      zipfile.ZipFile(arc_path, 'r', zipfile.ZIP_DEFLATED).extractall(td)\n      U.load_state(os.path.join(td, \"model\"))\n\n    return ActWrapper(act)\n\n  def __call__(self, *args, **kwargs):\n    return self._act(*args, **kwargs)\n\n  def save(self, path):\n    \"\"\"Save model to a pickle located at `path`\"\"\"\n    with tempfile.TemporaryDirectory() as td:\n      U.save_state(os.path.join(td, \"model\"))\n      arc_name = os.path.join(td, \"packed.zip\")\n      with zipfile.ZipFile(arc_name, 'w') as zipf:\n        for root, dirs, files in os.walk(td):\n          for fname in files:\n            file_path = os.path.join(root, fname)\n            if file_path != arc_name:\n              zipf.write(file_path, os.path.relpath(file_path, td))\n      with open(arc_name, \"rb\") as f:\n        model_data = f.read()\n    with open(path, \"wb\") as f:\n      dill.dump((model_data), f)\n\n\ndef load(path, act_params, num_cpu=16):\n  \"\"\"Load act function that was returned by learn function.\n\n  Parameters\n  ----------\n  path: str\n      path to the act function pickle\n  num_cpu: int\n      number of cpus to use for executing the policy\n\n  Returns\n  -------\n  act: ActWrapper\n      function that takes a batch of observations\n      and returns actions.\n  \"\"\"\n  return ActWrapper.load(path, num_cpu=num_cpu, act_params=act_params)\n\n\ndef learn(env,\n          q_func,\n          num_actions=4,\n          lr=5e-4,\n          max_timesteps=100000,\n          buffer_size=50000,\n          exploration_fraction=0.1,\n          exploration_final_eps=0.02,\n          train_freq=1,\n          batch_size=32,\n          print_freq=1,\n          checkpoint_freq=10000,\n          learning_starts=1000,\n          gamma=1.0,\n          target_network_update_freq=500,\n          prioritized_replay=False,\n          prioritized_replay_alpha=0.6,\n          prioritized_replay_beta0=0.4,\n          prioritized_replay_beta_iters=None,\n          prioritized_replay_eps=1e-6,\n          num_cpu=16,\n          param_noise=False,\n          param_noise_threshold=0.05,\n          
callback=None):\n  \"\"\"Train a deepq model.\n\n  Parameters\n  -------\n  env: pysc2.env.SC2Env\n      environment to train on\n  q_func: (tf.Variable, int, str, bool) -> tf.Variable\n      the model that takes the following inputs:\n          observation_in: object\n              the output of observation placeholder\n          num_actions: int\n              number of actions\n          scope: str\n          reuse: bool\n              should be passed to outer variable scope\n      and returns a tensor of shape (batch_size, num_actions) with values of every action.\n  lr: float\n      learning rate for adam optimizer\n  max_timesteps: int\n      number of env steps to optimizer for\n  buffer_size: int\n      size of the replay buffer\n  exploration_fraction: float\n      fraction of entire training period over which the exploration rate is annealed\n  exploration_final_eps: float\n      final value of random action probability\n  train_freq: int\n      update the model every `train_freq` steps.\n      set to None to disable printing\n  batch_size: int\n      size of a batched sampled from replay buffer for training\n  print_freq: int\n      how often to print out training progress\n      set to None to disable printing\n  checkpoint_freq: int\n      how often to save the model. This is so that the best version is restored\n      at the end of the training. If you do not wish to restore the best version at\n      the end of the training set this variable to None.\n  learning_starts: int\n      how many steps of the model to collect transitions for before learning starts\n  gamma: float\n      discount factor\n  target_network_update_freq: int\n      update the target network every `target_network_update_freq` steps.\n  prioritized_replay: True\n      if True prioritized replay buffer will be used.\n  prioritized_replay_alpha: float\n      alpha parameter for prioritized replay buffer\n  prioritized_replay_beta0: float\n      initial value of beta for prioritized replay buffer\n  prioritized_replay_beta_iters: int\n      number of iterations over which beta will be annealed from initial value\n      to 1.0. If set to None equals to max_timesteps.\n  prioritized_replay_eps: float\n      epsilon to add to the TD errors when updating priorities.\n  num_cpu: int\n      number of cpus to use for training\n  callback: (locals, globals) -> None\n      function called at every steps with state of the algorithm.\n      If callback returns true training stops.\n\n  Returns\n  -------\n  act: ActWrapper\n      Wrapper over act function. 
Adds ability to save it and load it.\n      See header of baselines/deepq/categorical.py for details on the act function.\n  \"\"\"\n  # Create all the functions necessary to train the model\n\n  sess = U.make_session(num_cpu=num_cpu)\n  sess.__enter__()\n\n  def make_obs_ph(name):\n    return U.BatchInput((64, 64), name=name)\n\n  act, train, update_target, debug = deepq.build_train(\n    make_obs_ph=make_obs_ph,\n    q_func=q_func,\n    num_actions=num_actions,\n    optimizer=tf.train.AdamOptimizer(learning_rate=lr),\n    gamma=gamma,\n    grad_norm_clipping=10\n  )\n  act_params = {\n    'make_obs_ph': make_obs_ph,\n    'q_func': q_func,\n    'num_actions': num_actions,\n  }\n\n  # Create the replay buffer\n  if prioritized_replay:\n    replay_buffer = PrioritizedReplayBuffer(buffer_size, alpha=prioritized_replay_alpha)\n    if prioritized_replay_beta_iters is None:\n      prioritized_replay_beta_iters = max_timesteps\n    beta_schedule = LinearSchedule(prioritized_replay_beta_iters,\n                                   initial_p=prioritized_replay_beta0,\n                                   final_p=1.0)\n  else:\n    replay_buffer = ReplayBuffer(buffer_size)\n    beta_schedule = None\n  # Create the schedule for exploration starting from 1.\n  exploration = LinearSchedule(schedule_timesteps=int(exploration_fraction * max_timesteps),\n                               initial_p=1.0,\n                               final_p=exploration_final_eps)\n\n  # Initialize the parameters and copy them to the target network.\n  U.initialize()\n  update_target()\n\n  episode_rewards = [0.0]\n  #episode_minerals = [0.0]\n  saved_mean_reward = None\n\n  path_memory = np.zeros((64,64))\n\n  obs = env.reset()\n  # Select all marines first\n  obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])])\n\n  player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n  screen = player_relative + path_memory\n\n  player_y, player_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n  player = [int(player_x.mean()), int(player_y.mean())]\n\n  if(player[0]>32):\n    screen = shift(LEFT, player[0]-32, screen)\n  elif(player[0]<32):\n    screen = shift(RIGHT, 32 - player[0], screen)\n\n  if(player[1]>32):\n    screen = shift(UP, player[1]-32, screen)\n  elif(player[1]<32):\n    screen = shift(DOWN, 32 - player[1], screen)\n\n  reset = True\n  with tempfile.TemporaryDirectory() as td:\n    model_saved = False\n    model_file = os.path.join(td, \"model\")\n\n    for t in range(max_timesteps):\n      if callback is not None:\n        if callback(locals(), globals()):\n          break\n      # Take action and update exploration to the newest value\n      kwargs = {}\n      if not param_noise:\n        update_eps = exploration.value(t)\n        update_param_noise_threshold = 0.\n      else:\n        update_eps = 0.\n        if param_noise_threshold >= 0.:\n          update_param_noise_threshold = param_noise_threshold\n        else:\n          # Compute the threshold such that the KL divergence between perturbed and non-perturbed\n          # policy is comparable to eps-greedy exploration with eps = exploration.value(t).\n          # See Appendix C.1 in Parameter Space Noise for Exploration, Plappert et al., 2017\n          # for detailed explanation.\n          update_param_noise_threshold = -np.log(1. 
- exploration.value(t) + exploration.value(t) / float(num_actions))\n        kwargs['reset'] = reset\n        kwargs['update_param_noise_threshold'] = update_param_noise_threshold\n        kwargs['update_param_noise_scale'] = True\n      action = act(np.array(screen)[None], update_eps=update_eps, **kwargs)[0]\n      reset = False\n\n      coord = [player[0], player[1]]\n      rew = 0\n\n      path_memory_ = np.array(path_memory, copy=True)\n      if(action == 0): #UP\n\n        if(player[1] >= 16):\n          coord = [player[0], player[1] - 16]\n          path_memory_[player[1] - 16 : player[1], player[0]] = -1\n        elif(player[1] > 0):\n          coord = [player[0], 0]\n          path_memory_[0 : player[1], player[0]] = -1\n        #else:\n        #  rew -= 1\n\n      elif(action == 1): #DOWN\n\n        if(player[1] <= 47):\n          coord = [player[0], player[1] + 16]\n          path_memory_[player[1] : player[1] + 16, player[0]] = -1\n        elif(player[1] > 47):\n          coord = [player[0], 63]\n          path_memory_[player[1] : 63, player[0]] = -1\n        #else:\n        #  rew -= 1\n\n      elif(action == 2): #LEFT\n\n        if(player[0] >= 16):\n          coord = [player[0] - 16, player[1]]\n          path_memory_[player[1], player[0] - 16 : player[0]] = -1\n        elif(player[0] < 16):\n          coord = [0, player[1]]\n          path_memory_[player[1], 0 : player[0]] = -1\n        #else:\n        #  rew -= 1\n\n      elif(action == 3): #RIGHT\n\n        if(player[0] <= 47):\n          coord = [player[0] + 16, player[1]]\n          path_memory_[player[1], player[0] : player[0] + 16] = -1\n        elif(player[0] > 47):\n          coord = [63, player[1]]\n          path_memory_[player[1], player[0] : 63] = -1\n        #else:\n        #  rew -= 1\n\n      #else:\n        #Cannot move, give minus reward\n      #  rew -= 1\n\n      #if(path_memory[coord[1],coord[0]] != 0):\n      #  rew -= 0.5\n\n      path_memory = np.array(path_memory_)\n      #print(\"action : %s Coord : %s\" % (action, coord))\n\n      if _MOVE_SCREEN not in obs[0].observation[\"available_actions\"]:\n        obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])])\n\n      new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n      # else:\n      #   new_action = [sc2_actions.FunctionCall(_NO_OP, [])]\n\n      obs = env.step(actions=new_action)\n\n      player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n      new_screen = player_relative + path_memory\n\n      player_y, player_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n      player = [int(player_x.mean()), int(player_y.mean())]\n\n      if(player[0]>32):\n        new_screen = shift(LEFT, player[0]-32, new_screen)\n      elif(player[0]<32):\n        new_screen = shift(RIGHT, 32 - player[0], new_screen)\n\n      if(player[1]>32):\n        new_screen = shift(UP, player[1]-32, new_screen)\n      elif(player[1]<32):\n        new_screen = shift(DOWN, 32 - player[1], new_screen)\n\n      rew = obs[0].reward\n\n      done = obs[0].step_type == environment.StepType.LAST\n\n      # Store transition in the replay buffer.\n      replay_buffer.add(screen, action, rew, new_screen, float(done))\n      screen = new_screen\n\n      episode_rewards[-1] += rew\n      #episode_minerals[-1] += obs[0].reward\n\n      if done:\n        obs = env.reset()\n        player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n        screen = player_relative + path_memory\n\n        player_y, 
player_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n        player = [int(player_x.mean()), int(player_y.mean())]\n\n        if(player[0]>32):\n          screen = shift(LEFT, player[0]-32, screen)\n        elif(player[0]<32):\n          screen = shift(RIGHT, 32 - player[0], screen)\n\n        if(player[1]>32):\n          screen = shift(UP, player[1]-32, screen)\n        elif(player[1]<32):\n          screen = shift(DOWN, 32 - player[1], screen)\n\n        # Select all marines first\n        env.step(actions=[sc2_actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])])\n        episode_rewards.append(0.0)\n        #episode_minerals.append(0.0)\n\n        path_memory = np.zeros((64,64))\n\n        reset = True\n\n      if t > learning_starts and t % train_freq == 0:\n        # Minimize the error in Bellman's equation on a batch sampled from replay buffer.\n        if prioritized_replay:\n          experience = replay_buffer.sample(batch_size, beta=beta_schedule.value(t))\n          (obses_t, actions, rewards, obses_tp1, dones, weights, batch_idxes) = experience\n        else:\n          obses_t, actions, rewards, obses_tp1, dones = replay_buffer.sample(batch_size)\n          weights, batch_idxes = np.ones_like(rewards), None\n        td_errors = train(obses_t, actions, rewards, obses_tp1, dones, weights)\n        if prioritized_replay:\n          new_priorities = np.abs(td_errors) + prioritized_replay_eps\n          replay_buffer.update_priorities(batch_idxes, new_priorities)\n\n      if t > learning_starts and t % target_network_update_freq == 0:\n        # Update target network periodically.\n        update_target()\n\n      mean_100ep_reward = round(np.mean(episode_rewards[-101:-1]), 1)\n      #mean_100ep_mineral = round(np.mean(episode_minerals[-101:-1]), 1)\n      num_episodes = len(episode_rewards)\n      if done and print_freq is not None and len(episode_rewards) % print_freq == 0:\n        logger.record_tabular(\"steps\", t)\n        logger.record_tabular(\"episodes\", num_episodes)\n        logger.record_tabular(\"mean 100 episode reward\", mean_100ep_reward)\n        #logger.record_tabular(\"mean 100 episode mineral\", mean_100ep_mineral)\n        logger.record_tabular(\"% time spent exploring\", int(100 * exploration.value(t)))\n        logger.dump_tabular()\n\n      if (checkpoint_freq is not None and t > learning_starts and\n              num_episodes > 100 and t % checkpoint_freq == 0):\n        if saved_mean_reward is None or mean_100ep_reward > saved_mean_reward:\n          if print_freq is not None:\n            logger.log(\"Saving model due to mean reward increase: {} -> {}\".format(\n              saved_mean_reward, mean_100ep_reward))\n          U.save_state(model_file)\n          model_saved = True\n          saved_mean_reward = mean_100ep_reward\n    if model_saved:\n      if print_freq is not None:\n        logger.log(\"Restored model with mean reward: {}\".format(saved_mean_reward))\n      U.load_state(model_file)\n\n  return ActWrapper(act)\n\ndef intToCoordinate(num, size=64):\n  if size!=64:\n    num = num * size * size // 4096\n  y = num // size\n  x = num - size * y\n  return [x, y]\n\nUP, DOWN, LEFT, RIGHT = 'up', 'down', 'left', 'right'\n\ndef shift(direction, number, matrix):\n  ''' shift given 2D matrix in-place the given number of rows or columns\n      in the specified (UP, DOWN, LEFT, RIGHT) direction and return it\n  '''\n  if direction in (UP):\n    matrix = np.roll(matrix, -number, axis=0)\n    matrix[number:,:] = -2\n    return matrix\n  elif 
direction in (DOWN):\n    matrix = np.roll(matrix, number, axis=0)\n    matrix[:number,:] = -2\n    return matrix\n  elif direction in (LEFT):\n    matrix = np.roll(matrix, -number, axis=1)\n    # mark the columns that wrapped around from the left edge as off-screen\n    matrix[:,-number:] = -2\n    return matrix\n  elif direction in (RIGHT):\n    matrix = np.roll(matrix, number, axis=1)\n    matrix[:,:number] = -2\n    return matrix\n  else:\n    return matrix"
  },
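  {
    "path": "train_mineral_shards_sketch.py",
    "content": "\"\"\"Illustrative sketch, not part of the original repository.\n\nA hedged example of how deepq_mineral_shards.learn() could be wired to the\nCollectMineralShards mini-game: build a dueling CNN-to-MLP Q-network with\nOpenAI baselines, train it, and save the resulting act function with\nActWrapper.save(). The file name and hyperparameter values are assumptions;\nthe repository's real entry points are train_mineral_shards.py and\nenjoy_mineral_shards.py.\n\"\"\"\n\nimport sys\n\nimport gflags as flags\n\nfrom baselines import deepq\nfrom pysc2.env import sc2_env\n\nimport deepq_mineral_shards\n\nFLAGS = flags.FLAGS\n\n\ndef main():\n  FLAGS(sys.argv)\n  with sc2_env.SC2Env(map_name=\"CollectMineralShards\",\n                      step_mul=8,\n                      visualize=False) as env:\n    # Q-network over the 64x64 player_relative screen used by learn().\n    model = deepq.models.cnn_to_mlp(\n        convs=[(16, 8, 4), (32, 4, 2)],  # (filters, kernel_size, stride)\n        hiddens=[256],\n        dueling=True)\n    act = deepq_mineral_shards.learn(\n        env,\n        q_func=model,\n        num_actions=4,  # up / down / left / right, as handled in learn()\n        lr=1e-4,\n        max_timesteps=2000000,\n        buffer_size=100000,\n        exploration_fraction=0.5,\n        exploration_final_eps=0.01,\n        train_freq=2,\n        prioritized_replay=True)\n    # Persist the trained act function; it can be restored later with\n    # deepq_mineral_shards.load(\"mineral_shards.pkl\", act_params=...).\n    act.save(\"mineral_shards.pkl\")\n\n\nif __name__ == '__main__':\n  main()\n"
  },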
  {
    "path": "defeat_zerglings/common.py",
    "content": "import numpy as np\n\nfrom pysc2.lib import actions as sc2_actions\nfrom pysc2.lib import features\nfrom pysc2.lib import actions\n\n_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index\n\n_UNIT_TYPE = features.SCREEN_FEATURES.unit_type.index\n_SELECTED = features.SCREEN_FEATURES.selected.index\n_PLAYER_FRIENDLY = 1\n_PLAYER_NEUTRAL = 3  # beacon/minerals\n_PLAYER_HOSTILE = 4\n_NO_OP = actions.FUNCTIONS.no_op.id\n_SELECT_UNIT_ID = 1\n\n_CONTROL_GROUP_SET = 1\n_CONTROL_GROUP_RECALL = 0\n\n_SELECT_CONTROL_GROUP = actions.FUNCTIONS.select_control_group.id\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_SELECT_UNIT = actions.FUNCTIONS.select_unit.id\n_SELECT_POINT = actions.FUNCTIONS.select_point.id\n\n_NOT_QUEUED = [0]\n_SELECT_ALL = [0]\n\ndef init(env, player_relative, obs):\n\n  #print(\"init\")\n  army_count = env._obs.observation.player_common.army_count\n\n  if(army_count==0):\n    return obs\n  try:\n    obs = env.step(actions=[sc2_actions.FunctionCall(_NO_OP, [])])\n    obs = env.step(actions=[sc2_actions.FunctionCall(_NO_OP, [])])\n    obs = env.step(actions=[sc2_actions.FunctionCall(_NO_OP, [])])\n    obs = env.step(actions=[sc2_actions.FunctionCall(_NO_OP, [])])\n    obs = env.step(actions=[sc2_actions.FunctionCall(_NO_OP, [])])\n\n    player_y, player_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n    obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])])\n  except Exception as e:\n    print(e)\n  for i in range(len(player_x)):\n    if i % 4 != 0:\n      continue\n\n    xy = [player_x[i], player_y[i]]\n    obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[0], xy])])\n\n  group_id = 0\n  group_list = []\n  unit_xy_list = []\n  for i in range(len(player_x)):\n    if i % 4 != 0:\n      continue\n\n    if group_id > 9:\n      break\n\n    xy = [player_x[i], player_y[i]]\n    unit_xy_list.append(xy)\n\n    if(len(unit_xy_list) >= 1):\n      for idx, xy in enumerate(unit_xy_list):\n        if(idx==0):\n          obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[0], xy])])\n        else:\n          obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[1], xy])])\n\n      obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_CONTROL_GROUP, [[_CONTROL_GROUP_SET], [group_id]])])\n      unit_xy_list = []\n\n      group_list.append(group_id)\n      group_id += 1\n\n  if(len(unit_xy_list) >= 1):\n    for idx, xy in enumerate(unit_xy_list):\n      if(idx==0):\n        obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[0], xy])])\n      else:\n        obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[1], xy])])\n\n    obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_CONTROL_GROUP, [[_CONTROL_GROUP_SET], [group_id]])])\n\n    group_list.append(group_id)\n    group_id += 1\n\n  return obs\n\ndef update_group_list(obs):\n  control_groups = obs[0].observation[\"control_groups\"]\n  group_count = 0\n  group_list = []\n  for id, group in enumerate(control_groups):\n    if(group[0]!=0):\n      group_count += 1\n      group_list.append(id)\n  return group_list\n\ndef check_group_list(env, obs):\n  error = False\n  control_groups = obs[0].observation[\"control_groups\"]\n  army_count = 0\n  for id, group in enumerate(control_groups):\n    if(group[0]==48):\n      army_count += group[1]\n      if(group[1] != 1):\n        #print(\"group error group_id : %s 
count : %s\" % (id, group[1]))\n        error = True\n        return error\n  if(army_count != env._obs.observation.player_common.army_count):\n    error = True\n    # print(\"army_count %s !=  %s env._obs.observation.player_common.army_count \"\n    #      % (army_count, env._obs.observation.player_common.army_count))\n\n\n  return error\n\n\nUP, DOWN, LEFT, RIGHT = 'up', 'down', 'left', 'right'\n\ndef shift(direction, number, matrix):\n  ''' shift given 2D matrix in-place the given number of rows or columns\n      in the specified (UP, DOWN, LEFT, RIGHT) direction and return it\n  '''\n  if direction in (UP):\n    matrix = np.roll(matrix, -number, axis=0)\n    matrix[number:,:] = -2\n    return matrix\n  elif direction in (DOWN):\n    matrix = np.roll(matrix, number, axis=0)\n    matrix[:number,:] = -2\n    return matrix\n  elif direction in (LEFT):\n    matrix = np.roll(matrix, -number, axis=1)\n    matrix[:,number:] = -2\n    return matrix\n  elif direction in (RIGHT):\n    matrix = np.roll(matrix, number, axis=1)\n    matrix[:,:number] = -2\n    return matrix\n  else:\n    return matrix\n\ndef select_marine(env, obs):\n\n  player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n  screen = player_relative\n\n  group_list = update_group_list(obs)\n\n  if(check_group_list(env, obs)):\n    obs = init(env, player_relative, obs)\n    group_list = update_group_list(obs)\n\n  # if(len(group_list) == 0):\n  #   obs = init(env, player_relative, obs)\n  #   group_list = update_group_list(obs)\n\n  player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n  friendly_y, friendly_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n\n  enemy_y, enemy_x = (player_relative == _PLAYER_HOSTILE).nonzero()\n\n  player = []\n\n  danger_closest, danger_min_dist = None, None\n  for e in zip(enemy_x, enemy_y):\n    for p in zip(friendly_x, friendly_y):\n      dist = np.linalg.norm(np.array(p) - np.array(e))\n      if not danger_min_dist or dist < danger_min_dist:\n        danger_closest, danger_min_dist = p, dist\n\n\n  marine_closest, marine_min_dist = None, None\n  for e in zip(friendly_x, friendly_y):\n    for p in zip(friendly_x, friendly_y):\n      dist = np.linalg.norm(np.array(p) - np.array(e))\n      if not marine_min_dist or dist < marine_min_dist:\n        if dist >= 2:\n          marine_closest, marine_min_dist = p, dist\n\n  if(danger_min_dist != None and danger_min_dist <= 5):\n    obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[0], danger_closest])])\n\n    selected = obs[0].observation[\"screen\"][_SELECTED]\n    player_y, player_x = (selected == _PLAYER_FRIENDLY).nonzero()\n    if(len(player_y)>0):\n      player = [int(player_x.mean()), int(player_y.mean())]\n\n  elif(marine_closest != None and marine_min_dist <= 3):\n    obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_POINT, [[0], marine_closest])])\n\n    selected = obs[0].observation[\"screen\"][_SELECTED]\n    player_y, player_x = (selected == _PLAYER_FRIENDLY).nonzero()\n    if(len(player_y)>0):\n      player = [int(player_x.mean()), int(player_y.mean())]\n\n  else:\n\n    # If there is no marine in danger, select random\n    while(len(group_list)>0):\n      # units = env._obs.observation.raw_data.units\n      # marine_list = []          # for unit in units:\n      #   if(unit.alliance == 1):\n      #     marine_list.append(unit)\n\n      group_id = np.random.choice(group_list)\n      #xy = [int(unit.pos.y - 10), int(unit.pos.x+8)]\n      #print(\"check xy : %s - %s\" % (xy, 
player_relative[xy[0],xy[1]]))\n      obs = env.step(actions=[sc2_actions.FunctionCall(_SELECT_CONTROL_GROUP, [[_CONTROL_GROUP_RECALL], [group_id]])])\n\n      selected = obs[0].observation[\"screen\"][_SELECTED]\n      player_y, player_x = (selected == _PLAYER_FRIENDLY).nonzero()\n      if(len(player_y)>0):\n        player = [int(player_x.mean()), int(player_y.mean())]\n        break\n      else:\n        group_list.remove(group_id)\n\n  if(len(player) == 2):\n\n    if(player[0]>32):\n      screen = shift(LEFT, player[0]-32, screen)\n    elif(player[0]<32):\n      screen = shift(RIGHT, 32 - player[0], screen)\n\n    if(player[1]>32):\n      screen = shift(UP, player[1]-32, screen)\n    elif(player[1]<32):\n      screen = shift(DOWN, 32 - player[1], screen)\n\n  return obs, screen, player\n\ndef marine_action(env, obs, player, action):\n\n  player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n  enemy_y, enemy_x = (player_relative == _PLAYER_HOSTILE).nonzero()\n\n  closest, min_dist = None, None\n\n  if(len(player) == 2):\n    for p in zip(enemy_x, enemy_y):\n      dist = np.linalg.norm(np.array(player) - np.array(p))\n      if not min_dist or dist < min_dist:\n        closest, min_dist = p, dist\n\n\n  player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n  friendly_y, friendly_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n\n  closest_friend, min_dist_friend = None, None\n  if(len(player) == 2):\n    for p in zip(friendly_x, friendly_y):\n      dist = np.linalg.norm(np.array(player) - np.array(p))\n      if not min_dist_friend or dist < min_dist_friend:\n        closest_friend, min_dist_friend = p, dist\n\n  if(closest == None):\n\n    new_action = [sc2_actions.FunctionCall(_NO_OP, [])]\n\n  elif(action == 0 and closest_friend != None and min_dist_friend < 3):\n    # Friendly marine is too close => Sparse!\n\n    mean_friend = [int(friendly_x.mean()), int(friendly_x.mean())]\n\n    diff = np.array(player) - np.array(closest_friend)\n\n    norm = np.linalg.norm(diff)\n\n    if(norm != 0):\n      diff = diff / norm\n\n    coord = np.array(player) + diff * 4\n\n    if(coord[0]<0):\n      coord[0] = 0\n    elif(coord[0]>63):\n      coord[0] = 63\n\n    if(coord[1]<0):\n      coord[1] = 0\n    elif(coord[1]>63):\n      coord[1] = 63\n\n    new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n  elif(action <= 1): #Attack\n\n    # nearest enemy\n\n    coord = closest\n\n    new_action = [sc2_actions.FunctionCall(_ATTACK_SCREEN, [_NOT_QUEUED, coord])]\n\n    #print(\"action : %s Attack Coord : %s\" % (action, coord))\n\n  elif(action == 2): # Oppsite direcion from enemy\n\n    # nearest enemy opposite\n\n    diff = np.array(player) - np.array(closest)\n\n    norm = np.linalg.norm(diff)\n\n    if(norm != 0):\n      diff = diff / norm\n\n    coord = np.array(player) + diff * 7\n\n    if(coord[0]<0):\n      coord[0] = 0\n    elif(coord[0]>63):\n      coord[0] = 63\n\n    if(coord[1]<0):\n      coord[1] = 0\n    elif(coord[1]>63):\n      coord[1] = 63\n\n    new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n  elif(action == 4): #UP\n    coord = [player[0], player[1] - 3]\n    new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n  elif(action == 5): #DOWN\n    coord = [player[0], player[1] + 3]\n    new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n  elif(action == 6): #LEFT\n    coord = [player[0] - 3, player[1]]\n    new_action = 
[sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n  elif(action == 7): #RIGHT\n    coord = [player[0] + 3, player[1]]\n    new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n    #print(\"action : %s Back Coord : %s\" % (action, coord))\n\n  else:\n    # No matching case for this action id (e.g. action == 3); fall back to a\n    # no-op so new_action is always defined before it is returned.\n    new_action = [sc2_actions.FunctionCall(_NO_OP, [])]\n\n  return obs, new_action"
  },
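  {
    "path": "recenter_screen_sketch.py",
    "content": "\"\"\"Illustrative sketch, not part of the original repository.\n\nShows what common.shift() is used for: select_marine() rolls the 64x64\nplayer_relative screen so the selected marine sits at the centre (32, 32),\nand the cells that wrap around the edge are overwritten with -2 as an\noff-screen marker. The numbers below are made up for the demonstration.\n\"\"\"\n\nimport numpy as np\n\nfrom defeat_zerglings import common\n\n# A fake 64x64 screen with a single marine at x=20, y=26 (indexed [y, x]).\nscreen = np.zeros((64, 64))\nplayer = [20, 26]\nscreen[player[1], player[0]] = 1\n\n# Same recentering logic as select_marine(): shift along x, then along y.\nif player[0] > 32:\n  screen = common.shift(common.LEFT, player[0] - 32, screen)\nelif player[0] < 32:\n  screen = common.shift(common.RIGHT, 32 - player[0], screen)\n\nif player[1] > 32:\n  screen = common.shift(common.UP, player[1] - 32, screen)\nelif player[1] < 32:\n  screen = common.shift(common.DOWN, 32 - player[1], screen)\n\n# The marine now sits in the middle of the screen, and the 12 left-most\n# columns and 6 top-most rows (which wrapped around) are marked -2.\nassert screen[32, 32] == 1\nassert (screen[:, :12] == -2).all() and (screen[:6, :] == -2).all()\n"
  },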
  {
    "path": "defeat_zerglings/demo_agent.py",
    "content": "\"\"\"A random agent for starcraft.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy\n\nfrom pysc2.agents import base_agent\nfrom pysc2.lib import actions\n\nfrom pysc2.lib import actions as sc2_actions\nfrom pysc2.lib import features\nfrom pysc2.lib import actions\n\nfrom defeat_zerglings import common\n\nimport numpy as np\n\n_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index\n\n_UNIT_TYPE = features.SCREEN_FEATURES.unit_type.index\n_SELECTED = features.SCREEN_FEATURES.selected.index\n_PLAYER_FRIENDLY = 1\n_PLAYER_NEUTRAL = 3  # beacon/minerals\n_PLAYER_HOSTILE = 4\n_NO_OP = actions.FUNCTIONS.no_op.id\n_SELECT_UNIT_ID = 1\n\n_CONTROL_GROUP_SET = 1\n_CONTROL_GROUP_RECALL = 0\n\n_SELECT_CONTROL_GROUP = actions.FUNCTIONS.select_control_group.id\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_SELECT_UNIT = actions.FUNCTIONS.select_unit.id\n_SELECT_POINT = actions.FUNCTIONS.select_point.id\n\n_NOT_QUEUED = [0]\n_SELECT_ALL = [0]\n\nclass MarineAgent(base_agent.BaseAgent):\n  \"\"\"A random agent for starcraft.\"\"\"\n  demo_replay = []\n\n  def __init__(self, env):\n    self.env = env\n\n  def step(self, obs):\n    super(MarineAgent, self).step(obs)\n\n    #1. Select marine!\n    obs, screen, player = common.select_marine(self.env, [obs])\n\n    player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n    enemy_y, enemy_x = (player_relative == _PLAYER_HOSTILE).nonzero()\n\n\n    #2. Run away from nearby enemy\n    closest, min_dist = None, None\n\n    if(len(player) == 2):\n      for p in zip(enemy_x, enemy_y):\n        dist = np.linalg.norm(np.array(player) - np.array(p))\n        if not min_dist or dist < min_dist:\n          closest, min_dist = p, dist\n\n\n    #3. Sparse!\n    friendly_y, friendly_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n\n    closest_friend, min_dist_friend = None, None\n    if(len(player) == 2):\n      for p in zip(friendly_x, friendly_y):\n        dist = np.linalg.norm(np.array(player) - np.array(p))\n        if not min_dist_friend or dist < min_dist_friend:\n          closest_friend, min_dist_friend = p, dist\n\n    if(min_dist != None and min_dist <= 7):\n\n      obs, new_action = common.marine_action(self.env, obs, player, 2)\n\n    elif(min_dist_friend != None and min_dist_friend <= 3):\n\n      sparse_or_attack = np.random.randint(0,2)\n\n      obs, new_action = common.marine_action(self.env, obs, player, sparse_or_attack)\n\n    else:\n\n      obs, new_action = common.marine_action(self.env, obs, player, 1)\n\n    return new_action[0]\n"
  },
  {
    "path": "defeat_zerglings/dqfd.py",
    "content": "import numpy as np\nimport os\nimport dill\nimport tempfile\nimport tensorflow as tf\nimport zipfile\n\nimport baselines.common.tf_util as U\n\nfrom baselines import logger\nfrom baselines.common.schedules import LinearSchedule\nfrom baselines import deepq\nfrom baselines.deepq.replay_buffer import ReplayBuffer, PrioritizedReplayBuffer\n\nfrom pysc2.lib import actions as sc2_actions\nfrom pysc2.env import environment\nfrom pysc2.lib import features\nfrom pysc2.lib import actions\n\nfrom defeat_zerglings import common\n\nimport gflags as flags\n\n_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index\n\n_UNIT_TYPE = features.SCREEN_FEATURES.unit_type.index\n_SELECTED = features.SCREEN_FEATURES.selected.index\n_PLAYER_FRIENDLY = 1\n_PLAYER_NEUTRAL = 3  # beacon/minerals\n_PLAYER_HOSTILE = 4\n_NO_OP = actions.FUNCTIONS.no_op.id\n_SELECT_UNIT_ID = 1\n\n_CONTROL_GROUP_SET = 1\n_CONTROL_GROUP_RECALL = 0\n\n_SELECT_CONTROL_GROUP = actions.FUNCTIONS.select_control_group.id\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_SELECT_UNIT = actions.FUNCTIONS.select_unit.id\n_SELECT_POINT = actions.FUNCTIONS.select_point.id\n\n_NOT_QUEUED = [0]\n_SELECT_ALL = [0]\n\nUP, DOWN, LEFT, RIGHT = 'up', 'down', 'left', 'right'\n\nFLAGS = flags.FLAGS\n\nclass ActWrapper(object):\n  def __init__(self, act):\n    self._act = act\n    #self._act_params = act_params\n\n  @staticmethod\n  def load(path, act_params, num_cpu=16):\n    with open(path, \"rb\") as f:\n      model_data = dill.load(f)\n    act = deepq.build_act(**act_params)\n    sess = U.make_session(num_cpu=num_cpu)\n    sess.__enter__()\n    with tempfile.TemporaryDirectory() as td:\n      arc_path = os.path.join(td, \"packed.zip\")\n      with open(arc_path, \"wb\") as f:\n        f.write(model_data)\n\n      zipfile.ZipFile(arc_path, 'r', zipfile.ZIP_DEFLATED).extractall(td)\n      U.load_state(os.path.join(td, \"model\"))\n\n    return ActWrapper(act)\n\n  def __call__(self, *args, **kwargs):\n    return self._act(*args, **kwargs)\n\n  def save(self, path):\n    \"\"\"Save model to a pickle located at `path`\"\"\"\n    with tempfile.TemporaryDirectory() as td:\n      U.save_state(os.path.join(td, \"model\"))\n      arc_name = os.path.join(td, \"packed.zip\")\n      with zipfile.ZipFile(arc_name, 'w') as zipf:\n        for root, dirs, files in os.walk(td):\n          for fname in files:\n            file_path = os.path.join(root, fname)\n            if file_path != arc_name:\n              zipf.write(file_path, os.path.relpath(file_path, td))\n      with open(arc_name, \"rb\") as f:\n        model_data = f.read()\n    with open(path, \"wb\") as f:\n      dill.dump((model_data), f)\n\n\ndef load(path, act_params, num_cpu=16):\n  \"\"\"Load act function that was returned by learn function.\n\n  Parameters\n  ----------\n  path: str\n      path to the act function pickle\n  num_cpu: int\n      number of cpus to use for executing the policy\n\n  Returns\n  -------\n  act: ActWrapper\n      function that takes a batch of observations\n      and returns actions.\n  \"\"\"\n  return ActWrapper.load(path, num_cpu=num_cpu, act_params=act_params)\n\n\ndef learn(env,\n          q_func,\n          num_actions=3,\n          lr=5e-4,\n          max_timesteps=100000,\n          buffer_size=50000,\n          exploration_fraction=0.1,\n          exploration_final_eps=0.02,\n          train_freq=1,\n          batch_size=32,\n          
print_freq=1,\n          checkpoint_freq=10000,\n          learning_starts=1000,\n          gamma=1.0,\n          target_network_update_freq=500,\n          prioritized_replay=False,\n          prioritized_replay_alpha=0.6,\n          prioritized_replay_beta0=0.4,\n          prioritized_replay_beta_iters=None,\n          prioritized_replay_eps=1e-6,\n          num_cpu=16,\n          param_noise=False,\n          param_noise_threshold=0.05,\n          callback=None,\n          demo_replay=[]\n          ):\n  \"\"\"Train a deepq model.\n\n  Parameters\n  -------\n  env: pysc2.env.SC2Env\n      environment to train on\n  q_func: (tf.Variable, int, str, bool) -> tf.Variable\n      the model that takes the following inputs:\n          observation_in: object\n              the output of observation placeholder\n          num_actions: int\n              number of actions\n          scope: str\n          reuse: bool\n              should be passed to outer variable scope\n      and returns a tensor of shape (batch_size, num_actions) with values of every action.\n  lr: float\n      learning rate for adam optimizer\n  max_timesteps: int\n      number of env steps to optimizer for\n  buffer_size: int\n      size of the replay buffer\n  exploration_fraction: float\n      fraction of entire training period over which the exploration rate is annealed\n  exploration_final_eps: float\n      final value of random action probability\n  train_freq: int\n      update the model every `train_freq` steps.\n      set to None to disable printing\n  batch_size: int\n      size of a batched sampled from replay buffer for training\n  print_freq: int\n      how often to print out training progress\n      set to None to disable printing\n  checkpoint_freq: int\n      how often to save the model. This is so that the best version is restored\n      at the end of the training. If you do not wish to restore the best version at\n      the end of the training set this variable to None.\n  learning_starts: int\n      how many steps of the model to collect transitions for before learning starts\n  gamma: float\n      discount factor\n  target_network_update_freq: int\n      update the target network every `target_network_update_freq` steps.\n  prioritized_replay: True\n      if True prioritized replay buffer will be used.\n  prioritized_replay_alpha: float\n      alpha parameter for prioritized replay buffer\n  prioritized_replay_beta0: float\n      initial value of beta for prioritized replay buffer\n  prioritized_replay_beta_iters: int\n      number of iterations over which beta will be annealed from initial value\n      to 1.0. If set to None equals to max_timesteps.\n  prioritized_replay_eps: float\n      epsilon to add to the TD errors when updating priorities.\n  num_cpu: int\n      number of cpus to use for training\n  callback: (locals, globals) -> None\n      function called at every steps with state of the algorithm.\n      If callback returns true training stops.\n\n  Returns\n  -------\n  act: ActWrapper\n      Wrapper over act function. 
Adds ability to save it and load it.\n      See header of baselines/deepq/categorical.py for details on the act function.\n  \"\"\"\n  # Create all the functions necessary to train the model\n\n  sess = U.make_session(num_cpu=num_cpu)\n  sess.__enter__()\n\n  def make_obs_ph(name):\n    return U.BatchInput((64, 64), name=name)\n\n  act, train, update_target, debug = deepq.build_train(\n    make_obs_ph=make_obs_ph,\n    q_func=q_func,\n    num_actions=num_actions,\n    optimizer=tf.train.AdamOptimizer(learning_rate=lr),\n    gamma=gamma,\n    grad_norm_clipping=10\n  )\n  act_params = {\n    'make_obs_ph': make_obs_ph,\n    'q_func': q_func,\n    'num_actions': num_actions,\n  }\n\n  # Create the replay buffer\n  if prioritized_replay:\n    replay_buffer = PrioritizedReplayBuffer(buffer_size, alpha=prioritized_replay_alpha)\n    if prioritized_replay_beta_iters is None:\n      prioritized_replay_beta_iters = max_timesteps\n    beta_schedule = LinearSchedule(prioritized_replay_beta_iters,\n                                   initial_p=prioritized_replay_beta0,\n                                   final_p=1.0)\n  else:\n    replay_buffer = ReplayBuffer(buffer_size)\n    beta_schedule = None\n  # Create the schedule for exploration starting from 1.\n  exploration = LinearSchedule(schedule_timesteps=int(exploration_fraction * max_timesteps),\n                               initial_p=1.0,\n                               final_p=exploration_final_eps)\n\n  # Initialize the parameters and copy them to the target network.\n  U.initialize()\n  update_target()\n\n  episode_rewards = [0.0]\n  saved_mean_reward = None\n\n  obs = env.reset()\n  # Select all marines first\n\n  player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n  screen = player_relative\n\n  obs = common.init(env, player_relative, obs)\n\n  group_id = 0\n  reset = True\n  with tempfile.TemporaryDirectory() as td:\n    model_saved = False\n    model_file = os.path.join(td, \"model\")\n\n    for t in range(max_timesteps):\n      if callback is not None:\n        if callback(locals(), globals()):\n          break\n      # Take action and update exploration to the newest value\n      kwargs = {}\n      if not param_noise:\n        update_eps = exploration.value(t)\n        update_param_noise_threshold = 0.\n      else:\n        update_eps = 0.\n        if param_noise_threshold >= 0.:\n          update_param_noise_threshold = param_noise_threshold\n        else:\n          # Compute the threshold such that the KL divergence between perturbed and non-perturbed\n          # policy is comparable to eps-greedy exploration with eps = exploration.value(t).\n          # See Appendix C.1 in Parameter Space Noise for Exploration, Plappert et al., 2017\n          # for detailed explanation.\n          update_param_noise_threshold = -np.log(1. 
- exploration.value(t) + exploration.value(t) / float(num_actions))\n        kwargs['reset'] = reset\n        kwargs['update_param_noise_threshold'] = update_param_noise_threshold\n        kwargs['update_param_noise_scale'] = True\n\n      # custom process for DefeatZerglingsAndBanelings\n\n      obs, screen, player = common.select_marine(env, obs)\n\n      action = act(np.array(screen)[None], update_eps=update_eps, **kwargs)[0]\n      reset = False\n      rew = 0\n\n      new_action = None\n\n      obs, new_action = common.marine_action(env, obs, player, action)\n      army_count = env._obs.observation.player_common.army_count\n\n      try:\n        if army_count > 0 and _ATTACK_SCREEN in obs[0].observation[\"available_actions\"]:\n          obs = env.step(actions=new_action)\n        else:\n          new_action = [sc2_actions.FunctionCall(_NO_OP, [])]\n          obs = env.step(actions=new_action)\n      except Exception:\n        # Ignore a failed env.step (e.g. the action became unavailable this frame) and keep going.\n        pass\n\n      player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n      new_screen = player_relative\n\n      rew += obs[0].reward\n\n      done = obs[0].step_type == environment.StepType.LAST\n\n      selected = obs[0].observation[\"screen\"][_SELECTED]\n      player_y, player_x = (selected == _PLAYER_FRIENDLY).nonzero()\n\n      if(len(player_y)>0):\n        player = [int(player_x.mean()), int(player_y.mean())]\n\n      if(len(player) == 2):\n\n        if(player[0]>32):\n          new_screen = common.shift(LEFT, player[0]-32, new_screen)\n        elif(player[0]<32):\n          new_screen = common.shift(RIGHT, 32 - player[0], new_screen)\n\n        if(player[1]>32):\n          new_screen = common.shift(UP, player[1]-32, new_screen)\n        elif(player[1]<32):\n          new_screen = common.shift(DOWN, 32 - player[1], new_screen)\n\n      # Store transition in the replay buffer.\n      replay_buffer.add(screen, action, rew, new_screen, float(done))\n      screen = new_screen\n\n      episode_rewards[-1] += rew\n\n      if done:\n        print(\"Episode Reward : %s\" % episode_rewards[-1])\n        obs = env.reset()\n        player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n        screen = player_relative\n\n        group_list = common.init(env, player_relative, obs)\n\n        # Select all marines first\n        #env.step(actions=[sc2_actions.FunctionCall(_SELECT_UNIT, [_SELECT_ALL])])\n        episode_rewards.append(0.0)\n\n        reset = True\n\n      if t > learning_starts and t % train_freq == 0:\n        # Minimize the error in Bellman's equation on a batch sampled from replay buffer.\n        if prioritized_replay:\n          experience = replay_buffer.sample(batch_size, beta=beta_schedule.value(t))\n          (obses_t, actions, rewards, obses_tp1, dones, weights, batch_idxes) = experience\n        else:\n          obses_t, actions, rewards, obses_tp1, dones = replay_buffer.sample(batch_size)\n          weights, batch_idxes = np.ones_like(rewards), None\n        td_errors = train(obses_t, actions, rewards, obses_tp1, dones, weights)\n        if prioritized_replay:\n          new_priorities = np.abs(td_errors) + prioritized_replay_eps\n          replay_buffer.update_priorities(batch_idxes, new_priorities)\n\n      if t > learning_starts and t % target_network_update_freq == 0:\n        # Update target network periodically.\n        update_target()\n\n      mean_100ep_reward = round(np.mean(episode_rewards[-101:-1]), 1)\n      num_episodes = len(episode_rewards)\n      if done and 
print_freq is not None and len(episode_rewards) % print_freq == 0:\n        logger.record_tabular(\"steps\", t)\n        logger.record_tabular(\"episodes\", num_episodes)\n        logger.record_tabular(\"mean 100 episode reward\", mean_100ep_reward)\n        logger.record_tabular(\"% time spent exploring\", int(100 * exploration.value(t)))\n        logger.dump_tabular()\n\n      if (checkpoint_freq is not None and t > learning_starts and\n              num_episodes > 100 and t % checkpoint_freq == 0):\n        if saved_mean_reward is None or mean_100ep_reward > saved_mean_reward:\n          if print_freq is not None:\n            logger.log(\"Saving model due to mean reward increase: {} -> {}\".format(\n              saved_mean_reward, mean_100ep_reward))\n          U.save_state(model_file)\n          model_saved = True\n          saved_mean_reward = mean_100ep_reward\n    if model_saved:\n      if print_freq is not None:\n        logger.log(\"Restored model with mean reward: {}\".format(saved_mean_reward))\n      U.load_state(model_file)\n\n  return ActWrapper(act)\n"
  },
  {
    "path": "defeat_zerglings/run_demo_agent.py",
    "content": "import sys\n\nimport gflags as flags\nfrom baselines import deepq\nfrom pysc2.env import sc2_env\nfrom pysc2.lib import actions\nfrom pysc2.env import run_loop\n\nfrom defeat_zerglings import demo_agent\nfrom maps import chris_maps\n\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_SELECT_ALL = [0]\n_NOT_QUEUED = [0]\n\nstep_mul = 1\nsteps = 20000\n\nFLAGS = flags.FLAGS\n\ndef main():\n  FLAGS(sys.argv)\n  with sc2_env.SC2Env(\n      \"DefeatZerglingsAndBanelings\",\n      step_mul=step_mul,\n      visualize=True,\n      game_steps_per_episode=steps * step_mul) as env:\n\n    demo_replay = []\n\n    agent = demo_agent.MarineAgent(env=env)\n    agent.env = env\n    run_loop.run_loop([agent], env, steps)\n\n\nif __name__ == '__main__':\n  main()\n"
  },
  {
    "path": "defeat_zerglings/train.py",
    "content": "import sys\n\nimport gflags as flags\nfrom baselines import deepq\nfrom pysc2.env import sc2_env\nfrom pysc2.lib import actions\n\nfrom defeat_zerglings import dqfd\n\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_SELECT_ALL = [0]\n_NOT_QUEUED = [0]\n\nstep_mul = 1\nsteps = 2000\n\nFLAGS = flags.FLAGS\n\ndef main():\n  FLAGS(sys.argv)\n  with sc2_env.SC2Env(\n      \"DefeatZerglingsAndBanelings\",\n      step_mul=step_mul,\n      visualize=True,\n      game_steps_per_episode=steps * step_mul) as env:\n\n    model = deepq.models.cnn_to_mlp(\n      convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],\n      hiddens=[256],\n      dueling=True\n    )\n    demo_replay = []\n    act = dqfd.learn(\n      env,\n      q_func=model,\n      num_actions=3,\n      lr=1e-4,\n      max_timesteps=10000000,\n      buffer_size=100000,\n      exploration_fraction=0.5,\n      exploration_final_eps=0.01,\n      train_freq=2,\n      learning_starts=100000,\n      target_network_update_freq=1000,\n      gamma=0.99,\n      prioritized_replay=True,\n      demo_replay=demo_replay\n    )\n    act.save(\"defeat_zerglings.pkl\")\n\n\nif __name__ == '__main__':\n  main()\n"
  },
  {
    "path": "enjoy_mineral_shards.py",
    "content": "import sys\n\nimport baselines.common.tf_util as U\nimport gflags as flags\nimport numpy as np\nfrom baselines import deepq\nfrom pysc2.env import environment\nfrom pysc2.env import sc2_env\nfrom pysc2.lib import actions\nfrom pysc2.lib import actions as sc2_actions\nfrom pysc2.lib import features\n\nimport deepq_mineral_shards\n\n_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index\n_PLAYER_FRIENDLY = 1\n_PLAYER_NEUTRAL = 3  # beacon/minerals\n_PLAYER_HOSTILE = 4\n_NO_OP = actions.FUNCTIONS.no_op.id\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_ATTACK_SCREEN = actions.FUNCTIONS.Attack_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_NOT_QUEUED = [0]\n_SELECT_ALL = [0]\n\nstep_mul = 16\nsteps = 400\n\nFLAGS = flags.FLAGS\n\ndef main():\n  FLAGS(sys.argv)\n  with sc2_env.SC2Env(\n      \"CollectMineralShards\",\n      step_mul=step_mul,\n      visualize=True,\n      game_steps_per_episode=steps * step_mul) as env:\n\n    model = deepq.models.cnn_to_mlp(\n      convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],\n      hiddens=[256],\n      dueling=True\n    )\n\n    def make_obs_ph(name):\n      return U.BatchInput((64, 64), name=name)\n\n    act_params = {\n      'make_obs_ph': make_obs_ph,\n      'q_func': model,\n      'num_actions': 4,\n    }\n\n    act = deepq_mineral_shards.load(\"mineral_shards.pkl\", act_params=act_params)\n\n    while True:\n\n      obs = env.reset()\n      episode_rew = 0\n\n      done = False\n\n      step_result = env.step(actions=[sc2_actions.FunctionCall(_SELECT_ARMY, [_SELECT_ALL])])\n\n      while not done:\n\n        player_relative = step_result[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n        obs = player_relative\n\n        player_y, player_x = (player_relative == _PLAYER_FRIENDLY).nonzero()\n        player = [int(player_x.mean()), int(player_y.mean())]\n\n        if(player[0]>32):\n          obs = shift(LEFT, player[0]-32, obs)\n        elif(player[0]<32):\n          obs = shift(RIGHT, 32 - player[0], obs)\n\n        if(player[1]>32):\n          obs = shift(UP, player[1]-32, obs)\n        elif(player[1]<32):\n          obs = shift(DOWN, 32 - player[1], obs)\n\n        action = act(obs[None])[0]\n        coord = [player[0], player[1]]\n\n        if(action == 0): #UP\n\n          if(player[1] >= 16):\n            coord = [player[0], player[1] - 16]\n          elif(player[1] > 0):\n            coord = [player[0], 0]\n\n        elif(action == 1): #DOWN\n\n          if(player[1] <= 47):\n            coord = [player[0], player[1] + 16]\n          elif(player[1] > 47):\n            coord = [player[0], 63]\n\n        elif(action == 2): #LEFT\n\n          if(player[0] >= 16):\n            coord = [player[0] - 16, player[1]]\n          elif(player[0] < 16):\n            coord = [0, player[1]]\n\n        elif(action == 3): #RIGHT\n\n          if(player[0] <= 47):\n            coord = [player[0] + 16, player[1]]\n          elif(player[0] > 47):\n            coord = [63, player[1]]\n\n        new_action = [sc2_actions.FunctionCall(_MOVE_SCREEN, [_NOT_QUEUED, coord])]\n\n        step_result = env.step(actions=new_action)\n\n        rew = step_result[0].reward\n        done = step_result[0].step_type == environment.StepType.LAST\n\n        episode_rew += rew\n      print(\"Episode reward\", episode_rew)\n\nUP, DOWN, LEFT, RIGHT = 'up', 'down', 'left', 'right'\n\ndef shift(direction, number, matrix):\n  ''' shift given 2D matrix in-place the given number of rows or columns\n      in the specified (UP, DOWN, LEFT, RIGHT) 
direction and return it\n  '''\n  if direction == UP:\n    matrix = np.roll(matrix, -number, axis=0)\n    matrix[-number:,:] = -2  # clear the rows that wrapped around to the bottom\n    return matrix\n  elif direction == DOWN:\n    matrix = np.roll(matrix, number, axis=0)\n    matrix[:number,:] = -2  # clear the rows that wrapped around to the top\n    return matrix\n  elif direction == LEFT:\n    matrix = np.roll(matrix, -number, axis=1)\n    matrix[:,-number:] = -2  # clear the columns that wrapped around to the right\n    return matrix\n  elif direction == RIGHT:\n    matrix = np.roll(matrix, number, axis=1)\n    matrix[:,:number] = -2  # clear the columns that wrapped around to the left\n    return matrix\n  else:\n    return matrix\n\nif __name__ == '__main__':\n  main()\n"
  },
  {
    "path": "maps/chris_maps.py",
    "content": "\"\"\"Define the mini game map configs. These are maps made by Deepmind.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nfrom pysc2.maps import lib\n\nclass ChrisMaps(lib.Map):\n  directory = \"chris_maps\"\n  download = \"https://github.com/chris-chris/pysc2-examples#get-the-maps\"\n  players = 1\n  score_index = 0\n  game_steps_per_episode = 0\n  step_mul = 8\n\nchris_maps = [\n  \"DefeatZealots\",  # 120s\n]\n\nfor name in chris_maps:\n  globals()[name] = type(name, (ChrisMaps,), dict(filename=name))\n"
  },
  {
    "path": "tests/scripted_test.py",
    "content": "#!/usr/bin/python\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nfrom pysc2.agents import random_agent\nfrom pysc2.env import run_loop\nfrom pysc2.env import sc2_env\nfrom pysc2.tests import utils\nfrom pysc2.lib import actions as sc2_actions\nfrom pysc2.lib import features\n\nfrom pysc2.lib import basetest\nimport gflags as flags\nimport sys\n\n_NO_OP = sc2_actions.FUNCTIONS.no_op.id\n_PLAYER_RELATIVE = features.SCREEN_FEATURES.player_relative.index\n\nFLAGS = flags.FLAGS\n\nclass TestScripted(utils.TestCase):\n  steps = 2000\n  step_mul = 1\n\n  def test_defeat_zerglings(self):\n    FLAGS(sys.argv)\n\n    with sc2_env.SC2Env(\n        \"DefeatZerglingsAndBanelings\",\n        step_mul=self.step_mul,\n        visualize=True,\n        game_steps_per_episode=self.steps * self.step_mul) as env:\n      obs = env.step(actions=[sc2_actions.FunctionCall(_NO_OP, [])])\n      player_relative = obs[0].observation[\"screen\"][_PLAYER_RELATIVE]\n\n      # Break Point!!\n      print(player_relative)\n\n      agent = random_agent.RandomAgent()\n      run_loop.run_loop([agent], env, self.steps)\n\n    self.assertEqual(agent.steps, self.steps)\n\nif __name__ == \"__main__\":\n  basetest.main()\n"
  },
  {
    "path": "train_mineral_shards.py",
    "content": "import sys\n\nimport gflags as flags\nfrom baselines import deepq\nfrom pysc2.env import sc2_env\nfrom pysc2.lib import actions\n\nimport deepq_mineral_shards\n\n_MOVE_SCREEN = actions.FUNCTIONS.Move_screen.id\n_SELECT_ARMY = actions.FUNCTIONS.select_army.id\n_SELECT_ALL = [0]\n_NOT_QUEUED = [0]\n\nstep_mul = 8\n\nFLAGS = flags.FLAGS\n\ndef main():\n  FLAGS(sys.argv)\n  with sc2_env.SC2Env(\n      \"CollectMineralShards\",\n      step_mul=step_mul,\n      visualize=True) as env:\n\n    model = deepq.models.cnn_to_mlp(\n      convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],\n      hiddens=[256],\n      dueling=True\n    )\n\n    act = deepq_mineral_shards.learn(\n      env,\n      q_func=model,\n      num_actions=4,\n      lr=1e-5,\n      max_timesteps=2000000,\n      buffer_size=100000,\n      exploration_fraction=0.5,\n      exploration_final_eps=0.01,\n      train_freq=4,\n      learning_starts=100000,\n      target_network_update_freq=1000,\n      gamma=0.99,\n      prioritized_replay=True\n    )\n    act.save(\"mineral_shards.pkl\")\n\n\nif __name__ == '__main__':\n  main()\n"
  }
]