[
  {
    "path": "README.md",
    "content": "# Deep Reinforcement Learning for Secrecy Energy-Efficient UAV Communication with Reconfigurable Intelligent Surfaces\n\n**IEEE Wireless Communications and Networking Conference 2023 (WCNC 2023)** </br>\nSimulation for Conference Proceedings https://doi.org/10.1109/WCNC55385.2023.10118891 </br>\nRefer to [this link](https://github.com/yjwong1999/Twin-TD3/blob/main/WCNC2023%20WS-09%20%231570879488.pdf) for the preprint\n\n## Abstract\nThis paper investigates the **physical layer security (PLS)** issue in **reconfigurable intelligent surface (RIS) aided millimeter-wave rotary-wing unmanned aerial vehicle (UAV) communications** under the presence of multiple eavesdroppers and imperfect channel state information (CSI). The goal is to maximize the **worst-case secrecy energy efficiency (SEE)** of UAV via a **joint optimization of flight trajectory, UAV active beamforming and RIS passive beamforming**. By interacting with the dynamically changing UAV environment, real-time decision making per time slot is possible via deep reinforcement learning (DRL). To decouple the continuous optimization variables, we introduce a **twin twin-delayed deep deterministic policy gradient (TTD3)** to maximize the expected cumulative reward, which is linked to SEE enhancement. Simulation results confirm that the proposed method achieves greater secrecy energy savings than the traditional twin-deep deterministic policy gradient DRL (TDDRL)-based method. \n\n## TLDR\n\n### System model: \n**RIS-aided mmWave UAV system** in the presence of **eavesdroppers** and **imperfect channel state information (CSI)** </br>\n\n### Solution: \nA **Twin-TD3 (TTD3) algorithm** to decouple the joint optimization of:\n1. UAV active beamforming and RIS passive beamforming \n2. 
UAV flight trajectory\n\nWe adopt a **double DRL framework**, where the 1st and 2nd agents provide the policies for tasks (1) and (2), respectively.\n\n## How to use this repo\n\nSet up the repo\n```shell\nconda create --name <env> python=3.10.4\nconda activate <env>\ngit clone https://github.com/yjwong1999/Twin-TD3.git\ncd Twin-TD3\npip install -r requirements.txt\n```\n\nUsers can train two types of algorithms:\n1. Twin DDPG is the [TDDRL algorithm](https://doi.org/10.1109/LWC.2021.3081464)\n2. Twin TD3 is our proposed TTD3 algorithm\n\nRun the following commands in `bash` or `powershell`.\n\n`main_train.py` is the main Python file for training the DRL algorithms\n```shell\n# To use Twin DDPG with SSR as the optimization goal\npython3 main_train.py --drl ddpg --reward ssr\n\n# To use Twin TD3 with SSR as the optimization goal\npython3 main_train.py --drl td3 --reward ssr\n\n# To use Twin DDPG with SEE as the optimization goal\npython3 main_train.py --drl ddpg --reward see\n\n# To use Twin TD3 with SEE as the optimization goal\npython3 main_train.py --drl td3 --reward see\n\n# To use the pretrained DRL for the UAV trajectory (recommended for stable convergence)\npython3 main_train.py --drl td3 --reward see --trained-uav\n\n# To set the number of episodes (default is 300)\npython3 main_train.py --drl td3 --reward see --ep-num 300\n\n# To set seeds for DRL weight initialization (not recommended)\npython3 main_train.py --drl td3 --reward see --seeds 0       # weights of both DRLs are initialized with seed 0\npython3 main_train.py --drl td3 --reward see --seeds 0 1     # weights of DRL 1 and DRL 2 are initialized with seeds 0 and 1, respectively\n```\n\n`run_simulation.py` is the Python file that runs the simulation using your trained models\n```shell\n# run the simulation using your trained models\npython3 run_simulation.py --path data/storage/scratch/<DIR>       # if you train the algorithm without the pretrained uav\npython3 run_simulation.py --path data/storage/trained_uav/<DIR>   # if you train the algorithm with the 
pretrained uav\n```\n\n`load_and_plot.py` is the Python file that plots the (i) rewards, (ii) sum secrecy rate (SSR), (iii) secrecy energy efficiency (SEE), (iv) UAV trajectory and (v) RIS configurations for each episode of one experiment. The plotted figures are saved at `data/storage/scratch/<DIR>/plot` or `data/storage/trained_uav/<DIR>/plot`\n```shell\n# plot everything for each episode\npython3 load_and_plot.py --path data/storage/scratch/<DIR> --ep-num 300       # if you train the algorithm without the pretrained uav\npython3 load_and_plot.py --path data/storage/trained_uav/<DIR> --ep-num 300   # if you train the algorithm with the pretrained uav\n```\n\nYou can also use the bash scripts `batch_train.sh` and `batch_eval.sh` to train and evaluate the algorithms in batch; they call `main_train.py` and `load_and_plot.py`, respectively\n```shell\n# To train on batch\nbash batch_train.sh\n\n# To evaluate on batch\nbash batch_eval.sh\n```\n\n## Results\n\nWe ran `main_train.py` 5 times for each setting below and averaged the performance\n\nSSR and SEE: the higher the better </br>\nTotal energy consumption: the lower the better\n\n| Algorithms                     | SSR (bits/s/Hz)| Energy (kJ) | SEE (bits/s/Hz/kJ)|\n|--------------------------------|----------------|-------------|-------------------|\n| TDDRL                          | 5.03           | 12.4        | 40.8              |\n| TTD3                           | 6.05           | 12.7        | 48.2              |\n| TDDRL (with energy constraint) | 4.68           | 11.2        | 39.4              |\n| TTD3  (with energy constraint) | 5.39           | 11.2        | 48.4              |\n\nSummary\n1. In terms of SSR, TTD3 outperforms TDDRL with or without the energy constraint\n2. TTD3 (with energy constraint) achieves the highest SEE and, together with TDDRL (with energy constraint), the lowest energy consumption\n3. Overall, the TTD3 algorithm performs better than TDDRL\n4. 
Even with the energy constraint (a trade-off between energy consumption and SSR), TTD3 outperforms TDDRL in both SSR and SEE at the same energy consumption\n\n\\* Remarks: </br>\nNote that the performance of DRL (especially twin DRL) varies considerably between runs; you may sometimes get extremely good (or bad) performance </br>\nThe benchmark results above are averaged over several experiments to give a more holistic understanding of the algorithms </br>\nFor better convergence, we advise using the benchmark UAV models we trained. </br>\nThis approach is consistent with the code provided by [TDDRL](https://github.com/Brook1711/WCL-pulish-code)\n\n## References and Acknowledgement\n\nThis work was supported by the **British Council** under the **UK-ASEAN Institutional Links Early Career Researchers Scheme** with project number 913030644.\n\nBoth the **RIS Simulation** and the **System Model** for this research project are based on the research work provided by [Brook1711](https://github.com/Brook1711). </br>\nWe intended to fork the original repo for the system model (as stated below) as the base of this project. </br>\nHowever, GitHub does not allow a forked repo to be private. 
</br>\nHence, we could not base our code on a fork of the original repo while keeping it private until the project was completed.\nWe would like to express our utmost gratitude to [Brook1711](https://github.com/Brook1711) and his co-authors for their research work.\n\n### RIS Simulation\nRIS Simulation is based on the following research work: </br>\n[SimRIS Channel Simulator for Reconfigurable Intelligent Surface-Empowered Communication Systems](https://ieeexplore.ieee.org/document/9282349) </br>\nThe original simulation code is written in MATLAB; this [GitHub repo](https://github.com/Brook1711/RIS_components) provides a Python version of the simulation.\n\n### System Model: RIS-aided mmWave UAV communications\nThe simulation of the System Model is provided by the following research work: </br>\n[Learning-Based Robust and Secure Transmission for Reconfigurable Intelligent Surface Aided Millimeter Wave UAV Communications](https://doi.org/10.1109/LWC.2021.3081464) </br>\nThe code is provided in this [GitHub repo](https://github.com/Brook1711/WCL-pulish-code).\n\n### Rotary-Wing UAV\nWe can derive the Rotary-Wing UAV’s propulsion energy consumption based on the following research work: </br>\n[Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV](https://doi.org/10.1109/LWC.2019.2916549)\n\n### TD3\nMain reference for the TD3 implementation: </br>\n[PyTorch/TensorFlow 2.0 for TD3](https://github.com/philtabor/Actor-Critic-Methods-Paper-To-Code/tree/master/TD3)\n\n\n## TODO\n- [x] Add argparse arguments to set drl algo and reward type\n- [x] Add argparse arguments to set episode number\n- [x] Add argparse arguments to set seeds for the two DRLs\n- [x] Add argparse arguments to load pretrained DRL for UAV trajectory\n- [x] Add benchmark/pretrained model\n- [x] Project naming (use `<DRL>_<Reward>_<Num>` instead of using datetime format)\n- [x] Remove saving \"best model\"; there is no best model, only the latest model\n- [ ] The following 
scripts can be used, but you have to manually change the file paths in the code\n\n`plot_ssr.py` is the Python file that plots the final episode's SSR for the 4 benchmarks in the paper\n```shell\n# plot ssr\npython3 plot_ssr.py\n```\n\n`plot_see.py` is the Python file that plots the final episode's SEE for the 4 benchmarks in the paper\n```shell\n# plot see\npython3 plot_see.py\n```\n\n`plot_traj.py` is the Python file that plots the final episode's UAV trajectory for the 4 benchmarks in the paper\n```shell\n# plot UAV trajectory\npython3 plot_traj.py\n```\n\n## Cite this repository\n```\n@INPROCEEDINGS{10118891,\n  author={Tham, Mau-Luen and Wong, Yi Jie and Iqbal, Amjad and Ramli, Nordin Bin and Zhu, Yongxu and Dagiuklas, Tasos},\n  booktitle={2023 IEEE Wireless Communications and Networking Conference (WCNC)}, \n  title={Deep Reinforcement Learning for Secrecy Energy- Efficient UAV Communication with Reconfigurable Intelligent Surface}, \n  year={2023},\n  doi={10.1109/WCNC55385.2023.10118891}}\n```\n\n<details>\n<summary>Star History</summary>\n  \n[![Star History Chart](https://api.star-history.com/svg?repos=yjwong1999/Twin-TD3&type=Date)](https://star-history.com/#yjwong1999/Twin-TD3&Date)\n\n</details>\n"
  },
  {
    "path": "batch_eval.sh",
    "content": "#!/bin/bash\n\necho \n\necho ddpg_ssr\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr --ep-num 300\necho ddpg_ssr_2\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_2 --ep-num 300\necho ddpg_ssr_3\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_3 --ep-num 300\necho ddpg_ssr_4\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_4 --ep-num 300\necho ddpg_ssr_5\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_5 --ep-num 300\n\n\necho td3_ssr\npython3 load_and_plot.py --path data/storage/trained_uav/td3_ssr --ep-num 300\necho td3_ssr_2\npython3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_2 --ep-num 300\necho td3_ssr_3\npython3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_3 --ep-num 300\necho td3_ssr_4\npython3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_4 --ep-num 300\necho td3_ssr_5\npython3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_5 --ep-num 300\n\n\necho ddpg_see\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_see --ep-num 300\necho ddpg_see_2\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_2 --ep-num 300\necho ddpg_see_3\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_3 --ep-num 300\necho ddpg_see_4\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_4 --ep-num 300\necho ddpg_see_5\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_5 --ep-num 300\n\n\necho td3_see\npython3 load_and_plot.py --path data/storage/trained_uav/td3_see --ep-num 300\necho td3_see_2\npython3 load_and_plot.py --path data/storage/trained_uav/td3_see_2 --ep-num 300\necho td3_see_3\npython3 load_and_plot.py --path data/storage/trained_uav/td3_see_3 --ep-num 300\necho td3_see_4\npython3 load_and_plot.py --path data/storage/trained_uav/td3_see_4 --ep-num 300\necho td3_see_5\npython3 load_and_plot.py --path data/storage/trained_uav/td3_see_5 --ep-num 
300\n\n\n"
  },
  {
    "path": "batch_train.sh",
    "content": "#!/bin/bash\n\necho \n\necho ddpg_ssr\npython3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav\necho ddpg_ssr_2\npython3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav\necho ddpg_ssr_3\npython3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav\necho ddpg_ssr_4\npython3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav\necho ddpg_ssr_5\npython3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav\n\necho td3_ssr\npython3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav\necho td3_ssr_2\npython3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav\necho td3_ssr_3\npython3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav\necho td3_ssr_4\npython3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav\necho td3_ssr_5\npython3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav\n\necho ddpg_see\npython3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav\necho ddpg_see_2\npython3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav\necho ddpg_see_3\npython3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav\necho ddpg_see_4\npython3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav\necho ddpg_see_5\npython3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav\n\necho td3_see\npython3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav\necho td3_see_2\npython3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav\necho td3_see_3\npython3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav\necho td3_see_4\npython3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav\necho td3_see_5\npython3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav\n"
  },
  {
    "path": "channel.py",
    "content": "import numpy as np\nimport math\nimport cmath\nfrom math_tool import *\n\nclass mmWave_channel(object):\n    \"\"\"\n    generate an mmWave channel under the UMi (urban micro) open scenario\n    input: a transmitter/receiver entity pair and the carrier frequency\n    output: instantaneous CSI (path loss and channel matrix)\n    \"\"\"\n    def __init__(self, transmitter, receiver, frequncy):\n        \"\"\"\n        transmitter: object in entity.py\n        receiver: object in entity.py\n        frequncy: carrier frequency in Hz (wavelength = 3e8 / frequncy)\n        \"\"\"\n        self.channel_name = ''\n        self.n = 0\n        self.sigma = 0\n        self.transmitter = transmitter \n        self.receiver = receiver\n        self.channel_type = self.init_type()    # 'UAV_RIS', 'UAV_user', 'UAV_attacker', 'RIS_user', 'RIS_attacker'\n        \n        # self.distance = np.linalg.norm(transmitter.coordinate - receiver.coordinate)\n        self.frequncy = frequncy\n        # init & update path loss\n        self.path_loss_normal = self.get_channel_path_loss()\n        self.path_loss_dB = normal_to_dB(self.path_loss_normal)\n        # init & update channel CSI matrix\n        self.channel_matrix = self.get_estimated_channel_matrix()\n        \n    def init_type(self):\n        # set the path loss exponent n and the shadow-fading std sigma (dB) for this link type\n        channel_type = self.transmitter.type+'_'+self.receiver.type\n        if channel_type == 'UAV_RIS' or channel_type == 'RIS_UAV':\n            self.n = 2.2\n            self.sigma = 3\n            self.channel_name = 'H_UR'\n        elif channel_type == 'UAV_user' or channel_type == 'UAV_attacker':\n            self.n = 3.5\n            self.sigma = 3\n            if channel_type =='UAV_user':\n                self.channel_name = 'h_U_k,' + str(self.transmitter.index)\n            elif channel_type == 'UAV_attacker':\n                self.channel_name = 'h_U_p,' + str(self.transmitter.index)\n        elif channel_type == 'user_UAV' or channel_type == 'attacker_UAV':\n            self.n = 3.5\n            self.sigma = 3\n            if channel_type =='user_UAV':\n                self.channel_name = 'h_U_k,' + 
str(self.transmitter.index)\n            elif channel_type == 'attacker_UAV':\n                self.channel_name = 'h_U_p,' + str(self.transmitter.index)\n                \n        elif channel_type == 'RIS_user' or channel_type == 'RIS_attacker':\n            self.n = 2.8\n            self.sigma = 3\n            if channel_type =='RIS_user':\n                self.channel_name = 'h_R_k,' + str(self.transmitter.index)\n            elif channel_type == 'RIS_attacker':\n                self.channel_name = 'h_R_p,' + str(self.transmitter.index)        \n        elif channel_type == 'user_RIS' or channel_type == 'attacker_RIS':\n            self.n = 2.8\n            self.sigma = 3\n            if channel_type =='user_RIS':\n                self.channel_name = 'h_R_k,' + str(self.transmitter.index)\n            elif channel_type == 'attacker_RIS':\n                self.channel_name = 'h_R_p,' + str(self.transmitter.index)  \n        return channel_type\n\n    def get_channel_path_loss(self):\n        \"\"\"\n        calculate the path loss in normal (linear) form:\n        PL_dB = -20*log10(4*pi/wavelength) - 10*n*log10(distance)\n        note: a shadow-fading sample is drawn but currently not applied\n        \"\"\"\n        distance = np.linalg.norm(self.transmitter.coordinate - self.receiver.coordinate)\n        PL = -20 * math.log10(4*math.pi/(3e8/self.frequncy)) - 10*self.n*math.log10(distance)\n        shadow_loss = np.random.normal() * self.sigma\n        # return dB_to_normal(PL - shadow_loss)\n        return dB_to_normal(PL)\n\n    def get_estimated_channel_matrix(self):\n        \"\"\"\n        init & update the LoS channel matrix:\n        H = exp(j*LOS_fai) * sqrt(PL) * a_r * a_t^H\n        \"\"\"\n        # init matrix\n        N_t = self.transmitter.ant_num\n        N_r = self.receiver.ant_num\n        channel_matrix = np.mat(np.ones(shape=(N_r,N_t),dtype=complex), dtype=complex)\n\n        # get the receiver coordinate relative to the transmitter's coordinate system\n        r_under_t_car_coor = get_coor_ref(\\\n        self.transmitter.coor_sys, \\\n        self.receiver.coordinate - self.transmitter.coordinate)\n        # get relevant 
spherical_coordinate \n        r_t_r, r_t_theta, r_t_fai = cartesian_coordinate_to_spherical_coordinate(\\\n        cartesian_coordinate=r_under_t_car_coor\\\n        )\n\n        # get the transmitter coordinate relative to the receiver's coordinate system\n        t_under_r_car_coor = get_coor_ref(\\\n        #   remember to meet the channel direction restrictions\n        [-self.receiver.coor_sys[0], self.receiver.coor_sys[1], -self.receiver.coor_sys[2]],\\\n        self.transmitter.coordinate - self.receiver.coordinate)\n        # get relevant spherical_coordinate \n        t_r_r, t_r_theta, t_r_fai = cartesian_coordinate_to_spherical_coordinate(\\\n        cartesian_coordinate=t_under_r_car_coor\\\n        )\n\n        # calculate array response\n        t_array_response = self.generate_array_response(self.transmitter, r_t_theta, r_t_fai)\n        r_array_response = self.generate_array_response(self.receiver, t_r_theta, t_r_fai)\n        array_response_product = r_array_response * t_array_response.H\n        # get H_LOS\n        #   get LOS path loss \n        PL = self.path_loss_normal\n        \n        #   get LOS phase shift\n        LOS_fai = 2 * math.pi * self.frequncy * np.linalg.norm(self.transmitter.coordinate - self.receiver.coordinate) / 3e8\n        channel_matrix = cmath.exp(1j*LOS_fai)* math.pow(PL, 0.5) * array_response_product\n        \n        return channel_matrix\n\n    def generate_array_response(self, transceiver, theta, fai):\n        \"\"\"\n        generate the array response (steering vector) of the transceiver:\n        'UPA' -> uniform planar array, 'ULA' -> uniform linear array,\n        'single' -> single antenna; returns False for any other ant_type\n        \"\"\"\n        ant_type = transceiver.ant_type\n        ant_num  = transceiver.ant_num\n\n        if ant_type == 'UPA':\n            row_num = int(math.sqrt(ant_num))\n            Planar_response = np.mat(np.ones(shape=(ant_num, 1)), dtype=complex)\n            for i in range(row_num):\n                for j in 
range(row_num):\n                    # element (i, j) phase: pi*(i*sin(theta)*cos(fai) + j*sin(theta)*sin(fai))\n                    Planar_response[j+i*row_num,0] = cmath.exp(1j *\\\n                    (math.sin(theta) * math.cos(fai)*i*math.pi + math.sin(theta)*math.sin(fai)*j*math.pi)\\\n                    )\n                \n            return Planar_response\n        elif ant_type == 'ULA':\n            Linear_response = np.mat(np.ones(shape=(ant_num,1)), dtype=complex)\n            for i in range(ant_num):\n                Linear_response[i, 0] = cmath.exp(1j * math.sin(theta) * math.cos(fai)*i*math.pi)\n            return Linear_response\n        elif ant_type == 'single':\n            return np.mat(np.array([1]))\n        else:\n            return False\n        \n    def update_CSI(self):\n        \"\"\"\n        update pathloss and channel matrix\n        \"\"\"\n        # init & update path loss\n        self.path_loss_normal = self.get_channel_path_loss()\n        self.path_loss_dB = normal_to_dB(self.path_loss_normal)\n        # init & update channel CSI matrix\n        self.channel_matrix = self.get_estimated_channel_matrix()\n"
  },
  {
    "path": "data/.vscode/settings.json",
    "content": "{\n    \"python.pythonPath\": \"C:\\\\Users\\\\67039\\\\anaconda3\\\\python.exe\"\n}"
  },
  {
    "path": "data/data_test.py",
    "content": "import pandas as pd \nuser_sheet = pd.read_excel('init_location.xlsx', sheet_name='user')\nuser_data = user_sheet\n\nprint(user_data)"
  },
  {
    "path": "data/readme.md",
    "content": "#### init_location.xlsx\n- the config file for the initial position of each entity (such as the UAV, users, attackers and RIS)\n\n#### data_test.py (should change name)\n- reads and prints the initial positions (currently only the `user` sheet of `init_location.xlsx`)\n"
  },
  {
    "path": "data/storage/readme.md",
    "content": "# This directory stores all experiments\n\n- Each experiment is saved here as its own project directory\n- The benchmark directory stores the pretrained DRL models for the UAV trajectory (agent 2 in the code), which are used when the --trained-uav flag is set\n"
  },
  {
    "path": "data_manager.py",
    "content": "import numpy as np\nimport scipy.io\nimport pandas as pd\nimport os\nimport time, csv\n\nclass DataManager(object):\n    \"\"\"\n    class to read and store simulation results\n    before use, create the directory './data' under the current path;\n    it must contain the file 'init_location.xlsx', which holds the initial position of each entity\n    \"\"\"\n    def __init__(self, store_list = ['beamforming_matrix', 'reflecting_coefficient', 'UAV_state', 'user_capacity'],file_path = './data', store_path = './data/storage', project_name = None):\n        # 1 init location data\n        self.store_list = store_list\n        self.init_data_file = file_path + '/init_location.xlsx'\n        if project_name is None:\n            self.time_stemp = time.strftime('/%Y-%m-%d %H_%M_%S',time.localtime(time.time()))\n            self.store_path = store_path + self.time_stemp \n        else:\n            # use the first free name among project_name, project_name_2, ..., project_name_99\n            for i in range(1, 100, 1):\n                if i == 1:\n                    dir_name = store_path + '/' + project_name\n                else:\n                    dir_name = store_path + '/' + project_name + f'_{i}'\n                if not os.path.isdir(dir_name):\n                    self.store_path = dir_name\n                    break\n            else:\n                raise FileExistsError(f'no free directory name left for {project_name} under {store_path}')\n        os.makedirs(self.store_path) \n        # self.writer = pd.ExcelWriter(self.store_path + '/simulation_result.xlsx', engine='openpyxl')  # pylint: disable=abstract-class-instantiated \n        self.simulation_result_dic = {}\n        self.init_format()\n\n    def save_file(self, episode_cnt = 10):\n        # record step counts per episode\n        with open(self.store_path + \"/step_num_per_episode.csv\", \"a\", newline='') as f:\n            writer = csv.writer(f)\n            writer.writerow([len(list(self.simulation_result_dic.values())[0])])\n\n        # when ended, auto save to .mat file\n        scipy.io.savemat(self.store_path + '/simulation_result_ep_' + str(episode_cnt) + '.mat', {'result_' + 
str(episode_cnt):self.simulation_result_dic})\n        self.simulation_result_dic = {}\n        self.init_format()\n\n    def save_meta_data(self, meta_dic):\n        \"\"\"\n        save system and agent information\n        \"\"\"\n        scipy.io.savemat(self.store_path + '/meta_data.mat', {'meta_data': meta_dic})\n        \n    def init_format(self):\n        \"\"\"\n        reset the result dictionary to one empty list per item in store_list\n        \"\"\"\n        for store_item in self.store_list:\n            self.simulation_result_dic.update({store_item:[]})\n\n    def read_init_location(self, entity_type = 'user', index = 0):\n        if entity_type in ('user', 'attacker', 'RIS', 'RIS_norm_vec', 'UAV'):\n            # read the entity's sheet once and return its (x, y, z) position\n            sheet = pd.read_excel(self.init_data_file, sheet_name=entity_type)\n            return np.array([sheet['x'][index], sheet['y'][index], sheet['z'][index]])\n        else:\n            return None\n    \n    def store_data(self, row_data, value_name):\n        \"\"\"\n        append row_data to the list stored under value_name\n        \"\"\"\n        self.simulation_result_dic[value_name].append(row_data)\n"
  },
  {
    "path": "ddpg.py",
    "content": "import os\nimport torch as T\n#import torch.cuda as T\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\nfrom data_manager import DataManager\nclass OUActionNoise(object):\n    def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None):\n        self.theta = theta\n        self.mu = mu\n        self.sigma = sigma\n        self.dt = dt\n        self.x0 = x0\n        self.reset()\n\n    def __call__(self):\n        x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + \\\n            self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)\n        self.x_prev = x\n        return x\n\n    def reset(self):\n        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)\n\n    def __repr__(self):\n        return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format(\n                                                            self.mu, self.sigma)\n\nclass AWGNActionNoise(object):\n    \"\"\"zero-mean Gaussian exploration noise; mu only determines the output shape\"\"\"\n    def __init__(self, mu = 0, sigma=1):\n        self.mu = mu\n        self.sigma = sigma\n\n    def __call__(self):\n        # np.shape also works for a scalar mu (the default), which has no .shape attribute\n        x = np.random.normal(size=np.shape(self.mu)) * self.sigma\n        return x\n\nclass ReplayBuffer(object):\n    def __init__(self, max_size, input_shape, n_actions):\n        self.mem_size = max_size\n        self.mem_cntr = 0\n        self.state_memory = np.zeros((self.mem_size, *input_shape))\n        self.new_state_memory = np.zeros((self.mem_size, *input_shape))\n        self.action_memory = np.zeros((self.mem_size, n_actions))\n        self.reward_memory = np.zeros(self.mem_size)\n        self.terminal_memory = np.zeros(self.mem_size, dtype=np.float32)\n\n    def store_transition(self, state, action, reward, state_, done):\n        index = self.mem_cntr % self.mem_size\n        self.state_memory[index] = state\n        self.new_state_memory[index] = state_\n        self.action_memory[index] = 
action\n        self.reward_memory[index] = reward\n        self.terminal_memory[index] = 1 - done\n        self.mem_cntr += 1\n\n    def sample_buffer(self, batch_size):\n        max_mem = min(self.mem_cntr, self.mem_size)\n\n        batch = np.random.choice(max_mem, batch_size)\n\n        states = self.state_memory[batch]\n        actions = self.action_memory[batch]\n        rewards = self.reward_memory[batch]\n        states_ = self.new_state_memory[batch]\n        terminal = self.terminal_memory[batch]\n\n        return states, actions, rewards, states_, terminal\n\nclass CriticNetwork(nn.Module):\n    def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,\n                 chkpt_dir='C:\\\\demo\\\\IRS_DDPG_minimal\\\\main_foder\\\\tmp\\\\ddpg', load_file = ''):\n        super(CriticNetwork, self).__init__()\n        self.input_dims = input_dims\n        self.fc1_dims = fc1_dims\n        self.fc2_dims = fc2_dims\n        self.fc3_dims = fc3_dims\n        self.fc4_dims = fc4_dims\n        self.n_actions = n_actions\n        self.checkpoint_file = os.path.join(chkpt_dir,name+'_ddpg')\n        self.load_file = 'C:\\\\demo\\\\other_branch\\\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\\\data\\\\mannal_store\\\\models\\\\Critic_UAV_ddpg'\n        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)\n        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])\n        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)\n        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)\n        #self.fc1.weight.data.uniform_(-f1, f1)\n        #self.fc1.bias.data.uniform_(-f1, f1)\n        self.bn1 = nn.LayerNorm(self.fc1_dims)\n\n        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)\n        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])\n        #f2 = 0.002\n        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)\n        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)\n        
#self.fc2.weight.data.uniform_(-f2, f2)\n        #self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn2 = nn.LayerNorm(self.fc2_dims)\n\n        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)\n        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])\n        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)\n        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)\n        #self.fc2.weight.data.uniform_(-f2, f2)\n        #self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn3 = nn.LayerNorm(self.fc3_dims)\n\n        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)\n        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])\n        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)\n        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)\n        #self.fc2.weight.data.uniform_(-f2, f2)\n        #self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn4 = nn.LayerNorm(self.fc4_dims)\n\n        self.action_value = nn.Linear(self.n_actions, self.fc4_dims)\n        f5 = 0.003\n        self.q = nn.Linear(self.fc4_dims, 1)\n        T.nn.init.uniform_(self.q.weight.data, -f5, f5)\n        T.nn.init.uniform_(self.q.bias.data, -f5, f5)\n        #self.q.weight.data.uniform_(-f3, f3)\n        #self.q.bias.data.uniform_(-f3, f3)\n\n        self.optimizer = optim.Adam(self.parameters(), lr=beta)\n#        if torch.cuda.available():\n#            import torch.cuda as T\n#        else:\n#            import torch as T\n        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')\n\n        self.to(self.device)\n\n    def forward(self, state, action):\n        state_value = self.fc1(state)\n        state_value = self.bn1(state_value)\n        state_value = F.relu(state_value)\n        state_value = self.fc2(state_value)\n        state_value = self.bn2(state_value)\n        state_value = F.relu(state_value)\n        state_value = self.fc3(state_value)\n        state_value = self.bn3(state_value)\n        state_value = F.relu(state_value)\n        state_value = 
self.fc4(state_value)\n        state_value = self.bn4(state_value)\n\n        action_value = F.relu(self.action_value(action))\n        state_action_value = F.relu(T.add(state_value, action_value))\n        state_action_value = self.q(state_action_value)\n\n        return state_action_value\n\n    def save_checkpoint(self):\n        print('... saving checkpoint ...')\n        T.save(self.state_dict(), self.checkpoint_file)\n\n    def load_checkpoint(self,load_file = ''):\n        print('... loading checkpoint ...')\n        if T.cuda.is_available():\n            self.load_state_dict(T.load(load_file))\n        else:\n            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))\n\nclass ActorNetwork(nn.Module):\n    def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,\n                 chkpt_dir='C:\\\\demo\\\\IRS_DDPG_minimal\\\\main_foder\\\\tmp\\\\ddpg', load_file = ''):\n        super(ActorNetwork, self).__init__()\n        self.input_dims = input_dims\n        self.fc1_dims = fc1_dims\n        self.fc2_dims = fc2_dims\n        self.fc3_dims = fc3_dims\n        self.fc4_dims = fc4_dims        \n        self.n_actions = n_actions\n        self.checkpoint_file = os.path.join(chkpt_dir,name+'_ddpg')\n        self.load_file = 'C:\\\\demo\\\\other_branch\\\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\\\data\\\\mannal_store\\\\models\\\\Actor_UAV_ddpg'\n        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)\n        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])\n#        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)\n#        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)\n        self.fc1.weight.data.uniform_(-f1, f1)\n        self.fc1.bias.data.uniform_(-f1, f1)\n        self.bn1 = nn.LayerNorm(self.fc1_dims)\n\n        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)\n        #f2 = 0.002\n        f2 = 
1./np.sqrt(self.fc2.weight.data.size()[0])\n        self.fc2.weight.data.uniform_(-f2, f2)\n        self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn2 = nn.LayerNorm(self.fc2_dims)\n\n        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)\n        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])\n        self.fc3.weight.data.uniform_(-f3, f3)\n        self.fc3.bias.data.uniform_(-f3, f3)\n        self.bn3 = nn.LayerNorm(self.fc3_dims)\n\n        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)\n        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])\n        self.fc4.weight.data.uniform_(-f4, f4)\n        self.fc4.bias.data.uniform_(-f4, f4)\n        self.bn4 = nn.LayerNorm(self.fc4_dims)\n\n        # small uniform init for the output layer keeps the initial tanh actions near 0\n        f5 = 0.003\n        self.mu = nn.Linear(self.fc4_dims, self.n_actions)\n        self.mu.weight.data.uniform_(-f5, f5)\n        self.mu.bias.data.uniform_(-f5, f5)\n\n        self.optimizer = optim.Adam(self.parameters(), lr=alpha)\n        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')\n\n        self.to(self.device)\n\n    def forward(self, state):\n        x = self.fc1(state)\n        x = self.bn1(x)\n        x = F.relu(x)\n        x = self.fc2(x)\n        x = self.bn2(x)\n        x = F.relu(x)\n        x = self.fc3(x)\n        x = self.bn3(x)\n        x = F.relu(x)\n        x = self.fc4(x)\n        x = self.bn4(x)\n        x = F.relu(x)\n        # tanh bounds each action component to (-1, 1)\n        x = T.tanh(self.mu(x))\n\n        return x\n\n    def save_checkpoint(self):\n        print('... 
saving checkpoint ...')\n        T.save(self.state_dict(), self.checkpoint_file)\n\n    def load_checkpoint(self, load_file=''):\n        print('... loading checkpoint ...')\n        if T.cuda.is_available():\n            self.load_state_dict(T.load(load_file))\n        else:\n            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))\n\nclass Agent(object):\n    def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99,\n                 n_actions=2, max_size=1000000, layer1_size=400,\n                 layer2_size=300, layer3_size=256, layer4_size=128, batch_size=64, noise = 'AWGN', agent_name = 'default', load_file = ''):\n        self.load_file = load_file\n        self.layer1_size = layer1_size\n        self.layer2_size = layer2_size\n        self.layer3_size = layer3_size\n        self.layer4_size = layer4_size\n        self.gamma = gamma\n        self.tau = tau\n        self.memory = ReplayBuffer(max_size, input_dims, n_actions)\n        self.batch_size = batch_size\n\n        self.actor = ActorNetwork(alpha, input_dims, layer1_size,\n                                  layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                  name='Actor_' + agent_name,chkpt_dir=env.data_manager.store_path )\n        self.critic = CriticNetwork(beta, input_dims, layer1_size,\n                                    layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                    name='Critic_' + agent_name,chkpt_dir=env.data_manager.store_path)\n\n        self.target_actor = ActorNetwork(alpha, input_dims, layer1_size,\n                                         layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                         name='TargetActor_' + agent_name,chkpt_dir=env.data_manager.store_path)\n        self.target_critic = CriticNetwork(beta, input_dims, layer1_size,\n                                           layer2_size, layer3_size, 
layer4_size, n_actions=n_actions,\n                                           name='TargetCritic_' + agent_name,chkpt_dir=env.data_manager.store_path)\n        if noise == 'OU':\n            self.noise = OUActionNoise(mu=np.zeros(n_actions))\n        elif noise == 'AWGN':\n            self.noise = AWGNActionNoise(mu = np.zeros(n_actions))\n        # tau = 1 means copy parameters to target\n        self.update_network_parameters(tau=1)\n\n    def choose_action(self, observation, greedy=0.5, epsilon = 1):\n        # deterministic policy output plus exploration noise scaled by greedy (the result is not clipped)\n        self.actor.eval()\n        observation = T.tensor(observation, dtype=T.float).to(self.actor.device)\n        mu = self.actor.forward(observation).to(self.actor.device)\n        mu_prime = mu + T.tensor(greedy * self.noise(),\n                                 dtype=T.float).to(self.actor.device)\n        self.actor.train()\n        return mu_prime.cpu().detach().numpy()\n\n\n    def remember(self, state, action, reward, new_state, done):\n        self.memory.store_transition(state, action, reward, new_state, done)\n\n    def learn(self):\n        if self.memory.mem_cntr < self.batch_size:\n            return\n        # the sampled done is the opposite of the env's done flag,\n        # so multiplying by it drops the bootstrap term at terminal states\n        state, action, reward, new_state, done = \\\n                                      self.memory.sample_buffer(self.batch_size)\n\n        # turn s, a, r, s' and done into tensors\n        reward = T.tensor(reward, dtype=T.float).to(self.critic.device)\n        done = T.tensor(done).to(self.critic.device)\n        new_state = T.tensor(new_state, dtype=T.float).to(self.critic.device)\n        action = T.tensor(action, dtype=T.float).to(self.critic.device)\n        state = T.tensor(state, dtype=T.float).to(self.critic.device)\n\n        # put the target actor, target critic and critic into evaluation mode\n        # while the TD target is computed\n        self.target_actor.eval()\n        self.target_critic.eval()\n        self.critic.eval()
\n\n        # target policy action a' = mu'(s'), used in place of argmax_a Q'(s', a)\n        target_actions = self.target_actor.forward(new_state)\n        # calculate Q'(s', a')\n        critic_value_ = self.target_critic.forward(new_state, target_actions)\n        # calculate Q(s, a)\n        critic_value = self.critic.forward(state, action)\n\n        # TD target r + gamma * Q'(s', a') * done; T.tensor builds a new tensor,\n        # so no gradient flows back into the target networks\n        target = []\n        for j in range(self.batch_size):\n            target.append(reward[j] + self.gamma*critic_value_[j]*done[j])\n        target = T.tensor(target).to(self.critic.device)\n        target = target.view(self.batch_size, 1)\n\n        # update the critic by minimizing the MSE between the TD target and Q(s, a)\n        self.critic.train()\n        self.critic.optimizer.zero_grad()\n        critic_loss = F.mse_loss(target, critic_value)\n        critic_loss.backward()\n        self.critic.optimizer.step()\n\n        # update the actor with the deterministic policy gradient:\n        # maximize Q(s, mu(s)); only the actor optimizer takes a step\n        self.critic.eval()\n        self.actor.optimizer.zero_grad()\n        mu = self.actor.forward(state)\n        self.actor.train()\n        actor_loss = -self.critic.forward(state, mu)\n        actor_loss = T.mean(actor_loss)\n        actor_loss.backward()\n        self.actor.optimizer.step()\n\n        self.update_network_parameters()\n\n    def update_network_parameters(self, tau=None):\n        # soft update of the target networks: theta' <- tau * theta + (1 - tau) * theta'\n        if tau is None:\n            tau = self.tau\n\n        actor_params = self.actor.named_parameters()\n        critic_params = self.critic.named_parameters()\n        target_actor_params = self.target_actor.named_parameters()\n        target_critic_params = self.target_critic.named_parameters()\n\n        critic_state_dict = dict(critic_params)\n        actor_state_dict = dict(actor_params)\n        target_critic_dict = dict(target_critic_params)\n        target_actor_dict = dict(target_actor_params)\n\n        for name in critic_state_dict:\n            critic_state_dict[name] 
= tau*critic_state_dict[name].clone() + \\\n                                      (1-tau)*target_critic_dict[name].clone()\n\n        self.target_critic.load_state_dict(critic_state_dict)\n\n        for name in actor_state_dict:\n            actor_state_dict[name] = tau*actor_state_dict[name].clone() + \\\n                                      (1-tau)*target_actor_dict[name].clone()\n        self.target_actor.load_state_dict(actor_state_dict)\n\n        \"\"\"\n        #Verify that the copy assignment worked correctly\n        target_actor_params = self.target_actor.named_parameters()\n        target_critic_params = self.target_critic.named_parameters()\n        critic_state_dict = dict(target_critic_params)\n        actor_state_dict = dict(target_actor_params)\n        print('\\nActor Networks', tau)\n        for name, param in self.actor.named_parameters():\n            print(name, T.equal(param, actor_state_dict[name]))\n        print('\\nCritic Networks', tau)\n        for name, param in self.critic.named_parameters():\n            print(name, T.equal(param, critic_state_dict[name]))\n        input()\n        \"\"\"\n    def save_models(self):\n        self.actor.save_checkpoint()\n        self.target_actor.save_checkpoint()\n        self.critic.save_checkpoint()\n        self.target_critic.save_checkpoint()\n\n    def load_models(self, load_file_actor = '',load_file_critic =''):\n        self.actor.load_checkpoint(load_file = load_file_actor)\n        self.target_actor.load_checkpoint(load_file = load_file_actor)\n        self.critic.load_checkpoint(load_file = load_file_critic)\n        self.target_critic.load_checkpoint(load_file = load_file_critic)\n\n    def check_actor_params(self):\n        # debug helper: compares against self.original_actor and self.original_critic,\n        # which __init__ never creates, so assign them before calling this method\n        current_actor_params = self.actor.named_parameters()\n        current_actor_dict = dict(current_actor_params)\n        original_actor_dict = dict(self.original_actor.named_parameters())\n        original_critic_dict = dict(self.original_critic.named_parameters())\n    
    current_critic_params = self.critic.named_parameters()\n        current_critic_dict = dict(current_critic_params)\n        print('Checking Actor parameters')\n\n        for param in current_actor_dict:\n            print(param, T.equal(original_actor_dict[param], current_actor_dict[param]))\n        print('Checking critic parameters')\n        for param in current_critic_dict:\n            print(param, T.equal(original_critic_dict[param], current_critic_dict[param]))\n        input()\n"
  },
  {
    "path": "entity.py",
    "content": "import numpy as np\nimport math\n#from math_tool import *\n\nclass UAV(object):\n    \"\"\"\n    UAV object with a coordinate,\n    a ULA antenna array (default 16 elements),\n    limited transmit power\n    and a rotation angle (0 by default)\n    \"\"\"\n    def __init__(self, coordinate, index = 0, rotation = 0, ant_num=16, ant_type = 'ULA', max_movement_per_time_slot = 0.5):\n        \"\"\"\n        coordinate is the init coordinate of UAV, meters, np.array\n        \"\"\"\n        self.max_movement_per_time_slot = max_movement_per_time_slot\n        self.type = 'UAV'\n        self.coordinate = coordinate\n        self.rotation = rotation\n        self.ant_num = ant_num\n        self.ant_type = ant_type\n        self.coor_sys = [np.array([1, 0, 0]), np.array([0, -1, 0]), np.array([0, 0, -1])]\n        self.index = index\n\n        # init beamforming matrix in UAV (must be initialized in env.py)\n        self.G = np.mat(np.zeros((ant_num, 1)))\n        self.G_Pmax = 0\n\n    def reset(self, coordinate):\n        \"\"\"\n        reset UAV coordinate\n        \"\"\"\n        self.coordinate = coordinate\n        \n    def update_coor_sys(self, delta_angle):\n        \"\"\"\n        used in function move to update the UAV's local coordinate system\n        \"\"\"\n        self.rotation = self.rotation + delta_angle\n        coor_sys_x = np.array([\\\n        math.cos(self.rotation),\\\n        math.sin(self.rotation),\\\n        0])\n        coor_sys_z = np.array([\\\n        0,\\\n        0,\\\n        -1])\n        coor_sys_y = np.cross(coor_sys_z, coor_sys_x)\n        self.coor_sys = np.array([coor_sys_x,coor_sys_y,coor_sys_z])\n        \n    def update_coordinate(self, distance_delta_d, direction_fai):\n        \"\"\"\n        used in function move to update UAV coordinate\n        \"\"\"\n        delta_x = distance_delta_d * math.cos(direction_fai)\n        delta_y = distance_delta_d * math.sin(direction_fai)\n        self.coordinate[0] += delta_x\n        self.coordinate[1] 
+= delta_y\n\n    def move(self, distance_delta_d, direction_fai, delta_angle = 0):\n        \"\"\"\n        perform the 2D movement at every step\n        \"\"\"\n        self.update_coordinate(distance_delta_d, direction_fai)\n        self.update_coor_sys(delta_angle)\n\nclass RIS(object):\n    \"\"\"\n    reconfigurable intelligent surface\n    with N reflecting elements arranged as a UPA, default 6 X 6 = 36,\n    and continuous phase shifts\n    \"\"\"\n    def __init__(self, coordinate, coor_sys_z, index = 0, ant_num=36, ant_type = 'UPA'):\n        \"\"\"\n        coordinate is the init coordinate of the RIS, meters, np.array\n        coor_sys_z is the normal vector of the reflecting surface (normalized below)\n        !!! ant_num must be the square of an integer\n        \"\"\"\n        self.type = 'RIS'\n        self.coordinate = coordinate\n        self.ant_num = ant_num\n        self.ant_type = ant_type\n        coor_sys_z = coor_sys_z / np.linalg.norm(coor_sys_z)\n        coor_sys_x = np.cross(coor_sys_z, np.array([0,0,1]))\n        coor_sys_x = coor_sys_x / np.linalg.norm(coor_sys_x)\n        coor_sys_y = np.cross(coor_sys_z, coor_sys_x)\n        self.coor_sys = [coor_sys_x,coor_sys_y,coor_sys_z]\n        self.index = index\n\n        # init reflecting phase shifts (identity matrix: unit amplitude, zero phase)\n        self.Phi = np.mat(np.diag(np.ones(self.ant_num, dtype=complex)), dtype = complex)\n\nclass User(object):\n    \"\"\"\n    user with a single antenna\n    \"\"\"\n    def __init__(self, coordinate, index, ant_num = 1, ant_type = 'single'):\n        \"\"\"\n        coordinate is the init coordinate of user, meters, np.array\n        ant_num is the number of user antennas\n        \"\"\"\n        self.type = 'user'\n        self.coordinate = coordinate\n        self.ant_num = ant_num\n        self.ant_type = ant_type\n        self.index = index\n        self.coor_sys = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]\n\n        # init the capacity\n        self.capacity = 0\n        
self.secure_capacity = 0\n        self.QoS_constrain = 0\n        # init the comprehensive_channel (must be initialized in env.py)\n        self.comprehensive_channel = 0\n        # init receive noise power in dBm\n        self.noise_power = -114\n\n    def reset(self, coordinate):\n        \"\"\"\n        reset user coordinate\n        \"\"\"\n        self.coordinate = coordinate\n        \n    def update_coordinate(self, distance_delta_d, direction_fai):\n        \"\"\"\n        used in function move to update user coordinate\n        \"\"\"\n        delta_x = distance_delta_d * math.cos(direction_fai)\n        delta_y = distance_delta_d * math.sin(direction_fai)\n        self.coordinate[0] += delta_x\n        self.coordinate[1] += delta_y\n\n    def move(self, distance_delta_d, direction_fai):\n        \"\"\"\n        perform the 2D movement at every step\n        \"\"\"\n        self.update_coordinate(distance_delta_d, direction_fai)\n        \nclass Attacker(object):\n    \"\"\"\n    Attacker with a single antenna\n    \"\"\"\n    def __init__(self, coordinate, index, ant_num = 1, ant_type= 'single'):\n        \"\"\"\n        coordinate is the init coordinate of Attacker, meters, np.array\n        ant_num is the number of attacker antennas\n        \"\"\"\n        self.type = 'attacker'\n        self.coordinate = coordinate\n        self.ant_num = ant_num\n        self.ant_type = ant_type\n        self.index = index\n        self.coor_sys = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]\n\n        # init the capacity; env.py replaces it with a length-K np.array, shape: (K,),\n        # holding the eavesdropping rate on the k-th user\n        self.capacity = 0\n        self.comprehensive_channel = 0\n        # init receive noise power in dBm\n        self.noise_power = -114\n\n    def reset(self, coordinate):\n        \"\"\"\n        reset attacker coordinate\n        \"\"\"\n        self.coordinate = coordinate\n\n    def update_coordinate(self, 
distance_delta_d, direction_fai):\n        \"\"\"\n        used in function move to update attacker coordinate\n        \"\"\"\n        delta_x = distance_delta_d * math.cos(direction_fai)\n        delta_y = distance_delta_d * math.sin(direction_fai)\n        self.coordinate[0] += delta_x\n        self.coordinate[1] += delta_y\n\n    def move(self, distance_delta_d, direction_fai):\n        \"\"\"\n        perform the 2D movement at every step\n        \"\"\"\n        self.update_coordinate(distance_delta_d, direction_fai)"
  },
  {
    "path": "env.py",
    "content": "#%matplotlib inline\nimport numpy as np\nfrom entity import *\nfrom channel import *\nfrom math_tool import *\nfrom datetime import datetime\nfrom mpl_toolkits import mplot3d\nimport matplotlib.pyplot as plt\nfrom render import Render\nfrom data_manager import DataManager\n# fix the NumPy seed so that every simulation uses the same model\nnp.random.seed(2)\n\n######################################################\n# new for energy \n# energy-related parameters of the rotary-wing UAV\n# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV\nP_i = 790.6715 # induced power in hover (W)\nP_0 = 580.65 # blade profile power in hover (W)\nU2_tip = (200) ** 2 # squared rotor blade tip speed\ns = 0.05 # rotor solidity\nd_0 = 0.3 # fuselage drag ratio\np = 1.225 # air density (kg/m^3)\nA = 0.79 # rotor disc area (m^2)\ndelta_time = 0.1/1000 #0.1ms\n\n# add-on: mean rotor induced velocity in hover, v_0 = sqrt(T / (2 * p * A))\n# based on https://www.intechopen.com/chapters/57483\nm = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F\ng = 9.81 # gravity\nT = m * g # thrust, equal to the UAV weight in hover\nv_0 = (T / (A * 2 * p)) ** 0.5\n\ndef get_energy_consumption(v_t):\n    '''\n    propulsion energy of the rotary-wing UAV in one time slot:\n    delta_time * [P_0 * (1 + 3 v^2 / U_tip^2)\n                  + P_i * (sqrt(1 + v^4 / (4 v_0^4)) - v^2 / (2 v_0^2))^(1/2)\n                  + 0.5 * d_0 * p * s * A * v^3]\n    arg\n    1) v_t = displacement per time slot, used as the speed v\n    '''\n    energy_1 = P_0 \\\n                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \\\n                + 0.5 * d_0 * p * s * A * (abs(v_t))**3\n    \n    energy_2 = P_i * ((\n                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \\\n                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \\\n                ) ** 0.5)\n    \n    energy = delta_time * (energy_1 + energy_2)\n    return energy \n\n# at these low speeds the propulsion power decreases with v, so hovering (v = 0)\n# gives the maximum and the per-axis movement limit (0.25) gives the minimum used\n# for normalization; larger displacements (e.g. diagonal moves) can push the\n# normalized energy slightly below 0\nENERGY_MIN = get_energy_consumption(0.25)\nENERGY_MAX = get_energy_consumption(0)\n\n######################################################\n\n\nclass MiniSystem(object):\n#class MiniSystem(K=1):\n    \"\"\"\n    define a mini RIS-aided communication system with one UAV, one RIS,\n    user_num users and attacker_num attackers (one of each by default)\n    \"\"\"\n    def __init__(self, UAV_num = 1, RIS_num = 1, user_num = 1, attacker_num = 1, fre = 28e9, \\\n                 RIS_ant_num = 16, UAV_ant_num=8, if_dir_link = 1, 
if_with_RIS = True, \\\n                 if_move_users = True, if_movements = True, reverse_x_y = (True, True), \\\n                 if_UAV_pos_state = True, reward_design = 'ssr', project_name = None, step_num=100):\n        self.if_dir_link = if_dir_link\n        self.if_with_RIS = if_with_RIS\n        self.if_move_users = if_move_users\n        self.if_movements = if_movements\n        self.if_UAV_pos_state = if_UAV_pos_state\n        self.reverse_x_y = reverse_x_y\n        self.user_num = user_num\n        self.attacker_num = attacker_num\n        self.border = [(-25,25), (0, 50)]\n        # 1.init entities: 1 UAV, 1 RIS, many users and attackers\n        self.data_manager = DataManager(file_path='./data', project_name = project_name, \\\n        store_list = ['beamforming_matrix', 'reflecting_coefficient', 'UAV_state', 'user_capacity', 'secure_capacity', 'attaker_capacity','G_power', 'reward','UAV_movement'])\n        # 1.1 init UAV position and beamforming matrix\n        self.UAV = UAV(\n            coordinate=self.data_manager.read_init_location('UAV', 0), \n            ant_num= UAV_ant_num, \n            max_movement_per_time_slot=0.25)\n        self.UAV.G = np.mat(np.ones((self.UAV.ant_num, user_num), dtype=complex), dtype=complex)\n        self.power_factor = 100\n        self.UAV.G_Pmax =  np.trace(self.UAV.G * self.UAV.G.H) * self.power_factor\n        # 1.2 init RIS\n        self.RIS = RIS(\\\n        coordinate=self.data_manager.read_init_location('RIS', 0), \\\n        coor_sys_z=self.data_manager.read_init_location('RIS_norm_vec', 0), \\\n        ant_num=RIS_ant_num)\n        # 1.3 init users\n        self.user_list = []\n        \n        for i in range(user_num):\n            user_coordinate = self.data_manager.read_init_location('user', i)\n            user = User(coordinate=user_coordinate, index=i)\n            user.noise_power = -114\n            self.user_list.append(user)\n\n        # 1.4 init attackers\n        self.attacker_list = []\n    
    \n        for i in range(attacker_num):\n            attacker_coordinate = self.data_manager.read_init_location('attacker', i)\n            attacker = Attacker(coordinate=attacker_coordinate, index=i)\n            attacker.capacity = np.zeros((user_num))\n            attacker.noise_power = -114\n            self.attacker_list.append(attacker)\n        # 1.5 eavesdropping capacity array, shape: P X K (attackers x users)\n        self.eavesdrop_capacity_array= np.zeros((attacker_num, user_num))\n        \n        # 1.6 reward design\n        self.reward_design = reward_design # reward_design is 'ssr' or 'see'\n\n        # 1.7 step_num\n        self.step_num = step_num\n        \n        # 2 init channels\n        self.H_UR = mmWave_channel(self.UAV, self.RIS, fre)\n        self.h_U_k = []\n        self.h_R_k = []\n        self.h_U_p = []\n        self.h_R_p = []\n        for user_k in self.user_list:\n            self.h_U_k.append(mmWave_channel(user_k, self.UAV, fre))\n            self.h_R_k.append(mmWave_channel(user_k, self.RIS, fre))\n        for attacker_p in self.attacker_list:\n            self.h_U_p.append(mmWave_channel(attacker_p, self.UAV, fre))\n            self.h_R_p.append(mmWave_channel(attacker_p, self.RIS, fre))\n\n        # 3 update user and attacker channel capacity\n        self.update_channel_capacity()\n\n        # 4 draw system\n        self.render_obj = Render(self)      \n        \n    def reset(self):\n        \"\"\"\n        reset UAV, users, attackers, beamforming matrix, reflecting coefficient\n        \"\"\"\n        # 1 reset UAV\n        self.UAV.reset(coordinate=self.data_manager.read_init_location('UAV', 0))\n        # 2 reset users\n        for i in range(self.user_num):\n            user_coordinate = self.data_manager.read_init_location('user', i)\n            self.user_list[i].reset(coordinate=user_coordinate)\n        # 3 reset attackers\n        for i in range(self.attacker_num):\n            attacker_coordinate = 
self.data_manager.read_init_location('attacker', i)\n            self.attacker_list[i].reset(coordinate=attacker_coordinate)\n        # 4 reset beamforming matrix\n        self.UAV.G = np.mat(np.ones((self.UAV.ant_num, self.user_num), dtype=complex), dtype=complex)\n        self.UAV.G_Pmax = np.trace(self.UAV.G * self.UAV.G.H) * self.power_factor\n        # 5 reset reflecting coefficient\n        \"\"\"self.RIS = RIS(\\\n        coordinate=self.data_manager.read_init_location('RIS', 0), \\\n        coor_sys_z=self.data_manager.read_init_location('RIS_norm_vec', 0), \\\n        ant_num=16)\"\"\"\n        self.RIS.Phi = np.mat(np.diag(np.ones(self.RIS.ant_num, dtype=complex)), dtype = complex)\n        # 6 reset time\n        self.render_obj.t_index = 0\n        # 7 reset CSI\n        self.H_UR.update_CSI()\n        for h in self.h_U_k + self.h_U_p + self.h_R_k + self.h_R_p:\n            h.update_CSI()\n        # 8 reset capacity\n        self.update_channel_capacity()\n\n    def step(self, action_0 = 0, action_1 = 0, G = 0, Phi = 0, set_pos_x = 0, set_pos_y = 0):\n        \"\"\"\n        advance one time slot: move the users and the UAV, refresh the CSI,\n        apply the new beamforming matrix G and RIS phase shifts Phi,\n        and return (new_state, reward, done, info)\n        \"\"\"\n        # 0 update render\n        \n        self.render_obj.t_index += 1\n        # 1 update entities\n        \n        if self.if_move_users:\n            # every user moves 0.2 m per time slot in the -y direction\n            for user in self.user_list:\n                user.update_coordinate(0.2, -1/2 * math.pi)\n\n        if self.if_movements:\n            move_x = action_0 * self.UAV.max_movement_per_time_slot\n            move_y = action_1 * self.UAV.max_movement_per_time_slot\n            \n            ######################################################\n            # new for energy: UAV displacement in this time slot\n            v_t = (move_x ** 2 + move_y ** 2) ** 0.5\n            #self.data_manager.store_data([v_t],'velocity')\n            ######################################################\n\n            if self.reverse_x_y[0]:\n                move_x = -move_x\n          
  \n            if self.reverse_x_y[1]:\n                move_y = -move_y\n                \n            self.UAV.coordinate[0] +=move_x\n            self.UAV.coordinate[1] +=move_y\n            self.data_manager.store_data([move_x, move_y], 'UAV_movement')\n        else:\n            set_pos_x = map_to(set_pos_x, (-1, 1), self.border[0])\n            set_pos_y = map_to(set_pos_y, (-1, 1), self.border[1])\n            self.UAV.coordinate[0] = set_pos_x\n            self.UAV.coordinate[1] = set_pos_y\n\n        # 2 update channel CSI\n        \n        for h in self.h_U_k + self.h_U_p + self.h_R_k + self.h_R_p:\n            h.update_CSI()\n        # if_dir_link == 0 removes the direct UAV-user and UAV-attacker links\n        if self.if_dir_link == 0:\n            for h in self.h_U_k + self.h_U_p:\n                h.channel_matrix = np.mat(np.zeros(shape = np.shape(h.channel_matrix)), dtype=complex)\n        if self.if_with_RIS == False:\n            self.H_UR.channel_matrix = np.mat(np.zeros((self.RIS.ant_num, self.UAV.ant_num)), dtype=complex)\n        else:\n            self.H_UR.update_CSI()\n        # 3 update beamforming matrix & reflecting phase shift\n        \"\"\"\n        self.UAV.G = G\n        self.RIS.Phi = Phi\n        \"\"\"\n        self.UAV.G = convert_list_to_complex_matrix(G, (self.UAV.ant_num, self.user_num)) * math.pow(self.power_factor, 0.5)\n        \n        # alternative (disabled): fixed all-ones beamforming matrix\n        #self.UAV.G = np.mat(np.ones((self.UAV.ant_num, self.user_num), dtype=complex), dtype=complex) * math.pow(self.power_factor, 0.5)\n        if self.if_with_RIS:\n            self.RIS.Phi = convert_list_to_complex_diag(Phi, self.RIS.ant_num)\n        # 4 update channel capacity in every user and attacker\n        self.update_channel_capacity()\n        # 5 store current system state to .mat\n        self.store_current_system_sate()\n        # 6 get new state\n        new_state = self.observe()\n        # 7 get reward\n        reward = self.reward()\n        \n        # 7.1 reward with 
energy efficiency\n        ######################################################\n        if self.reward_design == 'see':\n            # normalize the per-slot flight energy with ENERGY_MIN / ENERGY_MAX and\n            # reduce a positive reward by 0.1 * |reward| * normalized energy\n            energy = energy_raw = get_energy_consumption(v_t)\n            energy -= ENERGY_MIN\n            energy /= (ENERGY_MAX - ENERGY_MIN)\n            energy_penalty = -1 * 0.1 * abs(reward) * energy\n            if reward > 0:\n                reward += energy_penalty\n        ######################################################\n        \n        # 8 squash the reward into (-1, 1)\n        reward = math.tanh(reward)\n        # 9 end the episode with reward -10 if the UAV crosses the border\n        done = False\n        x, y = self.UAV.coordinate[0:2]\n        if x < self.border[0][0] or x > self.border[0][1]:\n            done = True\n            reward = -10\n        if y < self.border[1][0] or y > self.border[1][1]:\n            done = True\n            reward = -10\n        self.data_manager.store_data([reward],'reward')\n        return new_state, reward, done, []\n\n    def reward(self):\n        \"\"\"\n        used in function step to get the reward of the current step:\n        a power penalty if tr(G G^H) exceeds G_Pmax; otherwise the sum of each\n        user's secrecy rate divided by 2 * user_num, replaced by a scaled penalty\n        if any user's secrecy rate falls below user.QoS_constrain\n        \"\"\"\n        reward = 0\n        reward_ = 0\n        P = np.trace(self.UAV.G * self.UAV.G.H)\n        if abs(P) > abs(self.UAV.G_Pmax) :\n            reward = abs(self.UAV.G_Pmax) - abs(P)\n            reward /= self.power_factor \n        else:\n            for user in self.user_list:\n                r = user.capacity - max(self.eavesdrop_capacity_array[:, user.index])\n                if r < user.QoS_constrain:\n                    reward_ += r - user.QoS_constrain\n                else:\n                    reward += r/(self.user_num*2)\n            if reward_ < 0:\n                reward = reward_ * self.user_num * 10\n     \n        return reward\n    \n    def observe(self):\n        \"\"\"\n        used in function main to get the current state;\n        for each user and attacker, the state holds the real parts and then the imaginary parts\n        of its comprehensive channel, followed by the UAV coordinate if if_UAV_pos_state is True\n        \"\"\"\n        # users' and 
attackers' comprehensive channel\n        comprehensive_channel_elements_list = []\n        for entity in self.user_list + self.attacker_list:\n            tmp_list = list(np.array(np.reshape(entity.comprehensive_channel, (1,-1)))[0])\n            comprehensive_channel_elements_list += list(np.real(tmp_list)) + list(np.imag(tmp_list)) \n        UAV_position_list = []\n        if self.if_UAV_pos_state:\n            UAV_position_list = list(self.UAV.coordinate)\n\n        return comprehensive_channel_elements_list + UAV_position_list\n\n    def store_current_system_sate(self):\n        \"\"\"\n        function used in step() to store system state\n        \"\"\"\n        # 1 store beamforming matrix\n        row_data = list(np.array(np.reshape(self.UAV.G, (1, -1)))[0,:])\n        self.data_manager.store_data(row_data, 'beamforming_matrix')\n        # 2 store reflecting coefficient matrix\n        row_data = list(np.array(np.reshape(diag(self.RIS.Phi), (1,-1)))[0,:])      \n        self.data_manager.store_data(row_data, 'reflecting_coefficient')\n        # 3 store UAV state\n        row_data = list(self.UAV.coordinate)\n        self.data_manager.store_data(row_data, 'UAV_state')\n        # 4 store user_capicity\n        row_data = [user.secure_capacity for user in self.user_list] \\\n        + [user.capacity for user in self.user_list]\n        # 5 store G_power\n        row_data = [np.trace(self.UAV.G*self.UAV.G.H), self.UAV.G_Pmax]\n        self.data_manager.store_data(row_data, 'G_power')\n        row_data = []\n        for user in self.user_list:\n            row_data.append(user.capacity)\n        self.data_manager.store_data(row_data, 'user_capacity')\n\n        row_data = []\n        for attacker in self.attacker_list:\n            row_data.append(attacker.capacity)\n        self.data_manager.store_data(row_data, 'attaker_capacity')\n\n        row_data = []\n        for user in self.user_list:\n            row_data.append(user.secure_capacity)\n        
self.data_manager.store_data(row_data, 'secure_capacity')\n\n\n    def update_channel_capacity(self):\n        \"\"\"\n        function used in step to calculate user and attackers' capacity \n        \"\"\"\n        # 1 calculate eavesdrop rate\n        for attacker in self.attacker_list:\n            attacker.capacity = self.calculate_capacity_array_of_attacker_p(attacker.index)\n            self.eavesdrop_capacity_array[attacker.index, :] = attacker.capacity\n            # remmeber to update comprehensive_channel\n            attacker.comprehensive_channel = self.calculate_comprehensive_channel_of_attacker_p(attacker.index)\n        # 2 calculate unsecure rate\n        for user in self.user_list:\n            user.capacity = self.calculate_capacity_of_user_k(user.index)\n            # 3 calculate secure rate\n            user.secure_capacity = self.calculate_secure_capacity_of_user_k(user.index)\n            # remmeber to update comprehensive_channel\n            user.comprehensive_channel = self.calculate_comprehensive_channel_of_user_k(user.index)\n\n    def calculate_comprehensive_channel_of_attacker_p(self, p):\n        \"\"\"\n        used in update_channel_capacity to calculate the comprehensive_channel of attacker p\n        \"\"\"\n        h_U_p = self.h_U_p[p].channel_matrix\n        h_R_p = self.h_R_p[p].channel_matrix\n        Psi = diag_to_vector(self.RIS.Phi)\n        H_c = vector_to_diag(h_R_p).H * self.H_UR.channel_matrix\n        return h_U_p.H + Psi.H * H_c\n\n    def calculate_comprehensive_channel_of_user_k(self, k):\n        \"\"\"\n        used in update_channel_capacity to calculate the comprehensive_channel of user k\n        \"\"\"\n        h_U_k = self.h_U_k[k].channel_matrix\n        h_R_k = self.h_R_k[k].channel_matrix\n        Psi = diag_to_vector(self.RIS.Phi)\n        H_c = vector_to_diag(h_R_k).H * self.H_UR.channel_matrix\n        return h_U_k.H + Psi.H * H_c\n\n    def calculate_capacity_of_user_k(self, k):\n        \"\"\"\n      
  function used in update_channel_capacity to calculate the capacity of user k\n        \"\"\"\n        noise_power = self.user_list[k].noise_power\n        h_U_k = self.h_U_k[k].channel_matrix\n        h_R_k = self.h_R_k[k].channel_matrix\n        Psi = diag_to_vector(self.RIS.Phi)\n        H_c = vector_to_diag(h_R_k).H * self.H_UR.channel_matrix\n        G_k = self.UAV.G[:, k]\n        # G_k_ stacks the beams of all other users (interference)\n        if len(self.user_list) == 1:\n            G_k_ = np.mat(np.zeros((self.UAV.ant_num, 1), dtype=complex), dtype=complex)\n        else:\n            G_k_1 = self.UAV.G[:, 0:k]\n            G_k_2 = self.UAV.G[:, k+1:]\n            G_k_ = np.hstack((G_k_1, G_k_2))\n        # alpha_k: desired signal power, beta_k: interference plus noise power\n        alpha_k = math.pow(abs((h_U_k.H + Psi.H * H_c) * G_k), 2)\n        beta_k = math.pow(np.linalg.norm((h_U_k.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3\n        return math.log10(1 + abs(alpha_k / beta_k))\n\n    def calculate_capacity_array_of_attacker_p(self, p):\n        \"\"\"\n        function used in update_channel_capacity to calculate the eavesdropping rates of attacker p on each of the K users\n        output is a length-K np.array, shape: (K,)\n        \"\"\"\n        K = len(self.user_list)\n        noise_power = self.attacker_list[p].noise_power\n        h_U_p = self.h_U_p[p].channel_matrix\n        h_R_p = self.h_R_p[p].channel_matrix\n        Psi = diag_to_vector(self.RIS.Phi)\n        H_c = vector_to_diag(h_R_p).H * self.H_UR.channel_matrix\n        if K == 1:\n            G_k = self.UAV.G\n            G_k_ = np.mat(np.zeros((self.UAV.ant_num, 1), dtype=complex), dtype=complex)\n            alpha_p = math.pow(abs((h_U_p.H + Psi.H * H_c) * G_k), 2)\n            beta_p = math.pow(np.linalg.norm((h_U_p.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3\n            return np.array([math.log10(1 + abs(alpha_p / beta_p))])\n        else:\n            result = np.zeros(K)\n            for k in range(K):\n                G_k = self.UAV.G[:, k]\n                G_k_1 = self.UAV.G[:, 0:k]\n     
           G_k_2 = self.UAV.G[:, k+1:]\n                G_k_ = np.hstack((G_k_1, G_k_2))\n                alpha_p = math.pow(abs((h_U_p.H + Psi.H * H_c) * G_k), 2)\n                beta_p = math.pow(np.linalg.norm((h_U_p.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3\n                result[k] = math.log10(1 + abs(alpha_p / beta_p))\n            return result\n\n    def calculate_secure_capacity_of_user_k(self, k=2):\n        \"\"\"\n        function used in update_channel_capacity to calculate the secrecy rate of user k,\n        i.e. its rate minus the largest eavesdropping rate on it, clipped at 0\n        \"\"\"\n        user = self.user_list[k]\n        R_k_unsecure = user.capacity\n        R_k_maxeavesdrop = max(self.eavesdrop_capacity_array[:, k])\n        secrecy_rate = max(0, R_k_unsecure - R_k_maxeavesdrop)\n        return secrecy_rate\n\n    def get_system_action_dim(self):\n        \"\"\"\n        function used in main function to get the dimension of actions\n        \"\"\"\n        result = 0\n        # 0 UAV movement\n        result += 2\n        # 1 RIS reflecting elements\n        if self.if_with_RIS:\n            result += self.RIS.ant_num\n        else:\n            result += 0\n        # 2 beamforming matrix dimension\n        result += 2 * self.UAV.ant_num * self.user_num\n        return result\n\n    def get_system_state_dim(self):\n        \"\"\"\n        function used in main function to get the dimension of states\n        \"\"\"\n        result = 0\n        # users' and attackers' comprehensive channel\n        result += 2 * (self.user_num + self.attacker_num) * self.UAV.ant_num\n        # UAV position\n        if self.if_UAV_pos_state:\n            result += 3\n        return result\n"
  },
  {
    "path": "env1.py",
    "content": "import numpy as np\nimport math\nimport cmath\nnp.random.seed(42)\n\nclass minimal_IRS_system():\n    def __init__(self, BS_M = 8, IRS_N_x = 4, IRS_N_y =2, K = 8, statistic = False):\n        self.M = BS_M  # number of BS antennas\n        self.N = IRS_N_x * IRS_N_y  # number of IRS elements\n        self.K = K  # number of users\n        self.z_BS = 0 # hight of BS antenas\n        self.BS_rotation = 0 #\n        self.BS_coordinate = np.array([0, 0, 30])\n        self.BS_elevation_angle = 0 \n        self.BS_max_power = 1000 # BS power constrain, W\n\n        self.BS_normal_vecter = np.array([math.cos(self.BS_rotation), math.sin(self.BS_rotation), math.sin(self.BS_elevation_angle)*np.linalg.norm([-math.cos(self.BS_rotation), math.sin(self.BS_rotation)])])\n        self.BS_normal_vecter = self.BS_normal_vecter / np.linalg.norm(self.BS_normal_vecter)\n\n        # channel parameters class\n        class channel_parameters():\n            \"\"\"\n            channel parameters\n            \"\"\"\n            def __init__(self):\n                self.noise_dBm = noise_dBm = -114       # channel noise, -114 dBm\n                self.noise_segma = 0                    # channel noise std,\n                self.d_0 = 1                            # path loss referance distance 1 m\n                self.rho_0 = 0.01                       # path loss parameter\n                self.alpha_BS_to_IRS = 3                       # path loss exponent\n                self.alpha_IRS_to_user = 2.5                      # path loss exponent\n                self.alpha_BS_to_user = 3.5                      # path loss exponent\n                self.K_BS = math.pow(10, 3/10)          # rician factor, 3dB\n                self.K_IRS = math.pow(10, 3/10)         # rician factor, 3dB\n        self.channel_parameters = channel_parameters()\n        self.channel_parameters.noise_segma = math.pow(self.dBm_to_W(self.channel_parameters.noise_dBm),0.5)\n\n        # IRS initial\n  
      class IRS():\n            \"\"\"\n            store IRS parameters\n            \"\"\"\n            def __init__(self):\n                self.IRS_N_x = IRS_N_x\n                self.IRS_N_y = IRS_N_y\n                self.IRS_N = IRS_N_x * IRS_N_y\n                self.x_IRS = 1000                       # position x of IRS, 1000 m\n                self.y_IRS = 0                          # position y of IRS, 0 m\n                self.z_IRS = 25                         # position z of IRS, 25 m\n                self.IRS_coordinate = np.array([self.x_IRS,self.y_IRS,self.z_IRS])\n                self.deltaD_to_lambda = 0.5             # element spacing in wavelengths\n                self.d_BS_to_IRS = np.linalg.norm(self.IRS_coordinate)\n                self.theta_IRS = math.atan(self.y_IRS / self.x_IRS) # theta_IRS\n                self.psi_IRS = 0                        # rotation of IRS, range: -pi/2 + theta_IRS  ~  pi/2 - theta_IRS\n                self.elevation_angle = 0                # elevation angle\n\n                self.IRS_normal_vecter = np.array([-math.cos(self.psi_IRS), math.sin(self.psi_IRS),math.sin(self.elevation_angle)*np.linalg.norm([-math.cos(self.psi_IRS), math.sin(self.psi_IRS)])])\n                self.IRS_normal_vecter = self.IRS_normal_vecter / np.linalg.norm(self.IRS_normal_vecter)\n                self.theta_BS_to_IRS = 0\n                self.psi_BS_to_IRS = 0\n                self.theta_IRS_to_BS = 0\n                self.psi_IRS_to_BS = 0\n                \n        self.IRS = IRS()\n        \n        # users initialization\n        class user():\n            \"\"\"\n            store user parameters\n            \"\"\"\n            def __init__(self):\n                self.user_index = 0                     # user ID\n                self.distance_to_IRS = 10               # distance to IRS (set below), range of (10, 30)\n                self.psi = 0                            # psi angle to IRS\n                self.psi_IRS_surface = 0                # psi used to 
calculate the UPA response\n                self.x_user = 10                        # position x\n                self.y_user = 10                        # position y\n                self.z_user = 0                         # position z\n                self.user_coordinate = np.array([self.x_user,self.y_user,self.z_user])\n                self.distance_to_BS = np.linalg.norm(self.user_coordinate)                 # distance to BS\n                self.theta_user = 0                     # theta to BS\n                self.theta_IRS_to_kth_user = 0\n                self.psi_IRS_to_kth_user = 0\n                \n        mini_distance = 10                                                 # minimal distance for users to IRS, m\n        max_distance = 30                                                  # maximal distance for users to IRS, m\n        statistic_distance_list = [10, 12, 14, 17, 20, 23, 25, 29]\n        statistic_angle_list = [0, math.pi/4, math.pi/2, 3*math.pi/4, math.pi, 5*math.pi/4, 3*math.pi/2, 7*math.pi/4]\n        stocastic_distance_list = np.random.uniform(mini_distance, max_distance, self.K)    # list of random distances\n        stocastic_angle_list = np.random.uniform(- math.pi / 2, math.pi / 2, self.K)                      # list of random angles\n        self.user_list = []\n        for index_user in range(K):\n            user_temp = user()\n            user_temp.user_index = index_user\n            if statistic:  # fixed user positions\n                user_temp.distance_to_IRS = statistic_distance_list[index_user]\n                user_temp.psi = statistic_angle_list[index_user]\n                user_temp.x_user = self.IRS.x_IRS - user_temp.distance_to_IRS * math.cos(user_temp.psi)\n                user_temp.y_user = self.IRS.y_IRS + user_temp.distance_to_IRS * math.sin(user_temp.psi)\n            else:  # random user positions\n                user_temp.distance_to_IRS = stocastic_distance_list[index_user]\n                user_temp.psi = stocastic_angle_list[index_user]\n                
user_temp.x_user = self.IRS.x_IRS - user_temp.distance_to_IRS * math.cos(user_temp.psi)\n                user_temp.y_user = self.IRS.y_IRS + user_temp.distance_to_IRS * math.sin(user_temp.psi)\n            user_temp.psi_IRS_surface = user_temp.psi - self.IRS.psi_IRS\n            #user_temp.distance_to_BS = np.linalg.norm([user_temp.x_user, user_temp.y_user])\n            user_temp.theta_user = math.atan(user_temp.y_user / user_temp.x_user)\n            user_temp.user_coordinate = np.array([user_temp.x_user,user_temp.y_user,user_temp.z_user])\n            self.user_list.append(user_temp)\n            \n\n        # transmit signal\n        self.X_transmit = np.mat(np.ones((K, 1)))       # BS transmit signal, unit variance, shape: K X 1\n\n        # active beamforming at BS\n        self.G_beamforming = np.mat(np.ones((self.M, K),dtype=complex))    # BS beamforming matrix, shape: M X K\n        # power budget: the power of the initial all-ones beamformer (overrides the 1000 W set above)\n        self.BS_max_power = self.calculate_total_transmit_power()\n        self.reset_G = np.mat(np.ones((self.M, K),dtype=complex))\n\n        # channel gain from BS to IRS\n        #self.H_BS_to_IRS = np.mat(np.ones((self.N, self.M)))      # channel gain BS to IRS, shape: N X M\n        self.H_BS_to_IRS = self.calculate_channel_BS_to_IRS(self.IRS)\n\n        # channel gain from IRS to users\n        # self.H_IRS_to_user = np.mat(np.ones((K, self.N),dtype=complex))    # channel gain IRS to user, shape: K X N\n        self.H_IRS_to_user = self.calculate_channel_IRS_to_user(self.IRS)\n\n        # IRS phase shift matrix (diagonal), shape: N X N\n        self.Fai = np.mat(np.identity(self.N), dtype=complex)\n        self.reset_Fai = np.mat(np.identity(self.N), dtype=complex)\n        # data rate matrix container\n        self.data_rate = np.mat(np.ones((K,1)))         # channel capacities for all K users\n\n        # calculate the sum of all user data rates\n        self.sum_of_user_data_rates = self.calculate_data_rate()\n\n    def calculate_total_transmit_power(self):\n        
\"\"\"\n        calculate total power\n        \"\"\"\n        G = self.G_beamforming\n        G_H = G.H\n        temp_power = 0\n        for m in range(self.M):\n            temp_power += G[m,:] * G_H[:, m]\n        return float(np.real(temp_power[0,0]))\n\n    def dBm_to_W(self, dBm):\n        mW = math.pow(10, dBm/10)\n        W = mW/1000.0\n        return W\n\n    def generate_noise(self, mu, segma, size = 1):\n        \"\"\"\n        generates noise of variance segma ^ 2\n        size can be a tuple like (3,3)\n\n        output :    a float of noise\n                    or an array with size size of average mu and std of segma\n        \"\"\"\n        result = np.random.normal(mu, segma, size)\n        if size == 1:\n            return float(result)\n        else:\n            return result\n\n    def calculate_received_signal_amplitude(self):\n        \"\"\"\n        calculate_received_signal_amplitude from BS to user\n\n        output :    a matix , size: K X 1\n        \"\"\"   \n        noise_array = self.generate_noise(0, self.channel_parameters.noise_segma, (self.K,1))\n        noise_matrix = np.mat(noise_array)\n\n        Y_matrix = self.H_IRS_to_user * self.Fai * self.H_BS_to_IRS * self.G_beamforming * self.X_transmit + noise_matrix\n        return Y_matrix\n\n    def calculate_SINR_of_kth_user(self, k):\n        \"\"\"\n        calculate kth user's SINR\n\n        output  :   a float\n        \"\"\"\n        expected_amplitude = abs(self.H_IRS_to_user[k] * self.Fai * self.H_BS_to_IRS * self.G_beamforming[:, k])\n        expected_power = math.pow(expected_amplitude, 2)\n\n        noise_power = 0\n        noise_power = noise_power + math.pow(self.channel_parameters.noise_segma, 2)\n        for n in range(self.K):\n            if n != k:\n                noise_power = noise_power + math.pow(abs(self.H_IRS_to_user[n] * self.Fai * self.H_BS_to_IRS * self.G_beamforming[:, n]),2)\n        return expected_power/noise_power\n\n    def 
calculate_data_rate(self):\n        \"\"\"\n        calculate the data rates of the K users\n\n        output :    sum of the K data rates\n        \"\"\"\n        for kth in range(self.K):\n            self.data_rate[kth, 0] = math.log2(1 + self.calculate_SINR_of_kth_user(kth))\n        return float(self.data_rate.sum())\n\n    def calculate_theta_and_psi(self, ref_norm_vector, coordinate):\n        \"\"\"\n        calculate the angles theta and psi of coordinate with respect to the reference normal vector ref_norm_vector\n        \"\"\"\n        theta_ref_to_coor = math.acos(abs(np.dot(coordinate/ np.linalg.norm(coordinate),ref_norm_vector)))\n        coordinate_xoy = coordinate - abs(math.cos(theta_ref_to_coor)) * np.linalg.norm(coordinate) * ref_norm_vector\n        temp_x_vector = [ref_norm_vector[1], -ref_norm_vector[0], 0]\n        psi_ref_to_coor = abs(np.dot(temp_x_vector, coordinate_xoy)) / (np.linalg.norm(temp_x_vector) * np.linalg.norm(coordinate_xoy))\n        if math.isnan(psi_ref_to_coor):\n            psi_ref_to_coor = 0\n        \n        return theta_ref_to_coor, psi_ref_to_coor\n\n    def calculate_channel_BS_to_IRS(self, IRS): # IRS elements spacing half wavelength\n        \"\"\"\n        compute the BS-to-IRS channel self.H_BS_to_IRS (N X M)\n        \"\"\"\n        temp_vertical_matrix = np.zeros((IRS.IRS_N,1), dtype=complex)\n        temp_horizontal_matrix = np.zeros((1, self.M), dtype=complex)\n        \"\"\"\n        theta_BS_to_IRS = math.acos(np.dot(IRS.IRS_coordinate/ np.linalg.norm(IRS.IRS_coordinate),self.BS_normal_vecter))\n        IRS_xoy = IRS.IRS_coordinate - abs(math.cos(theta_BS_to_IRS)) * np.linalg.norm(IRS.IRS_coordinate) * self.BS_normal_vecter\n        temp_x_BS = [self.BS_normal_vecter[1], -self.BS_normal_vecter[0], 0]\n        psi_BS_to_IRS = abs(np.dot(temp_x_BS, IRS_xoy)) / (np.linalg.norm(temp_x_BS) * np.linalg.norm(IRS_xoy))\n        if math.isnan(psi_BS_to_IRS):\n            psi_BS_to_IRS = 0\n        \n        theta_IRS_to_BS = math.acos(abs(np.dot(IRS.IRS_coordinate/ np.linalg.norm(IRS.IRS_coordinate), IRS.IRS_normal_vecter)))\n        BS_xoy
= -1 * np.array(IRS.IRS_coordinate) - abs(math.cos(theta_IRS_to_BS)) * np.linalg.norm(IRS.IRS_coordinate) * IRS.IRS_normal_vecter\n        temp_x_IRS = [IRS.IRS_normal_vecter[1], -IRS.IRS_normal_vecter[0], 0]\n        psi_IRS_to_BS = abs(np.dot(temp_x_IRS, BS_xoy)) / (np.linalg.norm(temp_x_IRS) * np.linalg.norm(BS_xoy))\n        if math.isnan(psi_IRS_to_BS):\n            psi_IRS_to_BS = 0\n        # old way to calculate theta and psi\n        \"\"\"\n        theta_BS_to_IRS, psi_BS_to_IRS = self.calculate_theta_and_psi(self.BS_normal_vecter, IRS.IRS_coordinate - self.BS_coordinate)\n        theta_IRS_to_BS, psi_IRS_to_BS = self.calculate_theta_and_psi(IRS.IRS_normal_vecter, self.BS_coordinate - IRS.IRS_coordinate)\n        IRS.theta_BS_to_IRS = theta_BS_to_IRS\n        IRS.psi_BS_to_IRS = psi_BS_to_IRS\n        IRS.theta_IRS_to_BS = theta_IRS_to_BS\n        IRS.psi_IRS_to_BS = psi_IRS_to_BS\n\n        for i in range(IRS.IRS_N_y):\n            for j in range(IRS.IRS_N_x):\n                temp_vertical_matrix[i * IRS.IRS_N_x + j][0] = \\\n                cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \\\n                (i * math.sin(theta_IRS_to_BS)* math.cos(psi_IRS_to_BS)+ \\\n                j * math.sin(theta_IRS_to_BS)* math.sin(psi_IRS_to_BS)) \\\n                )\n        for i in range(self.M):\n            temp_horizontal_matrix[0][i] = \\\n            cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \\\n            (i * math.sin(theta_BS_to_IRS)* math.cos(psi_BS_to_IRS)) \\\n            )\n\n        temp_horizontal_matrix = np.mat(temp_horizontal_matrix)\n        temp_vertical_matrix = np.mat(temp_vertical_matrix)\n        H_LOS = temp_vertical_matrix * temp_horizontal_matrix\n\n        H = math.pow(self.channel_parameters.rho_0* (math.pow(IRS.d_BS_to_IRS/self.channel_parameters.d_0, - self.channel_parameters.alpha_BS_to_IRS)), 0.5) \\\n        * (H_LOS)\n        return H\n        \n    def calculate_channel_IRS_to_user(self, IRS):\n        
\"\"\"\n        channel gain IRS to user, shape: K X N\n        \"\"\"\n        temp_array = np.zeros((self.K, self.N), dtype=complex)\n\n        for kth in range(self.K):\n            theta_IRS_to_kth_user, psi_IRS_to_kth_user = self.calculate_theta_and_psi(IRS.IRS_normal_vecter, self.user_list[kth].user_coordinate - IRS.IRS_coordinate)\n            self.user_list[kth].theta_IRS_to_kth_user = theta_IRS_to_kth_user\n            self.user_list[kth].psi_IRS_to_kth_user = psi_IRS_to_kth_user\n            for i in range(IRS.IRS_N_y):\n                for j in range(IRS.IRS_N_x):\n                    temp_array[kth][i * IRS.IRS_N_x + j] = \\\n                    cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \\\n                    (i * math.sin(theta_IRS_to_kth_user)* math.cos(psi_IRS_to_kth_user)+ \\\n                    j * math.sin(theta_IRS_to_kth_user)* math.sin(psi_IRS_to_kth_user)) \\\n                    )\n        H_LOS = temp_array\n        for kth in range(self.K):\n            H_LOS[kth] = math.pow(self.channel_parameters.rho_0* (math.pow(np.linalg.norm(IRS.IRS_coordinate - self.user_list[kth].user_coordinate)/self.channel_parameters.d_0, - self.channel_parameters.alpha_BS_to_IRS)), 0.5) \\\n            * (H_LOS[kth])\n        \n        H = np.mat(H_LOS)\n        return H\n\n    def reset(self):\n        \"\"\"\n        RL implement, \n        \"\"\"\n        # 1 reset all parameters\n        self.G_beamforming = self.reset_G\n        self.Fai = self.reset_Fai\n        # 2 get state\n        return self.get_state()\n\n    def apply_action(self, action):\n        \"\"\"\n        apply action\n        \"\"\"\n        # 0 parameters and all kinds of matrix\n        K = self.K\n        M = self.M\n        N = self.N\n        # 1 apply action\n        # 1.1 divide action list into two parts\n        # 1.1.1 beamforming part, MXK\n        action_beamforming = action[0 : 2*M*K]\n        # 1.1.2 IRS reflecting coinfetions\n        action_Fai = action[2*M*K : 
2*M*K + 2*N]\n\n        # 1.2 apply action\n        # 1.2.1 beamforming part, MXK\n        for m in range(M):\n            for k in range(K):\n                index = m * K + k\n                temp_Gmk = action_beamforming[2*index] + 1j * action_beamforming[2*index + 1]\n                self.G_beamforming[m, k] = temp_Gmk\n        # 1.2.2 IRS reflecting coefficients\n        for n in range(N):\n            temp_Fainn = action_Fai[2*n] + 1j * action_Fai[2*n + 1]\n            self.Fai[n, n] = temp_Fainn\n\n        return True\n\n    def get_state(self):\n        \"\"\"\n        get current state\n        \"\"\"\n        # 0 system dimensions\n        K = self.K\n        M = self.M\n        N = self.N\n        new_state_dims = 2*K + 2*K**2 + 2*N + 2*M*K + 2*N*M + 2*K*N\n        \n        # 1 get new state\n        new_state = []\n        G = self.G_beamforming\n        G_H = self.G_beamforming.H\n        H_IRS_to_user = self.H_IRS_to_user\n        Fai = self.Fai\n        H_BS_to_IRS = self.H_BS_to_IRS\n        # 1.1 transmit power to K users, 2K\n        state_tran_power_BS_to_users = [] # result container\n        for kth in range(K):\n            state_tran_power_BS_to_users.append(\\\n            math.pow(float(np.real(G_H[kth,:] * G[:,kth])),2))\n            state_tran_power_BS_to_users.append(\\\n            math.pow(float(np.imag(G_H[kth,:] * G[:,kth])),2))\n        \n        # 1.2 received power for users, 2K^2\n        state_action_receive_power_for_users = []\n        for k1 in range(K):\n            user_k1_received_power_from_all_other_users = \\\n            H_IRS_to_user[k1, :] * Fai * H_BS_to_IRS * G\n            for k2 in range(K):\n                state_action_receive_power_for_users.append(\\\n                math.pow(np.real(user_k1_received_power_from_all_other_users[0, k2]), 2))\n                state_action_receive_power_for_users.append(\\\n                math.pow(np.imag(user_k1_received_power_from_all_other_users[0, k2]), 2))\n  
              \n        # 1.3 beamforming at BS and IRS reflecting coefficients, 2*M*K+2*N\n        state_beamforming_at_BS = []\n        for m in range(M):\n            for k in range(K):\n                state_beamforming_at_BS.append(\\\n                math.pow(np.real(G[m, k]), 2))\n                state_beamforming_at_BS.append(\\\n                math.pow(np.imag(G[m, k]), 2))\n        for n in range(N):\n            state_beamforming_at_BS.append(\\\n            math.pow(np.real(Fai[n, n]), 2))\n            state_beamforming_at_BS.append(\\\n            math.pow(np.imag(Fai[n, n]), 2))\n\n        # 1.4 channel state at both BS-IRS & IRS-users, 2*N*M+2*K*N\n        state_channel_BS_IRS_and_IRS_users = []\n        for n in range(N):\n            for m in range(M):\n                state_channel_BS_IRS_and_IRS_users.append(\\\n                math.pow(np.real(H_BS_to_IRS[n, m]), 2))\n                state_channel_BS_IRS_and_IRS_users.append(\\\n                math.pow(np.imag(H_BS_to_IRS[n, m]), 2))\n        for k in range(K):\n            for n in range(N):\n                state_channel_BS_IRS_and_IRS_users.append(\\\n                math.pow(np.real(H_IRS_to_user[k, n]), 2))\n                state_channel_BS_IRS_and_IRS_users.append(\\\n                math.pow(np.imag(H_IRS_to_user[k, n]), 2))\n        # 1.5 finish getting state\n        new_state.append(state_tran_power_BS_to_users)\n        new_state.append(state_action_receive_power_for_users)\n        new_state.append(state_beamforming_at_BS)\n        new_state.append(state_channel_BS_IRS_and_IRS_users)\n\n        result = state_tran_power_BS_to_users + \\\n        state_action_receive_power_for_users + \\\n        state_beamforming_at_BS + \\\n        state_channel_BS_IRS_and_IRS_users\n        return result\n        \n\n    def step(self, action):\n        \"\"\"\n        RL step function, state is a 1 X (2K+2K^2+2N+2MK+2NM+2KN) vector\n        \"\"\"\n        done = False\n        # 1 apply action\n        self.apply_action(action)\n        # 2 get new state (observed after the action has been applied)\n        new_state = 
self.get_state()\n        # 3 judge if done\n        total_power = self.calculate_total_transmit_power()\n        if total_power > self.BS_max_power:\n            done = True\n        # 4 compute reward: penalty if the power budget is exceeded, otherwise the scaled sum rate\n        reward = 0\n        if total_power > self.BS_max_power:\n            reward = np.real(self.BS_max_power - total_power)\n            # reward = np.real(self.BS_max_power[0,0] - total_power[0,0] )/1000\n        else:\n            reward = self.calculate_data_rate()*100\n\n        info = 0\n        return new_state, reward, done, info\n\n    def render(self):\n        \"\"\"\n        show the system (not implemented)\n        \"\"\"\n\n\n        return True\n     "
  },
  {
    "path": "load_and_plot.py",
    "content": "import matplotlib.pyplot as plt\nimport numpy as np\nimport cmath\nfrom scipy.io import loadmat, savemat\nimport pandas as pd\nimport os\nimport copy\nimport math\nimport csv\n\nimport argparse\n\n# get argument from user\nparser = argparse.ArgumentParser()\nparser.add_argument('--path', type = str, required = False, default=None, help='the path where the training/simulation data is stored')\nparser.add_argument('--ep-num', type = int, required = False, default=300, help='total number of episodes')\n\n\n# extract argument\nargs = parser.parse_args()\nSTORE_PATH = args.path\nEP_NUM = args.ep_num\n\n\n######################################################\n# new for energy \n# energy related parameters of rotary-wing UAV\n# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV\nP_i = 790.6715\nP_0 = 580.65\nU2_tip = (200) ** 2\ns = 0.05\nd_0 = 0.3\np = 1.225\nA = 0.79\ndelta_time = 0.1 #0.1/1000 #0.1ms\n\n# add ons hover veloctiy\n# based on https://www.intechopen.com/chapters/57483\nm = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F\ng = 9.81 # gravity\nT = m * g # thrust\nv_0 = (T / (A * 2 * p)) ** 0.5\n\ndef get_energy_consumption(v_t):\n    '''\n    arg\n    1) v_t = displacement per time slot\n    '''\n    energy_1 = P_0 \\\n                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \\\n                + 0.5 * d_0 * p * s * A * (abs(v_t))**3\n    \n    energy_2 = P_i * ((\n                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \\\n                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \\\n                ) ** 0.5)\n    \n    energy = delta_time * (energy_1 + energy_2)\n    return energy \n\nENERGY_MIN = get_energy_consumption(0.25)\nENERGY_MAX = get_energy_consumption(0)\n\n######################################################\n\n\n# modified from data_manager.py\ninit_data_file = 
'data/init_location.xlsx'\n\n\ndef read_init_location(entity_type = 'user', index = 0):\n    \"\"\"\n    read the initial (x, y, z) coordinate of one entity from its sheet in init_data_file\n    \"\"\"\n    if entity_type in ('user', 'attacker', 'RIS', 'RIS_norm_vec', 'UAV'):\n        sheet = pd.read_excel(init_data_file, sheet_name=entity_type)\n        return np.array([sheet['x'][index], sheet['y'][index], sheet['z'][index]])\n    else:\n        return None\n\n\n# load and plot everything\nclass LoadAndPlot(object):\n    \"\"\"\n    load data and plot 2022-07-22 16_16_26\n    \"\"\"\n    def __init__(self, store_path, \\\n                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \\\n                       ep_num = EP_NUM, step_num = 100): # RIS_ant_num = 16 (not true)\n\n        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']\n        self.store_path = store_path + '//'\n        self.user_num = user_num\n        self.attacker_num = attacker_num\n        self.RIS_ant_num = RIS_ant_num\n        self.ep_num = ep_num\n        self.step_num = step_num\n\n        self.all_steps = self.load_all_steps()\n\n\n    def load_one_ep(self, file_name):\n        m = loadmat(self.store_path + file_name)\n        return m\n\n\n    def load_all_steps(self):\n        result_dic = {}\n        result_dic.update({'reward':[]})\n\n        result_dic.update({'user_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['user_capacity'].append([])\n\n        result_dic.update({'secure_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['secure_capacity'].append([])\n\n        result_dic.update({'attaker_capacity':[]})\n        for i in range(self.attacker_num):\n            result_dic['attaker_capacity'].append([])\n        \n        result_dic.update({'RIS_elements':[]})\n        for i in range(self.RIS_ant_num):\n            result_dic['RIS_elements'].append([])\n\n        for ep_cnt in range(self.ep_num):\n            
mat_ep = self.load_one_ep(\"simulation_result_ep_\" + str(ep_cnt) + \".mat\")\n\n            one_ep_reward = mat_ep[\"result_\" + str(ep_cnt)][\"reward\"][0][0]\n            result_dic['reward'] += list(one_ep_reward[:, 0])\n\n            one_ep_user_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"user_capacity\"][0][0]\n            for i in range(self.user_num):\n                result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])\n            \n            one_ep_secure_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"secure_capacity\"][0][0]\n            for i in range(self.user_num):\n                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])\n            \n            one_ep_attaker_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"attaker_capacity\"][0][0]\n            for i in range(self.attacker_num):\n                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])\n\n            one_ep_RIS_elements = mat_ep[\"result_\" + str(ep_cnt)][\"reflecting_coefficient\"][0][0]\n            for i in range(self.RIS_ant_num):\n                result_dic['RIS_elements'][i] += list(one_ep_RIS_elements[:, i])\n\n        return result_dic\n\n\n    def plot(self):\n        \"\"\"\n        plot results\n        matplotlib color codes: b--blue c--cyan g--green k--black m--magenta r--red w--white y--yellow\n        \"\"\"\n        # create plot/ and plot/RIS/ under store_path if they do not exist yet\n        os.makedirs(self.store_path + 'plot/RIS', exist_ok=True)\n\n        color_list = ['b', 'g', 'c', 'k', 'm', 'r', 'y']\n        \n\n        ###############################\n        # read step counts per episode\n        ###############################\n        step_num_per_episode = []\n        with open(self.store_path + 'step_num_per_episode.csv', newline='') as csvfile:\n            reader = csv.reader(csvfile)\n            for row in reader:\n                
step_num_per_episode.append(int(row[0]))\n        \n        ###############################\n        # plot reward\n        ###############################\n        fig = plt.figure('reward')\n        plt.plot(range(len(self.all_steps['reward'])), self.all_steps['reward'])\n        plt.xlabel(\"Time Steps ($t$)\")\n        plt.ylabel(\"Reward\")\n        plt.savefig(self.store_path + 'plot/reward.png')\n        plt.cla()\n        \n        \n        ###############################\n        # plot secure capacity\n        ###############################\n        fig = plt.figure('secure_capacity')\n        for i in range(self.user_num):\n            plt.plot(range(len(self.all_steps['secure_capacity'][i])), self.all_steps['secure_capacity'][i], c=color_list[i])\n        plt.legend(['user_' + str(i) for i in range(self.user_num)])\n        plt.xlabel(\"Time Steps ($t$)\")\n        plt.ylabel(\"Secure Capacity\")\n        plt.savefig(self.store_path + 'plot/secure_capacity.png')\n        plt.cla()\n\n        \n        ###############################\n        # plot average sum secrecy rate of each episode\n        ###############################\n        fig = plt.figure('average_sum_secrecy_rate')\n        sum_secrecy_rate = np.array(self.all_steps['secure_capacity'])\n        sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0)\n        average_sum_secrecy_rate = []\n        ssr = []\n        j = 0\n        for i in range(self.ep_num):\n            # ssr means Sum Secrecy Rate; split the step-wise series into episodes\n            ssr_one_episode = sum_secrecy_rate[j:j+step_num_per_episode[i]]\n            j = j+step_num_per_episode[i]\n            ssr.append(ssr_one_episode)\n            try:\n                average_ssr = sum(ssr_one_episode) / len(ssr_one_episode)\n            except ZeroDivisionError:  # episode with no steps\n                average_ssr = 0\n            average_sum_secrecy_rate.append(average_ssr)\n        plt.plot(range(len(average_sum_secrecy_rate)), average_sum_secrecy_rate)\n        plt.xlabel(\"Episodes (Ep)\")\n   
     plt.ylabel(\"Average Sum Secrecy Rate\")\n        plt.savefig(self.store_path + 'plot/average_sum_secrecy_rate.png')\n        plt.cla()\n\n        print()\n        print('###########################################################')\n        print('Metrics\\t\\t\\tLast Episode\\tMax Values Reached')\n        print('###########################################################')\n        print('SSR (bits/s/Hz)\\t\\t{:.2f}\\t\\t{:.2f}'.format(average_sum_secrecy_rate[-1], max(average_sum_secrecy_rate)))\n        \n\n        ###############################\n        # plot secrecy energy efficient\n        ###############################\n        fig = plt.figure('average_secrecy_energy_efficiency')\n\n        # get init location\n        init_uav_coord = read_init_location(entity_type = 'UAV')\n        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)\n        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)\n        \n        ep_num = EP_NUM\n        energies = []\n        for i in range(ep_num):\n            # read the mat file\n            filename = f'simulation_result_ep_{i}.mat'\n            filename = os.path.join(STORE_PATH, filename)\n            data = loadmat(filename)\n        \n            # v_ts\n            energies_one_episode = []\n        \n            # loop all uav movt\n            uav_movt = data[f'result_{i}'][0][0][-1]\n            for j in range(uav_movt.shape[0]):\n                move_x = uav_movt[j][0]\n                move_y = uav_movt[j][1]\n                v_t = (move_x ** 2 + move_y ** 2) ** 0.5\n                energy = get_energy_consumption(v_t / delta_time)\n                energies_one_episode.append(energy)\n            energies.append(energies_one_episode)\n                \n        average_see = []\n        for ssr_one_episode, energies_one_episode in zip(ssr, energies):\n            ssr_one_episode = ssr_one_episode[:len(energies_one_episode)]\n            energies_one_episode = 
energies_one_episode[:len(ssr_one_episode)]\n            \n            try:\n                see = np.array(ssr_one_episode) / np.array(energies_one_episode)\n                average_see.append(sum(see)/len(see))\n            except ZeroDivisionError: # empty episode\n                average_see.append(0)\n\n        \n        plt.plot(range(len(average_see)), average_see)\n        plt.xlabel(\"Episodes (Ep)\")\n        plt.ylabel(\"Average Secrecy Energy Efficiency\")\n        plt.savefig(self.store_path + 'plot/average_secrecy_energy_efficiency.png')\n        plt.cla()\n        \n        # energies are in J and SEE in bits/s/Hz/J, so both are rescaled to kJ here;\n        # the second Energy column is the energy of the episode with the highest SEE\n        print('Energy (kJ)\\t\\t{:.2f}\\t\\t{:.2f}'.format(sum(energies[-1])/1000, sum(energies[np.argmax(average_see)])/1000))\n        print('SEE (bits/s/Hz/kJ)\\t{:.2f}\\t\\t{:.2f}'.format(average_see[-1]*1000, max(average_see)*1000))\n        print('\\nThe final performance is evaluated on the Last Episode (where the exploration noise has decayed to ~0)\\n')\n        \n        \n        ###############################\n        # plot user capacity\n        ###############################\n        fig = plt.figure('user_capacity')\n        for i in range(self.user_num):\n            plt.plot(range(len(self.all_steps['user_capacity'][i])), self.all_steps['user_capacity'][i], c=color_list[i])\n        plt.legend(['user_' + str(i) for i in range(self.user_num)])\n        plt.xlabel(\"Time Steps ($t$)\")\n        plt.ylabel(\"User Capacity\")\n        plt.savefig(self.store_path + 'plot/user_capacity.png')\n        plt.cla()\n\n        \n        ###############################\n        # plot attacker capacity\n        ###############################\n        fig = plt.figure('attaker_capacity')\n        for i in range(self.attacker_num):\n            plt.plot(range(len(self.all_steps['attaker_capacity'][i])), self.all_steps['attaker_capacity'][i], c=color_list[i])\n        plt.legend(['attacker_' + str(i) for i in range(self.attacker_num)])\n        plt.xlabel(\"Time Steps ($t$)\")\n        plt.ylabel(\"Attacker Capacity\")\n        plt.savefig(self.store_path + 'plot/attaker_capacity.png')\n        plt.close('all')\n        \n        \n        ###############################\n        # plot ris\n        ###############################\n        for i in range(self.RIS_ant_num):\n            self.plot_one_RIS_element(i)\n            \n        \n        ###############################\n        # plot trajectory\n        ###############################\n        self.plot_trajectory()\n\n    \n    def plot_one_RIS_element(self, index):\n        \"\"\"\n        plot the reflection coefficient of RIS element `index` over all time steps:\n        real and imaginary parts on the left axis, phase (rad, limited to [-pi, pi]) on the right axis\n        \"\"\"\n        coefficients = self.all_steps['RIS_elements'][index]\n        steps = range(len(coefficients))\n        ax_real_imag = plt.subplot(1,1,1)\n        ax_phase = ax_real_imag.twinx()\n        ax_real_imag.plot(steps, np.real(coefficients), c = self.color_list[0])\n        ax_real_imag.plot(steps, np.imag(coefficients), c = self.color_list[1])\n        phase_list = [cmath.phase(complex_num) for complex_num in coefficients]\n        ax_phase.set_ylim(-cmath.pi, cmath.pi)\n        ax_phase.plot(steps, phase_list, c = self.color_list[2])\n        plt.savefig(self.store_path + 'plot/RIS/RIS_' + str(index) + '_element.png')\n        plt.close('all')\n\n        \n    def plot_trajectory(self):\n        # get init location\n        init_uav_coord = read_init_location(entity_type = 'UAV')\n        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)\n        init_user_coord_1 = 
read_init_location(entity_type = 'user', index=1)\n        \n        # episodes to draw: episode 0, then episodes 19, 19 + interval, ..., and always the last episode\n        ep_num = EP_NUM\n        interval = int(0.2 * EP_NUM)\n        ep_list = [0] + [i for i in range(20-1, ep_num, interval)]\n        if EP_NUM - 1 not in ep_list: ep_list.append(EP_NUM - 1)\n        color_list_template = ['b', 'g', 'c', 'k', 'm', 'r', 'y', 'black', 'red']\n        \n        \n        color_list = copy.deepcopy(color_list_template)\n        for i in ep_list:\n            # read the mat file\n            filename = f'simulation_result_ep_{i}.mat'\n            filename = os.path.join(STORE_PATH, filename)\n            data = loadmat(filename)\n        \n            # integrate the per-step UAV displacement starting from the initial position\n            uav_coord = [ [init_uav_coord[0]], [init_uav_coord[1]] ]\n        \n            uav_movt = data[f'result_{i}'][0][0][-1]\n            for j in range(uav_movt.shape[0]):\n                move_x = uav_movt[j][0]\n                move_y = uav_movt[j][1]\n        \n                prev_x = uav_coord[0][-1]\n                prev_y = uav_coord[1][-1]\n        \n                current_x = prev_x + move_x\n                current_y = prev_y + move_y\n        \n                uav_coord[0].append(current_x)\n                uav_coord[1].append(current_y)\n            # note: y is drawn on the horizontal axis and x on the (inverted) vertical axis\n            plt.plot(uav_coord[1],uav_coord[0], c=color_list.pop(0))\n        \n        # user movement: each user moves distance_delta_d per step in direction direction_fai,\n        # for as many steps as the last plotted UAV trajectory; the red dot marks the start position\n        direction_fai = -1/2*math.pi\n        distance_delta_d = 0.2\n        for init_user_coord in [init_user_coord_0, init_user_coord_1]:\n            user_coord = [ [init_user_coord[0]], [init_user_coord[1]] ]\n            for j in range(uav_movt.shape[0]):\n                user_coord[0].append(user_coord[0][-1] + distance_delta_d * math.cos(direction_fai))\n                user_coord[1].append(user_coord[1][-1] + distance_delta_d * math.sin(direction_fai))\n            plt.plot(user_coord[1], user_coord[0], c=color_list.pop(0))\n            plt.plot(user_coord[1][0], user_coord[0][0], marker=\"o\", markersize=10, markeredgecolor=\"red\", markerfacecolor=\"red\")\n        \n        plt.legend(ep_list)\n        plt.grid()\n        plt.xlim(0, 50)\n        plt.ylim(-10, 30)\n        plt.gca().invert_yaxis()\n        plt.savefig(self.store_path + 'plot/trajectory.png')\n        plt.cla()\n\n\n    def restruct(self):\n        # save all loaded steps into a single .mat file\n        savemat(self.store_path + 'all_steps.mat',self.all_steps)\n        return 0\n\n\nif __name__ == '__main__':\n    LoadPlotObject = LoadAndPlot(\n        store_path = STORE_PATH,\n        )\n    LoadPlotObject.plot()\n    LoadPlotObject.restruct()\n\n    \n\n"
  },
  {
    "path": "main_RIS.py",
    "content": "from env import *\r\n\r\n# scratch check of the replay-buffer size formula used in main_train.py (memory_max_size)\r\nepisode_num = 100\r\nstep_num = 100\r\na = int(5/5 * episode_num * step_num)\r\nprint(a)"
  },
  {
    "path": "main_ref.py",
    "content": "from ddpg import Agent\nfrom env1 import minimal_IRS_system # original\nfrom env import MiniSystem\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport os\nfrom data_manager import DataManager\nos.environ[\"KMP_DUPLICATE_LIB_OK\"]=\"TRUE\"\n\nIRS_system = minimal_IRS_system(K = 1)\n#IRS_system = MiniSystem()\nK = IRS_system.K\nM = IRS_system.M\nN = IRS_system.N\nRL_state_dims = 2*K + 2*K**2 + 2*N + 2*M*K + 2*N*M + 2*K*N\nRL_input_dims = RL_state_dims\nRL_action_dims = 2 * (M * K) + N\n\nsteps_per_ep = 200\nalpha_actor_learning_rate = 0.001\nbeta_critic_learning_rate = 0.001\nagent = Agent(alpha=alpha_actor_learning_rate, beta=beta_critic_learning_rate, input_dims=[RL_input_dims], tau=0.001, env=IRS_system,\n              batch_size=64,  layer1_size=400 * 2, layer2_size=300 * 2, n_actions=RL_action_dims)\n\nscores = []\nfor i in range(1000):\n    observersion = IRS_system.reset()\n    done = False\n    done_sys = False\n    score = 0\n    cnt_in_one_epi = 0\n    best_bit_per_Hz = 0\n    draw_bit_rate_list = []\n    draw_tran_power_list = []\n    while not done:\n        cnt_in_one_epi += 1\n        if cnt_in_one_epi > 500: # hard cap on episode length (steps_per_ep is not used here)\n            done = True\n        action = agent.choose_action(observersion)\n        new_state, reward, done_sys , info = IRS_system.step(action)\n\n        bit_per_Hz = IRS_system.calculate_data_rate()\n        draw_bit_rate_list.append(bit_per_Hz)\n\n        total_power = IRS_system.calculate_total_transmit_power()\n        draw_tran_power_list.append(total_power)\n        if done_sys == False: # only track the best rate while the max transmit power is respected\n            if bit_per_Hz > best_bit_per_Hz:\n                best_bit_per_Hz = bit_per_Hz\n        agent.remember(observersion, action, reward, new_state, int(done))\n        agent.learn()\n        score += reward\n        observersion = new_state\n        IRS_system.render()\n    # plot the per-step data rate (green) and total transmit power (red) of this episode\n    plt.cla()\n    plt.plot(range(len(draw_bit_rate_list)), draw_bit_rate_list, color = 'green')\n    plt.plot(range(len(draw_tran_power_list)), draw_tran_power_list, color = 'red')\n\n    filename_i = os.path.join(os.path.abspath(os.curdir), 'main_foder', 'image_result', str(i) + '.png')\n    plt.savefig(filename_i)\n    scores.append(score)\n    #if i % 25 == 0:\n        #agent.save_models()\n\n    print('episode ', i, 'score %.2f' % score, 'best sum rate %.3f bit/s/Hz' % best_bit_per_Hz,\n          'trailing 100 episodes avg %.4f' % np.mean(scores[-100:]))\n"
  },
  {
    "path": "main_train.py",
    "content": "# allow duplicate OpenMP runtimes (avoids the 'OMP: Error #15 ... already initialized' crash when several libraries ship their own copy)\nimport os\nos.environ[\"KMP_DUPLICATE_LIB_OK\"]=\"TRUE\"\n\nimport argparse\n\n# get arguments from the user\nparser = argparse.ArgumentParser()\nparser.add_argument('--drl', type = str, required = True, help=\"which DRL algorithm to use ['ddpg', 'td3']\")\nparser.add_argument('--reward', type = str, required = True, help=\"which reward to optimize ['ssr', 'see']\")\nparser.add_argument('--seeds', type = int, required = False, default=None,  nargs='+', help=\"seed(s) for initializing DRL 1 and DRL 2; provide one or two integers\")\nparser.add_argument('--ep-num', type = int, required = False, default=300, help=\"number of training episodes\")\nparser.add_argument('--trained-uav', default=False, action='store_true', help='use the pretrained UAV trajectory agent instead of retraining it')\n\nargs = parser.parse_args()\nDRL_ALGO = args.drl\nREWARD_DESIGN = args.reward\nSEEDS = args.seeds\nEPISODE_NUM = args.ep_num\nTRAINED_UAV = args.trained_uav\n\n# validate the arguments\nassert DRL_ALGO in ['ddpg', 'td3'], \"drl must be ['ddpg', 'td3']\"\nassert REWARD_DESIGN in ['ssr', 'see'], \"reward must be ['ssr', 'see']\"\nif SEEDS is not None:\n    assert len(SEEDS) in [1, 2] and isinstance(SEEDS[0], int) and isinstance(SEEDS[-1], int), \"seeds must be a list of 1 or 2 integers\"\n\nif DRL_ALGO == 'td3':\n    from td3 import Agent\nelif DRL_ALGO == 'ddpg':\n    from ddpg import Agent\n\nfrom env import MiniSystem\nimport numpy as np\nimport math\nimport time\nimport torch\n\n# 1 init system model\nepisode_num = EPISODE_NUM # recommended: 300\nepisode_cnt = 0\nstep_num = 100\n\nproject_name = f'trained_uav/{DRL_ALGO}_{REWARD_DESIGN}' if TRAINED_UAV else f'scratch/{DRL_ALGO}_{REWARD_DESIGN}'\n\nsystem = MiniSystem(\n    user_num=2,\n    RIS_ant_num=4,\n    UAV_ant_num=4,\n    if_dir_link=1,\n    if_with_RIS=True,\n    if_move_users=True,\n    if_movements=True,\n    
reverse_x_y=(False, False),\n    if_UAV_pos_state = True,\n    reward_design = REWARD_DESIGN,\n    project_name = project_name,\n    step_num = step_num\n    )\n\nif_Theta_fixed = False\nif_G_fixed = False\nif_BS = False\nif_robust = True\n\n\n# 2 init RL Agent\nagent_1_param_dic = {}\nagent_1_param_dic[\"alpha\"] = 0.0001\nagent_1_param_dic[\"beta\"] = 0.001\nagent_1_param_dic[\"input_dims\"] = system.get_system_state_dim()\nagent_1_param_dic[\"tau\"] = 0.001\nagent_1_param_dic[\"batch_size\"] = 64\nagent_1_param_dic[\"n_actions\"] = system.get_system_action_dim() - 2\nagent_1_param_dic[\"action_noise_factor\"] = 0.1\nagent_1_param_dic[\"memory_max_size\"] = int(5/5 * episode_num * step_num) #/2\nagent_1_param_dic[\"agent_name\"] = \"G_and_Phi\"\nagent_1_param_dic[\"layer1_size\"] = 800\nagent_1_param_dic[\"layer2_size\"] = 600\nagent_1_param_dic[\"layer3_size\"] = 512\nagent_1_param_dic[\"layer4_size\"] = 256\n\nagent_2_param_dic = {}\nagent_2_param_dic[\"alpha\"] = 0.0001\nagent_2_param_dic[\"beta\"] = 0.001\nagent_2_param_dic[\"input_dims\"] = 3\nagent_2_param_dic[\"tau\"] = 0.001\nagent_2_param_dic[\"batch_size\"] = 64\nagent_2_param_dic[\"n_actions\"] = 2\nagent_2_param_dic[\"action_noise_factor\"] = 0.5\nagent_2_param_dic[\"memory_max_size\"] = int(5/5 * episode_num * step_num) #/2\nagent_2_param_dic[\"agent_name\"] = \"UAV\"\nagent_2_param_dic[\"layer1_size\"] = 400\nagent_2_param_dic[\"layer2_size\"] = 300\nagent_2_param_dic[\"layer3_size\"] = 256\nagent_2_param_dic[\"layer4_size\"] = 128\n\nif SEEDS is not None:\n    torch.manual_seed(SEEDS[0]) # 1\n    torch.cuda.manual_seed_all(SEEDS[0]) # 1\nagent_1 = Agent(\n    alpha       = agent_1_param_dic[\"alpha\"],\n    beta        = agent_1_param_dic[\"beta\"],\n    input_dims  = [agent_1_param_dic[\"input_dims\"]],\n    tau         = agent_1_param_dic[\"tau\"],\n    env         = system,\n    batch_size  = agent_1_param_dic[\"batch_size\"],\n    layer1_size=agent_1_param_dic[\"layer1_size\"],\n    
layer2_size=agent_1_param_dic[\"layer2_size\"], \n    layer3_size=agent_1_param_dic[\"layer3_size\"],\n    layer4_size=agent_1_param_dic[\"layer4_size\"],\n    n_actions   = agent_1_param_dic[\"n_actions\"],\n    max_size = agent_1_param_dic[\"memory_max_size\"],\n    agent_name= agent_1_param_dic[\"agent_name\"]\n    ) \n\nif SEEDS is not None:\n    torch.manual_seed(SEEDS[-1]) # 2\n    torch.cuda.manual_seed_all(SEEDS[-1]) # 2\nagent_2 = Agent(\n    alpha       = agent_2_param_dic[\"alpha\"],\n    beta        = agent_2_param_dic[\"beta\"],\n    input_dims  = [agent_2_param_dic[\"input_dims\"]],\n    tau         = agent_2_param_dic[\"tau\"],\n    env         = system,\n    batch_size  = agent_2_param_dic[\"batch_size\"],\n    layer1_size=agent_2_param_dic[\"layer1_size\"],\n    layer2_size=agent_2_param_dic[\"layer2_size\"], \n    layer3_size=agent_2_param_dic[\"layer3_size\"],\n    layer4_size=agent_2_param_dic[\"layer4_size\"],\n    n_actions   = agent_2_param_dic[\"n_actions\"],\n    max_size = agent_2_param_dic[\"memory_max_size\"],\n    agent_name= agent_2_param_dic[\"agent_name\"]\n    ) \n\n\nif TRAINED_UAV:\n    benchmark = f'data/storage/benchmark/{DRL_ALGO}_{REWARD_DESIGN}_benchmark'\n    if DRL_ALGO == 'td3':\n        agent_2.load_models(\n             load_file_actor = benchmark + '/Actor_UAV_TD3',\n             load_file_critic_1 = benchmark + '/Critic_1_UAV_TD3',\n             load_file_critic_2 = benchmark + '/Critic_2_UAV_TD3'\n             )\n    elif DRL_ALGO == 'ddpg':\n        agent_2.load_models(\n             load_file_actor = benchmark + '/Actor_UAV_ddpg',\n             load_file_critic = benchmark + '/Critic_UAV_ddpg'\n             )\n\nmeta_dic = {}\nprint(\"***********************system information******************************\")\nprint(\"folder_name:     \"+str(system.data_manager.store_path))\nmeta_dic['folder_name'] = system.data_manager.store_path\nprint(\"user_num:        \"+str(system.user_num))\nmeta_dic['user_num'] = 
system.user_num\nprint(\"if_dir:          \"+str(system.if_dir_link))\nmeta_dic['if_dir_link'] = system.if_dir_link\nprint(\"if_with_RIS:     \"+str(system.if_with_RIS))\nmeta_dic['if_with_RIS'] = system.if_with_RIS\nprint(\"if_user_m:       \"+str(system.if_move_users))\nmeta_dic['if_move_users'] = system.if_move_users\nprint(\"RIS_ant_num:     \"+str(system.RIS.ant_num))\nmeta_dic['system_RIS_ant_num'] = system.RIS.ant_num\nprint(\"UAV_ant_num:     \"+str(system.UAV.ant_num))\nmeta_dic['system_UAV_ant_num'] = system.UAV.ant_num\nprint(\"if_movements:    \"+str(system.if_movements))\nmeta_dic['system_if_movements'] = system.if_movements\nprint(\"reverse_x_y:     \"+str(system.reverse_x_y))\nmeta_dic['system_reverse_x_y'] = system.reverse_x_y\nprint(\"if_UAV_pos_state:\"+str(system.if_UAV_pos_state))\nmeta_dic['if_UAV_pos_state'] = system.if_UAV_pos_state\n\nprint(\"ep_num:          \"+str(episode_num))\nmeta_dic['episode_num'] = episode_num\nprint(\"step_num:        \"+str(step_num))\nmeta_dic['step_num'] = step_num\nprint(\"***********************agent_1 information******************************\")\n# two centered columns of width 20, padded with the ideographic space chr(12288)\ntplt = \"{0:{2}^20}\\t{1:{2}^20}\"\nfor i in agent_1_param_dic:\n    parm = agent_1_param_dic[i]\n    print(tplt.format(i, parm, chr(12288)))\nmeta_dic[\"agent_1\"] = agent_1_param_dic\n\nprint(\"***********************agent_2 information******************************\")\nfor i in agent_2_param_dic:\n    parm = agent_2_param_dic[i]\n    print(tplt.format(i, parm, chr(12288)))\nmeta_dic[\"agent_2\"] = agent_2_param_dic\n\nsystem.data_manager.save_meta_data(meta_dic)\n\nprint(\"***********************training information******************************\")\n\n\nwhile episode_cnt < episode_num:\n    # 1 reset the whole system\n    system.reset()\n    step_cnt = 0\n    score_per_ep = 0\n\n    # 2 get the initial state; with if_robust, agent 1's observation is perturbed by small Gaussian noise (imperfect CSI)\n    if if_robust:\n        tmp = system.observe()\n        z = np.random.normal(size=len(tmp))\n        observersion_1 = list(\n            np.array(tmp) + 0.6 *1e-7* z\n            )\n    else:\n        observersion_1 = system.observe()\n    observersion_2 = list(system.UAV.coordinate)\n    while step_cnt < step_num:\n        # 1 count the steps in one episode\n        step_cnt += 1\n        # skip the step while the renderer is paused\n        if not system.render_obj.pause:\n            # 2 choose actions according to the current states; the exploration noise decays quadratically over the episodes\n            action_1 = agent_1.choose_action(observersion_1, greedy=agent_1_param_dic[\"action_noise_factor\"] * math.pow((1-episode_cnt / episode_num), 2))\n            action_2 = agent_2.choose_action(observersion_2, greedy=agent_2_param_dic[\"action_noise_factor\"]* math.pow((1-episode_cnt / episode_num), 2))\n            # if_BS: zero the UAV movement actions\n            if if_BS:\n                action_2[0]=0\n                action_2[1]=0\n\n            # action_1 layout: the first 2 * UAV_ant_num * user_num entries are the beamforming G (real/imag pairs), the rest are the RIS phases Phi\n            if if_Theta_fixed:\n                action_1[0+2 * system.UAV.ant_num * system.user_num:] = len(action_1[0+2 * system.UAV.ant_num * system.user_num:])*[0]\n                \n            # if_G_fixed: replace the beamforming actions with a fixed hand-picked vector\n            if if_G_fixed:\n                action_1[0:0+2 * system.UAV.ant_num * system.user_num]=np.array([-0.0313, -0.9838, 0.3210, 1.0, -0.9786, -0.1448, 0.3518, 0.5813, -1.0, -0.2803, -0.4616, -0.6352, -0.1449, 0.7040, 0.4090, -0.8521]) * math.pow(episode_cnt / episode_num, 2) * 0.7\n            # 3 get the new state and reward\n            if system.if_with_RIS:\n                new_state_1, reward, done, info = system.step(\n                    action_0=action_2[0],\n                    action_1=action_2[1],\n                    G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],\n                    Phi=action_1[0+2 * system.UAV.ant_num * system.user_num:],\n                    set_pos_x=action_2[0],\n                  
  set_pos_y=action_2[1]\n                )\n                new_state_2 = list(system.UAV.coordinate)\n            else:\n                new_state_1, reward, done, info = system.step(\n                    action_0=action_2[0],\n                    action_1=action_2[1],\n                    G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],\n                    set_pos_x=action_2[0],\n                    set_pos_y=action_2[1]\n                )\n                new_state_2 = list(system.UAV.coordinate)\n\n            score_per_ep += reward\n            # 4 store the transition in each agent's replay buffer (both agents receive the same reward)\n            agent_1.remember(observersion_1, action_1, reward, new_state_1, int(done))\n            agent_2.remember(observersion_2, action_2, reward, new_state_2, int(done))\n            # 5 update the DRL networks (DDPG or TD3); the UAV agent stays frozen when --trained-uav is set\n            agent_1.learn()\n            if not TRAINED_UAV:\n                agent_2.learn()\n\n            #system.render_obj.render(0.001) # rendering disabled for faster training\n            observersion_1 = new_state_1\n            observersion_2 = new_state_2\n            if done:\n                break\n            \n        else:\n            #system.render_obj.render_pause()  # rendering disabled for faster training\n            time.sleep(0.001)\n    system.data_manager.save_file(episode_cnt=episode_cnt)\n    system.reset()\n    print(\"ep_num: \"+str(episode_cnt)+\"   ep_score:  \"+str(score_per_ep))\n    episode_cnt +=1\n    # checkpoint both agents every 10 episodes\n    if episode_cnt % 10 == 0:\n        agent_1.save_models()\n        agent_2.save_models()\n\n# save the last models\nagent_1.save_models()\nagent_2.save_models()\n"
  },
  {
    "path": "math_tool.py",
    "content": "import math\nimport cmath\nimport numpy as np\nimport pandas as pd\nfrom numpy import *\nfrom matplotlib import pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom matplotlib.patches import FancyArrowPatch\nfrom mpl_toolkits.mplot3d import proj3d\n\ndef cartesian_coordinate_to_spherical_coordinate(cartesian_coordinate):\n    \"\"\"\n    convert a cartesian coordinate to a spherical coordinate\n    input:  1 X 3 np.array [x, y, z]\n    output: tuple (r, theta, fai), where r = ||[x, y, z]||, theta = atan(sqrt(x^2 + y^2) / z)\n            and fai is the azimuth angle in (-pi, pi)\n    near-zero z, x and y are replaced by 1e-8 to avoid division by zero\n    \"\"\"\n    r = np.linalg.norm(cartesian_coordinate)\n    if abs(cartesian_coordinate[2]) < 1e-8:\n        theta = math.atan(np.linalg.norm(cartesian_coordinate[0:2])/1e-8)\n    else:\n        theta = math.atan(np.linalg.norm(cartesian_coordinate[0:2])/cartesian_coordinate[2])\n    \n    if abs(cartesian_coordinate[0]) < 1e-8:\n        x = 1e-8\n    else:\n        x = cartesian_coordinate[0]\n\n    y = cartesian_coordinate[1]    \n    if abs(y) < 1e-8:\n        y = 1e-8\n\n    if y > 0 and x > 0:\n        fai = math.atan(y/x)\n    elif x < 0 and y > 0:\n        fai = math.atan(y/x) + math.pi\n    elif x < 0 and y < 0:\n        fai = math.atan(y/x) - math.pi\n    else:\n        fai = math.atan(y/x)\n    return r, theta, fai\n\ndef vecter_normalization(cartesian_coordinate):\n    return cartesian_coordinate/np.linalg.norm(cartesian_coordinate)\n\ndef get_coor_ref(coor_sys, coor):\n    \"\"\"\n    input:  coor_sys: list of three normalized 1 X 3 np.array basis vectors (x, y, z axes)\n            coor: coordinate under the earth system\n    output: 1 X 3 np.array holding the projection of coor onto each basis vector\n    \"\"\"\n    x_ref = np.dot(coor_sys[0],coor)\n    y_ref = np.dot(coor_sys[1],coor)\n    z_ref = np.dot(coor_sys[2],coor)\n    return np.array([x_ref, y_ref, z_ref])\n\ndef dB_to_normal(dB):\n    \"\"\"\n    input: dB value\n    output: linear (normal) value\n    \"\"\"\n    return math.pow(10, (dB/10))\n\ndef normal_to_dB(normal):\n    \"\"\"\n    input: linear (normal) value\n    output: the negated dB value, i.e. -10 * log10(normal)\n    \"\"\"\n    return -10 * math.log10(normal)\n\ndef diag_to_vector(diag):\n    \"\"\"\n    convert a diagonal matrix into an N X 1 column vector of its diagonal entries\n    \"\"\"\n    vec_size = np.shape(diag)[0]\n    vector = np.asmatrix(np.zeros((vec_size, 1), dtype=complex), dtype=complex)\n    for i in range(vec_size):\n        vector[i, 0] = diag[i, i]\n    return vector\n    \ndef vector_to_diag(vector):\n    \"\"\"\n    convert an N X 1 column vector into an N X N diagonal matrix\n    \"\"\"\n    vec_size = np.shape(vector)[0]\n    diag = np.asmatrix(np.zeros((vec_size, vec_size), dtype=complex), dtype=complex)\n    for i in range(vec_size):\n        diag[i, i] = vector[i, 0]\n    return diag\n\ndef bigger_than_zero(value):\n    \"\"\"\n    max(0,value)\n    \"\"\"\n    return max(0, value)\n\ndef dataframe_to_dictionary(df):\n    \"\"\"\n    convert a DataFrame into a dict mapping each column name to its values (np.ndarray)\n    \"\"\"\n    return {col_name : df[col_name].values for col_name in df.columns.values}\n\ndef convert_list_to_complex_matrix(list_real, shape):\n    \"\"\"\n    list_real is a 2*N*K-long list of (real, imag) pairs in row-major order; convert it to an N X K complex matrix\n    shape is the tuple (N, K)\n    \"\"\"\n    N = shape[0]\n    K = shape[1]\n    matrix_complex = np.asmatrix(np.zeros((N, K), dtype=complex), dtype=complex)\n    for i in range(N):\n        for j in range(K):\n            matrix_complex[i, j] = list_real[2*(i*K + j)] + 1j * list_real[2*(i*K + j) + 1]\n            \n    return matrix_complex\n\ndef convert_list_to_complex_diag(list_real, diag_row_num):\n    \"\"\"\n    list_real is an M-long list of phases normalized by pi; entry i becomes exp(j * pi * list_real[i])\n    on the diagonal of an M X M complex matrix\n    diag_row_num is M\n    \"\"\"\n    M = diag_row_num\n    diag_matrix_complex = np.asmatrix(np.zeros((M, M), dtype=complex), dtype=complex)\n    for i in range(M):\n        diag_matrix_complex[i, i] = cmath.exp(1j * list_real[i] * math.pi)\n    return diag_matrix_complex\n\ndef map_to(x, x_range:tuple, y_range:tuple):\n    \"\"\"\n    linearly map x from x_range = (x_min, x_max) onto y_range = (y_min, y_max)\n    \"\"\"\n    x_min = x_range[0]\n    x_max = x_range[1]\n    y_min = y_range[0]\n    y_max = y_range[1]\n    y = y_min+(y_max - y_min) / (x_max - x_min) * (x - x_min)\n    return y\n\n"
  },
  {
    "path": "plot_see.py",
    "content": "import matplotlib as mpl\nmpl.rcParams['figure.dpi'] = 300\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cmath\nfrom scipy.io import loadmat, savemat\nimport pandas as pd\nimport os\nimport copy\nimport math\n\n\n######################################################\n# energy related parameters of the rotary-wing UAV\n# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV\nP_i = 790.6715 # induced power in hover (W)\nP_0 = 580.65 # blade profile power in hover (W)\nU2_tip = (200) ** 2 # squared rotor blade tip speed\ns = 0.05 # rotor solidity\nd_0 = 0.3 # fuselage drag ratio\np = 1.225 # air density (kg/m^3)\nA = 0.79 # rotor disc area (m^2)\ndelta_time = 0.1 # time-slot duration used to convert power (W) into energy (J)\n\n# add-on: mean rotor induced velocity in hover\n# based on https://www.intechopen.com/chapters/57483\nm = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F\ng = 9.81 # gravity\nT = m * g # thrust\nv_0 = (T / (A * 2 * p)) ** 0.5\n\ndef get_energy_consumption(v_t):\n    '''\n    energy consumed by the rotary-wing UAV in one time slot\n    arg\n    1) v_t = flight speed, i.e. the displacement per time slot divided by delta_time\n    returns delta_time * propulsion power at speed v_t\n    '''\n    energy_1 = P_0 \\\n                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \\\n                + 0.5 * d_0 * p * s * A * (abs(v_t))**3\n    \n    energy_2 = P_i * ((\n                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \\\n                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \\\n                ) ** 0.5)\n    \n    energy = delta_time * (energy_1 + energy_2)\n    return energy \n\nENERGY_MIN = get_energy_consumption(0.25)\nENERGY_MAX = get_energy_consumption(0)\n\n######################################################\n\n\n# modified from data_manager.py\ninit_data_file = 'data/init_location.xlsx'\ndef read_init_location(entity_type = 'user', index = 0):\n    if entity_type in ('user', 'attacker', 'RIS', 'RIS_norm_vec', 'UAV'):\n        return np.array([\\\n        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\\\n        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\\\n        pd.read_excel(init_data_file, 
sheet_name=entity_type)['z'][index]])\n    else:\n        return None\n\n\n# load and plot everything\nclass LoadAndPlot(object):\n    \"\"\"\n    load date and plot 2022-07-22 16_16_26\n    \"\"\"\n    def __init__(self, store_paths, \\\n                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \\\n                       ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true)\n\n        self.store_paths = store_paths\n        \n        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']\n#        self.store_paths = store_paths + '//'\n        self.user_num = user_num\n        self.attacker_num = attacker_num\n        self.RIS_ant_num = RIS_ant_num\n        self.ep_num = ep_num\n        self.step_num = step_num\n\n\n    def load_one_ep(self, file_name):\n        m = loadmat(self.store_path + file_name)\n        return m\n\n\n    def load_all_steps(self):\n        result_dic = {}\n        result_dic.update({'reward':[]})\n\n        result_dic.update({'user_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['user_capacity'].append([])\n\n        result_dic.update({'secure_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['secure_capacity'].append([])\n\n        result_dic.update({'attaker_capacity':[]})\n        for i in range(self.attacker_num):\n            result_dic['attaker_capacity'].append([])\n        \n        result_dic.update({'RIS_elements':[]})\n        for i in range(self.RIS_ant_num):\n            result_dic['RIS_elements'].append([])\n\n        for ep_cnt in range(self.ep_num):\n            mat_ep = self.load_one_ep(\"simulation_result_ep_\" + str(ep_cnt) + \".mat\")\n\n            one_ep_reward = mat_ep[\"result_\" + str(ep_cnt)][\"reward\"][0][0]\n            result_dic['reward'] += list(one_ep_reward[:, 0])\n\n            one_ep_user_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"user_capacity\"][0][0]\n            for i in range(self.user_num):\n                
result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])\n            \n            one_ep_secure_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"secure_capacity\"][0][0]\n            for i in range(self.user_num):\n                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])\n            \n            one_ep_attaker_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"attaker_capacity\"][0][0]\n            for i in range(self.attacker_num):\n                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])\n\n            one_ep_RIS_first_element = mat_ep[\"result_\" + str(ep_cnt)][\"reflecting_coefficient\"][0][0]\n            for i in range(self.RIS_ant_num):\n                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])\n\n        return result_dic\n\n\n    def plot(self):\n        \"\"\"\n        plot result\n        matplotlib colour codes: b--blue c--cyan g--green k--black m--magenta r--red w--white y--yellow\n        \"\"\"\n\n        \n      \n        ###############################\n        # plot average secrecy energy efficiency of each episode\n        ###############################\n        fig = plt.figure('average_secrecy_energy_efficiency')\n\n        # get init location\n        init_uav_coord = read_init_location(entity_type = 'UAV')\n        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)\n        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)       \n        \n        # legend label for each entry in self.store_paths\n        legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)']\n        all_average_see = []\n        all_energies = []\n        \n        # energies\n        for store_path in self.store_paths:\n            energies = []\n            for i in range(self.ep_num):\n                # read the mat file\n                filename = f'simulation_result_ep_{i}.mat'\n                filename = os.path.join(store_path, filename)\n                data = 
loadmat(filename)\n            \n                # per-step energy consumption of this episode\n                energies_one_episode = []\n            \n                # loop over all UAV movements; v_t is the displacement in one time slot, so v_t / delta_time is the speed\n                uav_movt = data[f'result_{i}'][0][0][-1]\n                for j in range(uav_movt.shape[0]):\n                    move_x = uav_movt[j][0]\n                    move_y = uav_movt[j][1]\n                    v_t = (move_x ** 2 + move_y ** 2) ** 0.5\n                    energy = get_energy_consumption(v_t / delta_time)\n                    energies_one_episode.append(energy)\n                energies.append(energies_one_episode)\n            all_energies.append(energies)\n        \n        # see\n        for store_path, legend in zip(self.store_paths, legends):\n            average_see = []\n            # ssr\n            self.store_path = store_path + '//'\n            self.all_steps = self.load_all_steps()\n            \n            sum_secrecy_rate = np.array(self.all_steps['secure_capacity'])\n            sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0)\n\n            # energy\n            energies = all_energies.pop(0)\n            for i in range(0, self.ep_num * self.step_num, self.step_num):\n                ssr_one_episode = sum_secrecy_rate[i:i+self.step_num] # ssr means Sum Secrecy Rate\n                energies_one_episode = energies.pop(0)\n                ssr_one_episode = ssr_one_episode[:len(energies_one_episode)]\n                energies_one_episode = energies_one_episode[:len(ssr_one_episode)]\n                try:\n                    see = np.array(ssr_one_episode) / np.array(energies_one_episode)\n                    average_see.append(sum(see)/len(see))\n                except ZeroDivisionError:  # empty episode\n                    average_see.append(0)\n            \n            # change from /J to /kJ\n            average_see = np.array(average_see) * 1000\n            average_see = list(average_see)\n            \n            all_average_see.append(average_see)  \n            plt.plot(range(len(average_see)), 
average_see, label=legend)\n            plt.xlabel(\"Episodes (Ep)\")\n            plt.ylabel(\"Average Secrecy Energy Efficiency\")\n                \n        plt.legend()\n        plt.savefig('data/average_secrecy_energy_efficiency.png')\n\n        \n        # dictionary of lists: one column per method\n        see_per_method = {legend: average_see for legend, average_see in zip(legends, all_average_see)} \n        df = pd.DataFrame(see_per_method)\n        df.to_excel('data/average_secrecy_energy_efficiency.xlsx', index=False) \n\n        \nif __name__ == '__main__':\n    LoadPlotObject = LoadAndPlot(\n            store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'],\n            ep_num = 300,\n         )\n    LoadPlotObject.plot()\n\n    \n\n"
  },
  {
    "path": "plot_ssr.py",
    "content": "import matplotlib as mpl\nmpl.rcParams['figure.dpi'] = 300\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cmath\nfrom scipy.io import loadmat, savemat\nimport pandas as pd\nimport os\nimport copy\nimport math\n\n\n######################################################\n# new for energy \n# energy related parameters of rotary-wing UAV\n# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV\nP_i = 790.6715\nP_0 = 580.65\nU2_tip = (200) ** 2\ns = 0.05\nd_0 = 0.3\np = 1.225\nA = 0.79\ndelta_time = 0.1 #0.1/1000 #0.1ms\n\n# add-on: hover velocity\n# based on https://www.intechopen.com/chapters/57483\nm = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F\ng = 9.81 # gravity\nT = m * g # thrust\nv_0 = (T / (A * 2 * p)) ** 0.5\n\ndef get_energy_consumption(v_t):\n    '''\n    arg\n    1) v_t = displacement per time slot\n    '''\n    energy_1 = P_0 \\\n                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \\\n                + 0.5 * d_0 * p * s * A * (abs(v_t))**3\n    \n    energy_2 = P_i * ((\n                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \\\n                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \\\n                ) ** 0.5)\n    \n    energy = delta_time * (energy_1 + energy_2)\n    return energy \n\nENERGY_MIN = get_energy_consumption(0.25)\nENERGY_MAX = get_energy_consumption(0)\n\n######################################################\n\n\n# modified from data_manager.py\ninit_data_file = 'data/init_location.xlsx'\ndef read_init_location(entity_type = 'user', index = 0):\n    if entity_type in ('user', 'attacker', 'RIS', 'RIS_norm_vec', 'UAV'):\n        return np.array([\\\n        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\\\n        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\\\n        pd.read_excel(init_data_file, 
sheet_name=entity_type)['z'][index]])\n    else:\n        return None\n\n\n# load and plot everything\nclass LoadAndPlot(object):\n    \"\"\"\n    load date and plot 2022-07-22 16_16_26\n    \"\"\"\n    def __init__(self, store_paths, \\\n                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \\\n                       ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true)\n        \n        self.store_paths = store_paths\n\n        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']\n#        self.store_path = store_path + '//'\n        self.user_num = user_num\n        self.attacker_num = attacker_num\n        self.RIS_ant_num = RIS_ant_num\n        self.ep_num = ep_num\n        self.step_num = step_num\n\n\n    def load_one_ep(self, file_name):\n        m = loadmat(self.store_path + file_name)\n        return m\n\n\n    def load_all_steps(self):\n        result_dic = {}\n        result_dic.update({'reward':[]})\n\n        result_dic.update({'user_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['user_capacity'].append([])\n\n        result_dic.update({'secure_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['secure_capacity'].append([])\n\n        result_dic.update({'attaker_capacity':[]})\n        for i in range(self.attacker_num):\n            result_dic['attaker_capacity'].append([])\n        \n        result_dic.update({'RIS_elements':[]})\n        for i in range(self.RIS_ant_num):\n            result_dic['RIS_elements'].append([])\n\n        for ep_cnt in range(self.ep_num):\n            mat_ep = self.load_one_ep(\"simulation_result_ep_\" + str(ep_cnt) + \".mat\")\n\n            one_ep_reward = mat_ep[\"result_\" + str(ep_cnt)][\"reward\"][0][0]\n            result_dic['reward'] += list(one_ep_reward[:, 0])\n\n            one_ep_user_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"user_capacity\"][0][0]\n            for i in range(self.user_num):\n                
result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])\n            \n            one_ep_secure_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"secure_capacity\"][0][0]\n            for i in range(self.user_num):\n                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])\n            \n            one_ep_attaker_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"attaker_capacity\"][0][0]\n            for i in range(self.attacker_num):\n                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])\n\n            one_ep_RIS_first_element = mat_ep[\"result_\" + str(ep_cnt)][\"reflecting_coefficient\"][0][0]\n            for i in range(self.RIS_ant_num):\n                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])\n\n        return result_dic\n\n\n    def plot(self):\n        \"\"\"\n        plot result\n        matplotlib colour codes: b--blue c--cyan g--green k--black m--magenta r--red w--white y--yellow\n        \"\"\"\n\n        \n      \n        ###############################\n        # plot average sum secrecy rate of each episode\n        ###############################\n        fig = plt.figure('average_sum_secrecy_rate')\n        \n        # legend label for each entry in self.store_paths\n        legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)']\n        all_average_sum_secrecy_rate = []\n        for store_path, legend in zip(self.store_paths, legends):\n            self.store_path = store_path + '//'\n            self.all_steps = self.load_all_steps()\n            \n            sum_secrecy_rate = np.array(self.all_steps['secure_capacity'])\n            sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0)\n            average_sum_secrecy_rate = []\n            ssr = []\n            for i in range(0, self.ep_num * self.step_num, 
self.step_num):\n                ssr_one_episode = sum_secrecy_rate[i:i+self.step_num] # ssr means Sum Secrecy Rate\n                ssr.append(ssr_one_episode)\n                try:\n                    avg_ssr = sum(ssr_one_episode) / len(ssr_one_episode)\n                except ZeroDivisionError:  # empty episode\n                    avg_ssr = 0\n                average_sum_secrecy_rate.append(avg_ssr)\n                \n            all_average_sum_secrecy_rate.append(average_sum_secrecy_rate)  \n            plt.plot(range(len(average_sum_secrecy_rate)), average_sum_secrecy_rate, label=legend)\n            plt.xlabel(\"Episodes (Ep)\")\n            plt.ylabel(\"Average Sum Secrecy Rate\")\n                \n        plt.legend()\n        plt.savefig('data/average_sum_secrecy_rate.png')\n        \n        # dictionary of lists: one column per method\n        ssr_per_method = {legend: average_sum_secrecy_rate for legend, average_sum_secrecy_rate in zip(legends, all_average_sum_secrecy_rate)} \n        df = pd.DataFrame(ssr_per_method)\n        df.to_excel('data/average_sum_secrecy_rate.xlsx', index=False) \n\n        \nif __name__ == '__main__':\n    LoadPlotObject = LoadAndPlot(\n        store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'],\n        ep_num = 300,\n        )\n    LoadPlotObject.plot()\n\n    \n\n"
  },
  {
    "path": "plot_traj.py",
    "content": "import matplotlib as mpl\nmpl.rcParams['figure.dpi'] = 100\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cmath\nfrom scipy.io import loadmat, savemat\nimport pandas as pd\nimport os\nimport copy\nimport math\n\n\n######################################################\n# new for energy \n# energy related parameters of rotary-wing UAV\n# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV\nP_i = 790.6715\nP_0 = 580.65\nU2_tip = (200) ** 2\ns = 0.05\nd_0 = 0.3\np = 1.225\nA = 0.79\ndelta_time = 0.1 #0.1/1000 #0.1ms\n\n# add-on: hover velocity\n# based on https://www.intechopen.com/chapters/57483\nm = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F\ng = 9.81 # gravity\nT = m * g # thrust\nv_0 = (T / (A * 2 * p)) ** 0.5\n\ndef get_energy_consumption(v_t):\n    '''\n    arg\n    1) v_t = displacement per time slot\n    '''\n    energy_1 = P_0 \\\n                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \\\n                + 0.5 * d_0 * p * s * A * (abs(v_t))**3\n    \n    energy_2 = P_i * ((\n                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \\\n                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \\\n                ) ** 0.5)\n    \n    energy = delta_time * (energy_1 + energy_2)\n    return energy \n\nENERGY_MIN = get_energy_consumption(0.25)\nENERGY_MAX = get_energy_consumption(0)\n\n######################################################\n\n\n# modified from data_manager.py\ninit_data_file = 'data/init_location.xlsx'\ndef read_init_location(entity_type = 'user', index = 0):\n    if entity_type in ('user', 'attacker', 'RIS', 'RIS_norm_vec', 'UAV'):\n        return np.array([\\\n        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\\\n        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\\\n        pd.read_excel(init_data_file, 
sheet_name=entity_type)['z'][index]])\n    else:\n        return None\n\n\n# load and plot everything\nclass LoadAndPlot(object):\n    \"\"\"\n    load date and plot 2022-07-22 16_16_26\n    \"\"\"\n    def __init__(self, store_paths, \\\n                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \\\n                       ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true)\n\n        self.store_paths = store_paths\n        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']\n#        self.store_path = store_path + '//'\n        self.user_num = user_num\n        self.attacker_num = attacker_num\n        self.RIS_ant_num = RIS_ant_num\n        self.ep_num = ep_num\n        self.step_num = step_num\n\n\n    def load_one_ep(self, file_name):\n        m = loadmat(self.store_path + file_name)\n        return m\n\n\n    def load_all_steps(self):\n        result_dic = {}\n        result_dic.update({'reward':[]})\n\n        result_dic.update({'user_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['user_capacity'].append([])\n\n        result_dic.update({'secure_capacity':[]})\n        for i in range(self.user_num):\n            result_dic['secure_capacity'].append([])\n\n        result_dic.update({'attaker_capacity':[]})\n        for i in range(self.attacker_num):\n            result_dic['attaker_capacity'].append([])\n        \n        result_dic.update({'RIS_elements':[]})\n        for i in range(self.RIS_ant_num):\n            result_dic['RIS_elements'].append([])\n\n        for ep_cnt in range(self.ep_num):\n            mat_ep = self.load_one_ep(\"simulation_result_ep_\" + str(ep_cnt) + \".mat\")\n\n            one_ep_reward = mat_ep[\"result_\" + str(ep_cnt)][\"reward\"][0][0]\n            result_dic['reward'] += list(one_ep_reward[:, 0])\n\n            one_ep_user_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"user_capacity\"][0][0]\n            for i in range(self.user_num):\n                
result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])\n            \n            one_ep_secure_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"secure_capacity\"][0][0]\n            for i in range(self.user_num):\n                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])\n            \n            one_ep_attaker_capacity = mat_ep[\"result_\" + str(ep_cnt)][\"attaker_capacity\"][0][0]\n            for i in range(self.attacker_num):\n                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])\n\n            one_ep_RIS_first_element = mat_ep[\"result_\" + str(ep_cnt)][\"reflecting_coefficient\"][0][0]\n            for i in range(self.RIS_ant_num):\n                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])\n\n        return result_dic\n\n\n    def plot(self):\n        \"\"\"\n        plot result\n        matplotlib colour codes: b--blue c--cyan g--green k--black m--magenta r--red w--white y--yellow\n        \"\"\"\n\n        \n      \n        ###############################\n        # plot trajectory\n        ###############################\n        # create a fig\n        fig, ax = plt.subplots(figsize=(5.4,5.2))\n        #fig = plt.figure('trajectory')\n        MARKER_SIZE = 8\n\n        # colour: matplotlib's default 'tab10' palette\n        color_list_template = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']\n        color_list = copy.deepcopy(color_list_template)\n\n        # get init location\n        init_uav_coord = read_init_location(entity_type = 'UAV')\n        init_ris_coord = read_init_location(entity_type = 'RIS')\n        init_eaves_coord = read_init_location(entity_type = 'attacker')\n        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)\n        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)\n    \n        \n        
plt.text(20, init_uav_coord[0]-1, 'UAV Initial Coordinate', fontsize = 11)\n        plt.plot([init_uav_coord[1]], [init_uav_coord[0]], marker=\"s\", markersize=MARKER_SIZE, markeredgecolor=\"black\", markerfacecolor=\"none\")\n        \n        plt.text(46, init_ris_coord[0]-1, 'RIS', fontsize = 11)\n        plt.plot([init_ris_coord[1]], [init_ris_coord[0]], marker=\"d\", markersize=MARKER_SIZE, markeredgecolor=\"black\", markerfacecolor=\"none\")\n        \n        plt.text(36, init_eaves_coord[0]-1, 'Eavesdropper', fontsize = 11)\n        plt.plot([init_eaves_coord[1]], [init_eaves_coord[0]], marker=\"v\", markersize=MARKER_SIZE, markeredgecolor=\"black\", markerfacecolor=\"none\")\n        \n        # legends, in the same order as self.store_paths:\n        # Benchmark 1 = TDDRL, Benchmark 2 = TTD3, Benchmark 3 = TDDRL (Energy Penalty), Proposed method = TTD3 (Energy Penalty)\n        legends = ['Benchmark 1', 'Benchmark 2', 'Benchmark 3', 'Proposed method']\n     \n        for store_path, legend in zip(self.store_paths, legends):\n            # read the mat file\n            i = 5 - 1 # 0-indexed, i.e. episode 5 (use self.ep_num - 1 for the last episode)\n            filename = f'simulation_result_ep_{i}.mat'\n            filename = os.path.join(store_path, filename)\n            data = loadmat(filename)\n        \n            # uav movt\n            uav_coord = [ [init_uav_coord[0]], [init_uav_coord[1]] ]\n        \n            uav_movt = data[f'result_{i}'][0][0][-1]\n            for j in range(uav_movt.shape[0]):\n                move_x = uav_movt[j][0]\n                move_y = uav_movt[j][1]\n        \n                prev_x = uav_coord[0][-1]\n                prev_y = uav_coord[1][-1]\n        \n                current_x = prev_x + move_x\n                current_y = prev_y + move_y\n        \n                uav_coord[0].append(current_x)\n        
        uav_coord[1].append(current_y)\n            plt.plot(uav_coord[1],uav_coord[0], c=color_list.pop(0), label=legend)\n        \n        # user 0 movt\n        direction_fai = -1/2*math.pi \n        distance_delta_d = 0.25\n        user_coord_0 = [ [init_user_coord_0[0]], [init_user_coord_0[1]] ]\n        plt.text(29, init_user_coord_0[0]-1, 'User 1 Initial Coordinate', fontsize = 11)\n        #color_list = copy.deepcopy(color_list_template)\n        for j in range(uav_movt.shape[0]):\n            delta_x = distance_delta_d * math.cos(direction_fai)\n            delta_y = distance_delta_d * math.sin(direction_fai)\n        \n            prev_x = user_coord_0[0][-1]\n            prev_y = user_coord_0[1][-1]\n        \n            current_x = prev_x + delta_x\n            current_y = prev_y + delta_y\n        \n            user_coord_0[0].append(current_x)\n            user_coord_0[1].append(current_y)\n        plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0), linestyle='dashed', linewidth=2, label='User 1')\n        plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker=\"o\", markersize=MARKER_SIZE, markeredgecolor=\"black\", markerfacecolor=\"none\")\n        \n        \n        # user 1 movt\n        direction_fai = -1/2*math.pi \n        distance_delta_d = 0.25\n        user_coord_1 = [ [init_user_coord_1[0]], [init_user_coord_1[1]] ]\n        plt.text(13, init_user_coord_1[0]-1, 'User 2 Initial Coordinate', fontsize = 11)\n        #color_list = copy.deepcopy(color_list_template)\n        for j in range(uav_movt.shape[0]):\n            delta_x = distance_delta_d * math.cos(direction_fai)\n            delta_y = distance_delta_d * math.sin(direction_fai)\n        \n            prev_x = user_coord_1[0][-1]\n            prev_y = user_coord_1[1][-1]\n        \n            current_x = prev_x + delta_x\n            current_y = prev_y + delta_y\n        \n            user_coord_1[0].append(current_x)\n            user_coord_1[1].append(current_y)\n    
    plt.plot(user_coord_1[1],user_coord_1[0], c=color_list.pop(0), linestyle='dashed', linewidth=2, label='User 2')\n        plt.plot(user_coord_1[1][0], user_coord_1[0][0], marker=\"o\", markersize=MARKER_SIZE, markeredgecolor=\"black\", markerfacecolor=\"none\")\n        \n        # plot a line between last coord of user 0 and user 1\n        plt.plot([user_coord_0[1][-1], user_coord_1[1][-1]], [user_coord_0[0][-1], user_coord_1[0][-1]], 'gray', linestyle='dashed')\n        \n        # plot midpoint between last coord of user 0 and user 1\n        plt.plot([(user_coord_0[1][-1] + user_coord_1[1][-1])/2], [(user_coord_0[0][-1] + user_coord_1[0][-1])/2], marker=\"o\", markersize=MARKER_SIZE, markeredgecolor=\"black\", markerfacecolor=\"none\")\n        plt.text(12, 18, \"Midpoint of \\ntwo users' last locations\", fontsize = 11)\n       \n        plt.legend(loc='center right', fontsize=10)\n        plt.grid()\n        plt.xlim(0, 50)\n        plt.ylim(-10, 30)\n        plt.xlabel('x(m)')\n        plt.ylabel('y(m)')\n        plt.tight_layout()\n        plt.gca().invert_yaxis()\n        plt.savefig('data/trajectory.png')\n        #plt.cla()\n\n        \nif __name__ == '__main__':\n    LoadPlotObject = LoadAndPlot(\n        store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'],\n        ep_num=300,\n        )\n    LoadPlotObject.plot()\n\n    \n\n"
  },
  {
    "path": "render.py",
    "content": "from mpl_toolkits import mplot3d\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom matplotlib.patches import FancyArrowPatch\nfrom mpl_toolkits.mplot3d import proj3d\n\nclass Arrow3D(FancyArrowPatch):\n    def __init__(self, xs, ys, zs, *args, **kwargs):\n        FancyArrowPatch.__init__(self, (0,0), (0,0), *args, **kwargs)\n        self._verts3d = xs, ys, zs\n\n    def draw(self, renderer):\n        xs3d, ys3d, zs3d = self._verts3d\n        xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d, renderer.M)\n        self.set_positions((xs[0],ys[0]),(xs[1],ys[1]))\n        FancyArrowPatch.draw(self, renderer)\n\nclass Render(object):\n    \"\"\"\n    plot functions to render the whole system\n    \"\"\"\n    def __init__(self, system, canv_x = (-25, 25), canv_y = (0, 50), canv_z = (0, 60)):\n        \"\"\"\n        store the system to render and open figure 1 in interactive mode\n        (canv_x, canv_y and canv_z are currently unused; the axis limits are set in plot_config)\n        \"\"\"\n        plt.ion()\n        self.system = system\n        self.fig = plt.figure(1)\n        self.pause = False\n        self.t_index = 0\n\n    def render_pause(self):\n        \"\"\"\n        show whole system by using plt.show\n        \"\"\"\n        plt.ion()\n        ax = self.plot_config()\n        # plot the position of UAV, RIS, Users & Attackers\n        self.plot_entities(ax)\n        self.plot_channels(ax)\n        self.plot_text(ax)\n        plt.show(self.fig)\n        plt.cla() \n        self.pause = False\n        plt.ioff() \n\n    def render(self, interval):\n        \"\"\"\n        show whole system in 3D figure\n        \"\"\"\n        plt.ion()\n        ax = self.plot_config()\n        # plot the position of UAV, RIS, Users & Attackers\n        self.plot_entities(ax)\n        self.plot_channels(ax)\n        self.plot_text(ax)\n        plt.pause(interval)\n        plt.cla() \n        plt.ioff()\n\n    def plot_click(self, event):\n        self.pause ^= True\n\n    def plot_config(self):\n        self.fig = plt.figure(1)\n        ax = plt.axes(projection='3d')\n        
ax.set_xlabel('X Label')\n        ax.set_ylabel('Y Label')\n        ax.set_zlabel('Z Label')\n        ax.set_xlim3d(-25, 25)\n        ax.set_ylim3d(0, 50)\n        ax.set_zlim3d(0, 60)\n        ax.view_init(90, 0)\n        self.fig.canvas.mpl_connect('key_press_event', self.plot_click)\n        return ax    \n    \n    def plot_entities(self, ax):\n        \"\"\"\n        function used in render to show the UAV, RIS, users and attackers\n        \"\"\"\n        ax.scatter(\\\n        self.system.UAV.coordinate[0],\\\n        self.system.UAV.coordinate[1],\\\n        self.system.UAV.coordinate[2],\\\n        color='r')\n        ax.text(self.system.UAV.coordinate[0],self.system.UAV.coordinate[1],self.system.UAV.coordinate[2], \\\n        'UAV', size=15, zorder=1, color='r') \n\n        ax.scatter(\\\n        self.system.RIS.coordinate[0],\\\n        self.system.RIS.coordinate[1],\\\n        self.system.RIS.coordinate[2],\\\n        color='g')\n        ax.text(self.system.RIS.coordinate[0],self.system.RIS.coordinate[1],self.system.RIS.coordinate[2], \\\n        'RIS', size=15, zorder=1, color='g') \n\n        for user in self.system.user_list:\n            ax.scatter(\n            user.coordinate[0],\\\n            user.coordinate[1],\\\n            user.coordinate[2],\\\n            color='b'\n            )\n            text = 'user_'+str(user.index) + '\\n'\\\n            + 'noise power(dB)    = ' + str(user.noise_power) + '\\n' \\\n            + 'capacity          = ' + str(user.capacity) + '\\n'\\\n            + 'secure_capacity   = ' + str(user.secure_capacity)\n            \n            ax.text(user.coordinate[0],user.coordinate[1],user.coordinate[2], \\\n            text, size=10, zorder=1, color='b') \n        for attacker in self.system.attacker_list:\n            ax.scatter(\n            attacker.coordinate[0],\\\n            attacker.coordinate[1],\\\n            attacker.coordinate[2],\\\n            color='y'\n            )\n            
ax.text(attacker.coordinate[0],attacker.coordinate[1],attacker.coordinate[2], \\\n            'attacker_'+str(attacker.index) + '\\n'\\\n            +'capacities:' + str(attacker.capacity), size=10, zorder=1, color='y') \n\n    def plot_channels(self, ax):\n        \"\"\"\n        function used in render to show the H_UR, h_U_k, h_R_k\n        \"\"\"\n        for channel in self.system.h_R_k:\n            self.plot_one_channel(ax, channel, \"b\")\n        for channel in self.system.h_R_p:\n            self.plot_one_channel(ax, channel, \"y\")\n        for channel in self.system.h_U_k:\n            self.plot_one_channel(ax, channel, \"b\")\n        for channel in self.system.h_U_p:\n            self.plot_one_channel(ax, channel, \"y\")\n            \n        self.plot_one_channel(ax, self.system.H_UR, \"r\")\n        \n    def plot_one_channel(self, ax, channel, color, text = \"channel\"):\n        \"\"\"\n        function used in plot channels to show only one channel\n        \"\"\"        \n        arrow_side_coor = channel.receiver.coordinate\n        point_side_coor = channel.transmitter.coordinate\n\n        text = channel.channel_name + '\\n' \\\n        + 'n=' + str(channel.n) \\\n        + '     sigma=' + str(channel.sigma) +'\\n'\\\n        + 'PL=' + str(channel.path_loss_normal) + '\\n'\\\n        + 'PL(dB)=' + str(channel.path_loss_dB)\n        \n        x = (arrow_side_coor[0] + point_side_coor[0]) / 2\n        y = (arrow_side_coor[1] + point_side_coor[1]) / 2\n        z = (arrow_side_coor[2] + point_side_coor[2]) / 2\n        ax.text(x, y, z, text, size=10, zorder=1, color=color) \n        \n        channel_arrow = Arrow3D(\\\n        [point_side_coor[0], arrow_side_coor[0]], \\\n        [point_side_coor[1], arrow_side_coor[1]], \\\n        [point_side_coor[2], arrow_side_coor[2]],\\\n        mutation_scale=20, lw = 3, arrowstyle=\"-|>\", color=color\n        )\n        ax.add_artist(channel_arrow)\n\n    def plot_text(self, ax):\n        \"\"\"\n      
  used in render to plot texts\n        \"\"\"\n        text = \"pause = \" + str(self.pause) + \"\\n\"\\\n        + \"t_index = \"    + str(self.t_index)\n        ax.text(0, 0, 60, text, size=10, zorder=1, color='b') "
  },
  {
    "path": "requirements.txt",
    "content": "# Twin-TD3 requirements\n# Usage: pip install -r requirements.txt\n\nmatplotlib==3.4.3\npyparsing==3.0.9\n# sklearn==0.0.post1\njoblib==1.2.0\npandas==1.4.3\ncycler==0.11.0\nseaborn==0.12.2\nkiwisolver==1.4.3\ntorchvision==0.13.0\ntorch>=1.7.0\nscikit-learn==1.2.2\npython-dateutil==2.8.2\npytz==2022.1\nscipy==1.8.1\nopenpyxl==3.0.9\n"
  },
  {
    "path": "run_simulation.py",
    "content": "# work around the duplicate OpenMP runtime (libiomp5) error seen on some PyTorch setups\nimport os\nos.environ[\"KMP_DUPLICATE_LIB_OK\"]=\"TRUE\"\n\nimport argparse\n\n# get argument from user\nparser = argparse.ArgumentParser()\nparser.add_argument('--path', type = str, required = True, help=\"pretrained model weight path\")\n\nargs = parser.parse_args()\nSTORE_PATH = args.path\n\n# validate the weight path\nif not os.path.isdir(STORE_PATH):\n    raise FileNotFoundError(\"The provided weight path does not exist!\")\n\n# get DRL_ALGO\nif 'td3' in STORE_PATH:\n    DRL_ALGO = 'td3'\nelse:\n    DRL_ALGO = 'ddpg'\n\n# reward design\nif 'see' in STORE_PATH:\n    REWARD_DESIGN = 'see'\nelse:\n    REWARD_DESIGN = 'ssr'\n\n# seeds (None = no fixed seeding)\nSEEDS = None\n\n# process the argument\nassert DRL_ALGO in ['ddpg', 'td3'], \"drl must be ['ddpg', 'td3']\"\nassert REWARD_DESIGN in ['ssr', 'see'], \"reward must be ['ssr', 'see']\"\nif SEEDS is not None:\n    assert len(SEEDS) in [1, 2] and isinstance(SEEDS[0], int) and isinstance(SEEDS[-1], int), \"seeds must be a list of 1 or 2 integers\"\n\nif DRL_ALGO == 'td3':\n    from td3 import Agent\nelif DRL_ALGO == 'ddpg':\n    from ddpg import Agent\nimport ddpg\n\nfrom env import MiniSystem\nimport numpy as np\nimport math\nimport time\nimport torch\nimport shutil\n\n# 1 init system model\nepisode_num = 1\nepisode_cnt = 0\nstep_num = 100\n\nproject_name = STORE_PATH\n\nsystem = MiniSystem(\n    user_num=2,\n    RIS_ant_num=4,\n    UAV_ant_num=4,\n    if_dir_link=1,\n    if_with_RIS=True,\n    if_move_users=True,\n    if_movements=True,\n    reverse_x_y=(False, False),\n    if_UAV_pos_state = True,\n    reward_design = REWARD_DESIGN,\n    project_name = project_name,\n    step_num = step_num\n    )\n\nif_Theta_fixed = False\nif_G_fixed = False\nif_BS = False\nif_robust = True\n\n\n# 2 init RL Agent\nagent_1_param_dic = {}\nagent_1_param_dic[\"alpha\"] = 0.0001\nagent_1_param_dic[\"beta\"] = 0.001\nagent_1_param_dic[\"input_dims\"] = 
system.get_system_state_dim()\nagent_1_param_dic[\"tau\"] = 0.001\nagent_1_param_dic[\"batch_size\"] = 64\nagent_1_param_dic[\"n_actions\"] = system.get_system_action_dim() - 2\nagent_1_param_dic[\"action_noise_factor\"] = 0.1\nagent_1_param_dic[\"memory_max_size\"] = int(5/5 * episode_num * step_num) #/2\nagent_1_param_dic[\"agent_name\"] = \"G_and_Phi\"\nagent_1_param_dic[\"layer1_size\"] = 800\nagent_1_param_dic[\"layer2_size\"] = 600\nagent_1_param_dic[\"layer3_size\"] = 512\nagent_1_param_dic[\"layer4_size\"] = 256\n\nagent_2_param_dic = {}\nagent_2_param_dic[\"alpha\"] = 0.0001\nagent_2_param_dic[\"beta\"] = 0.001\nagent_2_param_dic[\"input_dims\"] = 3\nagent_2_param_dic[\"tau\"] = 0.001\nagent_2_param_dic[\"batch_size\"] = 64\nagent_2_param_dic[\"n_actions\"] = 2\nagent_2_param_dic[\"action_noise_factor\"] = 0.5\nagent_2_param_dic[\"memory_max_size\"] = int(5/5 * episode_num * step_num) #/2\nagent_2_param_dic[\"agent_name\"] = \"UAV\"\nagent_2_param_dic[\"layer1_size\"] = 400\nagent_2_param_dic[\"layer2_size\"] = 300\nagent_2_param_dic[\"layer3_size\"] = 256\nagent_2_param_dic[\"layer4_size\"] = 128\n\nif SEEDS is not None:\n    torch.manual_seed(SEEDS[0]) # 1\n    torch.cuda.manual_seed_all(SEEDS[0]) # 1\nagent_1 = Agent(\n    alpha       = agent_1_param_dic[\"alpha\"],\n    beta        = agent_1_param_dic[\"beta\"],\n    input_dims  = [agent_1_param_dic[\"input_dims\"]],\n    tau         = agent_1_param_dic[\"tau\"],\n    env         = system,\n    batch_size  = agent_1_param_dic[\"batch_size\"],\n    layer1_size=agent_1_param_dic[\"layer1_size\"],\n    layer2_size=agent_1_param_dic[\"layer2_size\"], \n    layer3_size=agent_1_param_dic[\"layer3_size\"],\n    layer4_size=agent_1_param_dic[\"layer4_size\"],\n    n_actions   = agent_1_param_dic[\"n_actions\"],\n    max_size = agent_1_param_dic[\"memory_max_size\"],\n    agent_name= agent_1_param_dic[\"agent_name\"]\n    ) \n\nif SEEDS is not None:\n    torch.manual_seed(SEEDS[-1]) # 2\n    
torch.cuda.manual_seed_all(SEEDS[-1]) # 2\nagent_2 = Agent(\n    alpha       = agent_2_param_dic[\"alpha\"],\n    beta        = agent_2_param_dic[\"beta\"],\n    input_dims  = [agent_2_param_dic[\"input_dims\"]],\n    tau         = agent_2_param_dic[\"tau\"],\n    env         = system,\n    batch_size  = agent_2_param_dic[\"batch_size\"],\n    layer1_size=agent_2_param_dic[\"layer1_size\"],\n    layer2_size=agent_2_param_dic[\"layer2_size\"], \n    layer3_size=agent_2_param_dic[\"layer3_size\"],\n    layer4_size=agent_2_param_dic[\"layer4_size\"],\n    n_actions   = agent_2_param_dic[\"n_actions\"],\n    max_size = agent_2_param_dic[\"memory_max_size\"],\n    agent_name= agent_2_param_dic[\"agent_name\"]\n    ) \n\n\nif DRL_ALGO == 'td3':\n    agent_1.load_models(\n         load_file_actor = STORE_PATH + '/Actor_G_and_Phi_TD3',\n         load_file_critic_1 = STORE_PATH + '/Critic_1_G_and_Phi_TD3',\n         load_file_critic_2 = STORE_PATH + '/Critic_2_G_and_Phi_TD3'\n         )\n    agent_2.load_models(\n         load_file_actor = STORE_PATH + '/Actor_UAV_TD3',\n         load_file_critic_1 = STORE_PATH + '/Critic_1_UAV_TD3',\n         load_file_critic_2 = STORE_PATH + '/Critic_2_UAV_TD3'\n         )\nelif DRL_ALGO == 'ddpg':\n    agent_1.load_models(\n         load_file_actor = STORE_PATH + '/Actor_G_and_Phi_ddpg',\n         load_file_critic = STORE_PATH + '/Critic_G_and_Phi_ddpg'\n         )\n    agent_2.load_models(\n         load_file_actor = STORE_PATH + '/Actor_UAV_ddpg',\n         load_file_critic = STORE_PATH + '/Critic_UAV_ddpg'\n         )\n\nmeta_dic = {}\nprint(\"***********************system information******************************\")\nprint(\"folder_name:     \"+str(system.data_manager.store_path))\nmeta_dic['folder_name'] = system.data_manager.store_path\nprint(\"user_num:        \"+str(system.user_num))\nmeta_dic['user_num'] = system.user_num\nprint(\"if_dir:          \"+str(system.if_dir_link))\nmeta_dic['if_dir_link'] = 
system.if_dir_link\nprint(\"if_with_RIS:     \"+str(system.if_with_RIS))\nmeta_dic['if_with_RIS'] = system.if_with_RIS\nprint(\"if_user_m:       \"+str(system.if_move_users))\nmeta_dic['if_move_users'] = system.if_move_users\nprint(\"RIS_ant_num:     \"+str(system.RIS.ant_num))\nmeta_dic['system_RIS_ant_num'] = system.RIS.ant_num\nprint(\"UAV_ant_num:     \"+str(system.UAV.ant_num))\nmeta_dic['system_UAV_ant_num'] = system.UAV.ant_num\nprint(\"if_movements:    \"+str(system.if_movements))\nmeta_dic['system_if_movements'] = system.if_movements\nprint(\"reverse_x_y:     \"+str(system.reverse_x_y))\nmeta_dic['system_reverse_x_y'] = system.reverse_x_y\nprint(\"if_UAV_pos_state:\"+str(system.if_UAV_pos_state))\nmeta_dic['if_UAV_pos_state'] = system.if_UAV_pos_state\n\nprint(\"ep_num:          \"+str(episode_num))\nmeta_dic['episode_num'] = episode_num\nprint(\"step_num:        \"+str(step_num))\nmeta_dic['step_num'] = step_num\nprint(\"***********************agent_1 information******************************\")\ntplt = \"{0:{2}^20}\\t{1:{2}^20}\"\nfor i in agent_1_param_dic:\n    parm = agent_1_param_dic[i]\n    print(tplt.format(i, parm, chr(12288)))\nmeta_dic[\"agent_1\"] = agent_1_param_dic\n\nprint(\"***********************agent_2 information******************************\")\nfor i in agent_2_param_dic:\n    parm = agent_2_param_dic[i]\n    print(tplt.format(i, parm, chr(12288)))\nmeta_dic[\"agent_2\"] = agent_2_param_dic\n\nsystem.data_manager.save_meta_data(meta_dic)\n\nprint(\"***********************simulation information******************************\")\n\ntry:\n    while episode_cnt < episode_num:\n        # 1 reset the whole system\n        system.reset()\n        step_cnt = 0\n        score_per_ep = 0\n\n        # 2 get the initial state\n        if if_robust:\n            # if_robust: add small Gaussian noise to the observed state\n            tmp = system.observe()\n            #z = np.random.multivariate_normal(np.zeros(2), 0.5*np.eye(2), size=len(tmp)).view(np.complex128)\n            z = np.random.normal(size=len(tmp))\n            observation_1 = list(\n                np.array(tmp) + 0.6 *1e-7* z\n                )\n        else:\n            observation_1 = system.observe()\n        observation_2 = list(system.UAV.coordinate)\n\n        while step_cnt < step_num:\n            # 1 count the steps in this episode\n            step_cnt += 1\n            # only step the system while it is not paused\n            if not system.render_obj.pause:\n                # 2 choose actions according to the current state\n                # the noise scale decays as (1 - episode_cnt / episode_num)^2; with episode_num = 1 the full action_noise_factor is applied\n                action_1 = agent_1.choose_action(observation_1, greedy=agent_1_param_dic[\"action_noise_factor\"] * math.pow((1-episode_cnt / episode_num), 2))\n                action_2 = agent_2.choose_action(observation_2, greedy=agent_2_param_dic[\"action_noise_factor\"]* math.pow((1-episode_cnt / episode_num), 2))\n                if if_BS:\n                    action_2[0]=0\n                    action_2[1]=0\n\n                if if_Theta_fixed:\n                    action_1[0+2 * system.UAV.ant_num * system.user_num:] = len(action_1[0+2 * system.UAV.ant_num * system.user_num:])*[0]\n\n                if if_G_fixed:\n                    action_1[0:0+2 * system.UAV.ant_num * system.user_num]=np.array([-0.0313, -0.9838, 0.3210, 1.0, -0.9786, -0.1448, 0.3518, 0.5813, -1.0, -0.2803, -0.4616, -0.6352, -0.1449, 0.7040, 0.4090, -0.8521]) * math.pow(episode_cnt / episode_num, 2) * 0.7\n                    #action_1[0:0+2 * system.UAV.ant_num * system.user_num]=len(action_1[0:0+2 * system.UAV.ant_num * system.user_num])*[0.5]\n                # 3 get the new state and reward\n                if system.if_with_RIS:\n                    new_state_1, reward, done, info = system.step(\n                        action_0=action_2[0],\n                        action_1=action_2[1],\n                        G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],\n                        Phi=action_1[0+2 * system.UAV.ant_num * system.user_num:],\n                        set_pos_x=action_2[0],\n                        set_pos_y=action_2[1]\n                    )\n                    new_state_2 = list(system.UAV.coordinate)\n                else:\n                    new_state_1, reward, done, info = system.step(\n                        action_0=action_2[0],\n                        action_1=action_2[1],\n                        G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],\n                        set_pos_x=action_2[0],\n                        set_pos_y=action_2[1]\n                    )\n                    new_state_2 = list(system.UAV.coordinate)\n\n                score_per_ep += reward\n\n                # render this step\n                system.render_obj.render(0.001)\n                observation_1 = new_state_1\n                observation_2 = new_state_2\n                if done:\n                    break\n\n            else:\n                # paused: call render_pause() and wait before checking again\n                system.render_obj.render_pause()\n                time.sleep(0.001)\n\n        system.reset()\n        print(\"ep_num: \"+str(episode_cnt)+\"   ep_score:  \"+str(score_per_ep))\n        episode_cnt +=1\nfinally:\n    # ignore_errors=True: do not raise (and mask an earlier error) if the folder is missing\n    shutil.rmtree('data/storage/data', ignore_errors=True)\n"
  },
  {
    "path": "td3.py",
    "content": "import os\nimport torch as T\n#import torch.cuda as T\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\nfrom data_manager import DataManager\nclass OUActionNoise(object):\n    def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None):\n        self.theta = theta\n        self.mu = mu\n        self.sigma = sigma\n        self.dt = dt\n        self.x0 = x0\n        self.reset()\n\n    def __call__(self):\n        x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + \\\n            self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)\n        self.x_prev = x\n        return x\n\n    def reset(self):\n        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)\n\n    def __repr__(self):\n        return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format(\n                                                            self.mu, self.sigma)\n\nclass AWGNActionNoise(object):\n    def __init__(self, mu = 0, sigma=1):\n        self.mu = mu\n        self.sigma = sigma\n\n    def __call__(self):\n        #self.mu = mu\n        #self.sigma = sigma\n        x = np.random.normal(size=self.mu.shape) * self.sigma\n        return x\n\nclass ReplayBuffer(object):\n    def __init__(self, max_size, input_shape, n_actions):\n        self.mem_size = max_size\n        self.mem_cntr = 0\n        self.state_memory = np.zeros((self.mem_size, *input_shape))\n        self.new_state_memory = np.zeros((self.mem_size, *input_shape))\n        self.action_memory = np.zeros((self.mem_size, n_actions))\n        self.reward_memory = np.zeros(self.mem_size)\n        self.terminal_memory = np.zeros(self.mem_size, dtype=np.float32)\n\n    def store_transition(self, state, action, reward, state_, done):\n        index = self.mem_cntr % self.mem_size\n        self.state_memory[index] = state\n        self.new_state_memory[index] = state_\n        self.action_memory[index] = 
action\n        self.reward_memory[index] = reward\n        # store 1 - done so the TD target drops the bootstrap term at terminal states\n        self.terminal_memory[index] = 1 - done\n        self.mem_cntr += 1\n\n    def sample_buffer(self, batch_size):\n        max_mem = min(self.mem_cntr, self.mem_size)\n\n        batch = np.random.choice(max_mem, batch_size)\n\n        states = self.state_memory[batch]\n        actions = self.action_memory[batch]\n        rewards = self.reward_memory[batch]\n        states_ = self.new_state_memory[batch]\n        terminal = self.terminal_memory[batch]\n\n        return states, actions, rewards, states_, terminal\n\nclass CriticNetwork(nn.Module):\n    def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,\n                 chkpt_dir='C:\\\\demo\\\\IRS_TD3_minimal\\\\main_foder\\\\tmp\\\\TD3', load_file = ''):\n        super(CriticNetwork, self).__init__()\n        self.input_dims = input_dims\n        self.fc1_dims = fc1_dims\n        self.fc2_dims = fc2_dims\n        self.fc3_dims = fc3_dims\n        self.fc4_dims = fc4_dims\n        self.n_actions = n_actions\n        self.checkpoint_file = os.path.join(chkpt_dir,name+'_TD3')\n        self.load_file = load_file\n        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)\n        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])\n        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)\n        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)\n        #self.fc1.weight.data.uniform_(-f1, f1)\n        #self.fc1.bias.data.uniform_(-f1, f1)\n        self.bn1 = nn.LayerNorm(self.fc1_dims)\n\n        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)\n        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])\n        #f2 = 0.002\n        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)\n        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)\n        
#self.fc2.weight.data.uniform_(-f2, f2)\n        #self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn2 = nn.LayerNorm(self.fc2_dims)\n\n        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)\n        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])\n        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)\n        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)\n        #self.fc2.weight.data.uniform_(-f2, f2)\n        #self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn3 = nn.LayerNorm(self.fc3_dims)\n\n        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)\n        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])\n        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)\n        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)\n        #self.fc2.weight.data.uniform_(-f2, f2)\n        #self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn4 = nn.LayerNorm(self.fc4_dims)\n\n        self.action_value = nn.Linear(self.n_actions, self.fc4_dims)\n        f5 = 0.003\n        self.q = nn.Linear(self.fc4_dims, 1)\n        T.nn.init.uniform_(self.q.weight.data, -f5, f5)\n        T.nn.init.uniform_(self.q.bias.data, -f5, f5)\n        #self.q.weight.data.uniform_(-f3, f3)\n        #self.q.bias.data.uniform_(-f3, f3)\n\n        self.optimizer = optim.Adam(self.parameters(), lr=beta)\n#        if torch.cuda.available():\n#            import torch.cuda as T\n#        else:\n#            import torch as T\n        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')\n\n        self.to(self.device)\n\n    def forward(self, state, action):\n        state_value = self.fc1(state)\n        state_value = self.bn1(state_value)\n        state_value = F.relu(state_value)\n        state_value = self.fc2(state_value)\n        state_value = self.bn2(state_value)\n        state_value = F.relu(state_value)\n        state_value = self.fc3(state_value)\n        state_value = self.bn3(state_value)\n        state_value = F.relu(state_value)\n        state_value = 
self.fc4(state_value)\n        state_value = self.bn4(state_value)\n\n        action_value = F.relu(self.action_value(action))\n        state_action_value = F.relu(T.add(state_value, action_value))\n        state_action_value = self.q(state_action_value)\n\n        return state_action_value\n\n    def save_checkpoint(self):\n        print('... saving checkpoint ...')\n        T.save(self.state_dict(), self.checkpoint_file)\n\n    def load_checkpoint(self,load_file = ''):\n        print('... loading checkpoint ...')\n        if T.cuda.is_available():\n            self.load_state_dict(T.load(load_file))\n        else:\n            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))\n\nclass ActorNetwork(nn.Module):\n    def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,\n                 chkpt_dir='C:\\\\demo\\\\IRS_TD3_minimal\\\\main_foder\\\\tmp\\\\TD3', load_file = ''):\n        super(ActorNetwork, self).__init__()\n        self.input_dims = input_dims\n        self.fc1_dims = fc1_dims\n        self.fc2_dims = fc2_dims\n        self.fc3_dims = fc3_dims\n        self.fc4_dims = fc4_dims\n        self.n_actions = n_actions\n        self.checkpoint_file = os.path.join(chkpt_dir,name+'_TD3')\n        self.load_file = load_file\n        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)\n        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])\n#        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)\n#        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)\n        self.fc1.weight.data.uniform_(-f1, f1)\n        self.fc1.bias.data.uniform_(-f1, f1)\n        self.bn1 = nn.LayerNorm(self.fc1_dims)\n\n        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)\n        #f2 = 0.002\n        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])\n#        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)\n#        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)\n        self.fc2.weight.data.uniform_(-f2, f2)\n        self.fc2.bias.data.uniform_(-f2, f2)\n        self.bn2 = nn.LayerNorm(self.fc2_dims)\n\n        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)\n        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])\n#        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)\n#        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)\n        self.fc3.weight.data.uniform_(-f3, f3)\n        self.fc3.bias.data.uniform_(-f3, f3)\n        self.bn3 = nn.LayerNorm(self.fc3_dims)\n\n        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)\n        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])\n#        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)\n#        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)\n        self.fc4.weight.data.uniform_(-f4, f4)\n        self.fc4.bias.data.uniform_(-f4, f4)\n        self.bn4 = nn.LayerNorm(self.fc4_dims)\n\n        # output layer: small uniform init (+/- 0.003) so the initial tanh actions stay near zero\n        f5 = 0.003\n        self.mu = nn.Linear(self.fc4_dims, self.n_actions)\n#        T.nn.init.uniform_(self.mu.weight.data, -f5, f5)\n#        T.nn.init.uniform_(self.mu.bias.data, -f5, f5)\n        self.mu.weight.data.uniform_(-f5, f5)\n        self.mu.bias.data.uniform_(-f5, f5)\n\n        self.optimizer = optim.Adam(self.parameters(), lr=alpha)\n        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')\n\n        self.to(self.device)\n\n    def forward(self, state):\n        x = self.fc1(state)\n        x = self.bn1(x)\n        x = F.relu(x)\n        x = self.fc2(x)\n        x = self.bn2(x)\n        x = F.relu(x)\n        x = self.fc3(x)\n        x = self.bn3(x)\n        x = F.relu(x)\n        x = self.fc4(x)\n        x = self.bn4(x)\n        x = F.relu(x)\n        x = T.tanh(self.mu(x))\n\n        return x\n\n    def save_checkpoint(self):\n        print('... 
saving checkpoint ...')\n        T.save(self.state_dict(), self.checkpoint_file)\n\n    def load_checkpoint(self, load_file=''):\n        print('... loading checkpoint ...')\n        if T.cuda.is_available():\n            self.load_state_dict(T.load(load_file))\n        else:\n            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))\n\nclass Agent(object):\n    def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99,\n                 n_actions=2, max_size=1000000, layer1_size=400,\n                 layer2_size=300, layer3_size=256, layer4_size=128, batch_size=64, \n                 update_actor_interval=2, noise = 'AWGN', agent_name = 'default', load_file = ''):\n        self.load_file = load_file\n        self.layer1_size = layer1_size\n        self.layer2_size = layer2_size\n        self.layer3_size = layer3_size\n        self.layer4_size = layer4_size\n        self.gamma = gamma\n        self.tau = tau\n        self.memory = ReplayBuffer(max_size, input_dims, n_actions)\n        self.batch_size = batch_size\n        self.learn_step_cntr = 0\n        self.update_actor_iter = update_actor_interval\n\n        self.actor = ActorNetwork(alpha, input_dims, layer1_size,\n                                  layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                  name='Actor_' + agent_name,chkpt_dir=env.data_manager.store_path )\n        self.critic_1 = CriticNetwork(beta, input_dims, layer1_size,\n                                    layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                    name='Critic_1_' + agent_name,chkpt_dir=env.data_manager.store_path)\n        self.critic_2 = CriticNetwork(beta, input_dims, layer1_size,\n                                    layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                    name='Critic_2_' + agent_name,chkpt_dir=env.data_manager.store_path)\n\n        self.target_actor = 
ActorNetwork(alpha, input_dims, layer1_size,\n                                         layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                         name='TargetActor_' + agent_name,chkpt_dir=env.data_manager.store_path)\n        self.target_critic_1 = CriticNetwork(beta, input_dims, layer1_size,\n                                           layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                           name='TargetCritic_1_' + agent_name,chkpt_dir=env.data_manager.store_path)\n        self.target_critic_2 = CriticNetwork(beta, input_dims, layer1_size,\n                                           layer2_size, layer3_size, layer4_size, n_actions=n_actions,\n                                           name='TargetCritic_2_' + agent_name,chkpt_dir=env.data_manager.store_path)\n        if noise == 'OU':\n            self.noise = OUActionNoise(mu=np.zeros(n_actions))\n        elif noise == 'AWGN':\n            self.noise = AWGNActionNoise(mu = np.zeros(n_actions))\n        # tau = 1 means copy parameters to target\n        self.update_network_parameters(tau=1)\n\n    def choose_action(self, observation, greedy=0.5, epsilon = 1):\n        self.actor.eval()\n        observation = T.tensor(observation, dtype=T.float).to(self.actor.device)\n        mu = self.actor.forward(observation).to(self.actor.device)\n        mu_prime = mu + T.tensor(greedy * self.noise(),\n                                 dtype=T.float).to(self.actor.device)\n        self.actor.train()\n        return mu_prime.cpu().detach().numpy()\n\n\n    def remember(self, state, action, reward, new_state, done):\n        self.memory.store_transition(state, action, reward, new_state, done)\n\n    def learn(self):\n        if self.memory.mem_cntr < self.batch_size:\n            return\n        # the done here is opposite of the done in the env\n        state, action, reward, new_state, done = \\\n                                      
self.memory.sample_buffer(self.batch_size)\n\n        # turn s, a, r, new_s into tensors\n        reward = T.tensor(reward, dtype=T.float).to(self.critic_1.device)\n        done = T.tensor(done).to(self.critic_1.device)\n        new_state = T.tensor(new_state, dtype=T.float).to(self.critic_1.device)\n        action = T.tensor(action, dtype=T.float).to(self.critic_1.device)\n        state = T.tensor(state, dtype=T.float).to(self.critic_1.device)\n\n        # switch the target actor, target critics and critics to evaluation mode\n        # while the TD target is computed\n        self.target_actor.eval()\n        self.target_critic_1.eval()\n        self.target_critic_2.eval()\n        self.critic_1.eval()\n        self.critic_2.eval()\n\n        target_actions = self.target_actor.forward(new_state)\n        # note: TD3 target policy smoothing (clipped noise added to target_actions) is disabled here\n        # target_actions = target_actions + \\\n        #         T.clamp(T.tensor(np.random.normal(scale=0.2)), -0.5, 0.5)\n        # # might break if elements of min and max are not all equal\n        # target_actions = T.clamp(target_actions, self.min_action[0], self.max_action[0])\n\n        critic_value_1_ = self.target_critic_1.forward(new_state, target_actions)\n        critic_value_2_ = self.target_critic_2.forward(new_state, target_actions)\n\n        critic_value_1 = self.critic_1.forward(state, action)\n        critic_value_2 = self.critic_2.forward(state, action)\n\n        # clipped double-Q: use the smaller of the two target critic estimates\n        critic_value_ = T.min(critic_value_1_, critic_value_2_)\n\n        target = []\n        for j in range(self.batch_size):\n            target.append(reward[j] + self.gamma*critic_value_[j]*done[j])\n        target = T.tensor(target).to(self.critic_1.device)\n        target = target.view(self.batch_size, 1)\n\n        # update both critics with the MSE between Q_i(state, action) and\n        # target = r + gamma * min(Q1', Q2')(new_state, target_actor(new_state)) * (1 - env done flag)\n        self.critic_1.train()\n        self.critic_2.train()\n\n        self.critic_1.optimizer.zero_grad()\n        self.critic_2.optimizer.zero_grad()\n\n        critic_1_loss = F.mse_loss(target, critic_value_1)\n        critic_2_loss = F.mse_loss(target, critic_value_2)\n        critic_loss = critic_1_loss + critic_2_loss\n        critic_loss.backward()\n\n        self.critic_1.optimizer.step()\n        self.critic_2.optimizer.step()\n\n        self.learn_step_cntr += 1\n\n        # delayed policy update: the actor and the target networks are only updated\n        # once every update_actor_iter critic updates\n        if self.learn_step_cntr % self.update_actor_iter != 0:\n            return\n\n        # here update the actor net by policy gradient\n        # first fix the critic net\n        self.critic_1.eval()\n        self.critic_2.eval()\n\n        self.actor.optimizer.zero_grad()\n        mu = self.actor.forward(state)\n        self.actor.train()\n        # deterministic policy gradient: maximize Q1(state, actor(state))\n        actor_q1_loss = self.critic_1.forward(state, mu)\n        actor_loss = -T.mean(actor_q1_loss)\n        actor_loss.backward()\n        self.actor.optimizer.step()\n\n        self.update_network_parameters()\n\n\n    def update_network_parameters(self, tau=None):\n        # soft (Polyak) update: target <- tau * online + (1 - tau) * target\n        if tau is None:\n            tau = self.tau\n\n        actor_params = self.actor.named_parameters()\n        critic_1_params = self.critic_1.named_parameters()\n        critic_2_params = self.critic_2.named_parameters()\n        target_actor_params = self.target_actor.named_parameters()\n        target_critic_1_params = self.target_critic_1.named_parameters()\n        target_critic_2_params = self.target_critic_2.named_parameters()\n\n        critic_1_state_dict = dict(critic_1_params)\n        critic_2_state_dict = dict(critic_2_params)\n        actor_state_dict = dict(actor_params)\n        target_actor_state_dict = dict(target_actor_params)\n        target_critic_1_state_dict = dict(target_critic_1_params)\n        target_critic_2_state_dict = dict(target_critic_2_params)\n\n        for name in critic_1_state_dict:\n            critic_1_state_dict[name] = tau*critic_1_state_dict[name].clone() + \\\n                    
(1-tau)*target_critic_1_state_dict[name].clone()\n\n        for name in critic_2_state_dict:\n            critic_2_state_dict[name] = tau*critic_2_state_dict[name].clone() + \\\n                    (1-tau)*target_critic_2_state_dict[name].clone()\n\n        for name in actor_state_dict:\n            actor_state_dict[name] = tau*actor_state_dict[name].clone() + \\\n                    (1-tau)*target_actor_state_dict[name].clone()\n                    \n        self.target_critic_1.load_state_dict(critic_1_state_dict)\n        self.target_critic_2.load_state_dict(critic_2_state_dict)\n        self.target_actor.load_state_dict(actor_state_dict)\n\n\n    def save_models(self):\n        self.actor.save_checkpoint()\n        self.target_actor.save_checkpoint()\n        self.critic_1.save_checkpoint()\n        self.critic_2.save_checkpoint()\n        self.target_critic_1.save_checkpoint()\n        self.target_critic_2.save_checkpoint()\n\n    def load_models(self, load_file_actor = '',load_file_critic_1 ='',load_file_critic_2 =''):\n        self.actor.load_checkpoint(load_file = load_file_actor)\n        self.target_actor.load_checkpoint(load_file = load_file_actor)\n        self.critic_1.load_checkpoint(load_file = load_file_critic_1)\n        self.critic_2.load_checkpoint(load_file = load_file_critic_2)\n        self.target_critic_1.load_checkpoint(load_file = load_file_critic_1)\n        self.target_critic_2.load_checkpoint(load_file = load_file_critic_2)\n"
  }
]