Repository: yjwong1999/Twin-TD3 Branch: main Commit: 1e172dc58daa Files: 47 Total size: 180.3 KB Directory structure: gitextract_sya5mzu7/ ├── README.md ├── batch_eval.sh ├── batch_train.sh ├── channel.py ├── data/ │ ├── .vscode/ │ │ └── settings.json │ ├── data_test.py │ ├── init_location.xlsx │ ├── readme.md │ ├── simulation_result.xlsx │ └── storage/ │ ├── benchmark/ │ │ ├── ddpg_see_benchmark/ │ │ │ ├── Actor_UAV_ddpg │ │ │ ├── Critic_UAV_ddpg │ │ │ ├── TargetActor_UAV_ddpg │ │ │ └── TargetCritic_UAV_ddpg │ │ ├── ddpg_ssr_benchmark/ │ │ │ ├── Actor_UAV_ddpg │ │ │ ├── Critic_UAV_ddpg │ │ │ ├── TargetActor_UAV_ddpg │ │ │ └── TargetCritic_UAV_ddpg │ │ ├── td3_see_benchmark/ │ │ │ ├── Actor_UAV_TD3 │ │ │ ├── Critic_1_UAV_TD3 │ │ │ ├── Critic_2_UAV_TD3 │ │ │ ├── TargetActor_UAV_TD3 │ │ │ ├── TargetCritic_1_UAV_TD3 │ │ │ └── TargetCritic_2_UAV_TD3 │ │ └── td3_ssr_benchmark/ │ │ ├── Actor_UAV_TD3 │ │ ├── Critic_1_UAV_TD3 │ │ ├── Critic_2_UAV_TD3 │ │ ├── TargetActor_UAV_TD3 │ │ ├── TargetCritic_1_UAV_TD3 │ │ └── TargetCritic_2_UAV_TD3 │ └── readme.md ├── data_manager.py ├── ddpg.py ├── entity.py ├── env.py ├── env1.py ├── load_and_plot.py ├── main_RIS.py ├── main_ref.py ├── main_train.py ├── math_tool.py ├── plot_see.py ├── plot_ssr.py ├── plot_traj.py ├── render.py ├── requirements.txt ├── run_simulation.py └── td3.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: README.md ================================================ # Deep Reinforcement Learning for Secrecy Energy-Efficient UAV Communication with Reconfigurable Intelligent Surfaces **IEEE Wireless Communications and Networking Conference 2023 (WCNC 2023)**
Simulation for Conference Proceedings https://doi.org/10.1109/WCNC55385.2023.10118891
Refer [this link](https://github.com/yjwong1999/Twin-TD3/blob/main/WCNC2023%20WS-09%20%231570879488.pdf) for the preprint ## Abstract This paper investigates the **physical layer security (PLS)** issue in **reconfigurable intelligent surface (RIS) aided millimeter-wave rotary-wing unmanned aerial vehicle (UAV) communications** under the presence of multiple eavesdroppers and imperfect channel state information (CSI). The goal is to maximize the **worst-case secrecy energy efficiency (SEE)** of UAV via a **joint optimization of flight trajectory, UAV active beamforming and RIS passive beamforming**. By interacting with the dynamically changing UAV environment, real-time decision making per time slot is possible via deep reinforcement learning (DRL). To decouple the continuous optimization variables, we introduce a **twin twin-delayed deep deterministic policy gradient (TTD3)** to maximize the expected cumulative reward, which is linked to SEE enhancement. Simulation results confirm that the proposed method achieves greater secrecy energy savings than the traditional twin-deep deterministic policy gradient DRL (TDDRL)-based method. ## TLDR ### System model: **RIS-aided mmWave UAV system** under the presence of **eavesdroppers** and **imperfect channel state information (CSI)**
### Solution: A **Twin-TD3 (TTD3) algorithm** to decouple the joint optimization of: 1. UAV active beamforming and RIS passive beamforming 2. UAV flight trajectory We adopt **double DRL framework**, where the 1st and 2nd agent provides the policy for task (1) and (2), respectively. ## How to use this repo Setup the repo ``` conda create --name python=3.10.4 conda activate git clone https://github.com/yjwong1999/Twin-TD3.git cd Twin-TD3 pip install -r requirements.txt ``` User can train two types of algorithm for training: 1. Twin DDPG is [TDDRL algorithm](https://doi.org/10.1109/LWC.2021.3081464) 2. Twin TD3 is our proposed TTD3 algorithm Run the following in the `bash` or `powershell` `main_train.py` is the main python file to train the DRL algorithms ```shell # To use Twin DDPG with SSR as optimization goal python3 main_train.py --drl ddpg --reward ssr # To use Twin TD3 with SSR as optimization goal python3 main_train.py --drl td3 --reward ssr # To use Twin DDPG with SEE as optimization goal python3 main_train.py --drl ddpg --reward see # To use Twin TD3 with SEE as optimization goal python3 main_train.py --drl td3 --reward see # To use pretrained DRL for UAV trajectory (recommended for stable convergence) python3 main_train.py --drl td3 --reward see --trained-uav # To set number of episodes (default is 300) python3 main_train.py --drl td3 --reward see --ep-num 300 # To set seeds for DRL weight initialization (not recommended) python3 main_train.py --drl td3 --reward see --seeds 0 # weights of both DRL are initialized with seed 0 python3 main_train.py --drl td3 --reward see --seeds 0 1 # weights of DRL 1 and DRL2 are initialized with seed 0 and 1, respectively ``` `run_simulation.py` is the python file to run the simulation using your trained models ```shell # plot everything for each episode python3 run_simulation.py --path data/storage/scratch/ # if you train the algorithm without the pretrained uav python3 run_simulation.py --path data/storage/trained_uav/ # if you train the algorithm with the pretrained uav ``` `load_and_plot.py` is the python file to plot the (i) Rewards, (ii) Sum Secrecy Rate (SSR), (iii) Secrecy Energy Efficient (SEE), (iv) UAV Trajectory, (v) RIS configs for each episode in one experiments. The plotted figures are saved at `data/storage/scratch//plot` or `data/storage/trained_uav/plot` ```shell # plot everything for each episode python3 load_and_plot.py --path data/storage/scratch/ --ep-num 300 # if you train the algorithm without the pretrained uav python3 load_and_plot.py --path data/storage/trained_uav/ --ep-num 300 # if you train the algorithm with the pretrained uav ``` Note that you can use the bash script `batch_train.sh` and `batch_eval.sh` to train the algorithms and evaluate them using the previous two python codes ```shell # To train on batch bash batch_train.sh # To evaluate on batch bash batch_eval.sh ``` ## Results We run the ```main_train.py``` for 5 times for each settings below, and averaged out the performance SSR and SEE (the higher the better) Total Energy Consumption (the lower the better) | Algorithms | SSR (bits/s/Hz)| Energy (kJ) | SEE (bits/s/Hz/kJ)| |--------------------------------|----------------|-------------|-------------------| | TDDRL | 5.03 | 12.4 | 40.8 | | TTD3 | 6.05 | 12.7 | 48.2 | | TDDRL (with energy constraint) | 4.68 | 11.2 | 39.4 | | TTD3 (with energy constraint) | 5.39 | 11.2 | 48.4 | Summary 1. In terms of SSR, TTD3 outperforms TDDRL with or without energy constraint 2. In terms of SEE and Energy, TTD3 (with energy constraint) outperforms all other algorithms 3. Generally, TTD3 algorithm are better than TTDRL 4. Even with energy contraint (trade-off between energy consumption and SSR), TTD3 outperforms TDDRL in all aspects \* Remarks:
Note that the performance of DRL (especially twin DRL) has a big variation, sometimes you may get extremely good (or bad) performance
The above benchmark results are averaged performance of several experiments, to get a more holistic understandings on the algorithms
It is advised to use the benchmark UAV models we trained, for better convergence.
This approach is consistent with the codes provided by [TDDRL](https://github.com/Brook1711/WCL-pulish-code) ## References and Acknowledgement This work was supported by the **British Council** under **UK-ASEAN Institutional Links Early Career Researchers Scheme** with project number 913030644. Both **RIS Simulation** and the **System Model** for this Research Project are based the research work provided by [Brook1711](https://github.com/Brook1711).
We intended to fork the original repo for the system model (as stated below) as the base of this project.
However, GitHub does not allow a forked repo to be private.
Hence, we could not maintain our code based a forked version of the original repo, while keeping it private until the project is completed. We would like to express our utmost gratitude for [Brook1711](https://github.com/Brook1711) and his co-authors for their research work. ### RIS Simulation RIS Simulation is based on the following research work:
[SimRIS Channel Simulator for Reconfigurable Intelligent Surface-Empowered Communication Systems](https://ieeexplore.ieee.org/document/9282349)
The original simulation code is coded in matlab, this [GitHub repo](https://github.com/Brook1711/RIS_components) provides a Python version of the simulation. ### System Model: RIS-aided mmWave UAV communications The simulation of the System Model is provided by the following research work:
[Learning-Based Robust and Secure Transmission for Reconfigurable Intelligent Surface Aided Millimeter Wave UAV Communications](https://doi.org/10.1109/LWC.2021.3081464)
The code is provided in this [GitHub repo](https://github.com/Brook1711/WCL-pulish-code). ### Rotary-Wing UAV We can derive the Rotary-Wing UAV’s propulsion energy consumption based on the following research work:
[Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV](https://doi.org/10.1109/LWC.2019.2916549) ### TD3 Main reference for TD3 implementation:
[PyTorch/TensorFlow 2.0 for TD3](https://github.com/philtabor/Actor-Critic-Methods-Paper-To-Code/tree/master/TD3) ## TODO - [x] Add argparse arguments to set drl algo and reward type - [x] Add argparse arguments to set episode number - [x] Add argparse arguments to set seeds for the two DRLs - [x] Add argparse arguments to load pretrained DRL for UAV trajectory - [x] Add benchmark/pretrained model - [x] Project naming (use __ instead of using datetime format) - [x] Remove saving "best model", there are no best model, only latest model - [ ] The following codes can be used, but you have to manually change the filepath in the codes `plot_ssr.py` is the python file to plot the final episode's SSR for the 4 benchmarks in the paper ```shell # plot ssr python3 plot_ssr.py ``` `plot_see.py` is the python file to plot the final episode's SSR for the 4 benchmarks in the paper ```shell # plot see python3 plot_see.py ``` `plot_traj.py` is the python file to plot the final episode's UAV trajectory for the 4 benchmarks in the paper ```shell # plot UAV trajectory python3 plot_traj.py ``` ## Cite this repository ``` @INPROCEEDINGS{10118891, author={Tham, Mau-Luen and Wong, Yi Jie and Iqbal, Amjad and Ramli, Nordin Bin and Zhu, Yongxu and Dagiuklas, Tasos}, booktitle={2023 IEEE Wireless Communications and Networking Conference (WCNC)}, title={Deep Reinforcement Learning for Secrecy Energy- Efficient UAV Communication with Reconfigurable Intelligent Surface}, year={2023}, doi={10.1109/WCNC55385.2023.10118891}} ```

Star History

[![Star History Chart](https://api.star-history.com/svg?repos=yjwong1999/Twin-TD3&type=Date)](https://star-history.com/#yjwong1999/Twin-TD3&Date)

================================================ FILE: batch_eval.sh ================================================ #!/bin/bash echo echo ddpg_ssr python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr --ep-num 300 echo ddpg_ssr_2 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_2 --ep-num 300 echo ddpg_ssr_3 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_3 --ep-num 300 echo ddpg_ssr_4 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_4 --ep-num 300 echo ddpg_ssr_5 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_5 --ep-num 300 echo td3_ssr python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr --ep-num 300 echo td3_ssr_2 python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_2 --ep-num 300 echo td3_ssr_3 python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_3 --ep-num 300 echo td3_ssr_4 python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_4 --ep-num 300 echo td3_ssr_5 python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_5 --ep-num 300 echo ddpg_see python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see --ep-num 300 echo ddpg_see_2 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_2 --ep-num 300 echo ddpg_see_3 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_3 --ep-num 300 echo ddpg_see_4 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_4 --ep-num 300 echo ddpg_see_5 python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_5 --ep-num 300 echo td3_see python3 load_and_plot.py --path data/storage/trained_uav/td3_see --ep-num 300 echo td3_see_2 python3 load_and_plot.py --path data/storage/trained_uav/td3_see_2 --ep-num 300 echo td3_see_3 python3 load_and_plot.py --path data/storage/trained_uav/td3_see_3 --ep-num 300 echo td3_see_4 python3 load_and_plot.py --path data/storage/trained_uav/td3_see_4 --ep-num 300 echo td3_see_5 python3 load_and_plot.py --path data/storage/trained_uav/td3_see_5 --ep-num 300 ================================================ FILE: batch_train.sh ================================================ #!/bin/bash echo echo ddpg_ssr python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav echo ddpg_ssr_2 python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav echo ddpg_ssr_3 python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav echo ddpg_ssr_4 python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav echo ddpg_ssr_5 python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav echo td3_ssr python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav echo td3_ssr_2 python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav echo td3_ssr_3 python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav echo td3_ssr_4 python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav echo td3_ssr_5 python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav echo ddpg_see python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav echo ddpg_see_2 python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav echo ddpg_see_3 python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav echo ddpg_see_4 python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav echo ddpg_see_5 python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav echo td3_see python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav echo td3_see_2 python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav echo td3_see_3 python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav echo td3_see_4 python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav echo td3_see_5 python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav ================================================ FILE: channel.py ================================================ import numpy as np import math import cmath from math_tool import * class mmWave_channel(object): """ generate MmWave under UMi open input: distance, angle, pair entity object output: Instantaneous CSI """ def __init__(self, transmitter, receiver, frequncy): """ transmitter: object in entity.py receiver: object in entity.py """ self.channel_name = '' self.n = 0 self.sigma = 0 self.transmitter = transmitter self.receiver = receiver self.channel_type = self.init_type() # 'UAV_RIS', 'UAV_user', 'UAV_attacker', 'RIS_user', 'RIS_attacker' # self.distance = np.linalg.norm(transmitter.coordinate - receiver.coordinate) self.frequncy = frequncy # init & updata path loss self.path_loss_normal = self.get_channel_path_loss() self.path_loss_dB = normal_to_dB(self.path_loss_normal) # init & update channel CSI matrix self.channel_matrix = self.get_estimated_channel_matrix() def init_type(self): channel_type = self.transmitter.type+'_'+self.receiver.type if channel_type == 'UAV_RIS' or channel_type == 'RIS_UAV': self.n = 2.2 self.sigma = 3 self.channel_name = 'H_UR' elif channel_type == 'UAV_user' or channel_type == 'UAV_attacker': self.n = 3.5 self.sigma = 3 if channel_type =='UAV_user': self.channel_name = 'h_U_k,' + str(self.transmitter.index) elif channel_type == 'UAV_attacker': self.channel_name = 'h_U_p,' + str(self.transmitter.index) elif channel_type == 'user_UAV' or channel_type == 'attacker_UAV': self.n = 3.5 self.sigma = 3 if channel_type =='user_UAV': self.channel_name = 'h_U_k,' + str(self.transmitter.index) elif channel_type == 'attacker_UAV': self.channel_name = 'h_U_p,' + str(self.transmitter.index) elif channel_type == 'RIS_user' or channel_type == 'RIS_attacker': self.n = 2.8 self.sigma = 3 if channel_type =='RIS_user': self.channel_name = 'h_R_k,' + str(self.transmitter.index) elif channel_type == 'RIS_attacker': self.channel_name = 'h_R_p,' + str(self.transmitter.index) elif channel_type == 'user_RIS' or channel_type == 'attacker_RIS': self.n = 2.8 self.sigma = 3 if channel_type =='user_RIS': self.channel_name = 'h_R_k,' + str(self.transmitter.index) elif channel_type == 'attacker_RIS': self.channel_name = 'h_R_p,' + str(self.transmitter.index) return channel_type def get_channel_path_loss(self): """ calculate the path loss including shadow fading (in normal form) """ distance = np.linalg.norm(self.transmitter.coordinate - self.receiver.coordinate) PL = -20 * math.log10(4*math.pi/(3e8/self.frequncy)) - 10*self.n*math.log10(distance) shadow_loss = np.random.normal() * self.sigma # return dB_to_normal(PL - shadow_loss) return dB_to_normal(PL) def get_estimated_channel_matrix(self): """ init & update channel matrix """ # init matrix N_t = self.transmitter.ant_num N_r = self.receiver.ant_num channel_matrix = np.mat(np.ones(shape=(N_r,N_t),dtype=complex), dtype=complex) # get relevant coordinate receiver under transmitter system r_under_t_car_coor = get_coor_ref(\ self.transmitter.coor_sys, \ self.receiver.coordinate - self.transmitter.coordinate) # get relevant spherical_coordinate r_t_r, r_t_theta, r_t_fai = cartesian_coordinate_to_spherical_coordinate(\ cartesian_coordinate=r_under_t_car_coor\ ) # get relevant coordinate transmitter under receiver system t_under_r_car_coor = get_coor_ref(\ # remmber to Meet channel direction restrictions [-self.receiver.coor_sys[0], self.receiver.coor_sys[1], -self.receiver.coor_sys[2]],\ self.transmitter.coordinate - self.receiver.coordinate) # get relevant spherical_coordinate t_r_r, t_r_theta, t_r_fai = cartesian_coordinate_to_spherical_coordinate(\ cartesian_coordinate=t_under_r_car_coor\ ) # calculate array response t_array_response = self.generate_array_response(self.transmitter, r_t_theta, r_t_fai) r_array_response = self.generate_array_response(self.receiver, t_r_theta, t_r_fai) array_response_product = r_array_response * t_array_response.H # get H_LOS # get LOS path loss PL = self.path_loss_normal # get LOS phase shift LOS_fai = 2 * math.pi * self.frequncy * np.linalg.norm(self.transmitter.coordinate - self.receiver.coordinate) / 3e8 channel_matrix = cmath.exp(1j*LOS_fai)* math.pow(PL, 0.5) * array_response_product return channel_matrix def generate_array_response(self, transceiver, theta, fai): """ if the ant_type is 'UPA' generate_UPA_response if the ant_type is 'ULA' generate_ULA_response if the ant_type is 'single' generate_singleant_response """ ant_type = transceiver.ant_type ant_num = transceiver.ant_num if ant_type == 'UPA': row_num = int(math.sqrt(ant_num)) Planar_response = np.mat(np.ones(shape=(ant_num, 1)), dtype=complex) for i in range(row_num): for j in range(row_num): Planar_response[j+i*row_num,0] = cmath.exp(1j *\ (math.sin(theta) * math.cos(fai)*i*math.pi + math.sin(theta)*math.sin(fai))\ ) return Planar_response elif ant_type == 'ULA': Linear_response = np.mat(np.ones(shape=(ant_num,1)), dtype=complex) for i in range(ant_num): Linear_response[i, 0] = cmath.exp(1j * math.sin(theta) * math.cos(fai)*i*math.pi) return Linear_response elif ant_type == 'single': return np.mat(np.array([1])) else: return False def update_CSI(self): """ update pathloss and channel matrix """ # init & updata path loss self.path_loss_normal = self.get_channel_path_loss() self.path_loss_dB = normal_to_dB(self.path_loss_normal) # init & update channel CSI matrix self.channel_matrix = self.get_estimated_channel_matrix() ================================================ FILE: data/.vscode/settings.json ================================================ { "python.pythonPath": "C:\\Users\\67039\\anaconda3\\python.exe" } ================================================ FILE: data/data_test.py ================================================ import pandas as pd user_sheet = pd.read_excel('init_location.xlsx', sheet_name='user') user_data = user_sheet print(user_data) ================================================ FILE: data/readme.md ================================================ #### init_location.xlsx - the config files for the initilization postion of each entity (such as UAV, users, attackers, RIS) #### data_test.py (should change name) - to read and show the initialization position of each entity ================================================ FILE: data/storage/readme.md ================================================ # This directory stores all experiments - Each experiment will be saved as a file project in this directory - benchmark directory stores all the pretrained DRL for UAV trajectory (agent 2 in the code), which will be used when --trained-uav flag is raised ================================================ FILE: data_manager.py ================================================ import numpy as np import scipy.io import pandas as pd import os import time, csv class DataManager(object): """ class to read and store simulation results before use, please create a direction under current file path './data' and must have a file 'init_location.xlsx' which contain the position of each entities """ def __init__(self, store_list = ['beamforming_matrix', 'reflecting_coefficient', 'UAV_state', 'user_capacity'],file_path = './data', store_path = './data/storage', project_name = None): # 1 init location data self.store_list = store_list self.init_data_file = file_path + '/init_location.xlsx' if project_name is None: self.time_stemp = time.strftime('/%Y-%m-%d %H_%M_%S',time.localtime(time.time())) self.store_path = store_path + self.time_stemp else: for i in range(1, 100, 1): if i == 1: dir_name = store_path + '/' + project_name else: dir_name = store_path + '/' + project_name + f'_{i}' if not os.path.isdir(dir_name): self.store_path = dir_name break os.makedirs(self.store_path) # self.writer = pd.ExcelWriter(self.store_path + '/simulation_result.xlsx', engine='openpyxl') # pylint: disable=abstract-class-instantiated self.simulation_result_dic = {} self.init_format() def save_file(self, episode_cnt = 10): # record step counts per episode with open(self.store_path + "/step_num_per_episode.csv", "a", newline='') as f: writer = csv.writer(f) writer.writerow([len(list(self.simulation_result_dic.values())[0])]) # when ended, auto save to .mat file scipy.io.savemat(self.store_path + '/simulation_result_ep_' + str(episode_cnt) + '.mat', {'result_' + str(episode_cnt):self.simulation_result_dic}) self.simulation_result_dic = {} self.init_format() def save_meta_data(self, meta_dic): """ save system and agent information """ scipy.io.savemat(self.store_path + '/meta_data.mat', {'meta_data': meta_dic}) def init_format(self): """ used only one time in env.py """ for store_item in self.store_list: self.simulation_result_dic.update({store_item:[]}) def read_init_location(self, entity_type = 'user', index = 0): if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV': return np.array([\ pd.read_excel(self.init_data_file, sheet_name=entity_type)['x'][index],\ pd.read_excel(self.init_data_file, sheet_name=entity_type)['y'][index],\ pd.read_excel(self.init_data_file, sheet_name=entity_type)['z'][index]]) else: return None def store_data(self, row_data, value_name): """ docstring """ self.simulation_result_dic[value_name].append(row_data) ================================================ FILE: ddpg.py ================================================ import os import torch as T #import torch.cuda as T import torch.nn as nn import torch.nn.functional as F import torch.optim as optim import numpy as np from data_manager import DataManager class OUActionNoise(object): def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None): self.theta = theta self.mu = mu self.sigma = sigma self.dt = dt self.x0 = x0 self.reset() def __call__(self): x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + \ self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape) self.x_prev = x return x def reset(self): self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu) def __repr__(self): return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format( self.mu, self.sigma) class AWGNActionNoise(object): def __init__(self, mu = 0, sigma=1): self.mu = mu self.sigma = sigma def __call__(self): #self.mu = mu #self.sigma = sigma x = np.random.normal(size=self.mu.shape) * self.sigma return x class ReplayBuffer(object): def __init__(self, max_size, input_shape, n_actions): self.mem_size = max_size self.mem_cntr = 0 self.state_memory = np.zeros((self.mem_size, *input_shape)) self.new_state_memory = np.zeros((self.mem_size, *input_shape)) self.action_memory = np.zeros((self.mem_size, n_actions)) self.reward_memory = np.zeros(self.mem_size) self.terminal_memory = np.zeros(self.mem_size, dtype=np.float32) def store_transition(self, state, action, reward, state_, done): index = self.mem_cntr % self.mem_size self.state_memory[index] = state self.new_state_memory[index] = state_ self.action_memory[index] = action self.reward_memory[index] = reward self.terminal_memory[index] = 1 - done self.mem_cntr += 1 def sample_buffer(self, batch_size): max_mem = min(self.mem_cntr, self.mem_size) batch = np.random.choice(max_mem, batch_size) states = self.state_memory[batch] actions = self.action_memory[batch] rewards = self.reward_memory[batch] states_ = self.new_state_memory[batch] terminal = self.terminal_memory[batch] return states, actions, rewards, states_, terminal class CriticNetwork(nn.Module): def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name, chkpt_dir='C:\\demo\\IRS_DDPG_minimal\\main_foder\\tmp\\ddpg', load_file = ''): super(CriticNetwork, self).__init__() self.input_dims = input_dims self.fc1_dims = fc1_dims self.fc2_dims = fc2_dims self.fc3_dims = fc3_dims self.fc4_dims = fc4_dims self.n_actions = n_actions self.checkpoint_file = os.path.join(chkpt_dir,name+'_ddpg') self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Critic_UAV_ddpg' self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims) f1 = 1./np.sqrt(self.fc1.weight.data.size()[0]) T.nn.init.uniform_(self.fc1.weight.data, -f1, f1) T.nn.init.uniform_(self.fc1.bias.data, -f1, f1) #self.fc1.weight.data.uniform_(-f1, f1) #self.fc1.bias.data.uniform_(-f1, f1) self.bn1 = nn.LayerNorm(self.fc1_dims) self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims) f2 = 1./np.sqrt(self.fc2.weight.data.size()[0]) #f2 = 0.002 T.nn.init.uniform_(self.fc2.weight.data, -f2, f2) T.nn.init.uniform_(self.fc2.bias.data, -f2, f2) #self.fc2.weight.data.uniform_(-f2, f2) #self.fc2.bias.data.uniform_(-f2, f2) self.bn2 = nn.LayerNorm(self.fc2_dims) self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims) f3 = 1./np.sqrt(self.fc3.weight.data.size()[0]) T.nn.init.uniform_(self.fc3.weight.data, -f3, f3) T.nn.init.uniform_(self.fc3.bias.data, -f3, f3) #self.fc2.weight.data.uniform_(-f2, f2) #self.fc2.bias.data.uniform_(-f2, f2) self.bn3 = nn.LayerNorm(self.fc3_dims) self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims) f4 = 1./np.sqrt(self.fc4.weight.data.size()[0]) T.nn.init.uniform_(self.fc4.weight.data, -f4, f4) T.nn.init.uniform_(self.fc4.bias.data, -f4, f4) #self.fc2.weight.data.uniform_(-f2, f2) #self.fc2.bias.data.uniform_(-f2, f2) self.bn4 = nn.LayerNorm(self.fc4_dims) self.action_value = nn.Linear(self.n_actions, self.fc4_dims) f5 = 0.003 self.q = nn.Linear(self.fc4_dims, 1) T.nn.init.uniform_(self.q.weight.data, -f5, f5) T.nn.init.uniform_(self.q.bias.data, -f5, f5) #self.q.weight.data.uniform_(-f3, f3) #self.q.bias.data.uniform_(-f3, f3) self.optimizer = optim.Adam(self.parameters(), lr=beta) # if torch.cuda.available(): # import torch.cuda as T # else: # import torch as T self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu') self.to(self.device) def forward(self, state, action): state_value = self.fc1(state) state_value = self.bn1(state_value) state_value = F.relu(state_value) state_value = self.fc2(state_value) state_value = self.bn2(state_value) state_value = F.relu(state_value) state_value = self.fc3(state_value) state_value = self.bn3(state_value) state_value = F.relu(state_value) state_value = self.fc4(state_value) state_value = self.bn4(state_value) action_value = F.relu(self.action_value(action)) state_action_value = F.relu(T.add(state_value, action_value)) state_action_value = self.q(state_action_value) return state_action_value def save_checkpoint(self): print('... saving checkpoint ...') T.save(self.state_dict(), self.checkpoint_file) def load_checkpoint(self,load_file = ''): print('... loading checkpoint ...') if T.cuda.is_available(): self.load_state_dict(T.load(load_file)) else: self.load_state_dict(T.load(load_file, map_location=T.device('cpu'))) class ActorNetwork(nn.Module): def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name, chkpt_dir='C:\\demo\\IRS_DDPG_minimal\\main_foder\\tmp\\ddpg', load_file = ''): super(ActorNetwork, self).__init__() self.input_dims = input_dims self.fc1_dims = fc1_dims self.fc2_dims = fc2_dims self.fc3_dims = fc3_dims self.fc4_dims = fc4_dims self.n_actions = n_actions self.checkpoint_file = os.path.join(chkpt_dir,name+'_ddpg') self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Actor_UAV_ddpg' self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims) f1 = 1./np.sqrt(self.fc1.weight.data.size()[0]) # T.nn.init.uniform_(self.fc1.weight.data, -f1, f1) # T.nn.init.uniform_(self.fc1.bias.data, -f1, f1) self.fc1.weight.data.uniform_(-f1, f1) self.fc1.bias.data.uniform_(-f1, f1) self.bn1 = nn.LayerNorm(self.fc1_dims) self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims) #f2 = 0.002 f2 = 1./np.sqrt(self.fc2.weight.data.size()[0]) # T.nn.init.uniform_(self.fc2.weight.data, -f2, f2) # T.nn.init.uniform_(self.fc2.bias.data, -f2, f2) self.fc2.weight.data.uniform_(-f2, f2) self.fc2.bias.data.uniform_(-f2, f2) self.bn2 = nn.LayerNorm(self.fc2_dims) self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims) f3 = 1./np.sqrt(self.fc3.weight.data.size()[0]) # T.nn.init.uniform_(self.fc3.weight.data, -f3, f3) # T.nn.init.uniform_(self.fc3.bias.data, -f3, f3) self.fc2.weight.data.uniform_(-f2, f2) self.fc2.bias.data.uniform_(-f2, f2) self.bn3 = nn.LayerNorm(self.fc3_dims) self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims) f4 = 1./np.sqrt(self.fc4.weight.data.size()[0]) # T.nn.init.uniform_(self.fc4.weight.data, -f4, f4) # T.nn.init.uniform_(self.fc4.bias.data, -f4, f4) self.fc2.weight.data.uniform_(-f2, f2) self.fc2.bias.data.uniform_(-f2, f2) self.bn4 = nn.LayerNorm(self.fc4_dims) #f3 = 0.004 f5 = 0.003 self.mu = nn.Linear(self.fc4_dims, self.n_actions) # T.nn.init.uniform_(self.mu.weight.data, -f5, f5) # T.nn.init.uniform_(self.mu.bias.data, -f5, f5) self.mu.weight.data.uniform_(-f3, f3) self.mu.bias.data.uniform_(-f3, f3) self.optimizer = optim.Adam(self.parameters(), lr=alpha) self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu') self.to(self.device) def forward(self, state): x = self.fc1(state) x = self.bn1(x) x = F.relu(x) x = self.fc2(x) x = self.bn2(x) x = F.relu(x) x = self.fc3(x) x = self.bn3(x) x = F.relu(x) x = self.fc4(x) x = self.bn4(x) x = F.relu(x) x = T.tanh(self.mu(x)) return x def save_checkpoint(self): print('... saving checkpoint ...') T.save(self.state_dict(), self.checkpoint_file) def load_checkpoint(self, load_file=''): print('... loading checkpoint ...') if T.cuda.is_available(): self.load_state_dict(T.load(load_file)) else: self.load_state_dict(T.load(load_file, map_location=T.device('cpu'))) class Agent(object): def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99, n_actions=2, max_size=1000000, layer1_size=400, layer2_size=300, layer3_size=256, layer4_size=128, batch_size=64, noise = 'AWGN', agent_name = 'default', load_file = ''): self.load_file = load_file self.layer1_size = layer1_size self.layer2_size = layer2_size self.layer3_size = layer3_size self.layer4_size = layer4_size self.gamma = gamma self.tau = tau self.memory = ReplayBuffer(max_size, input_dims, n_actions) self.batch_size = batch_size self.actor = ActorNetwork(alpha, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='Actor_' + agent_name,chkpt_dir=env.data_manager.store_path ) self.critic = CriticNetwork(beta, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='Critic_' + agent_name,chkpt_dir=env.data_manager.store_path) self.target_actor = ActorNetwork(alpha, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='TargetActor_' + agent_name,chkpt_dir=env.data_manager.store_path) self.target_critic = CriticNetwork(beta, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='TargetCritic_' + agent_name,chkpt_dir=env.data_manager.store_path) if noise == 'OU': self.noise = OUActionNoise(mu=np.zeros(n_actions)) elif noise == 'AWGN': self.noise = AWGNActionNoise(mu = np.zeros(n_actions)) # tau = 1 means copy parameters to target self.update_network_parameters(tau=1) def choose_action(self, observation, greedy=0.5, epsilon = 1): self.actor.eval() observation = T.tensor(observation, dtype=T.float).to(self.actor.device) mu = self.actor.forward(observation).to(self.actor.device) mu_prime = mu + T.tensor(greedy * self.noise(), dtype=T.float).to(self.actor.device) self.actor.train() return mu_prime.cpu().detach().numpy() def remember(self, state, action, reward, new_state, done): self.memory.store_transition(state, action, reward, new_state, done) def learn(self): if self.memory.mem_cntr < self.batch_size: return # the done here is opposite of the done in the env state, action, reward, new_state, done = \ self.memory.sample_buffer(self.batch_size) # trun s, a, r, new_s into tensor reward = T.tensor(reward, dtype=T.float).to(self.critic.device) done = T.tensor(done).to(self.critic.device) new_state = T.tensor(new_state, dtype=T.float).to(self.critic.device) action = T.tensor(action, dtype=T.float).to(self.critic.device) state = T.tensor(state, dtype=T.float).to(self.critic.device) # trun on evaliation mode of target actor, target critic, critic net # fix these three nets self.target_actor.eval() self.target_critic.eval() self.critic.eval() # argmax_action*(Q(new_state,action*)),i.e. pi(s|pi),i.e. stratage choose the action* maxmium the future Q target_actions = self.target_actor.forward(new_state) # caculate the Q(new_state, new_action) critic_value_ = self.target_critic.forward(new_state, target_actions) # caculate the Q(state, action) critic_value = self.critic.forward(state, action) target = [] for j in range(self.batch_size): target.append(reward[j] + self.gamma*critic_value_[j]*done[j]) target = T.tensor(target).to(self.critic.device) target = target.view(self.batch_size, 1) # here update the critic net using mse of (r + gamma * Q_argmax_a*(newstate, a*)) - Q(state, action) self.critic.train() self.critic.optimizer.zero_grad() critic_loss = F.mse_loss(target, critic_value) critic_loss.backward() self.critic.optimizer.step() # here update the actor net by policy gradient # first fix the critic net self.critic.eval() self.actor.optimizer.zero_grad() mu = self.actor.forward(state) self.actor.train() actor_loss = -self.critic.forward(state, mu) actor_loss = T.mean(actor_loss) actor_loss.backward() self.actor.optimizer.step() self.update_network_parameters() def update_network_parameters(self, tau=None): if tau is None: tau = self.tau actor_params = self.actor.named_parameters() critic_params = self.critic.named_parameters() target_actor_params = self.target_actor.named_parameters() target_critic_params = self.target_critic.named_parameters() critic_state_dict = dict(critic_params) actor_state_dict = dict(actor_params) target_critic_dict = dict(target_critic_params) target_actor_dict = dict(target_actor_params) for name in critic_state_dict: critic_state_dict[name] = tau*critic_state_dict[name].clone() + \ (1-tau)*target_critic_dict[name].clone() self.target_critic.load_state_dict(critic_state_dict) for name in actor_state_dict: actor_state_dict[name] = tau*actor_state_dict[name].clone() + \ (1-tau)*target_actor_dict[name].clone() self.target_actor.load_state_dict(actor_state_dict) """ #Verify that the copy assignment worked correctly target_actor_params = self.target_actor.named_parameters() target_critic_params = self.target_critic.named_parameters() critic_state_dict = dict(target_critic_params) actor_state_dict = dict(target_actor_params) print('\nActor Networks', tau) for name, param in self.actor.named_parameters(): print(name, T.equal(param, actor_state_dict[name])) print('\nCritic Networks', tau) for name, param in self.critic.named_parameters(): print(name, T.equal(param, critic_state_dict[name])) input() """ def save_models(self): self.actor.save_checkpoint() self.target_actor.save_checkpoint() self.critic.save_checkpoint() self.target_critic.save_checkpoint() def load_models(self, load_file_actor = '',load_file_critic =''): self.actor.load_checkpoint(load_file = load_file_actor) self.target_actor.load_checkpoint(load_file = load_file_actor) self.critic.load_checkpoint(load_file = load_file_critic) self.target_critic.load_checkpoint(load_file = load_file_critic) def check_actor_params(self): current_actor_params = self.actor.named_parameters() current_actor_dict = dict(current_actor_params) original_actor_dict = dict(self.original_actor.named_parameters()) original_critic_dict = dict(self.original_critic.named_parameters()) current_critic_params = self.critic.named_parameters() current_critic_dict = dict(current_critic_params) print('Checking Actor parameters') for param in current_actor_dict: print(param, T.equal(original_actor_dict[param], current_actor_dict[param])) print('Checking critic parameters') for param in current_critic_dict: print(param, T.equal(original_critic_dict[param], current_critic_dict[param])) input() ================================================ FILE: entity.py ================================================ import numpy as np import math #from math_tool import * class UAV(object): """ UAV object with coordinate And with ULA antenas, default 8 And limited power And with fixed rotation angle """ def __init__(self, coordinate, index = 0, rotation = 0, ant_num=16, ant_type = 'ULA', max_movement_per_time_slot = 0.5): """ coordinate is the init coordinate of UAV, meters, np.array """ self.max_movement_per_time_slot = max_movement_per_time_slot self.type = 'UAV' self.coordinate = coordinate self.rotation = rotation self.ant_num = ant_num self.ant_type = ant_type self.coor_sys = [np.array([1, 0, 0]), np.array([0, -1, 0]), np.array([0, 0, -1])] self.index = index # init beamforming matrix in UAV (must be inited in env.py) self.G = np.mat(np.zeros((ant_num, 1))) self.G_Pmax = 0 def reset(self, coordinate): """ reset UAV coordinate """ self.coordinate = coordinate def update_coor_sys(self, delta_angle): """ used in function move to update the relevant coordinate system """ self.rotation = self.rotation + delta_angle coor_sys_x = np.array([\ math.cos(self.rotation),\ math.sin(self.rotation),\ 0]) coor_sys_z = np.array([\ 0,\ 0,\ -1]) coor_sys_y = np.cross(coor_sys_z, coor_sys_x) self.coor_sys = np.array([coor_sys_x,coor_sys_y,coor_sys_z]) def update_coordinate(self, distance_delta_d, direction_fai): """ used in function move to update UAV cordinate """ delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) self.coordinate[0] += delta_x self.coordinate[1] += delta_y def move(self, distance_delta_d, direction_fai, delta_angle = 0): """ preform the 2D movement every step """ self.update_coordinate(distance_delta_d, direction_fai) self.update_coor_sys(delta_angle) class RIS(object): """ reconfigrable intelligent surface with N reflecting elements, UPA, default 4 X 4 = 16 continues phase shift """ def __init__(self, coordinate, coor_sys_z, index = 0, ant_num=36, ant_type = 'UPA'): """ coordinate is the init coordinate of with N reflecting elements, meters, np.array norm_vec is the normal vector of the reflecting direction !!! ant_num Must be the square of a certain int number """ self.type = 'RIS' self.coordinate = coordinate self.ant_num = ant_num self.ant_type = ant_type coor_sys_z = coor_sys_z / np.linalg.norm(coor_sys_z) coor_sys_x = np.cross(coor_sys_z, np.array([0,0,1])) coor_sys_x = coor_sys_x / np.linalg.norm(coor_sys_x) coor_sys_y = np.cross(coor_sys_z, coor_sys_x) self.coor_sys = [coor_sys_x,coor_sys_y,coor_sys_z] self.index = index # init reflecting phase shift self.Phi = np.mat(np.diag(np.ones(self.ant_num, dtype=complex)), dtype = complex) class User(object): """ user with single antenas """ def __init__(self, coordinate, index, ant_num = 1, ant_type = 'single'): """ coordinate is the init coordinate of user, meters, np.array ant_num is the antenas number of user """ self.type = 'user' self.coordinate = coordinate self.ant_num = ant_num self.ant_type = ant_type self.index = index self.coor_sys = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])] # init the capacity self.capacity = 0 self.secure_capacity = 0 self.QoS_constrain = 0 # init the comprehensive_channel, (must used in env.py to init) self.comprehensive_channel = 0 # init receive noise sigma in dB self.noise_power = -114 def reset(self, coordinate): """ reset user coordinate """ self.coordinate = coordinate def update_coordinate(self, distance_delta_d, direction_fai): """ used in function move to update UAV cordinate """ delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) self.coordinate[0] += delta_x self.coordinate[1] += delta_y def move(self, distance_delta_d, direction_fai): """ preform the 2D movement every step """ self.update_coordinate(distance_delta_d, direction_fai) class Attacker(object): """ Attacker with single antenas """ def __init__(self, coordinate, index, ant_num = 1, ant_type= 'single'): """ coordinate is the init coordinate of Attacker, meters, np.array ant_num is the antenas number of Attacker """ self.type = 'attacker' self.coordinate = coordinate self.ant_num = ant_num self.ant_type = ant_type self.index = index self.coor_sys = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])] # init the capacity, this is a K length np.array ,shape: (K,) # represent the attack rate for kth user, (must init in env.py) self.capacity = 0 self.comprehensive_channel = 0 # init receive noise sigma in dBmW self.noise_power = -114 def reset(self, coordinate): """ reset attacker coordinate """ self.coordinate = coordinate def update_coordinate(self, distance_delta_d, direction_fai): """ used in function move to update UAV cordinate """ delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) self.coordinate[0] += delta_x self.coordinate[1] += delta_y def move(self, distance_delta_d, direction_fai): """ preform the 2D movement every step """ self.update_coordinate(distance_delta_d, direction_fai) ================================================ FILE: env.py ================================================ #%matplotlib inline import numpy as np from entity import * from channel import * from math_tool import * from datetime import datetime from mpl_toolkits import mplot3d import matplotlib.pyplot as plt from render import Render from data_manager import DataManager # s.t every simulition is the same model np.random.seed(2) ###################################################### # new for energy # energy related parameters of rotary-wing UAV # based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV P_i = 790.6715 P_0 = 580.65 U2_tip = (200) ** 2 s = 0.05 d_0 = 0.3 p = 1.225 A = 0.79 delta_time = 0.1/1000 #0.1ms # add ons hover veloctiy # based on https://www.intechopen.com/chapters/57483 m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F g = 9.81 # gravity T = m * g # thrust v_0 = (T / (A * 2 * p)) ** 0.5 def get_energy_consumption(v_t): ''' arg 1) v_t = displacement per time slot ''' energy_1 = P_0 \ + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \ + 0.5 * d_0 * p * s * A * (abs(v_t))**3 energy_2 = P_i * (( (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \ - (abs(v_t) ** 2) / (2 * (v_0 **2)) \ ) ** 0.5) energy = delta_time * (energy_1 + energy_2) return energy ENERGY_MIN = get_energy_consumption(0.25) ENERGY_MAX = get_energy_consumption(0) ###################################################### class MiniSystem(object): #class MiniSystem(K=1): """ define mini RIS communication system with one UAV and one RIS and one user, one attacker """ def __init__(self, UAV_num = 1, RIS_num = 1, user_num = 1, attacker_num = 1, fre = 28e9, \ RIS_ant_num = 16, UAV_ant_num=8, if_dir_link = 1, if_with_RIS = True, \ if_move_users = True, if_movements = True, reverse_x_y = (True, True), \ if_UAV_pos_state = True, reward_design = 'ssr', project_name = None, step_num=100): self.if_dir_link = if_dir_link self.if_with_RIS = if_with_RIS self.if_move_users = if_move_users self.if_movements = if_movements self.if_UAV_pos_state = if_UAV_pos_state self.reverse_x_y = reverse_x_y self.user_num = user_num self.attacker_num = attacker_num self.border = [(-25,25), (0, 50)] # 1.init entities: 1 UAV, 1 RIS, many users and attackers self.data_manager = DataManager(file_path='./data', project_name = project_name, \ store_list = ['beamforming_matrix', 'reflecting_coefficient', 'UAV_state', 'user_capacity', 'secure_capacity', 'attaker_capacity','G_power', 'reward','UAV_movement']) # 1.1 init UAV position and beamforming matrix self.UAV = UAV( coordinate=self.data_manager.read_init_location('UAV', 0), ant_num= UAV_ant_num, max_movement_per_time_slot=0.25) self.UAV.G = np.mat(np.ones((self.UAV.ant_num, user_num), dtype=complex), dtype=complex) self.power_factor = 100 self.UAV.G_Pmax = np.trace(self.UAV.G * self.UAV.G.H) * self.power_factor # 1.2 init RIS self.RIS = RIS(\ coordinate=self.data_manager.read_init_location('RIS', 0), \ coor_sys_z=self.data_manager.read_init_location('RIS_norm_vec', 0), \ ant_num=RIS_ant_num) # 1.3 init users self.user_list = [] for i in range(user_num): user_coordinate = self.data_manager.read_init_location('user', i) user = User(coordinate=user_coordinate, index=i) user.noise_power = -114 self.user_list.append(user) # 1.4 init attackers self.attacker_list = [] for i in range(attacker_num): attacker_coordinate = self.data_manager.read_init_location('attacker', i) attacker = Attacker(coordinate=attacker_coordinate, index=i) attacker.capacity = np.zeros((user_num)) attacker.noise_power = -114 self.attacker_list.append(attacker) # 1.5 generate the eavesdrop capacity array , shape: P X K self.eavesdrop_capacity_array= np.zeros((attacker_num, user_num)) # 1.6 reward design self.reward_design = reward_design # reward_design is ['ssr' or 'see'] # 1.7 step_num self.step_num = step_num # 2.init channel self.H_UR = mmWave_channel(self.UAV, self.RIS, fre) self.h_U_k = [] self.h_R_k = [] self.h_U_p = [] self.h_R_p = [] for user_k in self.user_list: self.h_U_k.append(mmWave_channel(user_k, self.UAV, fre)) self.h_R_k.append(mmWave_channel(user_k, self.RIS, fre)) for attacker_p in self.attacker_list: self.h_U_p.append(mmWave_channel(attacker_p, self.UAV, fre)) self.h_R_p.append(mmWave_channel(attacker_p, self.RIS, fre)) # 3 update user and attaker channel capacity self.update_channel_capacity() # 4 draw system self.render_obj = Render(self) def reset(self): """ reset UAV, users, attackers, beamforming matrix, reflecting coefficient """ # 1 reset UAV self.UAV.reset(coordinate=self.data_manager.read_init_location('UAV', 0)) # 2 reset users for i in range(self.user_num): user_coordinate = self.data_manager.read_init_location('user', i) self.user_list[i].reset(coordinate=user_coordinate) # 3 reset attackers for i in range(self.attacker_num): attacker_coordinate = self.data_manager.read_init_location('attacker', i) self.attacker_list[i].reset(coordinate=attacker_coordinate) # 4 reset beamforming matrix self.UAV.G = np.mat(np.ones((self.UAV.ant_num, self.user_num), dtype=complex), dtype=complex) self.UAV.G_Pmax = np.trace(self.UAV.G * self.UAV.G.H) * self.power_factor # 5 reset reflecting coefficient """self.RIS = RIS(\ coordinate=self.data_manager.read_init_location('RIS', 0), \ coor_sys_z=self.data_manager.read_init_location('RIS_norm_vec', 0), \ ant_num=16)""" self.RIS.Phi = np.mat(np.diag(np.ones(self.RIS.ant_num, dtype=complex)), dtype = complex) # 6 reset time self.render_obj.t_index = 0 # 7 reset CSI self.H_UR.update_CSI() for h in self.h_U_k + self.h_U_p + self.h_R_k + self.h_R_p: h.update_CSI() # 8 reset capcaity self.update_channel_capacity() def step(self, action_0 = 0, action_1 = 0, G = 0, Phi = 0, set_pos_x = 0, set_pos_y = 0): """ test step only move UAV and update channel """ # 0 update render self.render_obj.t_index += 1 # 1 update entities if self.if_move_users: self.user_list[0].update_coordinate(0.2, -1/2 * math.pi) self.user_list[1].update_coordinate(0.2, -1/2 * math.pi) if self.if_movements: move_x = action_0 * self.UAV.max_movement_per_time_slot move_y = action_1 * self.UAV.max_movement_per_time_slot ###################################################### # new for energy v_t = (move_x ** 2 + move_y ** 2) ** 0.5 #self.data_manager.store_data([v_t],'velocity') ###################################################### if self.reverse_x_y[0]: move_x = -move_x if self.reverse_x_y[1]: move_y = -move_y self.UAV.coordinate[0] +=move_x self.UAV.coordinate[1] +=move_y self.data_manager.store_data([move_x, move_y], 'UAV_movement') else: set_pos_x = map_to(set_pos_x, (-1, 1), self.border[0]) set_pos_y = map_to(set_pos_y, (-1, 1), self.border[1]) self.UAV.coordinate[0] = set_pos_x self.UAV.coordinate[1] = set_pos_y # 2 update channel CSI for h in self.h_U_k + self.h_U_p + self.h_R_k + self.h_R_p: h.update_CSI() # !!! test to make direct link zero if self.if_dir_link == 0: for h in self.h_U_k + self.h_U_p: h.channel_matrix = np.mat(np.zeros(shape = np.shape(h.channel_matrix)), dtype=complex) if self.if_with_RIS == False: self.H_UR.channel_matrix = np.mat(np.zeros((self.RIS.ant_num, self.UAV.ant_num)), dtype=complex) else: self.H_UR.update_CSI() # 3 update beamforming matrix & reflecting phase shift """ self.UAV.G = G self.RIS.Phi = Phi """ self.UAV.G = convert_list_to_complex_matrix(G, (self.UAV.ant_num, self.user_num)) * math.pow(self.power_factor, 0.5) # fix beamforming matrix #self.UAV.G = np.mat(np.ones((self.UAV.ant_num, self.user_num), dtype=complex), dtype=complex) * math.pow(self.power_factor, 0.5) if self.if_with_RIS: self.RIS.Phi = convert_list_to_complex_diag(Phi, self.RIS.ant_num) # 4 update channel capacity in every user and attacker self.update_channel_capacity() # 5 store current system state to .mat self.store_current_system_sate() # 6 get new state new_state = self.observe() # 7 get reward reward = self.reward() # 7.1 reward with energy efficiency ###################################################### if self.reward_design == 'see': # new for see energy = energy_raw = get_energy_consumption(v_t) energy -= ENERGY_MIN energy /= (ENERGY_MAX - ENERGY_MIN) energy_penalty = -1 * 0.1 * abs(reward) * energy # -1 * 0.1 * reward * energy if reward > 0: reward += energy_penalty ###################################################### # 8 calculate if UAV is cross the bourder reward = math.tanh(reward) # new for energy (ori not commented) done = False x, y = self.UAV.coordinate[0:2] if x < self.border[0][0] or x > self.border[0][1]: done = True reward = -10 if y < self.border[1][0] or y > self.border[1][1]: done = True reward = -10 self.data_manager.store_data([reward],'reward') return new_state, reward, done, [] def reward(self): """ used in function step to get the reward of current step """ reward = 0 reward_ = 0 P = np.trace(self.UAV.G * self.UAV.G.H) if abs(P) > abs(self.UAV.G_Pmax) : reward = abs(self.UAV.G_Pmax) - abs(P) reward /= self.power_factor else: for user in self.user_list: r = user.capacity - max(self.eavesdrop_capacity_array[:, user.index]) if r < user.QoS_constrain: reward_ += r - user.QoS_constrain else: reward += r/(self.user_num*2) if reward_ < 0: reward = reward_ * self.user_num * 10 return reward def observe(self): """ used in function main to get current state the state is a list with """ # users' and attackers' comprehensive channel comprehensive_channel_elements_list = [] for entity in self.user_list + self.attacker_list: tmp_list = list(np.array(np.reshape(entity.comprehensive_channel, (1,-1)))[0]) comprehensive_channel_elements_list += list(np.real(tmp_list)) + list(np.imag(tmp_list)) UAV_position_list = [] if self.if_UAV_pos_state: UAV_position_list = list(self.UAV.coordinate) return comprehensive_channel_elements_list + UAV_position_list def store_current_system_sate(self): """ function used in step() to store system state """ # 1 store beamforming matrix row_data = list(np.array(np.reshape(self.UAV.G, (1, -1)))[0,:]) self.data_manager.store_data(row_data, 'beamforming_matrix') # 2 store reflecting coefficient matrix row_data = list(np.array(np.reshape(diag(self.RIS.Phi), (1,-1)))[0,:]) self.data_manager.store_data(row_data, 'reflecting_coefficient') # 3 store UAV state row_data = list(self.UAV.coordinate) self.data_manager.store_data(row_data, 'UAV_state') # 4 store user_capicity row_data = [user.secure_capacity for user in self.user_list] \ + [user.capacity for user in self.user_list] # 5 store G_power row_data = [np.trace(self.UAV.G*self.UAV.G.H), self.UAV.G_Pmax] self.data_manager.store_data(row_data, 'G_power') row_data = [] for user in self.user_list: row_data.append(user.capacity) self.data_manager.store_data(row_data, 'user_capacity') row_data = [] for attacker in self.attacker_list: row_data.append(attacker.capacity) self.data_manager.store_data(row_data, 'attaker_capacity') row_data = [] for user in self.user_list: row_data.append(user.secure_capacity) self.data_manager.store_data(row_data, 'secure_capacity') def update_channel_capacity(self): """ function used in step to calculate user and attackers' capacity """ # 1 calculate eavesdrop rate for attacker in self.attacker_list: attacker.capacity = self.calculate_capacity_array_of_attacker_p(attacker.index) self.eavesdrop_capacity_array[attacker.index, :] = attacker.capacity # remmeber to update comprehensive_channel attacker.comprehensive_channel = self.calculate_comprehensive_channel_of_attacker_p(attacker.index) # 2 calculate unsecure rate for user in self.user_list: user.capacity = self.calculate_capacity_of_user_k(user.index) # 3 calculate secure rate user.secure_capacity = self.calculate_secure_capacity_of_user_k(user.index) # remmeber to update comprehensive_channel user.comprehensive_channel = self.calculate_comprehensive_channel_of_user_k(user.index) def calculate_comprehensive_channel_of_attacker_p(self, p): """ used in update_channel_capacity to calculate the comprehensive_channel of attacker p """ h_U_p = self.h_U_p[p].channel_matrix h_R_p = self.h_R_p[p].channel_matrix Psi = diag_to_vector(self.RIS.Phi) H_c = vector_to_diag(h_R_p).H * self.H_UR.channel_matrix return h_U_p.H + Psi.H * H_c def calculate_comprehensive_channel_of_user_k(self, k): """ used in update_channel_capacity to calculate the comprehensive_channel of user k """ h_U_k = self.h_U_k[k].channel_matrix h_R_k = self.h_R_k[k].channel_matrix Psi = diag_to_vector(self.RIS.Phi) H_c = vector_to_diag(h_R_k).H * self.H_UR.channel_matrix return h_U_k.H + Psi.H * H_c def calculate_capacity_of_user_k(self, k): """ function used in update_channel_capacity to calculate one user """ noise_power = self.user_list[k].noise_power h_U_k = self.h_U_k[k].channel_matrix h_R_k = self.h_R_k[k].channel_matrix Psi = diag_to_vector(self.RIS.Phi) H_c = vector_to_diag(h_R_k).H * self.H_UR.channel_matrix G_k = self.UAV.G[:, k] G_k_ = 0 if len(self.user_list) == 1: G_k_ = np.mat(np.zeros((self.UAV.ant_num, 1), dtype=complex), dtype=complex) else: G_k_1 = self.UAV.G[:, 0:k] G_k_2 = self.UAV.G[:, k+1:] G_k_ = np.hstack((G_k_1, G_k_2)) alpha_k = math.pow(abs((h_U_k.H + Psi.H * H_c) * G_k), 2) beta_k = math.pow(np.linalg.norm((h_U_k.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3 return math.log10(1 + abs(alpha_k / beta_k)) def calculate_capacity_array_of_attacker_p(self, p): """ function used in update_channel_capacity to calculate one attacker capacities to K users output is a K length np.array ,shape: (K,) """ K = len(self.user_list) noise_power = self.attacker_list[p].noise_power h_U_p = self.h_U_p[p].channel_matrix h_R_p = self.h_R_p[p].channel_matrix Psi = diag_to_vector(self.RIS.Phi) H_c = vector_to_diag(h_R_p).H * self.H_UR.channel_matrix if K == 1: G_k = self.UAV.G G_k_ = np.mat(np.zeros((self.UAV.ant_num, 1), dtype=complex), dtype=complex) alpha_p = math.pow(abs((h_U_p.H + Psi.H * H_c) * G_k), 2) beta_p = math.pow(np.linalg.norm((h_U_p.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3 return np.array([math.log10(1 + abs(alpha_p / beta_p))]) else: result = np.zeros(K) for k in range(K): G_k = G_k = self.UAV.G[:, k] G_k_1 = self.UAV.G[:, 0:k] G_k_2 = self.UAV.G[:, k+1:] G_k_ = np.hstack((G_k_1, G_k_2)) alpha_p = math.pow(abs((h_U_p.H + Psi.H * H_c) * G_k), 2) beta_p = math.pow(np.linalg.norm((h_U_p.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3 result[k] = math.log10(1 + abs(alpha_p / beta_p)) return result def calculate_secure_capacity_of_user_k(self, k=2): """ function used in update_channel_capacity to calculate the secure rate of user k """ user = self.user_list[k] R_k_unsecure = user.capacity R_k_maxeavesdrop = max(self.eavesdrop_capacity_array[:, k]) secrecy_rate= max(0, R_k_unsecure - R_k_maxeavesdrop) return secrecy_rate def get_system_action_dim(self): """ function used in main function to get the dimention of actions """ result = 0 # 0 UAV movement result += 2 # 1 RIS reflecting elements if self.if_with_RIS: result += self.RIS.ant_num else: result += 0 # 2 beamforming matrix dimention result += 2 * self.UAV.ant_num * self.user_num return result def get_system_state_dim(self): """ function used in main function to get the dimention of states """ result = 0 # users' and attackers' comprehensive channel result += 2 * (self.user_num + self.attacker_num) * self.UAV.ant_num # UAV position if self.if_UAV_pos_state: result += 3 return result ================================================ FILE: env1.py ================================================ import numpy as np import math import cmath np.random.seed(42) class minimal_IRS_system(): def __init__(self, BS_M = 8, IRS_N_x = 4, IRS_N_y =2, K = 8, statistic = False): self.M = BS_M # number of BS antennas self.N = IRS_N_x * IRS_N_y # number of IRS elements self.K = K # number of users self.z_BS = 0 # hight of BS antenas self.BS_rotation = 0 # self.BS_coordinate = np.array([0, 0, 30]) self.BS_elevation_angle = 0 self.BS_max_power = 1000 # BS power constrain, W self.BS_normal_vecter = np.array([math.cos(self.BS_rotation), math.sin(self.BS_rotation), math.sin(self.BS_elevation_angle)*np.linalg.norm([-math.cos(self.BS_rotation), math.sin(self.BS_rotation)])]) self.BS_normal_vecter = self.BS_normal_vecter / np.linalg.norm(self.BS_normal_vecter) # channel parameters class class channel_parameters(): """ channel parameters """ def __init__(self): self.noise_dBm = noise_dBm = -114 # channel noise, -114 dBm self.noise_segma = 0 # channel noise std, self.d_0 = 1 # path loss referance distance 1 m self.rho_0 = 0.01 # path loss parameter self.alpha_BS_to_IRS = 3 # path loss exponent self.alpha_IRS_to_user = 2.5 # path loss exponent self.alpha_BS_to_user = 3.5 # path loss exponent self.K_BS = math.pow(10, 3/10) # rician factor, 3dB self.K_IRS = math.pow(10, 3/10) # rician factor, 3dB self.channel_parameters = channel_parameters() self.channel_parameters.noise_segma = math.pow(self.dBm_to_W(self.channel_parameters.noise_dBm),0.5) # IRS initial class IRS(): """ store user parameters """ def __init__(self): self.IRS_N_x = IRS_N_x self.IRS_N_y = IRS_N_y self.IRS_N = IRS_N_x * IRS_N_y self.x_IRS = 1000 # position x of IRS, 1000 m self.y_IRS = 0 # position y of IRS, 0 m self.z_IRS = 25 # position z of IRS, 0 m self.IRS_coordinate = np.array([self.x_IRS,self.y_IRS,self.z_IRS]) self.deltaD_to_lambda = 0.5 self.d_BS_to_IRS = np.linalg.norm(self.IRS_coordinate) self.theta_IRS = math.atan(self.y_IRS / self.x_IRS) # theta_IRS self.psi_IRS = 0 # rotation of IRS , range: -pi/2 + theta_IRS ~ pi/2 - theta_IRS self.elevation_angle = 0 # elevation angle self.IRS_normal_vecter = np.array([-math.cos(self.psi_IRS), math.sin(self.psi_IRS),math.sin(self.elevation_angle)*np.linalg.norm([-math.cos(self.psi_IRS), math.sin(self.psi_IRS)])]) self.IRS_normal_vecter = self.IRS_normal_vecter / np.linalg.norm(self.IRS_normal_vecter) self.theta_BS_to_IRS = 0 self.psi_BS_to_IRS = 0 self.theta_IRS_to_BS = 0 self.psi_IRS_to_BS = 0 self.IRS = IRS() # users initial class user(): """ store user parameters """ def __init__(self): self.user_index = 0 # user ID self.distance_to_IRS = 10 # distance to IRS(need caculation), range of (10, 30) self.psi = 0 # psi angle to IRS self.psi_IRS_surface = 0 # psi used to calculate the UPA response self.x_user = 10 # position x self.y_user = 10 # position y self.z_user = 0 # position z self.user_coordinate = np.array([self.x_user,self.y_user,self.z_user]) self.distance_to_BS = np.linalg.norm(self.user_coordinate) # distance to BS self.theta_user = 0 # theta to BS self.theta_IRS_to_kth_user = 0 self.psi_IRS_to_kth_user = 0 mini_distance = 10 # minimal distance for users to IRS, m max_distance = 30 # maxmial distance for users to IRS, m statistic_distance_list = [10, 12, 14, 17, 20, 23, 25, 29] statistic_angle_list = [0, math.pi/4, math.pi/2, 3*math.pi/4, math.pi, 5*math.pi/4, 3*math.pi/2, 7*math.pi/4] stocastic_distance_list = np.random.uniform(mini_distance, max_distance, self.K) # list of dis stocastic_angle_list = np.random.uniform(- math.pi / 2, math.pi / 2, self.K) # list of angle self.user_list = [] for index_user in range(K): user_temp = user() user_temp.user_index = index_user if statistic == True: user_temp.distance_to_IRS = statistic_distance_list[index_user] user_temp.psi = statistic_angle_list[index_user] user_temp.x_user = self.IRS.x_IRS - user_temp.distance_to_IRS * math.cos(user_temp.psi) user_temp.y_user = self.IRS.y_IRS + user_temp.distance_to_IRS * math.sin(user_temp.psi) else: user_temp.distance_to_IRS = stocastic_distance_list[index_user] user_temp.psi = stocastic_angle_list[index_user] user_temp.x_user = self.IRS.x_IRS - user_temp.distance_to_IRS * math.cos(user_temp.psi) user_temp.y_user = self.IRS.y_IRS + user_temp.distance_to_IRS * math.sin(user_temp.psi) user_temp.psi_IRS_surface = user_temp.psi - self.IRS.psi_IRS #user_temp.distance_to_BS = np.linalg.norm([user_temp.x_user, user_temp.y_user]) user_temp.theta_user = math.atan(user_temp.y_user / user_temp.x_user ) user_temp.user_coordinate = np.array([user_temp.x_user,user_temp.y_user,user_temp.z_user]) self.user_list.append(user_temp) # transmit signal self.X_transmit = np.mat(np.ones((K, 1))) # BS transmit signal, unit variance, shape: K X 1 # active beamforming at BS self.G_beamforming = np.mat(np.ones((self.M, K),dtype=complex)) # BS beamforming matrix, shape: M X K self.BS_max_power = self.calculate_total_transmit_power() self.reset_G = np.mat(np.ones((self.M, K),dtype=complex)) # channal gain from BS to IRS #self.H_BS_to_IRS = np.mat(np.ones((self.N, self.M))) # channel gain BS to IRS, shape: N X M self.H_BS_to_IRS = self.calculate_channel_BS_to_IRS(self.IRS) # channal gain from IRS to users # self.H_IRS_to_user = np.mat(np.ones((K, self.N),dtype=complex)) # channel gain IRS to user, shape: K X N self.H_IRS_to_user = self.calculate_channel_IRS_to_user(self.IRS) # IRS phase shifter parameter self.Fai = np.mat(np.identity(self.N), dtype=complex) # IRS phase shift self.reset_Fai = np.mat(np.identity(self.N), dtype=complex) # data rate matrex container self.data_rate = np.mat(np.ones((K,1))) # channal capacities for all K users # calculate sum of all user data rate self.sum_of_user_data_rates = self.calculate_data_rate() def calculate_total_transmit_power(self): """ calculate total power """ G = self.G_beamforming G_H = G.H temp_power = 0 for m in range(self.M): temp_power += G[m,:] * G_H[:, m] return float(np.real(temp_power[0,0])) def dBm_to_W(self, dBm): mW = math.pow(10, dBm/10) W = mW/1000.0 return W def generate_noise(self, mu, segma, size = 1): """ generates noise of variance segma ^ 2 size can be a tuple like (3,3) output : a float of noise or an array with size size of average mu and std of segma """ result = np.random.normal(mu, segma, size) if size == 1: return float(result) else: return result def calculate_received_signal_amplitude(self): """ calculate_received_signal_amplitude from BS to user output : a matix , size: K X 1 """ noise_array = self.generate_noise(0, self.channel_parameters.noise_segma, (self.K,1)) noise_matrix = np.mat(noise_array) Y_matrix = self.H_IRS_to_user * self.Fai * self.H_BS_to_IRS * self.G_beamforming * self.X_transmit + noise_matrix return Y_matrix def calculate_SINR_of_kth_user(self, k): """ calculate kth user's SINR output : a float """ expected_amplitude = abs(self.H_IRS_to_user[k] * self.Fai * self.H_BS_to_IRS * self.G_beamforming[:, k]) expected_power = math.pow(expected_amplitude, 2) noise_power = 0 noise_power = noise_power + math.pow(self.channel_parameters.noise_segma, 2) for n in range(self.K): if n != k: noise_power = noise_power + math.pow(abs(self.H_IRS_to_user[n] * self.Fai * self.H_BS_to_IRS * self.G_beamforming[:, n]),2) return expected_power/noise_power def calculate_data_rate(self): """ calculate K data rates output : sum of datarate """ for kth in range(self.K): self.data_rate[kth][0] = math.log2(1 + self.calculate_SINR_of_kth_user(kth)) return float(self.data_rate.sum()) def calculate_theta_and_psi(self, ref_norm_vector, coordinate): """ calculate_theta_and_psi """ theta_ref_to_coor = math.acos(abs(np.dot(coordinate/ np.linalg.norm(coordinate),ref_norm_vector))) coordinate_xoy = coordinate - abs(math.cos(theta_ref_to_coor)) * np.linalg.norm(coordinate) * ref_norm_vector temp_x_vector = [ref_norm_vector[1], -ref_norm_vector[0], 0] psi_ref_to_coor = abs(np.dot(temp_x_vector, coordinate_xoy)) / (np.linalg.norm(temp_x_vector) * np.linalg.norm(coordinate_xoy)) if math.isnan(psi_ref_to_coor): psi_ref_to_coor = 0 return theta_ref_to_coor, psi_ref_to_coor def calculate_channel_BS_to_IRS(self, IRS): # IRS elements spacing half wavelength """ denate the self.H_BS_to_IRS (N X M) """ temp_vertical_matrix = np.zeros((IRS.IRS_N,1), dtype=complex) temp_horizontal_matrix = np.zeros((1, self.M), dtype=complex) """ theta_BS_to_IRS = math.acos(np.dot(IRS.IRS_coordinate/ np.linalg.norm(IRS.IRS_coordinate),self.BS_normal_vecter)) IRS_xoy = IRS.IRS_coordinate - abs(math.cos(theta_BS_to_IRS)) * np.linalg.norm(IRS.IRS_coordinate) * self.BS_normal_vecter temp_x_BS = [self.BS_normal_vecter[1], -self.BS_normal_vecter[0], 0] psi_BS_to_IRS = abs(np.dot(temp_x_BS, IRS_xoy)) / (np.linalg.norm(temp_x_BS) * np.linalg.norm(IRS_xoy)) if math.isnan(psi_BS_to_IRS): psi_BS_to_IRS = 0 theta_IRS_to_BS = math.acos(abs(np.dot(IRS.IRS_coordinate/ np.linalg.norm(IRS.IRS_coordinate), IRS.IRS_normal_vecter))) BS_xoy = -1 * np.array(IRS.IRS_coordinate) - abs(math.cos(theta_IRS_to_BS)) * np.linalg.norm(IRS.IRS_coordinate) * IRS.IRS_normal_vecter temp_x_IRS = [IRS.IRS_normal_vecter[1], -IRS.IRS_normal_vecter[0], 0] psi_IRS_to_BS = abs(np.dot(temp_x_IRS, BS_xoy)) / (np.linalg.norm(temp_x_IRS) * np.linalg.norm(BS_xoy)) if math.isnan(psi_IRS_to_BS): psi_IRS_to_BS = 0 # old way to calculate theta and psi """ theta_BS_to_IRS, psi_BS_to_IRS = self.calculate_theta_and_psi(self.BS_normal_vecter, IRS.IRS_coordinate - self.BS_coordinate) theta_IRS_to_BS, psi_IRS_to_BS = self.calculate_theta_and_psi(IRS.IRS_normal_vecter, self.BS_coordinate - IRS.IRS_coordinate) IRS.theta_BS_to_IRS = theta_BS_to_IRS IRS.psi_BS_to_IRS = psi_BS_to_IRS IRS.theta_IRS_to_BS = theta_IRS_to_BS IRS.psi_IRS_to_BS = psi_IRS_to_BS for i in range(IRS.IRS_N_y): for j in range(IRS.IRS_N_x): temp_vertical_matrix[i * IRS.IRS_N_x + j][0] = \ cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \ (i * math.sin(theta_IRS_to_BS)* math.cos(psi_IRS_to_BS)+ \ j * math.sin(theta_IRS_to_BS)* math.sin(psi_IRS_to_BS)) \ ) for i in range(self.M): temp_horizontal_matrix[0][i] = \ cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \ (i * math.sin(theta_BS_to_IRS)* math.cos(psi_BS_to_IRS)) \ ) temp_horizontal_matrix = np.mat(temp_horizontal_matrix) temp_vertical_matrix = np.mat(temp_vertical_matrix) H_LOS = temp_vertical_matrix * temp_horizontal_matrix H = math.pow(self.channel_parameters.rho_0* (math.pow(IRS.d_BS_to_IRS/self.channel_parameters.d_0, - self.channel_parameters.alpha_BS_to_IRS)), 0.5) \ * (H_LOS) return H def calculate_channel_IRS_to_user(self, IRS): """ channel gain IRS to user, shape: K X N """ temp_array = np.zeros((self.K, self.N), dtype=complex) for kth in range(self.K): theta_IRS_to_kth_user, psi_IRS_to_kth_user = self.calculate_theta_and_psi(IRS.IRS_normal_vecter, self.user_list[kth].user_coordinate - IRS.IRS_coordinate) self.user_list[kth].theta_IRS_to_kth_user = theta_IRS_to_kth_user self.user_list[kth].psi_IRS_to_kth_user = psi_IRS_to_kth_user for i in range(IRS.IRS_N_y): for j in range(IRS.IRS_N_x): temp_array[kth][i * IRS.IRS_N_x + j] = \ cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \ (i * math.sin(theta_IRS_to_kth_user)* math.cos(psi_IRS_to_kth_user)+ \ j * math.sin(theta_IRS_to_kth_user)* math.sin(psi_IRS_to_kth_user)) \ ) H_LOS = temp_array for kth in range(self.K): H_LOS[kth] = math.pow(self.channel_parameters.rho_0* (math.pow(np.linalg.norm(IRS.IRS_coordinate - self.user_list[kth].user_coordinate)/self.channel_parameters.d_0, - self.channel_parameters.alpha_BS_to_IRS)), 0.5) \ * (H_LOS[kth]) H = np.mat(H_LOS) return H def reset(self): """ RL implement, """ # 1 reset all parameters self.G_beamforming = self.reset_G self.Fai = self.reset_Fai # 2 get state return self.get_state() def apply_action(self, action): """ apply action """ # 0 parameters and all kinds of matrix K = self.K M = self.M N = self.N # 1 apply action # 1.1 divide action list into two parts # 1.1.1 beamforming part, MXK action_beamforming = action[0 : 2*M*K] # 1.1.2 IRS reflecting coinfetions action_Fai = action[2*M*K : 2*M*K + 2*N] # 1.2 apply action # 1.2.1 beamforming part, MXK for m in range(M): for k in range(K): index = m * K + k temp_Gmk = action_beamforming[2*index] + 1j * action_beamforming[2*index + 1] self.G_beamforming[m, k] = temp_Gmk # 1.2.2 IRS reflecting coinfetions for n in range(N): temp_Fainn = action_Fai[2*n] + 1j * action_Fai[2*n + 1] self.Fai[n, n] = temp_Fainn return True def get_state(self): """ get current state """ # 0 parameters and all kinds of matrix K = self.K M = self.M N = self.N new_state_dims = 2*K + 2*K**2 + 2*N + 2*M*K + 2*N*M + 2*K*N # 1 get new state new_state = [] G = self.G_beamforming G_H = self.G_beamforming.H H_IRS_to_user = self.H_IRS_to_user Fai = self.Fai H_BS_to_IRS = self.H_BS_to_IRS # 1.1 transmit power to k users, 2K state_tran_power_BS_to_users = [] # result container for kth in range(K): state_tran_power_BS_to_users.append(\ math.pow(float(np.real(G_H[kth,:] * G[:,kth])),2)) state_tran_power_BS_to_users.append(\ math.pow(float(np.imag(G_H[kth,:] * G[:,kth])),2)) # 1.2 received power for users, 2K^2 state_action_receive_power_for_users = [] for k1 in range(K): user_k1_received_power_from_all_other_users = \ H_IRS_to_user[k1, :] * Fai * H_BS_to_IRS * G for k2 in range(K): state_action_receive_power_for_users.append(\ math.pow(np.real(user_k1_received_power_from_all_other_users[0, k2]), 2)) state_action_receive_power_for_users.append(\ math.pow(np.imag(user_k1_received_power_from_all_other_users[0, k2]), 2)) # 1.3 beamforming at BS, 2*M*K+2*N state_beamforming_at_BS = [] for m in range(M): for k in range(K): state_beamforming_at_BS.append(\ math.pow(np.real(G[m, k]), 2)) state_beamforming_at_BS.append(\ math.pow(np.imag(G[m, k]), 2)) for n in range(N): state_beamforming_at_BS.append(\ math.pow(np.real(Fai[n, n]), 2)) state_beamforming_at_BS.append(\ math.pow(np.imag(Fai[n, n]), 2)) # 1.4 channel state at both BS-IRS & IRS-users, 2*N*M+2*K*M state_channel_BS_IRS_and_IRS_users = [] for n in range(N): for m in range(M): state_channel_BS_IRS_and_IRS_users.append(\ math.pow(np.real(H_BS_to_IRS[n, m]), 2)) state_channel_BS_IRS_and_IRS_users.append(\ math.pow(np.imag(H_BS_to_IRS[n, m]), 2)) for k in range(K): for n in range(N): state_channel_BS_IRS_and_IRS_users.append(\ math.pow(np.real(H_IRS_to_user[k, n]), 2)) state_channel_BS_IRS_and_IRS_users.append(\ math.pow(np.imag(H_IRS_to_user[k, n]), 2)) # 1.5 finnish getting state new_state.append(state_tran_power_BS_to_users) new_state.append(state_action_receive_power_for_users) new_state.append(state_beamforming_at_BS) new_state.append(state_channel_BS_IRS_and_IRS_users) result = state_tran_power_BS_to_users + \ state_action_receive_power_for_users + \ state_beamforming_at_BS + \ state_channel_BS_IRS_and_IRS_users return result def step(self, action): """ RL function ,state is a 1 X (2K+2K^2+2N+2MK+2NM+2KN) vecter """ done = False # 1 get new state new_state = self.get_state() # 2 apply action self.apply_action(action) # 3 judge if done total_power = self.calculate_total_transmit_power() if total_power > self.BS_max_power: done = True # 4 get other reward reward = 0 if total_power > self.BS_max_power: reward = np.real(self.BS_max_power - total_power) # reward = np.real(self.BS_max_power[0,0] - total_power[0,0] )/1000 else: reward = self.calculate_data_rate()*100 info = 0 return new_state, reward, done, info def render(self): """ show system function """ return True ================================================ FILE: load_and_plot.py ================================================ import matplotlib.pyplot as plt import numpy as np import cmath from scipy.io import loadmat, savemat import pandas as pd import os import copy import math import csv import argparse # get argument from user parser = argparse.ArgumentParser() parser.add_argument('--path', type = str, required = False, default=None, help='the path where the training/simulation data is stored') parser.add_argument('--ep-num', type = int, required = False, default=300, help='total number of episodes') # extract argument args = parser.parse_args() STORE_PATH = args.path EP_NUM = args.ep_num ###################################################### # new for energy # energy related parameters of rotary-wing UAV # based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV P_i = 790.6715 P_0 = 580.65 U2_tip = (200) ** 2 s = 0.05 d_0 = 0.3 p = 1.225 A = 0.79 delta_time = 0.1 #0.1/1000 #0.1ms # add ons hover veloctiy # based on https://www.intechopen.com/chapters/57483 m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F g = 9.81 # gravity T = m * g # thrust v_0 = (T / (A * 2 * p)) ** 0.5 def get_energy_consumption(v_t): ''' arg 1) v_t = displacement per time slot ''' energy_1 = P_0 \ + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \ + 0.5 * d_0 * p * s * A * (abs(v_t))**3 energy_2 = P_i * (( (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \ - (abs(v_t) ** 2) / (2 * (v_0 **2)) \ ) ** 0.5) energy = delta_time * (energy_1 + energy_2) return energy ENERGY_MIN = get_energy_consumption(0.25) ENERGY_MAX = get_energy_consumption(0) ###################################################### # modified from data_manager.py init_data_file = 'data/init_location.xlsx' def read_init_location(entity_type = 'user', index = 0): if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV': return np.array([\ pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]]) else: return None # load and plot everything class LoadAndPlot(object): """ load date and plot 2022-07-22 16_16_26 """ def __init__(self, store_path, \ user_num = 2, attacker_num = 1, RIS_ant_num = 4, \ ep_num = EP_NUM, step_num = 100): # RIS_ant_num = 16 (not true) self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y'] self.store_path = store_path + '//' self.user_num = user_num self.attacker_num = attacker_num self.RIS_ant_num = RIS_ant_num self.ep_num = ep_num self.step_num = step_num self.all_steps = self.load_all_steps() def load_one_ep(self, file_name): m = loadmat(self.store_path + file_name) return m def load_all_steps(self): result_dic = {} result_dic.update({'reward':[]}) result_dic.update({'user_capacity':[]}) for i in range(self.user_num): result_dic['user_capacity'].append([]) result_dic.update({'secure_capacity':[]}) for i in range(self.user_num): result_dic['secure_capacity'].append([]) result_dic.update({'attaker_capacity':[]}) for i in range(self.attacker_num): result_dic['attaker_capacity'].append([]) result_dic.update({'RIS_elements':[]}) for i in range(self.RIS_ant_num): result_dic['RIS_elements'].append([]) for ep_cnt in range(self.ep_num): mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat") one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0] result_dic['reward'] += list(one_ep_reward[:, 0]) one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0] for i in range(self.user_num): result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i]) one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0] for i in range(self.user_num): result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i]) one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0] for i in range(self.attacker_num): result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i]) one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0] for i in range(self.RIS_ant_num): result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i]) return result_dic def plot(self): """ plot result b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow """ if not os.path.exists(self.store_path + 'plot'): os.makedirs(self.store_path + 'plot') os.makedirs(self.store_path + 'plot/RIS') color_list = ['b', 'g', 'c', 'k', 'm', 'r', 'y'] ############################### # read step counts per episode ############################### step_num_per_episode = [] with open(self.store_path + 'step_num_per_episode.csv', newline='') as csvfile: reader = csv.reader(csvfile) for row in reader: step_num_per_episode.append(int(row[0])) ############################### # plot reward ############################### fig = plt.figure('reward') plt.plot(range(len(self.all_steps['reward'])), self.all_steps['reward']) plt.xlabel("Time Steps ($t$)") plt.ylabel("Reward") plt.savefig(self.store_path + 'plot/reward.png') plt.cla() ############################### # plot secure capacity ############################### fig = plt.figure('secure_capacity') for i in range(self.user_num): plt.plot(range(len(self.all_steps['secure_capacity'][i])), self.all_steps['secure_capacity'][i], c=color_list[i]) plt.legend(['user_' + str(i) for i in range(self.user_num)]) plt.xlabel("Time Steps ($t$)") plt.ylabel("Secure Capacity") plt.savefig(self.store_path + 'plot/secure_capacity.png') plt.cla() ############################### # plot average sum secrecy rate of each episode ############################### fig = plt.figure('average_sum_secrecy_rate') sum_secrecy_rate = np.array(self.all_steps['secure_capacity']) sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0) average_sum_secrecy_rate = [] ssr = [] j = 0 for i in range(self.ep_num): ssr_one_episode = sum_secrecy_rate[j:j+step_num_per_episode[i]] # ssr means Sum Secrecy Rate #print(j, j+step_num_per_episode[i]) j = j+step_num_per_episode[i] ssr.append(ssr_one_episode) try: _ = sum(ssr_one_episode) / len(ssr_one_episode) except: _ = 0 average_sum_secrecy_rate.append(_) plt.plot(range(len(average_sum_secrecy_rate)), average_sum_secrecy_rate) plt.xlabel("Episodes (Ep)") plt.ylabel("Average Sum Secrecy Rate") plt.savefig(self.store_path + 'plot/average_sum_secrecy_rate.png') plt.cla() print() print('###########################################################') print('Metrics\t\t\tLast Episode\tMax Values Reached') print('###########################################################') print('SSR (bits/s/Hz)\t\t{:.2f}\t\t{:.2f}'.format(average_sum_secrecy_rate[-1], max(average_sum_secrecy_rate))) ############################### # plot secrecy energy efficient ############################### fig = plt.figure('average_secrecy_energy_efficiency') # get init location init_uav_coord = read_init_location(entity_type = 'UAV') init_user_coord_0 = read_init_location(entity_type = 'user', index=0) init_user_coord_1 = read_init_location(entity_type = 'user', index=1) ep_num = EP_NUM energies = [] for i in range(ep_num): # read the mat file filename = f'simulation_result_ep_{i}.mat' filename = os.path.join(STORE_PATH, filename) data = loadmat(filename) # v_ts energies_one_episode = [] # loop all uav movt uav_movt = data[f'result_{i}'][0][0][-1] for j in range(uav_movt.shape[0]): move_x = uav_movt[j][0] move_y = uav_movt[j][1] v_t = (move_x ** 2 + move_y ** 2) ** 0.5 energy = get_energy_consumption(v_t / delta_time) energies_one_episode.append(energy) energies.append(energies_one_episode) average_see = [] for ssr_one_episode, energies_one_episode in zip(ssr, energies): ssr_one_episode = ssr_one_episode[:len(energies_one_episode)] energies_one_episode = energies_one_episode[:len(ssr_one_episode)] #print(len(ssr_one_episode), len(energies_one_episode)) try: see = np.array(ssr_one_episode) / np.array(energies_one_episode) average_see.append(sum(see)/len(see)) except: average_see.append(0) plt.plot(range(len(average_see)), average_see) plt.xlabel("Episodes (Ep)") plt.ylabel("Average Secrecy Energy Efficiency") plt.savefig(self.store_path + 'plot/average_secrecy_energy_efficiency.png') plt.cla() print('Energy (kJ)\t\t{:.2f}\t\t{:.2f}'.format(sum(energies[-1])/1000, sum(energies[np.argmax(average_see)])/1000)) print('SEE (bits/s/Hz/kJ)\t{:.2f}\t\t{:.2f}'.format(average_see[-1]*1000, max(average_see)*1000)) print('\nThe final performance is evalulated based on the Last Episode (where exploration=0)\n') ############################### # plot user capacity ############################### fig = plt.figure('user_capacity') for i in range(self.user_num): plt.plot(range(len(self.all_steps['user_capacity'][i])), self.all_steps['user_capacity'][i], c=color_list[i]) plt.legend(['user_' + str(i) for i in range(self.user_num)]) plt.xlabel("Time Steps ($t$)") plt.ylabel("User Capacity") plt.savefig(self.store_path + 'plot/user_capacity.png') plt.cla() ############################### # plot attacker capacity ############################### fig = plt.figure('attaker_capacity') for i in range(self.attacker_num): plt.plot(range(len(self.all_steps['attaker_capacity'][i])), self.all_steps['attaker_capacity'][i], c=color_list[i]) plt.legend(['attacker_' + str(i) for i in range(self.attacker_num)]) plt.xlabel("Time Steps ($t$)") plt.ylabel("Attack Capacity") plt.savefig(self.store_path + 'plot/attaker_capacity.png') plt.close('all') ############################### # plot ris ############################### for i in range(self.RIS_ant_num): self.plot_one_RIS_element(i) ############################### # plot trajectory ############################### self.plot_trajectory() def plot_one_RIS_element(self, index): """ docstring """ ax_real_imag = plt.subplot(1,1,1) ax_pase = ax_real_imag.twinx() #plt.ylim(ymax = 1, ymin = -1) #plt.xlim(xmax = 10000 , xmin = 10000 - 100) ax_real_imag.plot(range(len(self.all_steps['RIS_elements'][index])), np.real(self.all_steps['RIS_elements'][index]), c = self.color_list[0]) ax_real_imag.plot(range(len(self.all_steps['RIS_elements'][index])), np.imag(self.all_steps['RIS_elements'][index]), c = self.color_list[1]) phase_list = [] for complex_num in self.all_steps['RIS_elements'][index]: phase_list.append(cmath.phase(complex_num)) plt.ylim(ymax = cmath.pi, ymin = -cmath.pi) ax_pase.plot(range(len(self.all_steps['RIS_elements'][index])), phase_list, c = self.color_list[2]) # plt.xlabel("Time Steps ($t$)") # plt.ylabel("RIS Dimension") # plt.set_ylabel("position") # plt.set_ylabel("position") # plt.set_xlabel("Time Steps ($t$)") plt.savefig(self.store_path + 'plot/RIS/RIS_' + str(index) + '_element.png') plt.close('all') pass def plot_trajectory(self): # get init location init_uav_coord = read_init_location(entity_type = 'UAV') init_user_coord_0 = read_init_location(entity_type = 'user', index=0) init_user_coord_1 = read_init_location(entity_type = 'user', index=1) ep_num = EP_NUM interval = int(0.2 * EP_NUM) ep_list = [0] + [i for i in range(20-1, ep_num, interval)] if EP_NUM - 1 not in ep_list: ep_list.append(EP_NUM - 1) color_list_template = ['b', 'g', 'c', 'k', 'm', 'r', 'y', 'black', 'red'] color_list = copy.deepcopy(color_list_template) for i in ep_list: # read the mat file filename = f'simulation_result_ep_{i}.mat' filename = os.path.join(STORE_PATH, filename) data = loadmat(filename) # uav movt uav_coord = [ [init_uav_coord[0]], [init_uav_coord[1]] ] uav_movt = data[f'result_{i}'][0][0][-1] for j in range(uav_movt.shape[0]): move_x = uav_movt[j][0] move_y = uav_movt[j][1] prev_x = uav_coord[0][-1] prev_y = uav_coord[1][-1] current_x = prev_x + move_x current_y = prev_y + move_y uav_coord[0].append(current_x) uav_coord[1].append(current_y) plt.plot(uav_coord[1],uav_coord[0], c=color_list.pop(0)) # user 0 movt direction_fai = -1/2*math.pi distance_delta_d = 0.2 user_coord_0 = [ [init_user_coord_0[0]], [init_user_coord_0[1]] ] #color_list = copy.deepcopy(color_list_template) for j in range(uav_movt.shape[0]): delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) prev_x = user_coord_0[0][-1] prev_y = user_coord_0[1][-1] current_x = prev_x + delta_x current_y = prev_y + delta_y user_coord_0[0].append(current_x) user_coord_0[1].append(current_y) plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0)) plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker="o", markersize=10, markeredgecolor="red", markerfacecolor="red") # user 1 movt direction_fai = -1/2*math.pi distance_delta_d = 0.2 user_coord_0 = [ [init_user_coord_1[0]], [init_user_coord_1[1]] ] #color_list = copy.deepcopy(color_list_template) for j in range(uav_movt.shape[0]): delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) prev_x = user_coord_0[0][-1] prev_y = user_coord_0[1][-1] current_x = prev_x + delta_x current_y = prev_y + delta_y user_coord_0[0].append(current_x) user_coord_0[1].append(current_y) plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0)) plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker="o", markersize=10, markeredgecolor="red", markerfacecolor="red") plt.legend(ep_list) plt.grid() plt.xlim(0, 50) plt.ylim(-10, 30) plt.gca().invert_yaxis() plt.savefig(self.store_path + 'plot/trajectory.png') plt.cla() def restruct(self): savemat(self.store_path + 'all_steps.mat',self.all_steps) return 0 if __name__ == '__main__': LoadPlotObject = LoadAndPlot( store_path = STORE_PATH, ) LoadPlotObject.plot() LoadPlotObject.restruct() ================================================ FILE: main_RIS.py ================================================ from env import * episode_num= 100 step_num = 100 a= int(5/5 * episode_num * step_num) print(a) ================================================ FILE: main_ref.py ================================================ from ddpg import Agent from env1 import minimal_IRS_system # orginal from env import MiniSystem import numpy as np #from utils import plotLearning import matplotlib.pyplot as plt import os from data_manager import DataManager os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE" IRS_system = minimal_IRS_system(K = 1) #IRS_system = MiniSystem() K = IRS_system.K M = IRS_system.M N = IRS_system.N RL_state_dims = 2*K + 2*K**2 + 2*N + 2*M*K + 2*N*M + 2*K*N RL_input_dims = RL_state_dims RL_action_dims = 2 * (M * K) + N steps_per_ep = 200 alpha_actor_learning_rate = 0.001 beta_critic_learning_rate = 0.001 agent = Agent(alpha=alpha_actor_learning_rate, beta=beta_critic_learning_rate, input_dims=[RL_input_dims], tau=0.001, env=IRS_system, batch_size=64, layer1_size=400 * 2, layer2_size=300 * 2, n_actions=RL_action_dims) scores = [] for i in range(1000): observersion = IRS_system.reset() done = False done_sys = False score = 0 cnt_in_one_epi = 0 best_bit_per_Hz = 0 draw_bit_rate_list = [] draw_tran_power_list = [] #draw_bit_rate_one_element = {'if_exceed_max_power':False,'bit_rate' : 0} while not done: cnt_in_one_epi += 1 if cnt_in_one_epi > 500: done = True action = agent.choose_action(observersion) new_state, reward, done_sys , info = IRS_system.step(action) bit_per_Hz = IRS_system.calculate_data_rate() #draw_bit_rate_one_element['bit_rate'] = bit_per_Hz #draw_bit_rate_one_element['if_exceed_max_power']=done_sys draw_bit_rate_list.append(bit_per_Hz) total_power = IRS_system.calculate_total_transmit_power() draw_tran_power_list.append(total_power) if done_sys == False:# if not exceed max transmit power if bit_per_Hz > best_bit_per_Hz: best_bit_per_Hz = bit_per_Hz agent.remember(observersion, action, reward, new_state, int(done)) agent.learn() score += reward observersion = new_state IRS_system.render() plt.cla() plt.plot(range(len(draw_bit_rate_list)), draw_bit_rate_list, color = 'green') plt.plot(range(len(draw_tran_power_list)), draw_tran_power_list, color = 'red') # plt.show() filename_i =os.path.abspath(os.curdir) + '\\main_foder\\image_result\\' + str(i) + '.png' plt.savefig(filename_i) scores.append(score) #if i % 25 == 0: #agent.save_models() print('episode ', i, 'score %.2f' % score, 'best sum rate %.3f bit/s/Hz' % best_bit_per_Hz, 'trailing 100 games avg %.4f' % np.mean(scores[-100:])) filename = 'C:\\demo\\IRS_DDPG_minimal\\main_foder\\LunarLander-alpha000025-beta00025-400-300.png' # plotLearning(scores, filename, window=100) ================================================ FILE: main_train.py ================================================ # debug field import os os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE" import argparse # get argument from user parser = argparse.ArgumentParser() parser.add_argument('--drl', type = str, required = True, default='td3', help="which drl algo would you like to choose ['ddpg', 'td3']") parser.add_argument('--reward', type = str, required = True, default='see', help="which reward would you like to implement ['ssr', 'see']") parser.add_argument('--seeds', type = int, required = False, default=None, nargs='+', help="what seed(s) would you like to use for DRL 1 and 2, please provide in one or two int") parser.add_argument('--ep-num', type = int, required = False, default=300, help="how many episodes do you want to train your DRL") parser.add_argument('--trained-uav', default=False, action='store_true', help='use trained uav instead of retraining') args = parser.parse_args() DRL_ALGO = args.drl REWARD_DESIGN = args.reward SEEDS = args.seeds EPISODE_NUM = args.ep_num TRAINED_UAV = args.trained_uav # process the argument assert DRL_ALGO in ['ddpg', 'td3'], "drl must be ['ddpg', 'td3']" assert REWARD_DESIGN in ['ssr', 'see'], "reward must be ['ssr', 'see']" if SEEDS is not None: assert len(SEEDS) in [1, 2] and isinstance(SEEDS[0], int) and isinstance(SEEDS[-1], int), "seeds must be a list of 1 or 2 integer" if DRL_ALGO == 'td3': from td3 import Agent elif DRL_ALGO == 'ddpg': from ddpg import Agent import ddpg from env import MiniSystem import numpy as np import math import time import torch # 1 init system model episode_num = EPISODE_NUM # recommend to be 300 episode_cnt = 0 step_num = 100 project_name = f'trained_uav/{DRL_ALGO}_{REWARD_DESIGN}' if TRAINED_UAV else f'scratch/{DRL_ALGO}_{REWARD_DESIGN}' system = MiniSystem( user_num=2, RIS_ant_num=4, UAV_ant_num=4, if_dir_link=1, if_with_RIS=True, if_move_users=True, if_movements=True, reverse_x_y=(False, False), if_UAV_pos_state = True, reward_design = REWARD_DESIGN, project_name = project_name, step_num = step_num ) if_Theta_fixed = False if_G_fixed = False if_BS = False if_robust = True # 2 init RL Agent agent_1_param_dic = {} agent_1_param_dic["alpha"] = 0.0001 agent_1_param_dic["beta"] = 0.001 agent_1_param_dic["input_dims"] = system.get_system_state_dim() agent_1_param_dic["tau"] = 0.001 agent_1_param_dic["batch_size"] = 64 agent_1_param_dic["n_actions"] = system.get_system_action_dim() - 2 agent_1_param_dic["action_noise_factor"] = 0.1 agent_1_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2 agent_1_param_dic["agent_name"] = "G_and_Phi" agent_1_param_dic["layer1_size"] = 800 agent_1_param_dic["layer2_size"] = 600 agent_1_param_dic["layer3_size"] = 512 agent_1_param_dic["layer4_size"] = 256 agent_2_param_dic = {} agent_2_param_dic["alpha"] = 0.0001 agent_2_param_dic["beta"] = 0.001 agent_2_param_dic["input_dims"] = 3 agent_2_param_dic["tau"] = 0.001 agent_2_param_dic["batch_size"] = 64 agent_2_param_dic["n_actions"] = 2 agent_2_param_dic["action_noise_factor"] = 0.5 agent_2_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2 agent_2_param_dic["agent_name"] = "UAV" agent_2_param_dic["layer1_size"] = 400 agent_2_param_dic["layer2_size"] = 300 agent_2_param_dic["layer3_size"] = 256 agent_2_param_dic["layer4_size"] = 128 if SEEDS is not None: torch.manual_seed(SEEDS[0]) # 1 torch.cuda.manual_seed_all(SEEDS[0]) # 1 agent_1 = Agent( alpha = agent_1_param_dic["alpha"], beta = agent_1_param_dic["beta"], input_dims = [agent_1_param_dic["input_dims"]], tau = agent_1_param_dic["tau"], env = system, batch_size = agent_1_param_dic["batch_size"], layer1_size=agent_1_param_dic["layer1_size"], layer2_size=agent_1_param_dic["layer2_size"], layer3_size=agent_1_param_dic["layer3_size"], layer4_size=agent_1_param_dic["layer4_size"], n_actions = agent_1_param_dic["n_actions"], max_size = agent_1_param_dic["memory_max_size"], agent_name= agent_1_param_dic["agent_name"] ) if SEEDS is not None: torch.manual_seed(SEEDS[-1]) # 2 torch.cuda.manual_seed_all(SEEDS[-1]) # 2 agent_2 = Agent( alpha = agent_2_param_dic["alpha"], beta = agent_2_param_dic["beta"], input_dims = [agent_2_param_dic["input_dims"]], tau = agent_2_param_dic["tau"], env = system, batch_size = agent_2_param_dic["batch_size"], layer1_size=agent_2_param_dic["layer1_size"], layer2_size=agent_2_param_dic["layer2_size"], layer3_size=agent_2_param_dic["layer3_size"], layer4_size=agent_2_param_dic["layer4_size"], n_actions = agent_2_param_dic["n_actions"], max_size = agent_2_param_dic["memory_max_size"], agent_name= agent_2_param_dic["agent_name"] ) if TRAINED_UAV: benchmark = f'data/storage/benchmark/{DRL_ALGO}_{REWARD_DESIGN}_benchmark' if DRL_ALGO == 'td3': agent_2.load_models( load_file_actor = benchmark + '/Actor_UAV_TD3', load_file_critic_1 = benchmark + '/Critic_1_UAV_TD3', load_file_critic_2 = benchmark + '/Critic_2_UAV_TD3' ) elif DRL_ALGO == 'ddpg': agent_2.load_models( load_file_actor = benchmark + '/Actor_UAV_ddpg', load_file_critic = benchmark + '/Critic_UAV_ddpg' ) meta_dic = {} print("***********************system information******************************") print("folder_name: "+str(system.data_manager.store_path)) meta_dic['folder_name'] = system.data_manager.store_path print("user_num: "+str(system.user_num)) meta_dic['user_num'] = system.user_num print("if_dir: "+str(system.if_dir_link)) meta_dic['if_dir_link'] = system.if_dir_link print("if_with_RIS: "+str(system.if_with_RIS)) meta_dic['if_with_RIS'] = system.if_with_RIS print("if_user_m: "+str(system.if_move_users)) meta_dic['if_move_users'] = system.if_move_users print("RIS_ant_num: "+str(system.RIS.ant_num)) meta_dic['system_RIS_ant_num'] = system.RIS.ant_num print("UAV_ant_num: "+str(system.UAV.ant_num)) meta_dic['system_UAV_ant_num'] = system.UAV.ant_num print("if_movements: "+str(system.if_movements)) meta_dic['system_if_movements'] = system.if_movements print("reverse_x_y: "+str(system.reverse_x_y)) meta_dic['system_reverse_x_y'] = system.reverse_x_y print("if_UAV_pos_state:"+str(system.if_UAV_pos_state)) meta_dic['if_UAV_pos_state'] = system.if_UAV_pos_state print("ep_num: "+str(episode_num)) meta_dic['episode_num'] = episode_num print("step_num: "+str(step_num)) meta_dic['step_num'] = step_num print("***********************agent_1 information******************************") tplt = "{0:{2}^20}\t{1:{2}^20}" for i in agent_1_param_dic: parm = agent_1_param_dic[i] print(tplt.format(i, parm, chr(12288))) meta_dic["agent_1"] = agent_1_param_dic print("***********************agent_2 information******************************") for i in agent_2_param_dic: parm = agent_2_param_dic[i] print(tplt.format(i, parm, chr(12288))) meta_dic["agent_2"] = agent_2_param_dic system.data_manager.save_meta_data(meta_dic) print("***********************traning information******************************") while episode_cnt < episode_num: # 1 reset the whole system system.reset() step_cnt = 0 score_per_ep = 0 # 2 get the initial state if if_robust: tmp = system.observe() #z = np.random.multivariate_normal(np.zeros(2), 0.5*np.eye(2), size=len(tmp)).view(np.complex128) z = np.random.normal(size=len(tmp)) observersion_1 = list( np.array(tmp) + 0.6 *1e-7* z ) else: observersion_1 = system.observe() observersion_2 = list(system.UAV.coordinate) if episode_cnt == 80: print("break point") while step_cnt < step_num: # 1 count num of step in one episode step_cnt += 1 # judge if pause the whole system if not system.render_obj.pause: # 2 choose action acoording to current state action_1 = agent_1.choose_action(observersion_1, greedy=agent_1_param_dic["action_noise_factor"] * math.pow((1-episode_cnt / episode_num), 2)) action_2 = agent_2.choose_action(observersion_2, greedy=agent_2_param_dic["action_noise_factor"]* math.pow((1-episode_cnt / episode_num), 2)) if if_BS: action_2[0]=0 action_2[1]=0 if if_Theta_fixed: action_1[0+2 * system.UAV.ant_num * system.user_num:] = len(action_1[0+2 * system.UAV.ant_num * system.user_num:])*[0] if if_G_fixed: action_1[0:0+2 * system.UAV.ant_num * system.user_num]=np.array([-0.0313, -0.9838, 0.3210, 1.0, -0.9786, -0.1448, 0.3518, 0.5813, -1.0, -0.2803, -0.4616, -0.6352, -0.1449, 0.7040, 0.4090, -0.8521]) * math.pow(episode_cnt / episode_num, 2) * 0.7 #action_1[0:0+2 * system.UAV.ant_num * system.user_num]=len(action_1[0:0+2 * system.UAV.ant_num * system.user_num])*[0.5] # 3 get newstate, reward if system.if_with_RIS: new_state_1, reward, done, info = system.step( action_0=action_2[0], action_1=action_2[1], G=action_1[0:0+2 * system.UAV.ant_num * system.user_num], Phi=action_1[0+2 * system.UAV.ant_num * system.user_num:], set_pos_x=action_2[0], set_pos_y=action_2[1] ) new_state_2 = list(system.UAV.coordinate) else: new_state_1, reward, done, info = system.step( action_0=action_2[0], action_1=action_2[1], G=action_1[0:0+2 * system.UAV.ant_num * system.user_num], set_pos_x=action_2[0], set_pos_y=action_2[1] ) new_state_2 = list(system.UAV.coordinate) score_per_ep += reward # 4 store state pair into mem pool agent_1.remember(observersion_1, action_1, reward, new_state_1, int(done)) agent_2.remember(observersion_2, action_2, reward, new_state_2, int(done)) # 5 update DDPG net agent_1.learn() if not TRAINED_UAV: agent_2.learn() #system.render_obj.render(0.001) # no rendering for faster observersion_1 = new_state_1 observersion_2 = new_state_2 if done == True: break else: #system.render_obj.render_pause() # no rendering for faster time.sleep(0.001) #time.sleep(1) system.data_manager.save_file(episode_cnt=episode_cnt) system.reset() print("ep_num: "+str(episode_cnt)+" ep_score: "+str(score_per_ep)) episode_cnt +=1 if episode_cnt % 10 == 0: agent_1.save_models() agent_2.save_models() # save the last model agent_1.save_models() agent_2.save_models() ================================================ FILE: math_tool.py ================================================ import math import cmath import numpy as np import pandas as pd from numpy import * from matplotlib import pyplot as plt from mpl_toolkits.mplot3d import Axes3D from matplotlib.patches import FancyArrowPatch from mpl_toolkits.mplot3d import proj3d def cartesian_coordinate_to_spherical_coordinate(cartesian_coordinate): """ transmit cartesian_coordinate_to_spherical_coordinate input 1 X 3 np.array, [x, y, z] output 1 X 3 np.array, [r, theta, fai] """ r = np.linalg.norm(cartesian_coordinate) if abs(cartesian_coordinate[2]) < 1e-8: theta = math.atan(np.linalg.norm(cartesian_coordinate[0:2])/1e-8) else: theta = math.atan(np.linalg.norm(cartesian_coordinate[0:2])/cartesian_coordinate[2]) if abs(cartesian_coordinate[0]) < 1e-8: x = 1e-8 else: x = cartesian_coordinate[0] y = cartesian_coordinate[1] if abs(y) < 1e-8: y = 1e-8 if y > 0 and x > 0: fai = math.atan(y/x) elif x < 0 and y > 0: fai = math.atan(y/x) + math.pi elif x < 0 and y < 0: fai = math.atan(y/x) - math.pi else: fai = math.atan(y/x) return r, theta, fai def vecter_normalization(cartesian_coordinate): return cartesian_coordinate/np.linalg.norm(cartesian_coordinate) def get_coor_ref(coor_sys, coor): """ input: coor_sys: normalized 1,3 np.array list (1,3) coor: coordinate under earth system output: referenced coordinate for x,y, normalized 1,3 np.array """ x_ref = np.dot(coor_sys[0],coor) y_ref = np.dot(coor_sys[1],coor) z_ref = np.dot(coor_sys[2],coor) return np.array([x_ref, y_ref, z_ref]) def dB_to_normal(dB): """ input: dB output: normal vaule """ return math.pow(10, (dB/10)) def normal_to_dB(normal): """ input: normal output: dB value """ return -10 * math.log10(normal) def diag_to_vector(diag): """ transfer a diagnal matrix into a vector """ vec_size = np.shape(diag)[0] vector = np.mat(np.zeros((vec_size, 1), dtype=complex), dtype=complex) for i in range(vec_size): vector[i, 0] = diag[i, i] return vector def vector_to_diag(vector): """ transfer a vector into a diagnal matrix """ vec_size = np.shape(vector)[0] diag = np.mat(np.zeros((vec_size, vec_size), dtype=complex), dtype=complex) for i in range(vec_size): diag[i, i] = vector[i, 0] return diag def bigger_than_zero(value): """ max(0,value) """ return max(0, value) def dataframe_to_dictionary(df): """ docstring """ return {col_name : df[col_name].values for col_name in df.columns.values} def convert_list_to_complex_matrix(list_real, shape): """ list_real is a 2* N*K dim list, convert it to N X K complex matrix shape is a tuple (N, K) """ N = shape[0] K = shape[1] matrix_complex =np.mat(np.zeros((N, K), dtype=complex), dtype=complex) for i in range(N): for j in range(K): matrix_complex[i, j] = list_real[2*(i*K + j)] + 1j * list_real[2*(i*K + j) + 1] return matrix_complex def convert_list_to_complex_diag(list_real, diag_row_num): """ list_real is a M dim list, convert it to M X M complex diag matrix diag_row_num is the M """ M = diag_row_num diag_matrix_complex = np.mat(np.zeros((M, M), dtype=complex), dtype=complex) for i in range(M): diag_matrix_complex[i, i] = cmath.exp(1j * list_real[i] * math.pi) return diag_matrix_complex def map_to(x, x_range:tuple, y_range:tuple): x_min = x_range[0] x_max = x_range[1] y_min = y_range[0] y_max = y_range[1] y = y_min+(y_max - y_min) / (x_max - x_min) * (x - x_min) return y ================================================ FILE: plot_see.py ================================================ import matplotlib as mpl mpl.rcParams['figure.dpi'] = 300 import matplotlib.pyplot as plt import numpy as np import cmath from scipy.io import loadmat, savemat import pandas as pd import os import copy import math ###################################################### # new for energy # energy related parameters of rotary-wing UAV # based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV P_i = 790.6715 P_0 = 580.65 U2_tip = (200) ** 2 s = 0.05 d_0 = 0.3 p = 1.225 A = 0.79 delta_time = 0.1 #0.1/1000 #0.1ms # add ons hover veloctiy # based on https://www.intechopen.com/chapters/57483 m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F g = 9.81 # gravity T = m * g # thrust v_0 = (T / (A * 2 * p)) ** 0.5 def get_energy_consumption(v_t): ''' arg 1) v_t = displacement per time slot ''' energy_1 = P_0 \ + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \ + 0.5 * d_0 * p * s * A * (abs(v_t))**3 energy_2 = P_i * (( (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \ - (abs(v_t) ** 2) / (2 * (v_0 **2)) \ ) ** 0.5) energy = delta_time * (energy_1 + energy_2) return energy ENERGY_MIN = get_energy_consumption(0.25) ENERGY_MAX = get_energy_consumption(0) ###################################################### # modified from data_manager.py init_data_file = 'data/init_location.xlsx' def read_init_location(entity_type = 'user', index = 0): if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV': return np.array([\ pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]]) else: return None # load and plot everything class LoadAndPlot(object): """ load date and plot 2022-07-22 16_16_26 """ def __init__(self, store_paths, \ user_num = 2, attacker_num = 1, RIS_ant_num = 4, \ ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true) self.store_paths = store_paths self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y'] # self.store_paths = store_paths + '//' self.user_num = user_num self.attacker_num = attacker_num self.RIS_ant_num = RIS_ant_num self.ep_num = ep_num self.step_num = step_num def load_one_ep(self, file_name): m = loadmat(self.store_path + file_name) return m def load_all_steps(self): result_dic = {} result_dic.update({'reward':[]}) result_dic.update({'user_capacity':[]}) for i in range(self.user_num): result_dic['user_capacity'].append([]) result_dic.update({'secure_capacity':[]}) for i in range(self.user_num): result_dic['secure_capacity'].append([]) result_dic.update({'attaker_capacity':[]}) for i in range(self.attacker_num): result_dic['attaker_capacity'].append([]) result_dic.update({'RIS_elements':[]}) for i in range(self.RIS_ant_num): result_dic['RIS_elements'].append([]) for ep_cnt in range(self.ep_num): mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat") one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0] result_dic['reward'] += list(one_ep_reward[:, 0]) one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0] for i in range(self.user_num): result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i]) one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0] for i in range(self.user_num): result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i]) one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0] for i in range(self.attacker_num): result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i]) one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0] for i in range(self.RIS_ant_num): result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i]) return result_dic def plot(self): """ plot result b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow """ ############################### # plot average secrecy energy efficient of each episode ############################### fig = plt.figure('average_secrecy_energy_efficiency') # get init location init_uav_coord = read_init_location(entity_type = 'UAV') init_user_coord_0 = read_init_location(entity_type = 'user', index=0) init_user_coord_1 = read_init_location(entity_type = 'user', index=1) # paths legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)'] all_average_see = [] all_energies = [] # energies for store_path in self.store_paths: energies = [] for i in range(self.ep_num): # read the mat file filename = f'simulation_result_ep_{i}.mat' filename = os.path.join(store_path, filename) data = loadmat(filename) # v_ts energies_one_episode = [] # loop all uav movt uav_movt = data[f'result_{i}'][0][0][-1] for j in range(uav_movt.shape[0]): move_x = uav_movt[j][0] move_y = uav_movt[j][1] v_t = (move_x ** 2 + move_y ** 2) ** 0.5 energy = get_energy_consumption(v_t / delta_time) energies_one_episode.append(energy) energies.append(energies_one_episode) all_energies.append(energies) # see for store_path, legend in zip(self.store_paths, legends): average_see = [] # ssr self.store_path = store_path + '//' self.all_steps = self.load_all_steps() sum_secrecy_rate = np.array(self.all_steps['secure_capacity']) sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0) # energy energies = all_energies.pop(0) for i in range(0, self.ep_num * self.step_num, self.step_num): ssr_one_episode = sum_secrecy_rate[i:i+self.step_num] # ssr means Sum Secrecy Rate energies_one_episode = energies.pop(0) ssr_one_episode = ssr_one_episode[:len(energies_one_episode)] energies_one_episode = energies_one_episode[:len(ssr_one_episode)] try: see = np.array(ssr_one_episode) / np.array(energies_one_episode) average_see.append(sum(see)/len(see)) except: average_see.append(0) # change from /J to /kJ average_see = np.array(average_see) * 1000 average_see = list(average_see) all_average_see.append(average_see) plt.plot(range(len(average_see)), average_see, label=legend) plt.xlabel("Episodes (Ep)") plt.ylabel("Average Secrecy Energy Efficiency") plt.legend() plt.savefig('data/average_secrecy_energy_efficiency.png') # dictionary of lists dict = {legend: average_see for legend, average_see in zip(legends, all_average_see)} df = pd.DataFrame(dict) df.to_excel('data/average_secrecy_energy_efficiency.xlsx', index=False) if __name__ == '__main__': LoadPlotObject = LoadAndPlot( store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'], ep_num = 300, ) LoadPlotObject.plot() ================================================ FILE: plot_ssr.py ================================================ import matplotlib as mpl mpl.rcParams['figure.dpi'] = 300 import matplotlib.pyplot as plt import numpy as np import cmath from scipy.io import loadmat, savemat import pandas as pd import os import copy import math ###################################################### # new for energy # energy related parameters of rotary-wing UAV # based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV P_i = 790.6715 P_0 = 580.65 U2_tip = (200) ** 2 s = 0.05 d_0 = 0.3 p = 1.225 A = 0.79 delta_time = 0.1 #0.1/1000 #0.1ms # add ons hover veloctiy # based on https://www.intechopen.com/chapters/57483 m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F g = 9.81 # gravity T = m * g # thrust v_0 = (T / (A * 2 * p)) ** 0.5 def get_energy_consumption(v_t): ''' arg 1) v_t = displacement per time slot ''' energy_1 = P_0 \ + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \ + 0.5 * d_0 * p * s * A * (abs(v_t))**3 energy_2 = P_i * (( (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \ - (abs(v_t) ** 2) / (2 * (v_0 **2)) \ ) ** 0.5) energy = delta_time * (energy_1 + energy_2) return energy ENERGY_MIN = get_energy_consumption(0.25) ENERGY_MAX = get_energy_consumption(0) ###################################################### # modified from data_manager.py init_data_file = 'data/init_location.xlsx' def read_init_location(entity_type = 'user', index = 0): if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV': return np.array([\ pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]]) else: return None # load and plot everything class LoadAndPlot(object): """ load date and plot 2022-07-22 16_16_26 """ def __init__(self, store_paths, \ user_num = 2, attacker_num = 1, RIS_ant_num = 4, \ ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true) self.store_paths = store_paths self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y'] # self.store_path = store_path + '//' self.user_num = user_num self.attacker_num = attacker_num self.RIS_ant_num = RIS_ant_num self.ep_num = ep_num self.step_num = step_num def load_one_ep(self, file_name): m = loadmat(self.store_path + file_name) return m def load_all_steps(self): result_dic = {} result_dic.update({'reward':[]}) result_dic.update({'user_capacity':[]}) for i in range(self.user_num): result_dic['user_capacity'].append([]) result_dic.update({'secure_capacity':[]}) for i in range(self.user_num): result_dic['secure_capacity'].append([]) result_dic.update({'attaker_capacity':[]}) for i in range(self.attacker_num): result_dic['attaker_capacity'].append([]) result_dic.update({'RIS_elements':[]}) for i in range(self.RIS_ant_num): result_dic['RIS_elements'].append([]) for ep_cnt in range(self.ep_num): mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat") one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0] result_dic['reward'] += list(one_ep_reward[:, 0]) one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0] for i in range(self.user_num): result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i]) one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0] for i in range(self.user_num): result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i]) one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0] for i in range(self.attacker_num): result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i]) one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0] for i in range(self.RIS_ant_num): result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i]) return result_dic def plot(self): """ plot result b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow """ ############################### # plot average sum secrecy rate of each episode ############################### fig = plt.figure('average_sum_secrecy_rate') # store_paths = ['data/storage/ddpg 4', 'data/storage/td3 3', 'data/storage/ddpg seem 6', 'data/storage/td3 seem 2'] # store_paths = r'data/storage/scratch/td3_ssr' legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)'] all_average_sum_secrecy_rate = [] for store_path, legend in zip(self.store_paths, legends): self.store_path = store_path + '//' self.all_steps = self.load_all_steps() sum_secrecy_rate = np.array(self.all_steps['secure_capacity']) sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0) average_sum_secrecy_rate = [] ssr = [] for i in range(0, self.ep_num * self.step_num, self.step_num): ssr_one_episode = sum_secrecy_rate[i:i+self.step_num] # ssr means Sum Secrecy Rate ssr.append(ssr_one_episode) try: _ = sum(ssr_one_episode) / len(ssr_one_episode) except: _ = 0 average_sum_secrecy_rate.append(_) all_average_sum_secrecy_rate.append(average_sum_secrecy_rate) plt.plot(range(len(average_sum_secrecy_rate)), average_sum_secrecy_rate, label=legend) plt.xlabel("Episodes (Ep)") plt.ylabel("Average Sum Secrecy Rate") plt.legend() plt.savefig('data/average_sum_secrecy_rate.png') # transpose ''' all_average_sum_secrecy_rate = np.array(all_average_sum_secrecy_rate) all_average_sum_secrecy_rate = np.transpose(all_average_sum_secrecy_rate) all_average_sum_secrecy_rate = list(all_average_sum_secrecy_rate) ''' # dictionary of lists dict = {legend: average_sum_secrecy_rate for legend, average_sum_secrecy_rate in zip(legends, all_average_sum_secrecy_rate)} df = pd.DataFrame(dict) df.to_excel('data/average_sum_secrecy_rate.xlsx', index=False) if __name__ == '__main__': LoadPlotObject = LoadAndPlot( store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'], ep_num = 300, ) LoadPlotObject.plot() ================================================ FILE: plot_traj.py ================================================ import matplotlib as mpl mpl.rcParams['figure.dpi'] = 100 import matplotlib.pyplot as plt import numpy as np import cmath from scipy.io import loadmat, savemat import pandas as pd import os import copy import math ###################################################### # new for energy # energy related parameters of rotary-wing UAV # based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV P_i = 790.6715 P_0 = 580.65 U2_tip = (200) ** 2 s = 0.05 d_0 = 0.3 p = 1.225 A = 0.79 delta_time = 0.1 #0.1/1000 #0.1ms # add ons hover veloctiy # based on https://www.intechopen.com/chapters/57483 m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F g = 9.81 # gravity T = m * g # thrust v_0 = (T / (A * 2 * p)) ** 0.5 def get_energy_consumption(v_t): ''' arg 1) v_t = displacement per time slot ''' energy_1 = P_0 \ + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \ + 0.5 * d_0 * p * s * A * (abs(v_t))**3 energy_2 = P_i * (( (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \ - (abs(v_t) ** 2) / (2 * (v_0 **2)) \ ) ** 0.5) energy = delta_time * (energy_1 + energy_2) return energy ENERGY_MIN = get_energy_consumption(0.25) ENERGY_MAX = get_energy_consumption(0) ###################################################### # modified from data_manager.py init_data_file = 'data/init_location.xlsx' def read_init_location(entity_type = 'user', index = 0): if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV': return np.array([\ pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\ pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]]) else: return None # load and plot everything class LoadAndPlot(object): """ load date and plot 2022-07-22 16_16_26 """ def __init__(self, store_paths, \ user_num = 2, attacker_num = 1, RIS_ant_num = 4, \ ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true) self.store_paths = store_paths self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y'] # self.store_path = store_path + '//' self.user_num = user_num self.attacker_num = attacker_num self.RIS_ant_num = RIS_ant_num self.ep_num = ep_num self.step_num = step_num def load_one_ep(self, file_name): m = loadmat(self.store_path + file_name) return m def load_all_steps(self): result_dic = {} result_dic.update({'reward':[]}) result_dic.update({'user_capacity':[]}) for i in range(self.user_num): result_dic['user_capacity'].append([]) result_dic.update({'secure_capacity':[]}) for i in range(self.user_num): result_dic['secure_capacity'].append([]) result_dic.update({'attaker_capacity':[]}) for i in range(self.attacker_num): result_dic['attaker_capacity'].append([]) result_dic.update({'RIS_elements':[]}) for i in range(self.RIS_ant_num): result_dic['RIS_elements'].append([]) for ep_cnt in range(self.ep_num): mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat") one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0] result_dic['reward'] += list(one_ep_reward[:, 0]) one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0] for i in range(self.user_num): result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i]) one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0] for i in range(self.user_num): result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i]) one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0] for i in range(self.attacker_num): result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i]) one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0] for i in range(self.RIS_ant_num): result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i]) return result_dic def plot(self): """ plot result b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow """ ############################### # plot trajectory ############################### # create a fig fig, ax = plt.subplots(figsize=(5.4,5.2)) #fig = plt.figure('trajectory') MARKER_SIZE = 8 # colour color_list_template = ['b', 'g', 'c', 'k', 'm', 'r', 'y', 'black', 'red'] color_list_template = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf'] color_list = copy.deepcopy(color_list_template) # get init location init_uav_coord = read_init_location(entity_type = 'UAV') init_ris_coord = read_init_location(entity_type = 'RIS') init_eaves_coord = read_init_location(entity_type = 'attacker') init_user_coord_0 = read_init_location(entity_type = 'user', index=0) init_user_coord_1 = read_init_location(entity_type = 'user', index=1) plt.text(20, init_uav_coord[0]-1, 'UAV Initial Coordinate', fontsize = 11) plt.plot([init_uav_coord[1]], [init_uav_coord[0]], marker="s", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none") plt.text(46, init_ris_coord[0]-1, 'RIS', fontsize = 11) plt.plot([init_ris_coord[1]], [init_ris_coord[0]], marker="d", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none") plt.text(36, init_eaves_coord[0]-1, 'Eavesdropper', fontsize = 11) plt.plot([init_eaves_coord[1]], [init_eaves_coord[0]], marker="v", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none") # paths # store_paths = ['data/storage/ddpg 2', 'data/storage/td3 3', 'data/storage/ddpg seem 3', 'data/storage/td3 seem 5'] # store_paths = ['data/storage/ddpg 2', 'data/storage/td3 3', 'data/storage/ddpg seem 3', 'data/storage/td3 seem 5'] legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)'] legends = ['Benchmark 1', 'Benchmark 2', 'Benchmark 3', 'Proposed method'] for store_path, legend in zip(self.store_paths, legends): # read the mat file i = 5 - 1 # episode 300 filename = f'simulation_result_ep_{i}.mat' filename = os.path.join(store_path, filename) data = loadmat(filename) # uav movt uav_coord = [ [init_uav_coord[0]], [init_uav_coord[1]] ] uav_movt = data[f'result_{i}'][0][0][-1] for j in range(uav_movt.shape[0]): move_x = uav_movt[j][0] move_y = uav_movt[j][1] prev_x = uav_coord[0][-1] prev_y = uav_coord[1][-1] current_x = prev_x + move_x current_y = prev_y + move_y uav_coord[0].append(current_x) uav_coord[1].append(current_y) plt.plot(uav_coord[1],uav_coord[0], c=color_list.pop(0), label=legend) # user 0 movt direction_fai = -1/2*math.pi distance_delta_d = 0.25 user_coord_0 = [ [init_user_coord_0[0]], [init_user_coord_0[1]] ] plt.text(29, init_user_coord_0[0]-1, 'User 1 Initial Coordinate', fontsize = 11) #color_list = copy.deepcopy(color_list_template) for j in range(uav_movt.shape[0]): delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) prev_x = user_coord_0[0][-1] prev_y = user_coord_0[1][-1] current_x = prev_x + delta_x current_y = prev_y + delta_y user_coord_0[0].append(current_x) user_coord_0[1].append(current_y) plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0), linestyle='dashed', linewidth=2, label='User 1') plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker="o", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none") # user 1 movt direction_fai = -1/2*math.pi distance_delta_d = 0.25 user_coord_1 = [ [init_user_coord_1[0]], [init_user_coord_1[1]] ] plt.text(13, init_user_coord_1[0]-1, 'User 2 Initial Coordinate', fontsize = 11) #color_list = copy.deepcopy(color_list_template) for j in range(uav_movt.shape[0]): delta_x = distance_delta_d * math.cos(direction_fai) delta_y = distance_delta_d * math.sin(direction_fai) prev_x = user_coord_1[0][-1] prev_y = user_coord_1[1][-1] current_x = prev_x + delta_x current_y = prev_y + delta_y user_coord_1[0].append(current_x) user_coord_1[1].append(current_y) plt.plot(user_coord_1[1],user_coord_1[0], c=color_list.pop(0), linestyle='dashed', linewidth=2, label='User 2') plt.plot(user_coord_1[1][0], user_coord_1[0][0], marker="o", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none") # plot a line between last coord of user 0 and user 1 plt.plot([user_coord_0[1][-1], user_coord_1[1][0-1]], [user_coord_0[0][-1], user_coord_1[0][-1]], 'gray', linestyle='dashed') # plot midpoint between last coord of user 0 and user 1 plt.plot([(user_coord_0[1][-1] + user_coord_1[1][0-1])/2], [(user_coord_0[0][-1] + user_coord_1[0][0-1])/2], marker="o", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none") plt.text(12, 18, "Midpoint of \ntwo user's last location", fontsize = 11) plt.legend(loc='center right', fontsize=10) plt.grid() plt.xlim(0, 50) plt.ylim(-10, 30) plt.xlabel('x(m)') plt.ylabel('y(m)') plt.tight_layout() plt.gca().invert_yaxis() plt.savefig('data/trajectory.png') #plt.cla() if __name__ == '__main__': LoadPlotObject = LoadAndPlot( store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'], ep_num=300, ) LoadPlotObject.plot() ================================================ FILE: render.py ================================================ from mpl_toolkits import mplot3d import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from matplotlib.patches import FancyArrowPatch from mpl_toolkits.mplot3d import proj3d class Arrow3D(FancyArrowPatch): def __init__(self, xs, ys, zs, *args, **kwargs): FancyArrowPatch.__init__(self, (0,0), (0,0), *args, **kwargs) self._verts3d = xs, ys, zs def draw(self, renderer): xs3d, ys3d, zs3d = self._verts3d xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d, renderer.M) self.set_positions((xs[0],ys[0]),(xs[1],ys[1])) FancyArrowPatch.draw(self, renderer) class Render(object): """ plot functions to render the whole system """ def __init__(self, system, canv_x = (-25, 25), canv_y = (0, 50), canv_z = (0, 60)): """ docstring """ plt.ion() self.system = system self.fig = plt.figure(1) self.pause = False self.t_index = 0 def render_pause(self): """ show whole system by using plt.show """ plt.ion() ax = self.plot_config() # plot the position of UAV, RIS, Users & Attakers self.plot_entities(ax) self.plot_channels(ax) self.plot_text(ax) plt.show(self.fig) plt.cla() self.pause = False plt.ioff() def render(self, interval): """ show whole system in 3D figure """ plt.ion() ax = self.plot_config() # plot the position of UAV, RIS, Users & Attakers self.plot_entities(ax) self.plot_channels(ax) self.plot_text(ax) plt.pause(interval) plt.cla() plt.ioff() def plot_click(self, event): self.pause ^= True def plot_config(self): self.fig = plt.figure(1) ax = plt.axes(projection='3d') ax.set_xlabel('X Label') ax.set_ylabel('Y Label') ax.set_zlabel('Z Label') ax.set_xlim3d(-25, 25) ax.set_ylim3d(0, 50) ax.set_zlim3d(0, 60) ax.view_init(90, 0) self.fig.canvas.mpl_connect('key_press_event', self.plot_click) return ax def plot_entities(self, ax): """ function used in render to show the UAV, RIS, users and attackers """ ax.scatter(\ self.system.UAV.coordinate[0],\ self.system.UAV.coordinate[1],\ self.system.UAV.coordinate[2],\ color='r') ax.text(self.system.UAV.coordinate[0],self.system.UAV.coordinate[1],self.system.UAV.coordinate[2], \ 'UAV', size=15, zorder=1, color='r') ax.scatter(\ self.system.RIS.coordinate[0],\ self.system.RIS.coordinate[1],\ self.system.RIS.coordinate[2],\ color='g') ax.text(self.system.RIS.coordinate[0],self.system.RIS.coordinate[1],self.system.RIS.coordinate[2], \ 'RIS', size=15, zorder=1, color='g') for user in self.system.user_list: ax.scatter( user.coordinate[0],\ user.coordinate[1],\ user.coordinate[2],\ color='b' ) text = 'user_'+str(user.index) + '\n'\ + 'noise power(dB) = ' + str(user.noise_power) + '\n' \ + 'capacity = ' + str(user.capacity) + '\n'\ + 'secure_capacity = ' + str(user.secure_capacity) ax.text(user.coordinate[0],user.coordinate[1],user.coordinate[2], \ text, size=10, zorder=1, color='b') for attacker in self.system.attacker_list: ax.scatter( attacker.coordinate[0],\ attacker.coordinate[1],\ attacker.coordinate[2],\ color='y' ) ax.text(attacker.coordinate[0],attacker.coordinate[1],attacker.coordinate[2], \ 'attacker_'+str(attacker.index) + '\n'\ +'capacities:' + str(attacker.capacity), size=10, zorder=1, color='y') def plot_channels(self, ax): """ function used in render to show the H_UR, h_U_k, h_R_k """ for channel in self.system.h_R_k: self.plot_one_channel(ax, channel, "b") for channel in self.system.h_R_p: self.plot_one_channel(ax, channel, "y") for channel in self.system.h_U_k: self.plot_one_channel(ax, channel, "b") for channel in self.system.h_U_p: self.plot_one_channel(ax, channel, "y") self.plot_one_channel(ax, self.system.H_UR, "r") def plot_one_channel(self, ax, channel, color, text = "channel"): """ function used in plot channels to show only one channel """ arrow_side_coor = channel.receiver.coordinate point_side_coor = channel.transmitter.coordinate text = channel.channel_name + '\n' \ + 'n=' + str(channel.n) \ + ' sigma=' + str(channel.sigma) +'\n'\ + 'PL=' + str(channel.path_loss_normal) + '\n'\ + 'PL(dB)=' + str(channel.path_loss_dB) x = (arrow_side_coor[0] + point_side_coor[0]) / 2 y = (arrow_side_coor[1] + point_side_coor[1]) / 2 z = (arrow_side_coor[2] + point_side_coor[2]) / 2 ax.text(x, y, z, text, size=10, zorder=1, color=color) channel_arrow = Arrow3D(\ [point_side_coor[0], arrow_side_coor[0]], \ [point_side_coor[1], arrow_side_coor[1]], \ [point_side_coor[2], arrow_side_coor[2]],\ mutation_scale=20, lw = 3, arrowstyle="-|>", color=color ) ax.add_artist(channel_arrow) def plot_text(self, ax): """ used in render to polt texts """ text = "pause = " + str(self.pause) + "\n"\ + "t_index = " + str(self.t_index) ax.text(0, 0, 60, text, size=10, zorder=1, color='b') ================================================ FILE: requirements.txt ================================================ # Twin-TD3 requirements # Usage: pip install -r requirements.txt matplotlib==3.4.3 pyparsing==3.0.9 # sklearn==0.0.post1 joblib==1.2.0 pandas==1.4.3 cycler==0.11.0 seaborn==0.12.2 kiwisolver==1.4.3 torchvision==0.13.0 torch>=1.7.0 scikit-learn==1.2.2 python-dateutil==2.8.2 pytz==2022.1 scipy==1.8.1 openpyxl==3.0.9 ================================================ FILE: run_simulation.py ================================================ # debug field import os os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE" import argparse # get argument from user parser = argparse.ArgumentParser() parser.add_argument('--path', type = str, required = True, help="pretrained model weight path") args = parser.parse_args() STORE_PATH = args.path # validate the weight path if not os.path.isdir(STORE_PATH): raise NotImplementedError("The provided weight path does not exist!") # get DRL_ALGO if 'td3' in STORE_PATH: DRL_ALGO = 'td3' else: DRL_ALGO = 'ddpg' # reward design if 'see' in STORE_PATH: REWARD_DESIGN = 'see' else: REWARD_DESIGN = 'ssr' # seeds and episode number SEEDS = None # process the argument assert DRL_ALGO in ['ddpg', 'td3'], "drl must be ['ddpg', 'td3']" assert REWARD_DESIGN in ['ssr', 'see'], "reward must be ['ssr', 'see']" if SEEDS is not None: assert len(SEEDS) in [1, 2] and isinstance(SEEDS[0], int) and isinstance(SEEDS[-1], int), "seeds must be a list of 1 or 2 integer" if DRL_ALGO == 'td3': from td3 import Agent elif DRL_ALGO == 'ddpg': from ddpg import Agent import ddpg from env import MiniSystem import numpy as np import math import time import torch import shutil # 1 init system model episode_num = 1 episode_cnt = 0 step_num = 100 project_name = STORE_PATH system = MiniSystem( user_num=2, RIS_ant_num=4, UAV_ant_num=4, if_dir_link=1, if_with_RIS=True, if_move_users=True, if_movements=True, reverse_x_y=(False, False), if_UAV_pos_state = True, reward_design = REWARD_DESIGN, project_name = project_name, step_num = step_num ) if_Theta_fixed = False if_G_fixed = False if_BS = False if_robust = True # 2 init RL Agent agent_1_param_dic = {} agent_1_param_dic["alpha"] = 0.0001 agent_1_param_dic["beta"] = 0.001 agent_1_param_dic["input_dims"] = system.get_system_state_dim() agent_1_param_dic["tau"] = 0.001 agent_1_param_dic["batch_size"] = 64 agent_1_param_dic["n_actions"] = system.get_system_action_dim() - 2 agent_1_param_dic["action_noise_factor"] = 0.1 agent_1_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2 agent_1_param_dic["agent_name"] = "G_and_Phi" agent_1_param_dic["layer1_size"] = 800 agent_1_param_dic["layer2_size"] = 600 agent_1_param_dic["layer3_size"] = 512 agent_1_param_dic["layer4_size"] = 256 agent_2_param_dic = {} agent_2_param_dic["alpha"] = 0.0001 agent_2_param_dic["beta"] = 0.001 agent_2_param_dic["input_dims"] = 3 agent_2_param_dic["tau"] = 0.001 agent_2_param_dic["batch_size"] = 64 agent_2_param_dic["n_actions"] = 2 agent_2_param_dic["action_noise_factor"] = 0.5 agent_2_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2 agent_2_param_dic["agent_name"] = "UAV" agent_2_param_dic["layer1_size"] = 400 agent_2_param_dic["layer2_size"] = 300 agent_2_param_dic["layer3_size"] = 256 agent_2_param_dic["layer4_size"] = 128 if SEEDS is not None: torch.manual_seed(SEEDS[0]) # 1 torch.cuda.manual_seed_all(SEEDS[0]) # 1 agent_1 = Agent( alpha = agent_1_param_dic["alpha"], beta = agent_1_param_dic["beta"], input_dims = [agent_1_param_dic["input_dims"]], tau = agent_1_param_dic["tau"], env = system, batch_size = agent_1_param_dic["batch_size"], layer1_size=agent_1_param_dic["layer1_size"], layer2_size=agent_1_param_dic["layer2_size"], layer3_size=agent_1_param_dic["layer3_size"], layer4_size=agent_1_param_dic["layer4_size"], n_actions = agent_1_param_dic["n_actions"], max_size = agent_1_param_dic["memory_max_size"], agent_name= agent_1_param_dic["agent_name"] ) if SEEDS is not None: torch.manual_seed(SEEDS[-1]) # 2 torch.cuda.manual_seed_all(SEEDS[-1]) # 2 agent_2 = Agent( alpha = agent_2_param_dic["alpha"], beta = agent_2_param_dic["beta"], input_dims = [agent_2_param_dic["input_dims"]], tau = agent_2_param_dic["tau"], env = system, batch_size = agent_2_param_dic["batch_size"], layer1_size=agent_2_param_dic["layer1_size"], layer2_size=agent_2_param_dic["layer2_size"], layer3_size=agent_2_param_dic["layer3_size"], layer4_size=agent_2_param_dic["layer4_size"], n_actions = agent_2_param_dic["n_actions"], max_size = agent_2_param_dic["memory_max_size"], agent_name= agent_2_param_dic["agent_name"] ) if DRL_ALGO == 'td3': agent_1.load_models( load_file_actor = STORE_PATH + '/Actor_G_and_Phi_TD3', load_file_critic_1 = STORE_PATH + '/Critic_1_G_and_Phi_TD3', load_file_critic_2 = STORE_PATH + '/Critic_2_G_and_Phi_TD3' ) agent_2.load_models( load_file_actor = STORE_PATH + '/Actor_UAV_TD3', load_file_critic_1 = STORE_PATH + '/Critic_1_UAV_TD3', load_file_critic_2 = STORE_PATH + '/Critic_2_UAV_TD3' ) elif DRL_ALGO == 'ddpg': agent_1.load_models( load_file_actor = STORE_PATH + '/Actor_G_and_Phi_ddpg', load_file_critic = STORE_PATH + '/Critic_G_and_Phi_ddpg' ) agent_2.load_models( load_file_actor = STORE_PATH + '/Actor_UAV_ddpg', load_file_critic = STORE_PATH + '/Critic_UAV_ddpg' ) meta_dic = {} print("***********************system information******************************") print("folder_name: "+str(system.data_manager.store_path)) meta_dic['folder_name'] = system.data_manager.store_path print("user_num: "+str(system.user_num)) meta_dic['user_num'] = system.user_num print("if_dir: "+str(system.if_dir_link)) meta_dic['if_dir_link'] = system.if_dir_link print("if_with_RIS: "+str(system.if_with_RIS)) meta_dic['if_with_RIS'] = system.if_with_RIS print("if_user_m: "+str(system.if_move_users)) meta_dic['if_move_users'] = system.if_move_users print("RIS_ant_num: "+str(system.RIS.ant_num)) meta_dic['system_RIS_ant_num'] = system.RIS.ant_num print("UAV_ant_num: "+str(system.UAV.ant_num)) meta_dic['system_UAV_ant_num'] = system.UAV.ant_num print("if_movements: "+str(system.if_movements)) meta_dic['system_if_movements'] = system.if_movements print("reverse_x_y: "+str(system.reverse_x_y)) meta_dic['system_reverse_x_y'] = system.reverse_x_y print("if_UAV_pos_state:"+str(system.if_UAV_pos_state)) meta_dic['if_UAV_pos_state'] = system.if_UAV_pos_state print("ep_num: "+str(episode_num)) meta_dic['episode_num'] = episode_num print("step_num: "+str(step_num)) meta_dic['step_num'] = step_num print("***********************agent_1 information******************************") tplt = "{0:{2}^20}\t{1:{2}^20}" for i in agent_1_param_dic: parm = agent_1_param_dic[i] print(tplt.format(i, parm, chr(12288))) meta_dic["agent_1"] = agent_1_param_dic print("***********************agent_2 information******************************") for i in agent_2_param_dic: parm = agent_2_param_dic[i] print(tplt.format(i, parm, chr(12288))) meta_dic["agent_2"] = agent_2_param_dic system.data_manager.save_meta_data(meta_dic) print("***********************traning information******************************") try: while episode_cnt < episode_num: # 1 reset the whole system system.reset() step_cnt = 0 score_per_ep = 0 # 2 get the initial state if if_robust: tmp = system.observe() #z = np.random.multivariate_normal(np.zeros(2), 0.5*np.eye(2), size=len(tmp)).view(np.complex128) z = np.random.normal(size=len(tmp)) observersion_1 = list( np.array(tmp) + 0.6 *1e-7* z ) else: observersion_1 = system.observe() observersion_2 = list(system.UAV.coordinate) while step_cnt < step_num: # 1 count num of step in one episode step_cnt += 1 # judge if pause the whole system if not system.render_obj.pause: # 2 choose action acoording to current state action_1 = agent_1.choose_action(observersion_1, greedy=agent_1_param_dic["action_noise_factor"] * math.pow((1-episode_cnt / episode_num), 2)) action_2 = agent_2.choose_action(observersion_2, greedy=agent_2_param_dic["action_noise_factor"]* math.pow((1-episode_cnt / episode_num), 2)) if if_BS: action_2[0]=0 action_2[1]=0 if if_Theta_fixed: action_1[0+2 * system.UAV.ant_num * system.user_num:] = len(action_1[0+2 * system.UAV.ant_num * system.user_num:])*[0] if if_G_fixed: action_1[0:0+2 * system.UAV.ant_num * system.user_num]=np.array([-0.0313, -0.9838, 0.3210, 1.0, -0.9786, -0.1448, 0.3518, 0.5813, -1.0, -0.2803, -0.4616, -0.6352, -0.1449, 0.7040, 0.4090, -0.8521]) * math.pow(episode_cnt / episode_num, 2) * 0.7 #action_1[0:0+2 * system.UAV.ant_num * system.user_num]=len(action_1[0:0+2 * system.UAV.ant_num * system.user_num])*[0.5] # 3 get newstate, reward if system.if_with_RIS: new_state_1, reward, done, info = system.step( action_0=action_2[0], action_1=action_2[1], G=action_1[0:0+2 * system.UAV.ant_num * system.user_num], Phi=action_1[0+2 * system.UAV.ant_num * system.user_num:], set_pos_x=action_2[0], set_pos_y=action_2[1] ) new_state_2 = list(system.UAV.coordinate) else: new_state_1, reward, done, info = system.step( action_0=action_2[0], action_1=action_2[1], G=action_1[0:0+2 * system.UAV.ant_num * system.user_num], set_pos_x=action_2[0], set_pos_y=action_2[1] ) new_state_2 = list(system.UAV.coordinate) score_per_ep += reward # render system.render_obj.render(0.001) # no rendering for faster observersion_1 = new_state_1 observersion_2 = new_state_2 if done == True: break else: system.render_obj.render_pause() # no rendering for faster time.sleep(0.001) #time.sleep(1) system.reset() print("ep_num: "+str(episode_cnt)+" ep_score: "+str(score_per_ep)) episode_cnt +=1 except KeyboardInterrupt: raise KeyboardInterrupt finally: shutil.rmtree('data/storage/data') ================================================ FILE: td3.py ================================================ import os import torch as T #import torch.cuda as T import torch.nn as nn import torch.nn.functional as F import torch.optim as optim import numpy as np from data_manager import DataManager class OUActionNoise(object): def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None): self.theta = theta self.mu = mu self.sigma = sigma self.dt = dt self.x0 = x0 self.reset() def __call__(self): x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + \ self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape) self.x_prev = x return x def reset(self): self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu) def __repr__(self): return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format( self.mu, self.sigma) class AWGNActionNoise(object): def __init__(self, mu = 0, sigma=1): self.mu = mu self.sigma = sigma def __call__(self): #self.mu = mu #self.sigma = sigma x = np.random.normal(size=self.mu.shape) * self.sigma return x class ReplayBuffer(object): def __init__(self, max_size, input_shape, n_actions): self.mem_size = max_size self.mem_cntr = 0 self.state_memory = np.zeros((self.mem_size, *input_shape)) self.new_state_memory = np.zeros((self.mem_size, *input_shape)) self.action_memory = np.zeros((self.mem_size, n_actions)) self.reward_memory = np.zeros(self.mem_size) self.terminal_memory = np.zeros(self.mem_size, dtype=np.float32) def store_transition(self, state, action, reward, state_, done): index = self.mem_cntr % self.mem_size self.state_memory[index] = state self.new_state_memory[index] = state_ self.action_memory[index] = action self.reward_memory[index] = reward self.terminal_memory[index] = 1 - done self.mem_cntr += 1 def sample_buffer(self, batch_size): max_mem = min(self.mem_cntr, self.mem_size) batch = np.random.choice(max_mem, batch_size) states = self.state_memory[batch] actions = self.action_memory[batch] rewards = self.reward_memory[batch] states_ = self.new_state_memory[batch] terminal = self.terminal_memory[batch] return states, actions, rewards, states_, terminal class CriticNetwork(nn.Module): def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name, chkpt_dir='C:\\demo\\IRS_TD3_minimal\\main_foder\\tmp\\TD3', load_file = ''): super(CriticNetwork, self).__init__() self.input_dims = input_dims self.fc1_dims = fc1_dims self.fc2_dims = fc2_dims self.fc3_dims = fc3_dims self.fc4_dims = fc4_dims self.n_actions = n_actions self.checkpoint_file = os.path.join(chkpt_dir,name+'_TD3') self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Critic_UAV_TD3' self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims) f1 = 1./np.sqrt(self.fc1.weight.data.size()[0]) T.nn.init.uniform_(self.fc1.weight.data, -f1, f1) T.nn.init.uniform_(self.fc1.bias.data, -f1, f1) #self.fc1.weight.data.uniform_(-f1, f1) #self.fc1.bias.data.uniform_(-f1, f1) self.bn1 = nn.LayerNorm(self.fc1_dims) self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims) f2 = 1./np.sqrt(self.fc2.weight.data.size()[0]) #f2 = 0.002 T.nn.init.uniform_(self.fc2.weight.data, -f2, f2) T.nn.init.uniform_(self.fc2.bias.data, -f2, f2) #self.fc2.weight.data.uniform_(-f2, f2) #self.fc2.bias.data.uniform_(-f2, f2) self.bn2 = nn.LayerNorm(self.fc2_dims) self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims) f3 = 1./np.sqrt(self.fc3.weight.data.size()[0]) T.nn.init.uniform_(self.fc3.weight.data, -f3, f3) T.nn.init.uniform_(self.fc3.bias.data, -f3, f3) #self.fc2.weight.data.uniform_(-f2, f2) #self.fc2.bias.data.uniform_(-f2, f2) self.bn3 = nn.LayerNorm(self.fc3_dims) self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims) f4 = 1./np.sqrt(self.fc4.weight.data.size()[0]) T.nn.init.uniform_(self.fc4.weight.data, -f4, f4) T.nn.init.uniform_(self.fc4.bias.data, -f4, f4) #self.fc2.weight.data.uniform_(-f2, f2) #self.fc2.bias.data.uniform_(-f2, f2) self.bn4 = nn.LayerNorm(self.fc4_dims) self.action_value = nn.Linear(self.n_actions, self.fc4_dims) f5 = 0.003 self.q = nn.Linear(self.fc4_dims, 1) T.nn.init.uniform_(self.q.weight.data, -f5, f5) T.nn.init.uniform_(self.q.bias.data, -f5, f5) #self.q.weight.data.uniform_(-f3, f3) #self.q.bias.data.uniform_(-f3, f3) self.optimizer = optim.Adam(self.parameters(), lr=beta) # if torch.cuda.available(): # import torch.cuda as T # else: # import torch as T self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu') self.to(self.device) def forward(self, state, action): state_value = self.fc1(state) state_value = self.bn1(state_value) state_value = F.relu(state_value) state_value = self.fc2(state_value) state_value = self.bn2(state_value) state_value = F.relu(state_value) state_value = self.fc3(state_value) state_value = self.bn3(state_value) state_value = F.relu(state_value) state_value = self.fc4(state_value) state_value = self.bn4(state_value) action_value = F.relu(self.action_value(action)) state_action_value = F.relu(T.add(state_value, action_value)) state_action_value = self.q(state_action_value) return state_action_value def save_checkpoint(self): print('... saving checkpoint ...') T.save(self.state_dict(), self.checkpoint_file) def load_checkpoint(self,load_file = ''): print('... loading checkpoint ...') if T.cuda.is_available(): self.load_state_dict(T.load(load_file)) else: self.load_state_dict(T.load(load_file, map_location=T.device('cpu'))) class ActorNetwork(nn.Module): def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name, chkpt_dir='C:\\demo\\IRS_TD3_minimal\\main_foder\\tmp\\TD3', load_file = ''): super(ActorNetwork, self).__init__() self.input_dims = input_dims self.fc1_dims = fc1_dims self.fc2_dims = fc2_dims self.fc3_dims = fc3_dims self.fc4_dims = fc4_dims self.n_actions = n_actions self.checkpoint_file = os.path.join(chkpt_dir,name+'_TD3') self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Actor_UAV_TD3' self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims) f1 = 1./np.sqrt(self.fc1.weight.data.size()[0]) # T.nn.init.uniform_(self.fc1.weight.data, -f1, f1) # T.nn.init.uniform_(self.fc1.bias.data, -f1, f1) self.fc1.weight.data.uniform_(-f1, f1) self.fc1.bias.data.uniform_(-f1, f1) self.bn1 = nn.LayerNorm(self.fc1_dims) self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims) #f2 = 0.002 f2 = 1./np.sqrt(self.fc2.weight.data.size()[0]) # T.nn.init.uniform_(self.fc2.weight.data, -f2, f2) # T.nn.init.uniform_(self.fc2.bias.data, -f2, f2) self.fc2.weight.data.uniform_(-f2, f2) self.fc2.bias.data.uniform_(-f2, f2) self.bn2 = nn.LayerNorm(self.fc2_dims) self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims) f3 = 1./np.sqrt(self.fc3.weight.data.size()[0]) # T.nn.init.uniform_(self.fc3.weight.data, -f3, f3) # T.nn.init.uniform_(self.fc3.bias.data, -f3, f3) self.fc2.weight.data.uniform_(-f2, f2) self.fc2.bias.data.uniform_(-f2, f2) self.bn3 = nn.LayerNorm(self.fc3_dims) self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims) f4 = 1./np.sqrt(self.fc4.weight.data.size()[0]) # T.nn.init.uniform_(self.fc4.weight.data, -f4, f4) # T.nn.init.uniform_(self.fc4.bias.data, -f4, f4) self.fc2.weight.data.uniform_(-f2, f2) self.fc2.bias.data.uniform_(-f2, f2) self.bn4 = nn.LayerNorm(self.fc4_dims) #f3 = 0.004 f5 = 0.003 self.mu = nn.Linear(self.fc4_dims, self.n_actions) # T.nn.init.uniform_(self.mu.weight.data, -f5, f5) # T.nn.init.uniform_(self.mu.bias.data, -f5, f5) self.mu.weight.data.uniform_(-f3, f3) self.mu.bias.data.uniform_(-f3, f3) self.optimizer = optim.Adam(self.parameters(), lr=alpha) self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu') self.to(self.device) def forward(self, state): x = self.fc1(state) x = self.bn1(x) x = F.relu(x) x = self.fc2(x) x = self.bn2(x) x = F.relu(x) x = self.fc3(x) x = self.bn3(x) x = F.relu(x) x = self.fc4(x) x = self.bn4(x) x = F.relu(x) x = T.tanh(self.mu(x)) return x def save_checkpoint(self): print('... saving checkpoint ...') T.save(self.state_dict(), self.checkpoint_file) def load_checkpoint(self, load_file=''): print('... loading checkpoint ...') if T.cuda.is_available(): self.load_state_dict(T.load(load_file)) else: self.load_state_dict(T.load(load_file, map_location=T.device('cpu'))) class Agent(object): def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99, n_actions=2, max_size=1000000, layer1_size=400, layer2_size=300, layer3_size=256, layer4_size=128, batch_size=64, update_actor_interval=2, noise = 'AWGN', agent_name = 'default', load_file = ''): self.load_file = load_file self.layer1_size = layer1_size self.layer2_size = layer2_size self.layer3_size = layer3_size self.layer4_size = layer4_size self.gamma = gamma self.tau = tau self.memory = ReplayBuffer(max_size, input_dims, n_actions) self.batch_size = batch_size self.learn_step_cntr = 0 self.update_actor_iter = update_actor_interval self.actor = ActorNetwork(alpha, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='Actor_' + agent_name,chkpt_dir=env.data_manager.store_path ) self.critic_1 = CriticNetwork(beta, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='Critic_1_' + agent_name,chkpt_dir=env.data_manager.store_path) self.critic_2 = CriticNetwork(beta, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='Critic_2_' + agent_name,chkpt_dir=env.data_manager.store_path) self.target_actor = ActorNetwork(alpha, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='TargetActor_' + agent_name,chkpt_dir=env.data_manager.store_path) self.target_critic_1 = CriticNetwork(beta, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='TargetCritic_1_' + agent_name,chkpt_dir=env.data_manager.store_path) self.target_critic_2 = CriticNetwork(beta, input_dims, layer1_size, layer2_size, layer3_size, layer4_size, n_actions=n_actions, name='TargetCritic_2_' + agent_name,chkpt_dir=env.data_manager.store_path) if noise == 'OU': self.noise = OUActionNoise(mu=np.zeros(n_actions)) elif noise == 'AWGN': self.noise = AWGNActionNoise(mu = np.zeros(n_actions)) # tau = 1 means copy parameters to target self.update_network_parameters(tau=1) def choose_action(self, observation, greedy=0.5, epsilon = 1): self.actor.eval() observation = T.tensor(observation, dtype=T.float).to(self.actor.device) mu = self.actor.forward(observation).to(self.actor.device) mu_prime = mu + T.tensor(greedy * self.noise(), dtype=T.float).to(self.actor.device) self.actor.train() return mu_prime.cpu().detach().numpy() def remember(self, state, action, reward, new_state, done): self.memory.store_transition(state, action, reward, new_state, done) def learn(self): if self.memory.mem_cntr < self.batch_size: return # the done here is opposite of the done in the env state, action, reward, new_state, done = \ self.memory.sample_buffer(self.batch_size) # trun s, a, r, new_s into tensor reward = T.tensor(reward, dtype=T.float).to(self.critic_1.device) done = T.tensor(done).to(self.critic_1.device) new_state = T.tensor(new_state, dtype=T.float).to(self.critic_1.device) action = T.tensor(action, dtype=T.float).to(self.critic_1.device) state = T.tensor(state, dtype=T.float).to(self.critic_1.device) # trun on evaliation mode of target actor, target critic, critic net # fix these three nets self.target_actor.eval() self.target_critic_1.eval() self.target_critic_2.eval() self.critic_1.eval() self.critic_2.eval() target_actions = self.target_actor.forward(new_state) # target_actions = target_actions + \ # T.clamp(T.tensor(np.random.normal(scale=0.2)), -0.5, 0.5) # # might break if elements of min and max are not all equal # target_actions = T.clamp(target_actions, self.min_action[0], self.max_action[0]) critic_value_1_ = self.target_critic_1.forward(new_state, target_actions) critic_value_2_ = self.target_critic_2.forward(new_state, target_actions) critic_value_1 = self.critic_1.forward(state, action) critic_value_2 = self.critic_2.forward(state, action) critic_value_ = T.min(critic_value_1_, critic_value_2_) target = [] for j in range(self.batch_size): target.append(reward[j] + self.gamma*critic_value_[j]*done[j]) target = T.tensor(target).to(self.critic_1.device) target = target.view(self.batch_size, 1) # here update the critic net using mse of (r + gamma * Q_argmax_a*(newstate, a*)) - Q(state, action) self.critic_1.train() self.critic_2.train() self.critic_1.optimizer.zero_grad() self.critic_2.optimizer.zero_grad() critic_1_loss = F.mse_loss(target, critic_value_1) critic_2_loss = F.mse_loss(target, critic_value_2) critic_loss = critic_1_loss + critic_2_loss critic_loss.backward() self.critic_1.optimizer.step() self.critic_2.optimizer.step() self.learn_step_cntr += 1 if self.learn_step_cntr % self.update_actor_iter != 0: return # here update the actor net by policy gradient # first fix the critic net self.critic_1.eval() self.critic_2.eval() self.actor.optimizer.zero_grad() mu = self.actor.forward(state) self.actor.train() actor_q1_loss = self.critic_1.forward(state, mu) actor_loss = -T.mean(actor_q1_loss) actor_loss.backward() self.actor.optimizer.step() self.update_network_parameters() def update_network_parameters(self, tau=None): if tau is None: tau = self.tau actor_params = self.actor.named_parameters() critic_1_params = self.critic_1.named_parameters() critic_2_params = self.critic_2.named_parameters() target_actor_params = self.target_actor.named_parameters() target_critic_1_params = self.target_critic_1.named_parameters() target_critic_2_params = self.target_critic_2.named_parameters() critic_1_state_dict = dict(critic_1_params) critic_2_state_dict = dict(critic_2_params) actor_state_dict = dict(actor_params) target_actor_state_dict = dict(target_actor_params) target_critic_1_state_dict = dict(target_critic_1_params) target_critic_2_state_dict = dict(target_critic_2_params) for name in critic_1_state_dict: critic_1_state_dict[name] = tau*critic_1_state_dict[name].clone() + \ (1-tau)*target_critic_1_state_dict[name].clone() for name in critic_2_state_dict: critic_2_state_dict[name] = tau*critic_2_state_dict[name].clone() + \ (1-tau)*target_critic_2_state_dict[name].clone() for name in actor_state_dict: actor_state_dict[name] = tau*actor_state_dict[name].clone() + \ (1-tau)*target_actor_state_dict[name].clone() self.target_critic_1.load_state_dict(critic_1_state_dict) self.target_critic_2.load_state_dict(critic_2_state_dict) self.target_actor.load_state_dict(actor_state_dict) def save_models(self): self.actor.save_checkpoint() self.target_actor.save_checkpoint() self.critic_1.save_checkpoint() self.critic_2.save_checkpoint() self.target_critic_1.save_checkpoint() self.target_critic_2.save_checkpoint() def load_models(self, load_file_actor = '',load_file_critic_1 ='',load_file_critic_2 =''): self.actor.load_checkpoint(load_file = load_file_actor) self.target_actor.load_checkpoint(load_file = load_file_actor) self.critic_1.load_checkpoint(load_file = load_file_critic_1) self.critic_2.load_checkpoint(load_file = load_file_critic_2) self.target_critic_1.load_checkpoint(load_file = load_file_critic_1) self.target_critic_2.load_checkpoint(load_file = load_file_critic_2)